
Message boards : Number crunching : Early WU Downloads

tomba
Message 34640 - Posted: 13 Jan 2014 | 16:51:14 UTC

In the last 24 hours, one of my rigs, with a single GPU, has on three occasions had another WU downloaded very soon after the current WU has started.

Network Usage is set to "Max additional work buffer" = 0.01 days, as it has been for years!!

Am I missing something??

tomba
Message 34641 - Posted: 13 Jan 2014 | 17:40:31 UTC

It just did it again! 20% of the current WU done and another is downloading!!

Profile Stoneageman
Message 34642 - Posted: 13 Jan 2014 | 19:27:54 UTC

What is the 'Minimum work buffer' set to?

tomba
Message 34643 - Posted: 13 Jan 2014 | 19:37:09 UTC - in response to Message 34642.

What is the 'Minimum work buffer' set to?


0.00...

Betting Slip
Message 34647 - Posted: 14 Jan 2014 | 6:59:32 UTC - in response to Message 34643.

"Max additional work buffer" should be 0.00 as well.

Jacob Klein
Message 34651 - Posted: 14 Jan 2014 | 14:14:54 UTC - in response to Message 34647.
Last modified: 14 Jan 2014 | 14:15:05 UTC

If BOINC predicts it will run out of work within 3 minutes, it will download more work, regardless of the min_buffer settings.
You can turn on the <work_fetch_debug> flag, to get more logging information, for you and for us to look at.
See http://boinc.berkeley.edu/wiki/Client_configuration
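
For reference, a minimal cc_config.xml that switches this flag on could look roughly like the sketch below (the file lives in the BOINC data directory; keep whatever options you already have in yours, and tell BOINC to re-read its config files, or restart the client, afterwards):

<cc_config>
   <log_flags>
      <work_fetch_debug>1</work_fetch_debug>
   </log_flags>
</cc_config>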

Jim1348
Message 34652 - Posted: 14 Jan 2014 | 17:43:39 UTC - in response to Message 34640.

In the last 24 hours, one of my rigs, with a single GPU, has on three occasions had another WU downloaded very soon after the current WU has started.

Network Usage is set to "Max additional work buffer" = 0.01 days, as it has been for years!!

I saw that a day or two ago also, though which machine I don't remember (single or dual cards). But I keep a tight buffer too, though not quite that tight; usually about 0.10 days min, 0.05 days additional.

tomba
Message 34656 - Posted: 15 Jan 2014 | 8:09:31 UTC - in response to Message 34652.

I saw that a day or two ago also, though which machine I don't remember (single or dual cards).

Which single cards are OK for GPUGrid?

tomba
Message 34657 - Posted: 15 Jan 2014 | 8:52:59 UTC - in response to Message 34651.

You can turn on the <work_fetch_debug> flag, to get more logging information, for you and for us to look at.
See http://boinc.berkeley.edu/wiki/Client_configuration

Seems that flag is enabled by default, so I checked out the log around the time I got an unwanted WU.

There was a funny. Twice I got "This computer has reached a limit on tasks in progress" and the active WU restarted.

It was after the upload of the finished WU completed that I got the unwanted WU.

13/01/2014 14:21:16 | GPUGRID | Starting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:21:29 | GPUGRID | Started upload of 31x-SANTI_MAR422cap310-19-32-RND2983_0_0
[removed other upload entries]
13/01/2014 14:35:40 | GPUGRID | Sending scheduler request: To fetch work.
13/01/2014 14:35:40 | GPUGRID | Requesting new tasks for NVIDIA
13/01/2014 14:35:43 | GPUGRID | Scheduler request completed: got 0 new tasks
13/01/2014 14:35:43 | GPUGRID | No tasks sent
13/01/2014 14:35:43 | GPUGRID | This computer has reached a limit on tasks in progress
13/01/2014 14:36:37 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:40:08 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:42:22 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:48:09 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 15:07:20 | GPUGRID | Sending scheduler request: To fetch work.
13/01/2014 15:07:20 | GPUGRID | Requesting new tasks for NVIDIA
13/01/2014 15:07:23 | GPUGRID | Scheduler request completed: got 0 new tasks
13/01/2014 15:07:23 | GPUGRID | No tasks sent
13/01/2014 15:07:23 | GPUGRID | This computer has reached a limit on tasks in progress
13/01/2014 15:08:20 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 15:08:45 | GPUGRID | Finished upload of 31x-SANTI_MAR422cap310-19-32-RND2983_0_9
13/01/2014 15:08:49 | GPUGRID | Sending scheduler request: To report completed tasks.
13/01/2014 15:08:49 | GPUGRID | Reporting 1 completed tasks
13/01/2014 15:08:49 | GPUGRID | Not requesting tasks: don't need
13/01/2014 15:08:52 | GPUGRID | Scheduler request completed
13/01/2014 15:20:12 | GPUGRID | Sending scheduler request: To fetch work.
13/01/2014 15:20:12 | GPUGRID | Requesting new tasks for NVIDIA
13/01/2014 15:20:16 | GPUGRID | Scheduler request completed: got 1 new tasks

tomba
Message 34659 - Posted: 15 Jan 2014 | 9:19:57 UTC - in response to Message 34657.

There was a funny.

I checked the other two unwanted WUs I got. The situation was the same: the limit was reached twice, and the unwanted WU arrived after the upload of the finished one completed.

Dagorath
Message 34660 - Posted: 15 Jan 2014 | 9:29:35 UTC - in response to Message 34657.
Last modified: 15 Jan 2014 | 9:36:17 UTC

The debug flag Jacob suggested is not on by default. The default logging gives the messages you see which are mostly just what the scheduler does but not why. If you turn on the flag Jacob suggested you will get additional messages that explain why the scheduler does what it does. The devil is in those details.

13/01/2014 14:36:37 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:40:08 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:42:22 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:48:09 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 15:07:20 | GPUGRID | Sending scheduler request: To fetch work.
13/01/2014 15:07:20 | GPUGRID | Requesting new tasks for NVIDIA
13/01/2014 15:07:23 | GPUGRID | Scheduler request completed: got 0 new tasks
13/01/2014 15:07:23 | GPUGRID | No tasks sent
13/01/2014 15:07:23 | GPUGRID | This computer has reached a limit on tasks in progress
13/01/2014 15:08:20 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 15:08:45 | GPUGRID | Finished upload of 31x-SANTI_MAR422cap310-19-32-RND2983_0_9


The same task restarted 4X in 12 minutes, 5X in 24 minutes?

Edit added:

You have min. buffer = 0 and additional buffer = 0.1, IIUC. If you have min. buffer = 0 then you should also have additional buffer = 0. Or try min. buffer = 0.1 and additional buffer 0.
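
If you would rather set these outside the Manager, the same two values can go in global_prefs_override.xml in the BOINC data directory (values are in days). A minimal sketch, assuming the min. buffer 0.1 / additional buffer 0 combination suggested above:

<global_preferences>
   <work_buf_min_days>0.1</work_buf_min_days>
   <work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>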
____________
BOINC <<--- credit whores, pedants, alien hunters

Jim1348
Message 34661 - Posted: 15 Jan 2014 | 9:54:42 UTC - in response to Message 34656.
Last modified: 15 Jan 2014 | 9:59:50 UTC

I saw that a day or two ago also, though which machine I don't remember (single or dual cards).

Which single cards are OK for GPUGrid?

I use GTX 660s on both my machines (dual cards in one, single card in the other). I think it was very probably the single-card machine that had the problem, but I haven't seen it again. It was probably just a glitch in their server, though I may have been transferring cards between the machines around that time, and BOINC may have gotten confused as to which was which.

tomba
Message 34662 - Posted: 15 Jan 2014 | 10:28:24 UTC - in response to Message 34661.

I saw that a day or two ago also, though which machine I don't remember (single or dual cards).

Which single cards are OK for GPUGrid?

I use GTX 660s on both my machines (dual cards in one, single card in the other). I think it was very probably the single-card machine that had the problem, but I haven't seen it again. It was probably just a glitch in their server, though I may have been transferring cards between the machines around that time, and BOINC may have gotten confused as to which was which.

I guess I should have asked "Who manufactures 660 singles?". I have two doubles in my rig but there's room for four singles!

Dagorath
Message 34663 - Posted: 15 Jan 2014 | 11:32:57 UTC - in response to Message 34662.

tomba,

Jim is referring to a single quantity of cards, one card, not a single width card. You are misinterpreting his "single" to mean "single slot width".

____________
BOINC <<--- credit whores, pedants, alien hunters

tomba
Message 34665 - Posted: 15 Jan 2014 | 11:46:08 UTC - in response to Message 34660.

The debug flag Jacob suggested is not on by default. The default logging gives the messages you see which are mostly just what the scheduler does but not why. If you turn on the flag Jacob suggested you will get additional messages that explain why the scheduler does what it does. The devil is in those details.

OK. cc_config.xml updated.


The same task restarted 4X in 12 minutes, 5X in 24 minutes?


Yep. That's what the log tells us.
You have min. buffer = 0 and additional buffer = 0.1, IIUC. If you have min. buffer = 0 then you should also have additional buffer = 0. Or try min. buffer = 0.1 and additional buffer 0.

Both are now set to zero.

tomba
Message 34666 - Posted: 15 Jan 2014 | 11:51:02 UTC

Just happened again, with cc_config.xml updated.

New WU downloaded at 12:01:24. Active WU will complete at 13:00.

15/01/2014 11:51:50 | | Starting BOINC client version 7.2.33 for windows_x86_64
15/01/2014 11:51:50 | | log flags: file_xfer, sched_ops, task, file_xfer_debug
15/01/2014 11:51:50 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
15/01/2014 11:51:50 | | Data directory: C:\ProgramData\BOINC
15/01/2014 11:51:50 | | Running under account TOMBA
15/01/2014 11:51:50 | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 331.93, CUDA version 6.0, compute capability 3.0, 2048MB, 1962MB available, 1982 GFLOPS peak)
15/01/2014 11:51:50 | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 331.93, device version OpenCL 1.1 CUDA, 2048MB, 1962MB available, 1982 GFLOPS peak)
15/01/2014 11:51:50 | | Host name: XPS-435
15/01/2014 11:51:50 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz [Family 6 Model 26 Stepping 5]
15/01/2014 11:51:50 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe
15/01/2014 11:51:50 | | OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
15/01/2014 11:51:50 | | Memory: 5.99 GB physical, 11.98 GB virtual
15/01/2014 11:51:50 | | Disk: 465.76 GB total, 374.30 GB free
15/01/2014 11:51:50 | | Local time is UTC +1 hours
15/01/2014 11:51:50 | | VirtualBox version: 4.3.4
15/01/2014 11:51:50 | | Config: use all coprocessors
15/01/2014 11:51:50 | GPUGRID | URL http://www.gpugrid.net/; Computer ID 157241; resource share 100
15/01/2014 11:51:50 | GPUGRID | General prefs: from GPUGRID (last modified 27-Aug-2013 13:11:17)
15/01/2014 11:51:50 | GPUGRID | Computer location: home
15/01/2014 11:51:50 | GPUGRID | General prefs: no separate prefs for home; using your defaults
15/01/2014 11:51:50 | | Reading preferences override file
15/01/2014 11:51:50 | | Preferences:
15/01/2014 11:51:50 | | max memory usage when active: 6134.97MB
15/01/2014 11:51:50 | | max memory usage when idle: 6134.97MB
15/01/2014 11:51:50 | | max disk usage: 232.88GB
15/01/2014 11:51:50 | | max CPUs used: 3
15/01/2014 11:51:50 | | max download rate: 2048000 bytes/sec
15/01/2014 11:51:50 | | max upload rate: 135004 bytes/sec
15/01/2014 11:51:50 | | (to change preferences, visit a project web site or select Preferences in the Manager)
15/01/2014 11:51:50 | | Not using a proxy
15/01/2014 11:51:55 | GPUGRID | project resumed by user
15/01/2014 11:51:56 | GPUGRID | Restarting task 39x756-NOELIA_FXArep-0-2-RND3960_0 using acemdlong version 814 (cuda55) in slot 0
15/01/2014 12:01:21 | GPUGRID | Sending scheduler request: To fetch work.
15/01/2014 12:01:21 | GPUGRID | Requesting new tasks for NVIDIA
15/01/2014 12:01:24 | GPUGRID | Scheduler request completed: got 1 new tasks
15/01/2014 12:01:26 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-LICENSE

15/01/2014 12:01:26 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/1c0/22x-SANTI_MAR419cap310-28-LICENSE
15/01/2014 12:01:26 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-COPYRIGHT
15/01/2014 12:01:26 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/230/22x-SANTI_MAR419cap310-28-COPYRIGHT
15/01/2014 12:01:28 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:28 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:28 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:28 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-LICENSE
15/01/2014 12:01:28 | GPUGRID | [file_xfer] Throughput 9572 bytes/sec
15/01/2014 12:01:28 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:28 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-COPYRIGHT
15/01/2014 12:01:28 | GPUGRID | [file_xfer] Throughput 1583 bytes/sec
15/01/2014 12:01:28 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_1
15/01/2014 12:01:28 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/19f/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_1
15/01/2014 12:01:28 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_2
15/01/2014 12:01:28 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/388/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_2
15/01/2014 12:01:33 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:33 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:33 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_2
15/01/2014 12:01:33 | GPUGRID | [file_xfer] Throughput 130346 bytes/sec
15/01/2014 12:01:33 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_3
15/01/2014 12:01:33 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/81/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_3
15/01/2014 12:01:38 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:38 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:38 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_3
15/01/2014 12:01:38 | GPUGRID | [file_xfer] Throughput 93616 bytes/sec
15/01/2014 12:01:38 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-pdb_file
15/01/2014 12:01:38 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/103/22x-SANTI_MAR419cap310-28-pdb_file
15/01/2014 12:01:42 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:42 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:42 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_1
15/01/2014 12:01:42 | GPUGRID | [file_xfer] Throughput 48728 bytes/sec
15/01/2014 12:01:42 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-psf_file
15/01/2014 12:01:42 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/136/22x-SANTI_MAR419cap310-28-psf_file
15/01/2014 12:02:01 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:01 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:01 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-pdb_file
15/01/2014 12:02:01 | GPUGRID | [file_xfer] Throughput 100795 bytes/sec
15/01/2014 12:02:01 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-par_file
15/01/2014 12:02:01 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/26d/22x-SANTI_MAR419cap310-28-par_file
15/01/2014 12:02:07 | GPUGRID | Restarting task 39x756-NOELIA_FXArep-0-2-RND3960_0 using acemdlong version 814 (cuda55) in slot 0
15/01/2014 12:02:08 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:08 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:08 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-psf_file
15/01/2014 12:02:08 | GPUGRID | [file_xfer] Throughput 179525 bytes/sec
15/01/2014 12:02:08 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-conf_file_enc
15/01/2014 12:02:08 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/3c4/22x-SANTI_MAR419cap310-28-conf_file_enc
15/01/2014 12:02:09 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:09 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:09 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-conf_file_enc
15/01/2014 12:02:09 | GPUGRID | [file_xfer] Throughput 2754 bytes/sec
15/01/2014 12:02:09 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-metainp_file
15/01/2014 12:02:09 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/183/22x-SANTI_MAR419cap310-28-metainp_file
15/01/2014 12:02:10 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:10 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:10 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-metainp_file
15/01/2014 12:02:10 | GPUGRID | [file_xfer] Throughput 166 bytes/sec
15/01/2014 12:02:10 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_7
15/01/2014 12:02:10 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/2cd/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_7
15/01/2014 12:02:11 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:11 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:11 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:11 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-par_file
15/01/2014 12:02:11 | GPUGRID | [file_xfer] Throughput 75904 bytes/sec
15/01/2014 12:02:11 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:11 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_7
15/01/2014 12:02:11 | GPUGRID | [file_xfer] Throughput 0 bytes/sec
15/01/2014 12:02:11 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_10
15/01/2014 12:02:11 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/13e/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_10
15/01/2014 12:02:12 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:12 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:12 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_10
15/01/2014 12:02:12 | GPUGRID | [file_xfer] Throughput 211 bytes/sec

tomba
Message 34668 - Posted: 15 Jan 2014 | 12:24:11 UTC - in response to Message 34663.

tomba,

Jim is referring to a single quantity of cards, one card, not a single width card. You are misinterpreting his "single" to mean "single slot width".

Oops... Thanks for the heads-up!

Dagorath
Message 34670 - Posted: 15 Jan 2014 | 13:20:33 UTC - in response to Message 34666.

Just happened again, with cc_config.xml updated.

New WU downloaded at 12:01:24. Active WU will complete at 13:00.

15/01/2014 11:51:50 | | Starting BOINC client version 7.2.33 for windows_x86_64
15/01/2014 11:51:50 | | log flags: file_xfer, sched_ops, task, file_xfer_debug
15/01/2014 11:51:50 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6


The red line lists the logging flags currently turned on. It looks like you turned on the <file_xfer_debug> instead of <work_fetch_debug>.

I'll turn you back over to Jacob for help interpreting the info <work_fetch_debug> will spit out. It's Greek to me.

____________
BOINC <<--- credit whores, pedants, alien hunters

Jacob Klein
Message 34671 - Posted: 15 Jan 2014 | 13:35:34 UTC

Once I can see work_fetch_debug output, I might be able to diagnose the situation, as I am quite familiar with it. I even worked with David to improve work fetch (and work_fetch_debug output) in the most recent release of BOINC.

Still waiting for work_fetch_debug output.

Richard Haselgrove
Message 34672 - Posted: 15 Jan 2014 | 13:45:15 UTC

Matt Harvey quite recently implemented a technique called 'boinc_temporary_exit' to try and reduce the number of total task failures (that was app version 8.14). If things start to go a bit wobbly with a task, GPUGrid tells it to stop, take a deep breath, and try again later.

By 'take a deep breath', I mean that GPUGrid tells BOINC not to re-run the same task immediately, but to wait at least a few seconds - I don't know how long: you would have to enable yet another logging flag - 'task_debug' - in cc_config to see that.

If tasks are exiting so frequently, and if you don't carry a spare at all times, that would explain the early WU downloads:

task exits
task is waiting before it can run again
BOINC has nothing to do
BOINC requests new task
GPUGrid allocates new task
BOINC starts downloading files
scheduling delay on original task expires
original task is re-started

or something like that. In order to solve the 'early download' problem, you first have to find the cause of the temporary exits. It'll be a mild form of the 'GPUGrid stresses your GPU harder than anything else' issue that we've been discussing in the SANTI thread.

tomba
Message 34675 - Posted: 15 Jan 2014 | 14:52:48 UTC - in response to Message 34670.


15/01/2014 11:51:50 | | log flags: file_xfer, sched_ops, task, file_xfer_debug
15/01/2014 11:51:50 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6

The red line lists the logging flags currently turned on. It looks like you turned on the <file_xfer_debug> instead of <work_fetch_debug>.

cc_config.xml fixed. I really must listen to instructions!!

tomba
Message 34676 - Posted: 15 Jan 2014 | 14:57:31 UTC

Here's the log of a work_fetch cycle. Is the line in red normal?

15/01/2014 15:56:42 | | [work_fetch] entering choose_project()
15/01/2014 15:56:42 | | [work_fetch] ------- start work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] target work buffer: 180.00 + 0.00 sec
15/01/2014 15:56:42 | | [work_fetch] --- project states ---
15/01/2014 15:56:42 | GPUGRID | [work_fetch] REC 71611.135 prio -48.201093 can req work
15/01/2014 15:56:42 | | [work_fetch] --- state for CPU ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 540.00 nidle 3.00 saturated 0.00 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
15/01/2014 15:56:42 | | [work_fetch] --- state for NVIDIA ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 30261.82 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 1.000
15/01/2014 15:56:42 | | [work_fetch] ------- end work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] No project chosen for work fetch

Richard Haselgrove
Message 34677 - Posted: 15 Jan 2014 | 15:22:14 UTC - in response to Message 34676.

Here's the log of a work_fetch cycle. Is the line in red normal?

15/01/2014 15:56:42 | | [work_fetch] entering choose_project()
15/01/2014 15:56:42 | | [work_fetch] ------- start work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] target work buffer: 180.00 + 0.00 sec
15/01/2014 15:56:42 | | [work_fetch] --- project states ---
15/01/2014 15:56:42 | GPUGRID | [work_fetch] REC 71611.135 prio -48.201093 can req work
15/01/2014 15:56:42 | | [work_fetch] --- state for CPU ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 540.00 nidle 3.00 saturated 0.00 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
15/01/2014 15:56:42 | | [work_fetch] --- state for NVIDIA ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 30261.82 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 1.000
15/01/2014 15:56:42 | | [work_fetch] ------- end work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] No project chosen for work fetch

It's normal when

--- state for NVIDIA --- saturated 30261.82 [seconds]

is larger than

target work buffer: 180.00 + 0.00 sec[onds]

- in other words, you have enough work for now, and don't need any more.

Jacob Klein
Message 34678 - Posted: 15 Jan 2014 | 15:22:25 UTC - in response to Message 34676.
Last modified: 15 Jan 2014 | 15:33:24 UTC

Here's the log of a work_fetch cycle. Is the line in red normal?

15/01/2014 15:56:42 | | [work_fetch] entering choose_project()
15/01/2014 15:56:42 | | [work_fetch] ------- start work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] target work buffer: 180.00 + 0.00 sec
15/01/2014 15:56:42 | | [work_fetch] --- project states ---
15/01/2014 15:56:42 | GPUGRID | [work_fetch] REC 71611.135 prio -48.201093 can req work
15/01/2014 15:56:42 | | [work_fetch] --- state for CPU ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 540.00 nidle 3.00 saturated 0.00 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
15/01/2014 15:56:42 | | [work_fetch] --- state for NVIDIA ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 30261.82 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 1.000
15/01/2014 15:56:42 | | [work_fetch] ------- end work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] No project chosen for work fetch


Let's teach you how to read this.

target work buffer:
...says you need work to keep busy for at least "180" seconds (that's the 3 minutes I was talking about earlier, where even if you set min_buffer to 0, BOINC uses 3 minutes intentionally, since it could take around 3 minutes to ask projects for work). This line also equates to "when getting work, try not to get much more than: 180.00 + 0.00", which takes your max_additional_buffer setting into account. For reference, I use 0.1 days and 0.5 days for my buffer settings. So, my line says: target work buffer: 8640.00 + 43200.00 sec
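
(For the arithmetic: 0.1 days × 86,400 seconds/day = 8,640 seconds and 0.5 days × 86,400 = 43,200 seconds, which is where the 8640.00 + 43200.00 in my line comes from; with both of your settings at zero, only the 180-second floor remains, hence your 180.00 + 0.00.)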

project states:
... GPUGrid is listed as "can req work". If you had it set for no new tasks, or suspended, it would be noted here, and then excluded from work fetch operations.

state for CPU:
shortfall 540 means that, in order to keep all your CPUs busy for that min_buffer setting, you'd need 540 instance-seconds of CPU work (3 idle CPUs × the 180-second target). nidle 3 means that you have 3 CPUs that are currently completely idle. (Note: this saddens me; it might prove beneficial to put those to work on some CPU projects.) Notice that the GPUGrid entry in that block says (no apps), meaning that the project told BOINC it doesn't have CPU apps, and BOINC won't ever request CPU work from it.

state for NVIDIA:
shortfall 0 means that, in order to keep all your NVIDIA GPUs busy for that min_buffer setting, you'd need 0 seconds. In fact, you have saturation, meaning that all instances are projected to be busy for 30261.82 seconds (8.4 hours).

end work fetch state:
Here is where it makes a decision, based on the info above, of whether to request work from a project or not. You have no CPU projects available for your idle CPUs, so they get left idle :sadface: You have no idle NVIDIA devices, and also your saturation level (30261.82 seconds) is greater than your low water mark (180 seconds), so you don't need NVIDIA work either. So it correctly says "No project chosen for work fetch", and doesn't request work.

Does that help? You should now be ready to read these log messages on your own, I'd think. Feel free to change some buffer values, or set GPUGrid for No New Tasks, to see the effects on this work_fetch_debug output.

tomba
Message 34680 - Posted: 15 Jan 2014 | 16:58:18 UTC - in response to Message 34678.

Let's teach you how to read this.

Thanks for that, Jacob. I shall study it carefully.

You have no CPU projects available for your idle CPUs, so they get left idle :sadface:

Yes. I've been feeling guilty about that. So I'm now running six Rosettas too. A bit worried that the CPU fan has gone from 3700 rpm to 4300 rpm and the CPU temperature has gone from 55°C to 64°C, but I guess that's a question for my other thread.

Jacob Klein
Message 34681 - Posted: 15 Jan 2014 | 17:08:56 UTC - in response to Message 34680.
Last modified: 15 Jan 2014 | 17:15:13 UTC

Note 1: Richard's previous post in this thread is likely correct.
Note 2: REC is Recent estimated credit, and is used by BOINC in the "prio" priority calculation when choosing which project to ask for work. The projects are listed in "prio" order, such that you can easily see which would be "next in line" in a request for work.
Note 3: In case you're curious to see a more-involved work fetch cycle, or might be wanting a list of projects that I'm attached to, below is a work_fetch_debug that shows the projects I'm attached to. I run a lot of various CPU and NVIDIA projects on this machine.
Note 4: Further information about work_fetch can be found in this slightly outdated, but highly useful, document: http://boinc.berkeley.edu/trac/wiki/ClientSched

------------------------------
A cycle of my work fetch:
------------------------------

1/15/2014 12:09:59 PM | | [work_fetch] entering choose_project()

1/15/2014 12:09:59 PM | | [work_fetch] ------- start work fetch state -------
1/15/2014 12:09:59 PM | | [work_fetch] target work buffer: 8640.00 + 43200.00 sec

1/15/2014 12:09:59 PM | | [work_fetch] --- project states ---
1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] REC 0.000 prio -0.000000 can req work
1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] REC 0.000 prio -0.000000 can req work
1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] REC 0.126 prio -0.000000 can't req work: master URL fetch pending (backoff: 43700.11 sec)
1/15/2014 12:09:59 PM | pogs | [work_fetch] REC 0.000 prio -0.000000 can't req work: "no new tasks" requested via Manager
1/15/2014 12:09:59 PM | Quake-Catcher Network | [work_fetch] REC 0.000 prio 0.000000 can't req work: non CPU intensive
1/15/2014 12:09:59 PM | ralph@home | [work_fetch] REC 0.000 prio -0.000000 can req work
1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] REC 0.000 prio -0.000000 can req work
1/15/2014 12:09:59 PM | correlizer | [work_fetch] REC 0.014 prio -0.000000 can req work
1/15/2014 12:09:59 PM | WUProp@Home | [work_fetch] REC 0.014 prio -0.000002 can't req work: non CPU intensive
1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] REC 110.106 prio -0.002449 can req work
1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] REC 221.382 prio -0.004923 can req work
1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] REC 221.453 prio -0.004925 can req work
1/15/2014 12:09:59 PM | boincsimap | [work_fetch] REC 248.218 prio -0.005520 can req work
1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] REC 1063.070 prio -0.006005 can req work
1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] REC 287.694 prio -0.007551 can req work
1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] REC 304.466 prio -0.007763 can req work
1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] REC 306.047 prio -0.010929 can req work
1/15/2014 12:09:59 PM | Docking | [work_fetch] REC 288.875 prio -0.011575 can req work
1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] REC 590.890 prio -0.013141 can req work
1/15/2014 12:09:59 PM | climateathome | [work_fetch] REC 251.853 prio -0.022648 can req work
1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] REC 229.825 prio -0.058248 can req work
1/15/2014 12:09:59 PM | RNA World | [work_fetch] REC 446.313 prio -0.070412 can req work
1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] REC 662565.023 prio -14.773922 can req work
1/15/2014 12:09:59 PM | SETI@home | [work_fetch] REC 76065.475 prio -169.162250 can req work
1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] REC 80649.354 prio -179.356354 can req work
1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] REC 80680.357 prio -179.425301 can req work
1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] REC 86519.931 prio -192.412500 can req work

1/15/2014 12:09:59 PM | | [work_fetch] --- state for CPU ---
1/15/2014 12:09:59 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 302655.24 busy 0.00
1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | pogs | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | ralph@home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | correlizer | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | boincsimap | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] fetch share 0.190
1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Docking | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | climateathome | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | RNA World | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | SETI@home | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] fetch share 0.000

1/15/2014 12:09:59 PM | | [work_fetch] --- state for NVIDIA ---
1/15/2014 12:09:59 PM | | [work_fetch] shortfall 68058.62 nidle 0.00 saturated 8942.84 busy 0.00
1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | pogs | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | ralph@home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | correlizer | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | boincsimap | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | Docking | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | climateathome | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] fetch share 0.000 (blocked by configuration file)
1/15/2014 12:09:59 PM | RNA World | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | SETI@home | [work_fetch] fetch share 0.001
1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] fetch share 0.001
1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] fetch share 0.001
1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] fetch share 0.001

1/15/2014 12:09:59 PM | | [work_fetch] ------- end work fetch state -------
1/15/2014 12:09:59 PM | | [work_fetch] No project chosen for work fetch

tomba
Message 34757 - Posted: 22 Jan 2014 | 7:58:47 UTC

Just had an early download; eight CPUs and two GPUs busy and no sign of a GPUGrid WU stopping:

22/01/2014 08:07:49 | GPUGRID | [work_fetch] fetch share 1.000
22/01/2014 08:07:49 | | [work_fetch] ------- end work fetch state -------
22/01/2014 08:07:49 | | [work_fetch] No project chosen for work fetch
22/01/2014 08:08:29 | | [work_fetch] Request work fetch: application exited
22/01/2014 08:08:29 | GPUGRID | [work_fetch] REC 272425.128 prio -2.579669 can req work
22/01/2014 08:08:29 | | [work_fetch] --- state for CPU ---
22/01/2014 08:08:29 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2030.85 busy 0.00
22/01/2014 08:08:29 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
22/01/2014 08:08:29 | | [work_fetch] --- state for NVIDIA ---
22/01/2014 08:08:29 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
22/01/2014 08:08:29 | GPUGRID | [work_fetch] fetch share 0.000
22/01/2014 08:08:29 | | [work_fetch] ------- end work fetch state -------
22/01/2014 08:08:29 | GPUGRID | [work_fetch] set_request() for NVIDIA: ninst 2 nused_total 1.000000 nidle_now 1.000000 fetch share 0.000000 req_inst 1.000000 req_secs 1044.000000
22/01/2014 08:08:29 | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (1044.00 sec, 1.00 inst)
22/01/2014 08:08:29 | GPUGRID | Sending scheduler request: To fetch work.
22/01/2014 08:08:29 | GPUGRID | Requesting new tasks for NVIDIA
22/01/2014 08:08:32 | GPUGRID | Scheduler request completed: got 1 new tasks
22/01/2014 08:08:32 | | [work_fetch] Request work fetch: RPC complete
22/01/2014 08:08:34 | GPUGRID | Started download of 72x-SANTI_MAR420cap310-29-LICENSE

Jacob Klein
Message 34764 - Posted: 22 Jan 2014 | 14:04:15 UTC

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

if you think it was incorrect, then the next steps might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.
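
Those two flags go in the same <log_flags> section of cc_config.xml as work_fetch_debug; a rough sketch, keeping the existing flag on as well:

<cc_config>
   <log_flags>
      <work_fetch_debug>1</work_fetch_debug>
      <cpu_sched>1</cpu_sched>
      <coproc_debug>1</coproc_debug>
   </log_flags>
</cc_config>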

Richard Haselgrove
Message 34765 - Posted: 22 Jan 2014 | 15:52:52 UTC - in response to Message 34764.

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

if you think it was incorrect, then the next steps might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.

It might be possible to work out what happened from the message log entries immediately before and after the section Tomba posted. Did a task restart, for example?

The trouble with the extra log flags is that you can't use them retrospectively to diagnose a problem which has already happened - you have to set them in advance, and wait for it to happen again.

tomba
Message 34769 - Posted: 22 Jan 2014 | 17:38:45 UTC - in response to Message 34764.

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

As I said in my post "no sign of a GPUGrid WU stopping" and I did check back for 30 minutes.

if you think it was incorrect, then the next steps might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.

OK. I'll check 'em out.

Richard Haselgrove
Message 34771 - Posted: 22 Jan 2014 | 18:27:56 UTC - in response to Message 34769.

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

As I said in my post "no sign of a GPUGrid WU stopping" and I did check back for 30 minutes.

Unfortunately, a task stopping isn't necessarily logged with normal settings - I think you'd need to add <task_debug> to be sure of seeing that.

But you should see the restart afterwards, in the normal logs (I think).
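
For completeness, a rough cc_config.xml sketch with task_debug added alongside the work fetch logging:

<cc_config>
   <log_flags>
      <work_fetch_debug>1</work_fetch_debug>
      <task_debug>1</task_debug>
   </log_flags>
</cc_config>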

tomba
Message 34793 - Posted: 23 Jan 2014 | 18:11:30 UTC

We start with two WUs confirmed running followed by a couple of nidles of 0.

Then - oops - only one WU is running.

One more 0 nidle then a 1 nidle, and one more 0 nidle!!

Then we have a 1 "nidle_now", followed by an early WU fetch....

23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0
23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 1: confirming for 75x-SANTI_MARwtcap310-22-32-RND0081_0

23/01/2014 12:18:10 | | [work_fetch] entering choose_project()
23/01/2014 12:18:10 | | [work_fetch] ------- start work fetch state -------
23/01/2014 12:18:10 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
23/01/2014 12:18:10 | | [work_fetch] --- project states ---
23/01/2014 12:18:10 | GPUGRID | [work_fetch] REC 275978.848 prio -3.498618 can req work
23/01/2014 12:18:10 | | [work_fetch] --- state for CPU ---
23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2568.97 busy 0.00
23/01/2014 12:18:10 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
23/01/2014 12:18:10 | | [work_fetch] --- state for NVIDIA ---
23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 4775.59 busy 0.00
23/01/2014 12:18:10 | GPUGRID | [work_fetch] fetch share 0.500
23/01/2014 12:18:10 | | [work_fetch] ------- end work fetch state -------
23/01/2014 12:18:10 | | [work_fetch] No project chosen for work fetch
23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited
23/01/2014 12:18:13 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0
23/01/2014 12:18:15 | | [work_fetch] entering choose_project()
23/01/2014 12:18:15 | | [work_fetch] ------- start work fetch state -------
23/01/2014 12:18:15 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
23/01/2014 12:18:15 | | [work_fetch] --- project states ---
23/01/2014 12:18:15 | GPUGRID | [work_fetch] REC 275979.860 prio -3.498092 can req work
23/01/2014 12:18:15 | | [work_fetch] --- state for CPU ---
23/01/2014 12:18:15 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2560.84 busy 0.00
23/01/2014 12:18:15 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
23/01/2014 12:18:15 | | [work_fetch] --- state for NVIDIA ---
23/01/2014 12:18:15 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
23/01/2014 12:18:15 | GPUGRID | [work_fetch] fetch share 0.000
23/01/2014 12:18:15 | | [work_fetch] ------- end work fetch state -------
23/01/2014 12:18:19 | | [work_fetch] Request work fetch: RPC complete
23/01/2014 12:18:24 | | [work_fetch] entering choose_project()
23/01/2014 12:18:24 | | [work_fetch] ------- start work fetch state -------
23/01/2014 12:18:24 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
23/01/2014 12:18:24 | | [work_fetch] --- project states ---
23/01/2014 12:18:24 | GPUGRID | [work_fetch] REC 275979.860 prio -2.505735 can req work
23/01/2014 12:18:24 | | [work_fetch] --- state for CPU ---
23/01/2014 12:18:24 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2546.51 busy 0.00
23/01/2014 12:18:24 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
23/01/2014 12:18:24 | | [work_fetch] --- state for NVIDIA ---
23/01/2014 12:18:24 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
23/01/2014 12:18:24 | GPUGRID | [work_fetch] fetch share 0.000
23/01/2014 12:18:24 | | [work_fetch] ------- end work fetch state -------
23/01/2014 12:18:24 | GPUGRID | [work_fetch] set_request() for NVIDIA: ninst 2 nused_total 1.000000 nidle_now 1.000000 fetch share 0.000000 req_inst 1.000000 req_secs 1044.000000
23/01/2014 12:18:24 | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (1044.00 sec, 1.00 inst)
23/01/2014 12:18:24 | GPUGRID | Sending scheduler request: To fetch work.
23/01/2014 12:18:24 | GPUGRID | Requesting new tasks for NVIDIA
23/01/2014 12:18:27 | GPUGRID | Scheduler request completed: got 1 new tasks
23/01/2014 12:18:27 | | [work_fetch] Request work fetch: RPC complete
23/01/2014 12:18:29 | GPUGRID | Started download of 98x-SANTI_MARwtcap310-30-LICENSE

Jacob Klein
Message 34794 - Posted: 23 Jan 2014 | 18:40:43 UTC

23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited

Any idea what application exited, causing the work fetch request?
Also, are you using CPU Throttling (The "Use at most X% CPU Time" setting)?
Also, can you please include the first messages at the beginning of the event log, so we can see what version you are using?

I agree this looks a bit suspicious, but it sounds like a GPU task got unloaded, and work fetch decided to fill an idle spot, even if the timing isn't exactly perfect.


Richard Haselgrove
Message 34795 - Posted: 23 Jan 2014 | 19:12:50 UTC - in response to Message 34793.

I think that log sequence is pretty definitive. There are two sets of nidle:

The --- state for CPU --- remains at zero throughout. No problems there.

The --- state for NVIDIA --- starts at 0, jumps to 1, and then drops to 0 again.

At the point of the jump, we can see

23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited

and NVIDIA instance 1: confirming for 75x-SANTI_MARwtcap310-22-32-RND0081_0 disappears from the record.

That's result 7689115, which you can see has a pause in the middle:

# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 6109000)

I imagine that if you look a bit further down, you'd see, perhaps first a 'restarting' entry for 75x-SANTI_MARwtcap310-22-32-RND0081_0, and then two task instances being confirmed again at each [coproc] step.

The good news is that 75x-SANTI_MARwtcap310-22-32-RND0081_0 completed successfully and validated, despite the pause in the middle.

Jacob Klein
Message 34796 - Posted: 23 Jan 2014 | 19:16:35 UTC - in response to Message 34795.

And so, for that brief pause, work fetch correctly tried to fill an idle GPU, right?

Richard Haselgrove
Message 34797 - Posted: 23 Jan 2014 | 19:51:32 UTC - in response to Message 34796.

And so, for that brief pause, work fetch correctly tried to fill an idle GPU, right?

That's my guess. And I'm also guessing that BOINC restarted the missing 75x-SANTI_MARwtcap310-22-32-RND0081_0 (allowing it to run to completion and report success), before the file downloads for the replacement - probably result 7690601 - had completed and allowed it to be started on the idle GPU.

Dagorath
Message 34799 - Posted: 23 Jan 2014 | 21:55:29 UTC - in response to Message 34797.

Makes sense to me. Thanks for the lesson in debug message interpretation, Richard and Jacob, I swear I'll get it eventually. So what's causing the simulation to become unstable and pause to catch its breath? Clocks too high?

Shouldn't the client recognize the pause as a temporary suspend and not request more work?

____________
BOINC <<--- credit whores, pedants, alien hunters

Stefan
Project administrator
Project developer
Project tester
Project scientist
Message 34802 - Posted: 24 Jan 2014 | 10:11:27 UTC

Is this maybe the pause that Matt implemented in the latest versions, where after a crash the task pauses to avoid consecutive crashes, or something like that?
It rings a bell.

Profile MJH
Project administrator
Project developer
Project scientist
Message 34803 - Posted: 24 Jan 2014 | 10:29:50 UTC - in response to Message 34802.


Is this maybe the pause that Matt implemented in the latest versions, where after a crash the task pauses to avoid consecutive crashes, or something like that?



Quite likely. If the task crashes having made some progress, it will be restarted after a delay of at least 60 sec. BOINC will try and start some other work during this idle time, which may involve downloading another task.

In the worst case, a very unreliable machine may be continuously cycling between two or three tasks, each time making just enough progress to merit a restart attempt.

MJH

Profile Retvari Zoltan
Message 34814 - Posted: 25 Jan 2014 | 13:55:44 UTC - in response to Message 34803.


Is this maybe the pause that Matt implemented in the latest versions, where after a crash the task pauses to avoid consecutive crashes, or something like that?

Quite likely. If the task crashes having made some progress, it will be restarted after a delay of at least 60 sec. BOINC will try and start some other work during this idle time, which may involve downloading another task.

In the worst case, a very unreliable machine may be continuously cycling between two or three tasks, each time making just enough progress to merit a restart attempt.

MJH

Exactly that happened when my Gigabyte GTX 780Ti OC was unreliable.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 34829 - Posted: 26 Jan 2014 | 21:20:56 UTC - in response to Message 34814.
Last modified: 26 Jan 2014 | 21:28:42 UTC

While that might be the case now (compare BOINC logs to WU logs), Tomba reported this problem before 8.15 was being used in the Long queue. That said, under the 8.14 app the task would have exited anyway with a driver restart. Just how well it did this is unknown and might have been an issue, but you are not going to see that again. Anyway, during one of many possible driver restarts (which I presume BOINC doesn't know anything about) BOINC probably asked for another task. BOINC isn't as nippy at asking for work these days (lazy or considered, you decide), but with, say, 7 driver restarts there is a strong chance it did on at least one occasion.

While the new app asks that a problem task is not immediately run again on the same GPU, and seems to prefer to try to run the task on another GPU (if available), I'm not aware of the method of doing this. If it transpires that another task is still being requested, it might be better to allow the GPU in question 2 or 3 minutes to cool down (literally, in many cases) without running another WU. Even if it takes that long to download another task, it's still unwanted and unnecessary. Although we cannot suspend a specific GPU through BOINC's GUI, it's possible to edit the cc_config file (and tell BOINC to read it), so you could tell BOINC not to use a specific GPU, and then re-edit (and re-read) the cc_config file again (after time=t); a rough sketch follows.
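
The relevant cc_config.xml section is <exclude_gpu>; a rough sketch that stops GPUGrid (only) from using one device, where device_num 1 is just an example (use the device number BOINC reports at startup):

<cc_config>
   <options>
      <exclude_gpu>
         <url>http://www.gpugrid.net/</url>
         <device_num>1</device_num>
      </exclude_gpu>
   </options>
</cc_config>

Remove the block and have BOINC re-read its config files (a client restart may be needed, depending on version) to put the GPU back to work.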
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help


