
Message boards : Number crunching : Early WU Downloads

tomba
Message 34640 - Posted: 13 Jan 2014 | 16:51:14 UTC

In the last 24 hours, one of my rigs, with a single GPU, has on three occasions had another WU downloaded very soon after the current WU has started.

Network Usage is set to "Max additional work buffer" = 0.01 days, as it has been for years!!

Am I missing something??

tomba
Message 34641 - Posted: 13 Jan 2014 | 17:40:31 UTC

It just did it again! 20% of the current WU done and another is downloading!!

Profile Stoneageman
Message 34642 - Posted: 13 Jan 2014 | 19:27:54 UTC

What is the 'Minimum work buffer' set to?

tomba
Message 34643 - Posted: 13 Jan 2014 | 19:37:09 UTC - in response to Message 34642.

What is the 'Minimum work buffer' set to?


0.00...

Betting Slip
Message 34647 - Posted: 14 Jan 2014 | 6:59:32 UTC - in response to Message 34643.

"Max additional work buffer" should be 0.00 as well.

Jacob Klein
Message 34651 - Posted: 14 Jan 2014 | 14:14:54 UTC - in response to Message 34647.
Last modified: 14 Jan 2014 | 14:15:05 UTC

If BOINC predicts it will run out of work within 3 minutes, it will download more work, regardless of the min_buffer settings.
You can turn on the <work_fetch_debug> flag, to get more logging information, for you and for us to look at.
See http://boinc.berkeley.edu/wiki/Client_configuration
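
For reference, a minimal cc_config.xml that switches this flag on could look roughly like the sketch below (the file lives in the BOINC data directory; keep whatever options you already have in yours, and tell BOINC to re-read its config files, or restart the client, afterwards):

<cc_config>
   <log_flags>
      <work_fetch_debug>1</work_fetch_debug>
   </log_flags>
</cc_config>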

Jim1348
Message 34652 - Posted: 14 Jan 2014 | 17:43:39 UTC - in response to Message 34640.

In the last 24 hours, one of my rigs, with a single GPU, has on three occasions had another WU downloaded very soon after the current WU has started.

Network Usage is set to "Max additional work buffer" = 0.01 days, as it has been for years!!

I saw that a day or two ago also, though which machine I don't remember (single or dual cards). But I keep a tight buffer too, though not quite that tight; usually about 0.10 days min, 0.05 days additional.

tomba
Message 34656 - Posted: 15 Jan 2014 | 8:09:31 UTC - in response to Message 34652.

I saw that a day or two ago also, though which machine I don't remember (single or dual cards).

Which single cards are OK for GPUGrid?

tomba
Message 34657 - Posted: 15 Jan 2014 | 8:52:59 UTC - in response to Message 34651.

You can turn on the <work_fetch_debug> flag, to get more logging information, for you and for us to look at.
See http://boinc.berkeley.edu/wiki/Client_configuration

Seems that flag is enabled by default, so I checked out the log around the time I got an unwanted WU.

There was a funny. Twice I got "This computer has reached a limit on tasks in progress" and the active WU restarted.

It was after the upload of the finished WU completed that I got the unwanted WU.

13/01/2014 14:21:16 | GPUGRID | Starting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:21:29 | GPUGRID | Started upload of 31x-SANTI_MAR422cap310-19-32-RND2983_0_0
[removed other upload entries]
13/01/2014 14:35:40 | GPUGRID | Sending scheduler request: To fetch work.
13/01/2014 14:35:40 | GPUGRID | Requesting new tasks for NVIDIA
13/01/2014 14:35:43 | GPUGRID | Scheduler request completed: got 0 new tasks
13/01/2014 14:35:43 | GPUGRID | No tasks sent
13/01/2014 14:35:43 | GPUGRID | This computer has reached a limit on tasks in progress
13/01/2014 14:36:37 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:40:08 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:42:22 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:48:09 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 15:07:20 | GPUGRID | Sending scheduler request: To fetch work.
13/01/2014 15:07:20 | GPUGRID | Requesting new tasks for NVIDIA
13/01/2014 15:07:23 | GPUGRID | Scheduler request completed: got 0 new tasks
13/01/2014 15:07:23 | GPUGRID | No tasks sent
13/01/2014 15:07:23 | GPUGRID | This computer has reached a limit on tasks in progress
13/01/2014 15:08:20 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 15:08:45 | GPUGRID | Finished upload of 31x-SANTI_MAR422cap310-19-32-RND2983_0_9
13/01/2014 15:08:49 | GPUGRID | Sending scheduler request: To report completed tasks.
13/01/2014 15:08:49 | GPUGRID | Reporting 1 completed tasks
13/01/2014 15:08:49 | GPUGRID | Not requesting tasks: don't need
13/01/2014 15:08:52 | GPUGRID | Scheduler request completed
13/01/2014 15:20:12 | GPUGRID | Sending scheduler request: To fetch work.
13/01/2014 15:20:12 | GPUGRID | Requesting new tasks for NVIDIA
13/01/2014 15:20:16 | GPUGRID | Scheduler request completed: got 1 new tasks

tomba
Message 34659 - Posted: 15 Jan 2014 | 9:19:57 UTC - in response to Message 34657.

There was a funny.

I checked the other two unwanted WUs I got. The situation was the same: the limit was reached twice, and the unwanted WU arrived after the upload of the finished one completed.

Dagorath
Message 34660 - Posted: 15 Jan 2014 | 9:29:35 UTC - in response to Message 34657.
Last modified: 15 Jan 2014 | 9:36:17 UTC

The debug flag Jacob suggested is not on by default. The default logging gives the messages you see which are mostly just what the scheduler does but not why. If you turn on the flag Jacob suggested you will get additional messages that explain why the scheduler does what it does. The devil is in those details.

13/01/2014 14:36:37 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:40:08 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:42:22 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 14:48:09 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 15:07:20 | GPUGRID | Sending scheduler request: To fetch work.
13/01/2014 15:07:20 | GPUGRID | Requesting new tasks for NVIDIA
13/01/2014 15:07:23 | GPUGRID | Scheduler request completed: got 0 new tasks
13/01/2014 15:07:23 | GPUGRID | No tasks sent
13/01/2014 15:07:23 | GPUGRID | This computer has reached a limit on tasks in progress
13/01/2014 15:08:20 | GPUGRID | Restarting task 38x23-NOELIA_FXArep-0-2-RND0942_0 using acemdlong version 814 (cuda42) in slot 0
13/01/2014 15:08:45 | GPUGRID | Finished upload of 31x-SANTI_MAR422cap310-19-32-RND2983_0_9


The same task restarted 4X in 12 minutes, 5X in 24 minutes?

Edit added:

You have min. buffer = 0 and additional buffer = 0.1, IIUC. If you have min. buffer = 0 then you should also have additional buffer = 0. Or try min. buffer = 0.1 and additional buffer 0.
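
If you would rather set these outside the Manager, the same two values can go in global_prefs_override.xml in the BOINC data directory (values are in days). A minimal sketch, assuming the min. buffer 0.1 / additional buffer 0 combination suggested above:

<global_preferences>
   <work_buf_min_days>0.1</work_buf_min_days>
   <work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>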
____________
BOINC <<--- credit whores, pedants, alien hunters

Jim1348
Message 34661 - Posted: 15 Jan 2014 | 9:54:42 UTC - in response to Message 34656.
Last modified: 15 Jan 2014 | 9:59:50 UTC

I saw that a day or two ago also, though which machine I don't remember (single or dual cards).

Which single cards are OK for GPUGrid?

I use GTX 660s on both my machines (dual cards in one, single card in the other). I think it was very probably the single-card machine that had the problem, but I haven't seen it again. It was probably just a glitch in their server, though I may have been transferring cards between the machines around that time, and BOINC may have gotten confused as to which was which.

tomba
Message 34662 - Posted: 15 Jan 2014 | 10:28:24 UTC - in response to Message 34661.

I saw that a day or two ago also, though which machine I don't remember (single or dual cards).

Which single cards are OK for GPUGrid?

I use GTX 660s on both my machines (dual cards in one, single card in the other). I think it was very probably the single-card machine that had the problem, but I haven't seen it again. It was probably just a glitch in their server, though I may have been transferring cards between the machines around that time, and BOINC may have gotten confused as to which was which.

I guess I should have asked "Who manufactures 660 singles?". I have two doubles in my rig but there's room for four singles!

Dagorath
Message 34663 - Posted: 15 Jan 2014 | 11:32:57 UTC - in response to Message 34662.

tomba,

Jim is referring to a single quantity of cards, one card, not a single width card. You are misinterpreting his "single" to mean "single slot width".

____________
BOINC <<--- credit whores, pedants, alien hunters

tomba
Message 34665 - Posted: 15 Jan 2014 | 11:46:08 UTC - in response to Message 34660.

The debug flag Jacob suggested is not on by default. The default logging gives the messages you see which are mostly just what the scheduler does but not why. If you turn on the flag Jacob suggested you will get additional messages that explain why the scheduler does what it does. The devil is in those details.

OK. cc_config.xml updated.


The same task restarted 4X in 12 minutes, 5X in 24 minutes?


Yep. That's what the log tells us.
You have min. buffer = 0 and additional buffer = 0.1, IIUC. If you have min. buffer = 0 then you should also have additional buffer = 0. Or try min. buffer = 0.1 and additional buffer 0.

Both are now set to zero.

tomba
Message 34666 - Posted: 15 Jan 2014 | 11:51:02 UTC

Just happened again, with cc_config.xml updated.

New WU downloaded at 12:01:24. Active WU will complete at 13:00.

15/01/2014 11:51:50 | | Starting BOINC client version 7.2.33 for windows_x86_64
15/01/2014 11:51:50 | | log flags: file_xfer, sched_ops, task, file_xfer_debug
15/01/2014 11:51:50 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
15/01/2014 11:51:50 | | Data directory: C:\ProgramData\BOINC
15/01/2014 11:51:50 | | Running under account TOMBA
15/01/2014 11:51:50 | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 331.93, CUDA version 6.0, compute capability 3.0, 2048MB, 1962MB available, 1982 GFLOPS peak)
15/01/2014 11:51:50 | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 331.93, device version OpenCL 1.1 CUDA, 2048MB, 1962MB available, 1982 GFLOPS peak)
15/01/2014 11:51:50 | | Host name: XPS-435
15/01/2014 11:51:50 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz [Family 6 Model 26 Stepping 5]
15/01/2014 11:51:50 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe
15/01/2014 11:51:50 | | OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
15/01/2014 11:51:50 | | Memory: 5.99 GB physical, 11.98 GB virtual
15/01/2014 11:51:50 | | Disk: 465.76 GB total, 374.30 GB free
15/01/2014 11:51:50 | | Local time is UTC +1 hours
15/01/2014 11:51:50 | | VirtualBox version: 4.3.4
15/01/2014 11:51:50 | | Config: use all coprocessors
15/01/2014 11:51:50 | GPUGRID | URL http://www.gpugrid.net/; Computer ID 157241; resource share 100
15/01/2014 11:51:50 | GPUGRID | General prefs: from GPUGRID (last modified 27-Aug-2013 13:11:17)
15/01/2014 11:51:50 | GPUGRID | Computer location: home
15/01/2014 11:51:50 | GPUGRID | General prefs: no separate prefs for home; using your defaults
15/01/2014 11:51:50 | | Reading preferences override file
15/01/2014 11:51:50 | | Preferences:
15/01/2014 11:51:50 | | max memory usage when active: 6134.97MB
15/01/2014 11:51:50 | | max memory usage when idle: 6134.97MB
15/01/2014 11:51:50 | | max disk usage: 232.88GB
15/01/2014 11:51:50 | | max CPUs used: 3
15/01/2014 11:51:50 | | max download rate: 2048000 bytes/sec
15/01/2014 11:51:50 | | max upload rate: 135004 bytes/sec
15/01/2014 11:51:50 | | (to change preferences, visit a project web site or select Preferences in the Manager)
15/01/2014 11:51:50 | | Not using a proxy
15/01/2014 11:51:55 | GPUGRID | project resumed by user
15/01/2014 11:51:56 | GPUGRID | Restarting task 39x756-NOELIA_FXArep-0-2-RND3960_0 using acemdlong version 814 (cuda55) in slot 0
15/01/2014 12:01:21 | GPUGRID | Sending scheduler request: To fetch work.
15/01/2014 12:01:21 | GPUGRID | Requesting new tasks for NVIDIA
15/01/2014 12:01:24 | GPUGRID | Scheduler request completed: got 1 new tasks
15/01/2014 12:01:26 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-LICENSE

15/01/2014 12:01:26 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/1c0/22x-SANTI_MAR419cap310-28-LICENSE
15/01/2014 12:01:26 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-COPYRIGHT
15/01/2014 12:01:26 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/230/22x-SANTI_MAR419cap310-28-COPYRIGHT
15/01/2014 12:01:28 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:28 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:28 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:28 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-LICENSE
15/01/2014 12:01:28 | GPUGRID | [file_xfer] Throughput 9572 bytes/sec
15/01/2014 12:01:28 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:28 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-COPYRIGHT
15/01/2014 12:01:28 | GPUGRID | [file_xfer] Throughput 1583 bytes/sec
15/01/2014 12:01:28 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_1
15/01/2014 12:01:28 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/19f/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_1
15/01/2014 12:01:28 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_2
15/01/2014 12:01:28 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/388/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_2
15/01/2014 12:01:33 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:33 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:33 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_2
15/01/2014 12:01:33 | GPUGRID | [file_xfer] Throughput 130346 bytes/sec
15/01/2014 12:01:33 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_3
15/01/2014 12:01:33 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/81/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_3
15/01/2014 12:01:38 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:38 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:38 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_3
15/01/2014 12:01:38 | GPUGRID | [file_xfer] Throughput 93616 bytes/sec
15/01/2014 12:01:38 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-pdb_file
15/01/2014 12:01:38 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/103/22x-SANTI_MAR419cap310-28-pdb_file
15/01/2014 12:01:42 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:01:42 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:01:42 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_1
15/01/2014 12:01:42 | GPUGRID | [file_xfer] Throughput 48728 bytes/sec
15/01/2014 12:01:42 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-psf_file
15/01/2014 12:01:42 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/136/22x-SANTI_MAR419cap310-28-psf_file
15/01/2014 12:02:01 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:01 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:01 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-pdb_file
15/01/2014 12:02:01 | GPUGRID | [file_xfer] Throughput 100795 bytes/sec
15/01/2014 12:02:01 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-par_file
15/01/2014 12:02:01 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/26d/22x-SANTI_MAR419cap310-28-par_file
15/01/2014 12:02:07 | GPUGRID | Restarting task 39x756-NOELIA_FXArep-0-2-RND3960_0 using acemdlong version 814 (cuda55) in slot 0
15/01/2014 12:02:08 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:08 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:08 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-psf_file
15/01/2014 12:02:08 | GPUGRID | [file_xfer] Throughput 179525 bytes/sec
15/01/2014 12:02:08 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-conf_file_enc
15/01/2014 12:02:08 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/3c4/22x-SANTI_MAR419cap310-28-conf_file_enc
15/01/2014 12:02:09 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:09 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:09 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-conf_file_enc
15/01/2014 12:02:09 | GPUGRID | [file_xfer] Throughput 2754 bytes/sec
15/01/2014 12:02:09 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-metainp_file
15/01/2014 12:02:09 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/183/22x-SANTI_MAR419cap310-28-metainp_file
15/01/2014 12:02:10 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:10 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:10 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-metainp_file
15/01/2014 12:02:10 | GPUGRID | [file_xfer] Throughput 166 bytes/sec
15/01/2014 12:02:10 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_7
15/01/2014 12:02:10 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/2cd/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_7
15/01/2014 12:02:11 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:11 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:11 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:11 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-par_file
15/01/2014 12:02:11 | GPUGRID | [file_xfer] Throughput 75904 bytes/sec
15/01/2014 12:02:11 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:11 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_7
15/01/2014 12:02:11 | GPUGRID | [file_xfer] Throughput 0 bytes/sec
15/01/2014 12:02:11 | GPUGRID | Started download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_10
15/01/2014 12:02:11 | GPUGRID | [file_xfer] URL: http://www.gpugrid.org/PS3GRID/download/13e/22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_10
15/01/2014 12:02:12 | GPUGRID | [file_xfer] http op done; retval 0 (Success)
15/01/2014 12:02:12 | GPUGRID | [file_xfer] file transfer status 0 (Success)
15/01/2014 12:02:12 | GPUGRID | Finished download of 22x-SANTI_MAR419cap310-28-22x-SANTI_MAR419cap310-27-32-RND3945_10
15/01/2014 12:02:12 | GPUGRID | [file_xfer] Throughput 211 bytes/sec

tomba
Message 34668 - Posted: 15 Jan 2014 | 12:24:11 UTC - in response to Message 34663.

tomba,

Jim is referring to a single quantity of cards, one card, not a single width card. You are misinterpreting his "single" to mean "single slot width".

Oops... Thanks for the heads-up!

Dagorath
Message 34670 - Posted: 15 Jan 2014 | 13:20:33 UTC - in response to Message 34666.

Just happened again, with cc_config.xml updated.

New WU downloaded at 12:01:24. Active WU will complete at 13:00.

15/01/2014 11:51:50 | | Starting BOINC client version 7.2.33 for windows_x86_64
15/01/2014 11:51:50 | | log flags: file_xfer, sched_ops, task, file_xfer_debug
15/01/2014 11:51:50 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6


The red line lists the logging flags currently turned on. It looks like you turned on the <file_xfer_debug> instead of <work_fetch_debug>.

I'll turn you back over to Jacob for help interpreting the info <work_fetch_debug> will spit out. It's Greek to me.

____________
BOINC <<--- credit whores, pedants, alien hunters

Jacob Klein
Message 34671 - Posted: 15 Jan 2014 | 13:35:34 UTC

Once I can see work_fetch_debug output, I might be able to diagnose the situation, as I am quite familiar with it. I even worked with David to improve work fetch (and work_fetch_debug output) in the most recent release of BOINC.

Still waiting for work_fetch_debug output.

Richard Haselgrove
Message 34672 - Posted: 15 Jan 2014 | 13:45:15 UTC

Matt Harvey quite recently implemented a technique called 'boinc_temporary_exit' to try and reduce the number of total task failures (that was app version 8.14). If things start to go a bit wobbly with a task, GPUGrid tells it to stop, take a deep breath, and try again later.

By 'take a deep breath', I mean that GPUGrid tells BOINC not to re-run the same task immediately, but to wait at least a few seconds - I don't know how long: you would have to enable yet another logging flag - 'task_debug' - in cc_config to see that.

If tasks are exiting so frequently, and if you don't carry a spare at all times, that would explain the early WU downloads:

task exits
task is waiting before it can run again
BOINC has nothing to do
BOINC requests new task
GPUGrid allocates new task
BOINC starts downloading files
scheduling delay on original task expires
original task is re-started

or something like that. In order to solve the 'early download' problem, you first have to find the cause of the temporary exits. It'll be a mild form of the 'GPUGrid stresses your GPU harder than anything else' issue that we've been discussing in the SANTI thread.

tomba
Message 34675 - Posted: 15 Jan 2014 | 14:52:48 UTC - in response to Message 34670.


15/01/2014 11:51:50 | | log flags: file_xfer, sched_ops, task, file_xfer_debug
15/01/2014 11:51:50 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6

The red line lists the logging flags currently turned on. It looks like you turned on the <file_xfer_debug> instead of <work_fetch_debug>.

cc_config.xml fixed. I really must listen to instructions!!

tomba
Message 34676 - Posted: 15 Jan 2014 | 14:57:31 UTC

Here's the log of a work_fetch cycle. Is the line in red normal?

15/01/2014 15:56:42 | | [work_fetch] entering choose_project()
15/01/2014 15:56:42 | | [work_fetch] ------- start work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] target work buffer: 180.00 + 0.00 sec
15/01/2014 15:56:42 | | [work_fetch] --- project states ---
15/01/2014 15:56:42 | GPUGRID | [work_fetch] REC 71611.135 prio -48.201093 can req work
15/01/2014 15:56:42 | | [work_fetch] --- state for CPU ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 540.00 nidle 3.00 saturated 0.00 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
15/01/2014 15:56:42 | | [work_fetch] --- state for NVIDIA ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 30261.82 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 1.000
15/01/2014 15:56:42 | | [work_fetch] ------- end work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] No project chosen for work fetch

Richard Haselgrove
Message 34677 - Posted: 15 Jan 2014 | 15:22:14 UTC - in response to Message 34676.

Here's the log of a work_fetch cycle. Is the line in red normal?

15/01/2014 15:56:42 | | [work_fetch] entering choose_project()
15/01/2014 15:56:42 | | [work_fetch] ------- start work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] target work buffer: 180.00 + 0.00 sec
15/01/2014 15:56:42 | | [work_fetch] --- project states ---
15/01/2014 15:56:42 | GPUGRID | [work_fetch] REC 71611.135 prio -48.201093 can req work
15/01/2014 15:56:42 | | [work_fetch] --- state for CPU ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 540.00 nidle 3.00 saturated 0.00 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
15/01/2014 15:56:42 | | [work_fetch] --- state for NVIDIA ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 30261.82 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 1.000
15/01/2014 15:56:42 | | [work_fetch] ------- end work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] No project chosen for work fetch

It's normal when

--- state for NVIDIA --- saturated 30261.82 [seconds]

is larger than

target work buffer: 180.00 + 0.00 sec[onds]

- in other words, you have enough work for now, and don't need any more.

Jacob Klein
Message 34678 - Posted: 15 Jan 2014 | 15:22:25 UTC - in response to Message 34676.
Last modified: 15 Jan 2014 | 15:33:24 UTC

Here's the log of a work_fetch cycle. Is the line in red normal?

15/01/2014 15:56:42 | | [work_fetch] entering choose_project()
15/01/2014 15:56:42 | | [work_fetch] ------- start work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] target work buffer: 180.00 + 0.00 sec
15/01/2014 15:56:42 | | [work_fetch] --- project states ---
15/01/2014 15:56:42 | GPUGRID | [work_fetch] REC 71611.135 prio -48.201093 can req work
15/01/2014 15:56:42 | | [work_fetch] --- state for CPU ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 540.00 nidle 3.00 saturated 0.00 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
15/01/2014 15:56:42 | | [work_fetch] --- state for NVIDIA ---
15/01/2014 15:56:42 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 30261.82 busy 0.00
15/01/2014 15:56:42 | GPUGRID | [work_fetch] fetch share 1.000
15/01/2014 15:56:42 | | [work_fetch] ------- end work fetch state -------
15/01/2014 15:56:42 | | [work_fetch] No project chosen for work fetch


Let's teach you how to read this.

target work buffer:
...says you need work to keep busy for at least "180" seconds (that's the 3 minutes I was talking about earlier, where even if you set min_buffer to 0, BOINC uses 3 minutes intentionally, since it could take around 3 minutes to ask projects for work). This line also equates to "when getting work, try not to get much more than: 180.00 + 0.00", which takes your max_additional_buffer setting into account. For reference, I use 0.1 days and 0.5 days for my buffer settings. So, my line says: target work buffer: 8640.00 + 43200.00 sec
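
(For the arithmetic: 0.1 days × 86,400 seconds/day = 8,640 seconds and 0.5 days × 86,400 = 43,200 seconds, which is where the 8640.00 + 43200.00 in my line comes from; with both of your settings at zero, only the 180-second floor remains, hence your 180.00 + 0.00.)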

project states:
... GPUGrid is listed as "can req work". If you had it set for no new tasks, or suspended, it would be noted here, and then excluded from work fetch operations.

state for CPU:
shortfall 540 means that, in order to keep all your CPUs busy for that min_buffer setting, you'd need 540 instance-seconds of CPU work (3 idle CPUs × the 180-second target). nidle 3 means that you have 3 CPUs that are currently completely idle. (Note: this saddens me; it might prove beneficial to put those to work on some CPU projects.) Notice that the GPUGrid entry in that block says (no apps), meaning that the project told BOINC it doesn't have CPU apps, and BOINC won't ever request CPU work from it.

state for NVIDIA:
shortfall 0 means that, in order to keep all your NVIDIA GPUs busy for that min_buffer setting, you'd need 0 seconds. In fact, you have saturation, meaning that all instances are projected to be busy for 30261.82 seconds (8.4 hours).

end work fetch state:
Here is where it makes a decision, based on the info above, of whether to request work from a project or not. You have no CPU projects available for your idle CPUs, so they get left idle :sadface: You have no idle NVIDIA devices, and also your saturation level (30261.82 seconds) is greater than your low water mark (180 seconds), so you don't need NVIDIA work either. So it correctly says "No project chosen for work fetch", and doesn't request work.

Does that help? You should now be ready to read these log messages on your own, I'd think. Feel free to change some buffer values, or set GPUGrid for No New Tasks, to see the effects on this work_fetch_debug output.

tomba
Message 34680 - Posted: 15 Jan 2014 | 16:58:18 UTC - in response to Message 34678.

Let's teach you how to read this.

Thanks for that, Jacob. I shall study it carefully.

You have no CPU projects available for your idle CPUs, so they get left idle :sadface:

Yes. I've been feeling guilty about that. So I'm now running six Rosettas too. A bit worried that the CPU fan has gone from 3700 rpm to 4300 rpm and the CPU temperature has gone from 55°C to 64°C, but I guess that's a question for my other thread.

Jacob Klein
Message 34681 - Posted: 15 Jan 2014 | 17:08:56 UTC - in response to Message 34680.
Last modified: 15 Jan 2014 | 17:15:13 UTC

Note 1: Richard's previous post in this thread is likely correct.
Note 2: REC is Recent estimated credit, and is used by BOINC in the "prio" priority calculation when choosing which project to ask for work. The projects are listed in "prio" order, such that you can easily see which would be "next in line" in a request for work.
Note 3: In case you're curious to see a more-involved work fetch cycle, or might be wanting a list of projects that I'm attached to, below is a work_fetch_debug that shows the projects I'm attached to. I run a lot of various CPU and NVIDIA projects on this machine.
Note 4: Further information about work_fetch can be found in this slightly outdated, but highly useful, document: http://boinc.berkeley.edu/trac/wiki/ClientSched

------------------------------
A cycle of my work fetch:
------------------------------

1/15/2014 12:09:59 PM | | [work_fetch] entering choose_project()

1/15/2014 12:09:59 PM | | [work_fetch] ------- start work fetch state -------
1/15/2014 12:09:59 PM | | [work_fetch] target work buffer: 8640.00 + 43200.00 sec

1/15/2014 12:09:59 PM | | [work_fetch] --- project states ---
1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] REC 0.000 prio -0.000000 can req work
1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] REC 0.000 prio -0.000000 can req work
1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] REC 0.126 prio -0.000000 can't req work: master URL fetch pending (backoff: 43700.11 sec)
1/15/2014 12:09:59 PM | pogs | [work_fetch] REC 0.000 prio -0.000000 can't req work: "no new tasks" requested via Manager
1/15/2014 12:09:59 PM | Quake-Catcher Network | [work_fetch] REC 0.000 prio 0.000000 can't req work: non CPU intensive
1/15/2014 12:09:59 PM | ralph@home | [work_fetch] REC 0.000 prio -0.000000 can req work
1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] REC 0.000 prio -0.000000 can req work
1/15/2014 12:09:59 PM | correlizer | [work_fetch] REC 0.014 prio -0.000000 can req work
1/15/2014 12:09:59 PM | WUProp@Home | [work_fetch] REC 0.014 prio -0.000002 can't req work: non CPU intensive
1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] REC 110.106 prio -0.002449 can req work
1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] REC 221.382 prio -0.004923 can req work
1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] REC 221.453 prio -0.004925 can req work
1/15/2014 12:09:59 PM | boincsimap | [work_fetch] REC 248.218 prio -0.005520 can req work
1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] REC 1063.070 prio -0.006005 can req work
1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] REC 287.694 prio -0.007551 can req work
1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] REC 304.466 prio -0.007763 can req work
1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] REC 306.047 prio -0.010929 can req work
1/15/2014 12:09:59 PM | Docking | [work_fetch] REC 288.875 prio -0.011575 can req work
1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] REC 590.890 prio -0.013141 can req work
1/15/2014 12:09:59 PM | climateathome | [work_fetch] REC 251.853 prio -0.022648 can req work
1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] REC 229.825 prio -0.058248 can req work
1/15/2014 12:09:59 PM | RNA World | [work_fetch] REC 446.313 prio -0.070412 can req work
1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] REC 662565.023 prio -14.773922 can req work
1/15/2014 12:09:59 PM | SETI@home | [work_fetch] REC 76065.475 prio -169.162250 can req work
1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] REC 80649.354 prio -179.356354 can req work
1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] REC 80680.357 prio -179.425301 can req work
1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] REC 86519.931 prio -192.412500 can req work

1/15/2014 12:09:59 PM | | [work_fetch] --- state for CPU ---
1/15/2014 12:09:59 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 302655.24 busy 0.00
1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | pogs | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | ralph@home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | correlizer | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | boincsimap | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] fetch share 0.190
1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Docking | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | climateathome | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | RNA World | [work_fetch] fetch share 0.048
1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | SETI@home | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] fetch share 0.000

1/15/2014 12:09:59 PM | | [work_fetch] --- state for NVIDIA ---
1/15/2014 12:09:59 PM | | [work_fetch] shortfall 68058.62 nidle 0.00 saturated 8942.84 busy 0.00
1/15/2014 12:09:59 PM | DrugDiscovery | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | The Lattice Project | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | superlinkattechnion | [work_fetch] fetch share 0.000
1/15/2014 12:09:59 PM | pogs | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | ralph@home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | DNA@Home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | correlizer | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | MindModeling@Beta | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | LHC@home 1.0 | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | Test4Theory@Home | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | boincsimap | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | World Community Grid | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | rosetta@home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | climateprediction.net | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | Cosmology@Home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | Docking | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | Poem@Home | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | climateathome | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | Milkyway@Home | [work_fetch] fetch share 0.000 (blocked by configuration file)
1/15/2014 12:09:59 PM | RNA World | [work_fetch] fetch share 0.000 (no apps)
1/15/2014 12:09:59 PM | GPUGRID | [work_fetch] fetch share 0.111
1/15/2014 12:09:59 PM | SETI@home | [work_fetch] fetch share 0.001
1/15/2014 12:09:59 PM | Einstein@Home | [work_fetch] fetch share 0.001
1/15/2014 12:09:59 PM | SETI@home Beta Test | [work_fetch] fetch share 0.001
1/15/2014 12:09:59 PM | Albert@Home | [work_fetch] fetch share 0.001

1/15/2014 12:09:59 PM | | [work_fetch] ------- end work fetch state -------
1/15/2014 12:09:59 PM | | [work_fetch] No project chosen for work fetch

tomba
Message 34757 - Posted: 22 Jan 2014 | 7:58:47 UTC

Just had an early download; eight CPUs and two GPUs busy and no sign of a GPUGrid WU stopping:

22/01/2014 08:07:49 | GPUGRID | [work_fetch] fetch share 1.000
22/01/2014 08:07:49 | | [work_fetch] ------- end work fetch state -------
22/01/2014 08:07:49 | | [work_fetch] No project chosen for work fetch
22/01/2014 08:08:29 | | [work_fetch] Request work fetch: application exited
22/01/2014 08:08:29 | GPUGRID | [work_fetch] REC 272425.128 prio -2.579669 can req work
22/01/2014 08:08:29 | | [work_fetch] --- state for CPU ---
22/01/2014 08:08:29 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2030.85 busy 0.00
22/01/2014 08:08:29 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
22/01/2014 08:08:29 | | [work_fetch] --- state for NVIDIA ---
22/01/2014 08:08:29 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
22/01/2014 08:08:29 | GPUGRID | [work_fetch] fetch share 0.000
22/01/2014 08:08:29 | | [work_fetch] ------- end work fetch state -------
22/01/2014 08:08:29 | GPUGRID | [work_fetch] set_request() for NVIDIA: ninst 2 nused_total 1.000000 nidle_now 1.000000 fetch share 0.000000 req_inst 1.000000 req_secs 1044.000000
22/01/2014 08:08:29 | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (1044.00 sec, 1.00 inst)
22/01/2014 08:08:29 | GPUGRID | Sending scheduler request: To fetch work.
22/01/2014 08:08:29 | GPUGRID | Requesting new tasks for NVIDIA
22/01/2014 08:08:32 | GPUGRID | Scheduler request completed: got 1 new tasks
22/01/2014 08:08:32 | | [work_fetch] Request work fetch: RPC complete
22/01/2014 08:08:34 | GPUGRID | Started download of 72x-SANTI_MAR420cap310-29-LICENSE

Jacob Klein
Message 34764 - Posted: 22 Jan 2014 | 14:04:15 UTC

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

if you think it was incorrect, then the next steps might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.
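
Those two flags go in the same <log_flags> section of cc_config.xml as work_fetch_debug; a rough sketch, keeping the existing flag on as well:

<cc_config>
   <log_flags>
      <work_fetch_debug>1</work_fetch_debug>
      <cpu_sched>1</cpu_sched>
      <coproc_debug>1</coproc_debug>
   </log_flags>
</cc_config>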

Richard Haselgrove
Message 34765 - Posted: 22 Jan 2014 | 15:52:52 UTC - in response to Message 34764.

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

if you think it was incorrect, then the next steps might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.

It might be possible to work out what happened from the message log entries immediately before and after the section Tomba posted. Did a task restart, for example?

The trouble with the extra log flags is that you can't use them retrospectively to diagnose a problem which has already happened - you have to set them in advance, and wait for it to happen again.

tomba
Message 34769 - Posted: 22 Jan 2014 | 17:38:45 UTC - in response to Message 34764.

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

As I said in my post "no sign of a GPUGrid WU stopping" and I did check back for 30 minutes.

if you think it was incorrect, then the next steps might be to turn on cpu_sched and coproc_debug, to "prove" that work fetch made a bad call.

OK. I'll check 'em out.

Richard Haselgrove
Message 34771 - Posted: 22 Jan 2014 | 18:27:56 UTC - in response to Message 34769.

Work fetch asked GPUGRID for NVIDIA work because it detected 1 idle NVIDIA instance. Is it possible that it was correct?

As I said in my post "no sign of a GPUGrid WU stopping" and I did check back for 30 minutes.

Unfortunately, a task stopping isn't necessarily logged with normal settings - I think you'd need to add <task_debug> to be sure of seeing that.

But you should see the restart afterwards, in the normal logs (I think).
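
For completeness, a rough cc_config.xml sketch with task_debug added alongside the work fetch logging:

<cc_config>
   <log_flags>
      <work_fetch_debug>1</work_fetch_debug>
      <task_debug>1</task_debug>
   </log_flags>
</cc_config>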

tomba
Message 34793 - Posted: 23 Jan 2014 | 18:11:30 UTC

We start with two WUs confirmed running followed by a couple of nidles of 0.

Then - oops - only one WU is running.

One more 0 nidle then a 1 nidle, and one more 0 nidle!!

Then we have a 1 "nidle_now", followed by an early WU fetch....

23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0
23/01/2014 12:17:43 | GPUGRID | [coproc] NVIDIA instance 1: confirming for 75x-SANTI_MARwtcap310-22-32-RND0081_0

23/01/2014 12:18:10 | | [work_fetch] entering choose_project()
23/01/2014 12:18:10 | | [work_fetch] ------- start work fetch state -------
23/01/2014 12:18:10 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
23/01/2014 12:18:10 | | [work_fetch] --- project states ---
23/01/2014 12:18:10 | GPUGRID | [work_fetch] REC 275978.848 prio -3.498618 can req work
23/01/2014 12:18:10 | | [work_fetch] --- state for CPU ---
23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2568.97 busy 0.00
23/01/2014 12:18:10 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
23/01/2014 12:18:10 | | [work_fetch] --- state for NVIDIA ---
23/01/2014 12:18:10 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 4775.59 busy 0.00
23/01/2014 12:18:10 | GPUGRID | [work_fetch] fetch share 0.500
23/01/2014 12:18:10 | | [work_fetch] ------- end work fetch state -------
23/01/2014 12:18:10 | | [work_fetch] No project chosen for work fetch
23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited
23/01/2014 12:18:13 | GPUGRID | [coproc] NVIDIA instance 0: confirming for I91R1-NATHAN_KIDc22_full2-3-10-RND2112_0
23/01/2014 12:18:15 | | [work_fetch] entering choose_project()
23/01/2014 12:18:15 | | [work_fetch] ------- start work fetch state -------
23/01/2014 12:18:15 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
23/01/2014 12:18:15 | | [work_fetch] --- project states ---
23/01/2014 12:18:15 | GPUGRID | [work_fetch] REC 275979.860 prio -3.498092 can req work
23/01/2014 12:18:15 | | [work_fetch] --- state for CPU ---
23/01/2014 12:18:15 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2560.84 busy 0.00
23/01/2014 12:18:15 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
23/01/2014 12:18:15 | | [work_fetch] --- state for NVIDIA ---
23/01/2014 12:18:15 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
23/01/2014 12:18:15 | GPUGRID | [work_fetch] fetch share 0.000
23/01/2014 12:18:15 | | [work_fetch] ------- end work fetch state -------
23/01/2014 12:18:19 | | [work_fetch] Request work fetch: RPC complete
23/01/2014 12:18:24 | | [work_fetch] entering choose_project()
23/01/2014 12:18:24 | | [work_fetch] ------- start work fetch state -------
23/01/2014 12:18:24 | | [work_fetch] target work buffer: 180.00 + 864.00 sec
23/01/2014 12:18:24 | | [work_fetch] --- project states ---
23/01/2014 12:18:24 | GPUGRID | [work_fetch] REC 275979.860 prio -2.505735 can req work
23/01/2014 12:18:24 | | [work_fetch] --- state for CPU ---
23/01/2014 12:18:24 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2546.51 busy 0.00
23/01/2014 12:18:24 | GPUGRID | [work_fetch] fetch share 0.000 (no apps)
23/01/2014 12:18:24 | | [work_fetch] --- state for NVIDIA ---
23/01/2014 12:18:24 | | [work_fetch] shortfall 1044.00 nidle 1.00 saturated 0.00 busy 0.00
23/01/2014 12:18:24 | GPUGRID | [work_fetch] fetch share 0.000
23/01/2014 12:18:24 | | [work_fetch] ------- end work fetch state -------
23/01/2014 12:18:24 | GPUGRID | [work_fetch] set_request() for NVIDIA: ninst 2 nused_total 1.000000 nidle_now 1.000000 fetch share 0.000000 req_inst 1.000000 req_secs 1044.000000
23/01/2014 12:18:24 | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (1044.00 sec, 1.00 inst)
23/01/2014 12:18:24 | GPUGRID | Sending scheduler request: To fetch work.
23/01/2014 12:18:24 | GPUGRID | Requesting new tasks for NVIDIA
23/01/2014 12:18:27 | GPUGRID | Scheduler request completed: got 1 new tasks
23/01/2014 12:18:27 | | [work_fetch] Request work fetch: RPC complete
23/01/2014 12:18:29 | GPUGRID | Started download of 98x-SANTI_MARwtcap310-30-LICENSE

Jacob Klein
Message 34794 - Posted: 23 Jan 2014 | 18:40:43 UTC

23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited

Any idea what application exited, causing the work fetch request?
Also, are you using CPU Throttling (The "Use at most X% CPU Time" setting)?
Also, can you please include the first messages at the beginning of the event log, so we can see what version you are using?

I agree this looks a bit suspicious, but it sounds like a GPU task got unloaded, and work fetch decided to fill an idle spot, even if the timing isn't exactly perfect.


Richard Haselgrove
Message 34795 - Posted: 23 Jan 2014 | 19:12:50 UTC - in response to Message 34793.

I think that log sequence is pretty definitive. There are two sets of nidle:

The --- state for CPU --- remains at zero throughout. No problems there.

The --- state for NVIDIA --- starts at 0, jumps to 1, and then drops to 0 again.

At the point of the jump, we can see

23/01/2014 12:18:13 | | [work_fetch] Request work fetch: application exited

and NVIDIA instance 1: confirming for 75x-SANTI_MARwtcap310-22-32-RND0081_0 disappears from the record.

That's result 7689115, which you can see has a pause in the middle:

# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 6109000)

I imagine that if you look a bit further down, you'd see, perhaps first a 'restarting' entry for 75x-SANTI_MARwtcap310-22-32-RND0081_0, and then two task instances being confirmed again at each [coproc] step.

The good news is that 75x-SANTI_MARwtcap310-22-32-RND0081_0 completed successfully and validated, despite the pause in the middle.

Jacob Klein
Message 34796 - Posted: 23 Jan 2014 | 19:16:35 UTC - in response to Message 34795.

And so, for that brief pause, work fetch correctly tried to fill an idle GPU, right?

Richard Haselgrove
Message 34797 - Posted: 23 Jan 2014 | 19:51:32 UTC - in response to Message 34796.

And so, for that brief pause, work fetch correctly tried to fill an idle GPU, right?

That's my guess. And I'm also guessing that BOINC restarted the missing 75x-SANTI_MARwtcap310-22-32-RND0081_0 (allowing it to run to completion and report success), before the file downloads for the replacement - probably result 7690601 - had completed and allowed it to be started on the idle GPU.

Dagorath
Message 34799 - Posted: 23 Jan 2014 | 21:55:29 UTC - in response to Message 34797.

Makes sense to me. Thanks for the lesson in debug message interpretation, Richard and Jacob, I swear I'll get it eventually. So what's causing the simulation to become unstable and pause to catch its breath? Clocks too high?

Shouldn't the client recognize the pause as a temporary suspend and not request more work?

____________
BOINC <<--- credit whores, pedants, alien hunters

Stefan
Project administrator
Project developer
Project tester
Project scientist
Message 34802 - Posted: 24 Jan 2014 | 10:11:27 UTC

Is this maybe the pause that Matt implemented in the latest versions, where after a crash the task pauses to avoid consecutive crashes, or something like that?
It rings a bell.

Profile MJH
Project administrator
Project developer
Project scientist
Message 34803 - Posted: 24 Jan 2014 | 10:29:50 UTC - in response to Message 34802.


Is this maybe the pause that Matt implemented in the latest versions, where after a crash the task pauses to avoid consecutive crashes, or something like that?



Quite likely. If the task crashes having made some progress, it will be restarted after a delay of at least 60 sec. BOINC will try and start some other work during this idle time, which may involve downloading another task.

In the worst case, a very unreliable machine may be continuously cycling between two or three tasks, each time making just enough progress to merit a restart attempt.

MJH

Profile Retvari Zoltan
Message 34814 - Posted: 25 Jan 2014 | 13:55:44 UTC - in response to Message 34803.


Is this maybe the pause that Matt implemented in the latest versions, where after a crash the task pauses to avoid consecutive crashes, or something like that?

Quite likely. If the task crashes having made some progress, it will be restarted after a delay of at least 60 sec. BOINC will try and start some other work during this idle time, which may involve downloading another task.

In the worst case, a very unreliable machine may be continuously cycling between two or three tasks, each time making just enough progress to merit a restart attempt.

MJH

Exactly that happened when my Gigabyte GTX 780Ti OC was unreliable.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 34829 - Posted: 26 Jan 2014 | 21:20:56 UTC - in response to Message 34814.
Last modified: 26 Jan 2014 | 21:28:42 UTC

While that might be the case now (compare BOINC logs to WU logs), Tomba reported this problem before 8.15 was being used in the Long queue. That said, under the 8.14 app the task would have exited anyway with a driver restart. Just how well it did this is unknown and might have been an issue, but you are not going to see that again. Anyway, during one of many possible driver restarts (which I presume BOINC doesn't know anything about) BOINC probably asked for another task. BOINC isn't as nippy at asking for work these days (lazy or considered, you decide), but with, say, 7 driver restarts there is a strong chance it did on at least one occasion.

While the new app asks that a problem task is not immediately run again on the same GPU, and seems to prefer to try to run the task on another GPU (if available), I'm not aware of the method of doing this. If it transpires that another task is still being requested, it might be better to allow the GPU in question 2 or 3 minutes to cool down (literally, in many cases) without running another WU. Even if it takes that long to download another task, it's still unwanted and unnecessary. Although we cannot suspend a specific GPU through BOINC's GUI, it's possible to edit the cc_config file (and tell BOINC to read it), so you could tell BOINC not to use a specific GPU, and then re-edit (and re-read) the cc_config file again (after time=t); a rough sketch follows.
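
The relevant cc_config.xml section is <exclude_gpu>; a rough sketch that stops GPUGrid (only) from using one device, where device_num 1 is just an example (use the device number BOINC reports at startup):

<cc_config>
   <options>
      <exclude_gpu>
         <url>http://www.gpugrid.net/</url>
         <device_num>1</device_num>
      </exclude_gpu>
   </options>
</cc_config>

Remove the block and have BOINC re-read its config files (a client restart may be needed, depending on version) to put the GPU back to work.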
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help


