Message boards : Graphics cards (GPUs) : NO Work

Author Message
Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9911 - Posted: 17 May 2009 | 15:13:56 UTC

Just as bad as tasks that die are tasks that are not available ...

Very depressing ...

Out of work ...

naja002
Joined: 25 Sep 08
Posts: 111
Credit: 10,352,599
RAC: 0
Message 9914 - Posted: 17 May 2009 | 16:47:29 UTC

Went through that earlier today on a 2x gpu rig. Struggled with update for a bit and finally got some more work. Right now update isn't being cooperative and no WUs in cache on that rig......

The other 3 seem to be doing ok work-wise...

All are running 6.6.28

naja002
Joined: 25 Sep 08
Posts: 111
Credit: 10,352,599
RAC: 0
Message 9917 - Posted: 17 May 2009 | 17:59:12 UTC

Now a different 1x gpu rig is out of work.

No work sent
Blah, blah, blah not available for your type of computer

I checked the long-term debt and it's zero.....

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9924 - Posted: 17 May 2009 | 20:01:30 UTC

Well, at the time I posted the opener there was literally no work in the queue. Now there are 157 tasks; get 'em while they are hot ...

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 9947 - Posted: 18 May 2009 | 19:44:55 UTC
Last modified: 18 May 2009 | 19:45:56 UTC

None of my machines are able to DL work today. The message log reads:

GPUGRID 5/18/2009 1:31:06 PM Sending scheduler request: To fetch work.
GPUGRID 5/18/2009 1:31:06 PM Requesting new tasks
GPUGRID 5/18/2009 1:31:12 PM Scheduler request completed: got 0 new tasks
GPUGRID 5/18/2009 1:31:12 PM Message from server: No work sent
GPUGRID 5/18/2009 1:31:12 PM Message from server: CUDA app exists for Full-atom molecular dynamics but no CUDA work requested
GPUGRID 5/18/2009 1:31:12 PM Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.

Using BOINC v6.6.24 & v6.6.28, 2 9600 GSO cards and 1 GTX 260.
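
The telltale line in the log above is the server reporting that a CUDA app exists but that the client did not request CUDA work. A minimal Python sketch for spotting that symptom in a saved message log follows; the helper name `find_no_cuda_request` and the marker constant are hypothetical, and the marker string is simply copied from the log lines quoted in this post.

```python
# Substring copied from the server message quoted above (not a BOINC API).
NO_CUDA_REQUEST_MARKER = "but no CUDA work requested"

def find_no_cuda_request(log_lines):
    """Return the log lines where the server says no CUDA work was requested."""
    return [line for line in log_lines if NO_CUDA_REQUEST_MARKER in line]

log = [
    "GPUGRID 5/18/2009 1:31:06 PM Requesting new tasks",
    "GPUGRID 5/18/2009 1:31:12 PM Scheduler request completed: got 0 new tasks",
    "GPUGRID 5/18/2009 1:31:12 PM Message from server: No work sent",
    "GPUGRID 5/18/2009 1:31:12 PM Message from server: CUDA app exists for "
    "Full-atom molecular dynamics but no CUDA work requested",
]

hits = find_no_cuda_request(log)
for line in hits:
    print(line)
```

A match only shows that the client did not ask for GPU work on that request; it does not by itself say why.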

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9951 - Posted: 18 May 2009 | 20:08:28 UTC - in response to Message 9947.

None of my machines are able to DL work today. The message log reads:

GPUGRID 5/18/2009 1:31:06 PM Sending scheduler request: To fetch work.
GPUGRID 5/18/2009 1:31:06 PM Requesting new tasks
GPUGRID 5/18/2009 1:31:12 PM Scheduler request completed: got 0 new tasks
GPUGRID 5/18/2009 1:31:12 PM Message from server: No work sent
GPUGRID 5/18/2009 1:31:12 PM Message from server: CUDA app exists for Full-atom molecular dynamics but no CUDA work requested
GPUGRID 5/18/2009 1:31:12 PM Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.

Using BOINC v6.6.24 & v6.6.28, 2 9600 GSO cards and 1 GTX 260.

You got hit with the work fetch bug. Two choices, reset the project or reset debts.

Resetting the project may or may not be enough to pull you out. The problem is that the long-term debts (LTD) are out of whack and you are not asking for CUDA work from GPUGRID.

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 9952 - Posted: 18 May 2009 | 20:11:00 UTC - in response to Message 9951.
Last modified: 18 May 2009 | 20:58:00 UTC

You got hit with the work fetch bug. Two choices, reset the project or reset debts.

Thanks Paul! How do I reset debts in v6.6.28?

Edit: I stopped the client and saw that the CUDA_debt in client_state.xml was at a huge value (long-term debt was already at 0). I set CUDA_debt to 0.000000, and the client immediately requested and received WUs upon restart. Fixed for now, but v6.6.28 needs work, as you've been saying.
Thanks again!
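
The manual edit described above can be sketched in Python. This is only an illustration: it assumes the value sits in a `<cuda_debt>` element, as in the client_state.xml excerpts posted later in this thread, and `zero_cuda_debt` is a hypothetical helper, not part of BOINC. The sample text and its debt value are made up. The client should be fully stopped before the real file is touched, and keeping a backup copy is prudent.

```python
import re

def zero_cuda_debt(state_text: str) -> str:
    """Return state_text with every <cuda_debt> value replaced by 0.000000."""
    return re.sub(
        r"<cuda_debt>[^<]*</cuda_debt>",
        "<cuda_debt>0.000000</cuda_debt>",
        state_text,
    )

# Made-up sample standing in for one project's section of client_state.xml.
sample = (
    "<project>\n"
    "    <long_term_debt>0.000000</long_term_debt>\n"
    "    <cuda_debt>-8123456.789012</cuda_debt>\n"
    "</project>\n"
)

fixed = zero_cuda_debt(sample)
print(fixed)
```

Only the `<cuda_debt>` element is rewritten; every other line is passed through unchanged.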

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9954 - Posted: 18 May 2009 | 21:06:55 UTC - in response to Message 9951.

You got hit with the work fetch bug. Two choices, reset the project or reset debts.


3rd choice: use 6.5.0 ;)

MrS

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 9956 - Posted: 18 May 2009 | 21:11:51 UTC - in response to Message 9954.

You got hit with the work fetch bug. Two choices, reset the project or reset debts.


3rd choice: use 6.5.0 ;)

Sheesh, it has its own problems. I much prefer v6.6.23-v6.6.28.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9957 - Posted: 18 May 2009 | 22:05:41 UTC - in response to Message 9952.

You got hit with the work fetch bug. Two choices, reset the project or reset debts.

Thanks Paul! How do I reset debts in v6.6.28?

Edit: I stopped the client and saw that the CUDA_debt in client_state.xml was at a huge value (long-term debt was already at 0). I set CUDA_debt to 0.000000, and the client immediately requested and received WUs upon restart. Fixed for now, but v6.6.28 needs work, as you've been saying.
Thanks again!

Use the flag in the cc_config file. Stop the attached client, exit BOINC Manager, restart with the flag set to "1" (one), then change the flag back to 0 (zero) ... you cannot just read the config file and have it work. (Sadly)

<cc_config>
<log_flags>
</log_flags>
<options>
<use_all_gpus>1</use_all_gpus>
<zero_debts>1</zero_debts>
</options>
</cc_config>

You may want to, for safety's sake, leave the use-all-GPUs flag set too ... just in case you add GPUs ...
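
The two-step procedure (start once with zero_debts set to 1, then set it back to 0) could be scripted by generating both versions of the file. The following is a minimal Python sketch, not an official tool: `build_cc_config` is a hypothetical helper, and the tag names are taken from the config shown in this post.

```python
import xml.etree.ElementTree as ET

def build_cc_config(zero_debts: bool, use_all_gpus: bool = True) -> str:
    """Build cc_config.xml content with the two option flags from this post."""
    root = ET.Element("cc_config")
    ET.SubElement(root, "log_flags")  # left empty, as in the post
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    ET.SubElement(options, "zero_debts").text = "1" if zero_debts else "0"
    return ET.tostring(root, encoding="unicode")

# Version to use for the single restart that clears the debts.
reset_config = build_cc_config(zero_debts=True)
# Version to put back afterwards so debts are not cleared on every start.
normal_config = build_cc_config(zero_debts=False)

print(reset_config)
print(normal_config)
```

Writing `reset_config` to cc_config.xml before the restart and `normal_config` afterwards mirrors the manual steps; where the file lives depends on the installation.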

With all KNOWN versions of 6.6.x you are going to get hit with the LTD bug. People tell me they are not affected ... until they are ... The faster and wider the system, the faster the bug hits. I generally see it on my i7 with the 2 GTX295 cards about every 2-3 days ...

My NEW systems I am not sure what is going on there, but on one of them I cannot get it to queue work to save my life ... and it is nearly identical to the other I just built and that one has work queued ... sigh ...

Not that it has been doing any good, but I have been documenting a number of issues on the dev/alpha boards; some of them are long-standing bugs that have kinda been hiding, and though I had my suspicions for a long time, it was not until last week that I was able to prove that one project's application ***CAN*** cause other projects' tasks to fail ... a problem I predicted back in BOINC Beta; I was also told that it would never happen ... in a way I hate being right all the time .... :)

Ok, one side effect is that using the debt reset does fardle up all the shares, so your long-running balances will probably be out of kilter ... of course I have demonstrated to my satisfaction that the work fetch is hosed too ... but cannot prove it with numbers yet. Sorry, guys, I am still looking at some death-type issues and am hoping I may be able to get a lead on the long-running task issue ... which was real bad in 6.6.20, but I am not sure it is fully gone ...

Rats, got wordy again ...

Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 10370 - Posted: 1 Jun 2009 | 22:07:19 UTC - in response to Message 9957.

"you cannot just read the config file and have it work. (Sadly)"

Yes you can, fortunately, in the 6.4.x version, because in that version the zero_debts tag does not work. I ran across this issue tonight, and manually editing client_state.xml solved it (for now). Thanks for that suggestion!

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1947
Credit: 629,356
RAC: 0
Message 10371 - Posted: 1 Jun 2009 | 22:54:35 UTC - in response to Message 10370.

More work uploaded.

gdf

Andrew
Joined: 9 Dec 08
Posts: 29
Credit: 18,754,468
RAC: 0
Message 10434 - Posted: 5 Jun 2009 | 21:47:08 UTC - in response to Message 10371.

What do the debts actually mean?
In my client_state config file I have:

<short_term_debt>0.000000</short_term_debt>
<long_term_debt>-177104.643607</long_term_debt>
<cpu_backoff_interval>0.000000</cpu_backoff_interval>
<cpu_backoff_time>0.000000</cpu_backoff_time>
<cuda_debt>-54.294054</cuda_debt>
<cuda_backoff_interval>86400.000000</cuda_backoff_interval>
<cuda_backoff_time>1244256480.527597</cuda_backoff_time>

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10441 - Posted: 7 Jun 2009 | 10:27:17 UTC - in response to Message 10434.

What do the debts actually mean?
In my client_state config file I have:

<short_term_debt>0.000000</short_term_debt>
<long_term_debt>-177104.643607</long_term_debt>
<cpu_backoff_interval>0.000000</cpu_backoff_interval>
<cpu_backoff_time>0.000000</cpu_backoff_time>
<cuda_debt>-54.294054</cuda_debt>
<cuda_backoff_interval>86400.000000</cuda_backoff_interval>
<cuda_backoff_time>1244256480.527597</cuda_backoff_time>


What they tell me is that your numbers are messed up ...

The last one is way out of line... do you have any work at all?

Andrew
Joined: 9 Dec 08
Posts: 29
Credit: 18,754,468
RAC: 0
Message 10444 - Posted: 7 Jun 2009 | 15:21:01 UTC - in response to Message 10441.

Hi Paul,

Yes, my GPU is always occupied (I have 2 CPU cores, 1 GPU), but the reason I posted was that at the time I read the thread, I only had work from GPUGrid, climateprediction and boincsimap, and hadn't seen any from rosetta, docking, spinhenge, uFluids for a while, despite their web pages reporting work. Now the former had higher resource share, but something wasn't right.

I now have work from all active projects... I notice that my long-term debt has gone down significantly, I now have short-term debt and the cuda debt has gone to 0.


<short_term_debt>-5148.575270</short_term_debt>
<long_term_debt>-27925.193119</long_term_debt>
<cpu_backoff_interval>0.000000</cpu_backoff_interval>
<cpu_backoff_time>0.000000</cpu_backoff_time>
<cuda_debt>0.000000</cuda_debt>
<cuda_backoff_interval>61440.000000</cuda_backoff_interval>
<cuda_backoff_time>1244409908.184854</cuda_backoff_time>


Do you know what these numbers actually mean Paul? And should I manually correct?

Thanks for your help ;)


p.s. you're right about the huge numbers; the CUDA backoff time is about a fortnight!
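
One way to sanity-check those two fields is to decode them. The sketch below rests on an assumption that is not confirmed anywhere in this thread: that `cuda_backoff_time` is an absolute Unix timestamp (seconds since 1970) marking when the client will next ask for CUDA work, and that `cuda_backoff_interval` is a duration in seconds. `describe_backoff` is a hypothetical helper; the input values are the ones from the first client_state.xml excerpt above.

```python
from datetime import datetime, timedelta, timezone

def describe_backoff(backoff_time: float, backoff_interval: float):
    """Decode a backoff timestamp and interval (assumed Unix time / seconds)."""
    until = datetime.fromtimestamp(backoff_time, tz=timezone.utc)
    interval = timedelta(seconds=backoff_interval)
    return until, interval

# Values from the first excerpt: cuda_backoff_time and cuda_backoff_interval.
until, interval = describe_backoff(1244256480.527597, 86400.0)

print(until.strftime("%Y-%m-%d %H:%M:%S UTC"))
print(interval)
```

Under that assumption the first excerpt decodes to 6 Jun 2009 02:48 UTC with a one-day interval, which would put the end of the backoff a few hours after the excerpt was posted rather than weeks away.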

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10446 - Posted: 7 Jun 2009 | 23:33:00 UTC

If you are not fetching work, I would stop BOINC and the client (use the advanced menu and make sure the science applications are stopped), then change the back-off time to 0 ... the rest, for the moment, I would not fiddle with if you don't have to ...

If you still have problems, then set all to zero ...

But, again, only do this if you have troubles getting work ...

As for the exact meaning, well, the best I can do is this, since the powers that be change things without much thought about what it means and how it impacts the universe ... sadly, this page is out of date, but I cannot tell you for sure exactly what is wrong or missing ...

Short term debt is meant to decide what to run next ... LTD, what to fetch next.

Sadly, the assumption is that letting the system freewheel is being tested, and to this point the developers are resisting all notions that this might be a "bad thing" ... specifically, allowing GPU Grid to accumulate CPU values when there is no CPU work and not likely to be any for some time in the future ... just as if you look at Rosetta (if attached) you are likely to see it running up numbers for the GPU side ...

Personally, I think that is a bad design decision that is only going to get worse as we add computing types to the current three: NCI, CPU, GPU (Nvidia) ... where we already have definition problems and work fetch problems caused by the current non-design...

To see this, attach to FreeHAL as an NCI and it is likely that you will see times where you will not fetch work because the class is not separated from CPU resource shares ... a bad design error ...

b1
Joined: 17 Jan 09
Posts: 1
Credit: 0
RAC: 0
Message 10449 - Posted: 8 Jun 2009 | 9:10:35 UTC

Hello all,

Recently I upgraded my computer with a 9600GT and wanted to join GPUGRID. Unfortunately I can't fetch any work units. I'm not getting exactly the same error as described above; I gave your solutions a try anyway, but without success.
Until now I have tested it with versions 6.6.31 and 6.4.5 of BOINC.
I am running BOINC on Ubuntu Linux 9.04 with the Nvidia driver 185.18.08.
Starting BOINC (version 6.4.5), I get this output:

|Starting BOINC client version 6.4.5 for i686-pc-linux-gnu
|CUDA devices found
|Coprocessor: GeForce 9600 GT (1)
|Sending scheduler request: To fetch work. Requesting 86400 seconds of work, reporting 0 completed tasks
|Scheduler request completed: got 0 new tasks
|Message from server: No work sent
|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
|Message from server: Full-atom molecular dynamics is not available for your type of computer.

I have no idea what might be wrong. Or are there simply no WUs available at the moment?

Any help is greatly appreciated

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 51,279,371
RAC: 0
Message 10455 - Posted: 8 Jun 2009 | 17:03:40 UTC - in response to Message 10449.

Hello all...
|Starting BOINC client version 6.4.5 for i686-pc-linux-gnu
...
I have no idea what might be wrong. Or are there simply no WUs available at the moment?

Any help is greatly appreciated


You're using the wrong architecture. There's no 32-bit GPUGRID Linux app, only a 64-bit one...
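
The platform is visible in the client's startup line quoted above (`i686-pc-linux-gnu`). A minimal Python sketch for checking it follows. Both helper names are hypothetical, and the sketch assumes that a 64-bit Linux client reports a platform string beginning with `x86_64`.

```python
import re

def client_platform(startup_line: str):
    """Extract the platform string from a 'Starting BOINC client ...' line."""
    match = re.search(r"for (\S+)\s*$", startup_line)
    return match.group(1) if match else None

def is_64bit_linux(platform: str) -> bool:
    """Assumption: 64-bit Linux platform strings start with 'x86_64'."""
    return platform.startswith("x86_64") and "linux" in platform

line = "|Starting BOINC client version 6.4.5 for i686-pc-linux-gnu"
platform = client_platform(line)

print(platform)
print(is_64bit_linux(platform))
```

For the startup line quoted in this thread, the check reports a 32-bit platform, which matches the diagnosis above.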

Andrew
Joined: 9 Dec 08
Posts: 29
Credit: 18,754,468
RAC: 0
Message 10456 - Posted: 8 Jun 2009 | 19:41:16 UTC - in response to Message 10446.

Cheers for that link Paul.
Well it seems to be working fine now so I'll just leave it, as should be!
