Message boards : Graphics cards (GPUs) : initialization errors continue to flush work queue

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 4240 - Posted: 10 Dec 2008 | 17:34:51 UTC

Got another initialization error, this time after about 2-3 weeks of processing WUs without a hiccup. Once the initialization error occurs, my 9800gtx+ can no longer process WUs and quickly runs through everything in the queue and, within an hour, all the WUs available for the day. That is, they download, quickly hit a compute error, and this keeps up until the daily quota is reached. The cycle then repeats until I notice the problem and power-cycle the machine. I mentioned this several weeks ago and even posted in the CUDA forum asking how to reset the NVIDIA board without powering off. The suggestion there was that the graphics board had a hard lockup and needed a power-off.

Anyway, it would be nice if the next version of BOINC or the GPUGRID app handled an initialization error by stopping GPU processing until the NVIDIA board responds again. I am using 6.4.1 and will try 6.4.4, as I see it is out.
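
A rough sketch of the behaviour I am asking for, written in Python rather than anything BOINC actually ships. Every name here (`GpuInitError`, `run_queue`, `run_task`, `probe`) is made up for illustration; the real client and app have their own internals. The idea is that when a task fails with an initialization error, the WU goes back on the queue and the card is polled with a growing delay, instead of the next WU being pulled and failed.

```python
import time
from collections import deque


class GpuInitError(Exception):
    """Hypothetical error raised when the GPU fails to initialize."""


def run_queue(tasks, run_task, probe, sleep=time.sleep,
              first_delay=60.0, max_delay=3600.0):
    """Run tasks in order; pause on an init error instead of failing the rest.

    run_task(task) and probe() are caller-supplied stand-ins, not BOINC APIs.
    probe() should return True once the card responds again.
    """
    queue = deque(tasks)
    done = []
    while queue:
        task = queue.popleft()
        try:
            done.append(run_task(task))
        except GpuInitError:
            queue.appendleft(task)  # keep the WU instead of erroring it out
            delay = first_delay
            while True:
                sleep(delay)        # always wait before asking the card again
                if probe():
                    break
                delay = min(delay * 2, max_delay)
    return done
```

With something like this the queue just sits there while the card is locked up, so the daily quota is not burned. It does not fix the lockup itself, which by the CUDA forum's account still needs a power-off.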

This board, a 9800gtx+, is not used for any gaming.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4242 - Posted: 10 Dec 2008 | 18:34:56 UTC - in response to Message 4240.

Sorry, can't help you with your problem. But while taking a look at your results I noticed that you have an extremely low CPU usage (for Windows) but still (very) good GPU times. How is this possible? (Compare e.g. with my machine, which is somewhat similar)

MrS
____________
Scanning for our furry friends since Jan 2002

JStateson
Message 4243 - Posted: 11 Dec 2008 | 3:04:07 UTC - in response to Message 4242.

Hi ETA!

I am positive that your cores are set for 3+1 while mine are 4+1, so my CPU time is very low: the GPU task rarely gets a CPU time slice, yet the GPU keeps crunching. The image linked below has more detail, including statistics comparing my GPU system to yours. I assume you have only one NVIDIA system, since my stats program averages all GPUGRID results together per user ID.

http://swri.info/images/gpu_compare.png

You have an inordinately large variation in your ms/step and elapsed time compared to mine. However, my system runs 24/7 and is not used for gaming, so possibly that explains why my StdDev is very low.

I do not see anyone else complaining about initialization failures, which concerns me.

JStateson
Message 4257 - Posted: 11 Dec 2008 | 20:33:36 UTC

My CPU time is now up to where yours was. This seems to have been a problem with 6.4.1. The first WU I returned with 6.4.5 is now up at 28,029 seconds, so the problem was in boinc mgr.

I checked your GPU stats and you did have an 8800 GPU prior to your 9800gtx+, so that accounts for the variation in statistics, as I should not have summed them all up.

One of the BOINC developers is going to look into checking whether the GPU is alive before sending a job. This was after I reposted my problem to the BOINC core forum.
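
To show why summing results from two different cards inflates the spread, here is a small Python sketch. The ms/step values are invented purely to show the effect and are not taken from either host; `spread_by_group` is just a helper name I picked.

```python
from collections import defaultdict
from statistics import mean, pstdev


def spread_by_group(samples):
    """Return {group: (mean, population std dev)} for (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in samples:
        groups[group].append(value)
    return {g: (mean(v), pstdev(v)) for g, v in groups.items()}


# Invented ms/step values, only to show the effect; not real results.
samples = [
    ("8800", 80.0), ("8800", 82.0), ("8800", 78.0),
    ("9800gtx+", 50.0), ("9800gtx+", 49.0), ("9800gtx+", 51.0),
]

per_card = spread_by_group(samples)                 # tight spread for each card
pooled = pstdev([value for _, value in samples])    # much wider when mixed
```

Each card on its own has a small StdDev, but the pooled figure is dominated by the gap between the two cards' averages, which is the mistake a stats program makes when it averages everything under one user ID.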

ExtraTerrestrial Apes
Message 4437 - Posted: 17 Dec 2008 | 20:03:51 UTC

Hi BeemerBiker,

thanks for your interesting feedback! I was running 4+1 previously and switched to 3+1 some time ago, which increased CPU time a bit (20000 s -> 30000 s) and improved WU times (55 ms/step -> 49 ms/step). The large variation in some of my times occurs when I let the GPU crunch away while I play Civ 4 ;)
And I had an 8800? Uh, missed that one.. :D No, seriously, I switched straight from an ATI 1950Pro to the 9800GTX+.. maybe the early drivers had problems recognizing it.
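
For the 4+1 to 3+1 numbers above, the relative changes can be worked out with a few lines of Python. This is only arithmetic on the rounded figures quoted in this post; `percent_change` is a throwaway helper, not part of any BOINC tool.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


cpu_time_change = percent_change(20000, 30000)  # CPU seconds per WU
step_time_change = percent_change(55, 49)       # GPU ms per step
```

So going from 4+1 to 3+1 cost roughly 50% more CPU time for about an 11% gain in ms/step on my machine.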

And your results are still very interesting. You say it was probably a bug in 6.4.1, but I'd say that what you had is actually the ideal behaviour, the one we've been looking for since the beginning of the open beta! Low CPU usage in a 4+1 config combined with high (maximum?) GPU speed.

Is something about your software config so much better, that it just works(ed) as expected? Maybe you have some fancy patch applied to the Vista kernel / scheduler? Or Vista is measuring the GPU time in a different way?

I also noticed that your CPU time increased just when you switched from app 6.52 to 6.53 .. which seems highly related. Just strange that 6.52 didn't behave this way on other machines. Well, not that I know!

Regards,
MrS
____________
Scanning for our furry friends since Jan 2002
