Message boards : Number crunching : GPU initialization failure should not abort with client error

Author: JStateson
Message 3676 - Posted: 5 Nov 2008 | 13:13:51 UTC

I am using BOINC 6.3.21 and an Nvidia 9800 GTX+ on gpugrid.net. If the GPU fails to initialize, a client error is reported, and within minutes BOINC runs through the entire daily quota for that project. With no WUs available for the next 24 hours, it is difficult to debug the problem. This has happened to me twice (http://tinyurl.com/57wf53), and I was lucky to spot the problem before the failed work units were deleted by the BOINC Manager, thanks to BOINCVIEW.

I recently started using a GPU and have a cooling problem that I am working on. When the GPU freezes (this is my guess, and IANAE), a hard reboot is required. Issuing a reboot through Remote Desktop, or a restart from the Vista Start button, does not clear whatever caused the GPU to lock up. In the meantime, the host CPU is still running, so BOINC reports a client error to the project, gets another WU, attempts to initialize the GPU, fails again, and quickly goes through all the available work units until the quota limit is reached.
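To make the request concrete, here is a rough Python sketch of the loop described above. It is not BOINC code: the function name, the `max_init_failures` threshold, and the quota of 50 are all made up for illustration. It only shows how a guard on consecutive GPU initialization failures would keep the daily quota from being drained.

```python
def work_units_consumed(daily_quota, gpu_ok, max_init_failures=None):
    """Count how many work units are used up before the client stops.

    daily_quota:       work units the project hands out per day (hypothetical figure)
    gpu_ok:            whether GPU initialization succeeds for each task
    max_init_failures: hypothetical guard; stop fetching after this many
                       consecutive initialization failures (None = no guard,
                       which models the behaviour reported in this post)
    """
    consumed = 0
    consecutive_failures = 0
    while consumed < daily_quota:
        consumed += 1  # a work unit is fetched and counted against the quota
        if gpu_ok:
            consecutive_failures = 0  # task ran normally
            continue
        consecutive_failures += 1  # initialization failed, client error reported
        if max_init_failures is not None and consecutive_failures >= max_init_failures:
            break  # guard trips: stop asking for more work until the GPU is fixed
    return consumed


print(work_units_consumed(50, gpu_ok=False))                       # 50
print(work_units_consumed(50, gpu_ok=False, max_init_failures=2))  # 2
```

With a locked-up GPU and no guard, the whole quota is gone. With the guard, only two work units are lost and the rest stay available for when the card is working again.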

This is what the error looks like:
# Using CUDA device 0
Cuda error in file 'deviceQuery.cu' in line 59 : initialization error

If the GPU is not locked up, it should look like this:
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"


There is another deviceQuery error: "out of memory".
I suspect that error is also handled incorrectly.
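
As a sketch of how such failures could be told apart from an ordinary task error, the Python below scans a task's stderr text for the two messages mentioned in this post. The function and constant names are hypothetical, and I am assuming the "out of memory" line has the same "Cuda error in file ..." layout as the initialization error shown above, which I have not confirmed.

```python
# Messages that point at the device rather than at the work unit.
# Treating "out of memory" this way is only my suspicion, not a confirmed fact.
DEVICE_LEVEL_ERRORS = ("initialization error", "out of memory")


def is_device_level_failure(stderr_text):
    """Return True if any 'Cuda error' line names a device-level problem."""
    for line in stderr_text.splitlines():
        lowered = line.strip().lower()
        if lowered.startswith("cuda error") and any(
            message in lowered for message in DEVICE_LEVEL_ERRORS
        ):
            return True
    return False


failed = (
    "# Using CUDA device 0\n"
    "Cuda error in file 'deviceQuery.cu' in line 59 : initialization error\n"
)
healthy = (
    "# Using CUDA device 0\n"
    '# Device 0: "GeForce 9800 GTX/9800 GTX+"\n'
)

print(is_device_level_failure(failed))   # True
print(is_device_level_failure(healthy))  # False
```

A result of True could then be used to hold off on requesting more work instead of counting the task as a normal client error.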

..thanks..

PS: I also posted this to the "server" section of the BOINC forum.
