Message boards : Graphics cards (GPUs) : GPU run failures

Reddogg
Joined: 14 Dec 08
Posts: 2
Credit: 7,522,049
RAC: 0
Message 13254 - Posted: 22 Oct 2009 | 16:32:10 UTC

Greetings,
Over the last few days, my failure rate on GPUGRID has been increasing. Here are some failure reports:
<core_client_version>6.10.13</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce 8800 GTS 512"
# Clock rate: 1.73 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce 8800 GTS 512"
# Clock rate: 1.73 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [geomhash_kernel] failed in file 'gridcell.cu' in line 209 : unknown error.

</stderr_txt>
]]>

Another one:
<core_client_version>6.10.13</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce 8800 GTS 512"
# Clock rate: 1.73 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11: cufftExecR2C (gridcalc1)
called boinc_finish

</stderr_txt>
]]>

And another:
<core_client_version>6.10.13</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce 8800 GTS 512"
# Clock rate: 1.73 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [pme_fill_charges_grid_kernel] failed in file 'fillcharges.cu' in line 55 : unknown error.

</stderr_txt>
]]>

I think most of these failures happen when I start an HD video or a game (it makes no difference whether BOINC is already running when I start the game or not).
Can anyone please help me reduce the failure rate?
It is really frustrating when a GPUGRID task runs for 10 hours and then fails.

Thank you for all your hints.

Regards,
Reddogg

Jet
Joined: 14 Jun 09
Posts: 25
Credit: 5,835,455
RAC: 0
Message 13255 - Posted: 22 Oct 2009 | 17:27:53 UTC - in response to Message 13254.

Is your card perhaps overclocked? According to NVIDIA's official specifications, your card should run at a 1.625 GHz shader clock, unless it is a special vendor edition that is factory-overclocked. Try to get core temperature readings as well, and check the overall temperature inside the PC case. Overheating, whether on its own or caused by overclocking, can produce sporadic and unpredictable calculation errors.
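To put the clock difference in numbers, here is a minimal Python sketch. It assumes that the "Clock rate: 1.73 GHz" line in the stderr logs reports the shader clock and takes the 1.625 GHz reference figure quoted above at face value; the constant names and the `overclock_percent` helper are hypothetical, not part of GPUGRID or BOINC.

```python
# Reference shader clock for the GeForce 8800 GTS 512, as quoted in the reply (GHz).
REFERENCE_SHADER_GHZ = 1.625
# Clock rate reported by the GPUGRID application in the stderr logs (GHz).
# Assumption: this value is the shader clock.
REPORTED_GHZ = 1.73


def overclock_percent(reported_ghz: float, reference_ghz: float) -> float:
    """Return how far the reported clock sits above the reference clock, in percent."""
    return (reported_ghz / reference_ghz - 1.0) * 100.0


if __name__ == "__main__":
    pct = overclock_percent(REPORTED_GHZ, REFERENCE_SHADER_GHZ)
    print(f"Shader clock is {pct:.1f}% above reference")
```

With these values the reported clock comes out roughly 6.5% above the reference figure.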

Reddogg
Joined: 14 Dec 08
Posts: 2
Credit: 7,522,049
RAC: 0
Message 13257 - Posted: 22 Oct 2009 | 17:56:10 UTC

Hi,
The PC temperature is OK. The card is an AMP! Edition, so it is factory-overclocked. But I don't think that is the reason, because a few weeks ago GPUGRID ran very well under Win XP 32-bit and Win 7 64-bit; this is a recent problem.
That's why I posted several error reports in this thread: it looks like these are different kinds of failures.

Dave_In_Oz
Joined: 13 Jul 09
Posts: 32
Credit: 287,042,950
RAC: 0
Message 13281 - Posted: 26 Oct 2009 | 13:49:58 UTC

Hi,

I recently noticed an increasingly high WU failure rate on my i7 system with a GTX 295 card.

I reset the project and was still getting about an 80% failure rate, with both concurrent WUs failing at the same time.

I had recently upgraded to the latest NVIDIA driver on offer. I rolled back this upgrade, and the system now appears to be running happily.

Dave

Pwa O_o
Joined: 23 Sep 09
Posts: 5
Credit: 9,089,039
RAC: 0
Message 13683 - Posted: 24 Nov 2009 | 14:33:02 UTC - in response to Message 13281.

Hi,
I am also experiencing similar problems with my new GTX 295:
http://www.gpugrid.net/result.php?resultid=1547063

I have the latest NVIDIA driver, so could that be the problem? Dave, could you please tell me which driver version you are using so that I can install the same one?

Regards
