Message boards : Graphics cards (GPUs) : ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme....

Neil A
Message 9110 - Posted: 29 Apr 2009 | 12:28:39 UTC
Last modified: 29 Apr 2009 | 12:34:43 UTC

Hello All,

I have been struggling with quite a few GPU WU failures over the past weeks and am not sure what they are. There are a number of failure scenarios, but one common one is included below.

I run a Q9550 quad core on an EVGA 790i Ultra mobo with 2x GTX 260 Core 216 cards, with some overclocking applied. The overclock has been stable for quite a while, running around 650 MHz with the shader clock linked. What hasn't worked for a while is the EVGA GPU Voltage Tuner, which stopped working with the 182 series drivers and up. While it was working, I used it to help stabilize the cards and GPU WUs. I am currently running a 185.68 driver. Any thoughts on what I can do or check would be appreciated.

The C: drive file mentioned below is NOT on my hard drive, so the path must have been compiled into the GPU WU or the Nvidia driver?


<core_client_version>6.6.23</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 1
# Device 0: "GeForce GTX 260"
# Clock rate: 1458000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Device 1: "GeForce GTX 260"
# Clock rate: 799200 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)
called boinc_finish

</stderr_txt>
]]>


Thanks.
Neil
____________
Crunching for the benefit of humanity and in memory of my dad and other family members.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 9119 - Posted: 29 Apr 2009 | 19:50:09 UTC - in response to Message 9110.

Do I understand you correctly: you used the EVGA tool to increase your GPU voltage to stabilize your OC? In that case you'll probably lose stability without the voltage bump. Over small voltage ranges, the maximum stable clock frequency of a chip is approximately proportional to its voltage.
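
As a rough illustration of that proportionality, here is a small Python sketch. The 1.10 V default voltage is an assumed figure for illustration only (the thread does not state the card's default), and `estimate_max_clock_mhz` is a hypothetical helper, not part of any tool. It only applies the linear rule of thumb, which holds for small voltage changes at best.

```python
def estimate_max_clock_mhz(base_clock_mhz, base_voltage_v, new_voltage_v):
    """Scale a known-stable clock linearly with voltage (rule of thumb only)."""
    if base_voltage_v <= 0 or new_voltage_v <= 0:
        raise ValueError("voltages must be positive")
    return base_clock_mhz * new_voltage_v / base_voltage_v


# Assumed numbers: 650 MHz was stable with a 50 mV bump over a
# hypothetical 1.10 V default, i.e. at 1.15 V.
stable_clock_mhz = 650.0
bumped_voltage_v = 1.15
default_voltage_v = 1.10

# Estimated stable clock once the voltage bump is gone.
estimate = estimate_max_clock_mhz(stable_clock_mhz, bumped_voltage_v, default_voltage_v)
print(f"Estimated stable clock without the bump: {estimate:.0f} MHz")
```

With these assumed numbers the estimate comes out a little under 30 MHz below the original 650 MHz, so treat it as a ballpark only.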

Another possible factor is temperature: summer is coming here, and I guess in Canada as well. The higher the temperature, the lower the maximum stable frequency will be.

So I suggest backing off your OC by a substantial margin to see if you're stable again. ~50 MHz on the core should do the trick. You could also run some stability tests, maybe a 1 h loop of 3DMark06 and/or FurMark.

MrS
____________
Scanning for our furry friends since Jan 2002

Neil A
Message 9129 - Posted: 29 Apr 2009 | 22:06:59 UTC

Thanks ET. I have backed off to the 181.22 driver, started GPU Voltage Tuner, and bumped the voltage up about 50 mV over the default. I am waiting for GPU WUs to download; I'll track my progress and report back.

What I am still interested in is what the c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu part of the message means.

Neil

Crunch3r
Message 9130 - Posted: 29 Apr 2009 | 22:23:33 UTC - in response to Message 9129.


Neil A wrote: "What I am still interested in is what the heck is the c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu? part of the message."


That's just for debugging purposes. When the app was compiled, the compiler generated some debug info to make it easier for the developer to see exactly where in the source code it crashed.

It simply says that the crash occurred in "CPME_cufft.cu" and that this file is located in "c:\cygwin\home\speechserver\gpumd2\src\pme\" on the developer's machine, NOT yours.
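
The same effect can be reproduced in a few lines. The GPUGRID app itself is native code, where such paths typically come from compiler debug info or macros such as `__FILE__` and `__LINE__`; the Python sketch below is only an analogue, assuming a made-up path `c:\build\src\demo.cu` and a hypothetical function `run`. It compiles source text under a file name that does not exist on the local machine and shows that the failure is still reported against that name.

```python
import traceback

# Made-up "developer machine" path; it does not need to exist locally.
BUILD_PATH = r"c:\build\src\demo.cu"

# Hypothetical source text standing in for the application's code.
SOURCE = "def run():\n    raise RuntimeError('call failed')\n"

# compile() records BUILD_PATH inside the resulting code object,
# much like a native compiler bakes source paths into a binary.
namespace = {}
exec(compile(SOURCE, BUILD_PATH, "exec"), namespace)


def failure_location():
    """Run the compiled function and return (file, line) of the failure."""
    try:
        namespace["run"]()
    except RuntimeError as exc:
        # The last traceback entry is the innermost frame, i.e. run().
        innermost = traceback.extract_tb(exc.__traceback__)[-1]
        return innermost.filename, innermost.lineno
    return None


location = failure_location()
print(f"ERROR: {location[0]}, line {location[1]}")
```

The reported path is whatever was recorded at build time, which is why it can name a directory that was never on the machine running the code.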

Anyway, I'd be interested to hear why they use cygwin/gcc instead of VS to compile the app...



Paul D. Buck
Message 9134 - Posted: 30 Apr 2009 | 3:55:29 UTC - in response to Message 9130.

Crunch3r wrote: "Anyway, I'd be interested to hear why they use cygwin/gcc instead of VS to compile the app..."

Cheaper license ...

Neil A
Message 9170 - Posted: 1 May 2009 | 2:44:59 UTC

I've backed off to a 181.xx driver and EVGA GPU Voltage Tuner works. I've tweaked the voltage up a little and things are looking promising. The last 5 or so work units completed successfully between my 2 GTX 260's.... I'll report again by the weekend. Looks like it was probably a GPU voltage issue.



ExtraTerrestrial Apes
Message 9172 - Posted: 1 May 2009 | 9:54:26 UTC - in response to Message 9170.

What's the temperature of your cards?

MrS

MarkJ
Volunteer moderator
Volunteer tester
Message 9295 - Posted: 4 May 2009 | 11:56:51 UTC

I got this error in one WU:
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)


A second WU got this error:
Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : the launch timed out and was terminated.


And a third got this error:
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)

They were run on a GTX260+, which would have been running the 185.81 drivers. The cards aren't OC'ed. No idea about temperatures, as they seem to have gotten rid of the fan control option from vtune. The cards seem happy crunching SETI CUDA work.

Probably stuffed (beta) drivers, so I'll go back to the 182.50 drivers.
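
To tally failures like these across many results, the lines can be picked apart mechanically. The Python sketch below is only an illustration under stated assumptions: the two regular expressions are written against the messages quoted in this post and nothing else, `parse_error_line` is a hypothetical helper, and other app versions may word their errors differently.

```python
import re
from pathlib import PureWindowsPath

# Patterns written against the two message shapes quoted in this thread only.
PATTERNS = [
    re.compile(r"^ERROR: (?P<file>.+?), line (?P<line>\d+): (?P<what>.+)$"),
    re.compile(
        r"^Cuda error: (?P<what>.+?) failed in file '(?P<file>.+?)'"
        r" in line (?P<line>\d+) : .+$"
    ),
]


def parse_error_line(text):
    """Return (source file name, line number, what failed), or None."""
    for pattern in PATTERNS:
        match = pattern.match(text.strip())
        if match:
            # PureWindowsPath splits on backslashes on any operating system.
            name = PureWindowsPath(match.group("file")).name
            return name, int(match.group("line")), match.group("what")
    return None


samples = [
    r"ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)",
    r"Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : the launch timed out and was terminated.",
]
for sample in samples:
    print(parse_error_line(sample))
```

Both messages point at the same source file, CPME_cufft.cu, just at different lines (104 and 44).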
____________
BOINC blog

Neil A
Message 9317 - Posted: 5 May 2009 | 2:24:30 UTC

I've been very successful since backing off to the 181.xx drivers and running EVGA Voltage Tuner. I got the same errors MarkJ describes above before I downgraded my driver and upped my GPU voltage slightly (about 50 mV). Now I'm running very well on that box. Mark, you might try the same thing as an experiment.

I'll check card temperatures next time (2x GTX 260 Core 216 Superclocked running at around 665 MHz), but they typically run in the high 60s to high 70s, depending on the room temperature, which can vary quite a bit. I have the fans set on auto using Precision 1.7.1.



ExtraTerrestrial Apes
Message 9343 - Posted: 5 May 2009 | 20:57:29 UTC - in response to Message 9317.

Neil,

with temperatures in the high 70s I wouldn't want to increase GPU voltage. In the mid 60s I could justify it, though I still wouldn't do it myself. However, there is no hard limit in this range: simply the lower the better. And my "threshold temperatures" are purely subjective, so your mileage can and likely will vary ;)

Mark,

recently you had a lot of errors with 0 s CPU time, i.e. the WU did not even start. This points to a software problem. Since yesterday you seem to be running fine. Did you change anything, e.g. downgrade the driver?

MrS
