Message boards : Number crunching : Application v8.15 (cuda60)

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 35749 - Posted: 19 Mar 2014 | 22:05:51 UTC
Last modified: 19 Mar 2014 | 22:22:28 UTC

Rig 1: Win7_64 with GTX670 and GTX660Ti - Pass

Rig 2: Win7_64 with GTX480 - Fail (too many exits)

Rig 2: Win7_64 with GTX295 - Fail (too many exits). Yes, I still have one of the old versions that were two full cards and not just two GPUs on one card.

Beta app 8.15(cuda55) passed on all so I'm assuming (ha!) that it is cuda6.0 that's the issue. I'll downgrade the driver on Rig 2 to see if I can get back to work.

Installed 331.82 and requested work ... got this message in the event log:

3/19/2014 6:16:23 PM | GPUGRID | NVIDIA GPU: Upgrade to the latest driver to use all of this project's GPU applications

I don't think that's really what I want to do right now :-)
I did get a NOELIA long 8.15(cuda55) - and it is processing fine!
____________
Thanks - Steve

TheFiend
Joined: 26 Aug 11
Posts: 100
Credit: 2,557,052,477
RAC: 2,225,950
Level
Phe
Message 35750 - Posted: 19 Mar 2014 | 22:42:48 UTC

I've also been suffering a lot of failures with app v8.15

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 35752 - Posted: 20 Mar 2014 | 0:20:42 UTC - in response to Message 35750.

I've also been suffering a lot of failures with app v8.15

Your errors are being caused by your GPU getting too hot, turn the fans up to keep them cooler and you should be OK.
____________
Thanks - Steve

TheFiend
Joined: 26 Aug 11
Posts: 100
Credit: 2,557,052,477
RAC: 2,225,950
Level
Phe
Message 35754 - Posted: 20 Mar 2014 | 2:28:44 UTC - in response to Message 35752.

I've also been suffering a lot of failures with app v8.15

Your errors are being caused by your GPU getting too hot, turn the fans up to keep them cooler and you should be OK.


Apart from one failure, it's not the GPUs getting too hot. They are in well-ventilated cases. The one that came up with high temps was after a reboot, when I failed to reapply the fan speed setting.

App8.15(CUDA60) units have all been failing on both my systems

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 35755 - Posted: 20 Mar 2014 | 6:43:30 UTC - in response to Message 35754.
Last modified: 20 Mar 2014 | 7:04:30 UTC

Apart from one failure, it's not the GPUs getting too hot. They are in well-ventilated cases. The one that came up with high temps was after a reboot, when I failed to reapply the fan speed setting.

App8.15(CUDA60) units have all been failing on both my systems

Same here. I have three GTX 660s and one GTX 650 Ti, all below 70 C, stable and with no recent failures on CUDA 5.5 work. They have been failing all CUDA 6.0 work units; about 10 thus far. They are all on WinXP with the latest (335.23) drivers.

It looks like some GPU chips are more susceptible than others (both the GTX 660 and 650 Ti use the GK106). The good news is that they fail rapidly (0 seconds). I will try severely under-clocking some more; they have all been over-volted, under-clocked, etc. to be stable thus far, but you never know what new work will bring.

biodoc
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 158,710
Level
Trp
Message 35762 - Posted: 20 Mar 2014 | 11:27:13 UTC

My 780Ti finished one successfully and another one is 50% complete. No errors yet. These WUs run hot.

I just clocked my GPU back to reduce the heat (906 MHz, 77C w/ 70% fan speed). CPU usage is minimal (~2%). GPU load is 82% and power usage is ~85% TDP.

Windows 8.1 with Driver version 335.23

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 35770 - Posted: 20 Mar 2014 | 20:57:02 UTC - in response to Message 35755.

I will try severely under-clocking some more; they have all been over-volted, under-clocked, etc. to be stable thus far, but you never know what new work will bring.

No luck. I went down to 800 MHz on the GPU core (from 993 MHz on one card and 967 MHz on the other), and also reduced the memory clock 200 MHz, but they still crash right away. I think the only hope for these cards is to go back to an earlier driver that does not support CUDA 6.0.

TheFiend
Joined: 26 Aug 11
Posts: 100
Credit: 2,557,052,477
RAC: 2,225,950
Level
Phe
Message 35771 - Posted: 20 Mar 2014 | 21:26:44 UTC

It's my Windows XP machine that seems to be suffering from the CUDA60 failures...

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 35772 - Posted: 20 Mar 2014 | 21:31:56 UTC - in response to Message 35771.
Last modified: 20 Mar 2014 | 21:44:50 UTC

It's my Windows XP machine that seems to be suffering from the CUDA60 failures...

Interesting. I could do Linux, or move them back to a Win7 64-bit machine from whence they came. It might depend on whether there is any performance gain from CUDA 6.0 on these cards, or maybe that is just for the Maxwell cards.

Jeremy Zimmerman
Joined: 13 Apr 13
Posts: 61
Credit: 726,605,417
RAC: 0
Level
Lys
Message 35777 - Posted: 20 Mar 2014 | 23:50:00 UTC - in response to Message 35772.

I have two WinXP-32 Pro machines, each with a GTX680 and a GTX460. They were on the 334.89 driver and were receiving Cuda60 tasks and erroring out within 1-2 seconds. I updated them to 335.23 and they are still erroring out on EVERY Cuda60 task. The Cuda55 and Cuda42 tasks are not a problem. Here is the error message on both machines for each task.

Exit status -1073741511 (0xffffffffc0000139) Unknown error number
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741511 (0xc0000139)
</message>
]]>



The Win7-64 Pro machine has a pair of GTX780Ti. This machine was not receiving any Cuda60 units with the 331.82 driver. I updated to the 335.23 driver and it is now receiving Cuda60 tasks, and they are completing (3 completed, 2 in progress, 0 errors).
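
For anyone trying to read the exit codes quoted in this thread: BOINC reports the Windows exit code as a signed 32-bit number, and masking it to unsigned 32 bits gives the NTSTATUS value. Below is a minimal sketch in Python, not anything from the project itself. The names `KNOWN_NTSTATUS` and `describe_exit_code` are made up for illustration, and the table only holds the two codes that appear in this thread. 0xC0000139 is STATUS_ENTRYPOINT_NOT_FOUND, which Windows raises when a procedure entry point cannot be located in a DLL; that would be consistent with a missing or mismatched library, though the sketch itself proves nothing about the cause.

```python
# Hypothetical helper for decoding Windows exit codes as shown in BOINC logs.
# The table is deliberately tiny: only codes mentioned in this thread.
KNOWN_NTSTATUS = {
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
    0xC0000022: "STATUS_ACCESS_DENIED",
}


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code: int) -> str:
    """Return the hex form of the code plus its NTSTATUS name, if known."""
    value = to_unsigned32(exit_code)
    name = KNOWN_NTSTATUS.get(value, "unknown (not in this small table)")
    return f"0x{value:08x} {name}"


if __name__ == "__main__":
    # Exit status reported on the XP machines above.
    print(describe_exit_code(-1073741511))
```

Passing an already-unsigned value such as `0xC0000022` works as well, since the mask leaves it unchanged.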

Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Message 35787 - Posted: 21 Mar 2014 | 18:44:24 UTC

Hello: Completed one v8.15 (cuda60) task without problems on a GTX770 under Windows 8.1 64-bit.

Jeremy Zimmerman
Joined: 13 Apr 13
Posts: 61
Credit: 726,605,417
RAC: 0
Level
Lys
Message 35790 - Posted: 21 Mar 2014 | 20:42:19 UTC - in response to Message 35787.

Update

1) XP Pro32 GTX680/GTX460 Driver=335.28
Cuda60 Completed 0, 0 in progress, 30 errors (all <3 seconds)

2) XP Pro32 GTX680/GTX460 Driver=335.28
Cuda60 Completed 0, 0 in progress, 28 errors (all <3 seconds)

3) Win7 Pro64 GTX780Ti/GTX780Ti Driver=335.23
Cuda60 Completed 9, 1 in progress, 0 errors

MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 35794 - Posted: 21 Mar 2014 | 22:43:05 UTC - in response to Message 35755.

Jim - the error suggests that not all the application files have downloaded. Could you try a project reset/reattach and force the client to download all the files again?

Matt

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 35799 - Posted: 22 Mar 2014 | 3:07:30 UTC - in response to Message 35794.

Jim - the error suggests that not all the application files have downloaded. Could you try a project reset/reattach and force the client to d/l all the files again.

Matt


OK, I did that on both machines, and each CUDA 6.0 work unit again errored out immediately. Here is a sample:

GPUGRID 8.15 Short runs (2-3 hours on fastest card) (cuda60) 274x-SANTI_MAR423cap310-32-84-RND3062_0 00:00:01 (00:00:00) 3/21/2014 10:57:32 PM 0.729C + 1NV 0.00 Computation error (313,)


Would you like a screenshot of the BOINC folder to look at the files (or of a sub-folder)? I have no anti-virus on either machine, and have disabled the Windows firewall, since they are dedicated machines behind a router with only BOINC running on them.

TheFiend
Joined: 26 Aug 11
Posts: 100
Credit: 2,557,052,477
RAC: 2,225,950
Level
Phe
Message 35801 - Posted: 22 Mar 2014 | 9:52:37 UTC

CUDA60 tasks are still erroring out after a project reset on my XP system. I haven't tried detaching and reattaching yet.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 35821 - Posted: 23 Mar 2014 | 11:51:29 UTC - in response to Message 35801.
Last modified: 23 Mar 2014 | 11:54:40 UTC

Going by the error reported in another thread, it looks like the app is calling a DLL that is built for x64 systems.

Is this XP problem limited to x86 (32bit) versions?
Is XP x64 ok with CUDA6?

What about Vista/W7/W8 x86 (32bit) versions?

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 35829 - Posted: 23 Mar 2014 | 16:07:19 UTC

815 CUDA6 failed on my XP x64 5 out of 5 times, including after project reset. Always crashes after about 235 seconds.

MJH mentioned an 820 app in the other thread - haven't seen one, yet.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1626
Credit: 9,295,466,723
RAC: 18,414,230
Level
Tyr
Message 35830 - Posted: 23 Mar 2014 | 16:19:06 UTC - in response to Message 35829.

815 CUDA6 failed on my XP x64 5 out of 5 times, including after project reset. Always crashes after about 235 seconds.

MJH mentioned an 820 app in the other thread - haven't seen one, yet.

Got one running now on my GTX 750Ti. No problems so far in the first 15%...

enels
Joined: 16 Sep 08
Posts: 9
Credit: 915,807,167
RAC: 0
Level
Glu
Message 35832 - Posted: 23 Mar 2014 | 18:30:46 UTC - in response to Message 35830.

Apparently the 820 app is only available for short run WUs.
The short run seems to be running fine though.

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 35839 - Posted: 24 Mar 2014 | 2:01:54 UTC

No luck here - 8.20 CUDA6 Short run failed on XP pro x64 and GTX Titan. Crashed at the same spot as the 8.15 does - 235 seconds in.

ROBtheLIONHEART
Joined: 21 Nov 13
Posts: 34
Credit: 636,026,131
RAC: 0
Level
Lys
Message 35840 - Posted: 24 Mar 2014 | 2:46:55 UTC
Last modified: 24 Mar 2014 | 2:48:16 UTC

8.20 CUDA 6 short run: I ran one each on a GTX770 and a GTX780 under WinXP Pro 64-bit with the 335.28 driver, and all went fine. GPU run time was only slightly longer than observed with 5.5 and the 332.21 driver.

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 35854 - Posted: 24 Mar 2014 | 11:43:02 UTC

2 820 short runs have now completed successfully on my XP pro x64 - lookin' good!

MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 35866 - Posted: 24 Mar 2014 | 15:42:52 UTC - in response to Message 35839.

petebe,

That error is different from the one most clients were exhibiting. Do you get any pop-up dialog with more details about the error 0xc0000022?

Matt

petebe
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Message 35867 - Posted: 24 Mar 2014 | 17:17:59 UTC

Sorry, Matt, I only actually observed one as it blew up - there were no pop-ups or anything unusual. It just stopped and displayed (something like) "Error while computing" in BOINCTasks.
