
Message boards : Number crunching : GPU computation errors

CodeRedDewd
Joined: 11 Nov 09
Posts: 27
Credit: 4,925,174
RAC: 0
Message 16164 - Posted: 5 Apr 2010 | 6:39:53 UTC

I was a victim of the new nVidia driver that had a fan speed problem. My card ran at over 95 °C for a while and now has compute errors. But it only errors when I run 2 GPUGRID tasks simultaneously on my 9800 GX2. I can run 1 GPUGRID task and one of anything else and not have any errors. I cannot run 2 of the other GPU tasks either, because those tasks error out as well. How can I force BOINC to always run 1 GPUGRID task and one of something else? I just can't figure it out... Anyone's help is much appreciated!

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 16166 - Posted: 5 Apr 2010 | 9:02:28 UTC - in response to Message 16164.


Short answer: you can't.

You can tell it not to use a particular GPU, but that applies to all projects. In your cc_config.xml file, put this (within the options tag):

<ignore_cuda_dev>0</ignore_cuda_dev>

Where 0 is the cuda device number.
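For context, a minimal sketch of what a complete cc_config.xml could look like with that line in place. This assumes the file otherwise contains no other settings and lives in the BOINC data directory; the device number 0 is only an example and should be replaced with the number of the GPU to exclude.

```xml
<!-- cc_config.xml: sketch only, assuming no other options are needed -->
<cc_config>
  <options>
    <!-- Exclude CUDA device 0 from all projects (0 is an example value) -->
    <ignore_cuda_dev>0</ignore_cuda_dev>
  </options>
</cc_config>
```

The client normally has to re-read its config file or be restarted before a change like this takes effect.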
____________
BOINC blog

CodeRedDewd
Message 16185 - Posted: 6 Apr 2010 | 0:34:13 UTC - in response to Message 16166.
Last modified: 6 Apr 2010 | 0:38:39 UTC

Thanks for the help... I would like to use both GPUs though. Is there a way to make BOINC or GPUGRID request or send only one task at a time? I know it sounds like the same question, but it's different. Instead of controlling what is being worked on, I'm asking whether I can starve BOINC so that it has only 1 GPUGRID GPU task at a time. I don't want to queue anything from GPUGRID... The tasks are long, at 14 hours or so, whereas the other GPU tasks I want to run take 45 minutes. I just don't understand why I cannot run 2 tasks from any one project without getting an error. With GPUGRID, both tasks error immediately; with the other project, the task that has not finished errors when the other task completes.

CodeRedDewd
Message 16191 - Posted: 7 Apr 2010 | 7:10:51 UTC - in response to Message 16185.

Anyone have an answer to this????

I just need to find a way to be sent only one CUDA task at a time... It cannot run both cores, or they both error during the first 2 seconds or less.

Thanks for the help!!

MarkJ
Message 16192 - Posted: 7 Apr 2010 | 11:08:52 UTC - in response to Message 16191.


About all you can do is set your cache to zero, but even then I think it will pick up a new WU when it's close to finishing the one that's running.
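As a rough sketch of the "cache to zero" suggestion: the same preference can be set in the web or BOINC Manager preferences, or, assuming a local override is preferred, in a global_prefs_override.xml file in the BOINC data directory. The values below are illustrative, and as noted above the client may still fetch a new task shortly before the running one finishes.

```xml
<!-- global_prefs_override.xml: sketch only, local override of the work cache -->
<global_preferences>
  <!-- Minimum amount of work to keep queued, in days -->
  <work_buf_min_days>0</work_buf_min_days>
  <!-- Additional work to request beyond the minimum, in days -->
  <work_buf_additional_days>0</work_buf_additional_days>
</global_preferences>
```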

If it were me, I'd get the card fixed or replaced. Given that nVidia have admitted their driver was faulty, you'd have a pretty good chance of getting them to cover the cost or compensate you in some way. Is the thing still under warranty? A lot of them come with 2 or more years now.

CodeRedDewd
Message 16196 - Posted: 7 Apr 2010 | 16:30:00 UTC - in response to Message 16192.
Last modified: 7 Apr 2010 | 16:31:13 UTC

I submitted a ticket to XFX since the card is supposed to have a double lifetime warranty. We'll see what happens.

This is what you have to agree to when you download a driver from nVidia:

6.2 No Liability for Consequential Damages. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL NVIDIA OR ITS SUPPLIERS BE LIABLE FOR ANY SPECIAL, INCIDENTAL, INDIRECT, OR CONSEQUENTIAL DAMAGES WHATSOEVER (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR ANY OTHER PECUNIARY LOSS) ARISING OUT OF THE USE OF OR INABILITY TO USE THE SOFTWARE, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

I guess this means they can create a driver to destroy everything and not be liable....

I thought of something yesterday. I don't quite know how the "switch between tasks" setting works exactly, but I set it to 9999 for GPUGRID and also to 9999 for the other project that runs on my GPU. If I'm running one of each, it should not try to switch for that many minutes, right? Then when it does, they will both switch around the same time, and what I'm guessing will happen is that project A on core 0 and project B on core 1 will get switched to project A on core 1 and project B on core 0. I don't know, it's just a theory. What do you think? If it works, it seems only one of each will run, unless the times overlap somehow. How can I start the 9999 timers at the same time so there's no overlap?

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 16198 - Posted: 8 Apr 2010 | 9:31:15 UTC - in response to Message 16196.

You could also try upgrading to the latest BOINC client, 6.10.43.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

MarkJ
Message 16199 - Posted: 8 Apr 2010 | 10:31:18 UTC - in response to Message 16196.


The "switch between tasks" setting is used for CPU tasks. If the client needs to share time between projects, it swaps tasks based on this interval. The default is 60 (minutes), which means a task has to run for an hour before it can be swapped out. GPU tasks run from beginning to end; they don't normally get swapped out.
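For reference, a minimal sketch of where this interval lives if it is set locally rather than through the preferences pages. It assumes a global_prefs_override.xml file in the BOINC data directory; the value 60 is the default mentioned above, not a recommendation.

```xml
<!-- global_prefs_override.xml: sketch only, showing the task switch interval -->
<global_preferences>
  <!-- "Switch between tasks every N minutes"; 60 is the default -->
  <cpu_scheduling_period_minutes>60</cpu_scheduling_period_minutes>
</global_preferences>
```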

CodeRedDewd
Message 16224 - Posted: 9 Apr 2010 | 15:55:19 UTC - in response to Message 16199.

Sorry, but I have to say that on my system GPU tasks don't run to completion. I had one Collatz task that ran for 16 of its 45 minutes, and then the client switched from running one task of each project to running 2 GPUGRID tasks. Maybe something isn't right, since it does this?

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 16225 - Posted: 9 Apr 2010 | 17:02:33 UTC - in response to Message 16224.
Last modified: 9 Apr 2010 | 17:04:09 UTC

GPU tasks still have to use the CPU to some extent. Perhaps that explains it: the default switch interval is 60 min for the CPU, and a GPUGrid task could use more than an hour of CPU time. I think I saw the same thing in the past, with MW tasks starting midway through a GPUGrid task (I don't much care for MW or Aqua, and the others have never got a look in). If you have a GTX275, it is likely to be able to finish a task in less than 1 hour of CPU time, but lots of other cards are not so fast.
