Message boards : Graphics cards (GPUs) : Some wu-s erroring out.

Author Message
sir sant
Joined: 1 Jul 09
Posts: 5
Credit: 27,036,793
RAC: 0
Message 22501 - Posted: 10 Nov 2011 | 21:00:01 UTC

Hi, I haven't been running GPUGrid for a long time. A few days ago I decided to run GPUGrid on my second box. Half the wu-s errored out. I've been running PrimeGrid on that box for a long time w/o errors, and DistrRTgen also runs w/o errors.
The box is: GTX 570 + 3x GTS 450 512 MB, 4 GB DDR3 memory, Athlon II quad, Antec HCG 900 W PSU, Windows XP Pro 32-bit SP3, NVIDIA driver 266.58, BOINC 6.10.60 x86.
While running, the GPU loads were fine, around 99%, and it didn't pull anything from the CPU, which was basically idling.

So where do I look for the problem?

Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 23
Message 22502 - Posted: 10 Nov 2011 | 22:36:20 UTC - in response to Message 22501.

Hello: The first thing I'd recommend is to update BOINC; 6.12.34 is the latest stable version. Greetings.

sir sant
Joined: 1 Jul 09
Posts: 5
Credit: 27,036,793
RAC: 0
Message 22710 - Posted: 15 Dec 2011 | 11:46:53 UTC - in response to Message 22502.

Hi, I didn't get around to trying it right away, so I did it now. I'm using the newest BOINC now, and some wu-s are still erroring out. All else remains the same. So what's really wrong?

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 22712 - Posted: 15 Dec 2011 | 11:54:20 UTC - in response to Message 22710.
Last modified: 15 Dec 2011 | 12:07:19 UTC

BOINC will probably get confused by the different cards in the machine. The GTX 570 is CC 2.0 and the GTS 450s are CC 2.1 cards. Unfortunately BOINC tries to treat them all the same. The GTX 570 is ideal for GPUgrid, but the others are best used for some other project. Is it possible to put the GTS 450s into another machine or relocate the GTX 570?

If not, you might have to wait for BOINC 7, as that allows the user to configure which GPUs can be used for a project. But don't try it yet, as it's still in alpha testing.
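
As a rough illustration of the per-project GPU configuration mentioned above: BOINC 7 clients read `<exclude_gpu>` entries from `cc_config.xml`. The sketch below builds such a file in Python. The project URL and the device numbers are assumptions for illustration only; the real values must match what the client reports for this box, and the helper name `build_cc_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cc_config(project_url, excluded_devices):
    """Return cc_config.xml text that keeps a project off the given GPUs.

    project_url and excluded_devices are placeholders; check the client's
    startup messages for the actual device numbering before using this.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device_num in excluded_devices:
        # One <exclude_gpu> element per device the project must not use.
        exclude = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(exclude, "url").text = project_url
        ET.SubElement(exclude, "device_num").text = str(device_num)
    return ET.tostring(root, encoding="unicode")


# Assumed layout: device 0 is the GTX 570, devices 1-3 are the GTS 450s.
config_text = build_cc_config("http://www.gpugrid.net/", [1, 2, 3])
```

With that assumed layout, GPUgrid would be limited to device 0 while other projects could still use all four cards.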
____________
BOINC blog

sir sant
Joined: 1 Jul 09
Posts: 5
Credit: 27,036,793
RAC: 0
Message 22835 - Posted: 26 Dec 2011 | 21:09:23 UTC

Hi, BOINC does not get confused by different cards in the same box. I have had many different configurations with mixed ATI and NVIDIA cards, sometimes running the same project, with no issues. I'll try again with different drivers, and maybe a different OS, when I have spare time. The issue is probably minor, but it's not easy to find.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 698
Message 22836 - Posted: 26 Dec 2011 | 23:45:13 UTC - in response to Message 22835.
Last modified: 26 Dec 2011 | 23:52:12 UTC

GPUGrid puts greater stress on your GPUs than other projects do.
Check your GPU temperatures (below 80°C is recommended; raise your fan speeds if necessary).
Run your GPUs at factory preset clock frequencies and voltages (or below, if temps are still high).
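
A minimal sketch of the temperature check suggested above, assuming an `nvidia-smi` build that supports `--query-gpu` (the 266.58 driver from this thread may be too old for that; a monitoring tool that shows GPU temperature works just as well). The function names are hypothetical; the 80°C limit comes from the recommendation above.

```python
import subprocess

TEMP_LIMIT_C = 80  # recommended ceiling from the post above


def parse_temperatures(csv_text):
    """Parse lines such as '0, 67' into a {gpu_index: temp_c} dict."""
    temps = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def hot_gpus(temps, limit=TEMP_LIMIT_C):
    """Return the sorted indices of GPUs at or above the limit."""
    return sorted(i for i, t in temps.items() if t >= limit)


def read_temperatures():
    """Query nvidia-smi; assumes it is on PATH and supports these flags."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)
```

Calling `hot_gpus(read_temperatures())` while a GPUGrid task is running would list the cards that need more fan speed or lower clocks.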

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 22838 - Posted: 27 Dec 2011 | 1:41:18 UTC

I've had lots of GPUgrid tasks crashing lately and I think I have found the cause. Maybe this affects the OP too.

I've been running some other applications that run at high priority and preempt the BOINC client for many seconds. When that happens, science apps from other projects exit with exit code 0 and the "no heartbeat from BOINC" message, and when BOINC gets more CPU time it restarts those apps and the tasks continue. However, while the other science apps exit with code 0, the GPUgrid app exits with a non-zero error code, which causes BOINC not to restart the task. BOINC marks the task as "compute error" and gets a new task.

Is that what is happening? Is that a known problem? Does the GPUgrid app really experience an error or could it be changed to give an exit code = 0?
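
To make the suspected sequence concrete, here is a toy model of it in Python. It is only a sketch of the behavior described in this post, not BOINC's actual code; the timeout value and both function names are assumptions made up for illustration.

```python
HEARTBEAT_TIMEOUT_S = 30.0  # assumed value, for illustration only


def app_exit_code(seconds_since_heartbeat, exit_code_on_timeout):
    """Model a science app: None while heartbeats arrive in time,
    otherwise the exit code the app chooses when it gives up."""
    if seconds_since_heartbeat <= HEARTBEAT_TIMEOUT_S:
        return None
    return exit_code_on_timeout


def client_action(exit_code):
    """Model the client's reaction as described in the post."""
    if exit_code is None:
        return "keep running"
    if exit_code == 0:
        return "restart task"
    return "report compute error"


# Client starved for 45 s: an app that exits with 0 gets restarted,
# an app that exits with a non-zero code loses the task.
other_project = client_action(app_exit_code(45.0, 0))
gpugrid_like = client_action(app_exit_code(45.0, 1))
```

In this model the only difference between the two outcomes is the exit code the app picks after a missed heartbeat, which is the point of the question above.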
