Message boards : Graphics cards (GPUs) : Computation Error on Resume while running other CUDA

Author Message
Andrew
Joined: 9 Dec 08
Posts: 29
Credit: 18,754,468
RAC: 0
Level
Pro
Message 8959 - Posted: 26 Apr 2009 | 17:09:46 UTC

Hey, just thought I'd let the developers know,
I was running Folding@home GPU, and while that was still going, resumed my GPUGRID WU. My GPUGRID WU then immediately had a computation error. And after 12 hours of work. Oops.

Perhaps this is a rare error, but perhaps there is some mishandling of CUDA exceptions, or it could just be the drivers.

FYI I did once start Folding@Home GPU while GPUGrid was going, which didn't result in errors (however I think F@H stole all the GPU rather than sharing!)

I only tried because I was curious, btw. I wouldn't recommend running things in parallel, as I suspect it would lead to lots of cache misses in the GPU's caches.

popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 40,277,822
RAC: 0
Level
Val
Message 8960 - Posted: 26 Apr 2009 | 19:23:45 UTC

Not only cache misses but "CUDA" misses as well.
Since the GPU can only do one task at a time, it will constantly be switching tasks, which is a very slow operation on the GPU.

I didn't even bother testing it here at GPUGRID, but at SETI the tasks took 20x longer while running F@H, and there was also a 50% PPD reduction on F@H.

So unless someone wants things to run over 20x slower... don't run both at the same time.
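
As a rough back-of-the-envelope check of the figures above, here is a minimal sketch. It assumes each project's output can be expressed as a fraction of what it achieves with the GPU to itself, which is an assumption for illustration and not something measured in this thread. The function name is made up.

```python
def combined_throughput(slowdown_factor, ppd_reduction):
    """Return (task_a_share, task_b_share, total) as fractions of solo throughput.

    slowdown_factor: how many times longer task A takes when sharing (e.g. 20).
    ppd_reduction:   fractional loss in task B's output when sharing (e.g. 0.5).
    """
    if slowdown_factor <= 0:
        raise ValueError("slowdown_factor must be positive")
    if not 0.0 <= ppd_reduction <= 1.0:
        raise ValueError("ppd_reduction must be between 0 and 1")
    task_a_share = 1.0 / slowdown_factor
    task_b_share = 1.0 - ppd_reduction
    return task_a_share, task_b_share, task_a_share + task_b_share


# Figures quoted in the post: SETI tasks 20x slower, F@H down 50%.
seti, fah, total = combined_throughput(20, 0.5)
print(f"SETI: {seti:.0%}, F@H: {fah:.0%}, combined: {total:.0%}")
```

Under that assumption the GPU delivers roughly 55% of the work it would produce running either project alone, which fits the advice not to run both at once.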

Bob

Andrew
Joined: 9 Dec 08
Posts: 29
Credit: 18,754,468
RAC: 0
Level
Pro
Message 8962 - Posted: 26 Apr 2009 | 20:40:06 UTC

Haha, looks like it wasn't just me who was curious :P

Still shouldn't have had the error I think, but you're right, not a good idea! I was wondering since Vista virtualizes the GPU as a shared resource.

In fact, I started off with GPUGrid on BOINC, but back then the timestep was too big and the desktop (Aero) was kind of unusable (btw, switching to the software desktop seemed to be more jittery). So I switched to F@H for a while, before trying BOINC again. I tend to do more GPUGrid now 'cos it uses the GPU less efficiently, so I can run my 8800GT overclocked while the fan is still quiet.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 9019 - Posted: 27 Apr 2009 | 21:07:32 UTC

Was it this WU? It's an "out of memory" error, which happens when some app (e.g. a game) occupies so much GPU memory that there's not enough left for GPU Grid. It's not a nice way to error out in such cases, but it's a known problem. If it's indeed this WU (you also had 2 other errors, which may be caused by the OC) then it looks like F@H reserves quite a lot of GPU memory, as GPU-Grid itself doesn't need that much (~70 MB with old WUs).
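
The failure mode described here can be sketched as a simple budget check. This is only an illustration under stated assumptions: the 512 MB total is the card size Andrew mentions in his reply, the ~70 MB figure is the one quoted above, and the amounts reserved by the desktop and by another GPU app are hypothetical values, not measurements. The function name is made up as well.

```python
MB = 1024 * 1024


def fits_in_gpu_memory(total_bytes, reserved_bytes, needed_bytes):
    """Return True if needed_bytes fits in what other apps left free."""
    free_bytes = total_bytes - sum(reserved_bytes)
    return needed_bytes <= free_bytes


total = 512 * MB          # card size mentioned in the thread
gpugrid_need = 70 * MB    # approximate figure quoted for old WUs

# Hypothetical reservations: desktop compositor only.
print(fits_in_gpu_memory(total, [100 * MB], gpugrid_need))            # True

# Hypothetical reservations: desktop plus another GPU app holding 400 MB.
print(fits_in_gpu_memory(total, [100 * MB, 400 * MB], gpugrid_need))  # False
```

An application could in principle run a check like this against the free memory reported by the driver before allocating, and wait instead of erroring out. Whether GPUGRID does anything of the sort is not stated in this thread.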

MrS
____________
Scanning for our furry friends since Jan 2002

Andrew
Joined: 9 Dec 08
Posts: 29
Credit: 18,754,468
RAC: 0
Level
Pro
Message 9036 - Posted: 27 Apr 2009 | 22:00:28 UTC

OK, it must have been out of memory, I guess (the card does have 512 MB, but I can't monitor GPU memory usage in Vista).

Are these the other two that you think may have been OC errors?
http://www.gpugrid.net/result.php?resultid=572872
http://www.gpugrid.net/result.php?resultid=588056

However, while the second one is overclocked by 20%, the first one is actually underclocked (62% of stock). So I'm not sure about OC error.

The claimed credit is quite high for some of the aborted ones. I'm not sure why that would have happened, as I'm sure I wouldn't cancel nearly-finished WUs!

Whatever though, although seeing a line pointing up is nice, it's really for the science, so I'll monitor to check it's working fine in the future.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 9083 - Posted: 28 Apr 2009 | 21:06:29 UTC - in response to Message 9036.

I didn't check individually, but you only have 2 other errors. And you're right, 920 MHz is not exactly an excessive speed for your card ;) It was only speculation anyway, so as long as you don't get more errors, never mind these two.

And the credits per WU are fixed, so they don't depend on crunching time. That's why the claims appear high for the errors.

MrS
____________
Scanning for our furry friends since Jan 2002

Andrew
Joined: 9 Dec 08
Posts: 29
Credit: 18,754,468
RAC: 0
Level
Pro
Message 9112 - Posted: 29 Apr 2009 | 17:50:57 UTC - in response to Message 9083.

Cheers
