Message boards : Number crunching : Problems with Task ID: 4071833, 4081911 and 4082018

showa
Joined: 2 Mar 09
Posts: 28
Credit: 4,975,808
RAC: 0
Message 21443 - Posted: 14 Jun 2011 | 13:26:06 UTC

As the title says, all 3 of those WUs ended in error:
http://www.gpugrid.net/result.php?resultid=4071833
http://www.gpugrid.net/result.php?resultid=4081911
http://www.gpugrid.net/result.php?resultid=4082018

Any ideas?

Thank you in advance.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 21445 - Posted: 14 Jun 2011 | 14:14:28 UTC - in response to Message 21443.

All 3 WUs failed with the new 6.14 application, which runs at higher GPU usage. I guess your system can't handle the consequences of this higher GPU utilization. Maybe it's the power supply, maybe the cooling, maybe both.

showa
Message 21447 - Posted: 14 Jun 2011 | 15:09:24 UTC

It shouldn't be a hardware issue: I have a Corsair 750W CMPSU-750AXEU power supply, a CM 690 II Lite case, and a GTX 560 Ti that has never gone above 74°C with GPUGrid (and has no problems at more than 85°C with FurMark).

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 21448 - Posted: 14 Jun 2011 | 15:10:55 UTC - in response to Message 21447.

First port of call would be to upgrade to the latest driver.
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 21452 - Posted: 14 Jun 2011 | 16:51:08 UTC - in response to Message 21448.
Last modified: 14 Jun 2011 | 16:52:57 UTC

All of showa's errors were for standard-length KASHIF_HIVPR tasks, and all occurred within 20 sec. showa currently does not have any tasks.

2 "Energies have become nan" errors
1 "SWAN: FATAL : swanMemcpyDtoH failed - Assertion failed: 0, file swanlib_nv.c, line 390".
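For anyone who wants to tally error types like this across several failed tasks, here is a minimal sketch. It assumes you have already copied the stderr text of each result into a string; the function name, the labels and the sample inputs are made up for illustration, and only the two search patterns come from the messages quoted above.

```python
from collections import Counter

# Substrings taken from the error messages quoted in this thread.
# The labels on the right are hypothetical names chosen for this sketch.
PATTERNS = {
    "Energies have become nan": "energies_nan",
    "swanMemcpyDtoH failed": "swan_memcpy_dtoh",
}


def tally_errors(stderr_outputs):
    """Count how many stderr texts match each known pattern."""
    counts = Counter()
    for text in stderr_outputs:
        # Use the first matching pattern, or "other" if none matches.
        label = next(
            (name for pattern, name in PATTERNS.items() if pattern in text),
            "other",
        )
        counts[label] += 1
    return counts


# Illustrative inputs only; real stderr output contains more lines than this.
sample = [
    "ERROR: Energies have become nan",
    "ERROR: Energies have become nan",
    "SWAN: FATAL : swanMemcpyDtoH failed",
]
print(tally_errors(sample))
```

This only groups the failures by message; it says nothing about the cause.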

All 3 tasks have been resent to other systems. No errors returned from other systems so far and 1 validated. So I expect the problem is with the system.
I would be inclined to shut down, give the computer (especially GPU) a clean, and increase the fan speeds before trying again.
The latest driver might help too.
I would also suggest you manually control which tasks you get (one at a time; if they fail, stop and try the long tasks).

Good luck,

showa
Message 21473 - Posted: 16 Jun 2011 | 10:17:37 UTC
Last modified: 16 Jun 2011 | 10:17:54 UTC

I'm quite sure it's not a hardware issue: my PC crunched 10-15 WUs for Milkyway@Home, and I got no errors whatsoever. GPU load was up to 97-98%, temperatures up to 78°C.
As soon as I can, I'll install new drivers and try to crunch other WUs for this project.
By the way, which drivers are recommended?

Bye.

skgiven
Message 21478 - Posted: 16 Jun 2011 | 13:39:04 UTC - in response to Message 21473.
Last modified: 16 Jun 2011 | 13:47:27 UTC

showa wrote: "I'm quite sure it's not a hardware issue: my PC crunched 10-15 WUs for Milkyway@Home, and I got no errors whatsoever. GPU load was up to 97-98%, temperatures up to 78°C."

That just proves that your card is not totally messed up; GPUGrid tasks are not the same and tax the system in a different way. Also, a 10 or 15 min MW task is not the same as crunching a 12 h task here. The chance of a failure over 10 min is much lower than over 12 h. It also suggests that the drivers are not totally messed up, and yet you still have a problem.
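To put rough numbers on the run-length point, here is a small sketch. It assumes failures arrive at a constant rate per hour of crunching (a simple exponential model), and the rate used is invented purely for illustration; it is not measured from any GPUGrid or Milkyway task.

```python
import math


def failure_probability(rate_per_hour, hours):
    """Probability of at least one failure during a run of the given length,
    assuming a constant failure rate: P = 1 - exp(-rate * time)."""
    return 1.0 - math.exp(-rate_per_hour * hours)


# Hypothetical rate: 0.02 failures per hour of crunching (made up).
rate = 0.02

short_task = failure_probability(rate, 10 / 60)  # a 10 min task
long_task = failure_probability(rate, 12)        # a 12 h task

print(f"10 min task: {short_task:.2%}")  # roughly 0.3%
print(f"12 h task:   {long_task:.2%}")   # roughly 21%
```

With the same marginal instability, a card can pass many short tasks in a row and still fail a good share of long ones. Note that this does not explain tasks that fail within the first 20 sec.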

Did you try anything I suggested?
Shut down, clean the fans, increase fan speeds, use the latest driver (275.33).

showa
Message 21485 - Posted: 17 Jun 2011 | 6:49:03 UTC

I'm aware that Milkyway WUs aren't the same as GPUGrid ones.
As soon as I'm able to perform a system restart, I'll:
- clean the fans
- install Windoze update
- install new drivers for my Nvidia card
not necessarily in this order ;)

Dirk
Joined: 10 Oct 08
Posts: 18
Credit: 39,100,916
RAC: 0
Message 21487 - Posted: 17 Jun 2011 | 8:30:54 UTC

Did you overclock the GPU? GPUGRID and overclocking don't seem to go well together. Anything over 10% seems to cause my WUs to error out.

showa
Message 21652 - Posted: 9 Jul 2011 | 10:30:41 UTC - in response to Message 21487.

No, the GPU is at default clocks.
Since my post, I haven't been able to crunch for Gpugrid anymore, while I still can for Milkyway.
Don't know what to think.

skgiven
Message 21655 - Posted: 9 Jul 2011 | 12:07:06 UTC - in response to Message 21652.

We are now using a different Windows app (6.15), so you might want to try here again.

showa
Message 21718 - Posted: 22 Jul 2011 | 11:50:21 UTC

Don't know why, but I can crunch with the 6.15 version (at least so far).