Message boards : Number crunching : Error after 7 hours

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 15148 - Posted: 11 Feb 2010 | 16:35:29 UTC
Last modified: 11 Feb 2010 | 16:41:45 UTC

11-2-2010 17:17:58 GPUGRID Computation for task 391-GIANNI_BIND_166_119-55-100-RND0887_0 finished
11-2-2010 17:17:58 GPUGRID Output file 391-GIANNI_BIND_166_119-55-100-RND0887_0_1 for task 391-GIANNI_BIND_166_119-55-100-RND0887_0 absent
11-2-2010 17:17:58 GPUGRID Output file 391-GIANNI_BIND_166_119-55-100-RND0887_0_2 for task 391-GIANNI_BIND_166_119-55-100-RND0887_0 absent
11-2-2010 17:17:58 GPUGRID Output file 391-GIANNI_BIND_166_119-55-100-RND0887_0_3 for task 391-GIANNI_BIND_166_119-55-100-RND0887_0 absent

Task = 1840436
WUid = 1157468
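
For anyone who wants to pull these "absent" messages out of a longer BOINC message log, here is a minimal sketch. It assumes every such line has the same layout as the ones quoted above (date, time, project, then the message); the names `ABSENT_RE`, `SAMPLE_LOG` and `missing_outputs` are made up for illustration and are not part of BOINC.

```python
import re
from collections import defaultdict

# Assumed line layout, inferred only from the messages quoted in this post:
#   <date> <time> <project> Output file <file> for task <task> absent
ABSENT_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<project>\S+) "
    r"Output file (?P<file>\S+) for task (?P<task>\S+) absent$"
)

# The four log lines from this post, used as sample input.
SAMPLE_LOG = [
    "11-2-2010 17:17:58 GPUGRID Computation for task 391-GIANNI_BIND_166_119-55-100-RND0887_0 finished",
    "11-2-2010 17:17:58 GPUGRID Output file 391-GIANNI_BIND_166_119-55-100-RND0887_0_1 for task 391-GIANNI_BIND_166_119-55-100-RND0887_0 absent",
    "11-2-2010 17:17:58 GPUGRID Output file 391-GIANNI_BIND_166_119-55-100-RND0887_0_2 for task 391-GIANNI_BIND_166_119-55-100-RND0887_0 absent",
    "11-2-2010 17:17:58 GPUGRID Output file 391-GIANNI_BIND_166_119-55-100-RND0887_0_3 for task 391-GIANNI_BIND_166_119-55-100-RND0887_0 absent",
]


def missing_outputs(lines):
    """Map each task name to the list of output files reported absent."""
    missing = defaultdict(list)
    for line in lines:
        match = ABSENT_RE.match(line.strip())
        if match:
            missing[match.group("task")].append(match.group("file"))
    return dict(missing)


if __name__ == "__main__":
    for task, files in missing_outputs(SAMPLE_LOG).items():
        print(task, "is missing", len(files), "output file(s)")
```

A task that shows up here with all of its output files absent most likely failed before writing any results, which matches what happened to this one.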
____________
Ton (ftpd) Netherlands

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 15151 - Posted: 11 Feb 2010 | 19:27:06 UTC - in response to Message 15148.

Unlucky!
I can empathise, after having to abort task 1796870; it ran for 148674 sec (41 h), but it was a Beta.

ftpd
Message 15157 - Posted: 11 Feb 2010 | 21:02:15 UTC - in response to Message 15151.

This was not a beta! 6.71 CUDA 2.3
____________
Ton (ftpd) Netherlands

ftpd
Message 15563 - Posted: 2 Mar 2010 | 13:33:07 UTC

I still have the same problems.

All units are cancelled after a few seconds, not only the 6.71 CUDA 2.3 ones.

Can someone look at it?

XP - 6.10.18 - gtx295 driver 196.34
____________
Ton (ftpd) Netherlands

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 15565 - Posted: 2 Mar 2010 | 13:45:51 UTC - in response to Message 15563.

I think this may be the line in the log from which the techs can help identify the root cause of the errors:

Cuda error: Kernel [shake_step_1] failed in file 'shake.cu' in line 79 : unspecified launch failure.
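
If you need to collect these lines from several task logs, a small sketch like the following could split them into fields. The pattern is an assumption based only on the single error line quoted above; `CUDA_ERROR_RE` and `parse_cuda_error` are hypothetical names, not part of ACEMD or BOINC.

```python
import re

# Assumed layout, taken from the single error line quoted in this post:
#   Cuda error: Kernel [<kernel>] failed in file '<file>' in line <n> : <reason>.
CUDA_ERROR_RE = re.compile(
    r"Cuda error: Kernel \[(?P<kernel>[^\]]+)\] failed in file "
    r"'(?P<file>[^']+)' in line (?P<line>\d+) : (?P<reason>.+?)\.?$"
)


def parse_cuda_error(text):
    """Return the kernel, file, line and reason as a dict, or None if no match."""
    match = CUDA_ERROR_RE.search(text.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])  # line number as an integer
    return info


if __name__ == "__main__":
    sample = ("Cuda error: Kernel [shake_step_1] failed in file 'shake.cu' "
              "in line 79 : unspecified launch failure.")
    print(parse_cuda_error(sample))
```

Grouping failures by kernel name and line number would show whether every failed task dies at the same place or at different ones.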


Are your cards downclocking?
Have you excluded the BOINC directories from your AntiVirus scanning?
Are you running BOINC with the same account you installed it as?
Have you tried removing, cleaning up and reinstalling drivers with a tool like DriverSweeper?


____________
Thanks - Steve

ftpd
Message 15602 - Posted: 4 Mar 2010 | 10:46:40 UTC - in response to Message 15565.

No downclocking.

Installed new driver 196.75 for gtx295.

Job 6.71 was cancelled after 6 hours (both) - output file absent.

No problems with BOINC Manager.

Stop processing GPUGRID.net???
____________
Ton (ftpd) Netherlands

Snow Crash
Message 15603 - Posted: 4 Mar 2010 | 13:26:35 UTC - in response to Message 15602.

Stop processing GPUGRID.net???
No.

Have you excluded the BOINC directories from antivirus scanning?

Have you checked your antivirus logs to see if it is stopping the ACEMD2 application? ACEMD2 is the program that GPUGRID uses to process WUs.

Have you rebooted your machine?

Have you tried switching SLI on / off?

Have you tried switching PhysX on / off?
____________
Thanks - Steve

ftpd
Message 15616 - Posted: 5 Mar 2010 | 8:56:45 UTC - in response to Message 15603.

No problems with McAfee virus-scanner.

I have rebooted the system.

I have tried to switch PhysX and Multi-GPU.

Last night 1 job finished and 1 job was cancelled after more than 5 hours.

At the moment: driver 196.75, PhysX/Multi-GPU = on, GTX 295.

Any ideas?
____________
Ton (ftpd) Netherlands

Siegfried Niklas
Joined: 23 Feb 09
Posts: 39
Credit: 144,654,294
RAC: 0
Message 15623 - Posted: 5 Mar 2010 | 20:29:16 UTC - in response to Message 15616.
Last modified: 5 Mar 2010 | 20:35:58 UTC

Remove Driver 196.75

http://www.gpugrid.net/forum_thread.php?id=2047

http://www.pcmag.com/article2/0,2817,2360991,00.asp

Nvidia: "We are aware that some customers have reported fan speed issues with the latest 196.75 WHQL drivers on NVIDIA.com. Until we can verify and root cause this issue, we recommend that customers stay with, or return to 196.21 WHQL drivers. Release 196.75 drivers have been temporarily removed from our Web site in the meantime."

http://www.incgamers.com/News/21293/nvidia-19675-kills-video-cards

incgamers.com: "We're getting reports where users are getting intermittent low FPS after installing these drivers. It seems that it is related to the fan control included in these drivers not working correctly and is causing the video card to overheat on 3D applications."

ftpd
Message 15631 - Posted: 6 Mar 2010 | 9:28:39 UTC - in response to Message 15623.

I am back to 196.21
____________
Ton (ftpd) Netherlands

Rick A. Sponholz
Joined: 20 Jan 09
Posts: 52
Credit: 2,518,707,115
RAC: 0
Message 15744 - Posted: 13 Mar 2010 | 21:24:54 UTC - in response to Message 15631.

I've been using 196.21 all along and have also had the error issue. I have two GTX 295 cards running on two separate desktops. One machine is getting the errors; the other GTX 295 machine, as well as my 4 machines with 9800 GTX+ cards, is completing the workunits without errors.
____________
