Message boards : Number crunching : BEWARE: 2p0m-SDOERR_OPMamber6P2-0-1-RND4183

Author Message
Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 45898 - Posted: 24 Dec 2016 | 17:41:11 UTC

If you get one of these, be careful. This bad WU failed on 9 machines and had the additional insult of putting my 1060 in a state where it failed the next WU. Luckily I happened to catch it because of the super long download times here, and I rebooted, which fixed the problem. Here's the WU; there are most likely more like it floating around:

https://www.gpugrid.net/workunit.php?wuid=12205143

It's possible that it only locks up the GPU with the CUDA80 app, as all the other machines that failed it were running CUDA65.

Profile Beyond
Message 45909 - Posted: 25 Dec 2016 | 6:15:16 UTC - in response to Message 45898.

"This bad WU failed on 9 machines and had the additional insult of putting my 1060 in a state where it failed the next WU."

Edit: now 10 machines...

Profile caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Level
Phe
Message 45924 - Posted: 26 Dec 2016 | 2:30:09 UTC - in response to Message 45909.

"This bad WU failed on 9 machines and had the additional insult of putting my 1060 in a state where it failed the next WU."

"Edit: now 10 machines..."

Probably 8 errors and 2 'ghost' downloads. I usually have at least 1 'ghost' assigned to my main system at all times. The site shows there is a 7th WU out there, but the machine itself doesn't have that task, and its logs and XML show that the WU never even existed there.
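
For anyone who wants to check for ghosts on their own host, here is a rough sketch of the idea: a ghost is a task the server lists for a machine that the client on that machine does not actually have. The function name and the task names are hypothetical placeholders, and how you collect the two lists (copying them from the website and from your client, for example) is left open. This is an illustration only, not a tool the project provides.

```python
def find_ghost_tasks(server_task_names, client_task_names):
    """Return task names the server lists for a host but the client does not have.

    Both arguments are plain iterables of task-name strings; gathering them
    is up to you (this sketch does not talk to any server or client).
    """
    return sorted(set(server_task_names) - set(client_task_names))


# Hypothetical placeholder names, not real tasks.
server_side = ["task_a", "task_b", "task_c"]
client_side = ["task_a", "task_c"]

ghosts = find_ghost_tasks(server_side, client_side)
print(ghosts)
```

Anything that ends up in the result is listed on the server side only, which matches the 'ghost' situation described above.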
