Message boards : Graphics cards (GPUs) : Do I have to be made to click "OK" on failed task?

Author Message
far
Joined: 5 Jan 09
Posts: 32
Credit: 1,412,042,305
RAC: 0
Message 27423 - Posted: 26 Nov 2012 | 3:23:07 UTC

Hi
I don't check on my various grid computing machines particularly often, and I just noticed an error message for a task that had failed: it had put up a little window that needed OK to be clicked.

In BOINC Manager I could see that the GPUGrid task had been running for 3 days, waiting for someone to click OK to acknowledge the failure so it could move on. As soon as I clicked OK, the task status shifted to computing error, and the next task started.

Is there any way this can be avoided? I assume I just lost 3 days of GPU processing and hopefully not electricity too.
Thanks,
Far

werdwerdus
Joined: 15 Apr 10
Posts: 123
Credit: 1,004,473,861
RAC: 0
Message 27424 - Posted: 26 Nov 2012 | 5:25:17 UTC - in response to Message 27423.

I had the same issue when I was using the NVIDIA 295 drivers. CUDA would crash when the monitor went to sleep and a new task was started. Either downgrade or upgrade the drivers.
____________
XtremeSystems.org - #1 Team in GPUGrid

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 27427 - Posted: 26 Nov 2012 | 11:23:44 UTC - in response to Message 27424.
Last modified: 26 Nov 2012 | 11:24:11 UTC

Far, I had the same issue recently. I expect this happened on one of your XP systems?
Anyway, a restart is in order. Also, change the monitor to never turn itself off, and just turn it off manually.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 27431 - Posted: 26 Nov 2012 | 22:26:54 UTC

Recently I was creating quite a few errors here (my fault) and this never happened on 2 hosts. So it's definitely some special case on your side. I don't know what causes it, though. Trying a newer driver is a good idea. And maybe you recently installed some GPU programming tools that activated a debug mode, which causes this message?

MrS
____________
Scanning for our furry friends since Jan 2002

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 851
Message 27436 - Posted: 27 Nov 2012 | 1:27:08 UTC - in response to Message 27423.
Last modified: 27 Nov 2012 | 1:28:46 UTC

There is a way to restart your PC whenever an error message pops up. I wrote a little batch program when I had similar error messages.
Here it is.
You have to modify (or add) the filenames of the GPUGrid applications in the first two lines, like this.

far
Message 27441 - Posted: 27 Nov 2012 | 5:46:33 UTC - in response to Message 27436.

Thanks for all the suggestions, guys. I'm running the latest (non-beta) drivers, 306.81, on XP. There are no GPU programming tools installed.

The monitor is already set to never sleep (as is anything else under the energy profile).

I can't reboot the machine as it has other processes running which require a pw/uid logon and I can't store the info in a file anywhere.

It sounds like the simplest option is just for me to try to remember to connect to the machines and check up on them more often. I'm going to have a look at the eFMer BOINC app at some point when I can find time, in case I can spot from my phone that something is taking an unduly long time and go check it out.
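
As a rough illustration of the "taking an unduly long time" check, here is a minimal Python sketch. It is not how the eFMer app works; the function name, the task names, and the factor of 2 are all hypothetical, and it assumes you already have each task's elapsed time and a typical runtime for comparison from somewhere.

```python
from datetime import timedelta


def overdue_tasks(tasks, typical, factor=2.0):
    """Return the names of tasks that have run longer than factor * typical.

    tasks   -- dict mapping a task name to its elapsed time (timedelta)
    typical -- the runtime you normally expect for one task (timedelta)
    factor  -- how many times the typical runtime counts as "stuck"
               (hypothetical default, tune to taste)
    """
    limit = typical * factor
    return [name for name, elapsed in tasks.items() if elapsed > limit]


if __name__ == "__main__":
    # Hypothetical numbers: one normal task and one that has sat for 3 days.
    running = {
        "task_a": timedelta(hours=9),
        "task_b": timedelta(days=3),
    }
    print(overdue_tasks(running, typical=timedelta(hours=10)))
```

A task stuck behind an unanswered dialog keeps accumulating elapsed time, so a simple threshold like this would probably have flagged the 3-day task well before it was noticed by hand.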

skgiven
Message 27442 - Posted: 27 Nov 2012 | 6:09:20 UTC - in response to Message 27441.

Use auto-logon, and if you need some app to run, put a batch file that launches it via Run As in the Startup folder.

I guess you could also use Zoltan's script and set up an administrator alert by email, or disable and re-enable the card/driver, but exactly how to get the alert going could be tricky and it's a fair bit of work.

ExtraTerrestrial Apes
Message 27448 - Posted: 27 Nov 2012 | 21:31:07 UTC

Instead of checking each machine manually, you could look at your hosts on the GPUGrid website, under your account, and see when they last contacted the server.
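
If you wanted to automate that comparison, a minimal Python sketch might look like the one below. The host names, the 24-hour threshold, and the function name are hypothetical, and how the last-contact times get from the website into the script is deliberately left out.

```python
from datetime import datetime, timedelta


def silent_hosts(last_contact, now, max_silence=timedelta(hours=24)):
    """Return a sorted list of hosts that have not contacted the server recently.

    last_contact -- dict mapping a host name to its last contact time (datetime)
    now          -- the current time (datetime), passed in so the check is testable
    max_silence  -- how long a host may stay quiet before it is flagged
                    (hypothetical default of 24 hours)
    """
    return sorted(
        host for host, seen in last_contact.items() if now - seen > max_silence
    )


if __name__ == "__main__":
    # Hypothetical hosts and timestamps, copied by hand from the host list.
    contacts = {
        "xp-box-1": datetime(2012, 11, 27, 20, 0),
        "xp-box-2": datetime(2012, 11, 24, 3, 0),
    }
    print(silent_hosts(contacts, now=datetime(2012, 11, 27, 21, 0)))
```

A host that is stuck on a dialog stops reporting finished work, so a long gap since its last contact is a reasonable hint that it needs a visit.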

MrS
