
Message boards : Graphics cards (GPUs) : Steps to diagnose failure to run tasks?

AndyCivil
Joined: 5 Feb 14
Posts: 6
Credit: 25,848,270
RAC: 0
Message 52872 - Posted: 19 Oct 2019 | 22:31:28 UTC

I have a Win7 / Core-i7 / 8GB / GTX-660 computer that's not used much, and since it's winter I'd like to run jobs, because the waste heat is also useful. Years ago I was able to run both rosetta@home CPU jobs and GPUgrid jobs, but now only rosetta@home seems to be working.

I have already updated BOINC, and checked my card compatibility ("Still works") and deleted and re-installed the NVidia driver, and searched the forum for ideas, but to no avail. It seems that periodically, the count of "tasks failed" increments by two. It was 31, then 33, and now 35.

I can't spend too much time on this, but please give me a checklist of things that might be wrong, so I can go through it. It was a $200 card, and it's sitting there idle, and apparently mining bitcoin isn't viable any more.


General
URL: http://www.gpugrid.net/
User name: AndyCivil
Team name: Canada
Resource share: 100
Disk usage: 38.89 MB
Computer ID: 170831
Suspended via GUI: no
Don't request tasks: no
Host location: default
Tasks completed: 0
Tasks failed: 35
Credit (user): 16,728,965 total, 0.09 average
Credit (host): 16,581,515 total, 0.09 average

Scheduling
Scheduling priority: -0.10
CPU task request deferred for: 09:37:58
CPU task request deferral interval: 10:40:00
NVIDIA GPU task request deferred for: 05:11:33
NVIDIA GPU task request deferral interval: 10:40:00
Last scheduler reply: 2019-10-19 6:19:31 PM

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Message 52873 - Posted: 20 Oct 2019 | 0:46:40 UTC - in response to Message 52872.

Your nVidia driver is very old (r327.23, released Sept 2013). Your setup should be able to run the latest nVidia driver, r436.8.
This is the most likely cause of your errors.
nVidia drivers can be downloaded from nVidia here: https://www.geforce.com/drivers
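A minimal Python sketch of the check being described: compare the installed driver version with a reference version. It assumes version strings shaped like the ones quoted above ("r327.23", "r436.8"); the reference value is only the "latest" version mentioned in this post, not a documented GPUGrid minimum, and the helper names are hypothetical. Reading the installed version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` assumes the NVIDIA tools are on the PATH.

```python
import subprocess
from typing import Tuple


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as "r327.23" or "436.8" into a comparable tuple of ints."""
    cleaned = text.strip().lstrip("rR")
    return tuple(int(part) for part in cleaned.split("."))


def is_older_than(installed: str, reference: str) -> bool:
    """True if the installed driver sorts before the reference version."""
    return parse_driver_version(installed) < parse_driver_version(reference)


def query_installed_driver() -> str:
    """Read the driver version of the first GPU via nvidia-smi (must be on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    # Versions quoted in this thread; the reference is an assumption, not a project requirement.
    installed, reference = "r327.23", "r436.8"
    if is_older_than(installed, reference):
        print(f"Driver {installed} is older than {reference}: consider updating.")
```

Comparing integer tuples avoids the string-comparison trap where "99.1" would sort after "436.8".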

AndyCivil
Joined: 5 Feb 14
Posts: 6
Credit: 25,848,270
RAC: 0
Message 52874 - Posted: 20 Oct 2019 | 4:17:22 UTC - in response to Message 52873.

Thank you, that's probably it. I googled for a driver, and the first page I found (which was https://www.geforce.com/drivers/results/66884, incidentally) didn't say anything like "why don't you use a newer driver instead?" I checked that it was a legitimate domain (nothing scammy) and valid for my GPU, and went ahead and installed it. I just assumed it was the latest. Normally, when you download an old driver for something, it's on a page labelled "previous versions" with a whole list of them. It hasn't started a job yet, but I'll give it a day.

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 52875 - Posted: 20 Oct 2019 | 5:35:39 UTC - in response to Message 52874.

... It hasn't started a job yet, but I'll give it a day.

Unfortunately, tasks are NOT available all the time, so just be patient.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 566
Credit: 5,936,877,024
RAC: 10,953,624
Message 52876 - Posted: 20 Oct 2019 | 11:43:40 UTC

Additionally, as a rough estimate, your GTX660 graphics card should finish a long ACEMD WU in 12 to 18 hours of continuous processing, depending on the kind of task.
This estimate comes from comparing the computing power of your GTX660 (about 1882 GFLOPS) with my GTX750TI (about 1306 GFLOPS) and applying the rule of three to my own run times.
Source: http://www.gpureview.com/show_cards.php?card1=680&card2=695
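The rule of three used here treats run time as inversely proportional to peak GFLOPS: time_target = time_reference × GFLOPS_reference / GFLOPS_target. A minimal Python sketch of that arithmetic; the function name is hypothetical, and the reference hours in the example are a placeholder, since the actual GTX750TI run times aren't given in this post.

```python
def scale_runtime_hours(reference_hours: float,
                        reference_gflops: float,
                        target_gflops: float) -> float:
    """Rule of three: assume run time is inversely proportional to peak GFLOPS."""
    if reference_gflops <= 0 or target_gflops <= 0:
        raise ValueError("GFLOPS figures must be positive")
    return reference_hours * reference_gflops / target_gflops


GTX750TI_GFLOPS = 1306  # figure quoted in this post
GTX660_GFLOPS = 1882    # figure quoted in this post

if __name__ == "__main__":
    # Placeholder reference time: substitute a measured GTX750TI run time.
    hypothetical_750ti_hours = 20.0
    estimate = scale_runtime_hours(hypothetical_750ti_hours, GTX750TI_GFLOPS, GTX660_GFLOPS)
    print(f"Estimated GTX660 run time: {estimate:.1f} h")
```

Peak GFLOPS is only a rough proxy: the GTX660 in this thread later took about 25 hours per task, well outside the 12 to 18 hour estimate.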

For this reason, I would recommend initially setting the BOINC Manager computing preferences to:
* Usage limits
- Use at most: 75 % of the CPUs - (To reserve a free full CPU core)
- Use at most: 100 % of CPU time
* When to suspend: Uncheck all options in this section
* Other
- Store at least 0.01 days of work - (To download only one WU at a time)
- Store up to an additional 0.01 days of work - (To download only one WU at a time)
- Switch between tasks every 1440 minutes - (To avoid alternating with other projects' tasks, if any. Check whether the GPUGrid task is running; if not, suspend the other running GPU task until it is)

When you have processed some sample WUs, it will be time to set more conservative values if desired.
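The same settings can also be kept in a text file instead of clicking through BOINC Manager. A minimal Python sketch that writes them out, assuming BOINC's `global_prefs_override.xml` format with a `<global_preferences>` root element; the tag names below are my understanding of that format and worth checking against your client version, and the helper name is hypothetical. Note that the "When to suspend" flags are inverted in the file: unchecking "Suspend when computer is in use" corresponds to `run_if_user_active` = 1. Only the options listed above are covered.

```python
import xml.etree.ElementTree as ET

# Settings suggested above, keyed by the (assumed) global_prefs_override.xml tag names.
SUGGESTED_PREFS = {
    "max_ncpus_pct": "75",                   # Use at most 75 % of the CPUs
    "cpu_usage_limit": "100",                # Use at most 100 % of CPU time
    "run_on_batteries": "1",                 # "Suspend when computer is on battery" unchecked
    "run_if_user_active": "1",               # "Suspend when computer is in use" unchecked
    "run_gpu_if_user_active": "1",           # "Suspend GPU computing when computer is in use" unchecked
    "work_buf_min_days": "0.01",             # Store at least 0.01 days of work
    "work_buf_additional_days": "0.01",      # Store up to an additional 0.01 days of work
    "cpu_scheduling_period_minutes": "1440", # Switch between tasks every 1440 minutes
}


def build_prefs_override(prefs: dict) -> str:
    """Return an XML document with one child element per preference."""
    root = ET.Element("global_preferences")
    for tag, value in prefs.items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_prefs_override(SUGGESTED_PREFS))
```

The output would be saved as `global_prefs_override.xml` in the BOINC data directory; the running client can then be asked to reload it, for example with `boinccmd --read_global_prefs_override`.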

Also visit the GPUGrid preferences page for your account and enable receiving new ACEMD3 tasks. The old ACEMD short and long WUs are currently being phased out.

And:

Unfortunately, tasks are NOT available all the time, so just be patient.


Good luck!

AndyCivil
Joined: 5 Feb 14
Posts: 6
Credit: 25,848,270
RAC: 0
Message 52878 - Posted: 20 Oct 2019 | 13:36:35 UTC

It's working now, thanks.
I did change those preferences.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 566
Credit: 5,936,877,024
RAC: 10,953,624
Message 52888 - Posted: 22 Oct 2019 | 10:59:35 UTC

Nice to see that you’ve been lucky enough to catch two ACEMD WUs, and both of them finished successfully :-)

AndyCivil
Joined: 5 Feb 14
Posts: 6
Credit: 25,848,270
RAC: 0
Message 52889 - Posted: 23 Oct 2019 | 14:15:36 UTC - in response to Message 52888.

I did! They are taking about 25 hours to complete, but that's OK.

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 566
Credit: 5,936,877,024
RAC: 10,953,624
Message 52890 - Posted: 24 Oct 2019 | 6:59:55 UTC

Currently there seems to be an issue affecting a specific batch of Long runs v9.22 (cuda65).
My cuda65 computers are also failing these tasks immediately, erroring with exit status -44 like yours.
The same WUs resent to cuda80 computers seem to progress correctly.
Therefore, I deduce the problem is not with our computers but with this specific cuda65 batch of WUs.
I recommend not altering your current setup and waiting for the problem to be resolved on GPUGrid's side.
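The reasoning here (the same WUs fail on cuda65 hosts but run on cuda80 hosts, so the batch rather than the hosts is suspect) can be written as a small check. A minimal Python sketch, assuming you copy each task's plan class and exit status by hand from the GPUGrid task pages; the record type, function names and the 90 % / 10 % thresholds are hypothetical choices, not anything GPUGrid provides.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class TaskResult:
    plan_class: str   # e.g. "cuda65" or "cuda80"
    exit_status: int  # 0 means success; -44 is the error reported in this thread


def failure_rate_by_plan_class(results: Iterable[TaskResult]) -> Dict[str, float]:
    """Fraction of failed tasks for each plan class."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.plan_class] += 1
        if r.exit_status != 0:
            failures[r.plan_class] += 1
    return {pc: failures[pc] / totals[pc] for pc in totals}


def looks_like_batch_problem(rates: Dict[str, float], suspect: str, control: str,
                             suspect_min: float = 0.9, control_max: float = 0.1) -> bool:
    """Suspect plan class almost always fails while the control almost always succeeds.
    A missing control defaults to a failure rate of 1.0, so no conclusion is drawn."""
    return rates.get(suspect, 0.0) >= suspect_min and rates.get(control, 1.0) <= control_max


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    sample = [TaskResult("cuda65", -44), TaskResult("cuda65", -44), TaskResult("cuda80", 0)]
    rates = failure_rate_by_plan_class(sample)
    print(rates, looks_like_batch_problem(rates, "cuda65", "cuda80"))
```

A host-side problem (such as the old driver earlier in this thread) would instead tend to show up as failures across every plan class on that host.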

AndyCivil
Joined: 5 Feb 14
Posts: 6
Credit: 25,848,270
RAC: 0
Message 52891 - Posted: 24 Oct 2019 | 18:36:01 UTC - in response to Message 52890.

Random information: I have another, newer computer, Rayquaza, but I didn't spend much on its graphics card. It's only a GT1030 (below entry level), but I thought I'd give it a shot just to see what happened. Tasks on it fail immediately too (I've turned off task fetching). However, I notice that the driver on that one is old as well! I don't know how this happens, because I only built it recently. When I get a minute, I'll update the driver and try again.

Side story: I know this is a GPU grid forum, but I found an old broken laptop of my daughter's, and reset it to see if it would run BOINC. The screen is cracked, the touchscreen feature only works on the bit to the right of the fault line, the keyboard doesn't connect properly, so it's basically useless, but it's running Rosetta@home tasks! And, it's only using 5 watts of power...
