Message boards : Graphics cards (GPUs) : Bad WU crashing GPU

lohphat
Joined: 21 Jan 10
Posts: 44
Credit: 788,587,359
RAC: 2,100,310
Level
Glu
Message 59564 - Posted: 5 Nov 2022 | 8:14:23 UTC

https://www.gpugrid.net/workunit.php?wuid=27339139

I just had a WU crash the GPU, and it took 105 minutes of reboots before the monitors stopped showing full green screens and the machine booted to the desktop.

I then updated the driver from 522.25 to 526.47 to see if that will be better.

However, all three hosts that tried that WU errored out.

Bad CUDA app?

https://www.gpugrid.net/result.php?resultid=33130348

https://www.gpugrid.net/show_host_detail.php?hostid=581748

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1043
Credit: 40,220,807,483
RAC: 3,892,747
Level
Trp
Message 59565 - Posted: 5 Nov 2022 | 23:49:05 UTC - in response to Message 59564.

Sounds like something is wrong with the GPU, honestly.

jjch
Joined: 10 Nov 13
Posts: 98
Credit: 15,413,200,388
RAC: 2,018,179
Level
Trp
Message 59566 - Posted: 6 Nov 2022 | 1:41:24 UTC - in response to Message 59565.

A faulty WU should not cause that much of a problem with your GPU. Many of the Python WUs fail, but that is somewhat inherent in the type of computing they do.

A couple of the other failures referenced occurred due to insufficient swap space or other resources, usually memory.

Upgrading to the latest driver may help, but if you are still having problems I would suggest running DDU to fully remove the driver and then installing it again.

Seems that the 980Tis might run a bit on the hot side. Make sure it is cooling properly. I would suggest setting an aggressive fan curve with EVGA Precision X or similar.
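
For anyone who wants to watch temperatures from a script rather than a GUI tool, here is a minimal sketch. It assumes `nvidia-smi` (installed with the Nvidia driver) is on the PATH, and the 80 C warning threshold is an arbitrary value picked for illustration, not a vendor limit. The helper names are made up for this example.

```python
import shutil
import subprocess

WARN_AT_C = 80  # assumed threshold for illustration only, not a vendor limit


def parse_gpu_temps(csv_text):
    """Parse 'name, temperature' CSV lines into (name, temp_c) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        try:
            name, temp = (field.strip() for field in line.rsplit(",", 1))
            readings.append((name, int(temp)))
        except ValueError:
            # Skip lines that are malformed or report a non-numeric value.
            continue
    return readings


def read_gpu_temps():
    """Query nvidia-smi; return [] if the tool is missing or the call fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=name,temperature.gpu",
                "--format=csv,noheader,nounits",
            ],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_temps(result.stdout)


for name, temp_c in read_gpu_temps():
    status = "HOT" if temp_c >= WARN_AT_C else "ok"
    print(f"{name}: {temp_c} C ({status})")
```

Running this while a task is crunching gives a rough idea of whether the card is heat-limited before you start changing fan curves.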

I would also suggest checking for Windows 11 updates; sometimes there can be weird problems caused by Windows itself.

Just do some basic health checks and cleanup activities. Defrag or trim your disk. Run virus scans, etc. Blow out the dust bunnies too.

Other than that, it might be time to upgrade your GPU to something a bit newer. It doesn't have to be the latest, but I would suggest looking for at least a GTX 1060 6GB or better if you can.

lohphat
Message 59567 - Posted: 7 Nov 2022 | 2:40:38 UTC

I process other CUDA WUs from other projects without issue and get decent credits. Just wondering if the task software isn't heeding available resources and is forging ahead, destined to crash.

Yes, the plan is to update to a newer GPU but the last few years have made that improbable -- I refuse to be gouged.

I also have to make a decision to move away from nVidia or not -- their market behavior isn't my cup of tea.

Keith Myers
Joined: 13 Dec 17
Posts: 1305
Credit: 5,722,646,959
RAC: 8,371,108
Level
Tyr
Message 59568 - Posted: 7 Nov 2022 | 5:57:36 UTC - in response to Message 59567.

If you read these forums regularly, you would know that these tasks mainly use the CPU and very little of the GPU.

You also need a large amount of storage space for each task to crunch properly and finish correctly.

I would first look for weaknesses in the rest of your system and set the GPU aside for now.
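
As a rough illustration of the storage check, here is a small sketch using only the Python standard library. The 10 GiB figure is a placeholder assumption, not a number from the project, and the function names are invented for this example; point `path` at the drive that holds your BOINC data directory.

```python
import shutil


def free_gib(path="."):
    """Return free space on the filesystem containing `path`, in GiB."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


def has_headroom(path=".", required_gib=10.0):
    """True if at least `required_gib` GiB are free (10 GiB is a placeholder)."""
    return free_gib(path) >= required_gib


print(f"Free space: {free_gib('.'):.1f} GiB")
print("Enough headroom" if has_headroom(".") else "Low on disk space")
```

This only covers disk space; memory and swap would need to be checked separately with the tools your OS provides.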

Ian&Steve C.
Message 59569 - Posted: 7 Nov 2022 | 15:10:27 UTC - in response to Message 59567.
Last modified: 7 Nov 2022 | 15:10:46 UTC

I also have to make a decision to move away from nVidia or not -- their market behavior isn't my cup of tea.


Just be aware that if you decide to move to AMD (or even Intel), you won't be able to contribute to this project anymore. GPUGRID only has CUDA apps, and only Nvidia GPUs can run CUDA.
