
Message boards : Number crunching : Nvidia OpenCL problem for 364.* drivers

robertmiles
Joined: 16 Apr 09
Posts: 409
Credit: 306,444,146
RAC: 418,402
Level
Asp
Message 43215 - Posted: 17 Apr 2016 | 2:09:06 UTC

The OpenCL component of the Nvidia 364.72 driver, and of earlier 364.* drivers, has a problem that can lock up the entire computer or make a few dozen OpenCL tasks (often from more than one BOINC project) fail quickly with a Compute Error. The problem has not been seen with the 362.00 driver.

Tasks from POEM@home seem the most likely to trigger this problem.

Threads on the problems:

https://www.primegrid.com/forum_thread.php?id=6769#94223

http://boinc.fzk.de/poem/forum_thread.php?id=1205#10896

I currently do not have hardware that can check whether GPUGRID has this problem, but you may want to watch for it.

MossyRock
Joined: 26 Nov 13
Posts: 17
Credit: 50,096,588
RAC: 0
Level
Thr
Message 43219 - Posted: 18 Apr 2016 | 11:47:20 UTC - in response to Message 43215.
Last modified: 18 Apr 2016 | 11:54:10 UTC

Great. I just applied the 364.72 Nvidia update yesterday and now all of my GPUGrid tasks are crashing. One failed after considerable time had elapsed and the last two crashed just after starting.

Betting Slip
Joined: 5 Jan 09
Posts: 668
Credit: 2,498,095,550
RAC: 15
Level
Phe
Message 43220 - Posted: 18 Apr 2016 | 12:15:40 UTC - in response to Message 43219.
Last modified: 18 Apr 2016 | 12:17:36 UTC

It could be something else entirely: this board is not full of WU failures due to these drivers, and I've run them myself since they came out.

GPUGrid doesn't use OpenCL.

MossyRock
Joined: 26 Nov 13
Posts: 17
Credit: 50,096,588
RAC: 0
Level
Thr
Message 43221 - Posted: 18 Apr 2016 | 14:04:00 UTC - in response to Message 43220.

I'll try a clean install of the drivers and see if that fixes the issue.

robertmiles
Joined: 16 Apr 09
Posts: 409
Credit: 306,444,146
RAC: 418,402
Level
Asp
Message 43222 - Posted: 18 Apr 2016 | 14:58:13 UTC

I have no information on whether this problem also affects CUDA tasks, but for OpenCL tasks, one task crashes after a few hours, then perhaps two dozen more (not necessarily from the same BOINC project) crash quickly. Restarting Windows appears to be required to make any more OpenCL tasks complete properly.

MossyRock
Joined: 26 Nov 13
Posts: 17
Credit: 50,096,588
RAC: 0
Level
Thr
Message 43226 - Posted: 19 Apr 2016 | 17:41:58 UTC - in response to Message 43222.

I've reverted to ver. 362.00 to see if this fixes my GPUGrid WU problems; when more WUs are available I'll be able to tell.

It looks like my ASUS GTX650-E-1GD5 GeForce GPU didn't run ver. 364.xx very well. Yeah, I know it's an old card. There were multiple errors in Windows Event Viewer and my ASUS GPUTweak was also blowing up. Ver. 362.00 fixed that.


MossyRock
Joined: 26 Nov 13
Posts: 17
Credit: 50,096,588
RAC: 0
Level
Thr
Message 43233 - Posted: 22 Apr 2016 | 3:36:25 UTC - in response to Message 43226.

Yeah, that fixed it. My WUs are completing normally now.

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,924,646,710
RAC: 188,766
Level
His
Message 43254 - Posted: 26 Apr 2016 | 14:00:09 UTC - in response to Message 43233.
Last modified: 26 Apr 2016 | 14:01:30 UTC

'Upgraded' to 364.72 WHQL (Clean install wouldn't work on W10x64) and found that it crashed all POEM tasks (OpenCL) [driver restarts].
Ran MW and Einstein tasks without problems and so far it's running a task here without difficulty.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Richard Haselgrove
Joined: 11 Jul 09
Posts: 883
Credit: 1,800,734,170
RAC: 1,136,636
Level
His
Message 43255 - Posted: 26 Apr 2016 | 14:51:30 UTC - in response to Message 43254.

> 'Upgraded' to 364.72 WHQL (Clean install wouldn't work on W10x64) and found that it crashed all POEM tasks (OpenCL) [driver restarts].
> Ran MW and Einstein tasks without problems and so far it's running a task here without difficulty.

Jacob Klein has already reported that one to NVIDIA, and got David Anderson to add a client option to disallow OpenCL tasks, whichever project they come from.

I think that's a sledgehammer to crack a very small nut, and I've told him so, but you might like to test the new BOINC v7.6.32 (you'll have to find the download yourself; it hasn't even gone into alpha testing yet).

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43261 - Posted: 27 Apr 2016 | 17:59:58 UTC - in response to Message 43255.
Last modified: 27 Apr 2016 | 18:40:11 UTC

:) I see my name got mentioned. Yeah, in my opinion it's nice to have a client option to disable OpenCL for cases like this, where you may want the latest drivers for gaming but can't reliably run OpenCL tasks because of NVIDIA's driver bugs.

My ticket with them is regarding the OpenCL SDK examples failing on Maxwell, but I also mentioned to them that R364 drivers are failing Poem@Home tasks and causing TDRs, BSODs, restarts, and even making other tasks fail.

The BOINC 7.6.32+ cc_config option <no_opencl>1</no_opencl> works nicely as a workaround for this scenario.
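For anyone who wants to try the workaround, the flag goes in the client's cc_config.xml in the BOINC data directory. Below is a minimal Python sketch. It assumes the usual <cc_config><options> ... </options></cc_config> layout, with <no_opencl> placed inside <options> like the other client flags. The function name enable_no_opencl is hypothetical and is not part of BOINC. The function sets the flag and leaves any other options alone. The client must be 7.6.32 or later to recognize the option, and restarting it is the safest way to make sure the change takes effect.

```python
import xml.etree.ElementTree as ET


def enable_no_opencl(xml_text: str) -> str:
    """Return cc_config.xml text with <no_opencl>1</no_opencl> set under <options>.

    Hypothetical helper. Starts a new <cc_config> document if xml_text is empty,
    creates <options> if it is missing, and leaves any other options untouched.
    """
    root = ET.fromstring(xml_text) if xml_text.strip() else ET.Element("cc_config")
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    flag = options.find("no_opencl")
    if flag is None:
        flag = ET.SubElement(options, "no_opencl")
    flag.text = "1"  # 1 = do not run OpenCL GPU tasks
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(enable_no_opencl(""))
    # <cc_config><options><no_opencl>1</no_opencl></options></cc_config>
```

The sketch only returns the new text and does not write it back to cc_config.xml. That way you can inspect the result before saving it yourself.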

The R364 drivers are still trash, in my opinion. The main reason I run them is to help find problems and get them fixed. In addition to the horrible OpenCL woes, the R364 drivers also have a bug that causes brief full-screen corruption any time a CUDA task starts on my eVGA GTX 980Ti FTW at 144 Hz. Junk.

The 362.00 drivers are the latest that have my solid recommendation.

Regards,
Jacob

Beyond
Joined: 23 Nov 08
Posts: 1073
Credit: 4,618,071,004
RAC: 345,340
Level
Arg
Message 43262 - Posted: 27 Apr 2016 | 18:36:37 UTC

Thanks for posting the information about this problem and also for the recommendation of the latest relatively bug-free drivers (362.00).

nanoprobe
Joined: 26 Feb 12
Posts: 181
Credit: 221,883,797
RAC: 0
Level
Leu
Message 43271 - Posted: 28 Apr 2016 | 15:21:17 UTC

I read somewhere that PrimeGrid will no longer send tasks to any computer that has these problematic drivers installed.

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43272 - Posted: 28 Apr 2016 | 15:46:22 UTC
Last modified: 28 Apr 2016 | 16:07:37 UTC

> I read somewhere that Primegrid will no longer send tasks to any computer that has these problematic drivers installed.


That would be wise, as the tasks reportedly complete gracefully but with miscalculated results! :)

We're tracking the problem/solution here:
http://www.primegrid.com/forum_thread.php?id=6775
... where I have an NVIDIA dev looking into it.
So, look there for updates.

Edit: Made hyperlink clickable, sorry about that.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 883
Credit: 1,800,734,170
RAC: 1,136,636
Level
His
Message 43273 - Posted: 28 Apr 2016 | 15:55:09 UTC - in response to Message 43272.
Last modified: 28 Apr 2016 | 16:07:31 UTC

http://www.primegrid.com/forum_thread.php?id=6775

(just making it clicky so I can follow without editing every time)

Edit: looks like you've got some experienced debuggers active there. Excellent news.

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43297 - Posted: 2 May 2016 | 18:47:41 UTC

I have confirmed that today's 365.10 drivers do NOT fix the OpenCL problems: the PrimeGrid miscalculations and the Poem@Home TDRs.

I'd recommend that users stick with 362.00, and that projects take action to avoid issuing OpenCL tasks to R364 users.

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43312 - Posted: 4 May 2016 | 12:40:51 UTC

I have a small status update, regarding my NVIDIA bug (Bug ID 1754468) for these OpenCL issues:

- Status changed from "Open - pending review" to "Open - in progress"

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43313 - Posted: 5 May 2016 | 12:20:18 UTC
Last modified: 5 May 2016 | 12:35:53 UTC

Another small update: while NVIDIA works on the fixes, they're requesting additional info so they can potentially build "Poem@Home" and "PrimeGrid calculation" test cases for the checklist they use before releasing new drivers. That's a GREAT idea, in my opinion :)

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43371 - Posted: 12 May 2016 | 5:43:14 UTC

Lots of updates in these 2 threads:

http://www.primegrid.com/forum_thread.php?id=6775
http://boinc.fzk.de/poem/forum_thread.php?id=1205

Basically, the problems have been solved, but only the POEM crash fix will land in the upcoming driver release (due any day). The PrimeGrid miscalculation fix will have to wait for the release after that (sometime later this month).

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43400 - Posted: 13 May 2016 | 14:50:56 UTC
Last modified: 13 May 2016 | 14:51:04 UTC

I have confirmed that the new Doom 365.19 drivers:
- Do NOT fix the OpenCL/CUDA miscalculations (Internal NVIDIA Bug ID: 200197534)
- DO fix the Poem@Home TDR/crashes (NVIDIA Bug ID: 1754468)

So... if you do any distributed computing involving OpenCL/CUDA calculations, I recommend that you **stick with 362.00** for correct results until the next driver release, which should have the miscalculation fix.

Thanks,
Jacob

Retvari Zoltan
Joined: 20 Jan 09
Posts: 1960
Credit: 12,622,269,019
RAC: 6,563,983
Level
Trp
Message 43401 - Posted: 13 May 2016 | 15:19:08 UTC - in response to Message 43400.

> I have confirmed that the new Doom 365.19 drivers:
> - Do NOT fix the OpenCL/CUDA miscalculations (Internal NVIDIA Bug ID: 200197534)
> - DO fix the Poem@Home TDR/crashes (NVIDIA Bug ID: 1754468)
>
> So... If you do any distributed computing involving OpenCL/CUDA calculating, I recommend that you **stick with 362.00** for correct calculations, until the next driver release which should have the miscalculation fix.
>
> Thanks,
> Jacob

I have the 364.72 driver on 3 of my hosts, and my Einstein@home tasks are validating just fine.
So I'm not sure how far this issue extends to CUDA tasks.

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43402 - Posted: 13 May 2016 | 15:27:06 UTC - in response to Message 43401.

>> I have confirmed that the new Doom 365.19 drivers:
>> - Do NOT fix the OpenCL/CUDA miscalculations (Internal NVIDIA Bug ID: 200197534)
>> - DO fix the Poem@Home TDR/crashes (NVIDIA Bug ID: 1754468)
>>
>> So... If you do any distributed computing involving OpenCL/CUDA calculating, I recommend that you **stick with 362.00** for correct calculations, until the next driver release which should have the miscalculation fix.
>>
>> Thanks,
>> Jacob
> I have the 364.72 driver on 3 of my hosts, and my Einstein@home tasks are validating just fine.
> So I'm not sure about the extent this issue has on CUDA tasks.


:) PrimeGrid was seeing miscalculated tasks validate against each other. The miscalculation doesn't affect every operation, but because it affects some, I stand by my recommendation.
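Here is why matching wrong answers can slip through, shown with a toy model. It assumes a simple "accept the result if enough copies agree" validator. This is not PrimeGrid's actual validator, which this thread doesn't describe. If the driver bug is deterministic and both results in a 2-result quorum come from hosts running the same buggy driver, they can agree on the same wrong value and be accepted. Pairing a result with one from a host on a different driver or GPU vendor removes the shared bug from the comparison. The function name and the result strings are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def validate_quorum(results: Sequence[str], min_quorum: int = 2) -> Optional[str]:
    """Toy validator: accept the most common result if at least min_quorum agree.

    Real BOINC validators use project-specific comparison code; this only shows
    that agreement between results is not the same as correctness.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= min_quorum else None


if __name__ == "__main__":
    good, bad = "residue-A", "residue-B"  # hypothetical task outputs
    # Two hosts with the same deterministic driver bug agree on the wrong answer.
    print(validate_quorum([bad, bad]))   # residue-B (accepted, but wrong)
    # A host on a different driver or GPU vendor breaks the false agreement.
    print(validate_quorum([bad, good]))  # None (no quorum yet)
```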

Retvari Zoltan
Joined: 20 Jan 09
Posts: 1960
Credit: 12,622,269,019
RAC: 6,563,983
Level
Trp
Message 43445 - Posted: 17 May 2016 | 23:27:13 UTC - in response to Message 43402.
Last modified: 17 May 2016 | 23:41:37 UTC

>> I have the 364.72 driver on 3 of my hosts, and my Einstein@home tasks are validating just fine.
>> So I'm not sure about the extent this issue has on CUDA tasks.
>
> :) PrimeGrid was having miscalculated tasks end up validating each other. The miscalculation doesn't effect every operation, but because it affects some, I stand by my recommendation.

I don't want to dispute your recommendation, but I should clarify my statement:
I have the 364.72 driver on 3 of my hosts, and my Einstein@home (CUDA) tasks are validated just fine, even against AMD Radeon (OpenCL) tasks.

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,924,646,710
RAC: 188,766
Level
His
Message 43448 - Posted: 19 May 2016 | 5:39:38 UTC - in response to Message 43254.
Last modified: 19 May 2016 | 11:54:07 UTC

> 'Upgraded' to 364.72 WHQL (Clean install wouldn't work on W10x64) and found that it crashed all POEM tasks (OpenCL) [driver restarts].
> Ran MW and Einstein tasks without problems and so far it's running a task here without difficulty.

My experience with 364.72 on W10 (WDDM 2.0):
No issues at GPUGrid (CUDA 6.5).
No issues with Einstein tasks (CUDA 3.2 + CUDA 5.5).
Issues with POEM tasks (OpenCL).
PG is also OpenCL AFAIK.
MW uses a small and simple OpenCL app, so it might not be affected for that reason.

After updating to 365.19 - so far:
Able to run MW tasks,
Able to run POEM tasks,
Able to run Einstein tasks,
Able to run a mix of MW and POEM tasks or a mix of MW and Einstein tasks,
No work for GPUGrid so far, and I don't crunch at PG.

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43513 - Posted: 24 May 2016 | 6:50:39 UTC

368.22 passed my OpenCL/CUDA calculation checks, which had been failing on all of the R364 drivers. And the POEM task crashes, which were fixed in 365.19, are still fixed.

So ... I recommend 368.22 now, instead of 362.00. :)

Thanks,
Jacob Klein

Details:
http://www.primegrid.com/forum_thread.php?id=6775
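Earlier in the thread, Jacob suggested that projects avoid issuing OpenCL tasks to R364 users. Below is a minimal sketch of such a host check. It assumes Windows-style driver version strings such as "364.72" and encodes only what the reports here say: 362.00 was good, the 364.xx and 365.xx releases tested here misbehaved, and 368.22 passed. To be conservative, any release inside that window that the thread never tested is treated as affected. The names are hypothetical, and this is not how PrimeGrid or any other project actually implements its block.

```python
from typing import Tuple

# Hypothetical thresholds taken from the reports in this thread, not from NVIDIA:
# 362.00 was the last known-good release, 364.xx/365.xx misbehaved, 368.22 passed.
FIRST_AFFECTED: Tuple[int, int] = (364, 0)
FIRST_FIXED: Tuple[int, int] = (368, 22)


def parse_driver_version(version: str) -> Tuple[int, int]:
    """Split a Windows-style NVIDIA driver version such as '364.72' into (364, 72)."""
    major_text, minor_text = version.strip().split(".", 1)
    return int(major_text), int(minor_text)


def opencl_allowed(version: str) -> bool:
    """Return False for versions in [FIRST_AFFECTED, FIRST_FIXED), True otherwise.

    Untested releases inside that window are treated as affected, which errs on
    the side of not sending OpenCL work.
    """
    return not (FIRST_AFFECTED <= parse_driver_version(version) < FIRST_FIXED)


if __name__ == "__main__":
    for v in ("362.00", "364.72", "365.19", "368.22"):
        print(v, opencl_allowed(v))
```

Comparing (major, minor) tuples avoids the string-comparison trap in which "368.22" and "365.100" would sort by characters rather than numerically.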

Beyond
Joined: 23 Nov 08
Posts: 1073
Credit: 4,618,071,004
RAC: 345,340
Level
Arg
Message 43521 - Posted: 24 May 2016 | 14:07:45 UTC - in response to Message 43513.

Thanks, Jacob. For us non-gamers, is there any particular reason to upgrade?

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,924,646,710
RAC: 188,766
Level
His
Message 43523 - Posted: 24 May 2016 | 14:26:25 UTC - in response to Message 43521.
Last modified: 24 May 2016 | 14:32:39 UTC

> Thanks Jacob. For us non-gamers is there any particular reason to upgrade?

If it's not broken...

If your driver works (tasks all complete, no system issues) and you don't game then there is no need to upgrade the driver.
The present app uses CUDA 6.5, which came out some time ago. If the project recompiled the ACEMD app as a CUDA 7.5-only app using the latest CUDA toolkit (now 8 months old), some people might need to update to CUDA 7.5-capable drivers. Sometimes they do this in the lab only to discover that there is no benefit to the app, so they don't release it and stick with their existing app; if it's not broken...

Jacob Klein
Joined: 11 Oct 08
Posts: 1093
Credit: 1,429,750,839
RAC: 1,080,631
Level
Met
Message 43524 - Posted: 24 May 2016 | 14:50:32 UTC - in response to Message 43521.

> Thanks Jacob. For us non-gamers is there any particular reason to upgrade?


There are always other fixes put in, too. Ultimately, it's up to you. If you don't have a reason to upgrade to R367, then sticking with 362.00 would be fine.

Just avoid R364: too many nasty problems (like BSODs, black screens, TDRs, miscalculations, etc.), in my opinion.

rkodey
Joined: 22 Dec 11
Posts: 1
Credit: 205,115,300
RAC: 208,718
Level
Leu
Message 43534 - Posted: 24 May 2016 | 19:38:29 UTC

For what it's worth... One of the big new features of the 364.* drivers was optimized VR support for Oculus and Vive headsets. Rolling back to 362.00 definitely fixed the OpenCL errors (and other instabilities), but was a clear downgrade in VR performance. I can confirm the new 368.22 brings back the VR performance improvement while also fixing the OpenCL issues Jacob has been testing. So, the new driver seems to work well in all cases!

robertmiles
Joined: 16 Apr 09
Posts: 409
Credit: 306,444,146
RAC: 418,402
Level
Asp
Message 43574 - Posted: 25 May 2016 | 16:15:18 UTC - in response to Message 43534.

Exception: the 368.22 driver does NOT work well under Windows Vista. If you install it there, expect many hours during which the computer won't even boot.
