Message boards : Number crunching : Pablo WU erroring out at a high rate -

Rion Family
Joined: 13 Jan 14
Posts: 21
Credit: 15,415,926,517
RAC: 20,497
Message 48209 - Posted: 18 Nov 2017 | 19:47:08 UTC

Has anyone else experienced this over the past 12 hours? I have multiple machines that errored out on multiple WUs consecutively.

Looking at the logs, most of these WUs had already failed multiple times before they reached me.

A few items to note:

Only Windows appears to be affected; my Linux nodes are humming along. That is not conclusive, though, as a batch of bad WUs may simply not have hit those hosts yet.

<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -55 (0xffffffc9)</message>
]]>

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1965.
# SWAN swan_assert 0
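
The two forms of the exit code in the log above, -55 and 0xffffffc9, are the same 32-bit value read as signed and as unsigned. A minimal Python sketch of the conversion; the helper names `to_signed32` and `to_unsigned32_hex` are made up for illustration and are not part of BOINC:

```python
def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF          # keep only the low 32 bits
    if value & 0x80000000:       # sign bit set, so the value is negative
        return value - 0x100000000
    return value


def to_unsigned32_hex(value: int) -> str:
    """Format a signed integer as 32-bit hex, the way the log prints it."""
    return f"0x{value & 0xFFFFFFFF:08x}"


print(to_signed32(0xFFFFFFC9))   # -55
print(to_unsigned32_hex(-55))    # 0xffffffc9
```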

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 48212 - Posted: 18 Nov 2017 | 22:25:24 UTC - in response to Message 48209.

Has anyone else experienced this over the past 12 hours? I have multiple machines that errored out on multiple WUs consecutively.
I've experienced the same on one of my Windows 10 PCs. The cause of this error is that the latest NVIDIA driver (388.13) has been released through Windows Update, which updates your GPU driver without stopping BOINC GPU tasks first.
There are two workarounds:
1. Update the driver manually before it rolls out on Windows Update (the tricky part is that not all NVIDIA driver versions get rolled out on Windows Update, and I don't have a source that tells which versions actually do)
2. Prevent Windows Update from updating drivers. Here's how. The drawback of this method is that it stops all driver updates, not just the NVIDIA ones.
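
For the second workaround, one commonly documented switch is the "Do not include drivers with Windows Updates" policy, which is stored as the `ExcludeWUDriversInQualityUpdate` registry value. Whether this matches the method behind the link in the post is an assumption. The sketch below only builds the `reg add` command, so it can be reviewed before being run from an elevated prompt; `build_exclude_drivers_command` is a hypothetical helper name.

```python
# Registry location of the "Do not include drivers with Windows Updates" policy.
# Assumption: comparable to the method linked in the post; not confirmed by the thread.
POLICY_KEY = r"HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"
POLICY_VALUE = "ExcludeWUDriversInQualityUpdate"


def build_exclude_drivers_command(enable: bool = True) -> list:
    """Return the reg.exe argument list that sets (1) or clears (0) the policy."""
    data = "1" if enable else "0"
    return ["reg", "add", POLICY_KEY, "/v", POLICY_VALUE,
            "/t", "REG_DWORD", "/d", data, "/f"]


if __name__ == "__main__":
    # Print the command instead of running it; it needs administrator rights.
    print(" ".join(build_exclude_drivers_command()))
```

As the post says, this affects every driver delivered through Windows Update, so manual driver updates become the user's responsibility.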

Rion Family
Joined: 13 Jan 14
Posts: 21
Credit: 15,415,926,517
RAC: 20,497
Message 48214 - Posted: 19 Nov 2017 | 12:14:07 UTC

Thank you, Zoltan!

That explains it.

Kind Regards

WPrion
Joined: 30 Apr 13
Posts: 96
Credit: 1,801,259,111
RAC: 17,268,827
Message 48215 - Posted: 19 Nov 2017 | 22:32:23 UTC - in response to Message 48209.

This method supposedly blocks only your GPU from automatic Windows updates, but I have not tried it.

https://superuser.com/questions/964475/how-do-i-stop-windows-10-from-updating-my-graphics-driver

Win

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 0
Message 48217 - Posted: 20 Nov 2017 | 10:32:51 UTC - in response to Message 48215.

This method supposedly blocks only your GPU from automatic Windows updates, but I have not tried it.

https://superuser.com/questions/964475/how-do-i-stop-windows-10-from-updating-my-graphics-driver

Win

Well, there are 5 methods in the blog post you've linked.
1. Hiding a driver update is irrelevant to this issue, as you can use it only after an automatic driver update has already happened.
2. Disabling Windows Update completely to avoid driver updates makes your OS vulnerable, so this is not an option.
3. Blocking driver installation for a particular device ID (through group policy) is good, but it will prevent you from updating the driver manually unless you first remove the blocking group policy object (and afterwards you should block the given device ID again). This method is linked in the post I've linked.
4. This method is the same as the one I've posted.
5. The Windows Update preferences do not have the driver update option in the latest release (Fall Creators Update, v1709); it has also been removed from the System Properties dialog box.

Dayle Diamond
Joined: 5 Dec 12
Posts: 84
Credit: 1,663,883,415
RAC: 1
Message 48377 - Posted: 15 Dec 2017 | 3:06:32 UTC

Darn, this happened to me and I didn't see it until it was way too late.

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1965.

Run time: 72,375.30

Is there any way to update the application to make it automatically pause during the update process? I assume this is happening at least once per user per update.
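
The thread does not say whether the application itself can be changed to do this. As a manual workaround, GPU work can be suspended from the BOINC side before a planned driver update with `boinccmd --set_gpu_mode`. A minimal sketch, assuming `boinccmd` is on the PATH and can authenticate to the local client; the helper names and the one-hour default are illustrative only. It does not detect a driver install triggered by Windows Update on its own.

```python
import subprocess


def gpu_mode_command(mode: str, duration_s: int = 0) -> list:
    """Build a boinccmd call that sets the GPU run mode.

    mode is one of "always", "auto" or "never". A non-zero duration (seconds)
    makes the change temporary; zero makes it permanent.
    """
    if mode not in ("always", "auto", "never"):
        raise ValueError(f"unsupported GPU mode: {mode}")
    cmd = ["boinccmd", "--set_gpu_mode", mode]
    if duration_s > 0:
        cmd.append(str(duration_s))
    return cmd


def suspend_gpu(duration_s: int = 3600) -> None:
    """Stop GPU computing for duration_s seconds (run before the driver update)."""
    subprocess.run(gpu_mode_command("never", duration_s), check=True)


def resume_gpu() -> None:
    """Return GPU computing to preference-based scheduling (run after the update)."""
    subprocess.run(gpu_mode_command("auto"), check=True)
```

Calling `suspend_gpu()` before installing the driver and `resume_gpu()` afterwards covers the manual-update workaround from earlier in the thread.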
