Message boards : Graphics cards (GPUs) : All WU fail after resuming computation

Marco Plassio
Joined: 16 Aug 12
Posts: 2
Credit: 2,257,335
RAC: 0
Message 31431 - Posted: 12 Jul 2013 | 14:10:36 UTC

Hi,
I have upgraded my PC to Debian 7 (wheezy) and installed the NVIDIA proprietary driver for my GTX 560.

After this, all WUs start correctly, but if they are suspended they fail to restart, with this output:

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"
SIGSEGV: segmentation violation
Stack trace (12 frames):
../../projects/www.gpugrid.net/acemd.2868(boinc_catch_signal+0x4d)[0x56709d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7f44fa258030]
/lib/x86_64-linux-gnu/libc.so.6(fwrite+0x34)[0x7f44f950b034]
../../projects/www.gpugrid.net/acemd.2868[0x47f9c7]
../../projects/www.gpugrid.net/acemd.2868[0x4813a0]
../../projects/www.gpugrid.net/acemd.2868[0x492d74]
../../projects/www.gpugrid.net/acemd.2868[0x47f18a]
../../projects/www.gpugrid.net/acemd.2868[0x422c27]
../../projects/www.gpugrid.net/acemd.2868[0x408c04]
../../projects/www.gpugrid.net/acemd.2868[0x407bc9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f44f94c0ead]
../../projects/www.gpugrid.net/acemd.2868[0x407a39]

Exiting...

</stderr_txt>
]]>


Other projects using the GPU work correctly.

Do you have any idea?

Carlesa25
Joined: 13 Nov 10
Posts: 324
Credit: 72,394,453
RAC: 0
Message 31442 - Posted: 12 Jul 2013 | 19:06:48 UTC

Hello: have you checked that the option "Leave applications in memory while suspended" is not enabled? It usually causes problems. Greetings.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 31449 - Posted: 13 Jul 2013 | 0:09:17 UTC - in response to Message 31442.
Last modified: 13 Jul 2013 | 0:14:09 UTC

Marco, I agree that LAIM (Leave applications in memory) should be on, and "use GPU when system is in use" should also be on.
What driver is it, and are the recommended lib files installed?

I don't know all the details, but I suggest you do two things:
- Use fan control settings to set the fan speed and reduce the GPU temperature.
- Don't use all 8 CPU threads for crunching; use 7 at most.

The following FAQ might help:
http://www.gpugrid.net/forum_thread.php?id=2123&nowrap=true#20169
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 31506 - Posted: 14 Jul 2013 | 14:07:18 UTC

"Leave applications in memory" should not apply to GPU tasks anyway; they are always exited to avoid problems. But the driver version could cause such issues.

MrS
____________
Scanning for our furry friends since Jan 2002

Marco Plassio
Message 31871 - Posted: 7 Aug 2013 | 15:34:47 UTC - in response to Message 31506.

I suspended the project until the release of a new version of the driver...
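
Since the driver version keeps coming up in this thread, here is a minimal Python sketch for recording it before and after a driver update. It assumes the NVIDIA proprietary driver exposes its version in /proc/driver/nvidia/version on Linux and that the version number follows the words "Kernel Module" on the first line; both the line layout and the helper names (parse_driver_version, installed_driver_version) are assumptions for illustration, not something reported by the posters.

```python
import re
from pathlib import Path

# Assumed location of the version file written by the NVIDIA kernel module.
VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_driver_version(text):
    """Return the dotted version number following 'Kernel Module', or None."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def installed_driver_version(path=VERSION_FILE):
    """Read the version file and parse it; None if it is missing or unreadable."""
    try:
        return parse_driver_version(Path(path).read_text())
    except OSError:
        return None
```

Calling installed_driver_version() returns None when the file is absent (for example, when the proprietary module is not loaded), which is itself a useful hint when comparing against the versions mentioned in the FAQ.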

Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 31876 - Posted: 7 Aug 2013 | 16:04:50 UTC

Is this maybe related to this problem?
http://www.gpugrid.net/forum_thread.php?id=3333

skgiven
Volunteer moderator
Volunteer tester
Message 31881 - Posted: 7 Aug 2013 | 18:44:00 UTC - in response to Message 31876.
Last modified: 7 Aug 2013 | 19:30:40 UTC

Possibly the same thing that triggers the driver restart issue on Windows, but the Windows phenomenon, seen from Vista onwards, is WDDM related. Marco is using Linux (Debian 7 wheezy). I don't know what driver he is using on his system, as it's not reported for Linux rigs.

My guess is missing libs, a bad driver, or an upgrade issue (but from what to what I don't know; possibly to 7.1, as it came out on 15th June).
The "SIGSEGV: segmentation violation" suggests an access/security issue, but lots of things could cause this, including hardware. It might be worth checking that the user and BOINC have the correct folder permissions (read and write).
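
The folder permission check suggested above can be sketched in Python. In the posted trace, the crash inside fwrite follows directly after "cannot open file", which would be consistent with the application writing to a file it could not open, though that is only one possible reading. The function name check_dir_access is hypothetical, and the directory to test is an assumption: on Debian the BOINC data directory is typically /var/lib/boinc-client, with tasks running under its slots subdirectory. Run the check as the same user that the BOINC client runs as.

```python
import os
import tempfile


def check_dir_access(path):
    """Report whether the current user can read, enter, and write to path."""
    result = {
        "exists": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "searchable": os.access(path, os.X_OK),
        "writable": os.access(path, os.W_OK),
        "write_test": False,
    }
    if result["exists"] and result["writable"]:
        try:
            # Create and remove a real temporary file, since permission bits
            # alone can miss read-only mounts or full disks.
            with tempfile.NamedTemporaryFile(dir=path) as handle:
                handle.write(b"ok")
            result["write_test"] = True
        except OSError:
            pass
    return result
```

For example, check_dir_access("/var/lib/boinc-client/slots/0") (an assumed path) returns a dictionary of flags; any False value there would point towards the permission theory rather than the driver.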
