Message boards : Number crunching : Diagnose error: TONI_SMDTRYP exit code -1 (0xffffffffffffffff)

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 20347 - Posted: 5 Feb 2011 | 0:12:38 UTC

Hello,

I have recently had several tasks fail on both of my GeForce 9800 GT video cards, running Windows 7 Professional x64 with the WHQL NVIDIA driver 266.58 and BOINC 6.12.13. Can anybody help resolve this problem?

The tasks are of type:
F???-TONI_SMDTRYP?-?-?-RND????_?

Examples include:
http://www.gpugrid.net/result.php?resultid=3648289
http://www.gpugrid.net/result.php?resultid=3648288

In every case, the task appears to run for less than 2 seconds and then fails with a compute error, exit status -1 (0xffffffffffffffff).
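
For reference, the exit status shows up with two different hex widths in this thread (0xffffffffffffffff in the thread title and 0xffffffff in the log further down). These are most likely the same value, -1, rendered as an unsigned 64-bit and 32-bit integer. A minimal sketch of that arithmetic, where `as_unsigned_hex` is a hypothetical helper name and not anything from BOINC:

```python
def as_unsigned_hex(code: int, bits: int) -> str:
    """Render a signed exit code as an unsigned hex value of the given bit width."""
    mask = (1 << bits) - 1          # e.g. 0xffffffff for 32 bits
    digits = bits // 4              # one hex digit per 4 bits
    return f"0x{code & mask:0{digits}x}"


print(as_unsigned_hex(-1, 32))  # 0xffffffff
print(as_unsigned_hex(-1, 64))  # 0xffffffffffffffff
```

So the differing hex strings by themselves do not point to two different errors.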

I'm not sure if it's relevant, but my system's processor is a quad-core Intel Core i7 965 Extreme Edition, and I keep the CPU fully busy (i.e., because of hyperthreading, I have 8 BOINC CPU tasks running, in addition to the GPUGrid GPU tasks). There is also an eVGA GeForce GTX 460 in the system.

I believe this has resulted in the GPUGrid servers refusing to give me new tasks, so I can't do any work until this is fixed! I can't even get tasks for my GTX 460. Please help!

The logs, which aren't very helpful, look like:
<core_client_version>6.12.13</core_client_version>
<![CDATA[
<message>
- exit code -1 (0xffffffff)
</message>
<stderr_txt>
# Using device 1
# There are 3 devices supporting CUDA
# Device 0: "GeForce GTX 460"
# Clock rate: 1.53 GHz
# Total amount of global memory: 1041694720 bytes
# Number of multiprocessors: 7
# Number of cores: 56
# Device 1: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 515571712 bytes
# Number of multiprocessors: 14
# Number of cores: 112
# Device 2: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 515571712 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>
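
To make a log like this easier to read at a glance, here is a rough sketch that pulls out which device the task ran on and any error lines. It assumes the stderr layout shown above ("# Using device N", '# Device N: "name"', and lines containing "ERROR"); the `summarize` function and the trimmed `STDERR` sample are illustrative only, not part of BOINC or the GPUGrid application.

```python
import re

# Trimmed copy of the stderr text from the log above.
STDERR = """\
# Using device 1
# There are 3 devices supporting CUDA
# Device 0: "GeForce GTX 460"
# Device 1: "GeForce 9800 GT"
# Device 2: "GeForce 9800 GT"
MDIO ERROR: cannot open file "restart.coor"
"""


def summarize(stderr: str) -> dict:
    """Return the device index/name the task used and any error lines."""
    used = re.search(r"^# Using device (\d+)", stderr, re.M)
    # Map device index -> device name, e.g. {0: "GeForce GTX 460", ...}
    devices = {
        int(idx): name
        for idx, name in re.findall(r'^# Device (\d+): "([^"]+)"', stderr, re.M)
    }
    index = int(used.group(1)) if used else None
    errors = [line for line in stderr.splitlines() if "ERROR" in line]
    return {
        "device_index": index,
        "device_name": devices.get(index),
        "errors": errors,
    }


print(summarize(STDERR))
```

For the log above, this reports device 1 (a GeForce 9800 GT) and the single MDIO error about "restart.coor".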


PLEASE HELP!
Thanks,
Jacob

Jacob Klein
Message 20348 - Posted: 5 Feb 2011 | 12:45:30 UTC - in response to Message 20347.

I'm seeing a bunch of R-series tasks fail too:
R???-TONI_SMDTRYP?-?-?-RND????_?

Does anybody know why?
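
If it helps anyone check their own task lists, the F- and R-series name patterns above can be tested with a shell-style glob. This is only a sketch: it assumes each "?" stands for exactly one character, and the task names in the example are made up for illustration, not real work units.

```python
from fnmatch import fnmatchcase

# F-series and R-series patterns from this thread, combined via [FR].
# Assumption: every "?" matches exactly one character.
PATTERN = "[FR]???-TONI_SMDTRYP?-?-?-RND????_?"


def is_smdtryp_task(name: str) -> bool:
    """Case-sensitive check of a task name against the SMDTRYP pattern."""
    return fnmatchcase(name, PATTERN)


# Hypothetical task names, for illustration only.
print(is_smdtryp_task("F123-TONI_SMDTRYP1-0-5-RND1234_0"))
print(is_smdtryp_task("R123-TONI_OTHERTASK1-0-5-RND1234_0"))
```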

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 20379 - Posted: 9 Feb 2011 | 11:07:12 UTC - in response to Message 20348.
Last modified: 9 Feb 2011 | 11:08:02 UTC

Hi,

Feel free to abort those WUs; there should not be any more of them. However, looking at your tasks' outcomes, I can see failures on other WU types as well. Perhaps the SMDTRYP ones just fail faster.

E.g.:
http://www.gpugrid.net/result.php?resultid=3666043
http://www.gpugrid.net/result.php?resultid=3663605

Jacob Klein
Message 20380 - Posted: 9 Feb 2011 | 13:27:06 UTC - in response to Message 20379.

Thanks -- I'm creating different threads for the different problems. This thread is specifically about the TONI_SMDTRYP work units that immediately error out with exit code -1 (0xffffffffffffffff).

I'm glad you say that there shouldn't be any more of them. I'll keep a look out, and if it happens again, I'll try to report back here.

Thanks again,
Jacob Klein
