
Message boards : Graphics cards (GPUs) : Kernel [frc_sum_kernel_bond] failed

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 7306 - Posted: 9 Mar 2009 | 17:17:56 UTC

The final line - unknown error - probably gives a subtle clue :) However, does anything about this one strike anyone? I had some dramas 2 weeks ago as motherboards melted around me, but that's sorted now and the errant PCs rebuilt. The GPUGrid WUs are remarkably stable as far as I am concerned - plaudits to the devs - so it's unusual for me to see this. Just curious in case there is something obvious I did to cause it.

Regards
Zy

<core_client_version>6.5.0</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
# Clock rate: 1915000 kilohertz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
# Clock rate: 1915000 kilohertz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 16
# Number of cores: 128
Cuda error: Kernel [frc_sum_kernel_bond] failed in file 'force.cu' in line 283 : unknown error.

</stderr_txt>
]]>

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 7322 - Posted: 10 Mar 2009 | 15:35:34 UTC - in response to Message 7306.

Do you also have another CUDA client, such as S@H, running?
When I stopped running the other CUDA client at the same time, it seemed to run OK again.
But time will tell; I am on the second unit.

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 7325 - Posted: 10 Mar 2009 | 19:10:40 UTC - in response to Message 7322.
Last modified: 10 Mar 2009 | 19:13:39 UTC

I do run S@H CUDA, but not at the same time. I split the time the card is used between SETI and GPUGrid; at present it's about 50% to each. I will crunch a GPUGrid WU, then when it's done, run SETI CUDA for 12 hours or so. From time to time I might end up suspending the GPUGrid WU for a couple of hours or so, for many reasons (it is set to unload from memory rather than stay suspended in memory), but that has not caused an issue in the past.

The hassle in my crunching a couple of weeks ago - self-evident from my task list - was due to multiple hardware failures on two PCs, which ended with rebuilding both: new motherboards, CPUs, cards etc. Just Murphy's Law hitting all at once. The GPUGrid WU is, from where I sit, remarkably stable [thank you to the Devs - nice one :) ], and I can safely say I have never had a problem attributable to the WU itself; any issues were caused by me or my equipment. That's why I was curious about this one: all appears OK at this end, and WU issues are so rare that I thought there might be something obvious I had missed.

At the end of the day, the world has not ended. This is just a natural follow-up in case there is conventional wisdom I should be aware of and have missed.

Regards
Zy

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 7331 - Posted: 10 Mar 2009 | 19:27:36 UTC - in response to Message 7325.

Zydor wrote: "At the end of the day, the world has not ended, just a natural follow up in case there is conventional wisdom I need to be aware of, and have missed."


There may be some wisdom to be found here, but I don't think it's common ;)

MrS
____________
Scanning for our furry friends since Jan 2002

Bymark
Joined: 23 Feb 09
Posts: 30
Credit: 5,897,921
RAC: 0
Message 7384 - Posted: 12 Mar 2009 | 16:48:23 UTC - in response to Message 7306.

I have almost the same error and no clue what it's about:
Regards Thomas
-----------------------------------------------------------

Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 28987
Report deadline 16 Mar 2009 10:18:06 UTC
CPU time 492.5469
stderr out <core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
Cuda error: Kernel [frc_sum_nb_forces] failed in file 'force.cu' in line 244 : unknown error.

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 2478.98611111111
Granted credit 0
application version 6.62
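The two failures are similar but not identical: Zydor's task died in kernel frc_sum_kernel_bond at line 283 of force.cu, this one in frc_sum_nb_forces at line 244, both with "unknown error". When comparing several failed tasks, the "Cuda error" line can be pulled apart mechanically. A minimal Python sketch, assuming every log uses exactly the message format shown in these two stderr outputs; the names parse_cuda_error and CudaFailure are hypothetical helpers, not part of the GPUGrid application or BOINC.

```python
import re
from typing import NamedTuple, Optional


class CudaFailure(NamedTuple):
    kernel: str
    source_file: str
    line: int
    reason: str


# Pattern assumed from the two stderr logs in this thread, e.g.
# "Cuda error: Kernel [frc_sum_kernel_bond] failed in file 'force.cu' in line 283 : unknown error."
_PATTERN = re.compile(
    r"Cuda error: Kernel \[(?P<kernel>[^\]]+)\] failed in file "
    r"'(?P<file>[^']+)' in line (?P<line>\d+) : (?P<reason>.+?)\.?$"
)


def parse_cuda_error(stderr_text: str) -> Optional[CudaFailure]:
    """Return the first kernel failure found in a stderr log, or None if there is none."""
    for raw in stderr_text.splitlines():
        match = _PATTERN.search(raw.strip())
        if match:
            return CudaFailure(
                match["kernel"], match["file"], int(match["line"]), match["reason"]
            )
    return None


zydor_log = "Cuda error: Kernel [frc_sum_kernel_bond] failed in file 'force.cu' in line 283 : unknown error."
bymark_log = "Cuda error: Kernel [frc_sum_nb_forces] failed in file 'force.cu' in line 244 : unknown error."

for log in (zydor_log, bymark_log):
    print(parse_cuda_error(log))
```

Grouping failures by kernel name and line this way makes it easier to tell a recurring fault in one kernel from scattered failures that point at the host (drivers, Remote Desktop, hardware).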


____________
"Silakka"
Hello from Turku > Åbo.

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 7399 - Posted: 12 Mar 2009 | 20:27:14 UTC - in response to Message 7384.

I've had a couple of BSODs lately; there is a driver clash somewhere, so I guessed that this was probably related to the BSODs and put it down to Life's Sweet Pattern :) If it happens again within a short space of time, I'll "perk up" and dig a little.

Regards
Zy

Bymark
Joined: 23 Feb 09
Posts: 30
Credit: 5,897,921
RAC: 0
Message 7426 - Posted: 13 Mar 2009 | 20:02:39 UTC - in response to Message 7399.

My error was caused by Remote Desktop, and everything is now working fine.
Never use Windows Remote Desktop with GPU projects!

Regards

Silakka

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 7465 - Posted: 15 Mar 2009 | 10:50:35 UTC

lol, agreed. I learned that very soon after I started running SETI:
it crashed several times when I was using RDC :D
