Message boards : Graphics cards (GPUs) : Client error - Compute error

[boinc.at] Fireman69
Joined: 8 Oct 08
Posts: 15
Credit: 29,603,934
RAC: 0
Message 3696 - Posted: 6 Nov 2008 | 15:21:41 UTC

Has anyone ever had this problem? I am using the 6.3.21 client and have had this twice in a few days. Before that I was using 6.3.19 and never had this problem!
My system does not run 24/7; it only crunches when I am at home. Normally it takes me 2-3 days to finish a WU.
Does anyone have an idea?


<core_client_version>6.3.21</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 1
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
Cuda error: Kernel [reduce4_kernel] failed in file 'reduction.cu' in line 143 : unspecified launch failure.


<core_client_version>6.3.21</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [angle_kernel] failed in file 'bonded.cu' in line 547 : unspecified launch failure.


____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3702 - Posted: 6 Nov 2008 | 18:53:35 UTC

I'd reboot the machine and take the OC back, even if it's a factory-set one, and see if the error still appears.

You've got an interesting config with 2 quite different GPUs.

MrS
____________
Scanning for our furry friends since Jan 2002

[boinc.at] Fireman69
Joined: 8 Oct 08
Posts: 15
Credit: 29,603,934
RAC: 0
Message 3705 - Posted: 6 Nov 2008 | 22:34:47 UTC

As I said, I don't overclock the cards. I normally use this particular configuration for games. I use an Intel board without SLI. The video card is the GTX280; the 9800GT is only for PhysX. I hope that in the near future it will be possible to assign which card is used for crunching.
This configuration worked for crunching with client 6.3.19. Maybe the new client has a problem with it. Maybe I should reinstall the older one.
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3706 - Posted: 6 Nov 2008 | 23:01:20 UTC

The normal shader clock of a GTX 280 is 1.29 GHz and for the 9800GT it's 1.51 GHz, so both of your cards are factory overclocked. Which of your cards is actually crunching now? 2-3 days looks like the 9800GT. The OC on the GT doesn't look so bad, but the one on the GTX might lead to problems.
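
As a rough check of those numbers, here is a minimal Python sketch that turns the clock rates from the stderr output into an overclock percentage. The function name and the reference table are illustrative assumptions (the reference clocks are simply the figures quoted above), not part of any BOINC or GPU-Grid tool.

```python
import re

# Reference shader clocks in kHz, taken from the figures quoted in this post
# (assumed values for illustration, not read from any official table).
REFERENCE_KHZ = {"GeForce GTX 280": 1_290_000, "GeForce 9800 GT": 1_510_000}

def overclock_percent(stderr_text):
    """Map each known device name to its overclock in percent."""
    pattern = re.compile(
        r'# Device \d+: "([^"]+)"\s*\n# Clock rate: (\d+) kilohertz'
    )
    result = {}
    for name, khz in pattern.findall(stderr_text):
        if name in REFERENCE_KHZ:
            ratio = int(khz) / REFERENCE_KHZ[name]
            result[name] = round(100.0 * (ratio - 1.0), 1)
    return result

sample = '''# Using CUDA device 1
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
'''
print(overclock_percent(sample))
```

With the clock rates from the first post this gives roughly 8.8% for the GTX 280 and 7.3% for the 9800 GT.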

There have already been cases of factory-overclocked cards failing when a new game came out that stressed the cards in a way no software had done before. I think it was Doom 3, back in the day.

I seriously doubt it's got anything to do with 6.3.21. BOINC only launches the GPU-Grid client; the software which does the actual calculations hasn't changed. Your computers are hidden, so I can't take a look. Do the WUs fail somewhere in the middle or upon initialization?

MrS
____________
Scanning for our furry friends since Jan 2002

[boinc.at] Fireman69
Joined: 8 Oct 08
Posts: 15
Credit: 29,603,934
RAC: 0
Message 3712 - Posted: 7 Nov 2008 | 6:51:49 UTC
Last modified: 7 Nov 2008 | 6:57:57 UTC

You're right, they are factory OCed. But there was no problem with the first, let's say, 10 WUs. I saw that when a WU is crunched across a reboot, it does not always use the same GPU. How this is decided I don't know. So sometimes, when the 9800GT ends up doing the work, it is hard to meet the deadline.
If you want to check the details, I have two computers crunching with GPUs: 12834 (GTX280/9800GT) and 16306 (GTX280). It seems only 12834, with two cards, has this problem. Maybe the application has a problem when a WU starts on one card and, after the PC is restarted, resumes on the other card.


But I have already had such WUs, and there was no problem working on both cards.

<core_client_version>6.3.14</core_client_version>
<![CDATA[
<stderr_txt>
# Using CUDA device 1
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 1
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Using CUDA device 1
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Time per step: 32.226 ms
# Approximate elapsed time for entire WU: 27391.880 s
called boinc_finish

</stderr_txt>
]]>
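
To see at a glance which card each restart landed on, a small Python sketch like the following could be run over a saved copy of the stderr text. The helper name `devices_used` is hypothetical, and the sample only reproduces the "Using CUDA device" lines from the log above.

```python
import re
from collections import Counter

def devices_used(stderr_text):
    """Return the CUDA device index chosen at each (re)start, in order."""
    matches = re.findall(
        r"^# Using CUDA device (\d+)", stderr_text, re.MULTILINE
    )
    return [int(n) for n in matches]

sample = "\n".join([
    "# Using CUDA device 1",
    'MDIO ERROR: cannot open file "restart.coor"',
    "# Using CUDA device 1",
    "# Using CUDA device 1",
    "# Using CUDA device 0",
    "# Using CUDA device 0",
])

order = devices_used(sample)
print(order)           # device index per start, in log order
print(Counter(order))  # how many starts each device got
```

For this WU it shows three starts on device 1 (the 9800 GT) followed by two on device 0 (the GTX 280).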


Hmm, I checked some of the WUs on this machine, and those that ended with no error were all crunched with client 6.3.14. To me it looked like they were crunched normally; then I saw in my account that they got an error. Apart from that, I only had the same problems everyone had with the "download bug" a few days ago.
____________

[boinc.at] Fireman69
Joined: 8 Oct 08
Posts: 15
Credit: 29,603,934
RAC: 0
Message 3716 - Posted: 7 Nov 2008 | 16:34:03 UTC
Last modified: 7 Nov 2008 | 16:35:39 UTC

The WU that finished a few minutes ago had no problem with BOINC client 6.3.21. Very strange (and interesting, this "MDIO ERROR")! It looks like the application (this WU was 6.48) is always using CUDA device 0, which is good to know because this is the faster one.

<core_client_version>6.3.21</core_client_version>
<![CDATA[
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 799200 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
# Time per step: 24.948 ms
# Approximate elapsed time for entire WU: 21205.906 s
called boinc_finish

</stderr_txt>
]]>
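
As a sanity check on the timing lines, the following Python sketch divides the reported elapsed time by the reported time per step. It assumes the elapsed time is simply the step count times the time per step, which the log does not state; the function name is made up for illustration.

```python
import re

def implied_steps(stderr_text):
    """Estimate the step count from the two timing lines of a finished WU."""
    ms = float(
        re.search(r"# Time per step: ([\d.]+) ms", stderr_text).group(1)
    )
    total_s = float(
        re.search(
            r"# Approximate elapsed time for entire WU: ([\d.]+) s",
            stderr_text,
        ).group(1)
    )
    return total_s / (ms / 1000.0)

# Timing lines from the 6.3.14 log (ran on both cards) and the 6.3.21 log.
mixed = (
    "# Time per step: 32.226 ms\n"
    "# Approximate elapsed time for entire WU: 27391.880 s\n"
)
gtx_only = (
    "# Time per step: 24.948 ms\n"
    "# Approximate elapsed time for entire WU: 21205.906 s\n"
)

print(round(implied_steps(mixed)))
print(round(implied_steps(gtx_only)))
print(round(32.226 / 24.948, 2))  # per-step slowdown of the mixed run
```

Under that assumption both WUs come out at about 850,000 steps, with the mixed-card run needing about 29% more time per step.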
____________

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 135,911,881
RAC: 56
Message 3717 - Posted: 7 Nov 2008 | 16:41:13 UTC - in response to Message 3716.

Hey Fireman!

Have you enabled SLI? If SLI is enabled, only GPU 0 will be used...
____________

pixelicious.at - my little photoblog

[boinc.at] Fireman69
Joined: 8 Oct 08
Posts: 15
Credit: 29,603,934
RAC: 0
Message 3718 - Posted: 7 Nov 2008 | 16:46:26 UTC

Hey Stefan! How are you? Fine, I hope?!

No, I use an Intel board without SLI. The video card is the GTX280; the 9800GT is for games that use PhysX. But for me it is better when only the GTX280 is crunching; otherwise it would sometimes be a problem to stay within the deadline. My system does not run 24/7!

____________

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 3726 - Posted: 7 Nov 2008 | 17:34:54 UTC

[quote]
The WU that finished a few minutes ago had no problem with BOINC client 6.3.21. Very strange (and interesting, this "MDIO ERROR")! It looks like the application (this WU was 6.48) is always using CUDA device 0, which is good to know because this is the faster one.

<core_client_version>6.3.21</core_client_version>
<![CDATA[
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 280"
# Clock rate: 799200 kilohertz
# Device 1: "GeForce 9800 GT"
# Clock rate: 1620000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
[/quote]
That MDIO error is not a real error. At the initial start of a task, the application looks for a checkpoint file; since it has never written one yet, it does not find one, so the error shows up only on the initial start. Once it has checkpointed, a restart finds the file and you don't get the error again. Look at any task on your or anyone else's computer and it is always there.
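
The pattern described here can be sketched in a few lines of Python. This is only an illustration of "try the checkpoint, fall back to a fresh start", not the actual GPU-Grid code; the function names are hypothetical and only the file name `restart.coor` comes from the log.

```python
import os
import tempfile

def load_start_state(workdir, checkpoint_name="restart.coor"):
    """Resume from the checkpoint if it exists, otherwise start fresh."""
    path = os.path.join(workdir, checkpoint_name)
    try:
        with open(path, "rb") as fh:
            return "resumed", fh.read()
    except FileNotFoundError:
        # First start of the task: nothing has been checkpointed yet,
        # so the message is harmless and the run begins from scratch.
        print(f'MDIO ERROR: cannot open file "{checkpoint_name}"')
        return "fresh", b""

def write_checkpoint(workdir, data, checkpoint_name="restart.coor"):
    """Store the current state so a later restart can pick it up."""
    with open(os.path.join(workdir, checkpoint_name), "wb") as fh:
        fh.write(data)

with tempfile.TemporaryDirectory() as workdir:
    first = load_start_state(workdir)    # no file yet, message is printed
    write_checkpoint(workdir, b"step 1000")
    second = load_start_state(workdir)   # file exists, no message
```

The message appears once, on the first call, and never again after a checkpoint has been written, which matches what the stderr output shows.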

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 135,911,881
RAC: 56
Message 3729 - Posted: 7 Nov 2008 | 17:56:49 UTC - in response to Message 3718.

Yea Keith, but I think Fireman is talking about these errors -

Cuda error: Kernel [angle_kernel] failed in file 'bonded.cu' in line 547 : unspecified launch failure.

and

Cuda error: Kernel [reduce4_kernel] failed in file 'reduction.cu' in line 143 : unspecified launch failure.


Hey Stefan! How are you? Hope fine?!

No, I use an Intel board with no SLI. Videocard is the GTX280, the 9800GT is for Games which use PhysX. But for me it is better when only GTX280 is crunching, otherwise it would be sometimes a problem not to go over dead line. My system does not run 24/7!


Yes, thank you. Hope you're fine too!?
Sorry, I forgot that you are using an Intel board without SLI when I read your last post about only GPU 0 being used...
I also get computation errors sometimes, mostly on my 9800 GTX SC, which is also factory overclocked. Sometimes it runs fine for weeks, then it has two or three different errors in a row, and then it runs fine again... Could you try setting the shader/GPU/memory speed to the factory settings of a normal card to see if the problem vanishes?

____________

pixelicious.at - my little photoblog

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 3730 - Posted: 7 Nov 2008 | 18:10:58 UTC

@stefan
Can't quote properly since forum buttons are missing,

but I thought 'Very strange (and interesting this "MDIO ERROR")!' in his post meant the MDIO error shown in his post.

...
MDIO ERROR: cannot open file "restart.coor"
...

My comments refer to that.

[boinc.at] Fireman69
Joined: 8 Oct 08
Posts: 15
Credit: 29,603,934
RAC: 0
Message 3743 - Posted: 7 Nov 2008 | 20:23:28 UTC

Sorry. I didn't know that the MDIO ERROR is not an error. I don't have much time to read all the descriptions and discussions.
I will see whether it runs now, because I had an idea today. Maybe it is not a good idea to let the GPU crunch while I play LOTROL. Now I always stop the BOINC client and restart it after playing. I hope I don't forget this in the future. Possibly a conflict in video memory? We will see.
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3746 - Posted: 7 Nov 2008 | 21:17:58 UTC - in response to Message 3743.

Maybe it is no good idea to let the GPU crunch when I play LOTROL.


The usual error caused by gaming would be the "out of memory" message, which you're not getting. It won't hurt to switch off BOINC while gaming (or just suspend the active GPU-Grid task), but I doubt it will remove the cause of the errors. On a side note: how long would you have to run like that to be sure the error is gone?

MrS
____________
Scanning for our furry friends since Jan 2002

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 3750 - Posted: 7 Nov 2008 | 21:50:22 UTC - in response to Message 3743.

Sorry. I didn't know that MDIO ERROR is not an error. Not much time reading all descriptions and diskussions.
I will look if it is running now because I had an idea today. Maybe it is no good idea to let the GPU crunch when I play LOTROL. Now I always stop the Boinc client and restart after playing. Hope I don't forget this in future. Possible conflict in video memory??? We will see.


You can set the new option (6.3.13 or above) in cc_config.xml to suspend BOINC while an application is running; you need to fill in the application name, in this case your game. When you start the game, BOINC will suspend all tasks for you and automatically resume them when you exit the game. You of course also need to have the 'Leave applications in memory while suspended?' option in preferences set to no, so they will be removed from memory.

cc_config.xml
<cc_config>
<options>
<exclusive_app>appname.exe</exclusive_app>
</options>
</cc_config>

appname must appear exactly as the OS shows it and is case sensitive. No drive or path is allowed. You can have multiple exclusive_app options in the file.

Note: At this time, BOINC suspends all tasks, both CPU and GPU, when using this option.
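
If you have several games to list, a short Python sketch like this could generate the file contents instead of typing them by hand. The helper name and the executable names `game1.exe` and `game2.exe` are placeholders; only the `cc_config`, `options` and `exclusive_app` element names come from the example above.

```python
import xml.etree.ElementTree as ET

def build_cc_config(app_names):
    """Build cc_config.xml text with one exclusive_app entry per name."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for name in app_names:
        ET.SubElement(options, "exclusive_app").text = name
    return ET.tostring(root, encoding="unicode")

# Placeholder executable names; use the names your OS shows for your games.
xml_text = build_cc_config(["game1.exe", "game2.exe"])
print(xml_text)
```

The printed text would then be saved as cc_config.xml in the BOINC data directory, and the client has to re-read its config file or be restarted before the change takes effect.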

[AF>HFR>RR] alipse
Joined: 28 Oct 08
Posts: 4
Credit: 20,372,566
RAC: 0
Message 3776 - Posted: 9 Nov 2008 | 12:32:49 UTC

Hi,
I just had two WUs end in error.

0937.44
stderr out

<core_client_version>6.3.19</core_client_version>
<![CDATA[
<message>
Fonction incorrecte. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
# Clock rate: 1836000 kilohertz
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
# Clock rate: 1836000 kilohertz
Cuda error: Kernel [frc_sum_kernel_angle] failed in file 'force.cu' in line 223 : unspecified launch failure.

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 3232.06365740741


It's a bit annoying, because they failed at the end of the units, meaning I lost about 24 hours of crunching (and got no credits :( )

I'm using Windows XP 32-bit
9800 GTX+
BOINC 6.3.19
Drivers 180.43

Do you want me to post the output for the other invalid WU?
