Message boards : Graphics cards (GPUs) : Strange experiences...

UL1
Joined: 16 Sep 07
Posts: 56
Credit: 35,013,195
RAC: 0
Message 3142 - Posted: 19 Oct 2008 | 9:16:03 UTC

...in the last few days: at first, most of my rigs suddenly started to error out on almost all WUs (code 1 (0x1, -255)). I wouldn't have been surprised by this if I had changed anything on them... but I didn't. They had been running without any modifications (e.g. oc'ing) for some days/WUs. The only thing I did was change network activity from 'suspended' to 'always available'... but I can't believe that this could cause such errors. Now I have swapped the video cards between two of them, and so far it seems that they are back to normal (but I also suspended network activity)...
Then I had a look at my 'oldest' rig lately: the last WU was successfully submitted at 02:03... and 6 hours later, when I wanted to send another two finished WUs, I got the message: 'Client detached'...
What? I haven't done anything to that rig in the meantime... ???

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 3147 - Posted: 19 Oct 2008 | 10:14:20 UTC
Last modified: 19 Oct 2008 | 10:14:44 UTC

That's really strange. You're running Linux, so the usual "well, guess it was time for a reboot" doesn't fit. And you're running the well-tested 6.3.10, which should eliminate another possible cause of errors.
However, changing GPUs does require you to reboot... even under Linux, doesn't it?

MrS
____________
Scanning for our furry friends since Jan 2002

UL1
Message 3150 - Posted: 19 Oct 2008 | 10:59:12 UTC

Yep, for sure... ;)
But I changed cards after the rigs started acting stupid...
And my 'old' rig kept its cards...

ExtraTerrestrial Apes
Message 3152 - Posted: 19 Oct 2008 | 12:30:17 UTC - in response to Message 3150.

"But I changed cards after the rigs started acting stupid..."

I meant to suggest that the reboot needed for swapping the cards may have solved the problem. But this wouldn't explain why all of your machines are / were affected.

MrS

UL1
Message 3155 - Posted: 19 Oct 2008 | 13:42:31 UTC

Oh, I see...
Had a reboot before changing cards...and the rigs still acted crazy...
Anyway, I hope that's history now...

koschi
Joined: 14 Aug 08
Posts: 124
Credit: 466,579,198
RAC: 50,064
Message 3183 - Posted: 21 Oct 2008 | 0:03:53 UTC

I got that type of error on earlier versions of the BOINC client: when I set it to "always allow network access" and the network failed, it crashed almost all units. Some projects were more resistant to this than others, so they kept running while the others errored out.

Now it is somehow better: even when my wireless fails, everything keeps running fine, with no errors.
