Message boards : Graphics cards (GPUs) : 6.3.14 just quits

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 3044 - Posted: 14 Oct 2008 | 21:42:52 UTC

Hi guys,

Since I switched from 6.3.10 to 6.3.14 this weekend I have been getting some strange behaviour: the BOINC manager says "disconnected" and I find that the BOINC client has quit, taking all apps with it. The last entry in the log is always a file transfer, but that does not necessarily mean anything.

My CPU is overclocked, but the GPU is not, and there has not been a single error in the last 1000 QMC units. The machine is freshly rebooted. I have had these quits 3 times within 3 days now and have never seen such behaviour from BOINC before, so I tend to blame the new client (not 100% sure though).

I'm going to downgrade to 6.3.10 and see if it happens again. Any other suggestions?

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Message 3045 - Posted: 14 Oct 2008 | 21:51:34 UTC

Well, actually I did play a bit with the GPU clocks, but the error still happened when I went back to standard clocks. You can see an example here. Note the message "No heartbeat from core client for 30 sec - exiting". Shortly after the WU finished the error happened again, which should be visible in the output from the next WU... coming soon ;)
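
For readers unfamiliar with the heartbeat message quoted above, here is a minimal sketch of the general watchdog idea: a worker exits if it has not heard from its parent within a timeout. This is not BOINC's actual code. The class names `HeartbeatWatchdog` and `FakeClock` are hypothetical, and how the core client really delivers heartbeats is not modelled; only the 30-second figure comes from the quoted message.

```python
import time


class HeartbeatWatchdog:
    """Tracks how long it has been since the last heartbeat was seen."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s   # 30 s, as in the quoted message
        self.clock = clock           # injectable so the demo needs no real waiting
        self.last_beat = clock()

    def beat(self):
        """Record that a heartbeat from the parent process just arrived."""
        self.last_beat = self.clock()

    def expired(self):
        """True once more than timeout_s has passed without a heartbeat."""
        return self.clock() - self.last_beat > self.timeout_s


class FakeClock:
    """Hypothetical test clock that only moves when told to."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
watchdog = HeartbeatWatchdog(timeout_s=30.0, clock=clock)

clock.advance(29.0)
print(watchdog.expired())   # False: still inside the 30 s window

watchdog.beat()             # parent checks in, timer restarts
clock.advance(31.0)
print(watchdog.expired())   # True: a worker would exit at this point
```

Seen this way, the message in the WU output is a symptom rather than a cause: the app exits because the client stopped talking to it, which fits the client itself having quit.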

MrS
____________
Scanning for our furry friends since Jan 2002

Kokomiko
Message 3046 - Posted: 14 Oct 2008 | 22:44:48 UTC
Last modified: 14 Oct 2008 | 22:46:07 UTC

Some of our team mates have also downgraded to 6.3.10. They had too many problems with the scheduler when using ncpus=5: PS3Grid would sit at "waiting to run" while 5 CPU tasks were running. This didn't happen with 6.3.10.
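
For context, ncpus is normally set in the client's cc_config.xml. The post does not say how the team mates set it, so the sketch below only assumes the usual `<options><ncpus>` layout; the helper name `build_cc_config` is hypothetical. It generates the fragment rather than writing it anywhere.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ncpus):
    """Return a minimal cc_config.xml document that overrides the CPU count."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Telling the client there are 5 CPUs on a quad-core leaves a slot
    # for the task that feeds the GPU (the ncpus=5 trick from the post).
    ET.SubElement(options, "ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(5))
```

The printed text would go into cc_config.xml in the BOINC data directory, and the client has to re-read its config file or be restarted before the change applies.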
____________

UL1
Message 3050 - Posted: 15 Oct 2008 | 9:24:43 UTC - in response to Message 3046.

Had the same problem on one of my Linux rigs: BOINC would run normally until it started to 'phone home', then the app would quit completely. When I set it to 'don't contact server' or unplugged the LAN cable it would run without problems... until I had to connect again to submit results...
So I had to switch back to 6.3.10...
I've got no idea what could be the reason for this behaviour...

ExtraTerrestrial Apes
Message 3059 - Posted: 15 Oct 2008 | 17:34:05 UTC

Thanks for reporting! I think I'll run 6.3.10 for some time (to see if it's still as stable as before). If I switch back to 6.3.14 or a later version, I'll check whether I get the same behaviour as you do (i.e. whether the quits are directly related to file transfers).

MrS
____________
Scanning for our furry friends since Jan 2002

Krunchin-Keith [USA]
Message 3066 - Posted: 15 Oct 2008 | 20:18:15 UTC

I believe the disconnected issue has been resolved, but we have to wait for the next release to see. If you have this issue, stick with 6.3.10 for now.

ExtraTerrestrial Apes
Message 3070 - Posted: 15 Oct 2008 | 20:37:38 UTC

That's good to hear, thanks for the information.

MrS
____________
Scanning for our furry friends since Jan 2002

UL1
Message 3081 - Posted: 17 Oct 2008 | 10:01:16 UTC

New try...and new surprise... ;)

Made a fresh installation of 6.3.14 on a recently updated Linux rig (Q9550 @ 3,96GHz, Kernel 2.6.24-21, NVIDIA driver 177.80) and everything was fine as long as Milkyway was running solo. Then I attached to PS3GRID, downloaded 4 WUs... and the next time BOINC connected to the server the app stopped...
Haven't tried yet what happens when I attach PS3GRID first...

Krunchin-Keith [USA]
Message 3086 - Posted: 17 Oct 2008 | 14:49:56 UTC - in response to Message 3081.

6.3.14 has bugs in the scheduler that prevent it from making the correct decisions on what to run. It runs CPU-only jobs fine, but when you combine CPU jobs with CPU/GPU jobs it can get confused, although sometimes it will do OK. I found it does better with less CPU work on standby. I've set connect to 0.04 and additional work to 0.08 so as to keep less CPU work ready. It seems to be doing better, but not always.

Everyone should stay with 6.3.10 until a new version has been tested. Fixes are under way, but I do not know when it will be available.
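
To put the 0.04 and 0.08 settings into more familiar units, here is a quick conversion sketch. It assumes both values are BOINC work-buffer preferences expressed in days, which the post does not state outright; the variable names are illustrative only.

```python
MINUTES_PER_DAY = 24 * 60


def days_to_minutes(days):
    """Convert a preference value given in days to minutes."""
    return days * MINUTES_PER_DAY


# Values from the post; the unit (days) is an assumption.
connect_interval_days = 0.04
additional_work_days = 0.08

for label, days in (
    ("connect interval", connect_interval_days),
    ("additional work", additional_work_days),
):
    print(f"{label}: {days} days = {days_to_minutes(days):.1f} minutes")
```

Under that assumption the client keeps only about an hour of work as its base buffer plus about two hours on top, which gives the scheduler far less queued CPU work to juggle.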

Jayargh
Message 3093 - Posted: 18 Oct 2008 | 4:43:41 UTC - in response to Message 3086.

Keith, staying with 6.3.10 would be fine, but when I was adding a new Q9550 today I could not find it anywhere. They have yanked the links from BOINC and from here. The only available client is 6.3.14.

localizer
Message 3094 - Posted: 18 Oct 2008 | 5:02:44 UTC - in response to Message 3093.

@JR - try here: http://boinc.berkeley.edu/dl/?C=M;O=D

P.
