Message boards : Graphics cards (GPUs) : Too many errors....

Chris S
Joined: 18 Jan 09
Posts: 21
Credit: 3,950,530
RAC: 0
Message 6416 - Posted: 5 Feb 2009 | 0:18:07 UTC

Well, I dunno what's happened, but the last 16 work units have errored out. I went back from 6.6.3 to 6.5.0 as it was not getting any new work, but it's not helping. I can't carry on like this; it's ridiculous. So I'm suspending any new work for the time being.

Sorry guys, but it's pointless carrying on at the moment until some sort of fix is available.

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Message 6429 - Posted: 5 Feb 2009 | 8:14:39 UTC - in response to Message 6416.

I've got the same problem, but with my slower card only, the 8800GTS. It did one 6.62 application with Boinc 6.6.3 and one when I went back to 6.5.0, but since February 1st it hasn't been able to complete a single task successfully.

I've tried:

Re-booting
Resetting the project
Detach and Re-attach
etc etc.

I thought it might be a disk fault, as it mentions in the log that it can't open a file, but chkdsk didn't find anything.

So far my GTX260 is still doing its stuff so I don't know whether the 6.62 application doesn't like the slower cards and I was just lucky to get a couple through.

I'll try 6.6.4 on the machine with the 8800GTS later and see if that helps.

Phoneman1

STE\/E
Joined: 18 Sep 08
Posts: 366
Credit: 268,380,907
RAC: 720
Message 6433 - Posted: 5 Feb 2009 | 12:53:26 UTC

So far the 8800GT OC card I'm running has been okay with the 6.62 version. Are you both running the 181.22 Nvidia drivers?

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Message 6437 - Posted: 5 Feb 2009 | 14:53:49 UTC - in response to Message 6433.

No. I'm still running 178.24 after an abortive try of 180.48 last year. Since going back to 178.24 I've had no problems until Feb 1st so I am not sure that is the answer. Of course 6.62 may be using features only in the 181.22 drivers but I've not seen any posts to that effect.

I've got a WU running at the moment that should finish in 36 hours or so. If it doesn't go the distance, I'll try 181.22 next. Thanks for the suggestion.

Phoneman1

Bender10
Joined: 3 Dec 07
Posts: 167
Credit: 8,368,897
RAC: 0
Message 6438 - Posted: 5 Feb 2009 | 15:32:43 UTC - in response to Message 6429.
Last modified: 5 Feb 2009 | 15:33:23 UTC


I'll try 6.6.4 on the machine with the 8800GTS later and see if that helps.

Phoneman1


I know you meant to say 8600 GTS...;)
____________


Consciousness: That annoying time between naps......

Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it.

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Message 6441 - Posted: 5 Feb 2009 | 16:43:14 UTC - in response to Message 6438.

Yes, one typo then I typed the same again!

Well, it has failed again and I'm downloading 181.22 so we'll see how that goes with Boinc 6.6.4 first.

Phoneman1

Chris S
Joined: 18 Jan 09
Posts: 21
Credit: 3,950,530
RAC: 0
Message 6443 - Posted: 5 Feb 2009 | 17:41:54 UTC

I'm running 181.20 + reverted back from 6.6.3 to 6.5.0

I'll wait to see what success others have with various combinations... :-)

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Message 6448 - Posted: 5 Feb 2009 | 18:20:50 UTC - in response to Message 6441.

Well, it has failed again and I'm downloading 181.22 so we'll see how that goes with Boinc 6.6.4 first.


First task failed within 30 minutes! Second one is still running after 40 minutes. If this fails, I'll try another GPU project to see whether the card has developed a fault.

I don't really want to swap cards with the GTX260, as it was such a fiddle getting that one in the box due to its size!

Phoneman1

[XTBA>XTC] D.I.Y.Calculat...
Joined: 2 Sep 08
Posts: 4
Credit: 15,705,436
RAC: 0
Message 6450 - Posted: 5 Feb 2009 | 19:28:22 UTC - in response to Message 6448.

I'm so disappointed:

I just bought two GTX295s to crunch GPUGRID, but:
1- the driver is Nvidia 180.87/XP32 (because 181.22 has many problems when unlinking SLI mode)
2- it worked one week ago, but now every WU is flagged with an errored result
3- but my GTX295 works well for... GPU/seti!
4- and my GTX260 works well for... GPUGRID, but with the 178.24/XP32 drivers

Please, is somebody doing something?

____________

Chris S
Joined: 18 Jan 09
Posts: 21
Credit: 3,950,530
RAC: 0
Message 6452 - Posted: 5 Feb 2009 | 19:39:11 UTC

We have to accept that this is an Alpha project...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 6454 - Posted: 5 Feb 2009 | 20:02:45 UTC

The BOINC manager is not doing the actual crunching; that's the science app, and it is the one reporting the errors. So don't put too much effort into switching BOINC versions (as long as you use one which is known to work reasonably well, e.g. 6.4.5 or 6.5.0).

@Chris

You had many errors before your current run of "all errors". Stock shader speed of the 8600GT is 1.20 GHz, whereas yours started at 1.30 GHz, so it's likely factory overclocked. Now you are at 1.50 GHz, quite an increase!
Try taking back your OC, switch the machine off for >10 min and see if it helps. Also report your GPU temperature (it may take some time to stabilize).

In principle 6.62 did work on your machine. I didn't get a single error with that 6.62 and drivers 178.24 on a 9800GTX+, which is faster but the same chip architecture. So I don't think we're seeing any systematic error here.

@Phoneman

I checked your last 40 results and each and every WU you ran at 1.45 GHz shader clock (stock for your 8600GTS) was successful, whereas each and every WU you ran at 1.55 GHz errored out. Do I need to say more? ;)
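
That kind of check boils down to grouping results by shader clock and counting the errors in each group. A minimal sketch in Python, assuming the results have been copied by hand into (shader clock in GHz, succeeded) pairs; the function name and the sample values are made up for illustration and are not the actual task list:

```python
from collections import defaultdict


def error_rate_by_clock(results):
    """Return {clock_ghz: (errors, total)} for (clock_ghz, succeeded) pairs."""
    counts = defaultdict(lambda: [0, 0])  # clock -> [errors, total]
    for clock_ghz, succeeded in results:
        counts[clock_ghz][1] += 1
        if not succeeded:
            counts[clock_ghz][0] += 1
    return {clock: tuple(c) for clock, c in sorted(counts.items())}


# Hypothetical sample data, NOT taken from anyone's real results.
sample = [(1.45, True), (1.45, True), (1.55, False), (1.45, True), (1.55, False)]

for clock, (errors, total) in error_rate_by_clock(sample).items():
    print(f"{clock:.2f} GHz: {errors}/{total} errored")
```

If the errors pile up at one clock and vanish at another, the overclock is the first suspect.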

@DIY

Sorry, I don't see anything obvious in your results. You did have some successful ones. We already had 2 GTX 295s fail after a short period of use, so maybe it's a good idea to check the cards separately in another box. You could also try taking one of them out to relieve the power supply from stress.

MrS
____________
Scanning for our furry friends since Jan 2002

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Message 6457 - Posted: 5 Feb 2009 | 20:59:53 UTC - in response to Message 6454.

Thank you MrS. I had forgotten that change.

The current work unit has been run without RivaTuner running, as I got a warning about getting a later version when I installed the latest drivers earlier this evening. I have temporarily suspended the GPU to set the RT overclock panel to defaults and exited RT. I think I'll only use RT to monitor temps in future!

Phoneman1

Edboard
Joined: 24 Sep 08
Posts: 72
Credit: 12,410,275
RAC: 0
Message 6481 - Posted: 6 Feb 2009 | 23:37:27 UTC

For a few days now I've been getting "compute error", but only with the kind of units which grant 3718.48 points. The errored units are mainly on gpu1, but two of them completed OK on that GPU core.

gtx295 stock clocks processing in both gpus (gpu0/gpu1)
Drivers 180.87
Windows Vista Home Premium 32 bits

Chris S
Joined: 18 Jan 09
Posts: 21
Credit: 3,950,530
RAC: 0
Message 6494 - Posted: 8 Feb 2009 | 12:01:49 UTC

@Chris

You had many errors before your current run of "all errors". Stock shader speed of the 8600GT is 1.20 GHz, whereas yours started at 1.30 GHz, so it's likely factory overclocked. Now you are at 1.50 GHz, quite an increase!
Try taking back your OC, switch the machine off for >10 min and see if it helps. Also report your GPU temperature (it may take some time to stabilize).


Yes, you are right, it is a factory overclocked card, and I may have been a bit overkeen with the clocking! I've wound it back some now, and reverted to 6.5.0, which seems to be working. Fingers crossed.

I should have been able to push the card further but maybe not after all. Seems you were right, thanks for the advice.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 6498 - Posted: 8 Feb 2009 | 14:12:52 UTC - in response to Message 6494.

Good to hear that it's working again! Regarding the "should clock higher": the typical overclocks you find in forums are likely not that stable. In games a graphical artefact every now and then doesn't matter much.

MrS
____________
Scanning for our furry friends since Jan 2002
