Message boards : Graphics cards (GPUs) : More bad WUs

Author Message
Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 9823 - Posted: 16 May 2009 | 4:02:43 UTC

I had a WU error out tonight on a GPU that never errors (except for the known bad WUs).
Looked at the WU and it's errored 3 times already and is still being sent out. Take a look:

http://www.gpugrid.net/workunit.php?wuid=465437

Old drivers, new drivers, fast cards, slow cards, it doesn't matter. Are all the xxx-GIANNI_FB WUs bad?

Profile Beyond
Message 9824 - Posted: 16 May 2009 | 4:25:17 UTC
Last modified: 16 May 2009 | 5:14:45 UTC

More failed ones:

http://www.gpugrid.net/workunit.php?wuid=465495
http://www.gpugrid.net/workunit.php?wuid=465306
http://www.gpugrid.net/workunit.php?wuid=465388
http://www.gpugrid.net/workunit.php?wuid=465441
http://www.gpugrid.net/workunit.php?wuid=465460
http://www.gpugrid.net/workunit.php?wuid=465441

Edit: Here's one that completed on a GTX 260 using v185.85 drivers:

http://www.gpugrid.net/workunit.php?wuid=465330

Profile Beyond
Message 9826 - Posted: 16 May 2009 | 6:09:48 UTC

Another failed one, that's 2 in a row:

http://www.gpugrid.net/result.php?resultid=677190

That's it, I'm aborting these when I see them...

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 9827 - Posted: 16 May 2009 | 7:47:52 UTC - in response to Message 9826.
Last modified: 16 May 2009 | 8:11:12 UTC

This is the mystery of the century. I am sorry for the problems, but these WUs run fine in the lab on an 8800GT. We can't see any reason why others work and these don't.
Feel free to cancel them. We will further debug it on Monday.

Sorry for any inconvenience.

gdf

Profile GDF
Message 9828 - Posted: 16 May 2009 | 7:51:20 UTC - in response to Message 9827.

Can you summarize for me one of your systems where you have problems:
OS/driver/BOINC version/cuda toolkit installed

gdf

Profile Beyond
Message 9842 - Posted: 16 May 2009 | 11:47:17 UTC - in response to Message 9828.
Last modified: 16 May 2009 | 11:49:48 UTC

Can you summarize for me one of your systems where you have problems:
OS/driver/BOINC version/cuda toolkit installed

gdf


The machine on which 2 of these have failed in a row has a 9600 GSO card, v185.81 drivers, BOINC v6.6.24, an AMD Phenom 9600 quad, and WinXP64. It's had no other failures except one of the dreaded KASHIF_HIVPR WUs. You'll notice, though, that these xxx-GIANNI_FB WUs are failing on a variety of machines with old and new drivers, slower and faster cards. Of the ones that I've now found completed, 2 were done on a GTX 295 and 1 on a GTX 260. Thanks for the quick reply!

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 9854 - Posted: 16 May 2009 | 13:16:14 UTC

I don't know if I have just been real unlucky to get right in the middle of a batch of tasks that will not run anywhere... or, I have a problem with the new motherboards that is not obvious....

New i7 920 machines, Asus Rampage II motherboards, 6 GB tri-channel RAM

In one of the new machines I put two new GTX260 cards, and when the tasks started to fail I assumed a bad driver, so I rolled back to 182.50 and burned a couple more tasks. I then built the second system and put into it the GTX280 card that had been running tasks in the Q9300 just fine, thank you very much.

It started to burn tasks too... I have gotten a couple of different error patterns. One is "invalid function", which may be related to an overclock mode that got turned on because I did not know what I was doing. Getting that turned off and the BIOS updated cleared that one up, only for the tasks to move on to:

The other error is: One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003)
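The two numbers in that error line are the same 32-bit value written two ways: the decimal exit code is the signed reading, and the hex code is the unsigned reading. A minimal sketch of the conversion, assuming standard two's-complement arithmetic (the helper names are made up for illustration and are not part of BOINC):

```python
def exit_code_to_hex(code: int) -> str:
    """Return the unsigned 32-bit hex form of a signed 32-bit exit code."""
    # Masking with 0xFFFFFFFF maps a negative value onto its
    # two's-complement bit pattern.
    return f"0x{code & 0xFFFFFFFF:08X}"


def hex_to_signed(value: int) -> int:
    """Interpret an unsigned 32-bit value as a signed 32-bit integer."""
    # If the top bit is set, the signed reading is value - 2**32.
    return value - (1 << 32) if value & 0x80000000 else value


print(exit_code_to_hex(-2147483645))  # prints 0x80000003
print(hex_to_signed(0x80000003))      # prints -2147483645
```

This only shows that both figures describe one code; it says nothing about what caused the failure.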

I am running SaH and SaH Beta tasks and they are completing, though with the long delays in validation it is hard to know if I am returning junk or not.

The new systems are w03 and w04.

I have not updated W03 yet, so that is next ...

But helpful hints would be nice ...

Profile mike047
Joined: 21 Dec 08
Posts: 47
Credit: 7,330,049
RAC: 0
Level
Ser
Message 9865 - Posted: 16 May 2009 | 15:43:14 UTC - in response to Message 9828.

Can you summarize for me one of your systems where you have problems:
OS/driver/BOINC version/cuda toolkit installed

gdf


I have never installed the toolkit on any setup.....Is this a requirement of Nvidia for proper CUDA operation???

I have been without the toolkit for 5 months so far, and until recently without issues.

____________
mike

Profile Paul D. Buck
Message 9871 - Posted: 16 May 2009 | 18:33:52 UTC - in response to Message 9865.

Can you summarize for me one of your systems where you have problems:
OS/driver/BOINC version/cuda toolkit installed

gdf


I have never installed the toolkit on any setup.....Is this a requirement of Nvidia for proper CUDA operation???

I have been without the toolkit for 5 months so far, and until recently without issues.

He was asking for the version of it in case you had ... in your case, and mine, it would be "not installed".

Profile Beyond
Message 9875 - Posted: 16 May 2009 | 20:56:06 UTC - in response to Message 9824.

More failed ones:

http://www.gpugrid.net/workunit.php?wuid=465495
http://www.gpugrid.net/workunit.php?wuid=465306
http://www.gpugrid.net/workunit.php?wuid=465388
http://www.gpugrid.net/workunit.php?wuid=465441
http://www.gpugrid.net/workunit.php?wuid=465460
http://www.gpugrid.net/workunit.php?wuid=465441

Edit: Here's one that completed on a GTX 260 using v185.85 drivers:

http://www.gpugrid.net/workunit.php?wuid=465330

All but one of these have finished now, but only on the fastest cards (all but one on GTX 295 cards, in fact); the slower cards failed every time.

Profile Paul D. Buck
Message 9879 - Posted: 16 May 2009 | 21:10:28 UTC

Horrible time last night while I contemplated whether my brand new, shiny, new-car-smelling computers were bad ... well, I completed the first good task on W03 just now ...

Downloaded two more new ones, and since all the prior experience was that they would not even start, and these both have 4-5 minutes on the clock ... looking better ... so now on to testing whether the two cards will run tasks to completion ... later tonight we shall know ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 9881 - Posted: 16 May 2009 | 21:23:24 UTC - in response to Message 9879.

I didn't think by "burn in" they actually meant burning the WUs.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Paul D. Buck
Message 9889 - Posted: 16 May 2009 | 23:48:54 UTC - in response to Message 9881.

I didn't think by "burn in" they actually meant burning the WUs.

See, I KNEW I was doing something wrong ... just could not put my finger on it ... well we are still chunking along on both new systems working on the three tasks I have in hand.

When I shut down the other Dell I will pull the 9800GT from it and put it in the system with a GTX280 until I can afford to up-engine again ... likely will wait for another sale at Frys ... :)

Profile Bymark
Joined: 23 Feb 09
Posts: 30
Credit: 5,897,921
RAC: 0
Level
Ser
Message 9953 - Posted: 18 May 2009 | 20:43:06 UTC - in response to Message 9889.
Last modified: 18 May 2009 | 20:48:41 UTC

Here is one that behaved badly...

name p2450000-IBUCH_pYIpYV_1205-4-10-RND8115
application Full-atom molecular dynamics
created 16 May 2009 19:41:56 UTC

errors Too many error results

Task ID  Computer  Sent                      Reported                  Server state  Outcome       Client state   CPU time (sec)  Claimed credit  Granted credit
681535   30079     16 May 2009 20:15:32 UTC  16 May 2009 20:17:10 UTC  Over          Client error  Compute error  3.08            0.02            ---
681619   20240     16 May 2009 20:23:33 UTC  16 May 2009 22:54:49 UTC  Over          Client error  Compute error  4.68            0.02            ---
682057   33159     16 May 2009 23:14:09 UTC  17 May 2009 5:28:42 UTC   Over          Client error  Compute error  3.27            0.02            ---
683015   31318     17 May 2009 5:29:35 UTC   17 May 2009 23:53:37 UTC  Over          Client error  Compute error  3.40            0.01            ---
686647   35052     17 May 2009 23:54:16 UTC  18 May 2009 7:21:43 UTC   Over          Client error  Compute error  2.93            0.01            ---
687813   28987     18 May 2009 7:22:03 UTC   18 May 2009 16:33:35 UTC  Over          Client error  Compute error  3.39            0.02            ---
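For anyone who wants to compare these attempts, here is a rough sketch that parses the task rows above and prints, per task, the time between send and report and the CPU time used. The column meanings (task ID, computer ID, sent, reported, then CPU seconds third from the end) are my assumption based on the usual BOINC workunit page layout, and the names `Task` and `parse_row` are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

# Rows copied from the post; column meanings are assumed, not confirmed.
ROWS = """\
681535 30079 16 May 2009 20:15:32 UTC 16 May 2009 20:17:10 UTC Over Client error Compute error 3.08 0.02 ---
681619 20240 16 May 2009 20:23:33 UTC 16 May 2009 22:54:49 UTC Over Client error Compute error 4.68 0.02 ---
682057 33159 16 May 2009 23:14:09 UTC 17 May 2009 5:28:42 UTC Over Client error Compute error 3.27 0.02 ---
683015 31318 17 May 2009 5:29:35 UTC 17 May 2009 23:53:37 UTC Over Client error Compute error 3.40 0.01 ---
686647 35052 17 May 2009 23:54:16 UTC 18 May 2009 7:21:43 UTC Over Client error Compute error 2.93 0.01 ---
687813 28987 18 May 2009 7:22:03 UTC 18 May 2009 16:33:35 UTC Over Client error Compute error 3.39 0.02 ---
"""


class Task(NamedTuple):
    task_id: int
    computer_id: int
    sent: datetime
    reported: datetime
    cpu_seconds: float


def parse_row(line: str) -> Task:
    """Split one whitespace-separated row into typed fields."""
    t = line.split()
    fmt = "%d %b %Y %H:%M:%S"
    # Tokens 2-5 and 7-10 are the two timestamps; tokens 6 and 11 are "UTC".
    sent = datetime.strptime(" ".join(t[2:6]), fmt)
    reported = datetime.strptime(" ".join(t[7:11]), fmt)
    # The last three tokens are CPU time, claimed credit, granted credit.
    return Task(int(t[0]), int(t[1]), sent, reported, float(t[-3]))


tasks = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

for task in tasks:
    turnaround = task.reported - task.sent
    print(task.task_id, task.computer_id, turnaround, task.cpu_seconds)
```

Every attempt used under 5 seconds of CPU time on a different computer, which would fit a task that dies at startup rather than partway through, though the rows alone cannot say why.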
____________
"Silakka"
Hello from Turku > Åbo.
