Message boards : Graphics cards (GPUs) : Recent Error's on my i7 !!!

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10710 - Posted: 19 Jun 2009 | 23:46:34 UTC
Last modified: 19 Jun 2009 | 23:48:55 UTC

Noticed I've been getting a lot of errors on my i7 the last 2 days. It was running well up until then, so I'm not sure if it's the computer or the WUs causing the errors.

Name p10000-IBUCH_pYIpYV_1406-5-10-RND8554_1
Workunit 550695
Created 19 Jun 2009 11:47:45 UTC
Sent 19 Jun 2009 11:49:17 UTC
Received 19 Jun 2009 23:35:10 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 98 (0x62)
Computer ID 15842
Report deadline 24 Jun 2009 11:49:17 UTC
CPU time 2088.641
stderr out <core_client_version>6.5.0</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 3
# Device 0: "GeForce GTX 295"
# Clock rate: 1512000 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1512000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 2: "GeForce GTX 280"
# Clock rate: 1404000 kilohertz
# Total amount of global memory: 1073479680 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 3: "GeForce GTX 260"
# Clock rate: 799200 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11: cufftExecR2C (gridcalc1)
called boinc_finish

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 3908.76967592593
Granted credit 0
application version 6.64

--------------------------------------------------------------------------------

popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 40,277,822
RAC: 0
Message 10711 - Posted: 19 Jun 2009 | 23:51:44 UTC

Looks like a problem with your gtx260
Bob

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10712 - Posted: 19 Jun 2009 | 23:59:59 UTC - in response to Message 10711.
Last modified: 20 Jun 2009 | 0:09:44 UTC

Looks like a problem with your gtx260
Bob


How did you come to that conclusion ???

If you're looking at the clock rate for the 260, it's low because it's not in an X2 slot.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10719 - Posted: 20 Jun 2009 | 10:17:13 UTC

On your last 2 result pages I see 9 errors which occurred while running on the GTX 260 (usually CUDA device 3), whereas only one complete and one partial WU were crunched successfully by this card. The remaining errors are related to the oversized WUs, so none of them can be attributed to the other cards.

That doesn't have to mean it's broken. I'd check it in a different PC, preferably one with fewer other GPUs. It could also be that the 2D settings are somehow messed up, e.g. voltage or fan speed too low for whatever reason.

MrS
____________
Scanning for our furry friends since Jan 2002
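
The per-card tally described above can also be scripted. The following is a minimal Python sketch, assuming each result's stderr text has been saved and paired with a success flag; the `(stderr_text, succeeded)` structure, the function name, and the sample data are all made up for illustration. Only the "# Using CUDA device N" line format comes from the task output posted in this thread.

```python
import re
from collections import Counter

# Line format taken from the stderr output shown in this thread.
USING_RE = re.compile(r"^# Using CUDA device (\d+)", re.MULTILINE)

def tally_by_device(results):
    """Count failed and successful tasks per CUDA device index.

    `results` is an iterable of (stderr_text, succeeded) pairs, which is
    a hypothetical structure. Tasks without a "# Using CUDA device" line
    are skipped.
    """
    failed, ok = Counter(), Counter()
    for stderr_text, succeeded in results:
        match = USING_RE.search(stderr_text)
        if match is None:
            continue
        device = int(match.group(1))
        (ok if succeeded else failed)[device] += 1
    return failed, ok

# Made-up sample shaped like the counts described above:
# nine failures and one success on device 3, plus successes elsewhere.
sample = (
    [("# Using CUDA device 3\n", False)] * 9
    + [("# Using CUDA device 3\n", True)]
    + [("# Using CUDA device 0\n", True)] * 4
)
failed, ok = tally_by_device(sample)
for device in sorted(set(failed) | set(ok)):
    print(device, "failed:", failed[device], "ok:", ok[device])
```

A device with many failures and almost no successes, while the others stay clean, points at that card or its slot rather than at the work units.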

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10727 - Posted: 20 Jun 2009 | 12:14:20 UTC - in response to Message 10719.
Last modified: 20 Jun 2009 | 12:14:57 UTC

On your last 2 result pages I see 9 errors which occurred while running on the GTX 260 (usually CUDA device 3), whereas only one complete and one partial WU were crunched successfully by this card. The remaining errors are related to the oversized WUs, so none of them can be attributed to the other cards.

That doesn't have to mean it's broken. I'd check it in a different PC, preferably one with fewer other GPUs. It could also be that the 2D settings are somehow messed up, e.g. voltage or fan speed too low for whatever reason.

MrS


That's what I was just about to do, ETA: pull the 260 from the i7, swap it into another computer, and put the card from the other computer into the i7. That way, if I keep getting errors on the i7, it probably just doesn't want to run the WUs properly in the 3rd slot for some reason.

I'll also get to see whether the box I put the 260 into starts to get errors or not.

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10729 - Posted: 20 Jun 2009 | 13:45:14 UTC
Last modified: 20 Jun 2009 | 13:52:50 UTC

I'm switching the cards now, had some other stuff to do first to the Box the 260 was going in. Since it was going to be down a little I figured I'd do that first. Putting a GTX 295 in the Slot the 260 was in to see if that starts getting errors in that Slot.

But how can you tell the 260 was the one that erred? Is it because it's the last card listed, or because of the following output, where the error and warnings appear right after it:

# Device 3: "GeForce GTX 260"
# Clock rate: 799200 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : the launch timed out and was terminated.

popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 40,277,822
RAC: 0
Message 10730 - Posted: 20 Jun 2009 | 19:21:39 UTC - in response to Message 10729.

But how can you tell the 260 was the one that erred


Simply look at the erroneous task and look for the line that says "# Using CUDA device X"

Bob
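
For anyone who wants to automate that check, here is a minimal Python sketch. It assumes the stderr text follows the same layout as the task output posted earlier in this thread ("# Using CUDA device N" followed by "# Device N: "name"" lines); the function name is made up for illustration.

```python
import re

def device_that_ran(stderr_text):
    """Return (index, name) of the CUDA device the task ran on, or None.

    The index comes from the "# Using CUDA device N" line; the name is
    looked up in the "# Device N: ..." list printed below it.
    """
    used = re.search(r"^# Using CUDA device (\d+)", stderr_text, re.MULTILINE)
    if used is None:
        return None
    index = int(used.group(1))
    names = {
        int(i): name
        for i, name in re.findall(
            r'^# Device (\d+): "([^"]+)"', stderr_text, re.MULTILINE
        )
    }
    return index, names.get(index, "unknown")

# Device list copied from the failed task in the first post.
sample = """\
# Using CUDA device 3
# Device 0: "GeForce GTX 295"
# Device 1: "GeForce GTX 295"
# Device 2: "GeForce GTX 280"
# Device 3: "GeForce GTX 260"
"""
print(device_that_ran(sample))  # (3, 'GeForce GTX 260')
```

The position of a card in the device list says nothing by itself; only the "Using CUDA device" line identifies which one the task ran on.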

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10732 - Posted: 20 Jun 2009 | 19:25:49 UTC - in response to Message 10730.

But how can you tell the 260 was the one that erred


Simply look at the erroneous task and look for the line that says "# Using CUDA device X"

Bob

Okay, Thanks Bob ..

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10747 - Posted: 21 Jun 2009 | 9:53:45 UTC

Looks like the GTX 260 could be bad. It already erred out 1 WU in its new box, while the i7 it came out of hasn't had an error since I pulled the 260 out of it.

I think I'll switch the box to running AQUA CUDA WUs and see if it gets errors on them too, while I get hold of BFG and see if they'll RMA it. It's supposed to have a lifetime warranty, so I hope they will.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10754 - Posted: 21 Jun 2009 | 15:22:53 UTC - in response to Message 10747.

What's its clock speed in the other system? Well, might be time for RMA anyway.

MrS
____________
Scanning for our furry friends since Jan 2002

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10761 - Posted: 21 Jun 2009 | 17:39:37 UTC
Last modified: 21 Jun 2009 | 17:40:06 UTC

BFG is willing to do a warranty replacement, so I may as well go that way. The 260 is an OCX, so it came pretty well overclocked right from the factory. But it's not overclocked any higher than any of the other BFG cards I have, and they're running the WUs OK. Hopefully this isn't a sign of things to come from running the cards 24/7 since I've had them.

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10797 - Posted: 23 Jun 2009 | 11:56:17 UTC
Last modified: 23 Jun 2009 | 11:56:32 UTC

My i7 on steroids, no errors so far with this setup. I did RMA the GTX 260 and should have the new card by the end of the week ... :)

stderr out <core_client_version>6.5.0</core_client_version>
<![CDATA[
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 295"
# Clock rate: 1512000 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1512000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 2: "GeForce GTX 295"
# Clock rate: 1512000 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 3: "GeForce GTX 295"
# Clock rate: 1512000 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 4: "GeForce GTX 295"
# Clock rate: 1512000 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 5: "GeForce GTX 295"
# Clock rate: 1512000 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
# Time per step: 37.299 ms
# Approximate elapsed time for entire WU: 23311.734 s
called boinc_finish
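
As a rough sanity check on the two timing lines above, here is a small Python sketch. It assumes the "approximate elapsed time" is simply the time per step multiplied by the number of steps in the WU; that relationship is an assumption, not something the output states.

```python
# Values copied from the stderr output above.
time_per_step_ms = 37.299
total_elapsed_s = 23311.734

# Assumption: total time = time per step * number of steps.
steps = total_elapsed_s / (time_per_step_ms / 1000.0)
hours = total_elapsed_s / 3600.0

print(f"about {steps:,.0f} steps, about {hours:.2f} hours")
```

Under that assumption the WU works out to roughly 625,000 steps and a bit under six and a half hours on one GTX 295 GPU.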

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 10812 - Posted: 24 Jun 2009 | 8:08:08 UTC - in response to Message 10797.

Sorry to ask, but what is this? How can you have so many 295s on the same host?

gdf

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10813 - Posted: 24 Jun 2009 | 8:52:30 UTC - in response to Message 10812.
Last modified: 24 Jun 2009 | 8:54:57 UTC

Sorry to ask, but what is this? How can you have so many 295s on the same host?

gdf


Um, errr, running 3 GTX 295s on it will do the trick every time ... ;) ... It's on a Gigabyte Extreme X58 motherboard; I just popped 3 of them in there and it was off and running without any tinkering.

I'm only running 2 right now because the backup power supply couldn't handle it and kept shutting the PC off. With the PC plugged directly into the outlet, it would shut off every time the central air came on, so I went back to 2 295s for now until I can get a bigger backup power supply.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10836 - Posted: 24 Jun 2009 | 21:26:48 UTC

The 295 counts as one card, but as 2 CUDA devices. Seems confusing but correct :D

MrS
____________
Scanning for our furry friends since Jan 2002
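
To make the card-versus-device counting concrete, here is a minimal Python sketch that estimates the number of physical cards from the reported device names. The `DUAL_GPU_MODELS` set and the function name are made up for illustration, and the approach assumes the reported names are accurate, which is not guaranteed.

```python
from collections import Counter

# Assumption for illustration: models that show up as 2 CUDA devices per card.
DUAL_GPU_MODELS = {"GeForce GTX 295"}

def physical_cards(device_names):
    """Estimate physical cards per model from a list of CUDA device names."""
    cards = {}
    for name, count in Counter(device_names).items():
        per_card = 2 if name in DUAL_GPU_MODELS else 1
        cards[name] = -(-count // per_card)  # ceiling division
    return cards

# Six "GeForce GTX 295" devices, as in the output posted above.
print(physical_cards(["GeForce GTX 295"] * 6))  # {'GeForce GTX 295': 3}
```

So six CUDA devices named GTX 295 correspond to three cards, which matches the three-card setup described in this thread.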

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 317,097,298
RAC: 201,120
Message 10843 - Posted: 24 Jun 2009 | 23:49:24 UTC
Last modified: 24 Jun 2009 | 23:49:43 UTC

Another host of mine is showing 3 GTX 280s when it actually has 1 GTX 295 and 1 GTX 280 ... Weird:

6/24/2009 7:05:42 PM CUDA device: GeForce GTX 280 (driver version 18618, compute capability 1.3, 1024MB, est. 130GFLOPS)
6/24/2009 7:05:42 PM CUDA device: GeForce GTX 280 (driver version 18618, compute capability 1.3, 1024MB, est. 130GFLOPS)
6/24/2009 7:05:42 PM CUDA device: GeForce GTX 280 (driver version 18618, compute capability 1.3, 1024MB, est. 130GFLOPS)
