Message boards : Graphics cards (GPUs) : Continual computing errors

Administrator
Joined: 25 Jan 09
Posts: 1
Credit: 0
RAC: 0
Message 5978 - Posted: 25 Jan 2009 | 9:45:08 UTC

Hi there.
I overclocked the shaders on my 9500GT card, but I keep getting computing errors. I have now reset the card to its default values, but the errors are still occurring. To date I think it's up to around 20. Does anyone have any ideas how I might fix this? I have tried resetting the project and detaching, but nothing has worked yet.
Thanks.

Eric

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5980 - Posted: 25 Jan 2009 | 10:40:32 UTC - in response to Message 5978.

Check the fan and the air path around the card to make sure that it can be cooled. Use one of the monitoring tools to get the card temps ...

One other thing to try is to run a few SaH tasks to see if they complete (though you will have to wait for them to validate and be paired up with a wingman) ...

These are the first things that come to my mind ... weak as it is ...
____________

Eric
Joined: 17 Nov 08
Posts: 13
Credit: 15,272,287
RAC: 0
Message 5994 - Posted: 25 Jan 2009 | 15:14:39 UTC - in response to Message 5980.

Thanks for the info. I found the fans and air vents were OK. I then did a complete reinstall of my operating system, as it was unstable. I reinstalled all the drivers and BOINC. All is fine now.
Once again, thanks for replying.
Eric

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 6010 - Posted: 25 Jan 2009 | 22:11:29 UTC - in response to Message 5994.

Eric,

It is what we are here for ... I help you ... others are trying to help me with my issues ... all to the betterment of the universe ... :)

One of the lessons I learned when I was writing documentation for BOINC is none of us knows it all ... or can do it all ... there is always more to learn ...

Just because I can't get Linux beat into submission at the moment does not make me an idiot, nor do your difficulties ... I am just glad that you found the problem. *MY* experience with Windows is that I would have to do a clean install about every 6 months to keep the systems running stably. When I only run BOINC and don't use the system for much of anything else it seems to last longer ... YMMV ...
____________

(_KoDAk_)
Joined: 18 Oct 08
Posts: 43
Credit: 6,924,807
RAC: 0
Message 6303 - Posted: 1 Feb 2009 | 22:03:30 UTC

http://www.gpugrid.net/result.php?resultid=269806

(_KoDAk_)
Joined: 18 Oct 08
Posts: 43
Credit: 6,924,807
RAC: 0
Message 8200 - Posted: 5 Apr 2009 | 7:12:04 UTC

Incorrect function. (0x1) - exit code 1 (0x1)

http://www.gpugrid.net/result.php?resultid=489157

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8248 - Posted: 6 Apr 2009 | 16:40:13 UTC

Kodak, what are you trying to say? Do you have "Continual computing errors", as the thread title implies? I can only see one error for the host you linked to. And I see his 9600GSO is overclocked quite a bit, so an error every now and then might well be within expectations.

MrS
____________
Scanning for our furry friends since Jan 2002

(_KoDAk_)
Joined: 18 Oct 08
Posts: 43
Credit: 6,924,807
RAC: 0
Message 8766 - Posted: 23 Apr 2009 | 8:17:08 UTC

In the last 2 days I got many errors.
They are in the new WUs; the old WUs are OK.
And after that:
23.04.2009 11:11:44 GPUGRID Message from server: No work sent
23.04.2009 11:11:44 GPUGRID Message from server: (reached daily quota of 4 results)
23.04.2009 11:11:44 GPUGRID Message from server: (Project has no jobs available)
(((

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 46,573,744
RAC: 837,894
Message 8767 - Posted: 23 Apr 2009 | 8:45:19 UTC - in response to Message 8248.

I think he's talking about his other computer, which is throwing a ton of errors. Here's one: http://www.gpugrid.net/result.php?resultid=569330


____________
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8798 - Posted: 23 Apr 2009 | 19:32:48 UTC

Kodak, your 9600GSO is overclocked by ~350 MHz. If you run such a high OC and at some point it starts to fail, the first thing you should try is to lower the OC and see if it helps.

MrS
____________
Scanning for our furry friends since Jan 2002

(_KoDAk_)
Joined: 18 Oct 08
Posts: 43
Credit: 6,924,807
RAC: 0
Message 8816 - Posted: 24 Apr 2009 | 4:42:43 UTC
Last modified: 24 Apr 2009 | 5:38:37 UTC

I am talking about http://www.gpugrid.net/show_host_detail.php?hostid=31714
In it only one card has overclocked shaders, from 13xx to 1734.
The second is an ASUS TOP, shaders = 1674, OC'd to 1734.
Today I lowered the OC to 1674: same errors (((
The cards are not hot, ~56-60 C.
--info
1st WU runs OK (no more WUs) -> update
+3 WUs
2nd WU starts running -> OK, BUT
3rd WU starts running -> fail, AND 4th WU starts running -> fail!!!
Then it goes back to running the 2nd WU, and that is OK.
Why does it start the 3rd and 4th (the deadlines are almost the same!!!)?

And?
Which is better: 2 GPUs in one PC, or 1 GPU + PC and 1 GPU + PC?

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 46,573,744
RAC: 837,894
Message 8825 - Posted: 24 Apr 2009 | 11:18:32 UTC - in response to Message 8816.

Which is better: 2 GPUs in one PC, or 1 GPU + PC and 1 GPU + PC?


That's hard to say, but right now, I'd go with 1+1 and 1+1. GPU computing is relatively new to BOINC, and the BOINC scheduling software is far from perfect. It seems to have issues with proper scheduling when there's more than one GPU (especially when there are different GPUs in the same computer).

Once the scheduling issues are eventually resolved, things might change. But for right now, I'd put each GPU in a separate computer. It's also easier on the power supplies and the cooling (summer is coming, after all.)

(_KoDAk_)
Joined: 18 Oct 08
Posts: 43
Credit: 6,924,807
RAC: 0
Message 8884 - Posted: 25 Apr 2009 | 8:45:26 UTC

1st WU runs OK (no more WUs) -> update
+3 WUs
2nd WU starts running -> OK, BUT
3rd WU starts running -> fail, AND 4th WU starts running -> fail!!!
Then it goes back to running the 2nd WU, and that is OK.
Why does it start the 3rd and 4th (the deadlines are almost the same!!!)?

It is only 2x GPU (

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8913 - Posted: 25 Apr 2009 | 13:57:57 UTC

You started at a reported shader clock of ~1730 MHz, then you went to ~1715 MHz and still got errors, and now you're running 1693 and 1700 MHz and still get errors.

Do you know that the clock speed on current nVidia GPUs is not continuous (i.e. 1 MHz steps), but discrete with much larger steps? For the shader the step size is about 54 MHz (can't remember the exact value), and changes smaller than this likely don't change anything. Most tools (also the GPU-Grid task output) only report the requested clock speed, not the real one.

So back off to 1600 MHz shader or so and see if it helps. Also don't forget the chip and memory clocks: if they're also overclocked, you should reduce them as well. It could also be a too tight OC on the CPU and/or memory.
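
To see why backing off by only a few MHz may change nothing, here is a rough Python sketch of the stepping described above. It assumes the real shader clock snaps to the nearest multiple of a 54 MHz step. The step size is only the approximate value quoted in this post, and both the rounding rule and the `effective_clock` name are assumptions made for illustration, not documented driver behaviour.

```python
# Hypothetical model: the real clock snaps to the nearest multiple of STEP_MHZ.
# The 54 MHz step is the approximate figure quoted in the post; the exact value
# and the rounding rule used by the driver are assumptions.
STEP_MHZ = 54


def effective_clock(requested_mhz: int, step_mhz: int = STEP_MHZ) -> int:
    """Return the multiple of step_mhz nearest to requested_mhz (ties round up)."""
    return (requested_mhz + step_mhz // 2) // step_mhz * step_mhz


if __name__ == "__main__":
    for requested in (1730, 1715, 1700, 1693, 1600):
        real = effective_clock(requested)
        print(f"requested {requested} MHz -> modeled real clock {real} MHz")
```

Under these assumptions 1730 and 1715 MHz land on the same step, and so do 1700 and 1693 MHz, so only a larger reduction is sure to move the card to a lower real clock.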

MrS
____________
Scanning for our furry friends since Jan 2002

(_KoDAk_)
Joined: 18 Oct 08
Posts: 43
Credit: 6,924,807
RAC: 0
Message 8958 - Posted: 26 Apr 2009 | 17:09:18 UTC

Only the shader is overclocked; chip and memory are at default.

will be fine work
9800GTX+ VS 250GTS ? (it is only diff name , shader - 128 )

showa
Joined: 2 Mar 09
Posts: 28
Credit: 4,975,808
RAC: 0
Message 8972 - Posted: 27 Apr 2009 | 6:27:01 UTC

Hi. I have an overclocked 9800GTX which was running very smoothly until 3 or 4 days ago, when I began to get a long list of "computation error" results. Nothing changed when the problem began (for example, I haven't installed any new driver).
This is a typical error:
<core_client_version>6.6.20</core_client_version>
<![CDATA[
<message>
- exit code 1073741845 (0x40000015)
</message>
<stderr_txt>
Failed to set low-cpu sync mode
# Using CUDA device 0
# Device 0: "Device Emulation (CPU)"
# Clock rate: 1350000 kilohertz
# Total amount of global memory: -1 bytes
# Number of multiprocessors: 16
# Number of cores: 128
Cuda error in file '..\cuda/cutil.h' in line 968 : initialization error.
Memory usage: host: bytes device: bytes
Assertion failed: 0, file ..\cuda/cutil.h, line 968

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>

Can you help me?
Thank you in advance.
____________

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 8975 - Posted: 27 Apr 2009 | 7:19:41 UTC - in response to Message 8972.

You are running in device emulation, no card is being used:

# Using CUDA device 0
# Device 0: "Device Emulation (CPU)"

Your last successful result, 566035, shows the right configuration:

# Using CUDA device 0
# Device 0: "GeForce 9800 GTX/9800 GTX+"
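
A check like this one can be automated. The following is a minimal Python sketch that scans a task's stderr text for the `# Device N: "..."` lines and flags the emulation device. The function names and the regular expression are illustrative assumptions based only on the output quoted in this thread; they are not part of BOINC or the GPUGRID application.

```python
import re

# Matches lines such as:  # Device 0: "GeForce 9800 GTX/9800 GTX+"
# The pattern is inferred from the stderr output quoted in this thread.
DEVICE_LINE = re.compile(r'^#\s*Device\s+(\d+):\s*"([^"]*)"', re.MULTILINE)


def find_devices(stderr_text: str) -> dict:
    """Map each reported CUDA device index to its reported name."""
    return {int(index): name for index, name in DEVICE_LINE.findall(stderr_text)}


def uses_emulation(stderr_text: str) -> bool:
    """True if any reported device is the CPU emulation device."""
    return any("Device Emulation" in name for name in find_devices(stderr_text).values())


if __name__ == "__main__":
    failed = '# Using CUDA device 0\n# Device 0: "Device Emulation (CPU)"\n'
    good = '# Using CUDA device 0\n# Device 0: "GeForce 9800 GTX/9800 GTX+"\n'
    print("failed task uses emulation:", uses_emulation(failed))
    print("good task uses emulation:", uses_emulation(good))
```

Running it over the stderr of each failed result would show quickly whether every failure happened while the card was not being detected.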

Re-install?

i

showa
Joined: 2 Mar 09
Posts: 28
Credit: 4,975,808
RAC: 0
Message 8976 - Posted: 27 Apr 2009 | 7:46:34 UTC - in response to Message 8975.

Do I have to reinstall the video drivers, or BOINC? Or both?
Thanks for your answer.
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9023 - Posted: 27 Apr 2009 | 21:31:03 UTC - in response to Message 8958.

will be fine work
9800GTX+ VS 250GTS ? (it is only diff name , shader - 128 )


What are you trying to say?

9800GTX+ and GTS250 are the same speed, but the GTS250 can have a lower power consumption.

MrS
____________
Scanning for our furry friends since Jan 2002

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 9027 - Posted: 27 Apr 2009 | 21:42:24 UTC - in response to Message 8976.

I'd try BOINC first; try version 6.5.0, though...
Although folks here in the forum may give you a better idea of which client version is "safer".

i

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9035 - Posted: 27 Apr 2009 | 21:57:52 UTC - in response to Message 9027.

6.5.0 is safer ...

I have seen reports of problems with all versions of 6.6.x; though some stay in a state of de-nile which can be hot this time of year... :)

Seriously, I have tried several and keep going back home ...

YMMV ... but if it screws up ... you been warned ... :)

showa
Joined: 2 Mar 09
Posts: 28
Credit: 4,975,808
RAC: 0
Message 9049 - Posted: 28 Apr 2009 | 6:16:50 UTC

I tried installing new drivers for my video card, and 6.6.22 version of BOINC... I'm crossing my fingers.
____________

showa
Joined: 2 Mar 09
Posts: 28
Credit: 4,975,808
RAC: 0
Message 9062 - Posted: 28 Apr 2009 | 11:13:03 UTC - in response to Message 9049.

No, the problem is still here. I got almost the same error:

<core_client_version>6.6.24</core_client_version>
<![CDATA[
<message>
- exit code 1073741845 (0x40000015)
</message>
<stderr_txt>
Failed to set low-cpu sync mode
# Using CUDA device 0
# Device 0: "Device Emulation (CPU)"
# Clock rate: 1350000 kilohertz
# Total amount of global memory: -1 bytes
# Number of multiprocessors: 16
# Number of cores: 128
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
Cuda error in file '..\cuda/cutil.h' in line 968 : initialization error.
Memory usage: host: bytes device: bytes
Assertion failed: 0, file ..\cuda/cutil.h, line 968

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>


Now I'm trying to crunch for Folding@Home.
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9084 - Posted: 28 Apr 2009 | 21:10:52 UTC - in response to Message 9062.

Now I'm trying to crunch for Folding@Home.


Drop us a line if it works.

MrS
____________
Scanning for our furry friends since Jan 2002

(_KoDAk_)
Joined: 18 Oct 08
Posts: 43
Credit: 6,924,807
RAC: 0
Message 9085 - Posted: 28 Apr 2009 | 21:11:52 UTC

So I backed off to 1600 MHz ((
Yes, it works.
Will a GF250 + 9800GTX+ work fine, like 2x 9800GTX+ or 2x GF250?
Or will there be trouble?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9089 - Posted: 28 Apr 2009 | 21:25:25 UTC - in response to Message 9085.

Should be as fine as every multi-gpu setup.

MrS
____________
Scanning for our furry friends since Jan 2002

showa
Joined: 2 Mar 09
Posts: 28
Credit: 4,975,808
RAC: 0
Message 9102 - Posted: 29 Apr 2009 | 7:05:15 UTC - in response to Message 9084.

F@H has no problem with my video card... don't know what else to try...
____________

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 9104 - Posted: 29 Apr 2009 | 7:54:24 UTC - in response to Message 9102.

Have you tried BOINC client 6.5.0?
Because that would be the first difference with crunching for F@H...

i

showa
Joined: 2 Mar 09
Posts: 28
Credit: 4,975,808
RAC: 0
Message 9141 - Posted: 30 Apr 2009 | 11:32:13 UTC - in response to Message 9104.

I got this error message:
04/30/09 13:28:19|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.

____________

mscharmack
Joined: 20 Aug 07
Posts: 18
Credit: 1,319,274
RAC: 0
Message 9145 - Posted: 30 Apr 2009 | 16:50:56 UTC - in response to Message 9141.

I got this error message:
04/30/09 13:28:19|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.


This error usually pops up with an incompatible or non-CUDA video card.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9154 - Posted: 30 Apr 2009 | 19:52:57 UTC

Now we're getting somewhere: your card is not recognized by BOINC, and the recent 6.6.x clients changed their response to such a situation in that they'll try to run the app using the CPU emulation.

As mscharmack said, it could be the driver. Under Vista the problem could also be that you installed (upgraded) BOINC as a service, but you run XP Home. Are you using remote desktop?

MrS
____________
Scanning for our furry friends since Jan 2002
