Message boards : Graphics cards (GPUs) : compute errors on gtx 295.

nutcase
Joined: 16 Oct 08
Posts: 7
Credit: 5,348,057
RAC: 0
Message 8909 - Posted: 25 Apr 2009 | 13:30:58 UTC

OK, I am trying to track down a problem on my GTX 295.

It will crunch a WU, then suddenly the WU will error out with this message:

Cuda error in file '..\cuda/cutil.h' in line 968 : initialization error.
Memory usage: host: bytes device: bytes
Assertion failed: 0, file ..\cuda/cutil.h, line 968

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

The system is running XP64 with driver version 181.20.

After this, the driver stops working and all WUs immediately error out.

Any help would be greatly appreciated.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8918 - Posted: 25 Apr 2009 | 15:03:22 UTC - in response to Message 8909.

The straightforward suggestion would be a driver update, either to a 182 series one or to the beta 185.6x.

MrS
____________
Scanning for our furry friends since Jan 2002

nutcase
Joined: 16 Oct 08
Posts: 7
Credit: 5,348,057
RAC: 0
Message 8955 - Posted: 26 Apr 2009 | 14:49:36 UTC

Just upgraded to 182.50. Let's hope that was the problem.

BTW: the beta 185.68 drivers for XP64 have a problem, as they are missing files.

nutcase
Joined: 16 Oct 08
Posts: 7
Credit: 5,348,057
RAC: 0
Message 8965 - Posted: 26 Apr 2009 | 21:42:46 UTC

Upgrading the drivers did not fix anything :(

Any other suggestions?

madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 8978 - Posted: 27 Apr 2009 | 9:13:33 UTC

What changed between the 23rd and 24th? Jobs were running OK, then suddenly there were errors.
Windows updates?
New hardware or drivers?

Are you running stock speeds on your GTX 295, or tweaking a little?

I added a little to the core and memory clocks and got 2 errors back. I dropped it back down to stock and everything's fine again.

nutcase
Joined: 16 Oct 08
Posts: 7
Credit: 5,348,057
RAC: 0
Message 8996 - Posted: 27 Apr 2009 | 14:40:20 UTC

Nope, no overclock.

I have narrowed it down to a bad video card :(

I put it into another computer with a different type of CPU, OS and drivers, and it gives the same error.

So, the only thing left is a bad GTX 295.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9028 - Posted: 27 Apr 2009 | 21:42:40 UTC - in response to Message 8996.

So, only thing left is a bad Gtx 295


They're said to have a rather high overall failure rate.

MrS
____________
Scanning for our furry friends since Jan 2002

madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 9050 - Posted: 28 Apr 2009 | 7:23:12 UTC - in response to Message 9028.

Fingers crossed for mine then :)

Glad you have solved the problem, though.

Joe
Joined: 1 Sep 08
Posts: 37
Credit: 5,864,088
RAC: 0
Message 9149 - Posted: 30 Apr 2009 | 19:32:15 UTC - in response to Message 9050.

I have a new bad one, too. It's a Gainward GTX295.
With "Multi-GPU-Mode" disabled, one core is OK and has no problems finishing a WU. The other is not OK and produces a lot of errors. The errors are still there in Multi-GPU-Mode.
I am trying to return the card.

Kind regards

Joe

PS Both of my Point of View GTX295 are error free...

Spear
Joined: 28 Jan 09
Posts: 19
Credit: 15,297,622
RAC: 0
Message 9176 - Posted: 1 May 2009 | 16:50:11 UTC - in response to Message 9149.

There's a Vista issue where the second core is unavailable to do CUDA work, hence the errors. There are suggested registry fixes and other tricks, such as dummy monitor connectors, that can possibly fix it, though they didn't work in my case.

[AF] Profanateur
Joined: 25 Oct 08
Posts: 42
Credit: 42,812,268
RAC: 0
Message 9177 - Posted: 1 May 2009 | 17:21:36 UTC

I have errors too.

01/05/2009 19:09:56 GPUGRID Computation for task 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0 finished
01/05/2009 19:09:56 GPUGRID Output file 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0_1 for task 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0 absent
01/05/2009 19:09:56 GPUGRID Output file 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0_2 for task 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0 absent
01/05/2009 19:09:56 GPUGRID Output file 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0_3 for task 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0 absent


All my WUs finish like that (Vista 64, GTX260 + 8800 GT, BOINC 6.6.20).
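
For anyone who wants to check their own BOINC messages for the same pattern, here is a minimal Python sketch that groups "Output file ... absent" lines by task. The regular expression is only inferred from the lines quoted above, and the function name `absent_outputs` is made up for illustration; other BOINC versions may word the message differently.

```python
import re
from collections import defaultdict

# Pattern assumed from the log lines quoted in this post; not an official format.
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")

# Sample copied from the post above.
SAMPLE = """\
01/05/2009 19:09:56 GPUGRID Computation for task 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0 finished
01/05/2009 19:09:56 GPUGRID Output file 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0_1 for task 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0 absent
01/05/2009 19:09:56 GPUGRID Output file 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0_2 for task 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0 absent
01/05/2009 19:09:56 GPUGRID Output file 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0_3 for task 36-KASHIF_HIVPR_dim_ba2-2-100-RND7433_0 absent
"""


def absent_outputs(log_text):
    """Return a dict mapping each task name to its output files reported absent."""
    missing = defaultdict(list)
    for line in log_text.splitlines():
        match = ABSENT_RE.search(line)
        if match:
            output_file, task = match.groups()
            missing[task].append(output_file)
    return dict(missing)


if __name__ == "__main__":
    # Print one line per failed task with the number of missing output files.
    for task, files in absent_outputs(SAMPLE).items():
        print(task, len(files))
```

A task that "finished" but has all of its output files absent usually means the application crashed before writing results, so the count per task is a quick way to see how many WUs are affected.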

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9178 - Posted: 1 May 2009 | 19:10:01 UTC - in response to Message 9177.

Profanateur, both of your cards are substantially (factory?) overclocked. Try reverting to stock and see if it helps. Some of your WUs start, so your software setup should be fine.

MrS
____________
Scanning for our furry friends since Jan 2002

[AF] Profanateur
Joined: 25 Oct 08
Posts: 42
Credit: 42,812,268
RAC: 0
Message 9181 - Posted: 1 May 2009 | 20:50:13 UTC

I don't think so.
It worked fine with driver 185.20 on both graphics cards,
and now with a newer driver I have these problems.

I tried going back to 185.20, but then I lose ambient occlusion in games (more information, in French: http://www.hardware.fr/articles/756-1/dossier-nvidia-ameliore-qualite-graphique-avec-l-occlusion-ambiante.html).

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9185 - Posted: 2 May 2009 | 0:45:54 UTC - in response to Message 9181.

Well, either you try one of them (downclock or lose ambient occlusion) or I'm afraid we can't help you any further. We could speculate until next century about what might be going on, but that's not worth much without some actual tests.

MrS
____________
Scanning for our furry friends since Jan 2002

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 9193 - Posted: 2 May 2009 | 11:19:29 UTC

I've had a whole bunch of errors since the 1st of May. They seem to be all complaining about "One or more arguments are invalid".

Latest couple are here and here

The 1st couple that started doing this are here and here

The machine is an i7 running two GTX260+ cards, not overclocked. The driver is 182.50. It seems happy munching on SETI CUDA tasks.
____________
BOINC blog

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9196 - Posted: 2 May 2009 | 12:27:31 UTC - in response to Message 9193.

I can't find anything wrong with your config. Something you could try:

- reboot
- power off and remove the power cord for >10 min
- try BOINC 6.5.0 or 6.4.7
- upgrade to a 185.6x driver, or maybe the new 185.8x

MrS
____________
Scanning for our furry friends since Jan 2002

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 9199 - Posted: 2 May 2009 | 13:05:30 UTC - in response to Message 9196.

I can't find anything wrong with your config. Something you could try:

- reboot
- power off and remove the power cord for >10 min
- try BOINC 6.5.0 or 6.4.7
- upgrade to a 185.6x driver, or maybe the new 185.8x

MrS


It was rebooted a couple of days ago after doing Windows updates.

I did go up to BOINC 6.6.25 and, after noticing the errors, back down to 6.6.23.

Strange that my other machines don't seem to be erroring out, although they have single CUDA cards (GTS250s). They run the same drivers and BOINC version.
____________
BOINC blog

[AF] Profanateur
Joined: 25 Oct 08
Posts: 42
Credit: 42,812,268
RAC: 0
Message 9201 - Posted: 2 May 2009 | 14:23:11 UTC - in response to Message 9185.

Well, either you try one of them (downclock or lose ambient occlusion) or I'm afraid we can't help you any further. We could speculate until next century about what might be going on, but that's not worth much without some actual tests.

MrS

Thanks.

Apparently it works fine with driver 182.5, and not with the new drivers which support AO. :/

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9203 - Posted: 2 May 2009 | 17:35:51 UTC - in response to Message 9201.

Which new driver did you try? 185.6x? You could give the 185.8x a try, they appeared just a few days ago.

MrS
____________
Scanning for our furry friends since Jan 2002

[AF] Profanateur
Joined: 25 Oct 08
Posts: 42
Credit: 42,812,268
RAC: 0
Message 9209 - Posted: 2 May 2009 | 18:49:39 UTC - in response to Message 9203.

Yeah.

I'll finish the WUs that have already been launched, and then I'll try 185.x.

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 15,982
Message 9212 - Posted: 2 May 2009 | 19:06:32 UTC - in response to Message 9176.

There's a Vista issue where the second core is unavailable to do CUDA work, hence the errors. There are suggested reg fixes and other tricks such as dummy monitor connectors that can possibly fix it. Though they didn't work in my case.


That explains why the second core I think I have isn't used, then. Thanks for the information.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9214 - Posted: 2 May 2009 | 19:10:32 UTC - in response to Message 9212.

There's a Vista issue where the second core is unavailable to do CUDA work, hence the errors. There are suggested reg fixes and other tricks such as dummy monitor connectors that can possibly fix it. Though they didn't work in my case.


That explains why the second core I think I have isn't used, then. Thanks for the information.

There is also the "fix" in BOINC 6.6.24, where the minor memory delta on the second core renders it unusable. 6.6.25 has a cc_config setting to force use of all GPUs (which I am using with my GTX295 pair, and I have 4 turning and burning as I would expect).

So there is more than one reason for a GPU to be unused.
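
The post does not name the setting. Assuming it is the `<use_all_gpus>` option from the BOINC client configuration, a minimal Python sketch that builds the corresponding cc_config.xml content could look like this. The function name `build_cc_config` is made up for illustration, and the result still has to be saved as cc_config.xml in the BOINC data directory and picked up by the client (for example by restarting it).

```python
import xml.etree.ElementTree as ET


def build_cc_config(use_all_gpus=True):
    """Return cc_config.xml content with the use_all_gpus option set.

    Assumption: <use_all_gpus> inside <options> is the setting meant in the post.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "use_all_gpus")
    flag.text = "1" if use_all_gpus else "0"
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; redirect it to cc_config.xml in the BOINC data directory.
    print(build_cc_config())
```

If you already have a cc_config.xml with other options, add the element to the existing `<options>` section by hand rather than overwriting the file.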
