Message boards : Number crunching : FYI to all Nvidia Crunches out there... Clock speed Problems

Tex1954
Joined: 20 May 11
Posts: 16
Credit: 86,798,974
RAC: 0
Message 21245 - Posted: 25 May 2011 | 13:17:48 UTC

There is an ongoing problem with CUDA tasks where the clock rates get dropped on Nvidia cards. I've written it up, and the problem occurs on both Vista and Win7.

What happens is that the clock rate gets dropped to conserve power/heat etc. and never returns to high speed again. This always happens with DUAL Nvidia cards installed, and it seems only magic prevents it from happening on its own most of the time. It doesn't matter what power settings are set; performance mode seems to help, but does not totally correct it. Snoozing or suspending tasks is a 95% guarantee that the clocks will drop and never regain full speed.

I've informed Nvidia tech support and the forums.


http://forums.nvidia.com/index.php?s=9f29a996e0ac9d6ea44a506f6631f805&showtopic=200414&pid=1237460&st=0&#entry1237460

8-)

Tex1954

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,183,469,585
RAC: 19,207,828
Message 21247 - Posted: 25 May 2011 | 16:42:36 UTC - in response to Message 21245.

As I've posted in reply to your identical post at SETI:

This problem has recently been drawn to the attention of the SETI CUDA developers, and is in the process of being reported onwards to other CUDA-enabled BOINC projects. It seems to be related specifically to the release of nVidia drivers which will support the forthcoming v4 release of the CUDA run-time support files. The central BOINC library code isn't yet fully compatible with CUDA v4: the problem has been overcome with test programs, and should go away with the next round of application releases.

Unfortunately, v3 and earlier nVidia drivers aren't available for the very latest generation of nVidia cards, but if you have a card which can run with nVidia driver 266.58, that should avoid the problem.

@ project devs:

It's been isolated to a CUDA task exit handling problem in the BOINC API library code. Eric Korpela (SETI@home) has a copy of the proposed (and tested) solution for evaluation, and is considering checking it in following testing and evaluation. I can put you in touch with the developer concerned, or pass messages, if you wish.

Ross*
Joined: 6 May 09
Posts: 34
Credit: 443,507,669
RAC: 0
Message 21250 - Posted: 25 May 2011 | 23:07:23 UTC - in response to Message 21247.

Hi
I have had the same problem with various cards. The new beta 270 driver seems to have fixed the problem (I hope).
Cheers
Ross*

Tex1954
Joined: 20 May 11
Posts: 16
Credit: 86,798,974
RAC: 0
Message 21251 - Posted: 26 May 2011 | 4:33:35 UTC - in response to Message 21247.

Thanks for the update. I had a watch on the Nvidia forum but never got an alert.

I have been testing the new Beta 275 drivers and have the same problems with them on 3 different pairs of cards...

Hopefully, we can get this resolved.

Again, thanks for the update!

:)

Tex1954

Dirk
Joined: 10 Oct 08
Posts: 18
Credit: 39,100,916
RAC: 0
Message 21253 - Posted: 26 May 2011 | 9:48:43 UTC

Has anyone measured how much the downclocking affects runtimes? I caught my GPU at 405 MHz a while ago while it was crunching, but the WU it was on seemed to be progressing at a fairly normal pace. It could be that it had only just downclocked, though. I'm pretty sure I snoozed BOINC that day and that caused the downclock; I just have no idea how long it was before I noticed it.

Anyway, it'd be nice if it were fixed in the next WHQL release. I need the new drivers to run games like The Witcher 2 and Dragon Age 2 properly. The stupid thing is that it stays downclocked even if I start gaming.

Siegfried Niklas
Joined: 23 Feb 09
Posts: 39
Credit: 144,654,294
RAC: 0
Message 21254 - Posted: 26 May 2011 | 11:58:16 UTC - in response to Message 21253.
Last modified: 26 May 2011 | 12:04:48 UTC

You may want to read this thread on the SETI forum:

http://setiathome.berkeley.edu/forum_thread.php?id=64243


- or my post at this forum:

http://www.gpugrid.net/forum_thread.php?id=2502&nowrap=true#21188

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,183,469,585
RAC: 19,207,828
Message 21338 - Posted: 7 Jun 2011 | 16:16:36 UTC

Does anyone know whether this problem has been observed with Linux CUDA drivers - like, perhaps, the 270.41.19 release or the 275.09 Beta, both released May 20, 2011?

Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Message 21340 - Posted: 7 Jun 2011 | 18:10:41 UTC - in response to Message 21338.

Hi, I'm using the 270.41.19 driver with a GTX 295 on Ubuntu Natty 11.04 with no problems. Greetings.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 21342 - Posted: 7 Jun 2011 | 20:54:00 UTC - in response to Message 21338.

Downclocking was observed under Linux by several disgruntled GPUGrid members, including myself, several months ago. Previous drivers definitely caused severe downclocking problems for the GT240 and similar cards, and they also affected some Fermi cards, but not so much the more mature GTX200 cards.
I'm not running any Linux systems right now, so I can't re-test with more recent drivers. I expect there have been continuous improvements, but that the problem has not been fully resolved.

Dirk
Joined: 10 Oct 08
Posts: 18
Credit: 39,100,916
RAC: 0
Message 21343 - Posted: 7 Jun 2011 | 21:11:11 UTC

It still downclocks occasionally if I manually suspend tasks or snooze BOINC with the 275.33 WHQL drivers. This is on Win7 64-bit.

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 21348 - Posted: 8 Jun 2011 | 8:06:28 UTC - in response to Message 21338.
Last modified: 8 Jun 2011 | 8:08:06 UTC

I installed the 270.41.19 Linux drivers a couple of hours ago on Fedora 14 with a GTX 570. I've been alternately suspending and resuming a GPUGrid task to see if it causes downclocking, as suggested in the OP. So far the clocks have remained constant at:

nvclock=742
memclock=1900
processorclock=1484

Is there anything else that causes downclocking that I could test? Is it more prevalent with certain GPUGrid app(s) than others? If so, I'll try again with those apps.
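
A suspend/resume test like the one above is easier to judge with a timestamped log of the clocks. The following is only a minimal sketch and not the tool used in this post: it assumes a recent `nvidia-smi` that supports `--query-gpu` (newer than the drivers discussed in this thread), and the function names `parse_sample`, `read_clocks` and `log_clocks` are made up for illustration.

```python
import subprocess
import time

# Assumption: a recent nvidia-smi with --query-gpu support is on PATH.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=clocks.sm,clocks.mem,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Return the field as an int, or None for values such as '[N/A]'."""
    try:
        return int(field.strip())
    except ValueError:
        return None


def parse_sample(line):
    """Parse one CSV line, e.g. '1484, 1900, 98', into a dict."""
    sm, mem, util = (_to_int(f) for f in line.split(","))
    return {"sm_mhz": sm, "mem_mhz": mem, "util_pct": util}


def read_clocks():
    """Query every GPU once; returns one dict per GPU."""
    out = subprocess.run(
        QUERY_CMD, capture_output=True, text=True, check=True
    ).stdout
    return [parse_sample(line) for line in out.splitlines() if line.strip()]


def log_clocks(interval_s=5, count=12):
    """Print timestamped readings while tasks are suspended/resumed by hand."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), read_clocks())
        time.sleep(interval_s)
```

Calling `log_clocks()` in one terminal while suspending and resuming tasks in the BOINC manager would show whether the clocks come back up after each resume.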

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 21350 - Posted: 8 Jun 2011 | 10:32:04 UTC - in response to Message 21348.

Downclocking appears to be caused by perhaps three things:

A defensive mechanism whereby the card protects itself from damage due to overheating/overvolting/overclocking (Fermi cards only). I think it looks for recoverable errors, and when too many are spotted the card is downclocked. This actually prevents a lot of task failures.

A CUDA task exit problem. It sounds like this downclocking might be the GPU trying to protect the system from unwanted thread execution. Any thoughts on this?

Environmental settings: Adaptive power mode is in use and kicks in.

The problem is not really the downclocking; it's that the clocks do not automatically rise again. So no matter what the reason for the downclock, the problem is that the clocks stay down; they should rise when GPU use increases again. With some earlier drivers I found that even a restart did not resolve the problem on occasion, but this depended on driver, OS and card (3 variables). As several different drivers have been released since this started, it's difficult to assess how and when downclocking occurs. The randomness of the downclock period also makes finding the problem more difficult; I have seen everything from a downclock to 405 MHz lasting a few seconds to the clocks being stuck at 50 MHz after repeated reboots.

I think BOINC versions would not have any impact. WRT GPUGrid, we have now moved away from 6.12, leaving just 6.13, but there are still differences between tasks. The tasks that better utilize the GPU are more likely to tax it to the extent that it overheats or draws too much power (you can use GPUZ to see the utilization of the task). Task type should not, however, influence the likelihood of the GPU downclocking due to the Adaptive power mode. That mode is not recommended; stick to Full power where possible (not available on XP).
I expect task type would not influence any CUDA task exit problem either.
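
The "clocks stay down while the GPU is busy" condition described above can be told apart from a brief, harmless dip with a simple rule over logged readings. This is a rough sketch under stated assumptions, not anything from the drivers or BOINC: the function name and the thresholds (90% utilization, 80% of the full clock, three consecutive readings) are arbitrary illustrative choices.

```python
def is_stuck_downclocked(samples, full_clock_mhz,
                         busy_util_pct=90, clock_fraction=0.8,
                         min_consecutive=3):
    """Return True if the GPU stays busy but slow for several readings in a row.

    samples: iterable of (clock_mhz, util_pct) readings in time order.
    full_clock_mhz: the clock the card normally runs at under load.
    The thresholds are illustrative assumptions, not vendor values.
    """
    run = 0
    for clock_mhz, util_pct in samples:
        busy = util_pct >= busy_util_pct
        slow = clock_mhz < clock_fraction * full_clock_mhz
        if busy and slow:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            # Clock recovered, or the GPU went idle: reset the streak.
            run = 0
    return False


# A card that drops to 405 MHz and stays there under load is flagged;
# a single short dip that recovers is not.
stuck = [(742, 99), (405, 99), (405, 98), (405, 99)]
dip = [(742, 99), (405, 99), (742, 99), (742, 98)]
print(is_stuck_downclocked(stuck, full_clock_mhz=742))
print(is_stuck_downclocked(dip, full_clock_mhz=742))
```

Requiring several consecutive readings is what separates a downclock of a few seconds from clocks that never rise again.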

JLConawayII
Joined: 31 May 10
Posts: 48
Credit: 28,893,779
RAC: 0
Message 23558 - Posted: 19 Feb 2012 | 21:51:50 UTC

Amazingly this is still an issue. I uninstalled all Nvidia software and re-installed my old reliable 266.58 drivers, but now it's still downclocking randomly. So why is the driver that once worked without ever downclocking anything now screwed up? I think Nvidia is the new ATI with these horrible drivers.
