Message boards : Graphics cards (GPUs) : how to fix low memory clocks on GM204 cards
Dave
Joined: 12 Jun 14
Posts: 12
Credit: 166,790,475
RAC: 0
Message 39154 - Posted: 16 Dec 2014 | 11:40:01 UTC

Hi,

Has anyone noticed that memory clock speeds are lower while running GPUGRID on Maxwell cards?

I found this topic in the Einstein@Home community:

http://einstein.phys.uwm.edu/forum_thread.php?id=11044

I checked my memory clocks and found that my card was running at only 1502 MHz instead of 1750 MHz.

How does this affect performance of GPUGrid?
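
For anyone wanting to check this on their own card, here is a minimal sketch using nvidia-smi (assuming it is on the PATH; the query field names come from nvidia-smi's own --help-query-gpu list). Keep in mind that different tools report the same GDDR5 setting differently, e.g. 1750, 3505 or 7010 MHz.

import subprocess

def memory_clocks(gpu_index=0):
    # One-shot query of the current and maximum memory clock (in MHz).
    out = subprocess.check_output([
        "nvidia-smi", f"--id={gpu_index}",
        "--query-gpu=clocks.current.memory,clocks.max.memory",
        "--format=csv,noheader,nounits",
    ], text=True)
    current, maximum = (int(v.strip()) for v in out.strip().split(","))
    return current, maximum

if __name__ == "__main__":
    current, maximum = memory_clocks()
    print(f"Memory clock: {current} MHz (maximum {maximum} MHz)")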

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 39155 - Posted: 16 Dec 2014 | 12:41:32 UTC - in response to Message 39154.
Last modified: 19 Dec 2014 | 14:30:47 UTC

Edit:
There's the original error description; the section "Increasing memory clock speed" should contain the information you need to correct the problem.

Additionally, there's a guide on how to overclock the memory, should you wish to do so. The gains at GPU-Grid are rather limited, typically about a 2% performance increase for a GTX 970 going from 6 to 7 GHz memory clock, but it adds up nevertheless if many people do it. And I expect the gain to be larger for a GTX 980.

To apply the settings from nVidia Inspector upon each boot:
- in nVidia Inspector right click on "Create Clocks Shortcut" and choose "Create Clock Startup Task"
- or click "Create Clocks Shortcut" and have the created link run automatically at logon via the Windows Task Scheduler (a sketch of this follows below)
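
If you prefer the second route and want to script it, here is a minimal sketch of registering such a startup task with schtasks. The shortcut path is a placeholder assumption; point it at whatever "Create Clocks Shortcut" actually produced, or at the nvidiaInspector.exe command line stored inside that shortcut.

import subprocess

# Placeholder path (assumption) - use the shortcut created by nVidia Inspector,
# or the nvidiaInspector.exe command line it contains if the .lnk won't launch.
CLOCKS_SHORTCUT = r"C:\Tools\nvidiaInspector\SetClocks.lnk"

subprocess.run([
    "schtasks", "/Create",
    "/TN", "Apply GPU clocks",   # task name
    "/TR", CLOCKS_SHORTCUT,      # program/shortcut to run
    "/SC", "ONLOGON",            # trigger at every logon
    "/RL", "HIGHEST",            # run elevated; Inspector needs admin rights
    "/F",                        # overwrite an existing task of the same name
], check=True)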

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 39166 - Posted: 17 Dec 2014 | 18:49:24 UTC

Perhaps this is something particular to these little Maxwells on this project. I see it on my EVGA GTX 970 SC and on my EVGA GTX 980 SC. I had hoped the latter would do better with the memory clock; unfortunately it runs at 1502.3 MHz too, and it will not go higher. But a GTX 780 costs almost 300 Euros more for the card's housing and the 384 extra CUDA cores.

____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 39188 - Posted: 18 Dec 2014 | 15:31:16 UTC

Update:
It seems I was wrong about the memory OC causing my blue screens! Encouraged by the successful runs at 3.5 GHz over at Einstein I tried again, and it's been running flawlessly for almost 2 days now :)

So the message to get out to GM204 owners is simply: use nVidia Inspector to set 3.5 GHz in P2! It's less important at GPU-Grid than at Einstein or SETI, but it still helps a bit.

@TJ: no, this is not limited to this project. So far I'm counting four CUDA programs, one OpenCL program and one where I don't know which API is used. All of them are affected, with not a single counter-example found. So it's safe to assume it simply affects all GP-GPU programs, be they CUDA or OpenCL.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 39190 - Posted: 18 Dec 2014 | 16:25:44 UTC - in response to Message 39188.

Thanks for the explanation and confirmation, ETA.
I will try the nVidia Inspector settings.
____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 39199 - Posted: 18 Dec 2014 | 20:11:44 UTC

Overclocking the memory:

Thanks to skgiven I can now overclock my memory! Here's how:

- the GPU must not be crunching BOINC (either pause your GPU project, suspend all GPUs, or suspend BOINC completely; see the sketch after this list)
- in the nVidia Inspector OC tab set the overclock for P0 (because you can't go any higher than this in P2)
- now you can set up to this memory clock for P2 as well
- apply & have fun
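
A minimal sketch of scripting the "stop crunching first" part with boinccmd, BOINC's command-line control tool (depending on your setup it may also need --host/--passwd arguments). The clock change itself is left as a placeholder, since the exact Inspector command line depends on the shortcut you created earlier.

import subprocess, time

def set_gpu_mode(mode, duration=0):
    # mode: "never" suspends GPU work, "auto" hands control back to the client;
    # duration 0 means "until changed again".
    subprocess.run(["boinccmd", "--set_gpu_mode", mode, str(duration)], check=True)

set_gpu_mode("never")    # 1. stop BOINC from using the GPU
time.sleep(30)           #    give the running task a moment to unload
# 2. apply the P0 and then P2 overclock here, e.g. by launching the Inspector
#    clocks shortcut created earlier (placeholder path, as above):
# subprocess.run(["cmd", "/c", "start", "", r"C:\Tools\nvidiaInspector\SetClocks.lnk"], check=True)
set_gpu_mode("auto")     # 3. let BOINC use the GPU again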

I cannot yet say how much performance this will bring, but given the relatively small gain going from 3.0 to 3.5 GHz we shouldn't expect wonders at GPU-Grid.

MrS
____________
Scanning for our furry friends since Jan 2002

Dave
Joined: 12 Jun 14
Posts: 12
Credit: 166,790,475
RAC: 0
Message 39220 - Posted: 19 Dec 2014 | 22:24:35 UTC

Memory controller load running Einstein is at 80%, while GPUGRID sits at around 50%. So the assumption seems to be right that GPUGRID doesn't benefit all that much from higher memory clocks in the end.
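
For anyone who wants to compare WU types or projects the same way, here is a minimal sketch that samples the memory controller load (utilization.memory in nvidia-smi terms, assuming nvidia-smi is on the PATH) and averages it over a minute:

import subprocess, time

def average_mcl(gpu_index=0, samples=60, interval=1.0):
    # Sample the memory controller load once per interval and return the mean.
    readings = []
    for _ in range(samples):
        out = subprocess.check_output([
            "nvidia-smi", f"--id={gpu_index}",
            "--query-gpu=utilization.memory",
            "--format=csv,noheader,nounits",
        ], text=True)
        readings.append(int(out.strip()))
        time.sleep(interval)
    return sum(readings) / len(readings)

if __name__ == "__main__":
    print(f"Average memory controller load: {average_mcl():.1f}%")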


ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 39233 - Posted: 20 Dec 2014 | 20:12:05 UTC - in response to Message 39220.

I don't have solid statistical data yet, but the benefit grows with the memory controller load, which depends on the WU type. For NOELIA_20MG, which went from ~50% load at 3.0 GHz to ~40% at 3.75 GHz, I'm seeing approximately a 3.3% performance increase. It's not as much as in other projects, but a nice amount of "free" credits anyway. Power draw increased by only a few watts.

MrS
____________
Scanning for our furry friends since Jan 2002

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 39255 - Posted: 22 Dec 2014 | 11:40:17 UTC - in response to Message 39233.

One thing to watch out for is a decrease in GPU clocks due to a user-defined (100%, 105%, ...) power cap; if you increase the GDDR speed slightly but the GPU is restricted by your power limit, the gains from the faster GDDR are lost. It would also impede accurate measurement of the performance increase when running at 7 GHz rather than 6 GHz. This may be why it is the way it is. My guess is that there is something to be gained here, but probably more where the memory controller load is higher.
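
A minimal sketch for spotting exactly that situation (the field names are from nvidia-smi --help-query-gpu). "Active" under sw_power_cap while crunching means the core is clocking down to stay inside the power limit, which would mask any gain from faster GDDR.

import subprocess

FIELDS = ",".join([
    "power.draw",                            # current board power (W)
    "power.limit",                           # the cap currently in force (W)
    "clocks.current.graphics",               # the core clock actually running
    "clocks_throttle_reasons.sw_power_cap",  # Active = power limit is throttling
    "clocks_throttle_reasons.hw_slowdown",   # Active = hardware slowdown engaged
])

out = subprocess.check_output(
    ["nvidia-smi", "--id=0", f"--query-gpu={FIELDS}", "--format=csv"],
    text=True,
)
print(out)
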
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 39276 - Posted: 23 Dec 2014 | 13:13:44 UTC - in response to Message 39255.

Agreed - that's why I raised my power limit by 1% (2 W) when I increased the memory clock, to keep my card working at a fairly efficient ~1.10 V without "underusing my new gem".
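
For the record, the same adjustment can be sketched with nvidia-smi as below (needs admin rights; some GeForce drivers don't allow it, in which case the power-target slider in Inspector or Afterburner does the same job). The +2 W figure is just ETA's example above.

import subprocess

def raise_power_limit(gpu_index=0, extra_watts=2):
    # Read the default and maximum allowed power limit, then raise the cap slightly.
    out = subprocess.check_output([
        "nvidia-smi", f"--id={gpu_index}",
        "--query-gpu=power.default_limit,power.max_limit",
        "--format=csv,noheader,nounits",
    ], text=True)
    default_limit, max_limit = (float(v.strip()) for v in out.strip().split(","))
    new_limit = min(default_limit + extra_watts, max_limit)
    subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}", "-pl", str(int(new_limit))],
        check=True,
    )
    return new_limit

if __name__ == "__main__":
    print(f"Power limit set to {raise_power_limit():.0f} W")
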

MrS
____________
Scanning for our furry friends since Jan 2002

UnknownPL1337
Joined: 22 Mar 15
Posts: 12
Credit: 530,700
RAC: 0
Message 40602 - Posted: 24 Mar 2015 | 12:36:00 UTC

Memory OC doesn't even give 0.1 GFlops more...
I tested my GTX 580 at 855 MHz/2100 MHz = 1750 GFlops and at 855 MHz/1040 MHz = 1750 GFlops.
Overclocking the memory just makes the graphics card eat more watts...

Mem OC is only useful in games...

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 40606 - Posted: 24 Mar 2015 | 19:45:36 UTC - in response to Message 40602.

The GTX 580 is GF110, not GM204.
Performance is not the same thing as GFlops.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 40620 - Posted: 25 Mar 2015 | 22:38:05 UTC - in response to Message 40602.

UnknownPL1337, it seems you still have to learn a lot.

Anything that measures performance in terms of GFlops is very likely a low-level benchmark. It runs some very simplistic calculations, designed to extract the maximum performance from the hardware. The theoretical GFlops of a GPU are just "clock speed * number of shaders * operations per shader per clock".
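
To make that formula concrete, a small sketch: the 2 operations per shader per clock is the usual single-precision FMA figure, the shader counts and clocks are nominal reference numbers, and Fermi shaders run at twice the core clock.

def theoretical_gflops(shaders, shader_clock_mhz, ops_per_clock=2):
    # "clock speed * number of shaders * operations per shader per clock"
    return shaders * shader_clock_mhz * ops_per_clock / 1000.0

# GTX 580 (Fermi): 512 shaders at twice the 855 MHz core clock
print(theoretical_gflops(512, 2 * 855))   # ~1751 GFlops - the "1750" quoted above,
                                          # regardless of the memory clock setting
# GTX 980 (Maxwell) at its nominal 1126 MHz base clock
print(theoretical_gflops(2048, 1126))     # ~4612 GFlops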

Real applications can't sustain this speed because something always gets in the way. For example, dependencies between instructions have to be resolved first, or data may have to be fetched from memory.

It's true that faster memory doesn't calculate anything - but it keeps your chip from being slowed down by memory operations.

Whatever you used to measure those 1750 GFlops was either just calculating them from your GPU clock speed and architecture, or was running some low-level test designed not to be held back by memory operations.

It's also true that on your GTX580 the memory speed does not matter much. It's simply got enough bandwidth to feed the shaders. GM204 has to feed 4 times as many shaders, at comparable clock speeds, with just ~10% more memory bandwidth. The other Maxwell chips (GM107, GM206, GM200) are balanced in a similar way and thus also benefit far more from memory overclocks than your card.
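
The bandwidth side of that comparison, as a sketch: the bus widths are the reference 384-bit and 256-bit figures, the GTX 580 number uses UnknownPL1337's 2100 MHz memory setting (4200 MT/s effective) and the GTX 980 its stock 7000 MT/s, so treat them as illustrative values.

def bandwidth_gb_s(bus_width_bits, effective_mt_s):
    # bytes per second = bus width in bytes * effective GDDR5 data rate
    return bus_width_bits / 8 * effective_mt_s / 1000.0

gtx580 = bandwidth_gb_s(384, 4200)   # ~202 GB/s feeding 512 shaders
gtx980 = bandwidth_gb_s(256, 7000)   # ~224 GB/s feeding 2048 shaders (4x as many)
print(f"GTX 980 has {gtx980 / gtx580 - 1:.0%} more bandwidth but 4x the shaders to feed")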

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 40623 - Posted: 25 Mar 2015 | 23:42:11 UTC - in response to Message 40620.
Last modified: 25 Mar 2015 | 23:50:30 UTC

Anything that measures performance in terms of GFlops is very likely a low-level benchmark. It runs some very simplistic calculations, designed to extract the maximum performance from the hardware. The theoretical GFlops of a GPU are just "clock speed * number of shaders * operations per shader per clock".

Even so: GFlops is a basic metric with a standard, well-known meaning, which is the reason it is heard so often. NVidia quotes it in presentations, BOINC uses it, and the GFLOPS/W ratio determines the placement of the greenest TOP500 supercomputers. It is also applied in a number of performance ratios: GFlops (or MFlops) per core, GFlops per instruction per cycle, GFlops per unit of bandwidth, etc.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 40630 - Posted: 26 Mar 2015 | 21:40:53 UTC - in response to Message 40623.

Yes, its meaning is known... and so are its limitations ;)

MrS
____________
Scanning for our furry friends since Jan 2002

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 40631 - Posted: 26 Mar 2015 | 21:48:33 UTC - in response to Message 40623.
Last modified: 26 Mar 2015 | 22:06:12 UTC

GFlops is a theoretical maximum, quoted for double precision or, in the case relevant here, single precision (x16 will be added with Volta).
GPUs have different but fixed architectures, so actual GPU performance depends on how the app and task expose architectural weaknesses (bottlenecks): what the GPU has been asked to do and how it has to do it. Different architectures are relatively better or worse at different things.

WRT NVidia, GFlops is a reasonably accurate way of comparing cards' performance within a series (or two series based on the same architecture), as it is based on what they are theoretically capable of (which can be calculated). However, there are other 'factors' which have to be considered when looking at performance with a specific app.

As MrS said, by calculating and applying these 'correction factors' against compute capabilities we were able to compare the performance of cards from different generations, up to and including Fermi. With Kepler we saw a greater variety of architectures within the two series, so there were additional factors to consider.

Differences in bandwidth, boost behaviour, cache size, memory rates and memory size came to the fore as important considerations when comparing Kepler cards, or thinking about buying one to crunch here - these impacted actual performance.
Cache size variation seemed to be important, with larger caches being less restrictive.
Cards of the same type boosted to roughly the same speed, irrespective of the price tag or the number of fans.
Some cards in a series were even from a different generation (GF rather than GK).
Bandwidth was lower for some Keplers and was a significant impediment, varying with WU type. This is still the case with the Maxwells; some WUs require more bandwidth than others. For example, running one WU type (say a NOELIA_pnpx) might incur a 49% memory controller load (MCL) on a GTX970 and 56% on a GTX980, whereas another WU type might only incur 26% MCL on a GTX970 and 30% on a GTX980. In the latter case a GTX980 might, for the sake of argument, appear to be 19% faster than a GTX970, whereas with the NOELIA WUs the GTX970 would do relatively slightly better, with the GTX980 only 15% faster. Increasing the memory rate (to what it is supposed to be) alleviates the MCL, but the effect is less noticeable if the MCL is low to begin with.
The GDDR5 usage I'm seeing from a NOELIA_pnpx task is 1.144 GB, so it's not going to perform well on a 1 GB card!
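
A quick way to check that usage on your own card, as a sketch (memory.used is the total GDDR5 in use on the GPU, so close anything else that uses the card first):

import subprocess

out = subprocess.check_output([
    "nvidia-smi", "--id=0",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader",
], text=True)
print(out.strip())   # e.g. "1171 MiB, 4096 MiB"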

Comparing NVidia cards to AMD's ATI range based on GFlops is almost pointless without consideration of the app. Equally pointless is comparing apps that use different APIs: OpenCL vs CUDA.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 40637 - Posted: 26 Mar 2015 | 23:17:01 UTC - in response to Message 40631.


Thorough points - knowing the architectural differences (strengths and weaknesses) is key to peak instruction throughput, as are the underlying data transfers. To fully understand the effect different types of code have on any "compute" device, one has to be knowledgeable about many interacting complexities. There are many facets to master, always learning through failure and practice. It takes many years to learn the language(s) of computers: a never-ending journey. Deferring to the people who have worked with computers the longest (the experts) is a natural course for those who are willing to learn. The ones who started it all can teach younger heads much.
