Message boards : Graphics cards (GPUs) : WUs still moderately CPU intensive?

Author Message
Michael Milan
Joined: 19 Jan 09
Posts: 4
Credit: 1,037,300
RAC: 0
Message 5811 - Posted: 20 Jan 2009 | 9:45:01 UTC

Hello! I received a new GTX 280 yesterday, so I decided to join this project to try it out.

I was surprised to see that my CPU runs at a consistent 60% during a CUDA WU. Is this normal? I was under the impression that the CPU would more or less be idle while the GPU does all the work. My CPU is quite old: a single-core Athlon 64 3200+ running 64-bit Windows Vista.

I also noticed that if I have another project attached and looking for work, BOINC will run a CPU WU and a CUDA WU at the same time, but then the CUDA WU slows way, way down.

It's interesting stuff. I've decided to run GPUGrid exclusively because of this. But I'm wondering if the CUDA performance issue will still be there when I upgrade to a multi-core CPU in the near future.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5817 - Posted: 20 Jan 2009 | 11:37:12 UTC - in response to Message 5811.

It is normal for the moment.

The CPU usage is higher than it should be... the project is working on it.

Depending on what system you buy, YMMV ... :)

BUT, I am running a Q9300 with a 9800 GT and an i7 with a GTX 280, and aside from one core losing some of its capacity to run a normal CPU project at full speed, there are no significant issues. In other words, I am happily running from 5 to 20 projects on the two systems, and though I know there is an impact, I cannot see it ... I am sure that if I ran only one project and did careful measurements I could figure out EXACTLY what the impact is ...

But it is a "who cares" argument ... I support this project because I can put a GPU to work ... the pay is good and the support from the project is excellent (at the moment; it could change tomorrow). In other words, they are working hard at the issues ...

I would guess that, because the system is old, you are seeing bus contention issues, and you probably have one of the slower PCI-e buses ... tip: get a board with a PCI-e 2.0 x16 slot. If you might be subject to BOINC addiction, I would also suggest getting a MB that can support multiple GPUs (assuming you can afford the upgrades), in which case start with a bigger PSU than the basic system needs.

I really like the i7 in that you have 4 cores with HT, so you get 8 logical CPUs ... each of which, on my two systems, beats the equivalent in the Q9300, though not by huge margins. Still, that is a testament to the cache, tri-channel RAM and general vibes of the world (or something) ...

Lazarus-uk
Joined: 16 Nov 08
Posts: 29
Credit: 122,821,515
RAC: 0
Message 5818 - Posted: 20 Jan 2009 | 11:38:04 UTC - in response to Message 5811.

Unfortunately, under Windows, you will probably need to dedicate a CPU core to GPU processing. We are all waiting for the release of the new app, where this will no longer be necessary.

On my quad, I run 3 CPU tasks + 1 GPU task under Windows; under Linux, however, on the same machine, I can run 4 CPU + 1 GPU with little loss of time for the CPU tasks. Needless to say, I have been using Linux a lot lately ;)

So, you will probably find that it's better, and more profitable, credit-wise and time-wise, to run solely GPU work on your machine.

HTH


Mark


Michael Milan
Joined: 19 Jan 09
Posts: 4
Credit: 1,037,300
RAC: 0
Message 5820 - Posted: 20 Jan 2009 | 12:17:46 UTC

Thanks for the answers, guys!
You're right, Paul, I'm using the older PCI-E 1.0 bus. Anyway, I'm planning to upgrade to a Core i7 sometime soon, hopefully!

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5823 - Posted: 20 Jan 2009 | 18:35:18 UTC - in response to Message 5820.

I like mine ... :)

Want another ... in dual configuration if possible ... 16 CPUs .. yum!




ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5824 - Posted: 20 Jan 2009 | 18:58:15 UTC - in response to Message 5817.
Last modified: 20 Jan 2009 | 18:58:42 UTC

I would guess that, because the system is old, you are seeing bus contention issues, and you probably have one of the slower PCI-e buses ... tip: get a board with a PCI-e 2.0 x16 slot.


Well, if someone buys a new mobo it shouldn't be less than PCIe 2.0. But don't count on it to improve performance in any way! It's not as if any real data is being transferred: the CPU is just asking the GPU "are you finished yet?" all the time, and all calculations are done locally on the GPU.
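
To make the polling point concrete, here is a rough sketch in Python (not the project's actual client code). It simulates one CPU thread waiting for one GPU kernel, either busy-waiting or sleeping between checks. The function name, the 1 µs cost per check and the 10 ms kernel length are illustrative assumptions, not measured values.

```python
def poll_until_done(kernel_us, check_cost_us, sleep_us):
    """Simulate a CPU thread waiting for one GPU kernel (all times in µs).

    Each loop iteration sleeps for sleep_us (0 = busy-waiting), then spends
    check_cost_us of CPU time asking the GPU whether it has finished.
    Returns (polls, cpu_us, late_us): number of checks, CPU time burned,
    and how long after completion the CPU noticed.
    """
    period_us = sleep_us + check_cost_us
    # The first check that ends at or after kernel_us sees the finished kernel.
    polls = max(1, -(-kernel_us // period_us))  # integer ceiling division
    cpu_us = polls * check_cost_us
    late_us = polls * period_us - kernel_us
    return polls, cpu_us, late_us


# Hypothetical numbers: a 10 ms kernel, 1 µs per check.
busy = poll_until_done(10_000, 1, 0)         # constant polling
relaxed = poll_until_done(10_000, 1, 1_000)  # sleep about 1 ms between checks
print(busy, relaxed)
```

With these made-up numbers, constant polling keeps the core busy for the whole kernel, while sleeping about 1 ms between checks costs almost no CPU time but notices completion slightly late.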

And besides, PCIe 2.0 is "just" double the speed of version 1.0. You can have both with 1, 4, 8 or 16 lanes in parallel.
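
For reference, the theoretical numbers behind "just double" can be worked out in a few lines of Python. The per-lane rates (2.5 GT/s for PCIe 1.x, 5 GT/s for 2.0, both with 8b/10b encoding) come from the PCIe specifications; the function name is made up for this sketch, and real-world throughput is lower than these peak figures.

```python
# Raw per-lane signalling rate in GT/s. Both generations use 8b/10b
# encoding, so only 8 of every 10 transferred bits are payload.
GT_PER_S = {"1.0": 2.5, "2.0": 5.0}


def pcie_bandwidth_mb_s(version, lanes):
    """Theoretical one-direction bandwidth in MB/s (1 MB = 10**6 bytes)."""
    payload_bits_per_s = GT_PER_S[version] * 1e9 * 8 / 10
    return payload_bits_per_s / 8 / 1e6 * lanes


for version in ("1.0", "2.0"):
    for lanes in (1, 4, 8, 16):
        print(f"PCIe {version} x{lanes}: {pcie_bandwidth_mb_s(version, lanes):.0f} MB/s")
```

So an x16 slot goes from 4000 MB/s to 8000 MB/s per direction in theory, which only matters if the application actually moves that much data.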

MrS
____________
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5829 - Posted: 20 Jan 2009 | 21:08:38 UTC - in response to Message 5824.

I guess I was not completely clear ... I did say slower PCI-e bus, but I also mentioned contention, and there I was thinking (but not clearly expressing) that there is likely contention across all the I/O systems, and on the system memory bus too.

These were common issues with systems of the era that I took his to be from, judging by the class of processor he described (in comparison with today's systems, that is). The key is the reported increase in processing time of the GPU tasks when a CPU task is run at the same time.

Moot points I suppose ... but good justification for a new system ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5830 - Posted: 20 Jan 2009 | 21:43:40 UTC - in response to Message 5829.
Last modified: 20 Jan 2009 | 21:46:42 UTC

Oh, now I get your point.

Something like that never crossed my mind because even though a Pentium (4) D is relatively slow, each clock cycle still takes only about 0.3 ns. The minimum time interval the Windows scheduler knows is 1 ms. For optimal GPU performance with active polling we'd have to poll the GPU constantly, which is exactly what the first clients were doing. Newer ones try to poll the GPU only when it's necessary, i.e. when it's reasonable to assume that it would be ready. On Windows the minimum accuracy for such polls is 1 ms. That he's seeing a slowdown under heavy CPU usage indicates that other software is blocking the necessary polls, by more than milliseconds. That's more than 6 orders of magnitude longer than a CPU cycle. It's <1 kHz compared to 3 GHz for the CPU, or 100 MHz for the internal buses.

That's why I intuitively attributed his performance problems at 2+1 to software rather than hardware capability. For some people 2+1 (dual) or 4+1 (quad) doesn't work as well as for most, and I don't think we have identified any specific reason for that yet.
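
A quick back-of-the-envelope check of those timescales, as a Python sketch. The 3 GHz clock and the 1 ms tick are the figures from above; the 5 ms kernel length and the 20 ms delay are assumptions picked purely for illustration, and `gpu_utilisation` is a hypothetical helper, not anything from the client.

```python
import math

CYCLE_NS = 1 / 3.0    # one clock cycle at 3 GHz, about 0.33 ns
TICK_NS = 1_000_000   # 1 ms scheduler granularity, expressed in ns

ratio = TICK_NS / CYCLE_NS               # clock cycles per scheduler tick
orders_of_magnitude = math.log10(ratio)  # a bit above 6


def gpu_utilisation(kernel_ms, poll_delay_ms):
    """Fraction of time the GPU is busy if every kernel has to wait
    poll_delay_ms before the CPU notices and launches the next one."""
    return kernel_ms / (kernel_ms + poll_delay_ms)


print(round(ratio), round(orders_of_magnitude, 2))
print(gpu_utilisation(5, 1))   # polls arrive 1 ms late
print(gpu_utilisation(5, 20))  # polls blocked for 20 ms by other software
```

The point of the second part: once the polls are held up for tens of milliseconds, the GPU spends most of its time waiting for the CPU, whatever the bus speed.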

MrS
____________
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5831 - Posted: 20 Jan 2009 | 22:18:36 UTC - in response to Message 5830.

Oh, now I get your point.


Cool ... :)

... That's why I intuitively attributed his performance problems at 2+1 to software rather than hardware capability. For some people 2+1 (dual) or 4+1 (quad) doesn't work as well as for most, and I don't think we have identified any specific reason for that yet.

MrS


And that is the reason I did the opposite. I assumed the polling loop and associated "stuff" to groom the GPU to be fairly small as these things go ... thus putting only a light load on the memory/cache system ... that this light load cannot be serviced indicates systemic bottlenecks ...

Also, there are some truly bad MBs out there ... I know, I bought some ... replacing some MBs, with the same processor and memory, at times saw a doubling of performance ... one of the reasons I tend to only buy MBs from a couple of names ... they have proven their worth to me ...

As to the last point ... start asking for MB manufacturer and model numbers ... if the same brand or model or even chipset starts to show up ... well, there is the answer. Since we are really in the very early days of GPU computing (as these things go), these issues have yet to be really addressed.
