
Message boards : Number crunching : CPU utilization varies ...

ms113
Joined: 6 Feb 09
Posts: 19
Credit: 1,281,738
RAC: 0
Message 6746 - Posted: 18 Feb 2009 | 11:40:34 UTC
Last modified: 18 Feb 2009 | 11:49:06 UTC

Hello,

I am new to this project, and during my first 10 GPUGrid WUs I made an interesting observation concerning CPU utilization, for which I could not find a clear answer:

The machine concerned has an AMD 64 X2 CPU (1 CPU, 2 cores) and a 9800GT GPU.

The status display in the BOINC Manager task list is almost always the same (like "0.12 CPUs, 1 CUDA" .. it was originally 0.03 CPUs, then 0.11 CPUs, and now stays at 0.12 for all GPUGrid WUs .. wherever that comes from?!)

BUT ..
Some of the WUs use about 9% of one CPU, some use about 40% of one CPU, and I could not find any identifiable difference between the WUs themselves.

Some examples:

WU        Status                Run-Time   CPU-Time   CPU-Fract.   Time per Step
fe15599   (0.11 CPUs, 1 CUDA)   11:00:52   04:24:05   39.96%       79.3 ms
DZ22923   (0.11 CPUs, 1 CUDA)   10:51:08   04:19:41   39.88%       78.2 ms
GJ15698   (0.11 CPUs, 1 CUDA)   12:35:17   01:08:57    9.13%       89.5 ms
WpM4344   (0.11 CPUs, 1 CUDA)   16:13:43   06:32:51   40.35%       82.8 ms
Bh15457   (0.12 CPUs, 1 CUDA)   12:26:48   01:10:09    9.39%       88.3 ms
Im14464   (0.12 CPUs, 1 CUDA)   16:33:30   06:33:42   39.63%       79.5 ms
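(For reference, the "CPU-Fract." column is just CPU time divided by wall-clock run time. A minimal sketch that reproduces it, using times copied from the table above:)

```python
def to_seconds(hms: str) -> int:
    """Convert an "HH:MM:SS" string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def cpu_fraction(cpu_time: str, run_time: str) -> float:
    """CPU utilization as a percentage of one core over the whole run."""
    return 100.0 * to_seconds(cpu_time) / to_seconds(run_time)

# Times taken from the table above.
tasks = [
    ("fe15599", "11:00:52", "04:24:05"),
    ("DZ22923", "10:51:08", "04:19:41"),
    ("GJ15698", "12:35:17", "01:08:57"),
]

for name, run, cpu in tasks:
    print(f"{name}: {cpu_fraction(cpu, run):.2f}%")
```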

What I also observed is that the "system choppiness" depends on this CPU utilization.
The "40% WUs" are perceptibly less choppy than the "9% WUs".

In detail: on a 9500GT (I know it's not recommended .. but I used it for testing), it was possible to work relatively smoothly during the "40% WUs", while during the "9% WUs" it was nearly impossible to work while they were running.
Even on the 9800GT there is perceptible choppiness during the "9% WUs", while during the "40% WUs" the system runs totally smooth.

So, as you can see in my list, there is no big difference in the "Time per Step", which indicates to me that this cannot be responsible for the really big difference in choppiness.

How does this all fit together??
Why does the CPU utilization differ so much between WUs, why exactly 9% or 40%, and where does this come from?
Is there anything I can influence?

I could not find any clear information about this question searching this forum and the web ...

Sorry for the long post, but I tried to describe it clearly.

Any ideas or explanations?

Thanks,
Martin

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 6783 - Posted: 19 Feb 2009 | 18:36:24 UTC

Ok,

this was a question to which I did not really have an answer ... so I didn't answer ... :)

What I do know is that the number in the "Status" column is an estimate. I cannot recall if it is set by the project or is calculated by the manager. Regardless, it is eye-candy and has no basis in reality ...

However, speculation and generalizations ...

GPU Grid is in testing while doing science. And so, there are different tasks ... of varying complexities and so, they will use different amounts of system resources. Though the time steps are the same I have no idea of what the memory "footprint" might be on the CUDA GPU ...

If you are doing something else on the desktop, or the card is memory constrained, the CPU use might be going up as it has to move more data onto and off the card.

The contrast between my i7 and Q9300 may also have a bearing on this phenomenon ... my i7, in the dark days of high CPU use, was running at 7% CPU per GPU while my Q9300 was running at 20% plus ... why? Well, the Q9300 is slightly slower in clock speed and has a much smaller local cache (the i7 has L3 cache and the Q9300 does not), and there are likely differences in the MB chipset and system memory speeds. All that means the older / slower system needs more CPU to do the same work ... the chip architecture of the CPU also plays a part here too ...

There is not much you can do to influence these things other than to buy faster components of higher quality.

As to the incongruity of the increased CPU with better system responsiveness, I can only guess that you were running into an I/O bottleneck with data to/from the GPU, or the CPU, or the memory system.

Fundamentally there are too many moving parts to really know what is going on here ...

Cold comfort I know ... but it is the only comfort I have... sorry I cannot help ... but, this was a question I knew I did not know the specific answer to ... so, ... I did not ... well, I did now ...

ms113
Joined: 6 Feb 09
Posts: 19
Credit: 1,281,738
RAC: 0
Message 6940 - Posted: 23 Feb 2009 | 19:04:58 UTC
Last modified: 23 Feb 2009 | 19:07:13 UTC

Dear Sirs,

let me try it again .. maybe I asked the question the wrong way.

Some WUs use 40% of one CPU, and while they are running in the background, the system runs totally smooth.
Some WUs use 9% of one CPU, and while they are running, the same system is really choppy. (Hard to work on!!)

Did anyone else make a similar kind of observation?
Is there a way to distinguish these different kinds of WUs upfront?
Does someone have some hints how to handle that?

I try to crunch 24/7, but due to these "9%" WUs I have to suspend GPUGrid from time to time while working ... the "40%" WUs definitely show that there is another way ...

Thanks for your answers in advance.

Best regards,
Martin

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 6941 - Posted: 23 Feb 2009 | 19:34:01 UTC - in response to Message 6940.
Last modified: 23 Feb 2009 | 19:36:25 UTC

Okay...now I understand a bit better. It would be helpful if you could identify some examples of each type. My thinking is that this probably maps to the different types of workunits that are being sent out and may reflect "different science" being done by those different types (i.e., different types/intensity of computations), but this would need to be verified by mapping onto similar workunit names.


Edit: Forgot to mention, the WU info in your first post uses the letters from the front of each workunit name, but what is really needed is the letters and numbers from the middle (e.g., GPUTEST, JAN2, JAN4, SH2...US, etc.).

ms113
Joined: 6 Feb 09
Posts: 19
Credit: 1,281,738
RAC: 0
Message 6942 - Posted: 23 Feb 2009 | 19:59:53 UTC - in response to Message 6941.
Last modified: 23 Feb 2009 | 20:00:40 UTC

Thanks for your quick reply.

Here we go .. these are the full task-names (some examples) according to my first post (in chronological order):

fe15599-SH2_US_9-1-20-SH2_US_9310000_0
DZ22923-SH2_US_9-1-20-SH2_US_9980000_1
gJ15698-SH2_US_2-24-40-SH2_US_2250000_0
WpM4344-SH2_US_6-7-10-SH2_US_61240000_1
Bh15457-SH2_US_2-18-40-SH2_US_230000_1
Im14464-SH2_US_6-5-10-SH2_US_62110000_0

.. so the "problematic ones" (the 9% CPU ones) out of them were:
gJ15698-SH2_US_2-24-40-SH2_US_2250000_0
Bh15457-SH2_US_2-18-40-SH2_US_230000_1

all the others ran with 40% CPU and were fine.

More details in my first post or on request.

I can't really interpret the details behind the names ..
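(As an aside: the series tag in the middle of each name can be pulled out mechanically. A sketch; the name pattern is inferred only from the six examples above, so treat it as an assumption:)

```python
import re

def series(task_name: str) -> str:
    """Extract the simulation-series tag (e.g. "SH2_US_2") from a task
    name like "gJ15698-SH2_US_2-24-40-SH2_US_2250000_0". The pattern is
    guessed from the examples in this thread only; returns "" on no match."""
    m = re.match(r"^[A-Za-z]+\d+-([A-Za-z0-9]+_[A-Za-z]+_\d+)-", task_name)
    return m.group(1) if m else ""

print(series("gJ15698-SH2_US_2-24-40-SH2_US_2250000_0"))  # SH2_US_2
```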

Best regards,
Martin

K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 370,320,941
RAC: 0
Message 6945 - Posted: 23 Feb 2009 | 20:48:18 UTC - in response to Message 6942.


fe15599-SH2_US_9-1-20-SH2_US_9310000_0
DZ22923-SH2_US_9-1-20-SH2_US_9980000_1
gJ15698-SH2_US_2-24-40-SH2_US_2250000_0
WpM4344-SH2_US_6-7-10-SH2_US_61240000_1
Bh15457-SH2_US_2-18-40-SH2_US_230000_1
Im14464-SH2_US_6-5-10-SH2_US_62110000_0

.. so the "problematic ones" (the 9% CPU ones) out of them were:
gJ15698-SH2_US_2-24-40-SH2_US_2250000_0
Bh15457-SH2_US_2-18-40-SH2_US_230000_1

all the others ran with 40% CPU and were fine.


Well, your "9% ones" are obviously the US_2 WUs. You have two US_6 and two US_9 that you say aren't causing an issue. Checking my own completed tasks, I see that the US_2 WUs generally have a time step about 10-12 ms larger than the US_6 (15%). I guess maybe this is where your choppiness comes from -- the longer each step keeps the GPU busy, the less time it has to update your screen.

I have not noticed any usage above 1% since about 3 weeks ago, when they reduced the CPU usage dramatically. So, unfortunately, I cannot provide any other wisdom on your issue.

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 6946 - Posted: 23 Feb 2009 | 21:28:56 UTC

Looking through my own workunits, I see the same pattern of US_2 workunits having about 10% longer "Time per step" than other work (US_6 & US_10). This means that these tasks are indeed more computationally intense than the others (maybe a project admin can explain why they are?).

As for the choppiness, I have not seen a noticeable difference on my 9600GSO, which is a bit slower than your 9800GT (as is K1atOdessa's 9500GT), but we are both on Windows, so this may be an issue with Linux?


ms113
Joined: 6 Feb 09
Posts: 19
Credit: 1,281,738
RAC: 0
Message 6951 - Posted: 24 Feb 2009 | 0:07:10 UTC
Last modified: 24 Feb 2009 | 0:09:14 UTC

Thanks .. this shows clearly that it really is these US_2 WUs that are teasing me.

The application version my machine (Linux) is using is 6.59
.. the current Windows app version (that I've seen) is 6.62.

maybe it depends on that? (in combination with US_2 tasks, of course)

I don't know if the app versions on different OSes are somehow comparable (in the sense that 6.59 is three steps "behind" 6.62)

Also, I have no idea how fast these apps change (since I have only been with the project for 2 weeks)

However ..

do you think this could change in the near future?

would it make sense for me to "skip" these "US_2" WUs?
.. are there any negative side effects to aborting some WUs?
.. is there any way to filter them out?

Thanks in advance,
Martin

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 6955 - Posted: 24 Feb 2009 | 1:36:01 UTC - in response to Message 6951.

I can't really say much about the Linux distros, so I'll leave that to others ... though I would say that the BOINC client is changing quickly these days to handle the new GPU crunching, so I'd expect several changes (though not all will be good ones).

As far as the work types here, I'd expect these to change greatly. The project scientists have indicated that there will be many different types of work, so many that it will not be feasible to maintain a workunit type listing (such as the one at PrimeGrid, for example). Thus, I am not sure that I would worry too much about trying to abort certain types. If your machine gets choppy, just suspend until you're done working and then continue. If you need to abort a few due to deadline issues, they will go back into the queue and be picked up by other machines fairly quickly.
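(If suspending by hand gets tedious, the same suspend/resume can be scripted through BOINC's boinccmd tool. A minimal sketch, assuming boinccmd is on PATH, the local client accepts RPC, and the project URL matches your client's attachment -- all of which you should verify for your setup:)

```python
import subprocess

# Assumed project URL -- check with `boinccmd --get_project_status`.
PROJECT_URL = "http://www.gpugrid.net/"

def project_cmd(state: str) -> list:
    """Build the boinccmd invocation; state is "suspend" or "resume"."""
    return ["boinccmd", "--project", PROJECT_URL, state]

def set_project(state: str) -> None:
    """Run the command against the local BOINC client."""
    subprocess.run(project_cmd(state), check=True)
```

For example, `set_project("suspend")` before starting interactive work and `set_project("resume")` afterwards.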

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 6962 - Posted: 24 Feb 2009 | 5:10:52 UTC

The one downside is that the number of tasks you can download per day may go down some ... though you likely only queue two tasks and take a day to complete each one, so this may not be a significant issue. Just something to be aware of ... the daily quota rises with each success, so again, it may not be a significant issue ... just a possible one ...

J.D.
Joined: 2 Jan 09
Posts: 40
Credit: 16,762,688
RAC: 0
Message 6971 - Posted: 24 Feb 2009 | 22:33:38 UTC - in response to Message 6951.


would it make sense for me to "skip" these "US_2" WUs?
.. are there any negative side effects to aborting some WUs?
.. is there any way to filter them out?


If that's the only work unit type that gives you trouble, you can safely abort it. Every once in a while you'll see several of that type in a row, but more often than not you'll get a non-US_2 WU to replace it, and some other machine will complete it.

If nearly everyone always aborted that work unit type, I reckon it would often exceed the five-retry limit, but as it is I don't see that happening.

I'm not aware of any automatic method of filtering it out other than aborting it.

Ironically, I prefer those work units because of the lower CPU usage on my machine, so don't hesitate to send them on over. ;-)

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 7232 - Posted: 6 Mar 2009 | 9:48:55 UTC

I also encounter the same issue under Windows, so it's not related to Linux; but a 9600GT is probably the slowest card for running this application :D
