Message boards : Frequently Asked Questions (FAQ) : Please set CPU usage to 1 CPU for GPU Acemd tasks.

L
Joined: 22 Mar 14
Posts: 41
Credit: 441,359,373
RAC: 329,226
Message 53851 - Posted: 4 Mar 2020 | 18:24:46 UTC

Hello!

Please set CPU usage to 1 CPU for GPU ACEMD tasks, because that is what they actually use.

Aurum
Joined: 12 Jul 17
Posts: 253
Credit: 9,791,563,847
RAC: 2,173,005
Message 54090 - Posted: 26 Mar 2020 | 20:19:18 UTC

I do it in my app_config.xml file:

<app_config>
  <app>
    <name>acemd3</name>
    <gpu_versions>
      <cpu_usage>1.0</cpu_usage>
      <gpu_usage>1.0</gpu_usage>
    </gpu_versions>
  </app>
  <project_max_concurrent>2</project_max_concurrent>
</app_config>

Alessio Susi
Joined: 7 Mar 15
Posts: 8
Credit: 20,265,543
RAC: 3,829
Message 54206 - Posted: 3 Apr 2020 | 9:13:20 UTC

Is this useful?
____________
ASUS X570 E-Gaming
AMD Ryzen 9 3950X, 16 core / 32 thread 4.4 GHz
AMD Radeon Sapphire RX 480 4GB Nitro+
Nvidia GTX 1080 Ti Gaming X Trio
4x16 GB Corsair Vengeance RGB 3466 MHz

Keith Myers
Joined: 13 Dec 17
Posts: 508
Credit: 535,427,024
RAC: 1,708,858
Message 54212 - Posted: 3 Apr 2020 | 16:30:19 UTC - in response to Message 54206.

Not really. Toni stated that with the new acemd3 app and wrapper, the app will use a full CPU core on its own without any additional configuration.

No need for the old SWAN_SYNC tweak we needed with acemd2.

L
Message 54903 - Posted: 23 May 2020 | 9:15:46 UTC

I have a problem with the fact that ACEMD 3 actually uses a full core: as a result, BOINC launches more applications than it should, and this is a big problem because everything drowns in context switching. Please fix this disgrace already. How much longer are you going to mock us?

Keith Myers
Message 54909 - Posted: 23 May 2020 | 20:34:34 UTC

Then you must be running your cpu overcommitted to begin with.

The tasks as sent by the project by default are assigned:

0.986 CPUs + 1 NVIDIA GPU

That is sufficient cpu usage to support the task.

Reduce the overcommitment of your cpu to other projects.

floyd
Joined: 17 Dec 11
Posts: 11
Credit: 105,502,570
RAC: 1,133
Message 54914 - Posted: 23 May 2020 | 23:14:39 UTC - in response to Message 54909.

Keith, this is not about changing the actual CPU usage. It is about making BOINC aware.

Keith Myers wrote: "Then you must be running your cpu overcommitted to begin with."

That is what L is complaining about.

Keith Myers wrote: "The tasks as sent by the project by default are assigned: 0.986 CPUs + 1 NVIDIA GPU"

And this is what's causing it. As a long time BOINC user you'll know that 0.986 is interpreted as zero which is far from reality. Aurum's app_config addresses that, I do it the same way. But it would be better if we didn't have to override to begin with. The tasks use one CPU, not zero. Tell it like it is and all is well.

Keith Myers wrote: "Reduce the overcommitment of your cpu to other projects."

The problem is caused by this project, not others.

Keith Myers
Message 54915 - Posted: 23 May 2020 | 23:26:30 UTC - in response to Message 54914.

Hi Floyd, you as a long time BOINC user also know that the cpu_usage values are only used for scheduling BOINC resources.

The science application itself, acemd3 in our case, is the only entity that determines how much CPU is actually used to support the GPU task.

The application will use as much or as little cpu as it requires and you have no control over the issue.

floyd
Message 54917 - Posted: 24 May 2020 | 0:39:00 UTC - in response to Message 54915.

Keith Myers wrote: "Hi Floyd, you as a long time BOINC user also know that the cpu_usage values are only used for scheduling BOINC resources."

I'm aware of that and I think the original poster is too, but after reading your replies I thought you had misinterpreted their request. Sorry if I got that wrong.

Indeed the problem is that with the current numbers BOINC does not schedule any CPU support for the GPU tasks when in fact they need a full core. That doesn't make much difference if you have idle cores but it certainly does when you run a full load of CPU tasks. The request is just to state the GPU tasks need one CPU core, not 0.986 which effectively means no CPU at all, to allow proper scheduling.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 68
Credit: 965,102,122
RAC: 6,405,145
Message 54918 - Posted: 24 May 2020 | 3:31:59 UTC - in response to Message 54914.


floyd wrote: "And this is what's causing it. As a long time BOINC user you'll know that 0.986 is interpreted as zero"


can you cite a source for this? I'm not sure this is the case. please point me to the BOINC documentation or code segment that defines this behavior. the GPU allocation works as intended (0.5 = 0.5, 1=1, 0.33=0.33 and so on) so I have no reason to believe it doesn't account for CPU in increments less than 1 also.

Keith Myers
Message 54920 - Posted: 24 May 2020 | 4:19:58 UTC - in response to Message 54918.
Last modified: 24 May 2020 | 4:20:50 UTC

Actually Richard commented on this a little while ago somewhere. It depends where you are in the code.

BOINC is not consistent at all in how it converts fractional values to integers. In some places in the code, anything less than 1 is interpreted as 0.

Elsewhere, 0.986 is rounded up to 1.

And 2.5 can be interpreted as either 2.0 or 3.0 in other sections of the code.

He said he wished BOINC were consistent everywhere in how such values are handled.

floyd
Message 54929 - Posted: 24 May 2020 | 11:21:11 UTC - in response to Message 54918.


floyd wrote: "And this is what's causing it. As a long time BOINC user you'll know that 0.986 is interpreted as zero"

Ian&Steve C. wrote: "can you cite a source for this? I'm not sure this is the case. please point me to the BOINC documentation or code segment that defines this behavior."

I'm no source diver and I don't remember seeing any documentation on this, one way or the other. I'm deducing this from my own observation. Note however that I'm not running the latest clients, so please correct me if there were changes recently.

Ian&Steve C. wrote: "the GPU allocation works as intended (0.5 = 0.5, 1=1, 0.33=0.33 and so on) so I have no reason to believe it doesn't account for CPU in increments less than 1 also."

Actually GPU scheduling differs from CPU scheduling in several aspects. One difference is that BOINC schedules fractions of GPUs but not fractions of CPU cores. It's a whole core or nothing and 0.986 is not a whole core. 0.986+0.986=1 though, for CPUs. The addition is done before cutting off the fractional part so you'll get one CPU for two tasks (which still is not enough) but none for one task. Of course you won't notice this if BOINC doesn't make other use of the cores, i.e. schedule CPU tasks.
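
The rule described in this post can be written down as a small model. This is only a sketch of the behaviour floyd reports observing, not code taken from the BOINC client; the function name `reserved_cpus` is hypothetical, and Keith's earlier post suggests other parts of the client may round differently.

```python
import math


def reserved_cpus(cpu_usages):
    """Whole CPU cores set aside for GPU tasks under the rule described above.

    Assumption (from floyd's observation, not from BOINC source): the
    cpu_usage values of all running GPU tasks are added up first, and the
    fractional part of the total is then dropped.
    """
    return math.floor(sum(cpu_usages))


print(reserved_cpus([0.986]))         # 0: one task with the project default
print(reserved_cpus([0.986, 0.986]))  # 1: two tasks with the project default
print(reserved_cpus([1.0]))           # 1: one task with the app_config override
```

Under this model a single task with the default 0.986 reserves no core at all, which is why the app_config override to 1.0 changes the scheduling even though the real CPU usage stays the same.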

L
Message 54980 - Posted: 27 May 2020 | 9:17:44 UTC
Last modified: 27 May 2020 | 9:31:39 UTC

Currently I work around the problem with a crutch:

On Windows I copy this "app_config.xml" file

<app_config>
  <app>
    <name>acemd3</name>
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <cpu_usage>1.0</cpu_usage>
      <gpu_usage>1.0</gpu_usage>
    </gpu_versions>
  </app>
</app_config>

to the project folder:
%ALLUSERSPROFILE%\BOINC\projects\www.gpugrid.net\
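
For anyone who prefers to generate the file rather than type it, here is a minimal sketch that builds the same XML and shows where it would go. The helper names `build_app_config` and `project_dir` are hypothetical, and the `C:\ProgramData` fallback is an assumption used only when `ALLUSERSPROFILE` is not set; the script prints the result and leaves the actual copy to you.

```python
import os
import xml.etree.ElementTree as ET


def build_app_config(app_name="acemd3", cpu_usage=1.0, gpu_usage=1.0,
                     max_concurrent=1):
    """Return the app_config.xml text shown above as a single-line string."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    return ET.tostring(root, encoding="unicode")


def project_dir():
    """Project folder from the post; the fallback base path is an assumption."""
    base = os.environ.get("ALLUSERSPROFILE", r"C:\ProgramData")
    return os.path.join(base, "BOINC", "projects", "www.gpugrid.net")


if __name__ == "__main__":
    print(build_app_config())
    print("target:", os.path.join(project_dir(), "app_config.xml"))
```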


It works, but it's a crutch. I want a REAL fix from the team.

Without this crutch, GPU load is only 20-30%, because the CPU load for acemd is only 15-20% of one CPU core (machine S), and both machines (S: https://www.gpugrid.net/show_host_detail.php?hostid=549893 and H: https://www.gpugrid.net/show_host_detail.php?hostid=549849 ) show a high rate of context switching. This is a critical issue!!!

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1003
Credit: 2,544,177,532
RAC: 3,298,522
Message 54981 - Posted: 27 May 2020 | 10:25:32 UTC - in response to Message 54920.

Keith Myers wrote: "Actually Richard commented ..."

Sorry, I've been a bit busy lately. Coming back to this thread, and doing some researching, I came across this page:

https://boinc.berkeley.edu/trac/wiki/GpuSync

It's ten years old! It describes the problem, speculates about what to do, but so far as I know, nothing further has ever been done.

David Anderson, who I'm assuming wrote that document, likes to make BOINC servers as automatic as possible, to avoid administrators having to learn how to twiddle each mysterious knob. Fair enough, but the automatic generation of that 0.9xx figure isn't fit for purpose here. It ought to be 1 for busy-wait, or 'not 1' for interrupt-driven.

For the time being, we're in control. It's probably easier for us to add the app_config than for Toni to find and over-ride the automatic setting.

But the problem here is aggravated because Toni forgot (or hadn't been trained) to add the <priority>2</priority> line to the job.xml template, when he switched from native apps to using the wrapper. So our GPU apps are running at lowest priority, instead of 'below normal', and users with overcommitted CPUs will take a bigger hit than needed.

It's all a bit of a mess.

L
Message 54982 - Posted: 27 May 2020 | 10:33:55 UTC
Last modified: 27 May 2020 | 11:32:03 UTC

-------------------

Richard Haselgrove
Message 54983 - Posted: 27 May 2020 | 11:04:28 UTC
Last modified: 27 May 2020 | 11:07:29 UTC

Ian and I were having a conversation about this in https://github.com/BOINC/boinc/issues/3764.

You can add the missing priority line to job.xml.32charID locally - 2 would be the traditional value, other possibilities are described in https://boinc.berkeley.edu/trac/wiki/WrapperApp#Thejobdescriptionfile.

I found the edit was persistent over new task downloads and client restarts, Ian didn't. YMMV.

L
Message 54984 - Posted: 27 May 2020 | 11:39:16 UTC

Finally. For host S, I "fixed" the problem with this config:

<app_config>
  <app>
    <name>acemd3</name>
    <max_concurrent>3</max_concurrent>
    <gpu_versions>
      <cpu_usage>0.3</cpu_usage>
      <gpu_usage>0.3</gpu_usage>
    </gpu_versions>
  </app>
</app_config>


Unfortunately, the scheduler gives me only 2 tasks at a time, so GPU load is only 70%, but that is better than the 20% I had before starting this experiment.
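
The arithmetic behind this config can be sketched as follows. It assumes a single GPU and that the client packs as many tasks onto it as their `gpu_usage` fractions allow; the function names are hypothetical and this is not BOINC code.

```python
import math


def tasks_per_gpu(gpu_usage):
    """How many tasks of the given gpu_usage fit on one GPU (assumed rule)."""
    # The small epsilon guards against floating-point results such as 2.9999999.
    return math.floor(1.0 / gpu_usage + 1e-9)


def running_tasks(gpu_usage, max_concurrent, tasks_on_host):
    """Tasks that can run at once, limited by GPU share, config, and supply."""
    return min(tasks_per_gpu(gpu_usage), max_concurrent, tasks_on_host)


print(tasks_per_gpu(0.3))        # 3: the config allows three tasks per GPU
print(running_tasks(0.3, 3, 2))  # 2: only two tasks were sent to the host
```

So the config makes room for three concurrent tasks, but the host never runs more than the two the scheduler hands out.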


Aurum
Message 54989 - Posted: 27 May 2020 | 13:10:39 UTC
Last modified: 27 May 2020 | 13:11:04 UTC

I've never seen 20% GPU usage. I always install an app_config for this project and several others. I cannot imagine why using <cpu_usage>0.3</cpu_usage> is ever a good idea for GG.

I also do not understand how one can be over committed. I'm in the habit of leaving a CPU thread open in Preferences, e.g. 23/24 => Use at most 96% of the processors.
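
The 23/24 => 96% figure is just the share of threads to keep busy, rounded up to a whole percent. A small sketch for other thread counts follows; the function name is hypothetical, and how the client turns the percentage back into a thread count is not stated in this thread.

```python
import math


def percent_leaving_free(threads, free=1):
    """Percentage of processors to allow so that `free` threads stay idle."""
    return math.ceil(100 * (threads - free) / threads)


print(percent_leaving_free(24))  # 96, the 23/24 case above
print(percent_leaving_free(32))  # 97
```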

Ian&Steve C.
Message 54991 - Posted: 27 May 2020 | 14:45:51 UTC - in response to Message 54983.

Richard Haselgrove wrote: "Ian and I were having a conversation about this in https://github.com/BOINC/boinc/issues/3764.

"You can add the missing priority line to job.xml.32charID locally - 2 would be the traditional value, other possibilities are described in https://boinc.berkeley.edu/trac/wiki/WrapperApp#Thejobdescriptionfile.

"I found the edit was persistent over new task downloads and client restarts, Ian didn't. YMMV."


not only was it not persistent, but after I forced it in place, some time later when the client tried to download new work, the project tried to issue a new job.xml file (with the exact same file name), and that download failed because of the way I had forced the setting. because the download could not finish, no new work could run, even though an identical file was already there.

so it won't work unless the project fixes that.

Richard Haselgrove
Message 54992 - Posted: 27 May 2020 | 15:04:26 UTC - in response to Message 54991.

My priority line is still in the file, with an edit timestamp of around 32 hours ago (many tasks have passed under the bridge in that time!)

Linux Mint, editor launched from the File Browser with administrative rights.

Ian&Steve C.
Message 54994 - Posted: 27 May 2020 | 15:21:43 UTC - in response to Message 54992.

I can't explain why. I did this with Ubuntu 20.04 on my test bench. I opened the file with the standard text editor (gedit) in the GUI, then added <priority>2</priority> and saved and closed the file. restarting BOINC (launched via boincmgr) results in the file being reverted without my edits.

I tried doing this again invoking root privs by running sudo gedit job.xml*, made my edits, saved, and same behavior.

just now, I ran a chown root job.xml* to change the owner to root (instead of me). and it still deleted and regenerated a new file without my edits when starting BOINC back up.

it seems the only thing that makes it stick is the chattr +i command, but then the project doesn't like it: it tries to overwrite the file, can't, and all tasks stop running.

Richard Haselgrove
Message 54995 - Posted: 27 May 2020 | 15:58:47 UTC - in response to Message 54994.

For comparison: different BOINC environment.

I have a service install (from Gianfranco's PPA), I have added myself to the boinc user group created by the PPA, and I think my file browser's 'administrative' mode is the equivalent of sudo (the same password works for both).

Wilgard
Joined: 4 Mar 20
Posts: 12
Credit: 2,168,312
RAC: 8,760
Message 55090 - Posted: 1 Jul 2020 | 7:46:35 UTC - in response to Message 54984.

Hi L,
Is it still stable 1 month later?
