Message boards : Multicore CPUs : Multithreaded BOINC application questions

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32874 - Posted: 10 Sep 2013 | 21:14:29 UTC

I'm curious to know which BOINC projects have multithreaded (CPU) applications. Do you like that type of app? Would you be happy to give over all your cores to a single task? And what control does the BOINC client give you over core allocation?

Ta

MJH

terencewee*
Joined: 29 May 12
Posts: 8
Credit: 21,605,500
RAC: 0
Level
Pro
Message 32876 - Posted: 11 Sep 2013 | 1:42:04 UTC - in response to Message 32874.

I remember that a few years back AQUA@Home got a multi-threaded CPU app, and the MilkyWay@Home N-Body Simulation is multi-threaded.

I vaguely remember there was a mechanism to control AQUA's core allocation; I'll need to dig...

Depending on the project's needs, I'll allocate a GPU if the work is best served by a GPU, otherwise a CPU, as most projects started out with a CPU app first.

As long as the project I'm contributing to uses the resources I set aside for it, it doesn't matter whether it is single- or multi-threaded.



____________
terencewee*
Sicituradastra.

terencewee*
Joined: 29 May 12
Posts: 8
Credit: 21,605,500
RAC: 0
Level
Pro
Message 32877 - Posted: 11 Sep 2013 | 1:52:57 UTC

Richard posted this over @Milkyway.

I'm sure he'll chime in soonish... :)


____________
terencewee*
Sicituradastra.

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 32911 - Posted: 12 Sep 2013 | 8:39:16 UTC
Last modified: 12 Sep 2013 | 8:39:31 UTC

Edges@home also has a multi-threaded vina app: http://home.edges-grid.eu/home/

The little experience I've had with multi-threaded apps on MilkyWay and Edges shows that BOINC isn't fully baked when it comes to helping us manage how many threads to use per WU. From my perspective this is not a "production ready" situation, as I have to work too hard to leave a core free to feed GPU WUs.

If that situation were to change, I would give those types of apps another shot to check out their efficiency.
____________
Thanks - Steve

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32912 - Posted: 12 Sep 2013 | 9:22:30 UTC - in response to Message 32911.

Thanks for the replies. The situation doesn't look too encouraging.
Is it possible at least to limit the number of concurrent tasks for a specific project?

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 32913 - Posted: 12 Sep 2013 | 12:44:31 UTC - in response to Message 32912.

Thanks for the replies. The situation doesn't look too encouraging.
Is it possible at least to limit the number of concurrent tasks for a specific project?

Yep, with an app_config. In fact it limits instances of a specific app. Here's an example:

<app_config>
  <app>
    <name>ecm</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
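
For anyone who would rather generate that file than type it by hand, here is a minimal Python sketch that builds the same structure. The helper name `build_app_config` is made up for illustration; the element names are the ones from the example above. The file is normally saved as app_config.xml in the project's folder inside the BOINC data directory, so check the BOINC documentation for the exact location on your platform.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that caps how many tasks of one app run at once.

    Hypothetical helper: only the element names come from the forum example.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # Serialise to a plain string; the output is valid XML without pretty-printing.
    return ET.tostring(root, encoding="unicode")


# Same values as the example above: at most 2 concurrent "ecm" tasks.
config_text = build_app_config("ecm", 2)
print(config_text)
```

The app name has to match the short name the project uses for the application, which is not always the name shown in the task list.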

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32914 - Posted: 12 Sep 2013 | 12:59:24 UTC - in response to Message 32913.

Thanks Beyond. It's a shame that's not exposed in the UI.

Cheers,

Matt

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 32965 - Posted: 15 Sep 2013 | 11:30:15 UTC

Just curious, what are you thinking about using it for?

MrS
____________
Scanning for our furry friends since Jan 2002

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 32967 - Posted: 15 Sep 2013 | 11:48:44 UTC - in response to Message 32965.

Oh, no reason. But do stay tuned...

Matt

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,893,911,970
RAC: 19,817,833
Level
Tyr
Message 32968 - Posted: 15 Sep 2013 | 11:50:10 UTC - in response to Message 32877.

Richard posted this over @Milkyway.

I'm sure he'll chime in soonish... :)

Sorry, didn't notice this thread before. Yes, I'm test-running the Milkyway N-Body application at the moment. We had a bit of a struggle getting the server configured right to send the multi-threaded tasks out with the correct "command and control" structure (plan_class), but it's working now.

Thread control is currently rudimentary, and the tendency is for an MT task to grab all cores, or none. Personally, I don't much like that: it wouldn't play well with CPU tasks from other projects if those tasks don't checkpoint frequently enough.

But there's hope on the horizon. Jacob Klein and I ganged up on David A to get him to add

[<app_version>
    <app_name>uppercase</app_name>
    [<plan_class>mt</plan_class>]
    [<avg_ncpus>x</avg_ncpus>]
    [<ngpus>x</ngpus>]
    [<cmdline>--nthreads 7</cmdline>]
</app_version>]

Each <app_version> element specifies parameters for a given app version; it overrides <app>. Supported in 7.3+ clients.

to the app_config.xml specification - see Application configuration.

Note that no BOINC v7.3 client exists yet, even for testing - we expect that to happen when a current rush of work on the Android platform abates a little. I'll be keeping an eye on it, and testing when available.
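
To make the specification above concrete, here is a rough Python sketch that fills in an `<app_version>` block for a multi-threaded app. Everything outside the element names is an assumption: the helper `build_mt_override` is hypothetical, `milkyway_nbody` is only a placeholder app name, and `--nthreads` is simply the flag used in the specification's example, so a given application may expect a different option.

```python
import xml.etree.ElementTree as ET


def build_mt_override(app_name: str, plan_class: str,
                      avg_ncpus: int, nthreads: int) -> str:
    """Return app_config.xml text with one <app_version> override.

    avg_ncpus is what the BOINC scheduler budgets for the task;
    nthreads is what the app is told to use on its command line.
    """
    if avg_ncpus < 1 or nthreads < 1:
        raise ValueError("avg_ncpus and nthreads must be at least 1")
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    # Flag name taken from the spec example; it is app-specific (assumption).
    ET.SubElement(version, "cmdline").text = f"--nthreads {nthreads}"
    return ET.tostring(root, encoding="unicode")


# Placeholder values: budget 6 CPUs and ask the app for 6 threads.
override_text = build_mt_override("milkyway_nbody", "mt", 6, 6)
print(override_text)
```

Keeping `avg_ncpus` and the thread count as separate arguments mirrors the two knobs in the specification: one tells the scheduler how many cores to reserve, the other tells the app how many to actually use.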

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 32981 - Posted: 15 Sep 2013 | 23:09:12 UTC - in response to Message 32968.

Good to hear of some movement on the MT front. I can think of a few projects that might benefit.
SimOne also used an MT app, but it has not been sending work for a while.
I tended to run it in a VM.

For more granular control I still think BOINC needs to move towards being resource orientated, so that work gets applied to a resource or resource group.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 33016 - Posted: 16 Sep 2013 | 21:02:06 UTC

Actually I wouldn't mind if an app was MT or not, as long as BOINC still works as it should from my point of view. And that would be:

- keep all GPUs busy, which will require a varying number of CPU cores depending on which project supplies work and which doesn't
- fill the remaining slots / cores with CPU work according to the usual rules

So if an MT app grabbed all cores at once I simply wouldn't run it. Or if the number of cores was set at download time I wouldn't run it either, because the number of CPU cores not needed to feed my GPUs changes in an unpredictable way.

On the other hand: if an MT app could scale the number of cores used on the fly, upon request from BOINC, then we might have a winner.

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 33018 - Posted: 16 Sep 2013 | 21:06:58 UTC

BTW: whatever you want to do on multi-core CPUs can't involve large systems (since the big GPUs are best for these). But smaller tasks are inefficient on GPUs, I guess because not all shaders can be used often enough.

Could you use one CUDA kernel to run several smaller simulations independently? I'm thinking about using one SM/SMX per simulation, and then packing as many different simulations (parameter variations etc.) into one WU as there are SM/SMX present.

On GK110 you should also be able to dispatch work from up to 32 different kernels.. but who knows when this feature will trickle down to widely available cards.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 33023 - Posted: 16 Sep 2013 | 21:48:10 UTC - in response to Message 33018.


Could you use one CUDA kernel to run several smaller simulations independently? I'm thinking about using one SM/SMX per simulation, and then packing as many different simulations (parameter variations etc.) into one WU as there are SM/SMX present.


It doesn't really work in that way. However, you can simply over-subscribe the GPU with tasks; that'll give you the same result. There's a bit of overhead in GPU memory, but not much performance loss if the GPU's a Kepler.


On GK110 you should also be able to dispatch work from up to 32 different kernels.. but who knows when this feature will trickle down to widely available cards.


The GK208 has the GK110's SMX, and very nice it is too. "Hyper-Q" and "Dynamic Parallelism" aren't features that are useful for our application. Remember, we designed the application in G80 days, so most of the features of modern cards that make them easier to use are irrelevant for us.

Matt

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 33025 - Posted: 16 Sep 2013 | 21:54:01 UTC - in response to Message 33016.

If we were to have a CPU app, it's likely that it would have to be multithreaded to be fast enough to be useful. Choosing the number of cores to use would certainly have to be a one-time decision made when the application starts.

Since it seems that the BOINC CPU user setting, in conjunction with overrides in the app_config file, gives you enough control to tweak to your satisfaction, the question that interests me is: what would be an acceptable default? Would one instance per host, using one thread per core minus one for each GPU, be reasonable?

Matt

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 33029 - Posted: 16 Sep 2013 | 23:41:28 UTC - in response to Message 33025.
Last modified: 16 Sep 2013 | 23:53:10 UTC

I believe MilkyWay uses all logical CPUs, by default.

And the new app_config.xml app_version controls...
http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
... we will get to play with soon, will allow a user to manually set both:
- "real # logical cpus used" (via <cmdline>) , as well as
- "boinc-task-scheduler-allocated # logical cpus" (via <avg_ncpus>)

Basically, I got David to add this because I'm running into a problem where an 8-CPU MT MW task will start running, and then only 1 of my 3 GPUs can do work. I'd prefer to either overload the system (with 3 GPU tasks + 1 8-CPU MT task), or balance the system (with 3 GPU tasks + 1 5-or-6-CPU MT task), but the MilkyWay MT app (which uses all logical CPUs) doesn't allow me to do this currently.

To answer your question about defaults...
- Don't limit it to 1 instance per host. Maybe I want 10 days of your tasks - Give them to me!
- Try to shoot for a completion time less than 24 hours. I personally like the idea of 8 hour tasks, with 3-10 minute checkpoints.
- Regarding default number of threads... I think you'd want:
Maximum(1, (#CPU - #GPU + 1))
Right? Remember, not every GPU task takes up a full logical CPU (GPU tasks can be from any attached project), and also the way BOINC works is that, if you have 8 processors, with a 6-core-MT-task, it will start 3 GPU tasks that each take 1 CPU.
- Regarding task priority... You guys do a good job of ensuring your GPU task has a base priority of 6. For an MT task, though, I think the priority should be 1 or 4.
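
The two candidate thread-count defaults discussed so far (one thread per core minus one per GPU, and Maximum(1, #CPU - #GPU + 1)) can be compared with a small Python sketch. The function names are invented for illustration, and clamping the first rule to a minimum of 1 is an added assumption, since that proposal did not say what happens when GPUs outnumber cores.

```python
def threads_cores_minus_gpus(n_cpus: int, n_gpus: int) -> int:
    """One thread per logical CPU, minus one per GPU.

    The floor of 1 is an assumption added here so the result is never zero
    or negative.
    """
    return max(1, n_cpus - n_gpus)


def threads_cores_minus_gpus_plus_one(n_cpus: int, n_gpus: int) -> int:
    """The Maximum(1, #CPU - #GPU + 1) rule from the list above."""
    return max(1, n_cpus - n_gpus + 1)


# The 8-CPU, 3-GPU host from the discussion.
print(threads_cores_minus_gpus(8, 3))           # 5
print(threads_cores_minus_gpus_plus_one(8, 3))  # 6
```

On that host the second rule gives the 6-thread MT task from the example, which together with 3 GPU tasks budgeted at one CPU each slightly over-commits the 8 logical CPUs; the first rule leaves exactly one core per GPU.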

Those are my opinions.

Fyi: I run ~20 projects (MW is my only MT project, I run GPUGrid on 2 of my GPUs, POEM on 1 GPU when tasks are available, and I run Albert/Einstein/Seti/SetiBeta on my slow GTS 240 GPU). I keep a very close eye on system loads and process priorities, and have tweaked several projects with app_config.xml files. I know quite a bit about BOINC scheduling, but I'd wager Richard knows more.

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 33031 - Posted: 16 Sep 2013 | 23:54:04 UTC

I did this with the MilkyWay N-Body simulation on my quad core. When setting CPU use to 100%, all four cores were used and the task finished in minutes. When setting CPU use to 75%, only 3 were used. I noticed, though, that it messed up another project running on the CPU as well.
I also suggested that Rosetta implement this, as they have plenty of WUs that take a long time. With a multithreaded app it should work faster, but I got a lot of comments that it was a bad idea, slowing things down and such.
____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 33101 - Posted: 19 Sep 2013 | 20:21:39 UTC - in response to Message 33023.


Could you use one CUDA kernel to run several smaller simulations independently? I'm thinking about using one SM/SMX per simulation, and then packing as many different simulations (parameter variations etc.) into one WU as there are SM/SMX present.

It doesn't really work in that way. However, you can simply over-subscribe the GPU with tasks; that'll give you the same result. There's a bit of overhead in GPU memory, but not much performance loss if the GPU's a Kepler.


I was thinking of putting the data/threads into the GPU like with vectors in Matlab. As long as only scalar operations are performed on each thread it wouldn't matter to which WU it belonged (as long as you can still sort them at the end). That obviously won't work with matrix multiplications, inversions etc. and probably not with your algorithm.. you know it pretty much infinitely better than I do ;)

"Hyper-Q" and "Dynamic Parallelism" aren't features that are useful for our application. Remember, we designed the application in G80 days, so most of the features of modern cards that make them easier to use are irrelevant for us.

Well, if it was widely available you could use it to dispatch several small molecules at once in order to keep big GPU utilization high. But that's still very far in the future.

Sure, running 2 WUs concurrently is nice to increase GPU utilization for short queue tasks. But as far as I understand these WUs have to interleave time slices, at any one point in time only one of them can have all shaders or none. So if you want to simulate smaller systems (otherwise there'd be no need to even consider using CPUs) a big GPU might have too many shaders to use at once. That's the problem I'm trying to think around.. but if even your small systems can easily use 10k+ shaders then I could stop right here.

And might I suggest looking into Intel OpenCL first? The reason is simple: crunching on it is still hot & new; as far as I know there are only Collatz (which is pretty much useless) and Einstein to choose from. Adding a 3rd option, and a rather attractive one at that, would certainly be welcome. The CPU, on the other hand, is a resource many projects fight for. I'm not sure you'd gain much by joining them. People might also think "hey, they could do this on smaller GPUs.. why should I sacrifice my CPU for it?". I don't know if the OpenCL libraries are already up to the task, but if not then I could imagine you might find some open ears at Intel.

And then there's the long-standing issue of using AMD GPUs. The current GCN architecture is surely much more flexible than the previous VLIWs, and their OpenCL support has improved over time as well. Might as well do a re-evaluation here before starting something completely new (CPU-MT).

MrS
____________
Scanning for our furry friends since Jan 2002

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 33110 - Posted: 19 Sep 2013 | 22:35:09 UTC - in response to Message 33101.


Well, if it was widely available you could use it to dispatch several small molecules at once in order to keep big GPU utilization high. But that's still very far in the future.


Fortunately even the smallest systems that we'd want to run comfortably fill the GPU. The GPU load that your monitoring tools report is a time average: a low load indicates that the GPU is spending some time completely idle between operations, not that individual operations aren't filling it up.
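
As a toy illustration of that point, the Python sketch below computes a time-averaged load from a list of busy intervals. The numbers are invented, not measurements from our application: each operation is assumed to occupy the whole GPU for 8 ms, followed by a 2 ms gap.

```python
def average_utilization(busy_intervals, window):
    """Fraction of the window during which the GPU was busy.

    busy_intervals: non-overlapping (start, end) pairs inside [0, window],
    in any consistent time unit. This is the kind of figure a monitoring
    tool reports; it says nothing about how full the GPU is while busy.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    busy = sum(end - start for start, end in busy_intervals)
    return busy / window


# Hypothetical trace in milliseconds: three 8 ms operations with 2 ms idle gaps.
trace = [(0, 8), (10, 18), (20, 28)]
load = average_utilization(trace, window=30)
print(f"{load:.0%}")  # 80%
```

The reported 80% comes entirely from the idle gaps; during every busy interval the GPU is assumed to be fully occupied.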


Might as well do a re-evaluation here before starting something completely new (CPU-MT).


The motivation for any CPU application would be to introduce new science capabilities, not extend the deployment of the current application.

MJH

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 33118 - Posted: 20 Sep 2013 | 21:15:06 UTC - in response to Message 33110.

Thanks, I think I understand now. There are still a lot of operations, special functions etc. which can't be parallelized well on GPUs, so they are still CPU territory. It's comparable to POEM, where the GPU app can only be used for the simplest force field, but all other algorithms still require the CPU.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Coleslaw
Joined: 24 Jul 08
Posts: 36
Credit: 363,857,679
RAC: 0
Level
Asp
Message 37137 - Posted: 23 Jun 2014 | 20:37:22 UTC

Have there been any updates in this area? I ask because some of my team have switched from FAH and have come here and to a few other projects. A lot of them had server-grade equipment with multiple processors that they were used to running MT apps on. Switching to running several single-threaded apps is a big change for them. I just wanted to know if this is possibly the next phase after the recent CPU release.
____________

Profile Scalextrix[Gridcoin]
Joined: 27 Jan 09
Posts: 34
Credit: 185,313,973
RAC: 0
Level
Ile
Message 37438 - Posted: 27 Jul 2014 | 8:46:41 UTC - in response to Message 37137.

Hello, I got a multi-threaded CPU test app yesterday, and each time I restart BOINC I seem to lose all the progress. Is that normal? BOINC estimates 84 hours to complete, so if this is the case I will have to abort, as I don't run 24/7.

Thanks.


GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Level
Met
Message 37439 - Posted: 27 Jul 2014 | 8:59:32 UTC - in response to Message 37438.


Hello, I got a multi-threaded CPU test app yesterday, and each time I restart BOINC I seem to lose all the progress. Is that normal? BOINC estimates 84 hours to complete, so if this is the case I will have to abort, as I don't run 24/7.


The app is checkpointing correctly but the progress reporting is broken, so the end estimates are incorrect. WUs should complete in between 1 and 8 hours, depending on your CPU.

Matt

Profile Scalextrix[Gridcoin]
Joined: 27 Jan 09
Posts: 34
Credit: 185,313,973
RAC: 0
Level
Ile
Message 37441 - Posted: 27 Jul 2014 | 11:00:15 UTC - in response to Message 37439.

Yes you are correct, it completed, thanks.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 37470 - Posted: 29 Jul 2014 | 1:43:15 UTC
Last modified: 29 Jul 2014 | 1:43:45 UTC

MJH,

Are you absolutely sure it's checkpointing correctly? I'm getting these multithreading tasks too now, on some of my laptops, and... well, they'll go for hours, and if I look at the "CPU time last checkpoint" it'll indicate that it has never checkpointed!

I don't think checkpointing is working correctly -- is it?

floyd
Joined: 17 Dec 11
Posts: 11
Credit: 105,502,570
RAC: 0
Level
Cys
Message 37477 - Posted: 29 Jul 2014 | 9:32:15 UTC

Jacob,

the app does checkpoint every 15 minutes, but it doesn't do it through BOINC so BOINC doesn't know about it.

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Level
Met
Message 37478 - Posted: 29 Jul 2014 | 11:26:13 UTC - in response to Message 37470.

Yes, it is. 845 and earlier don't report that they've checkpointed to BOINC. 846 will fix that.
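
The difference between checkpointing and telling BOINC about it can be pictured with a small Python simulation. This is only a model, not our application code: `ClientView` and `run_task` are made-up names, and the real mechanism is the BOINC API, where an app typically calls `boinc_time_to_checkpoint()` before saving and `boinc_checkpoint_completed()` afterwards.

```python
class ClientView:
    """Hypothetical stand-in for what the BOINC client knows about a task."""

    def __init__(self):
        self.last_checkpoint_cpu_time = 0.0  # what the task properties would show

    def checkpoint_completed(self, cpu_time):
        self.last_checkpoint_cpu_time = cpu_time


def run_task(total_minutes, checkpoint_every, client=None):
    """Simulate a task that saves its state every checkpoint_every minutes.

    Returns the minute of the last saved state. If client is None, the state
    is still saved but nobody is told about it.
    """
    saved_at = 0
    for minute in range(1, total_minutes + 1):
        if minute % checkpoint_every == 0:
            saved_at = minute  # the app writes its own restart file here
            if client is not None:
                client.checkpoint_completed(float(minute))  # the missing step
    return saved_at


unaware = ClientView()
saved_silent = run_task(60, 15, client=None)       # saves state, reports nothing
informed = ClientView()
saved_reported = run_task(60, 15, client=informed)  # saves state and reports it
print(saved_silent, unaware.last_checkpoint_cpu_time)
print(saved_reported, informed.last_checkpoint_cpu_time)
```

In both runs the work is saved at the same points; only in the second does the client's "CPU time at last checkpoint" ever move off zero.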

Matt

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 37481 - Posted: 29 Jul 2014 | 12:49:33 UTC - in response to Message 37478.
Last modified: 29 Jul 2014 | 12:50:57 UTC

Ah, so, it is checkpointing, but it's just not reporting it correctly to BOINC. I'd consider that "not working", but either way, it sounds like you guys are working to fix it. Thanks!

PS: It's nice to have my non-GPU laptops doing work for GPUGrid.

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Level
Met
Message 37486 - Posted: 29 Jul 2014 | 15:51:55 UTC - in response to Message 37481.

Please post further replies to the news thread about the new application.
