Multithreaded BOINC application questions

Message boards : Multicore CPUs : Multithreaded BOINC application questions

Author	Message
MJH Project administrator Project developer Project scientist Send message Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level Scientific publications	Message 32874 - Posted: 10 Sep 2013 \| 21:14:29 UTC
	I'm curious to know which BOINC projects have multithreaded (CPU) applications. Do you like that type of app? Would you be happy to give over all your cores to a single task? (And what control does the BOINC client give you over core allocation?) Ta, MJH
	ID: 32874 \| Rating: 0 \| rate: / Reply Quote

terencewee* Send message Joined: 29 May 12 Posts: 8 Credit: 21,605,500 RAC: 0 Level Scientific publications	Message 32876 - Posted: 11 Sep 2013 \| 1:42:04 UTC - in response to Message 32874.
	I remember a few years back AQUA@Home got a multi-threaded CPU app, and the MilkyWay@Home N-Body Simulation is multi-threaded. I remember very vaguely that there was a mechanism to control AQUA's core allocation; I'll need to dig... Depending on the project's needs, I'll allocate a GPU if the project is best served by GPU, otherwise CPU, as most projects started with a CPU app first. As long as the project I'm contributing to uses the resources I set aside for it, it doesn't matter to me whether the app is single- or multi-threaded. ____________ terencewee* Sic itur ad astra.
	ID: 32876 \| Rating: 0 \| rate: / Reply Quote

terencewee* Send message Joined: 29 May 12 Posts: 8 Credit: 21,605,500 RAC: 0 Level Scientific publications	Message 32877 - Posted: 11 Sep 2013 \| 1:52:57 UTC
	Richard posted this over @Milkyway. I'm sure he'll chime in soonish... :) ____________ terencewee* Sic itur ad astra.
	ID: 32877 \| Rating: 0 \| rate: / Reply Quote

Snow Crash Send message Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0 Level Scientific publications	Message 32911 - Posted: 12 Sep 2013 \| 8:39:16 UTC Last modified: 12 Sep 2013 \| 8:39:31 UTC
	Edges@home also has a multi-threaded vina app: http://home.edges-grid.eu/home/ The little experience I've had with multi-threaded apps on Milkyway and Edges shows that BOINC isn't fully baked when it comes to helping us manage how many threads to use per WU. From my perspective this is not a "production ready" situation, as I have to work too hard to leave a core free to feed GPU WUs. If that situation were to change I would give those types of apps another shot to check out efficiency. ____________ Thanks - Steve
	ID: 32911 \| Rating: 0 \| rate: / Reply Quote

MJH Project administrator Project developer Project scientist Send message Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level Scientific publications	Message 32912 - Posted: 12 Sep 2013 \| 9:22:30 UTC - in response to Message 32911.
	Thanks for the replies. The situation doesn't look too encouraging. Is it possible at least to limit the number of concurrent tasks for a specific project?
	ID: 32912 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 32913 - Posted: 12 Sep 2013 \| 12:44:31 UTC - in response to Message 32912.
	"Thanks for the replies. The situation doesn't look too encouraging. Is it possible at least to limit the number of concurrent tasks for a specific project?"
	Yep, with an app_config. In fact it limits instances of a specific app. Here's an example:
	<app_config>
	    <app>
	        <name>ecm</name>
	        <max_concurrent>2</max_concurrent>
	    </app>
	</app_config>
	ID: 32913 \| Rating: 0 \| rate: / Reply Quote
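
For readers who want to check what limits an app_config.xml actually sets, here is a minimal sketch using Python's standard xml.etree.ElementTree. The XML is Beyond's example; the helper name max_concurrent_limits is hypothetical and not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Beyond's example from the post above, one element per line.
APP_CONFIG = """\
<app_config>
  <app>
    <name>ecm</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
"""


def max_concurrent_limits(xml_text):
    """Return {app name: max_concurrent} for every <app> that sets a limit."""
    root = ET.fromstring(xml_text)
    limits = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        limit = app.findtext("max_concurrent")
        # Skip <app> blocks that have no name or no limit.
        if name is not None and limit is not None:
            limits[name.strip()] = int(limit)
    return limits


print(max_concurrent_limits(APP_CONFIG))
```

The same function can be pointed at the contents of a real file read from the project's directory; apps without a max_concurrent element are simply left out of the result.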

MJH Project administrator Project developer Project scientist Send message Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level Scientific publications	Message 32914 - Posted: 12 Sep 2013 \| 12:59:24 UTC - in response to Message 32913.
	Thanks Beyond. It's a shame that's not exposed in the UI. Cheers, Matt
	ID: 32914 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 32965 - Posted: 15 Sep 2013 \| 11:30:15 UTC
	Just curious, what are you thinking about using it for? MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 32965 \| Rating: 0 \| rate: / Reply Quote

MJH Project administrator Project developer Project scientist Send message Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level Scientific publications	Message 32967 - Posted: 15 Sep 2013 \| 11:48:44 UTC - in response to Message 32965.
	Oh, no reason. But do stay tuned... Matt
	ID: 32967 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1620 Credit: 8,893,911,970 RAC: 19,817,833 Level Scientific publications	Message 32968 - Posted: 15 Sep 2013 \| 11:50:10 UTC - in response to Message 32877.
	"Richard posted this over @Milkyway. I'm sure he'll chime in soonish... :)"
	Sorry, didn't notice this thread before. Yes, I'm test-running the Milkyway N-Body application at the moment. We had a bit of a struggle getting the server configured right to send the multi-threaded tasks out with the correct "command and control" structure (plan_class), but it's working now.
	Thread control is currently rudimentary, and the tendency is for an MT task to grab all cores, or none. Personally, I don't much like that - it wouldn't play well with CPU tasks from other projects if those tasks don't checkpoint frequently enough.
	But there's hope on the horizon. Jacob Klein and I ganged up on David A to get him to add
	[<app_version>
	    <app_name>uppercase</app_name>
	    [<plan_class>mt</plan_class>]
	    [<avg_ncpus>x</avg_ncpus>]
	    [<ngpus>x</ngpus>]
	    [<cmdline>--nthreads 7</cmdline>]
	</app_version>]
	Each <app_version> element specifies parameters for a given app version; it overrides <app>. Supported in 7.3+ clients.
	to the app_config.xml specification - see Application configuration. Note that no BOINC v7.3 client exists yet, even for testing - we expect that to happen when a current rush of work on the Android platform abates a little. I'll be keeping an eye on it, and testing when available.
	ID: 32968 \| Rating: 0 \| rate: / Reply Quote
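
A minimal sketch of how an <app_version> block like the one Richard quotes could be generated, again with Python's standard xml.etree.ElementTree. The function name build_app_config and the application name "example_mt_app" are placeholders, and whether a given application honours --nthreads depends on that application.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, avg_ncpus, nthreads):
    """Build an app_config.xml string containing one <app_version> block."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # Number of CPUs the BOINC scheduler should budget for each task.
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    # Number of threads the application itself is asked to start.
    ET.SubElement(version, "cmdline").text = f"--nthreads {nthreads}"
    return ET.tostring(root, encoding="unicode")


# "example_mt_app" is a placeholder; use the project's real application name.
print(build_app_config("example_mt_app", "mt", 5, 5))
```

Keeping avg_ncpus and the thread count as separate arguments mirrors the two knobs in the specification: one is what the scheduler plans for, the other is what the app really uses.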

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 32981 - Posted: 15 Sep 2013 \| 23:09:12 UTC - in response to Message 32968.
	Good to hear of some movement on the MT front. I can think of a few projects that might benefit. SimOne also used an MT app, but it has not been sending work for a while. I tended to run it in a VM. For more granular control I still think BOINC needs to move towards being resource-orientated, so that work gets applied to a resource or resource group. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
	ID: 32981 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 33016 - Posted: 16 Sep 2013 \| 21:02:06 UTC
	Actually I wouldn't mind whether an app was MT or not, as long as BOINC still works as it should from my point of view. And that would be:
	- keep all GPUs busy, which will require a varying number of CPU cores depending on which project supplies work and which doesn't
	- fill the remaining slots / cores with CPU work according to the usual rules
	So if an MT app grabbed all cores at once I simply wouldn't run it. Or if the number of cores was set at download time I wouldn't run it either, because the number of CPU cores not needed to feed my GPUs changes in an unpredictable way. On the other hand: if an MT app could scale the number of cores used on the fly, upon request from BOINC, then we might have a winner.
	MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 33016 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 33018 - Posted: 16 Sep 2013 \| 21:06:58 UTC
	BTW: whatever you want to do on multi-core CPUs can't involve large systems (since the big GPUs are best for these). But smaller tasks are inefficient on GPUs, I guess because not all shaders can be used often enough. Could you use one CUDA kernel to run several smaller simulations independently? I'm thinking about using one SM/SMX per simulation, and then packing as many different simulations (parameter variations etc.) into one WU as there are SM/SMX present. On GK110 you should also be able to dispatch work from up to 32 different kernels.. but who knows when this feature will trickle down to widely available cards. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 33018 \| Rating: 0 \| rate: / Reply Quote

MJH Project administrator Project developer Project scientist Send message Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level Scientific publications	Message 33023 - Posted: 16 Sep 2013 \| 21:48:10 UTC - in response to Message 33018.
	"Could you use one CUDA kernel to run several smaller simulations independently? I'm thinking about using one SM/SMX per simulation, and then packing as many different simulations (parameter variations etc.) into one WU as there are SM/SMX present."
	It doesn't really work in that way. However, you can simply over-subscribe the GPU with tasks; that'll give you the same result. There's a bit of overhead in GPU memory, but not much performance loss if the GPU's a Kepler.
	"On GK110 you should also be able to dispatch work from up to 32 different kernels.. but who knows when this feature will trickle down to widely available cards."
	The GK208 has the GK110's SMX, and very nice it is too. "HyperQ" and "Dynamic Parallelism" aren't features that are useful for our application. Remember, we designed the application in G80 days, so most of the features of modern cards that make them easier to use are irrelevant for us.
	Matt
	ID: 33023 \| Rating: 0 \| rate: / Reply Quote

MJH Project administrator Project developer Project scientist Send message Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level Scientific publications	Message 33025 - Posted: 16 Sep 2013 \| 21:54:01 UTC - in response to Message 33016.
	If we were to have a CPU app, it's likely that it would have to be multithreaded to be fast enough to be useful. Choosing the number of cores to use would certainly have to be a one-time decision made when the application starts. Since it seems that the BOINC CPU usage setting, in conjunction with overrides in the app_config file, gives you enough control to tweak to your satisfaction, the question that interests me is: what would be an acceptable default? Would one instance per host, using one thread per core minus one for each GPU, be reasonable? Matt
	ID: 33025 \| Rating: 0 \| rate: / Reply Quote

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 33029 - Posted: 16 Sep 2013 \| 23:41:28 UTC - in response to Message 33025. Last modified: 16 Sep 2013 \| 23:53:10 UTC
	I believe MilkyWay uses all logical CPUs by default. The new app_config.xml app_version controls (http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration), which we will get to play with soon, will allow a user to manually set both:
	- "real # logical cpus used" (via <cmdline>), as well as
	- "boinc-task-scheduler-allocated # logical cpus" (via <avg_ncpus>)
	Basically, I got David to add this because I'm running into a problem where an 8-CPU MT MW task will start running, and then only 1 of my 3 GPUs can do work. I'd prefer to either overload the system (with 3 GPU tasks + 1 8-CPU MT task), or balance the system (with 3 GPU tasks + 1 5-or-6-CPU MT task), but the MilkyWay MT app (which uses all logical CPUs) doesn't allow me to do this currently.
	To answer your question about defaults...
	- Don't limit it to 1 instance per host. Maybe I want 10 days of your tasks - Give them to me!
	- Try to shoot for a completion time less than 24 hours. I personally like the idea of 8 hour tasks, with 3-10 minute checkpoints.
	- Regarding default number of threads... I think you'd want: Maximum(1, (#CPU - #GPU + 1)) Right? Remember, not every GPU task takes up a full logical CPU (GPU tasks can be from any attached project), and also the way BOINC works is that, if you have 8 processors, with a 6-core-MT-task, it will start 3 GPU tasks that each take 1 CPU.
	- Regarding task priority... You guys do a good job of ensuring your GPU task has base priority of 6. For an MT task, though, I think priority should be 1 or 4.
	Those are my opinions.
	Fyi: I run ~20 projects (MW is my only MT project, I run GPUGrid on 2 of my GPUs, POEM on 1 GPU when tasks are available, and I run Albert/Einstein/Seti/SetiBeta on my slow GTS 240 GPU). I keep a very close eye on system loads and process priorities, and have tweaked several projects with app_config.xml files. I know quite a bit about BOINC scheduling, but I'd wager Richard knows more.
	ID: 33029 \| Rating: 0 \| rate: / Reply Quote
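
The two default thread counts discussed in Messages 33025 and 33029 can be compared side by side. This is only a sketch: the function names are made up for illustration, and the floor of one thread in the first function is an added assumption, since that proposal did not state one.

```python
def threads_cores_minus_gpus(n_cpus, n_gpus):
    """One thread per core, minus one for each GPU (floor of 1 is assumed)."""
    return max(1, n_cpus - n_gpus)


def threads_overcommit_by_one(n_cpus, n_gpus):
    """Maximum(1, #CPU - #GPU + 1), as written in Message 33029."""
    return max(1, n_cpus - n_gpus + 1)


# The 8-CPU, 3-GPU host from Message 33029, plus two smaller machines.
for n_cpus, n_gpus in [(8, 3), (4, 1), (2, 2)]:
    a = threads_cores_minus_gpus(n_cpus, n_gpus)
    b = threads_overcommit_by_one(n_cpus, n_gpus)
    print(f"{n_cpus} CPUs, {n_gpus} GPUs: {a} or {b} threads")
```

For the 8-CPU, 3-GPU host the two rules give 5 and 6 threads, which matches the "5-or-6-CPU MT task" described above.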

TJ Send message Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level Scientific publications	Message 33031 - Posted: 16 Sep 2013 \| 23:54:04 UTC
	I did this with the Milkyway N-Body simulation on my quad core. When setting CPU use to 100%, all four cores were used and the task finished in minutes. When setting CPU use to 75%, only 3 were used. I noticed, though, that it interfered with another project running on the CPU as well. I also suggested that Rosetta implement this, as they have plenty of WUs that take a long time. With a multithreaded app it should work faster, but I got a lot of comments that it was a bad idea, slowing things down and such. ____________ Greetings from TJ
	ID: 33031 \| Rating: 0 \| rate: / Reply Quote
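
TJ's observation (4 cores at 100%, 3 at 75%) is consistent with the CPU percentage preference being turned into a whole number of cores. A minimal sketch, assuming the result is rounded down and never drops below one core; that rounding rule is an assumption rather than something stated in this thread, and usable_cpus is a hypothetical name, not a BOINC function.

```python
import math


def usable_cpus(n_cpus, max_cpu_percent):
    """Whole cores available under a 'use at most N% of the CPUs' setting."""
    # Assumed rule: round down, but always leave at least one core usable.
    return max(1, math.floor(n_cpus * max_cpu_percent / 100))


# TJ's quad core at the two settings mentioned above.
for percent in (100, 75):
    print(f"{percent}% of 4 cores -> {usable_cpus(4, percent)} cores")
```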

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 33101 - Posted: 19 Sep 2013 \| 20:21:39 UTC - in response to Message 33023.
	"Could you use one CUDA kernel to run several smaller simulations independently? I'm thinking about using one SM/SMX per simulation, and then packing as many different simulations (parameter variations etc.) into one WU as there are SM/SMX present."
	"It doesn't really work in that way. However, you can simply over-subscribe the GPU with tasks; that'll give you the same result. There's a bit of overhead in GPU memory, but not much performance loss if the GPU's a Kepler."
	I was thinking of putting the data/threads into the GPU like with vectors in Matlab. As long as only scalar operations are performed on each thread it wouldn't matter to which WU it belonged (as long as you can still sort them at the end). That obviously won't work with matrix multiplications, inversions etc. and probably not with your algorithm.. you know it pretty much infinitely better than I do ;)
	"'HyperQ' and 'Dynamic Parallelism' aren't features that are useful for our application. Remember, we designed the application in G80 days, so most of the features of modern cards that make them easier to use are irrelevant for us."
	Well, if it was widely available you could use it to dispatch several small molecules at once in order to keep big GPU utilization high. But that's still very far in the future.
	Sure, running 2 WUs concurrently is nice to increase GPU utilization for short queue tasks. But as far as I understand these WUs have to interleave time slices: at any one point in time each of them can have either all shaders or none. So if you want to simulate smaller systems (otherwise there'd be no need to even consider using CPUs) a big GPU might have too many shaders to use at once. That's the problem I'm trying to think around.. but if even your small systems can easily use 10k+ shaders then I could stop right here.
	And might I suggest looking into Intel OpenCL first? The reason is simple: crunching on it is still hot & new, and as far as I know there are only Collatz (which is pretty much useless) and Einstein to choose from. Adding a 3rd option, and a rather attractive one at that, would certainly be welcome. The CPU, on the other hand, is a resource many projects fight for. I'm not sure you'd gain much by joining them. People might also think "hey, they could do this on smaller GPUs.. why should I sacrifice my CPU for it?". I don't know if the OpenCL libraries are already up to the task, but if not then I could imagine you might find some open ears at Intel.
	And then there's the long-standing issue of using AMD GPUs. The current GCN architecture is surely much more flexible than the previous VLIWs, and their OpenCL support has improved over time as well. Might as well do a re-evaluation here before starting something completely new (CPU-MT).
	MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 33101 \| Rating: 0 \| rate: / Reply Quote

MJH Project administrator Project developer Project scientist Send message Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level Scientific publications	Message 33110 - Posted: 19 Sep 2013 \| 22:35:09 UTC - in response to Message 33101.
	"Well, if it was widely available you could use it to dispatch several small molecules at once in order to keep big GPU utilization high. But that's still very far in the future."
	Fortunately even the smallest systems that we'd want to run comfortably fill the GPU. The GPU load that your monitoring tools report is a time average - any low load indicates that the GPU is spending some time completely idle between operations, not that individual operations aren't filling it up.
	"Might as well do a re-evaluation here before starting something completely new (CPU-MT)."
	The motivation for any CPU application would be to introduce new science capabilities, not extend the deployment of the current application.
	MJH
	ID: 33110 \| Rating: 0 \| rate: / Reply Quote
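
The point about GPU load being a time average can be illustrated with a little arithmetic. The sketch below uses an invented trace (the durations are not measurements) in which every operation fills the GPU completely, yet the reported load still comes out below 100% because of the idle gaps between operations.

```python
def average_gpu_load(intervals):
    """Time-averaged load from (duration_ms, gpu_busy) intervals."""
    total = sum(duration for duration, _ in intervals)
    busy = sum(duration for duration, is_busy in intervals if is_busy)
    return busy / total if total else 0.0


# Hypothetical trace: each kernel fills the whole GPU for 80 ms,
# followed by a 20 ms gap in which the GPU sits completely idle.
trace = [(80, True), (20, False), (80, True), (20, False)]
print(f"reported load: {average_gpu_load(trace):.0%}")
```

A monitoring tool reading this trace would show 80% load, even though no individual operation left any part of the GPU unused.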

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 33118 - Posted: 20 Sep 2013 \| 21:15:06 UTC - in response to Message 33110.
	Thanks, I think I understand now. There are still a lot of operations, special functions etc. which can't be parallelized well on GPUs, so they are still CPU territory. Comparable to POEM, where the GPU app can only be used for the simplest force field, while all other algorithms still require the CPU. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 33118 \| Rating: 0 \| rate: / Reply Quote

Coleslaw Send message Joined: 24 Jul 08 Posts: 36 Credit: 363,857,679 RAC: 0 Level Scientific publications	Message 37137 - Posted: 23 Jun 2014 \| 20:37:22 UTC
	Have there been any updates in this area? I ask because some of my team have switched from FAH and have come here and to a few other projects. A lot of them had server-grade equipment with multiple processors that they were used to running MT apps on. Switching to running several single-threaded apps is a big change for them. I just wanted to know if this is possibly the next phase after the recent CPU release. ____________
	ID: 37137 \| Rating: 0 \| rate: / Reply Quote

Scalextrix[Gridcoin] Send message Joined: 27 Jan 09 Posts: 34 Credit: 185,313,973 RAC: 0 Level Scientific publications	Message 37438 - Posted: 27 Jul 2014 \| 8:46:41 UTC - in response to Message 37137.
	Hello, I got a multi-threaded CPU test app yesterday and each time I restart BOINC I seem to lose all the progress. Is that normal? BOINC estimates 84 hours to complete, so if this is the case I will have to abort as I don't run 24/7. Thanks.
	ID: 37438 \| Rating: 0 \| rate: / Reply Quote

GPUGRID Role account Send message Joined: 15 Feb 07 Posts: 134 Credit: 1,349,535,983 RAC: 0 Level Scientific publications	Message 37439 - Posted: 27 Jul 2014 \| 8:59:32 UTC - in response to Message 37438.
	"Hello I got a multi-threaded CPU test app yesterday and each time I restart BOINC I seem to lose all the progress, is that normal? BOINC estimates 84 hours to complete, so if this is the case I will have to abort as I dont run 24/7."
	The app is checkpointing correctly but the progress reporting is broken, so the end estimates are incorrect. WUs should complete in between 1 and 8 hours, depending on your CPU.
	Matt
	ID: 37439 \| Rating: 0 \| rate: / Reply Quote

Scalextrix[Gridcoin] Send message Joined: 27 Jan 09 Posts: 34 Credit: 185,313,973 RAC: 0 Level Scientific publications	Message 37441 - Posted: 27 Jul 2014 \| 11:00:15 UTC - in response to Message 37439.
	Yes you are correct, it completed, thanks.
	ID: 37441 \| Rating: 0 \| rate: / Reply Quote

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 37470 - Posted: 29 Jul 2014 \| 1:43:15 UTC Last modified: 29 Jul 2014 \| 1:43:45 UTC
	MJH, Are you absolutely sure it's checkpointing correctly? I'm getting these multithreading tasks too now, on some of my laptops, and... well, they'll go for hours, and if I look at the "CPU time last checkpoint" it'll indicate that it has never checkpointed! I don't think checkpointing is working correctly -- is it?
	ID: 37470 \| Rating: 0 \| rate: / Reply Quote

floyd Send message Joined: 17 Dec 11 Posts: 11 Credit: 105,502,570 RAC: 0 Level Scientific publications	Message 37477 - Posted: 29 Jul 2014 \| 9:32:15 UTC
	Jacob, the app does checkpoint every 15 minutes, but it doesn't do it through BOINC so BOINC doesn't know about it.
	ID: 37477 \| Rating: 0 \| rate: / Reply Quote

GPUGRID Role account Send message Joined: 15 Feb 07 Posts: 134 Credit: 1,349,535,983 RAC: 0 Level Scientific publications	Message 37478 - Posted: 29 Jul 2014 \| 11:26:13 UTC - in response to Message 37470.
	Yes, it is. 845 and earlier don't report that they've checkpointed to BOINC. 846 will fix that. Matt
	ID: 37478 \| Rating: 0 \| rate: / Reply Quote
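
The difference between "the app checkpoints" and "BOINC knows it checkpointed" can be sketched as follows. This is a generic checkpoint-and-resume loop in plain Python, not the GPUGRID application and not the BOINC API; the file name, state layout and function names are all invented for illustration. The comment marks the place where a real BOINC application would additionally notify the client (the C API provides boinc_checkpoint_completed() for that).

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # the rename keeps the old file until the new one is complete


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"step": 0}


def run(path, total_steps, checkpoint_every):
    """Resume from the last checkpoint and work up to total_steps."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        state["step"] += 1  # stand-in for one unit of real work
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
            # A BOINC app would also report the checkpoint to the client here;
            # without that call the client shows no checkpoint at all.
    return state["step"]


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "checkpoint.json")
    run(checkpoint, total_steps=10, checkpoint_every=4)
    print(load_checkpoint(checkpoint))
```

In this sketch a restart loses at most the work done since the last saved step, regardless of whether anything was reported to a client, which is the situation described for versions 845 and earlier.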

Jacob Klein Send message Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 Level Scientific publications	Message 37481 - Posted: 29 Jul 2014 \| 12:49:33 UTC - in response to Message 37478. Last modified: 29 Jul 2014 \| 12:50:57 UTC
	Ah, so, it is checkpointing, but it's just not reporting it correctly to BOINC. I'd consider that "not working", but either way, it sounds like you guys are working to fix it. Thanks! PS: It's nice to have my non-GPU laptops doing work for GPUGrid.
	ID: 37481 \| Rating: 0 \| rate: / Reply Quote

GPUGRID Role account Send message Joined: 15 Feb 07 Posts: 134 Credit: 1,349,535,983 RAC: 0 Level Scientific publications	Message 37486 - Posted: 29 Jul 2014 \| 15:51:55 UTC - in response to Message 37481.
	Please post further replies to the news thread about the new application.
	ID: 37486 \| Rating: 0 \| rate: / Reply Quote
