Message boards : Number crunching : How do I tell GPUgrid to keep more than two work units in reserve?

Author Message
Luke Formosa
Joined: 11 Jul 12
Posts: 32
Credit: 33,298,777
RAC: 0
Message 26326 - Posted: 16 Jul 2012 | 0:35:31 UTC

At the moment I'm running GPUgrid on the GPU and World Community Grid on the CPU. My internet connection isn't very reliable, so I have the BOINC preferences set to "connect to the internet every 2 days" and work buffer 2 days also. And it works well with WCG, because I have about 30 "ready to start" tasks in the queue.

However, GPUgrid only ever has 2 tasks or so in the queue. With my GTX670, that's a maximum of 10 hours of work. So if my internet stops working for over 10 hours, the GPU ends up idle. Is there any way to tell GPUgrid to keep more tasks in reserve, but without getting any more WCG tasks added to the queue also?
____________

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 26328 - Posted: 16 Jul 2012 | 2:35:48 UTC - in response to Message 26326.

The max number of WUs GPUGrid allows per GPU is 2 at a time.
____________
Thanks - Steve

Luke Formosa
Joined: 11 Jul 12
Posts: 32
Credit: 33,298,777
RAC: 0
Message 26330 - Posted: 16 Jul 2012 | 9:13:53 UTC - in response to Message 26328.
Last modified: 16 Jul 2012 | 9:14:30 UTC

You mean regardless of my settings and the speed of my graphics card I will always have only one task on standby besides the one active task? Isn't there anything that can be done about this?

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 26332 - Posted: 16 Jul 2012 | 11:12:50 UTC - in response to Message 26330.

Probably best to just select the long runs.

What's up with your Internet connection?
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 26334 - Posted: 16 Jul 2012 | 13:56:59 UTC - in response to Message 26330.

You mean regardless of my settings and the speed of my graphics card I will always have only one task on standby besides the one active task? Isn't there anything that can be done about this?

The project generates WUs based on the results of previous WUs. If the 2 WU max limit were raised, the amount of work in the pipeline, the length of time to get usable results, the attendant server overhead, and the administration tasks would become unduly burdensome to the project staff.

____________
Thanks - Steve

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,732,592,533
RAC: 887,807
Message 26337 - Posted: 16 Jul 2012 | 15:23:25 UTC

At the moment I'm running GPUgrid on the GPU and World Community Grid on the CPU. My internet connection isn't very reliable, so I have the BOINC preferences set to "connect to the internet every 2 days" and work buffer 2 days also. And it works well with WCG, because I have about 30 "ready to start" tasks in the queue.

Same here! My two cards got starved again over the weekend. I think the policy of a maximum of 2 (long) WUs should be revised for the Cuda 4.2 apps. They are way faster (as has been mentioned several times on the forums before): PAOLA tasks take about 5.5 hours to finish on my machines. I think the max. WUs per GPU should be raised to 4 tasks per card, so there might be sufficient work for 24 hours. This should not cause major disturbance in the project management, as the old, similar WUs with Cuda 3.1 took about 11 hours on my Fermi card, roughly double the new ones.
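As a rough check on those numbers, here is a minimal sketch (not project code) of how many hours of GPU work a per-GPU task cap buffers. The function name and the simple product model are illustrative assumptions; the runtimes are the ones quoted in this thread.

```python
def buffered_hours(tasks_per_gpu: int, hours_per_task: float) -> float:
    """Hours of work queued per GPU, counting the running task at full length."""
    return tasks_per_gpu * hours_per_task


# Runtimes quoted in this thread: ~5.5 h per CUDA 4.2 long task,
# ~11 h per comparable CUDA 3.1 task.
print(buffered_hours(2, 5.5))   # current cap of 2 with CUDA 4.2 tasks
print(buffered_hours(4, 5.5))   # proposed cap of 4 with CUDA 4.2 tasks
print(buffered_hours(2, 11.0))  # cap of 2 with the older CUDA 3.1 tasks
```

Under these assumptions, a cap of 4 with the faster app buffers about as much work as the cap of 2 did with the older app.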

I mentioned this before in another forum thread, from a different perspective, "Upload suspended - no new tasks": http://www.gpugrid.net/forum_thread.php?id=2090 "[...]As the new CUDA4.2 WUs are much faster (Paola on my computer 5.5 hours) than the old CUDA3.1 WUs, the second WUs has completed before the first finished WUs has completely uploaded, so my machines try to upload the two WUs in parallel, at this moment my two GPUs sit idle with no new work, bad for the project and my credits;-) as it seems that there are send only a max. of two WUs of GPUGRID per computer at any given moment.[...]"

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,183,509,048
RAC: 19,199,223
Message 26341 - Posted: 17 Jul 2012 | 16:59:58 UTC - in response to Message 26332.

Probably best to just select the long runs.

What's up with your Internet connection?

I was wondering that too, until I saw the size of the file being uploaded.

67,644 KiB ??!!

Luke Formosa
Joined: 11 Jul 12
Posts: 32
Credit: 33,298,777
RAC: 0
Message 26342 - Posted: 17 Jul 2012 | 17:47:15 UTC - in response to Message 26332.

Probably best to just select the long runs.

What's up with your Internet connection?


Long runs take about 5-6 hours, so that's only 10-12 hours of work. My internet connection has a mysterious habit of intermittently cutting out. We've replaced the modem and its power supply, and the ISP says the phone lines are OK, but to no avail. Perhaps the best thing to do is to switch ISP when our contract runs out.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 26343 - Posted: 17 Jul 2012 | 19:25:35 UTC - in response to Message 26342.

Might be modem/router specific, so changing ISP could help, if they supplied the modem/router. Perhaps they are assigning an IP address for a very short time period or some TTL value is low. Have you tried keeping a web page open? One with meta refresh (msn, yahoo or even a streaming news station).

Might be useful if the project could adopt a trickle upload process, but I doubt that would be readily doable.

frankhagen
Joined: 18 Sep 08
Posts: 65
Credit: 3,037,414
RAC: 0
Message 26344 - Posted: 17 Jul 2012 | 19:35:46 UTC - in response to Message 26341.

I was wondering that too, until I saw the size of the file being uploaded.

67,644 KiB ??!!


probably about time to implement 7-zip compression of the results before upload.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 26349 - Posted: 17 Jul 2012 | 19:54:58 UTC - in response to Message 26344.
Last modified: 17 Jul 2012 | 19:55:41 UTC

The more you compress a file, the longer it takes to uncompress it. Could the server handle that workload?

frankhagen
Joined: 18 Sep 08
Posts: 65
Credit: 3,037,414
RAC: 0
Message 26352 - Posted: 17 Jul 2012 | 20:09:04 UTC - in response to Message 26349.
Last modified: 17 Jul 2012 | 20:10:03 UTC

The more you compress a file, the longer it takes to uncompress it. Could the server handle that workload?


that's not the problem. compression at high rates takes a lot more time than decompression.

the problem here is ADSL uplinks. downlink is fast and uplink is oh so slow.

in my case right here: 16 mbit vs. 1 mbit.

but you might want to try 7-zip and compress a result-file locally. it should perform a lot better than boinc's built-in zip-compression during upload (http://boinc.berkeley.edu/trac/wiki/FileCompression). of course this puts a lot of load on the server already.

btw.: 7-zip is open-source..
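
To make the uplink bottleneck concrete, here is a small sketch estimating the ideal transfer time for the 67,644 KiB result file mentioned earlier in the thread over the 16 mbit / 1 mbit line described above. It assumes decimal megabits and ignores protocol overhead, retries and any server-side throttling, so real uploads will be slower. The function and variable names are illustrative only.

```python
def transfer_seconds(size_bytes: int, link_bits_per_s: float) -> float:
    """Ideal transfer time: payload bits divided by link speed."""
    return size_bytes * 8 / link_bits_per_s


KIB = 1024
result_size = 67_644 * KIB   # result file size reported in this thread
uplink = 1_000_000           # 1 Mbit/s (assumed decimal)
downlink = 16_000_000        # 16 Mbit/s (assumed decimal)

print(f"upload:   {transfer_seconds(result_size, uplink) / 60:.1f} min")
print(f"download: {transfer_seconds(result_size, downlink) / 60:.1f} min")
```

Even in the ideal case the upload takes a little over nine minutes on such a line, sixteen times longer than moving the same file in the other direction.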

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 26353 - Posted: 17 Jul 2012 | 20:27:07 UTC - in response to Message 26352.

If it's better than what is presently used by Boinc then it should replace what is being used by Boinc. Otherwise GPUGrid (and any other project) would have to bypass the Boinc compression system. Presumably the Boinc server would also have to be bypassed. The next server upgrade would then be a problem for any such bespoke compression implementation, and it might cause cross-Boinc compatibility issues.


frankhagen
Joined: 18 Sep 08
Posts: 65
Credit: 3,037,414
RAC: 0
Message 26355 - Posted: 17 Jul 2012 | 21:07:54 UTC - in response to Message 26353.

If it's better than what is presently used by Boinc then it should replace what is being used by Boinc. Otherwise GPUGrid (and any other project) would have to bypass the Boinc compression system. Presumably the Boinc server would also have to be bypassed. The next server upgrade would then be a problem for any such bespoke compression implementation, and it might cause cross-Boinc compatibility issues.


well, this is a really old topic.

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=2992&sort=7

but afaik there are several projects that have decided to move on their own..

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,183,509,048
RAC: 19,199,223
Message 26356 - Posted: 17 Jul 2012 | 23:03:08 UTC - in response to Message 26344.

I was wondering that too, until I saw the size of the file being uploaded.

67,644 KiB ??!!

probably about time to implement 7-zip compression of the results before upload.

That's convenient. The one I posted about came from 1E2I_39_2-PAOLA_1E21_APS-1-100-RND0032_1, but I read this just in time to catch 2HDQ_36_1-PAOLA_2HDQbis-2-100-RND8482_1

The big file is _4, and the sizes are:

Upload format: 39,929,396 bytes
.zip (WinXP compression): 36,973,370 bytes
.7z (Windows 7-zip v4.57): 33,139,008 bytes

17% compression? I don't think it's worth it.
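
For anyone who wants to check the arithmetic, this short sketch computes the space saved for each format from the three sizes listed above. Only the byte counts come from the measurement; the helper name is illustrative.

```python
def saving_percent(original_bytes: int, compressed_bytes: int) -> float:
    """Percentage of the original size removed by compression."""
    return 100 * (1 - compressed_bytes / original_bytes)


upload_format = 39_929_396
sizes = {
    ".zip (WinXP compression)": 36_973_370,
    ".7z (Windows 7-zip v4.57)": 33_139_008,
}

for name, size in sizes.items():
    print(f"{name}: {saving_percent(upload_format, size):.1f}% smaller")
```

The .7z figure works out to about 17%, and the .zip figure to a bit over 7%.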

Carlos Augusto Engel
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Message 26357 - Posted: 18 Jul 2012 | 0:29:02 UTC - in response to Message 26326.

You can try increasing the BOINC Manager "Minimum work buffer" (2 days) and "Max additional work buffer" (4 days).
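
These two settings can also be written to the client's local override file instead of being set in the Manager. The sketch below only builds the XML text. The tag names and the file name `global_prefs_override.xml` are given as I understand the BOINC client configuration, so check them against the BOINC documentation for your client version. They are client-wide preferences, so they apply to every attached project, and they cannot lift a per-GPU limit that a project's server enforces.

```python
import xml.etree.ElementTree as ET


def build_override(min_days: float, additional_days: float) -> str:
    """Build the text of a preferences override with the two work-buffer values."""
    root = ET.Element("global_preferences")
    # Assumed tag names for "Minimum work buffer" and "Max additional work buffer".
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    ET.SubElement(root, "work_buf_additional_days").text = str(additional_days)
    return ET.tostring(root, encoding="unicode")


# Values suggested above: 2 days minimum, 4 days additional.
print(build_override(2.0, 4.0))
```

The printed text would be saved as `global_prefs_override.xml` in the BOINC data directory, after which the client has to re-read its local preferences.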

____________

Luke Formosa
Joined: 11 Jul 12
Posts: 32
Credit: 33,298,777
RAC: 0
Message 26358 - Posted: 18 Jul 2012 | 1:19:00 UTC - in response to Message 26343.

Might be modem/router specific, so changing ISP could help, if they supplied the modem/router. Perhaps they are assigning an IP address for a very short time period or some TTL value is low. Have you tried keeping a web page open? One with meta refresh (msn, yahoo or even a streaming news station).

Might be useful if the project could adopt a trickle upload process, but I doubt that would be readily doable.


Yeah, they supplied both routers. Keeping gmail open in the background doesn't seem to help, but I'll try that with msn and see how it goes. Thanks for the suggestion!

Luke Formosa
Joined: 11 Jul 12
Posts: 32
Credit: 33,298,777
RAC: 0
Message 26359 - Posted: 18 Jul 2012 | 1:19:46 UTC - in response to Message 26357.

You can try increasing the BOINC Manager "Minimum work buffer" (2 days) and "Max additional work buffer" (4 days).


Tried that, doesn't seem to work. The other problem is that I also have World Community Grid running on the same client, so I'd get some 60 WCG tasks if I set those settings and left them :)

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,732,592,533
RAC: 887,807
Message 26360 - Posted: 18 Jul 2012 | 2:21:17 UTC

Increasing the min/max work buffer does not work: the project limits the number of WUs sent to two per GPU at any given moment. As I have written before, it seems to me that the best solution would be to increase the WU limit to 4, as the cuda4.2 app is 30 to 40% faster, or even more, than the cuda3.1 app on GPUGRID.

If project management were able to reconsider this limitation, it could either increase the WU limit or (although I do not know if this would be possible) increase the size of each Cuda4.2 WU, so that each WU would again take 8 to 12 hours of processing time. I am wary, though, of an upload file of, let's say, 160 MB.

In my particular case, on reflection, my internet connection should not be the problem, as I have 512 kbytes/s upload speed with a fixed IP, but it seems that the GPUGRID server limits incoming results to a max speed of ca. 50 kbytes/s. And yes, it loses the connection several times, whether on my side or the project side?!

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 26363 - Posted: 18 Jul 2012 | 15:40:38 UTC - in response to Message 26360.

The best solution might be a trickle upload, or some sort of task-on-task local generation, with uploads after each.

I think the next best thing would be to keep tasks to a reasonable size and allow for an increase in tasks based on the ability to return tasks inside 24h. If you can download, run and return 3 inside 24h, then that system should get up to 3 (depending on cache). If you get 4, 5 or 1, then the system would still be better for your setup.
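
A minimal sketch of that idea, purely hypothetical and not how the GPUGrid scheduler works: the per-GPU allowance is the number of tasks a host can download, run and return inside the window, capped by its cache setting. The function name, the parameters and the floor of one task are all assumptions made for illustration.

```python
import math


def tasks_allowed(turnaround_hours: float, cache_limit: int,
                  window_hours: float = 24.0) -> int:
    """Hypothetical quota: tasks that fit in the return window, capped by cache.

    turnaround_hours is the measured time to download, run and return one task.
    """
    if turnaround_hours <= 0:
        raise ValueError("turnaround_hours must be positive")
    fits = math.floor(window_hours / turnaround_hours)
    # Assumption: even a slow host is still allowed one task.
    return max(1, min(fits, cache_limit))


print(tasks_allowed(8.0, cache_limit=4))   # host returning a task every 8 h
print(tasks_allowed(5.5, cache_limit=8))   # faster host, larger cache
print(tasks_allowed(30.0, cache_limit=4))  # host slower than the window
```

With this rule a faster host earns a deeper queue automatically, while a slower one is never handed more than it can return in time.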
