
Message boards : Number crunching : Cancelled by Server - Suggestion

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9241 - Posted: 3 May 2009 | 9:30:44 UTC

This is going to be difficult to express without it being misread. The aim behind it is overall "job satisfaction" for those with slower cards, no more, no less. In reading it, place yourself in the situation of someone running one of the lower-speed cards.

At present the server will reach out and cancel WUs that have already been crunched by another host and are no longer needed on that PC. That is a good thing, no problems with that, it's a win-win scenario. It does have a consequential drawback which, whilst not strong enough to negate the principle, has sufficient weight to merit consideration of a resolution.

To quote an extreme: if an 8500/8600 etc. is sent a WU and is matched with a 285/295 at the same time, it is no contest, the 285/295 will finish first every time. If the slower card is not yet running the WU, it will be cancelled, and that's fine. If it is running it, it will be allowed to complete and be given credit, and that's also a good thing. In that latter case however, in the extreme example, it means that the slower card never contributes in a meaningful way, as its crunched WU is never needed.

Of course, in real-world terms such a scenario is nigh on impossible on every single occasion. However, also in real-world terms, it does happen on a significant number of occasions. I run a 9800GTX+ and after a recent cancellation had a look at my completed WUs to see how many were "beaten to the punch". There are quite a few, around 20% or so. That number will increase significantly the slower the card.

That will over time become discouraging for those with slower cards, as it dawns on them that much of what they crunch is of no value (and here keep an even keel, I am talking in terms of running against faster cards and being beaten to the finish post, no other inference implied).

That there is a valuable and essential place for the slower cards there is no doubt whatsoever, clearly, so let's not go there ... The recall system does produce the anomaly however, and it would not surprise me to find many withdrawals from the Project (the overwhelming number of which just "disappear", with no reason given and no song and dance about it) because they feel there is no point, as they get "beaten to the finish" by someone else. That can be very discouraging if you are on a slower card and it is happening a lot, and it will be increasingly common with cards below a 9800GTX.

Is it possible server-side to test for the cards used by the two selected crunchers, to try and ensure comparable cards? It should be easy to do, I don't think it would involve too many extra cycles, and it would create a level playing field with happier crunchers.

This is not a race, nor should it ever develop into one, and most have the common sense to realise that. By issuing to comparable cards however, the slower card can contribute in a more meaningful way, with more overall chance of retaining those users and the additional computing power.

The world will not come to a halt if this is not adopted, but it will be a happier place for those with lower-spec cards if it is.

Regards
Zy

X1900AIW
Joined: 12 Sep 08
Posts: 74
Credit: 23,566,124
RAC: 0
Message 9246 - Posted: 3 May 2009 | 11:55:36 UTC - in response to Message 9241.
Last modified: 3 May 2009 | 11:57:34 UTC

That will over time become discouraging for those with slower cards, as it dawns on them that much of what they crunch is of no value (and here keep an even keel, I am talking in terms of running against faster cards and being beaten to the finish post, no other inference implied).


I remember some months ago there was a hype about faster cards and more RAC, and a lot of members upgraded. Later on, workunits with low credit per hour were cancelled by some users to get higher-credit workunits. Side effect: those who didn't do this got the "worse" ones.

This is not a race, nor should it ever develop into one, most have the common sense to realise that.


I fear that is the most attractive reason for the top participants, especially those who hide their computer information and so do not share their configurations, and with them the knowledge needed to imitate their crunching systems. They are fighting other wars, not for science but for credit ranks. It's o.k. If a project does not meet their demands, it would be a big loss of crunching power. No credits, no game.

For a while I downclocked and undervolted my cards to reach a better credit-per-watt ratio, although overclocking still achieves good results, as before. The most important aspect for me is participating in a useful project (given), as well as getting the workunit finished without wasting my time (error diagnostics) and money (power consumption) or risking failure because of long runtimes (other circumstances).

The world will not come to a halt, if this is not adopted, but it will be a happier place for those with lower spec cards if it is.


Full agreement, any card in the FAQ should be supported in equal measure. Otherwise the FAQ would have to be restricted to high-end hardware, and I hope they do not do that. Every owner of a slow card who is supported well today is an owner of a fast card tomorrow.

Don't watch statistics if you get discouraged. I fell back from the top 40 to the top 300. A change of perspective: the project is lucky to have this horsepower, and its mandate to honour and acknowledge each little contribution is part of that. In my opinion they react fast and according to the project's needs. (Greetings to all mods & scientists)

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9254 - Posted: 3 May 2009 | 13:27:13 UTC - in response to Message 9246.

I fear that is the most attractive reason for the top participants

Absolutely - and it's a good thing. Competition spurs on many people, and if you wish to view it in that manner, the above suggestion has even more weight, as it can also encourage "unofficial" competition because there will be a level playing field.

Don't watch statistics if you get discouraged.

I'm not discouraged in the slightest. Having crunched for nine years in various guises in various projects, I'm too long in the tooth to get distracted by such artificial parameters - I don't give a rat's fig about the credits yaddie yadda, and it hardly affects me with a 9800GTX+. My whole logic was aimed at making life a better place for those with low-end cards. If you have one, and your work is continually "beaten" to the finish line, there will come a point where you say "move on, I'm not contributing, as my efforts are not used". Don't measure the suggestion in terms of who is "best" or who gets the most "credits" - in truth the vast majority, like me, don't care a fig about that either.


The mindset to use with this is to think of the reaction from those crunchers who have low-end cards and genuinely want to contribute. If they get "beaten" each time (and that will be their perception), many will say "what's the point", no matter what esoteric explanation is deployed. The bottom line with the majority is that they crunch because they hope to be of value to the Project, and that can't happen if they see their efforts usurped on many occasions.

We have to have two crunchers per WU most of the time, for good reason, and that's fine. In a level playing field scenario there is no issue, as any sensible human being is aware of the reason that only one of the "team" of two will be used, it's the nature of the beast.

The whole point of the suggestion is to make the two similar in capability, so that each low-end card user has an equal chance of having their efforts used and can therefore feel part of a Team working to the same end. At present that is not the case on a significant number of occasions, when a significant number get beaten to the finish line by a faster card.

Will it be perfect? Clearly not, that's life. However, a very significant improvement can be made by a small change server-side in the scheduler.

Regards
Zy

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9255 - Posted: 3 May 2009 | 13:57:56 UTC - in response to Message 9241.
Last modified: 3 May 2009 | 14:01:19 UTC

Hi Zydor,

you bring up a valid point and I think you communicate it well, so people should understand what you really mean.

Let me first describe how it works for SETI:
There you have a huge number of WUs which can all be done in parallel, independent of each other. It doesn't matter when they're returned (within a reasonable time frame). Therefore every contribution from slow cards / CPUs may not be the most energy efficient, but it does help the project.

For GPU-Grid things are different. I may not be telling you anything new here, but I need this as a basis for further argumentation:
It's a simulation in time, which is inherently sequential: timestep n+1 can only be computed after step n. Therefore, if there was only 1 WU, only one GPU could be used at any time. The project can work around this by issuing WUs in parallel. However, the number of parallel WUs which can be put to good use is not quasi-infinite as in the case of SETI: GPU-Grid needs results back to analyze them and to adapt accordingly when issuing new WUs based on the old results.

This is where the problems start and where it may be of more value to the project to get results back faster than to start even more WUs in parallel. It seems like the project reached a state where they have enough GPUs, i.e. enough WUs in parallel, so they try to speed important WUs up by assigning them an initial replication >1.

They already have a distinction between reliable and normal hosts. Reliable are ones which return WUs within xx h with a failure rate less than y %. So.. how should the WUs be distributed?


  • example 1: 1000 WUs in parallel, 1000 reliable hosts -> easy ;)
  • example 2: 1000 WUs in parallel, 800 reliable hosts, 400 normal hosts -> 800 WUs for the reliable ones, 200 WUs with initial replication 2 for the rest
  • example 3: 1000 WUs in parallel, 1000 reliable hosts, 400 normal hosts -> 600 WUs with a reliable host each and 400 WUs with initial replication 2, each WU with a reliable host and a normal one
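
The arithmetic behind these three examples can be sketched in a few lines of Python. This is only an illustration, not the project's scheduler: the function name `distribute` and its result keys are made up for this sketch, and it assumes that reliable hosts are served first and that normal hosts are then used either for the leftover WUs or as a second copy.

```python
def distribute(n_wus, n_reliable, n_normal):
    """Split WUs between reliable and normal hosts as in examples 1-3.

    Hypothetical helper: assumes reliable hosts are served first and that
    any remaining normal hosts are used for redundancy.
    """
    reliable_wus = min(n_wus, n_reliable)
    leftover_wus = n_wus - reliable_wus  # WUs with no reliable host left

    if leftover_wus > 0:
        # Example 2: normal hosts share the leftover WUs among themselves.
        return {
            "reliable_only": reliable_wus,
            "mixed_pairs": 0,
            "normal_only": leftover_wus,
            "normal_replication": n_normal // leftover_wus,
        }

    # Examples 1 and 3: every WU has a reliable host; each normal host
    # is added as a second copy to one of those WUs.
    mixed = min(n_normal, reliable_wus)
    return {
        "reliable_only": reliable_wus - mixed,
        "mixed_pairs": mixed,
        "normal_only": 0,
        "normal_replication": 0,
    }


print(distribute(1000, 1000, 0))    # example 1
print(distribute(1000, 800, 400))   # example 2
print(distribute(1000, 1000, 400))  # example 3
```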


Example 3 is the policy which you are talking about. If such a "normal" host is slow (may actually return WUs reliably, but not quick enough) then its results will only ever be needed in the rare case that the reliable host produces an error. This is not ideal, but what could be done about it? Without hurting project productivity? Consider this:


  • example 4: 1000 WUs in parallel, 1000 fast hosts, 1000 slow hosts


In this case you don't know about the hosts' reliability, but you know their speed. You could pair them in one of three ways:

(i) randomly
(ii) 500 WUs with fast-fast, 500 WUs with slow-slow
(iii) 1000 WUs with fast-slow

(ii) is what you would like to see, but there is a problem: you are guaranteed that 50% of your WUs will take a minimum of x days, if all goes well, maybe even more (failures). This would hurt the project productivity due to what I said in my introduction. On the other hand (iii) would mean that you'll get most WUs back quickly (minimum time y < x), but if there's an error there'll be redundancy provided by the slow host.

To summarize it: as soon as you deviate from option (iii) you hurt the project progress, if all WUs are equally important.
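
A small sketch of the three pairing options, assuming no host ever fails and that a WU counts as done as soon as its first copy comes back. The turnaround times `Y_FAST` and `X_SLOW` are hypothetical numbers chosen only so that y < x; the function names are made up for this illustration.

```python
import random


def pair_hosts(fast, slow, strategy, rng=None):
    """Return a list of (time_a, time_b) pairs, one pair per WU."""
    if strategy == "matched":      # option (ii): fast-fast and slow-slow
        return (list(zip(fast[0::2], fast[1::2]))
                + list(zip(slow[0::2], slow[1::2])))
    if strategy == "mixed":        # option (iii): every WU gets fast-slow
        return list(zip(fast, slow))
    if strategy == "random":       # option (i)
        hosts = fast + slow
        (rng or random.Random()).shuffle(hosts)
        return list(zip(hosts[0::2], hosts[1::2]))
    raise ValueError(f"unknown strategy: {strategy}")


def first_result_hours(pairs):
    """Time until the first copy of each WU comes back, assuming no errors."""
    return [min(a, b) for a, b in pairs]


Y_FAST, X_SLOW = 8, 30             # hypothetical hours per WU, y < x
fast = [Y_FAST] * 1000
slow = [X_SLOW] * 1000

for strategy in ("random", "matched", "mixed"):
    pairs = pair_hosts(fast, slow, strategy, random.Random(0))
    times = first_result_hours(pairs)
    waiting_on_slow = sum(t == X_SLOW for t in times)
    print(strategy, "WUs:", len(times), "waiting on a slow host:", waiting_on_slow)
```

With these assumptions the matched pairing leaves half of the WUs waiting for a slow host, the mixed pairing leaves none, and random pairing lands somewhere in between.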

What I think could be done: use the reliable hosts with short turn-around times for mission-critical WUs. And generate some less important "light WUs" with less complex models (-> smaller time per step -> less lag) and a smaller number of steps (i.e. runtime on a 9800GT of less than 12h). Preferably distribute these to hosts with high average turn-around times, and maybe add a user preference to set which ones you prefer.
The only problem: I have no idea how scientifically useful such WUs could be.

And a remark: upon quickly browsing the result list of my 9800GTX+ I saw that I still get many WUs with initial replication 1, so we're not quite "reliable host saturated" yet.

.. hope this helps :)

MrS

EDIT: the 8500/8600 which you named specifically are not officially recommended. But the problem also applies to the faster 9600GT. At 1 day of crunching time it's not too bad, but it will still get beaten by almost any other card.
____________
Scanning for our furry friends since Jan 2002

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9256 - Posted: 3 May 2009 | 14:33:35 UTC - in response to Message 9255.

Guilty as charged on the card designations :) It was a quick remark as illustration, and I should have checked the facts a little closer.

The dilemma is well understood. Life is not perfect, never will be, and we all go with the flow on the best path available when all is factored in. I understand the drive at Project level to produce a scheduler solution that is the most efficient in producing the best return of WUs - it is after all why we are all here.

I would only restate the effect of going down a path of "pure" efficiency. That can often be fool's gold, as the penalties suffered outweigh the solution enabled. In this case, I have no idea of the actual hard facts, as clearly I don't have the whole project stats. The scenario I painted is plausible and I have seen similar effects elsewhere, where a drive for efficiency, implemented in good faith, ends up driving away those "excluded", albeit unintentionally.

Competition is getting fierce for Crunchers in the BOINC world, and anything that helps tweak the reasons to stay on this Project can only be a good thing. As stated above, today's low-end card user is potentially tomorrow's mega cruncher when they get hooked by it all. There is a balance in all this which is always difficult to get right all the time.

It would be an idea to "Test & Measure" the numbers of lower-end cards over time and see if a trend of increasing non-activity develops. It shouldn't be too hard to frame a daily stats report and log it to map the trend. Meanwhile, if the scheduler can be tweaked to minimise unbalanced card pairing as much as is practically possible within the Project's Objectives, that can only be a good thing.

Regards
Zy

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9258 - Posted: 3 May 2009 | 17:01:36 UTC - in response to Message 9256.

Do you have any specific ideas on how this could be implemented? After all it's not only about "make the small guys feel good" (as important as this is), but it's also about "let's not waste their effort".

Besides my previous suggestion of different WUs I could see something else: assume that the reliable hosts are saturated with WUs and there's still work left. In this case one could pair fast but unreliable hosts (with high error rates) with slow but reliable hosts. This would greatly increase the chances that the result of the slow cruncher will be needed.

MrS
____________
Scanning for our furry friends since Jan 2002

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9264 - Posted: 3 May 2009 | 20:09:50 UTC - in response to Message 9258.

That made me think rofl :)

As a starter for ten, building on your idea in the post above, how about:

Three categories, with suggested sub-categories, in the following priority order for allocating WUs:

1. Fast & Reliable. One replication, match fast cards only, then allocate in this priority order:
   A. Time Critical & Content Critical Project Work
   B. Time Critical & Content Critical special one-off or short duration

If units & suitable hosts remain:

2. Fast or Slow, & Reliable. Two replications, match card speed first, any speed second, then allocate in this priority order:
   A. Balance of Category 1 units remaining
   B. Critical Content Standard Project Work
   C. Critical Content special one-off or short duration

If suitable hosts remain:

3. Fast or Slow, Reliable or Unreliable. Three replications (do not pass remaining Cat 2s to this):
   A. Routine, non time critical, non content critical
      (1) Match card speed
      (2) Any card speed

If units remain: wait for capacity, rinse and repeat.
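
For anyone who prefers to read this as code, here is a rough Python rendition of the three categories. It is a sketch only: the class and function names are invented for illustration, "fast" and "reliable" are treated as simple yes/no flags the server is assumed to know about each host, and the sub-priorities (A, B, C) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    number: int
    needs_fast: bool       # only fast cards qualify
    needs_reliable: bool   # only reliable hosts qualify
    replication: int       # copies of each WU sent out


# The three categories from the list above, in priority order.
CATEGORIES = (
    Category(1, needs_fast=True, needs_reliable=True, replication=1),
    Category(2, needs_fast=False, needs_reliable=True, replication=2),
    Category(3, needs_fast=False, needs_reliable=False, replication=3),
)


def highest_category(is_fast, is_reliable):
    """Return the highest-priority category a host qualifies for."""
    for cat in CATEGORIES:
        if cat.needs_fast and not is_fast:
            continue
        if cat.needs_reliable and not is_reliable:
            continue
        return cat
    raise AssertionError("category 3 accepts every host")


def same_speed_first(candidates, first_host_is_fast):
    """Order candidate hosts so cards of matching speed are tried first."""
    return sorted(candidates, key=lambda h: h["is_fast"] != first_host_is_fast)


print(highest_category(is_fast=True, is_reliable=True).number)
print(highest_category(is_fast=False, is_reliable=True).number)
print(highest_category(is_fast=True, is_reliable=False).number)
```

`same_speed_first` covers the "match card speed first, any speed second" rule of categories 2 and 3: hosts of a different speed stay in the list, they are just tried last.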

Within that framework, define each type of WU in a flat table/array with these column IDs:
Time Critical
Standard Project Work
Special One off
Short Duration
Routine
Non Time Critical

From the flat table/array, allocate the types of WU to the overall categories above, depending on the Project priorities - I would see that as an "Option" in an application accessible to the Project Admin to set/tweak criteria for the next run of units.

That way the only "maintenance" to the code - as such - is generating an allocation of WU to Definition in the overall Definition table. That's just a straight input session of a few seconds during each Project Definition and scoping, to enter the next row in the array - the rest should flow.

Bit flakey, I'm no programmer - rofl - but I reckon it points in the right direction, indicating the right priorities for the Project while balancing in the needs of the different Cruncher speeds.

Anyone else out there with an idea of how to balance Project Objectives as primary with maximising slower card usage satisfaction - let's hear from you :)

Regards
Zy

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9278 - Posted: 3 May 2009 | 22:44:21 UTC

As an owner of what I consider one of the slower cards ... though many would not consider it so ...

I would suggest that the distribution of longer vs. shorter tasks may be less balanced than it could be. Again, there are various ways of eating this elephant but it would bother me little at all to be given fewer sub 6 hour tasks on the machines with GTX200 class GPUs. Or to put it another way, I get a lot of tasks that MIGHT be more suitable for slower systems than the KASHIF tasks now flowing through the system.

Without gaming out the question, which would require more information about the classes of machines, it is hard to know whether changes would make a significant difference or not.

I would argue that they would.

Put it another way, if my i7 got more of the KASHIF tasks while the slower systems got more of the other tasks, our return intervals would converge ... but this requires knowing the number of systems of the various classes and the population of the tasks to be processed. And their priority. If one class of tasks is higher "priority" than the other, this changes things significantly.

Other limitations apply. The "Feeder" application that feeds tasks to the scheduler has a very limited size. My recollector says its default size is 100 tasks. And the usual configuration is FIFO so that the task assignment is "random". You can put more smarts in the scheduler so that it selects more appropriate tasks ... or change the system so that there are multiple schedulers with multiple queues and we sign-up for the one with a best match ...
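
To make the difference concrete, here is a toy model of a small feeder window with the two selection styles. It does not use any real BOINC server code or API: the slot count is just the figure recalled above, and the task field `suits` and both function names are invented for this sketch.

```python
from collections import deque

FEEDER_SLOTS = 100  # default size as recalled in the post; not verified


def pick_fifo(slots):
    """Plain FIFO: hand out whatever task is at the front."""
    return slots.popleft() if slots else None


def pick_best_match(slots, host_class):
    """Prefer a task tagged for this host class, else fall back to FIFO."""
    idx = next((i for i, task in enumerate(slots)
                if task["suits"] == host_class), None)
    if idx is None:
        return pick_fifo(slots)
    task = slots[idx]
    del slots[idx]
    return task


def make_slots():
    """Hypothetical feeder contents: one task in ten suits slow hosts."""
    return deque(
        ({"id": n, "suits": "slow" if n % 10 == 9 else "fast"}
         for n in range(FEEDER_SLOTS)),
        maxlen=FEEDER_SLOTS,
    )


print(pick_fifo(make_slots()))
print(pick_best_match(make_slots(), "slow"))
```

The fallback to FIFO when nothing matches is a design choice of the sketch; the alternative is to issue no work at all in that case.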

Hard to know what the best choices are here ... Maybe I need to go buy another faster card ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9531 - Posted: 9 May 2009 | 12:45:44 UTC
Last modified: 9 May 2009 | 13:03:17 UTC

The major problem with sophisticated work distribution may be that it requires serious tweaking of the BOINC server software, which would need to be repeated upon an update (if it doesn't make it into the main code base). That's quite some work, I suppose considerably more than they have currently done with the "reliable host" flag.

And there's more: like Paul said, the scheduler probably has a comparatively small number of WUs to choose from. And it can't decide or predict which hosts are going to contact it at what point in time. That's not a show stopper, but I think it leads to the following: the more complicated and diverse you make the WU distribution, the fewer matches you're going to find. Might be better to opt for a simpler, more robust scheme. But then, admittedly, I didn't take the time to think your suggestion through properly..


Maybe I need to go buy another faster card ...

.. which has absolutely nothing to do with the topic discussed here, does it? ;) Actually I'm itching to replace the HDD in my notebook.. but I keep telling myself that, although the new one would be quieter, larger, faster, and less power-hungry, it absolutely wouldn't change anything. So I'm trying to admire the 320 GB Scorpio Black without actually pulling the trigger :D

MrS
____________
Scanning for our furry friends since Jan 2002

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 9637 - Posted: 11 May 2009 | 13:54:59 UTC - in response to Message 9531.

Hi,
just to say that we are carefully following this thread.

gdf

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 9639 - Posted: 11 May 2009 | 15:51:05 UTC - in response to Message 9637.
Last modified: 11 May 2009 | 15:55:11 UTC

It's a little difficult to know which direction to go from here. I reckon a fair summary of all the above to date is:

- Overall speed of production from the overall crunching capacity of the community is the top priority, as long as that is not at the expense of overall capacity (i.e. yup, we need to crunch 'em quickly, but there is little point achieving that if we lose a chunk out the back door because they feel "unwanted"). Hard balance to achieve, but the overall goal is there.

- The BOINC scheduler does give some issues, in that we are essentially "intervening" between it and the GPUGRID server. Not impossible to resolve if the server responded to the BOINC scheduler request with internal logic to select the WU, and passed it back to the scheduler for "delivery". Suspect there is some heresy there rofl, but hey, I'm no programmer :)

- No strong yells of "over my dead body", so we could be reasonably close to a done deal, given some collective brainstorming on detail and some pragmatic decisions on what will be an imperfect solution - let's go for the classic 80/20 rule, get it in, and massage it as time goes on. Perfection first time round is not going to happen, that's not the real world.

If this is gaining thought, maybe the next step is for someone better than me at pseudocode to attempt a better rendition of my first crack at it above, and post it for collective comment?? Usually a proper pseudocode exercise can tease out good suggestions, as it is readable by us mere mortals.

Regards
Zy

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 9668 - Posted: 12 May 2009 | 5:58:11 UTC

Not knowing the internal goals or the sub-task goals, it is hard to know for sure...

But I can easily see that there is going to be a dynamic tension between the Speed of Service (SoS) and the processing time ... what I mean is this. Assume that there are three task length classes and three SoS objectives.

Short Run Time
Med. Run Time
Long Run Time

SoS: 1 Day or less, 2-3 Days, and Deadline fine

So, in my case I have a spread of cards from GTX280 to 9800GT ... run times average 5 to 20 hours on the faster card and on my slow card it is 15 to 33 hours or there about.

Assuming, for the moment, that we can roughly class the tasks I looked at the credit claim numbers and at the moment have 3 sets 3681/3946, 7057/8076, and 4131/4352. And looking at a couple dozen of these tasks gave me the run times for these tasks.

Now if we contrast those run times with the SoS, we may find that the shortest running tasks may have a higher desired SoS, where the longer running tasks may be in the "Deadline Fine" class.

I mean, the question is: are we trying to average the run times, so that my 9800GT card would get the "shorter" tasks, which it completes in about 15 hours, while the GTX280 gets the longer tasks (about 16.5 hours)? If that is the case, then the objective would be to attempt to fit them by estimated run time class.

But the SoS objective may not be that neat and pretty. If the SoS is for 1 day or less, then you would want to assign that short run time task to the faster machine regardless.
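
One way to picture that tension is the little sketch below. The policy in it is purely an assumption made for illustration (tight SoS goes to the quickest host, everything else to the slowest host that still fits), the run time estimates are hypothetical, and time spent waiting in a host's queue is ignored.

```python
# Speed of Service classes from the post, as an upper bound in hours.
# None means "any time before the deadline is fine".
SOS_LIMIT_HOURS = {"1 day or less": 24, "2-3 days": 72, "deadline fine": None}


def pick_host(estimated_hours, sos):
    """Choose a host for one task.

    estimated_hours maps host name -> estimated run time in hours.
    Assumed policy: a tight SoS always goes to the quickest host; otherwise
    the slowest host that still meets the limit gets the task, which keeps
    the fast hosts free for urgent work.
    """
    limit = SOS_LIMIT_HOURS[sos]
    if sos == "1 day or less":
        return min(estimated_hours, key=estimated_hours.get)
    eligible = {host: hours for host, hours in estimated_hours.items()
                if limit is None or hours <= limit}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Hypothetical estimates for one short task on the two cards mentioned above.
short_task = {"GTX280": 5, "9800GT": 15}
print(pick_host(short_task, "1 day or less"))
print(pick_host(short_task, "deadline fine"))
```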


As to the Scheduler and the project, um, there is no intervention between it and the GPU Grid server. They are one and the same. The BOINC client tells the project's scheduler that it wants work, and the scheduler issues work out of the available pool.

Sadly, this is one of the places where there is some pretty bad code, and one of the hardest to get changed. The first issue is that the feeder has a limited collection of tasks, and if there is not a good choice in that selection there are two options: issue no work, or issue work that falls outside of these new guidelines.

Not trying to be nay-sayer here... but, the first question is "Is there a real problem?" or are we a solution looking for a problem?

I better quit, not typing well, and not thinking much better ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9690 - Posted: 12 May 2009 | 22:22:00 UTC - in response to Message 9668.

I mean, the question is: are we trying to average the run times, so that my 9800GT card would get the "shorter" tasks, which it completes in about 15 hours, while the GTX280 gets the longer tasks (about 16.5 hours)?


To briefly answer your question: I don't think so.

Currently the runtimes are rather arbitrarily set by the standard "a 9800GT should do it in ~12h", or at least that's what was used last autumn. So if you see longer and shorter tasks, that's not intentional. They could all be of the same length, as far as the project is concerned. And since each WU features many steps (it was ~800k in former WUs) the runtime could be set almost arbitrarily. Well, the lower limit is the time per step.. ;)

And to finalize this long story: the quicker the WUs of this given size are returned, the better.

Not trying to be nay-sayer here... but, the first question is "Is there a real problem?" or are we a solution looking for a problem?


That's a very valid question. One could see it like this: as more new generations of GPUs are introduced, and if the old ones can still execute the future code, then the problem discussed here will only get worse, as the GPU speeds will get even more diverse.

The other side: due to the dynamic nature of GPU-Grid (i.e. the time domain simulation with WUs depending on each other) it will always benefit most from the fastest and newest GPUs. If, at some future point, there are other attractive CUDA projects available for slower cards.. what's the project going to do?

MrS
____________
Scanning for our furry friends since Jan 2002

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 9694 - Posted: 12 May 2009 | 23:25:46 UTC - in response to Message 9690.
Last modified: 12 May 2009 | 23:28:51 UTC

Without knowing how it is implemented from the server perspective ... if we take a look at how WCG has multiple projects, perhaps that basic concept could be reworked for different classes of WUs (priority, 200 series card only, tight turnarounds (for compute error WUs), regular, best suited to low end cards ... you get the idea.)

Let's say I have a 295. I am *asked* to sign up for specific WU types on the website (yes, this still leaves me in control, so no big brother concerns) ... basic information could be provided explaining what each *WU type* is best suited for ... only if buffer < 24 hours etc. Now for coordinating this through the scheduler ... maybe just an extra <tag> on each WU, to see if there is a match to the incoming client request (what the client registered for on the website). I could also select a "give me anything the project needs me to do" type of WU, so that if all the short turnaround tasks, or the tasks only suited to 200 series cards, have been sent out, then by all means send me a lower priority WU.

This would reduce the necessarily inefficient process by which WUs are allowed to complete even though the project already has a valid return, while also reducing the implied perception of "my GPU is less useful than yours, so I'm gonna take my ball and go someplace else" :-). Wow ... in fact if I am concerned about having my WU returned quickly (so someone else's copy does not even start processing yet) I would turn my buffer way down low, which I believe the project would really appreciate.
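
Roughly what such a tag match could look like, as a sketch only: the tag strings, the `ANYTHING` marker and both function names are hypothetical, and nothing here reflects how the real scheduler stores preferences.

```python
ANYTHING = "anything the project needs"


def pick_wu(queue, host_prefs):
    """Return the first queued WU that fits what the user signed up for.

    WUs whose tag was explicitly chosen are preferred. Only if none is
    queued, and the user also ticked the catch-all option, is any other
    WU handed out.
    """
    for wu in queue:
        if wu["tag"] in host_prefs:
            return wu
    if ANYTHING in host_prefs and queue:
        return queue[0]
    return None


queue = [
    {"id": 1, "tag": "200 series only"},
    {"id": 2, "tag": "low end"},
    {"id": 3, "tag": "regular"},
]

print(pick_wu(queue, {"low end"}))
print(pick_wu(queue, {"tight turnaround", ANYTHING}))
print(pick_wu(queue, {"tight turnaround"}))
```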
____________
Thanks - Steve

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 9845 - Posted: 16 May 2009 | 11:57:17 UTC

After thinking about it for some time I came up with a suggestion which I actually like :)

Problems I'm trying to solve:
- users with slower GPUs may feel their contribution is not worthy, as they get beaten by faster cards
- slower GPUs have trouble meeting the deadline
- very fast GPUs (e.g. if GT300 is the beast it's rumored to be) may reach crunching times of 1 - 3h / WU in a few months, which is not desirable
- the short turnaround times and small cache settings cause trouble for some users

My inspiration:
At Rosetta@home you can set how long a WU should run. As far as I understood, they're doing Monte Carlo simulations and each WU contains several runs. So it's easy to declare WUs finished after an arbitrary number of completed runs.

Transferring this idea to GPU-Grid:
We can't adopt it directly, as our WUs have time steps which depend on each other, they're not independent runs. However, I think the number of steps in each WU can be set to arbitrary numbers, i.e. the project chooses a number of steps which leads to ~12h of computation on a 8800GT. I suggest making this number of steps flexible and instead setting the crunching time per WU.

We introduce a new user preference "preferred run time". Let's try to keep it simple and sturdy, so we don't allow arbitrary numbers, but instead give 3 options to choose from:

- short: 4 - 6h/WU, whatever the server can handle
- standard: 10 - 12h (default setting)
- long: ~24h

How it could look, initial replication 1:
Host A requests work and the server decides to send WU 1. At this point the runtime of WU 1 is set to the preferred setting of host A. The deadline may be adjusted accordingly.
Host A crunches 10 steps within this time. The WU is finished and sent back. A new one is generated based upon this result.
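
As a back-of-the-envelope sketch of what "set the crunching time instead of the number of steps" means: the step count follows from the preferred run time and the host's time per step. The per-step timings below are hypothetical, and using the upper end of each suggested range as the target is an assumption of this sketch.

```python
# Target run time per preference, in hours. The ranges are from the post;
# taking the upper end of each range is an assumption of this sketch.
PREFERRED_RUNTIME_HOURS = {"short": 6, "standard": 12, "long": 24}


def steps_for_host(preference, ms_per_step):
    """Number of whole time steps a host can finish in its preferred run time.

    ms_per_step is the host's (hypothetical) measured time per simulation
    step in milliseconds; integers keep the arithmetic exact.
    """
    budget_ms = PREFERRED_RUNTIME_HOURS[preference] * 3600 * 1000
    return budget_ms // ms_per_step


# A card needing 54 ms per step fits 800,000 steps into a 12 h WU;
# a card half as fast gets a WU with half as many steps.
print(steps_for_host("standard", 54))
print(steps_for_host("standard", 108))
print(steps_for_host("short", 108))
```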

Advantages:
- apart from the adjustments for this flexible WU generation nothing changes, server side
- hosts get more freedom: especially slow ones and hosts which don't crunch 24/7 would benefit from the shorter run times.
- Users with limited inet access may prefer the longer WUs
- Users with limited upload may prefer the longer WUs, if the output file size does not depend on the number of steps (not sure here)
- Users who don't have more cores than GPUs could reduce their downtime / overhead by choosing longer WUs
- a short turn-around time on slower cards means better load balancing on the server side: WUs which don't progress as fast get more chances to be sent to fast and reliable hosts (if the server knows which hosts are fast)

Drawbacks:
- none that I can see.. apart from the necessary modifications

How it could look, initial replication >1:
Host A requests work and the server decides to send WU 2. At this point the runtime of WU 2 is set to the preferred setting of host A. The deadline may be adjusted accordingly.

Now WU 2 gets top priority to be sent out to the other hosts. Next work request comes from host B. He's got the same preferred runtime and also gets WU 2, everything's fine. Assume host B doesn't turn up and instead it's host C with a different preferred runtime. Now a compromise has to be made:

1.) Send WU 2 to host C anyway, with the runtime setting of host A. This overrides host C's setting, something the user may not like.

2.) Wait until host B with the matching runtime turns up. However, if one waits too long, host B will not be able to return WU 2 within the same time frame as host A and we get essentially the problem Zydor is trying to avoid, just independent of GPU speed. That's why it's important to keep the number of possible runtimes small. This problem could be avoided if there's only one runtime for everyone.

Assume host B got our WU 2 after 10 mins. Now host A returns his 10 steps of WU 2 after his preferred runtime. The scheduler could then generate a new WU immediately, based on these results. This would not be very clever, though: host B could be much faster and return 20 steps 10 mins later. Some tolerance time would have to be set here: how long does one want to wait, if the other hosts return more steps? So in this case things get a little complicated, but not terrible yet.
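
The tolerance-time rule could be sketched like this, assuming the scheduler knows when each result arrived and how many steps it contains. The function `pick_result` and the tuple layout are hypothetical and only illustrate the trade-off; this is not BOINC scheduler code.

```python
# Hypothetical sketch of the "tolerance time" rule; not actual BOINC code.
from typing import List, Optional, Tuple

# Each result is (host name, minutes after the WU was first sent, steps completed).
Result = Tuple[str, float, int]


def pick_result(results: List[Result], tolerance_min: float) -> Optional[Result]:
    """Pick the result with the most steps among those that arrive within
    tolerance_min minutes of the first returned result."""
    if not results:
        return None
    first_arrival = min(r[1] for r in results)
    in_window = [r for r in results if r[1] - first_arrival <= tolerance_min]
    # Most steps wins; ties go to the earlier arrival.
    return max(in_window, key=lambda r: (r[2], -r[1]))


# Host A returns 10 steps after 300 min, host B returns 20 steps 10 min later.
results = [("A", 300.0, 10), ("B", 310.0, 20)]
chosen_strict = pick_result(results, tolerance_min=0.0)
chosen_patient = pick_result(results, tolerance_min=15.0)
print(chosen_strict, chosen_patient)
```

With no tolerance the scheduler continues from host A's 10 steps; with a 15 minute tolerance it waits and continues from host B's 20 steps.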

Return results immediately:
It would be ideal if GPU-Grid hosts would return their results immediately (an old cc_config option) and thus the maximum waiting time for the results of our host B could be small and overall WU processing speed could be increased. Actually it would even be beneficial now, if the BOINC client could be told by the server "I want you to report results immediately, but only for my project".

Advantages:
- some result will be available after the preferred runtime, regardless of host speeds (assume not all of them error out ;)
- the best result could be chosen after a (hopefully) short tolerance time

Drawbacks:
- it gets messy if too many "preferred runtimes" are allowed, depending on the actual WU request rates
- it gets ugly if BOINC waits a large, unknown amount of time before it contacts the scheduler and reports finished results
- credits would differ for each run, depending on the amount of steps done, even if "a WU" is run by different hosts. I don't think BOINC allows this. Rosetta can get around this because every "WU" is only ever sent to 1 host. Internally the server collects all results belonging to the same problem, which are distributed among different runs contained in many different WUs.
-> We could also generate a "new WU" for each new computation. If a WU is supposed to be sent to several hosts we get a branch in the work stream / flow of the WU, which is joined again after results are collected. Not sure.. is this understandable? It would complicate debugging, though.

How these WUs should be distributed:
This is actually independent of what I suggest. Reliable hosts would still be fine with an initial replication of 1 and not much would change apart from the added flexibility for the user and improved overall balance (similar runtimes and cache settings regardless of GPU speed).
If WUs are sent out to several hosts the same problem, which Zydor initially pointed out, appears in a different shape: if you pair slow and fast GPUs and both are successful, the result of the slower GPU won't be used as it contains less work.
However, if the runtimes of the slow cards can be kept in check, it would be "less painful" to pair 2 slow cards instead of fast-slow. We'd probably still get less work done, but only over a limited, controlled time. It's easier to spread this evenly among all WUs and thus some speed could be traded for throughput in a controlled manner.

Comments, foul eggs, flowers anyone?

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Level
Ala
Message 9866 - Posted: 16 May 2009 | 16:59:43 UTC - in response to Message 9845.
Last modified: 16 May 2009 | 17:06:08 UTC

You been doing that subliminal stuff again :)

I don't have the competence re the internal workings of the WU to give a validity opinion on the overall principle you gave. It seems sensible to me, and I can see the benefit if the basic underlying premise re splitting by time steps is fundamentally practical.

Whether it's all predicated on GPU Class or Time Step, at some stage, as you pointed out, we inevitably get to the point of matching Users, which is where the fast-slow issues come in. I can see that Time Step is the much better of the two (GPU Class / Time Step) scenarios, given my caveat above, and has benefits beyond mere card matching.

Having got to the stage of deciding which principle to follow (GPU Class or Time Steps, or indeed any other principle that may come along), the ultimate gotcha will always raise its head re matching fast-slow, albeit Time Steps look much better in that regard. The next bit may sound a little "sledgehammer to crack a nut" - and decidedly non-tech ......

Whichever principle is chosen, only show the Cruncher their own WU result, not everyone's, in the WU result page. What we don't know won't hurt us.

The Project will have gone through hoops and loops to be as fair and as accommodating as it can possibly be to the slower class cards, encouraging their participation. It's no biggie not to show the matched result. That way the Cruncher will not know how many times their result was dumped. Since the Project will have done its best to avoid cards being "useless" because they were not used in the final outturns, such an arrangement could be used with a clear, guilt-free mindset.

I recognise there are benefits in seeing all participants on screen for a WU, however for the Cruncher, at the end of the day it's pure aesthetics; it doesn't contribute one way or another to successful or otherwise WU crunching. Such a screen - or similar - may be needed by admins et al, but that's no issue.

The Results Of The Zargon Jury?

Flowers :)

Regards
Zy

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 9874 - Posted: 16 May 2009 | 20:41:16 UTC

My mind is slipping over the surface. But, I think the solution is simpler than you imagine ...

The application does time steps, and we know the amount of elapsed time. If we allow the user the discretion, as at Rosetta, they can set a time limit on how long they want to run individual tasks. The task is downloaded... it is run until it has completed the number of iterations that will fill up the amount of time the participant selected. The task is ended at whatever arbitrary time-step the task is on when the clock expires.

Task is returned and the next task is issued based on the amount of work done in unit time. The point here is that I can say run for 6 hours and at the end of that time I get a new task. Let us say that I completed 100 TS, well, the 9800GT in that same 6 hours would have only completed 30-32 TS ...

Obviously this complicates the work generator, credit awarding, etc.

It allows participants greater control over the work size; those of us who do not like tasks that take more than about 6 hours would probably be happier...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 9885 - Posted: 16 May 2009 | 21:50:25 UTC - in response to Message 9874.

Paul,

this is the point where I started thinking :)
There's one important problem with this approach: WUs issued to several cards.

Let's assume a slow card returns 10 steps after 4h, whereas a fast card might return 100 steps after 6h. How long are you going to wait? You could estimate the runtime from the user's preference, but that's not very direct. I'm trying to make things easier to predict and more regular. Not sure how necessary it is, but I really wouldn't want slow cards to "outrun" fast ones just because they set a shorter runtime (and the server decided to use their result for the next WU instead of waiting for the other one).

Zydor,

the first paragraph of Germany's constitution says "The dignity of man is untouchable". Earlier I didn't understand this: "why, it's being trodden on all the time!" I think a couple of years ago I finally understood (or started to?). It's a normative clause.. and such a strong one, that it actually is untouchable. Gives me a shudder every time I really think of it. And that's why we couldn't do what you suggest :)

It may be of greater immediate benefit, maybe it could be "justified" in that the project did everything they can for the slow cards. Yet.. that's not enough. I think we owe the participants the honesty to show them what happens with their crunching efforts. Anything else is unthinkable ;)

MrS
____________
Scanning for our furry friends since Jan 2002

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Level
Ala
Message 9886 - Posted: 16 May 2009 | 22:28:32 UTC - in response to Message 9885.

this is the point where I started thinking :)
There's one important problem with this approach: WUs issued to several cards.

Let's assume a slow card returns 10 steps after 4h, whereas a fast card might return 100 steps after 6h. How long are you going to wait? You could estimate the runtime from the user's preference, but that's not very direct. I'm trying to make things easier to predict and more regular. Not sure how necessary it is, but I really wouldn't want slow cards to "outrun" fast ones just because they set a shorter runtime (and the server decided to use their result for the next WU instead of waiting for the other one).

MrS



Maybe I am missing something here, but wouldn't it be fairly straightforward to have the server issue the remaining work as a shorter workunit? That is, assume Paul's machine completes 100TS in 6 hours and is paired with a 9800GT that only completes 30TS in that time, with both returning results at about the same time. The server could then be made to reissue a follow-up workunit made up of the 70TS difference that could be sent to a third card (say another 9800GT, but with a 13 hour limit). If the third card was another 9800GT-6 hour limit, then once returned another reduced 40TS unit could be issued, and so on...in other words, the real new work would not be generated until the full set of TS in the original work was completed by additional cards. This could probably be made more efficient by always issuing the "A" result to a reliable host with the "B" and beyond work copy going to any host.
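
A rough sketch of this reissue idea, using the 100 TS / 30 TS figures from the example. The function name `remaining_after_each_host` is hypothetical, and the sketch assumes each follow-up host simply completes a fixed number of steps within its time limit.

```python
# Hypothetical sketch of reissuing the remaining time steps as shorter
# follow-up workunits; not actual project code.
from typing import List


def remaining_after_each_host(target_steps: int, steps_per_host: List[int]) -> List[int]:
    """Return the steps still outstanding after each slower host reports.

    target_steps   -- steps returned by the fastest copy of the workunit
    steps_per_host -- steps each successive slower host completes in its time limit
    """
    remaining = target_steps
    outstanding = []
    for done in steps_per_host:
        # A host cannot complete more steps than are left in its unit.
        remaining -= min(done, remaining)
        outstanding.append(remaining)
    return outstanding


# The fast card did 100 TS; three 9800GTs with a 6 hour limit do 30 TS each.
remaining = remaining_after_each_host(100, [30, 30, 30])
print(remaining)
```

Each entry is the size of the next follow-up unit to issue: 70 TS after the first 9800GT, then 40 TS, then 10 TS. Real new work would only be generated once the value reaches 0.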


Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 9893 - Posted: 17 May 2009 | 1:07:37 UTC

Um, I knew I was not clear ...

I was thinking of GPU Grid following more in the line of MW where we have single issues and single returns. When you get into HR then you have to have identical returns or you cannot compare.

So, I was thinking more in line with:

..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|

Where we have 3 hour "frames" if you will ...

The next frame is built on the return from the prior frame.

Here, we start with the 9800 and get 30 TS, next a 260 returns 100 to TS130, a 8800 returns 10 to 140 ... and so on ... so, each client runs as best they can, and we accumulate the results on a more regular schedule, but the actual work accomplished becomes highly variable.
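
The accumulation in that example can be written out as a running total. This is only a sketch of the arithmetic; the list `returns` just restates the figures above.

```python
# Running total of time steps (TS) as successive hosts each return one frame.
from itertools import accumulate

# (card, TS returned in its frame), figures taken from the example above.
returns = [("9800", 30), ("260", 100), ("8800", 10)]

# Position in the simulation after each frame.
positions = list(accumulate(steps for _, steps in returns))
print(positions)
```

The frames arrive on a regular schedule, but the simulation position jumps by very different amounts (30, then 130, then 140).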

The problem is that I do not know how much that biases the science and how it is being used. If they are looking at snapshots at certain specific TS then the returns are streamed and re-sliced for the science to be done.

The advantage of a "flow" system such as this is that the totality of the schedule to hit certain points would become more predictable on average (I would expect), because the random assignment of tasks would mean that slower cards would cause "bumps" in the timing, but overall the odds say that the next machine is just as likely to be faster than the one currently running the task ...

The only reason that I like shorter tasks is that my risk of loss goes down. I know credit does not matter, but the science does. If I am doing one hour tasks, I do put 6 times the load on the server to get tasks, but the output files are 1/6 the size and my risk of losing 5 hours science goes way down.

Again, I am unusual in that I have MOSTLY higher end cards so my run times are about 6 hours per ... with the single exception with the 9800GT where the run time is 12-20 hours per more to the higher end ...

Again, the Rosetta model comes to mind ... it lets ME choose how much time I want to spend on each task. How much risk *I* want to take with a task failure costing me and the project time and effort.

I don't know if this is any clearer than what I said before ... and I am not sure we are converging on a concept yet ... sadly too much excitement with dying tasks causes me to wig out ... and I cannot concentrate well ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 9902 - Posted: 17 May 2009 | 11:47:16 UTC
Last modified: 17 May 2009 | 11:48:00 UTC

Paul,

for me you've been clear enough. If you go for a stream with one host per WU your system is fine and appreciably simpler than what I wrote down. However, what are you going to do if you send a WU to a host which has been notoriously unreliable? Are you going to wait for the error or do you want to send it out to a second host immediately? Could you provide this capability and come up with something simpler than me? I don't think it would be beneficial for the project to give up the ability to issue WUs multiple times.

Scott,

well.. yes. I think that could work. I could argue that calculating the WU runtime the way you suggest neglects the individual work cache setting and thus you can not reliably estimate the time a WU will be returned (to adjust the time for the following hosts accordingly). However, I have to admit that this limitation applies to my suggestion in the same way.

Therefore our approaches are similar regarding their result:

- both should get WUs returned around the same time
- both suffer from an intrinsic inaccuracy due to the cache
- in both cases you'd want to wait until all results are in to issue the next work.. or make an estimate based on card speed, which result will likely contain the most steps, or just set some tolerance time

I can't help it.. today it's got a slightly sour taste (for me).

MrS

BTW: thanks for the flowers ;)
____________
Scanning for our furry friends since Jan 2002

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 9913 - Posted: 17 May 2009 | 15:42:14 UTC - in response to Message 9902.

for me you've been clear enough. If you go for a stream with one host per WU your system is fine and appreciably simpler than what I wrote down. However, what are you going to do if you send a WU to a host which has been notoriously unreliable? Are you going to wait for the error or do you want to send it out to a second host immediately? Could you provide this capability and come up with something simpler than me? I don't think it would be beneficial for the project to give up the ability to issue WUs multiple times.

And now you see why systems engineers get the big buck ...

Here is the real rub in all this: how do you handle situations in the face of unreliability? One simple answer is that you send it out redundantly and match the answers up. The problem now is that we no longer have tasks of one and only one size, so we cannot match them up to do HR.

One answer is that we keep more control when issuing to unreliable hosts. And, pair them up with reliable hosts but now we also have to self limit the task on the matching reliable host. So, we know that the unreliable host is going to do 30 TS in the one hour period and so when we send it to the reliable host we would know to limit to that number of TS.

Which is why I was asking back in the beginning... are we a solution in search of a problem? Only the project types can tell us that.

As I said, the 6 hour tasks are about at my edge of comfort as far as running tasks. I have seen too many hours of work lost for one reason or another and I hate waste. Which would also be why I would question the current length of tasks... In particular because it also self limits the participant pool. Now, if the work is getting done fast enough with the limited pool we currently have, then, cool and adding more low end machines won't buy much (well some goodwill, not to be sneered at ...) ...

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Level
Ala
Message 9927 - Posted: 17 May 2009 | 20:36:41 UTC - in response to Message 9913.


Which is why I was asking back in the beginning... are we a solution in search of a problem? Only the project types can tell us that.


Agreed...to a point. I think that many of the posters in the thread have been around BOINC for several years, so we have some fairly good ideas about what those problems are...especially you Paul. On that side of things, we are able to offer real solutions. But you are very much right that without more detailed understanding of the construction of workunits at this project, we can at best offer potential solutions that, if we are lucky, might hit upon a real solution...


Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 9930 - Posted: 17 May 2009 | 21:23:37 UTC - in response to Message 9927.


Which is why I was asking back in the beginning... are we a solution in search of a problem? Only the project types can tell us that.


Agreed...to a point. I think that many of the posters in the thread have been around BOINC for several years, so we have some fairly good ideas about what those problems are...especially you Paul. On that side of things, we are able to offer real solutions. But you are very much right that without more detailed understanding of the construction of workunits at this project, we can at best offer potential solutions that, if we are lucky, might hit upon a real solution...

Thank you for the kind words... :)

The difficulty, and it is the only one, is how we can best help the project ... now there are some good concepts here, but until GDF or someone else from the project says that this or that initial concept might help the project, we are about as far along as we can get ...

So, I do concur that we need that input if nothing else to tell us that there is no need ... :)

Though I must admit I would really like to see shorter tasks so that a 9800GT can get them done in less than a full day ...

popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 40,277,822
RAC: 0
Level
Val
Message 9936 - Posted: 18 May 2009 | 7:47:48 UTC - in response to Message 9930.

Though I must admit I would really like to see shorter tasks so that a 9800GT can get them done in less than a full day ...


If they did shorten the tasks then it would be a win-win situation.
It would either
A) Allow slower GPU's to run GPUgrid
or
B) Have shorter deadlines and faster turn around so work units can be sent out faster if a host doesn't reply reducing the need for replication.

The downside is more server load...

This is a tricky topic. The project however needs fast turn around due to the nature of the work so allowing slower cards to run gpugrid would slow things down at times of low work but at times of high work then the more the better.

Perhaps the best solution could be setting up 2 projects under 1 roof (ie. seti multi-beam and astropulse) and have one for slow gpus, one for fast. Default settings would have the slow GPU selected and users could select fast GPU for more credits and longer tasks.

Bob

Profile Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Level
Ala
Message 9949 - Posted: 18 May 2009 | 19:54:15 UTC - in response to Message 9936.
Last modified: 18 May 2009 | 19:54:35 UTC

Perhaps the best solution could be setting up 2 projects under 1 roof (ie. seti multi-beam and astropulse) and have one for slow gpus, one for fast. Default settings would have the slow GPU selected and users could select fast GPU for more credits and longer tasks.


I like the dual WU idea if the Project dev resource can cope. We would need to be careful re credits, they need to be awarded on the "standard" BOINC formulae basis as being an equal number of credits per Flop, whatever X multiple is used in the BOINC formulae. The faster cards will get more over a given time period, which is fine as they do more work. The base calculation rate should however be the same for each Flop donated, else we'll have a two tier credit war on our hands.

Regards
Zy

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Level
Ala
Message 9950 - Posted: 18 May 2009 | 19:58:36 UTC - in response to Message 9936.


Perhaps the best solution could be setting up 2 projects under 1 roof (ie. seti multi-beam and astropulse) and have one for slow gpus, one for fast. Default settings would have the slow GPU selected and users could select fast GPU for more credits and longer tasks.


A more efficient approach would be to follow the PrimeGrid model where shorter and longer types of work can be selected within the same project...However, GDF has said elsewhere in the forum (sorry can't find the link just now) that it is not possible to divide up the work this way.


popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 40,277,822
RAC: 0
Level
Val
Message 9959 - Posted: 19 May 2009 | 0:04:41 UTC - in response to Message 9950.

A more efficient approach would be to follow the PrimeGrid model where shorter and longer types of work can be selected within the same project...However, GDF has said elsewhere in the forum (sorry can't find the link just now) that it is not possible to divide up the work this way.



That's exactly what I was referring to... different sub-projects.

How can it not be separated like that?
There are several different types of work going on right now so why could it not be set so that (for example) 79-KASHIF_HIVPR is run on slower cards with longer deadlines and p780000-RAUL on the faster cards with shorter deadlines.

As an added bonus this could help with issues such as the current problems with G90 GPU's...

We would need to be careful re credits, they need to be awarded on the "standard" BOINC formulae basis as being an equal number of credits per Flop, whatever X multiple is used in the BOINC formulae


I was thinking more of a short deadline bonus for granted credit. The idea behind this is that people would need encouragement to select the longer wu's with shorter deadlines. If we had gtx295's running work units meant for slow cards we are right back to square 1.

Bob

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Level
Ala
Message 9966 - Posted: 19 May 2009 | 13:17:28 UTC - in response to Message 9959.


That's exactly what I was referring to... different sub-projects.

How can it not be separated like that?
There are several different types of work going on right now so why could it not be set so that (for example) 79-KASHIF_HIVPR is run on slower cards with longer deadlines and p780000-RAUL on the faster cards with shorter deadlines.

As an added bonus this could help with issues such as the current problems with G90 GPU's...



Many projects have set up separate projects (i.e., different stats, website, servers, etc.) for different sub-projects such as Beta versions or, in the case of MilkyWay@home, separate CPU and GPU projects. Your original post seemed to sound more like this...sorry if I misread it.

Yes, there are numerous different types of workunits, and in my few months here, I have seen no less than a dozen workunit types (maybe more). I think that this fairly rapid change in workunit types is what prevents the different subproject setup.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 10026 - Posted: 21 May 2009 | 11:09:53 UTC - in response to Message 9913.
Last modified: 21 May 2009 | 11:10:23 UTC

Paul wrote:
One answer is that we keep more control when issuing to unreliable hosts. And, pair them up with reliable hosts but now we also have to self limit the task on the matching reliable host. So, we know that the unreliable host is going to do 30 TS in the one hour period and so when we send it to the reliable host we would know to limit to that number of TS.


You can not know exactly how many TS a card will complete within a given time. Unless you set the number of TS to begin with.. which is the current scheme. The problem is that you can not know what the user is doing with the PC: how much crunching time is lost to gaming, watching videos, moving windows in Aero or "don't use GPUs while PC is in use"?

That's why I opted for "let's level the playing field and give them the same desired runtime and send the WUs out at the same time". Not sure if there's anything more we could do to keep runtimes under control. Except for the suggestion made by Scott (which would be a minor correction, if WUs can be sent out at the same time.. otherwise a huge correction).

Which is why I was asking back in the beginning... are we a solution in search of a problem? Only the project types can tell us that.


Yeah, I agree. I think we reached the point where we brainstormed enough ideas. So what we'd need to continue is some feedback:
    - what's possible?
    - what's necessary?
    - what's desired?


I guess project staff have their hands full with debugging the recent problems.. but GDF already said they're watching this thread carefully. So let's give our tortured brains a break ;)

Scott wrote:

A more efficient approach would be to follow the PrimeGrid model where shorter and longer types of work can be selected within the same project...However, GDF has said elsewhere in the forum (sorry can't find the link just now) that it is not possible to divide up the work this way.


That sounds straightforward to set up: offer WUs of normal length and maybe 1/2 and 2 times that length. Give the user a preference in the account settings. Adjust these times as needed due to the emergence of faster cards and to balance server load.

The main problem which I see here is that it would split up the pool of WUs. If too many WUs are generated for one of the runtimes these will be lagging behind, whereas other runtimes may even run dry. If this idea was adopted it would require changes in the server software to dynamically adjust the runtime / number of steps within one "set of simulations", whenever new WUs are created. Otherwise load balancing would suck.

MrS
____________
Scanning for our furry friends since Jan 2002

popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 40,277,822
RAC: 0
Level
Val
Message 10044 - Posted: 21 May 2009 | 19:17:54 UTC - in response to Message 10026.

The main problem which I see here is that it would split up the pool of WUs. If too many WUs are generated for one of the runtimes these will be lagging behind, whereas other runtimes may even run dry. If this idea was adopted it would require changes in the server software to dynamically adjust the runtime / number of steps within one "set of simulations", whenever new WUs are created. Otherwise load balancing would suck.

MrS


I wouldn't think that much work is needed.
If I understand the current system correctly, each simulation has a task created that is run for x steps, then once returned another task is created to continue that simulation for another x steps, and several simulations are run in parallel.
Then each simulation could be assigned a specific length. Sure, the simulations would finish at different rates, but does that matter?
As for run times running dry, well, that happens already without this, so what would be the difference? I'm sure there is plenty of work to go around so more work could be added where needed.

Bob

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 10045 - Posted: 21 May 2009 | 19:52:43 UTC - in response to Message 10044.

GDF said they use the concept of reliable hosts and a higher initial replication to speed up WUs (or better, simulations) which are lagging behind the others. Imagine they run a parameter study and many simulations in parallel with one parameter being different. In such cases you'd want them all back before you start your analysis on what happened.. which may be needed to decide on what the next set of simulations should do.

I have the impression it's important to have load balancing available so most simulations finish approximately at the same time (give or take a few days, of course).

As for run times running dry well that happens already without this so what would be the difference? I'm sure there is plenty of work to go around so more work could be added where needed.


Imagine the following: most users choose the long tasks. The project issues most simulations for this "project" or however you want to call it. Now, for some reason, users switch to the standard runtime. There wouldn't be enough work issued over there and no one would finish the simulations associated with the long-WU-crew.. which may be needed to generate new work.

Sounds unlikely you may say. Well.. yes. But if I were a developer I'd want to be sure I could handle load balancing between projects / runtimes. Anything else is asking for trouble at some point.

MrS
____________
Scanning for our furry friends since Jan 2002

Scott Brown
Send message
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Level
Ala
Message 10051 - Posted: 21 May 2009 | 21:01:57 UTC - in response to Message 10045.

Regarding load balancing, there is already an option the user can check to accept other types of work if the preferred work is not available. Make this an always-on option (or perhaps switch it on automatically for reliable hosts), and that might solve the load balancing issue.
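
To make the idea concrete, here is a minimal sketch of that fallback. It is not the project's scheduler code; the function name, the work-type labels and the "pick the fullest other queue" rule are all assumptions for illustration.

```python
def pick_work(preferred, available, accept_other=True):
    """Return the work type to send, or None if nothing suitable is queued.

    preferred    -- work type the user selected (hypothetical label)
    available    -- dict mapping work type -> number of queued tasks
    accept_other -- the "accept other types of work" preference
    """
    # Preferred work is always used when there is any.
    if available.get(preferred, 0) > 0:
        return preferred
    if accept_other:
        # Assumed rule: fall back to the other type with the most queued tasks.
        others = {k: n for k, n in available.items() if k != preferred and n > 0}
        if others:
            return max(others, key=others.get)
    return None


queue = {"long": 0, "standard": 5}
print(pick_work("long", queue))                      # falls back to "standard"
print(pick_work("long", queue, accept_other=False))  # None, host stays idle
```

With the option forced on, a host never idles while any queue still has work, which is the load balancing effect suggested above.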


ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 10053 - Posted: 21 May 2009 | 21:48:06 UTC - in response to Message 10051.

Currently this option is not used at all, or at least not that I know of. But it could be used.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 10277 - Posted: 28 May 2009 | 22:04:30 UTC - in response to Message 10053.

I think the whole concept of task competitiveness is moronic - especially coming from people with scientific backgrounds. It’s just too wasteful.
The amount of lost work must be huge for this project.

I’m sure there are thousands of people that have added this project in Boinc and just gave up after repeatedly getting no credit for work units.
Is there some sensible reason, I am not aware of, to turn people away from the project?

To be perfectly honest, as soon as there is another decent project out there that can utilise CUDA, I will be with it.

I will only be happy if I know my computer is doing worthwhile work – it's me paying the electric bill, and it's not cheap to run 150W graphics cards!

Profile Zydor
Send message
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Level
Ala
Message 10278 - Posted: 29 May 2009 | 0:07:33 UTC - in response to Message 10277.

I’m sure there are thousands of people that have added this project in Boinc and just gave up after repeatedly getting no credit for work units.
Is there some sensible reason, I am not aware of, to turn people away from the project?


Under what circumstances is nil credit given when work has been completed? If someone is processing a WU that another host completes while they are in mid-crunch, they still get credit, hence I am a bit confused by the statement - can you expand a little on what you mean by that?

Regards
Zy

Profile Paul D. Buck
Send message
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 10282 - Posted: 29 May 2009 | 4:41:18 UTC

Tasks canceled are those that have not been started. So, no work is lost. If you start it, you can still get full credit for your work. But if someone else has already returned the task, so that the copy you hold is no longer needed, and you have not yet started it, it can be canceled.

The available projects that use the Nvidia GPU now include:

SaH, SaH Beta, The Lattice Project, Ramsey, Aqua, and soon we hope MilkyWay...

I know that SaH, SaH Beta, Ramsey and Aqua are issuing GPU work at this time ...

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 10306 - Posted: 29 May 2009 | 17:06:39 UTC - in response to Message 10278.
Last modified: 29 May 2009 | 17:07:42 UTC

I have had a number of tasks cancelled/stopped/deleted mid-run. The system did this, not me. I don't know why, but I do know that some were even scrapped after reaching about 85% completion.

I'm not worried about bandwidth here, just about processing for 2 days only to have the server scrap the calculations, especially when the task is over 80% done and the deadline is still days away, so there is no issue of finishing in time.

Profile Zydor
Send message
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Level
Ala
Message 10314 - Posted: 30 May 2009 | 3:34:51 UTC - in response to Message 10306.

Can you post a link to/identify which ones had that happen?

Regards
Zy

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 10327 - Posted: 30 May 2009 | 13:28:33 UTC - in response to Message 10314.

Here is the message:
Result ID: 703993
Work unit ID: 481150
Sent: 22 May 2009 13:06:15 UTC
Time reported: 25 May 2009 16:47:19 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
CPU time: 5,460.94
Claimed credit: 4,531.91
Granted credit: --- (none)

Here are the links to this:
http://www.gpugrid.net/result.php?resultid=703993

http://www.gpugrid.net/workunit.php?wuid=481150

The Link Details:
stderr out <core_client_version>6.6.20</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 8600 GT"
# Clock rate: 1300000 kilohertz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 4
# Number of cores: 32
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce 8600 GT"
# Clock rate: 1300000 kilohertz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 4
# Number of cores: 32
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
# Using CUDA device 0
# Device 0: "GeForce 8600 GT"
# Clock rate: 1300000 kilohertz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 4
# Number of cores: 32
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.

</stderr_txt>
]]>

My attempt at a solution:
I assumed that the error has nothing to do with the project/task/workunit (as I can do nothing about that, other than post messages).
So I changed the card to an 8800GT and upgraded BOINC, the Nvidia CUDA code and the video drivers. I also popped in a Phenom II 940 (better instruction set). The system is now a 5.9 everything, so hopefully this problem will not happen again.

Profile Stefan Ledwina
Avatar
Send message
Joined: 16 Jul 07
Posts: 464
Credit: 240,628,285
RAC: 4,734,909
Level
Leu
Message 10329 - Posted: 30 May 2009 | 15:41:52 UTC - in response to Message 10327.

The task you've linked wasn't cancelled by the server but it had a computation error...

<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
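
For anyone who wants to check their own results the same way, here is a small sketch that pulls the `<message>` text and exit code out of a saved copy of the stderr output. The function name is made up, and it only assumes the layout shown in the post above; it does not know how a genuine server-side cancel is labelled.

```python
import re


def exit_info(stderr_out):
    """Return (message_text, exit_code) from a result's stderr output.

    Returns None when no <message> element is present; exit_code is None
    when the message does not contain an "exit code N" phrase.
    """
    m = re.search(r"<message>\s*(.*?)\s*</message>", stderr_out, re.DOTALL)
    if m is None:
        return None
    text = m.group(1)
    code = re.search(r"exit code (-?\d+)", text)
    return text, (int(code.group(1)) if code else None)


sample = "<message>\nIncorrect function. (0x1) - exit code 1 (0x1)\n</message>"
text, code = exit_info(sample)
# A non-zero exit code means the application itself ended abnormally
# on the host.
print(text, code)
```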

____________

pixelicious.at - my little photoblog

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 10339 - Posted: 30 May 2009 | 23:15:45 UTC - in response to Message 10329.

The error happened during scheduled server communications. I know the log file reports it as a computational error, but that is as vague an error message as you’re ever going to find!
It only said that on the server too.
On my system it made no mention of any error!

I think the error occurred because the server called in the data before the job was finished (about 3 hours short and 2 days to spare) and viewed the data as erroneous. I don’t debug so I can’t interpret the data.

Profile Dingo
Avatar
Send message
Joined: 1 Nov 07
Posts: 20
Credit: 128,376,317
RAC: 0
Level
Cys
Message 10347 - Posted: 31 May 2009 | 18:33:43 UTC

So I think that if a user has crunched a WU and it is still within the deadline, they should get credit for it. Look at this WU. I had crunched it for 64,753.00 secs but got nothing as the server canceled it.

http://www.gpugrid.net/result.php?resultid=564654

Profile Stefan Ledwina
Avatar
Send message
Joined: 16 Jul 07
Posts: 464
Credit: 240,628,285
RAC: 4,734,909
Level
Leu
Message 10349 - Posted: 31 May 2009 | 18:44:07 UTC - in response to Message 10347.

This one was not cancelled because of a redundant result but for some other reason. You can see minimum quorum = 1 and initial replication = 1. Maybe it got cancelled because it was part of a bad batch of tasks...

So it was better to cancel it server-side than to let it run even longer and let it error out (which would also give you 0 credits)...
____________

pixelicious.at - my little photoblog

Profile Hydropower
Avatar
Send message
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Level
Ser
Message 10351 - Posted: 31 May 2009 | 19:30:18 UTC - in response to Message 10349.

Are you sure it is not a hardware issue? I get these "Incorrect function. (0x1) - exit code 1 (0x1)" errors quite often, but only on GPU3. I have now completely underclocked this unit.
____________
Join team Bletchley Park, the innovators.

Jeff Harrington
Send message
Joined: 7 Apr 09
Posts: 2
Credit: 1,614,790
RAC: 0
Level
Ala
Message 10361 - Posted: 1 Jun 2009 | 16:06:22 UTC - in response to Message 9241.

Why have two cards compete against each other? Wouldn't it be possible to do one of the following:

1. Eliminate the competition altogether and have all cards work on separate work units.

2. Employ the SETI@Home solution: send the same work unit to three crunchers and require a minimum of 2 comparable results to reach a 'quorum'. The work unit is then viewed as satisfactorily completed, and everyone who submitted a verified result is granted credit based on the lowest credit granted. In other words, this would penalize those with faster cards because they get less credit than would otherwise be granted.
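
A rough sketch of option 2, just to pin down what is being proposed. It is not SETI@Home's or BOINC's actual validator: the function name is hypothetical, the results are assumed to have already been compared and found to agree, and "lowest credit" is read here as the lowest claimed credit among them.

```python
def grant_credit(claims, min_quorum=2):
    """Credit granted to every validated result of one work unit.

    claims     -- claimed credits of the results that agreed with each other
    min_quorum -- number of agreeing results needed before credit is granted
    """
    if len(claims) < min_quorum:
        return None  # quorum not reached yet, nothing is granted
    # Everyone in the quorum receives the lowest claim.
    return min(claims)


print(grant_credit([30.0]))        # None: still waiting for a second result
print(grant_credit([30.0, 42.5]))  # 30.0 for both hosts
```

The second call shows the penalty mentioned above: the host that claimed 42.5 is only granted 30.0.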

Jeff Harrington
Send message
Joined: 7 Apr 09
Posts: 2
Credit: 1,614,790
RAC: 0
Level
Ala
Message 10362 - Posted: 1 Jun 2009 | 16:17:15 UTC - in response to Message 10361.

Two things to consider:

First and foremost, I don't care about competition or seeing how many work units I can complete in a given day or how many work units other users accumulate against me. I care about doing worthwhile research. If my work units are getting canceled because there are others out there beating me to the punch, then I view my contribution as not necessary and I will go donate my GPU to SETI@Home; they are always in need of crunchers. Granted, I prefer doing medical research because that may actually pay off versus listening to a signal.

Second, if my work units are canceled as redundant, then wrapping back to the first reason, it is a waste for me to donate my computer's time in terms of electricity. It is a cost to me to keep my computer running full time, even while I am not using it.

Maybe it's not my place to say this, but you should really get your system for sending work units out to people in line. Worry less about competition and more about just getting as many verified work units done as possible. Competition is a waste...

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 10392 - Posted: 2 Jun 2009 | 21:29:28 UTC - in response to Message 10282.

[quote]
The available projects that use the Nvidia GPU now include:

SaH, SaH Beta, The Lattice Project, Ramsey, Aqua, and soon we hope MilkyWay...
[/quote]

Thanks for the info.
I like the look of Aqua, and I have signed up to that project. I don't find the others too appealing - maths for the sake of maths is not my thing! Perhaps I will opt into the Milkyway project when they get their act together.

Profile Paul D. Buck
Send message
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 10396 - Posted: 2 Jun 2009 | 21:53:03 UTC - in response to Message 10392.

Thanks for the info.
I like the look of Aqua, and I have signed up to that project. I don't find the others too appealing - maths for the sake of maths is not my thing! Perhaps I will opt into the Milkyway project when they get their act together.


Aqua is looking at quantum computing ... pure math application ... :)

Also, though they will be releasing information into the public domain, be aware that the project is being run by a commercial firm which is using the data to perfect their product line. In other words, it is not a university project that is fully in the public domain...

Not trying to talk you out of the project, just so you have all the facts ... :)

Also, they are having similar problems with tasks crashing, locking up, and running overly long ... one of the reasons I stopped contributing there ... too high a risk and they have not yet implemented trickles though they did start looking into them ...

pelpolaris
Send message
Joined: 10 Nov 08
Posts: 8
Credit: 876,616,559
RAC: 0
Level
Glu
Message 10458 - Posted: 9 Jun 2009 | 8:44:34 UTC - in response to Message 10392.

Nvidia GPUs on Linux work only with the GPUGRID project.

I recently tried D-Wave's Adiabatic QUantum Algorithms - CUDA Enabled for the Linux 64 platform, without any success. However, today they released version 3.23, and I look forward to adding AQUA to the short list of CUDA-enabled projects for Linux.

Others are still only enabled for Windows/x86. Ramsey-GPU is NOT listed as enabled on the official Ramsey app. list.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 10500 - Posted: 12 Jun 2009 | 20:10:13 UTC

SKGiven,

I hope it's clear by now that your WUs are not being canceled while they run but that they error out and you therefore get no credit. If you suspect it happens because of server communication I suggest the following: set a cache of >1 day and restrict network usage to ~1h a day. Let it run for some time and take a look: if your WUs error out when there couldn't have been server communication you know for sure that your assumption is wrong.

And you likely won't see an error reported by the BOINC client as the GPU-Grid app detects the error, logs it and gracefully shuts down.

Dingo,

apparently you're using a very old BOINC client (5.x) which should not even be able to work with GPU-Grid! And it reports GPU-Grid app version 5.03, which must be wrong. Under these circumstances I wouldn't guarantee anything.

Jeff Harrington,

sorry, but your suggestion sucks ;)

1. Eliminate the competition altogether and have all cards work on separate work units.


That's the usual mode.

2. Employ the SETI@Home solution: send the same work unit to three crunchers and require a minimum of 2 comparable results to reach a 'quorum'. The work unit is then viewed as satisfactorily completed, and everyone who submitted a verified result is granted credit based on the lowest credit granted. In other words, this would penalize those with faster cards because they get less credit than would otherwise be granted.


Here the credits per WU are determined by the amount of work they contain (i.e. the flops necessary to complete them). This is much more precise and fair than any time- or benchmark-based system could ever be.
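
As an illustration of what work-based credit means, here is a sketch using BOINC's general cobblestone convention (200 credits for one day of work at a sustained 1 GFLOPS). Whether GPU-Grid applies exactly this scale is an assumption, and the function name and flop count are hypothetical.

```python
SECONDS_PER_DAY = 86400
CREDITS_PER_GFLOPS_DAY = 200.0  # cobblestone convention, assumed to apply


def credit_for_flops(total_flops):
    """Credit for a WU that needs total_flops floating point operations."""
    gflops_days = total_flops / 1e9 / SECONDS_PER_DAY
    return gflops_days * CREDITS_PER_GFLOPS_DAY


wu_flops = 5e15  # hypothetical size of one WU
# The credit depends only on the WU, not on how long a given card needed,
# so a slow and a fast card get the same credit for the same WU.
print(credit_for_flops(wu_flops))
```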

Furthermore what you propose as standard solution is the worst case in our current solution. If it works out well we're more efficient than that.

From reading your 2nd post I get the feeling you completely miss the point of this thread and what's currently being done. Sorry, it's a long thread already.

So let me just quickly state the core points again: if WUs are sent out to more than one GPU and one result is successfully returned, the server cancels the other results upon the next scheduler contact of those hosts if, and only if, the WUs had not been started yet. Otherwise the other hosts can finish the WUs regularly and receive credits just as usual.

Canceling those WUs avoids wasting cpu time, not the other way around! It would actually be even more efficient to cancel WUs already in progress, but the credit system couldn't handle this, so it's a no-go.
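
The rule can be written down in a few lines. This is only a sketch of the behaviour described in this thread, not the actual server code; the function name and the data layout are made up for illustration.

```python
def on_scheduler_contact(host_tasks, completed_wus):
    """Decide what happens to each task a host holds when it contacts the server.

    host_tasks    -- list of dicts with keys "wu_id" and "started"
    completed_wus -- set of WU ids for which a valid result is already in
    """
    actions = {}
    for task in host_tasks:
        redundant = task["wu_id"] in completed_wus
        if redundant and not task["started"]:
            # Nothing has been computed yet, so nothing is lost.
            actions[task["wu_id"]] = "cancel"
        else:
            # Either still needed, or already running: let it finish
            # and grant credit as usual.
            actions[task["wu_id"]] = "keep"
    return actions


tasks = [
    {"wu_id": 1, "started": False},  # redundant, not started
    {"wu_id": 2, "started": True},   # redundant, but already crunching
    {"wu_id": 3, "started": False},  # still needed
]
print(on_scheduler_contact(tasks, completed_wus={1, 2}))
```

Only WU 1 is cancelled here; WU 2 runs to completion and is credited even though its result is no longer needed.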

And calling it a competition is somewhat misleading.. you may want to read the initial posts again. If it's still not clear I could probably summarize this issue.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Send message
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 11122 - Posted: 12 Jul 2009 | 7:47:58 UTC - in response to Message 10500.

So, after all these posts, we are going to change some things.
1) We will use target_nresults = 2 much less often, so everyone has their own result.
2) We will upload a new application compiled for CUDA compute capability (CC) 1.3 cards (216, 280, etc.). This allows us to use some optimizations, and the code is faster.

So, there will be two apps: one 1.1 CC compliant and one for 1.3. The 1.3 WUs will be at least twice as long, and we will have a user preference to select only the 1.3 app if you wish.
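
A sketch of how the two apps and the preference could fit together. The app labels and the function are hypothetical, and it assumes that a 1.3 card may still receive 1.1 work unless the user opts out, which the description above suggests but does not spell out.

```python
def eligible_apps(cc, only_cc13=False):
    """Apps a host may receive, given its compute capability as (major, minor).

    only_cc13 -- the user preference to receive only the 1.3 app
    """
    apps = []
    if cc >= (1, 3):
        apps.append("cc13")
        if not only_cc13:
            apps.append("cc11")  # assumed: 1.3 cards can also run 1.1 work
    elif cc >= (1, 1):
        apps.append("cc11")
    return apps


print(eligible_apps((1, 1)))                  # ['cc11']
print(eligible_apps((1, 3)))                  # ['cc13', 'cc11']
print(eligible_apps((1, 3), only_cc13=True))  # ['cc13']
```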

gdf

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 11207 - Posted: 20 Jul 2009 | 19:02:27 UTC - in response to Message 11122.

Sounds like a very good idea! Effective enough to (hopefully) make (some) people happy and simple enough so it can be handled.

MrS
____________
Scanning for our furry friends since Jan 2002
