Message boards : Graphics cards (GPUs) : Development BOINC 6.10.7 released
MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 12703 - Posted: 24 Sep 2009 | 7:31:20 UTC
Last modified: 24 Sep 2009 | 7:41:02 UTC

Another new one. Currently it is just for Windows. Below is the official change log.

Report any problems you get with it to the Alpha email list. This list needs registration.

Change Log:

- client: if a file fails verification, delete it.

- client: tweak CPU scheduling policy to avoid running multithread apps overcommitted.

Actually: allow overcommitment but only a fractional CPU (so that, e.g., we can run a GPU app and a 4-CPU app on a 4-CPU host)

- client: fix bug that caused unstarted coproc jobs to preempt ones already running.

The problem: we considered a job as started if it has an ACTIVE_TASK. However, we were creating ACTIVE_TASKS for jobs before deciding to run them, because we needed a place to store the coproc reservations. This caused the above bug, and also had the undesirable effect of creating slot directories before they're needed.

Solution: store coprocessor reservations in RESULT rather than ACTIVE_TASK.

- client: extra debug msgs (remove when done)

- client: fix preemption bug, this time fer sure!
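For readers who want to picture the ACTIVE_TASK / RESULT change above, here is a much-simplified, hypothetical sketch (Python, invented names, not the actual BOINC C++ source) of why creating an ACTIVE_TASK just to hold a coproc reservation made an unstarted job look started, and why moving the reservation onto the RESULT avoids that:

# Hypothetical illustration only -- class and field names are invented.
class Result:
    def __init__(self, name):
        self.name = name
        self.active_task = None     # set only when the job actually runs
        self.coproc_reserved = 0    # new home for the GPU reservation (the fix)

class ActiveTask:
    def __init__(self, result):
        self.result = result

def is_started(result):
    # The scheduler treated "has an ACTIVE_TASK" as "already started".
    return result.active_task is not None

def reserve_gpu_old(result):
    # Old behaviour (the bug): reserve the GPU by creating an ACTIVE_TASK up
    # front, so a job that has never run already looks "started".
    result.active_task = ActiveTask(result)

def reserve_gpu_new(result):
    # New behaviour (the fix): keep the reservation on the RESULT itself.
    result.coproc_reserved = 1

r_old = Result("unstarted gpu job, old scheme")
reserve_gpu_old(r_old)
print(is_started(r_old))   # True -- wrongly competes with genuinely running jobs

r_new = Result("unstarted gpu job, new scheme")
reserve_gpu_new(r_new)
print(is_started(r_new))   # False -- only real execution creates an ACTIVE_TASK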

____________
BOINC blog

JackOfAll
Joined: 7 Jun 09
Posts: 40
Credit: 24,377,383
RAC: 0
Message 12708 - Posted: 24 Sep 2009 | 9:09:33 UTC - in response to Message 12703.
Last modified: 24 Sep 2009 | 9:10:24 UTC

Fedora 11 x86_64 RPMS

http://www.vacuumtube.org.uk/folding/fedora/11/unstable/x86_64/boinc-client-6.10.7-2.fc11.x86_64.rpm
http://www.vacuumtube.org.uk/folding/fedora/11/unstable/x86_64/boinc-client-doc-6.10.7-2.fc11.noarch.rpm
http://www.vacuumtube.org.uk/folding/fedora/11/unstable/x86_64/boinc-manager-6.10.7-2.fc11.x86_64.rpm

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 12711 - Posted: 24 Sep 2009 | 11:21:16 UTC

Well they seem to have splatted the cuda preempting bug in this version.

Paul has raised concerns about the amount of cuda work fetched, which looks like it might be a problem. It's been a problem since the 6.6.x days really, but we need to get logs and such so the developers can work out what's going on.
____________
BOINC blog

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,859,486,851
RAC: 9,977,254
Message 12712 - Posted: 24 Sep 2009 | 11:33:20 UTC - in response to Message 12711.

Well they seem to have splatted the cuda preempting bug in this version.

Paul has raised concerns about the amount of cuda work fetched, which looks like it might be a problem. It's been a problem since the 6.6.x days really, but we need to get logs and such so the developers can work out what's going on.

There have been problems with the amount of work fetched, both here and at AQUA, because of inaccurate project estimates and DCF interactions. I read Paul's report, and it was unclear whether this possibility had been ruled out in the case he quoted.
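To put rough numbers on that (an illustrative sketch of the general mechanism with made-up figures, not the exact client formula): the client scales a project's runtime estimate by its duration correction factor (DCF), so while the DCF is still wrong the amount of work fetched can be off by the same factor.

# Illustrative numbers only; the real work-fetch calculation is more involved.
QUEUE_SECONDS = 1 * 86400      # user asks for a 1-day cache
ACTUAL_RUNTIME = 3600.0        # a task really takes 1 hour
RAW_ESTIMATE = 360.0           # the project's estimate is 10x too low

def tasks_fetched(dcf):
    corrected_estimate = RAW_ESTIMATE * dcf   # DCF scales the project estimate
    return QUEUE_SECONDS / corrected_estimate

# With DCF still at 1.0: 240 tasks requested, about 10 days of real work.
print(tasks_fetched(1.0), tasks_fetched(1.0) * ACTUAL_RUNTIME / 86400)
# After DCF converges to 10.0: 24 tasks, about the intended 1 day of work.
print(tasks_fetched(10.0), tasks_fetched(10.0) * ACTUAL_RUNTIME / 86400)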

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 12720 - Posted: 24 Sep 2009 | 16:12:49 UTC - in response to Message 12712.
Last modified: 24 Sep 2009 | 16:17:29 UTC

Well they seem to have splatted the cuda preempting bug in this version.

Paul has raised concerns about the amount of cuda work fetched, which looks like it might be a problem. It's been a problem since the 6.6.x days really, but we need to get logs and such so the developers can work out what's going on.

There have been problems with the amount of work fetched, both here and at AQUA, because of inaccurate project estimates and DCF interactions. I read Paul's report, and it was unclear whether this possibility had been ruled out in the case he quoted.


I think this morning's information rules that out ...

There are at least two remaining problems. One, which may or may not be minor, is that I see inconsistent updates to the debt numbers. This is for ATI GPUs only, in that I have not tried this version on my CUDA systems with multiple GPU projects (all my CUDA work to this point has been aimed at GPU Grid). This is the main issue I have tried to document since yesterday and this morning, in part because I think it is the one they are more likely to address ...

The second, and to my mind larger, problem is that the aim now is to run GPU tasks in FIFO order. I am not clear why this was done, as I could not follow the arguments, but it seems to me that some of it was because of the two bugs Richard mentioned. Both are now thankfully dead.

This second situation arises with MW and Collatz because of the disparity in run times (00:52 for MW versus 17:09 for Collatz) and because MW restricts the downloads to 24 tasks (on my system). The net effect is that Collatz will download up to 90 tasks (about 25 hours of run time) while MW holds about 24 minutes' worth; run that in strict order and my 800-to-25 resource share is inverted ... to say the least ...
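Plugging those figures into a quick back-of-the-envelope calculation (nothing here beyond the numbers quoted above) shows the inversion:

# Figures taken from the post above; the rest is arithmetic.
mw_tasks, mw_secs_each = 24, 52                    # MW: ~00:52 per task, capped at 24
collatz_tasks, collatz_secs_each = 90, 17*60 + 9   # Collatz: ~17:09 per task, up to 90

mw_hours = mw_tasks * mw_secs_each / 3600.0                  # ~0.35 h of MW work on hand
collatz_hours = collatz_tasks * collatz_secs_each / 3600.0   # ~25.7 h of Collatz work

print("intended share, MW : Collatz =", 800 / 25)                  # 32 : 1 in MW's favour
print("FIFO GPU time, Collatz : MW =", collatz_hours / mw_hours)   # ~74 : 1 the other way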

You can only see this if:
- you watch the execution patterns (UCB doesn't); or
- you wade through 32M of logs (with debugs turned on you get lots of stuff) (UCB doesn't); or
- you trust those that report (UCB doesn't); or
- you think about the descriptions of execution patterns (UCB hasn't, at least not yet) ...

I suspect that this is going to be an issue with almost any selection of GPU projects if you pay attention. I suspect that it will be worse with projects that have task limits (MW and GPU Grid), though execution-time disparity is more likely to be the driving issue.

When I get my GTX 280 card back today and running again, I think I am going to turn one of my systems from being dedicated to GPU Grid into a split system and see where it goes. In this case, I may share it between MW, Collatz, and GPU Grid and see if it runs balanced or not ... I suspect not ...

Last point: up to this recent point in history almost no one has been running more than one GPU project at a time on a system. We are only now able to attach to multiple GPU-capable projects, so this is virgin territory. For CUDA the only real choice (IMHO) has been GPU Grid; others likely opted for only SaH ... but now one has GPU Grid, MW and Collatz as CUDA choices and MW and Collatz as ATI choices ... and now we are seeing the issues ...

{edit - add}
Oh, and if you have been avoiding 6.10.4 through .6, I am moving my recommendation for 6.10.7 to "suggested" over 6.10.3 ... to play it safe stay with 6.10.3 till I have another couple of days with it ... but 6.10.7 looks like the next stable version that is usable ...

BarryAZ
Joined: 16 Apr 09
Posts: 163
Credit: 920,275,294
RAC: 0
Message 12725 - Posted: 24 Sep 2009 | 19:38:30 UTC - in response to Message 12720.

There remains the work fetch issue regarding GPU and CPU mixes - 6.10.x (including 6.10.7) STILL goes looking for CPU tasks from GPU projects (including GPUGRID and Collatz) and still goes looking for GPU tasks from CPU-only projects (Spinhenge, POEM, etc.). That strikes me as something which could and should be fixed. It seems that few have observed this, or that those who have don't seem bothered by it -- at least on the developer side.




Last point, to this recent point in history there has been almost no one that has been running more than one GPU project at a time on a system. We are only now able to attach to multiple GPU capable projects and so this is virgin territory. For CUDA the only real choice (IMHO) has been GPU Grid, others likely opted for only SaH... but now one has GPU Grid, MW and Collatz as CUDA choices and MW and Collatz as ATI choices ... and now we are seeing the issues ...

{edit - add}
Oh, and if you have been avoiding 6.10.4 through .6 I am moving my recommendation to "suggested" over 6.10.3 ... to play safe stay with 6.10.3 till I have another couple days ... but 6.10.7 looks like the next stable version that is usable ...

Ingleside
Joined: 22 Sep 09
Posts: 3
Credit: 0
RAC: 0
Message 12727 - Posted: 24 Sep 2009 | 21:15:58 UTC - in response to Message 12725.
Last modified: 24 Sep 2009 | 21:20:47 UTC

There remains the work fetch issue regarding GPU and CPU mixes - 6.10.x (including 6.10.7) STILL goes looking for CPU tasks from GPU projects (including GPUGRID and Collatz) and still goes looking for GPU tasks from CPU only projects (Spinhenge, POEM, etc.). That strikes me as something which could and should be fixed. Seems that few have observed this, or that any who have don't seem bothered by it -- at least on the developer side.

Well, how do you test to see if a project has added support for new hardware if you don't ask about it occasionally? Currently the only way the client knows is to send a scheduler request for work.

And, since projects can now give up to 4 weeks' deferral for a hardware resource they don't currently have any application for, there isn't really any big reason to change the client further.

WCG is now using this new functionality, and the last GPU request included this in the scheduler reply:

<cuda_backoff>604800</cuda_backoff>
<ati_backoff>604800</ati_backoff>

v6.10.7 immediately detected this, and the next GPU request to WCG is deferred for 7 days, as instructed by WCG's scheduling server.

Hmm, not sure, but it looks like the deferrals will be included in all scheduler replies, except when the server is down. If so, as long as the client occasionally connects to ask for supported work, report results or send a trickle-up, it will never ask for the unsupported work, unless you manually hit "update" ...


So, it's just a matter of the various projects adding the necessary functionality to their servers and choosing up to a 4-week deferral ...
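For anyone curious what acting on those elements might look like, here is a minimal sketch (Python; the wrapper element and function names are my own invention, and this is not the actual BOINC client code) of a client honouring per-resource backoffs from a scheduler reply:

import time
import xml.etree.ElementTree as ET

# Backoff elements as quoted above; the <scheduler_reply> wrapper is assumed.
reply = """<scheduler_reply>
  <cuda_backoff>604800</cuda_backoff>
  <ati_backoff>604800</ati_backoff>
</scheduler_reply>"""

root = ET.fromstring(reply)
next_request_ok = {}
for resource in ("cuda", "ati"):
    element = root.find(resource + "_backoff")
    if element is not None:
        # 604800 s = 7 days: the earliest moment to ask this project for that resource again.
        next_request_ok[resource] = time.time() + float(element.text)

def may_request(resource):
    return time.time() >= next_request_ok.get(resource, 0.0)

print(may_request("cuda"))  # False for the next 7 days
print(may_request("cpu"))   # True -- no backoff was sent for CPU work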

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 12730 - Posted: 25 Sep 2009 | 1:38:00 UTC - in response to Message 12727.

Well, how do you test to see if a project has added support for new hardware if you don't ask about it occasionally? Currently the only way the client knows is to send a scheduler request for work.

You use the publish model.

Instead of having millions of requests constantly asking about something that may never happen ... you publish new capabilities to the client when you need to make it aware of one ...

But, the bottom line is, I really don't want a project deciding what it is going to run where ... and it is senseless for BOINC to be asking this "question" over and over, and even this fix is lame ... they made a flawed decision and, instead of acknowledging that, come up with this lame "well, the project can increase the back-off" ... the client should not be asking in the first place.

When the project comes up with a new capability they publish it in the news, as they should; then I will make the adjustments in the settings on the site and THEN have my client start asking for work for the new resource.

Today I decided to try the CUDA app at MW, and when I turned on my machine to get some work it took, I think, 6 to 8 CPU work requests before the client would ask for GPU work ... and I had done the update to tell the client that I would only be running GPU work on this machine ... as BarryAZ notes, this is something that does not make sense ...

If you have done any systems work you know that the primary rule is that you do nothing that you do not have to ... you run no module, no test, no code that you do not absolutely need to run ... except in BOINC ...

I am having a debate with JM VII on the mailing list about just this subject ... he is appalled that I would want to test his system for quality, to make sure that it is returning valid results and in turn find out how fast it is besides ... and he considers that a complete waste ... But this death-of-a-thousand-cuts pinging on the servers and running RR SIM and other code in the client as often as 6 or more times a minute he is fine with ... even though there is no real need to do so ... and it makes some of the resource scheduling bugs much more severe than they might otherwise be ...

Anyway ... it is a bad design decision ... but now that Dr. Anderson has had the idea it is going to linger like the smell of my spouse's dead fish ...

BarryAZ
Joined: 16 Apr 09
Posts: 163
Credit: 920,275,294
RAC: 0
Message 12733 - Posted: 25 Sep 2009 | 6:09:43 UTC - in response to Message 12730.
Last modified: 25 Sep 2009 | 6:10:45 UTC

I posted this over on the BOINC client message board in response to Richard's comment of "the only thing you gain by degrading to v6.6.33 or earlier is ignorance (lack of information in the logs). And the only thing you gain by degrading to v6.4.5 is the inability to run both CPU and GPU tasks for the same project/application."

I'd note that he ignores the other things I gain by moving back to, say, 6.4.5: not only do I readily GET work from projects like POEM or Spinhenge, which only support CPU work and will fail a 'too quick' work fetch (which you generate when you ask for non-existent GPU work), but I also don't extraneously ping the servers (as Paul noted). He's also ignoring that the current work fetch routine FAILS way too often.

But as Paul noted, this no doubt is a losing effort regarding the client, and since the multi-project concept of BOINC (hello BOINC developers) means it makes sense to have multiple GPU and multiple CPU projects attached, I find myself compelled to use the later, troublesome, noisy client unless I am either CPU only (5.10.45 is lovely there) or running GPUGrid as a single CUDA project (6.4.5 is OK there, UNLESS GPUGrid forces CUDA 2.3).

Why the development folks won't simply set things up to read the user account settings and then store them locally on the workstation as part of the account_project file is now more a matter of developer ego than lack of feasibility.

***** what I posted over there ***********
That might be what you want to do with 'default users' -- those who install, attach and don't do anything else. There used to be a LOT of those folks back in the old Seti@home days. To the extent that many of them are running BOINC today, I suspect they are using legacy clients and haven't changed much since they set up BOINC on their computers.

But these days, those remaining active in the BOINC population include a fair number of folks who actually have hardware configurations matched to the projects they attach to. These folks (including myself) support multiple projects -- some of which support 'everything' (MW as an example, though their CUDA and ATI support is double precision only; or Collatz, which supports the broadest combination out there as well as CPU), projects which specifically are GPU only (like GPUGrid), as well as projects which are CPU only (like Climate, POEM, Spinhenge, Malaria and a number of others). The thing is, where a project supports multiple hardware configurations, the user can configure the account for GPU or CPU or GPU and CPU. And for those with a broad range of hardware they can configure Home, Work, School -- for different combinations.

I suppose on the project side, CUDA or ATI might be an additional choice that could help things there as well -- but ONLY if the client has the capability to do the account/local hardware check BEFORE it goes out to the project and asks. It just seems that, with all the extra 'stuff' going into the client, there should be some means to let the installed client glean that information and use it, thus reducing user observational frustration and project I/O traffic.

But as I said, I'm probably just simply too stupid to see things the right way here.

*************

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 12747 - Posted: 25 Sep 2009 | 15:47:42 UTC - in response to Message 12733.

I posted this over on the BOINC client message board in response to Richard's comment of "the only thing you gain by degrading to v6.6.33 or earlier is ignorance (lack of information in the logs). And the only thing you gain by degrading to v6.4.5 is the inability to run both CPU and GPU tasks for the same project/application."

I would note that even with 6.10.x versions people are still having a hard time getting this to work properly if I am reading the messages correctly. Or to put it another way ... it still is not working as advertised.

I'd note that he ignores that the other thing I gain by moving back to say 6.4.5 is not only do I ready GET work from projects like POEM or Spinhenge since they only support CPU work and will fail a 'too quick' work fetch (which you generate when you ask for non-existent GPU work), but also, I don't extraneously ping the servers (as Paul noted). He's also ignoring that the current work fetch routine FAILS way too often.

I noted this in a post this AM to the Alpha list where, as a consequence of design choices, the support for multiple projects is slowly being compromised for reasons that are not entirely clear to me (well, maybe I am stupid, but I still don't see it). And options that would allow the participant more control have been rejected (like being able to say: don't try to get more than one CPDN task at a time, though overlap when one is in its last few days would be acceptable) ...

But as Paul noted, this no doubt is a losing effort regarding the client, and since the multi-project concept of BOINC (hello BOINC developers) means it makes sense to have multiple GPU and multiple CPU projects attached, I find myself compelled to use the later troublesome, noisy client unless I am either CPU only (5.10.45 is lovely there), or GPUGrid as a single CUDA project (6.4.5) is OK here UNLESS GPUGrid forces CUDA 2.3).

I will note, on the other hand, that judging by some of the complaints (historically) the way BOINC handles single projects is not that effective either ... though I have not been to the SaH boards in months ... :)

That might be what you want to do with 'default users' -- those who install, attach and don't do anything else. There used to be a LOT of those folks back in the old Seti@home days. To the extent that many of them are running BOINC today, I suspect they are using legacy clients and haven't changed much since they set up BOINC on their computers.

I talk to a guy who knows a bunch of people that were heavy crunchers in SETI@Home Classic and they never made the transition to BOINC. My hazy memory says that almost half of the raw processing power of the project never made the transition. They opted out so to speak. When you strip away all the noise the fundamental issue was that UCB/Dr. Anderson did not listen to them and their concerns. Note almost all of those people were of the large farm class. They had lots of machines and lots of power and we lost all of it ...

Years later and it is the same ... Dr. Anderson and the cohorts at UCB may be smarter than me, they may be smarter than you ... but they are not smarter than all of us put together ...

BarryAZ
Joined: 16 Apr 09
Posts: 163
Credit: 920,275,294
RAC: 0
Message 12749 - Posted: 25 Sep 2009 | 18:27:26 UTC - in response to Message 12747.

A follow-up: my response to Richard, and a portion of his reply (note his passing comment about the great DA at the end of his post)


OK -- and sorry about the testiness of my replies as well -- it seems there is that classic 80% of agreement.

With POEM -- it isn't so much the back-off cycle (theirs is quite short - a couple of minutes and not progressive), but rather that subsequent requests by the client are STILL GPU only. POEM does not support GPU and, like a number of projects, does not appear to have the inclination or resources to develop a GPU application. With Spinhenge, the back-off cycle is a non-progressive 15 minutes. Curiously enough, the 6.10.x client doesn't appear to be repetitive about GPU requests there; it asks just once and then reverts to CPU. Spinhenge is similarly not supporting, or likely to support, GPU.

With GPUGrid, I found it curious that the request was for CPU -- again, the project is a 'single mode' project -- GPU and for that matter CUDA GPU only. I have seen queries for ATI GPU and CPU work in the client messages on the workstation -- that's just wrong to my way of looking at things.

With the implementation of ATI GPU support (YES -- a GOOD thing), the matter gets a bit more complicated.

In my view, ideally, the account/preferences/resource share and graphics settings should give one the capability (for the default and for each of the three available groups) to control use across all the options -- instead of the current Use GPU if available (yes/no) and Use CPU (yes/no), there should be USE ATI GPU (yes/no), USE CUDA GPU (yes/no), and USE CPU (yes/no).

Different settings could be configured by the user for each group should they so wish (for example on Collatz which has the broadest support -- I'd have a use ATI GPU group and a use CUDA GPU group and so on and set my workstations to be part of the specific group which matches the hardware).

These preferences should then get downloaded to the specific workstation, into the account_project file, and read by the client as a control for the type of work fetch it should use. When first adding a computer to a project, it would pull the settings off the default configuration, which could then be changed by the user by switching the computer to the appropriate group. (It might be a nice 'advanced' feature, when joining a new computer to a project where you are an existing user, to specify which group the computer belongs in at the outset.)

The idea behind this is to have the client work fetch be targeted to getting work which matches the workstation configuration and not to waste time and cycles pinging the project servers -- many of which are currently stressed out with I/O traffic (SETI is not the only one running at (or over) the edge regarding I/O traffic). It would also calm the noise level of troublemakers like me (and Paul for that matter) <smile>.

A portion of Richard's reply
****************************

I don't know how long POEM makes you wait between requests, or why they've chosen to introduce the delay: are the delays following the twice-daily GPU ping (actually, presumably now four times a day, twice for CUDA and twice for ATI) sufficient to explain your work drought with v6.10.7 on their own? Or could there be other mechanisms at play, like the changing definitions of long term debt and 'overworked'? You'll probably need to get deep down and dirty with "work fetch debug" logging flags before you can explain exactly where the current mechanism is breaking down, and that's a necessary first step before fixing it.

Overall, I agree with your general thrust that use/don't use resource switches should operate at the client level, project by project and under user control (I'm a great believer in giving users choice over how the resources they're donating are used), but it will take a concerted and well-documented effort to persuade David Anderson that this is the way forward.

*******************************************************

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 12755 - Posted: 26 Sep 2009 | 3:04:49 UTC - in response to Message 12749.
Last modified: 26 Sep 2009 | 3:05:51 UTC

OK -- and sorry about the testiness of my replies as well -- it seems there is that classic 80% of agreement.

Um, did I give you the impression I thought you were being testy?

My fault as that was not intended.

I was just trying to amplify and clarify some points. So I am sorry you think I only agree with 80% of what you are saying ... heck, read my latest on the mailing list ...

As to that last point ... Richard is dreaming ... it was a bad design choice, but because UCB never makes mistakes and they slapped in the change to the back-off it is unlikely that this "feature" is going to change ... ever ...

BarryAZ
Joined: 16 Apr 09
Posts: 163
Credit: 920,275,294
RAC: 0
Message 12758 - Posted: 26 Sep 2009 | 4:29:05 UTC - in response to Message 12755.

Paul, I copied that message over from another message board - that 'testy' comment was to Richard -- his post riled me up and he apologized for its tone, so I completed the loop and apologised for my tone -- over there.



Um, did I give you the impression I thought you were being testy?

My fault as that was not intended.


BarryAZ
Joined: 16 Apr 09
Posts: 163
Credit: 920,275,294
RAC: 0
Message 12759 - Posted: 26 Sep 2009 | 4:33:28 UTC - in response to Message 12755.

Paul -- no, you and I agree on this issue almost completely -- that 80% message was to Richard.

I really don't understand why an installed client can't read the user controlled settings on the local system regarding GPU or CPU for each project as part of the fetch. The user can change those settings at the project site and can control for differing workstation settings by using different groups for different workstation configurations.

As to WHY DA can't/won't understand that -- well, my wife is the psychoanalyst, not me.


I was just trying to amplify and clarify some points. So I am sorry you think I only agree with 80% of what you are saying ... heck, read my latest on the mailing list ...

As to that last point ... Richard is dreaming ... it was a bad design choice, but because UCB never makes mistakes and they slapped in the change to the back-off it is unlikely that this "feature" is going to change ... ever ...

BarryAZ
Joined: 16 Apr 09
Posts: 163
Credit: 920,275,294
RAC: 0
Message 12760 - Posted: 26 Sep 2009 | 4:37:38 UTC - in response to Message 12755.

Paul, here is an example of just how brain-dead the current fetch routine is. Note, this particular workstation has a 9800GT, and Collatz is configured for GPU only at the project level:

9/25/2009 9:34:21 PM Collatz Conjecture Sending scheduler request: To fetch work.
9/25/2009 9:34:21 PM Collatz Conjecture Requesting new tasks for CPU
9/25/2009 9:34:26 PM Collatz Conjecture Scheduler request completed: got 0 new tasks
9/25/2009 9:34:26 PM Collatz Conjecture Message from server: No work sent
9/25/2009 9:34:26 PM Collatz Conjecture Message from server: Your computer has no ATI GPU
9/25/2009 9:34:31 PM GPUGRID Sending scheduler request: To fetch work.
9/25/2009 9:34:31 PM GPUGRID Requesting new tasks for CPU
9/25/2009 9:34:36 PM GPUGRID Scheduler request completed: got 0 new tasks
9/25/2009 9:34:36 PM GPUGRID Message from server: No work sent
9/25/2009 9:34:36 PM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
9/25/2009 9:34:36 PM GPUGRID Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
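For what it's worth, here is a toy sketch (Python; the data structures and names are invented for illustration and this is not how BOINC currently works) of the kind of pre-check being argued for in this thread: consult locally stored per-project resource preferences before composing a request, so a CPU request is never sent to a project configured as GPU only:

# Hypothetical per-project preferences, e.g. as pulled from the project web settings.
project_prefs = {
    "Collatz Conjecture": {"cpu": False, "cuda": True,  "ati": True},
    "GPUGRID":            {"cpu": False, "cuda": True,  "ati": False},
    "POEM@HOME":          {"cpu": True,  "cuda": False, "ati": False},
}

# What this particular host actually has (the 9800GT box from the log above).
host_resources = {"cpu": True, "cuda": True, "ati": False}

def resources_to_request(project):
    prefs = project_prefs.get(project, {})
    return [r for r, enabled in prefs.items()
            if enabled and host_resources.get(r, False)]

for project in project_prefs:
    wanted = resources_to_request(project)
    print(project, "->", ", ".join(wanted) if wanted else "skip (nothing to ask for)")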

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 12763 - Posted: 26 Sep 2009 | 5:41:22 UTC

It didn't last very long. Superseded by 6.10.9. See separate message thread for details.
____________
BOINC blog

BarryAZ
Joined: 16 Apr 09
Posts: 163
Credit: 920,275,294
RAC: 0
Message 12766 - Posted: 26 Sep 2009 | 5:54:41 UTC - in response to Message 12763.

Yeah I noticed -- but a .8 then .9 bump within 24 hours doesn't give rise to confidence. The thing is, the condition causing the problem I posted here isn't considered a bug that should be fixed. It appears that DA LIKES that sort of situation.

It didn't last very long. Superseded by 6.10.9. See separate message thread for details.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 12770 - Posted: 26 Sep 2009 | 6:07:24 UTC - in response to Message 12763.

It didn't last very long. Superseded by 6.10.9. See separate message thread for details.

The .9 release ONLY had minor adjustments for ATI configurations, and even then there are considerable additional changes that ALSO need to be made to the server and the science application(s) before it will be effective ... at least that is what Rom said on the Collatz board (I think I saw you there ...)

As to the rest... Ok, I got confused ... just so you know ... two lonely voices in the wilderness ... but I do think that Richard is one of the good guys ...

Rom essentially says that this asking for CPU work or GPU work in the wrong places is not considered a bug at this time ... I pointed out that that *IS* the problem in a nutshell ... UCB considers lots of bugs not to be bugs ...

As I pointed out in one of my list posts in the last day or so, one of the problems is that neither Rom nor Dr. Anderson is a heavy user of BOINC ... it is obvious from their comments about issues that this is still true, and I know that at least as of a month or so ago (I forget exactly when I had the conversation with someone who is in a position to know) there is no UCB "lab" where they actually run various versions of BOINC to see what it does ... and does not do ...

I mean how am I supposed to take seriously a software development effort that does not appear to use the product that they are developing except casually?

BarryAZ
Joined: 16 Apr 09
Posts: 163
Credit: 920,275,294
RAC: 0
Message 12800 - Posted: 26 Sep 2009 | 18:06:53 UTC - in response to Message 12770.

Further, and I have seen this in other non-BOINC technical situations, so I know the IT <> user interaction can be 'suboptimal', I get the sense often enough that any problems active users have with the ever-changing client iterations are blamed on the users not using the client the 'right' way (the 'right' way being the small mini-lab environment the developers apparently are using).

Like I've said before, my increased 'noise level' on this comes from the fact that in the past I was insulated from much of the 'improvements' foisted on folks by newer client development policy. I didn't have GPU-capable workstations -- so I went with the 5.10.45 client -- which works fine for Win2K and XP environments; work fetch is as expected there.

When I added a couple of Vista workstations, I first went to the 6.18/6.19 client since they incorporated a change to allow the client to start at boot-up and not fight the Vista 'protection scheme'. When I started adding some GPU support (9400GT, 9600GT, 9800GT, 250GS) and GPUGrid, I went to the 6.4.5 client -- it had some work fetch quirks, but generally handled things well enough. I tried the 6.6.36 client and got out of that quickly, finding its work fetch schema deeply flawed, notwithstanding the gospel of Dave.

I would have stayed 'dumb and happy' (the way developers often like to keep users) except for the changes over at Collatz -- with them marrying up support for low-end video processors, including ATI, along with a requirement for CUDA 2.3, I've been compelled to diddle with the 6.10.x series on a number of workstations in mixed-project environments (I typically have 6 to 9 projects on a workstation, with 4 to 6 of them configured as CPU only -- or CPU only at the project level -- and 2 to 3 configured as GPU only, or, like GPUGrid, GPU at the project level). This clearly is an environment with which the developers have very scarce experience or awareness.

So, being forced to work with (or against) the 6.10.x client and its 'from above' developer-directed force, I've taken to 'railing against the machine' and joining you in a number of venues.

But I'll not get into the additional channels -- from your comments, I already know the type of response I'd get. I do figure that by posting in more open areas, I can seek to increase awareness of the issues amongst the rest of us hoi polloi. Sometimes the journey of a thousand miles begins with a couple of noisy posters <smile>.



As I pointed out in one of my list posts in the last day or so one of the problems is that neither Rom nor Dr. Anderson are heavy users of BOINC ... it is obvious from their comments about issues that this is still true and I know that at least as of last month or so ago (I forget exactly when I had the conversation with someone who is in a position to know) that there is no UCB "lab" where they actually run various versions of BOINC to see what it does ... and does not do ...

I mean how am I supposed to take seriously a software development effort that does not appear to use the product that they are developing except casually?

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 12805 - Posted: 26 Sep 2009 | 19:56:26 UTC - in response to Message 12800.

Further, and I have seen this in other non BOINC technical situations, so I know the IT <> User interaction can be 'suboptimal', I get the sense often enough that any problems active users have with the ever changing client iterations are because the users are not using the client the 'right' way (the 'right' way being the small mini-lab environment the developers apparently are using).

Which, if they actually had a lab, would be a mode one could even support.

If you look at the user stats on Willy's site you will see that the vast majority of participants, >50% (IIRC; it has been a while and it is not something I memorized), run only one project. When you look at small suites of fewer than 5 projects you cover nearly 90% of all participants.

Logic would suggest that the primary focus of BOINC development would be to get it working so as to make the single-project (or single project with safety project(s)) type of user most happy. From there, work to make sure that BOINC works well with small suites of projects ... and lastly, try to make sure that the 3,000 or so of us that run 50+ projects have an adequate tool.

And, if the goal is to get more people to run more projects, then this should be incentivized. Like a credit bonus based on the amount of cross-project credit earned each month ... I need to work on that idea! :)

Like I've said before, my increased 'noise level' on this has been that in the past I was insulated from much of the 'improvements' foisted on folks with newer client development policy. I didn't have GPU supported workstations -- so I went with the 5.10.45 client -- which works fine for Win2K and XP environments, work fetch is as expected there.

The roots of the troubles go all the way back to when the first 4-CPU systems became available. I know, I found a scheduling anomaly and JM VII came up with a fix that was not allowed ... we have gone downhill from there ... as more and more changes are piled on and more and more of the original concepts of how BOINC should work are tossed under the bus on account of "because" ...

When I added a couple of Vista workstations, I first went to the 6.18/6.19 client since they incorporated a change to allow the client to start with boot up and not fight the Vista 'protection scheme'. When I started adding some GPU support (9400GT, 9600GT, 9800GT, 250GS) and GPUGrid, I went to the 6.4.5 client -- it had some work fetch quirks, but generally handled things well enough. I tried the 6.6.36 client and got out of that quickly finding its work fetch schema deeply flawed, notwithstanding the gospel of Dave.

I like that ... :) "Gospel"

I would have stayed 'dumb and happy' (the way developers often like to keep users) except for the changes over in Collatz -- with them marrying up support for low end video processors including ATI, along with a requirement for CUDA 2.3, I've been compelled to diddle with the 6.10.x series on a number of workstations in mixed project environments (I typically have 6 to 9 projects on a workstation with 4 to 6 of them being configured as CPU only -- or CPU only at the project level, and 2 to 3 configured as GPU only (or like GPUGrid GPU at the project level). This clearly is an environment with which the developers have very scarce experience or awareness.

Early to mid-year this year, before I hit a two-month low spell, I demonstrated how some of the internals are being run as often as 6 times a minute on my systems, and they are not the fastest or the "widest", though they are faster and wider than most. Add in projects with short run times and you have internal chaos, where running the Resource Scheduler for all the triggers means that trivial reasons cause constant reordering of the work schedule. The saddest point is that with a 1-day queue and no project with a deadline less than three or four days in the future there is zero schedule pressure ... yet BOINC would go into panic after panic after panic ...

The most pathetic thing is that JM VII keeps bringing up a project, now defunct, that had a 6-minute deadline as justification for this lunacy. And I say pathetic because, with the TSI (task switch interval) being 60 minutes, the tasks from this mythical project would cost, on average, 30 minutes of processing on running tasks because of preemptions: a preempting task arrives, on average, halfway through a running task's 60-minute slice, so roughly 30 minutes of un-checkpointed progress is at risk each time.

So, being forced to work with (or against) the 6.10.x client and it 'from above the developers directed force', I've take to 'railing against the machine' and joining you in a number of venues.

But I'll not get in the additional channels -- from your comments, I already know the type of response I'd get. I do figure by posting in more open areas, I can seek to increase awareness of the issues amongst the rest of us hoi poloi. Sometimes the journey of a thousand miles begins with a couple of noisy posters <smile>.

And I thank you for your support ... :)

But history says that it will not matter in the slightest. Sadly the only thing that I think will save BOINC is when and if Dr. Anderson leaves ... I agree he had the one, or two, great idea(s), but that, to my mind, does not excuse the 10,000 blunders that followed ...

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,859,486,851
RAC: 9,977,254
Message 12858 - Posted: 28 Sep 2009 | 11:02:42 UTC - in response to Message 12759.

I really don't understand why an installed client can't read the user controlled settings on the local system regarding GPU or CPU for each project as part of the fetch. The user can change those settings at the project site and can control for differing workstation settings by using different groups for different workstation configurations.

Barry,

Check out the BOINC message board again. We have progress: it'll need server updates as well as a new client build, and I can already see some fine tuning needed during testing, but the direction of movement is positive. Berkeley is not deaf, merely hard of hearing!

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 12872 - Posted: 28 Sep 2009 | 19:14:59 UTC - in response to Message 12858.

I really don't understand why an installed client can't read the user controlled settings on the local system regarding GPU or CPU for each project as part of the fetch. The user can change those settings at the project site and can control for differing workstation settings by using different groups for different workstation configurations.

Barry,

Check out the BOINC message board again. We have progress: it'll need server updates as well as a new client build, and I can already see some fine tuning needed during testing, but the direction of movement is positive. Berkeley is not deaf, merely hard of hearing!

And it raises the question of what made them change their minds ... in the famous pigeon experiments the rewards were given randomly and the pigeons developed elaborate "dances" to get the food pellet ... because that is what seemed to work ... just as one of my dogs knows that if she scratches on the automatic door, that is what causes it to open ...

Now if they will start to address the myriad of other issues that are killing us ... like the strict GPU FIFO rule that negates resource share unless you run with no queue or a very short one (0.1 days has been working; I have not increased it yet, maybe later this week) ...

What is saddest is that as best I can tell the FIFO rule was added because of the execution order issues caused by bugs that have since been addressed in 6.10.7 ... sigh ...
