
Message boards : Graphics cards (GPUs) : 6.6.28 is now official BOINC Recommended version

TomaszPawel
Joined: 18 Aug 08
Posts: 121
Credit: 59,836,411
RAC: 0
Message 9997 - Posted: 20 May 2009 | 11:43:03 UTC
Last modified: 20 May 2009 | 11:54:55 UTC

Hi!

6.6.28 is now official BOINC Recommended version

Windows 32bit

Windows 64bit

For Mac OS X Version 10.3.9+ 6.6.29


Change log since 6.6.20:

Changes for 6.6.21

- GUI RPC: client side: if we parse a RESULT whose CPU time is nonzero but whose elapsed time is zero, we must be talking to an old client; set elapsed = CPU

- client: (unix): if host name lookup fails, call res_init().
This is an attempt to fix a problem on Linux where, if the client starts before a VPN is set up, it can never communicate.

- Mac: Add -lresolv to XCode linker flags for client and manager

- Mac: MGR: add standard Preferences item under BOINC menu; add -lresolv to XCode linker flags for screensaver

- client: show project name in "backoff ended" msg

- SS: Under Mac Sandbox security, gfx_switcher launches default ss app as user and group boinc_master; don't setgid boincscr

- Mac: Add -lresolv to XCode linker flags for ss_app and boinccmd

- SS: Under Mac Sandbox security, terminate default screensaver graphics app via gfx_switcher

- WINSCR: It appears newer notebook models with multiple video chipsets exhibit an interesting situation. It appears as though in certain conditions a single monitor machine actually reports itself as having three monitors. Normally the monitor that contains the primary window (coord 0,0) is on monitor 0, but on these machines coord 0,0 is actually on monitor 2. This led to the screensaver not properly exiting when keyboard and/or mouse activity was detected. Now when we detect that keyboard and/or mouse activity has happened we send the WM_INTERRUPTSAVER event to all windows on all monitors.

- WINSETUP: When uninstalling, don't migrate the client data back to the 5.x location.

- WINSETUP: On some setups, how we were including the 'Everyone' well known security ID in the boinc_users group didn't work properly.

From now on include the 'Everyone' security ID in the various ACLs instead. This will probably clean up a wide range of issues with multi-user installs.

- web and client: change the default for "run_gpu_if_user_active" from true to false.
Currently running CUDA apps on NVIDIA GPUs causes a significant slowdown in GUI response.

- client: we were setting config defaults after parsing cmdline.

This meant that the cmdline args that set config params weren't working:
--allow_multiple_clients
--report_results_immediately
--no_priority_change
--start_delay

- MGR: Fix compatibility problem with sizing of all-projects list in Attach Project Wizard on wxWidgets 2.8.8 or later

- XCode Project changes created automatically by updating XCode to version 3.1.2
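The config-defaults bug above is an ordering problem: if defaults are written after the command line is parsed, they overwrite whatever the user passed. A minimal Python sketch of the before/after, assuming a plain dict for the config (the real client is C++ and structured differently; `DEFAULTS`, `parse_cmdline` and both loader functions are hypothetical):

```python
# Hypothetical sketch: defaults must be applied BEFORE command-line parsing.
# Flag names mirror the four options listed in the 6.6.21 changelog.
DEFAULTS = {
    "allow_multiple_clients": False,
    "report_results_immediately": False,
    "no_priority_change": False,
    "start_delay": 0.0,
}


def parse_cmdline(argv, config):
    """Write recognized flags into config (deliberately simplified parser)."""
    i = 0
    while i < len(argv):
        name = argv[i].lstrip("-")
        if name == "start_delay":  # the only flag here that takes a value
            config[name] = float(argv[i + 1])
            i += 2
            continue
        if name in DEFAULTS:  # boolean switches
            config[name] = True
        i += 1


def load_config_buggy(argv):
    config = {}
    parse_cmdline(argv, config)
    config.update(DEFAULTS)  # defaults set last: silently overwrite cmdline values
    return config


def load_config_fixed(argv):
    config = dict(DEFAULTS)  # defaults first ...
    parse_cmdline(argv, config)  # ... then let the command line override them
    return config


args = ["--allow_multiple_clients", "--start_delay", "30"]
print(load_config_buggy(args)["allow_multiple_clients"])  # False: flag lost
print(load_config_fixed(args)["allow_multiple_clients"])  # True
```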

Changes for 6.6.22

- client: back out res_init() change; it didn't work

- Mac: build MGR with wxWidgets 2.8.10; Remove -lresolv from XCode linker flags

- Mac client: fill in command field of PROCINFO struct so <exclusive_app> log flag works properly on Mac.

- XCode Project changes created automatically by updating XCode to version 3.1.2

- client: for each app version, keep track of the largest WSS of tasks using it.
In checking whether tasks fit in RAM, use this as an estimate for tasks that haven't started yet.
This avoids a situation where the client starts a lot of tasks in sequence, only to find that each one doesn't fit in RAM.

- manager: show execution directory in task properties

- graphics API: add rotation arg to txf_render_string() (from Carl C.)

- Mac client: fill in command field of PROCINFO struct more efficiently

- graphics API: full-on 3D rotation of text

- WINSETUP: Be sure to define INSTALLDIR if it isn't already defined by the system or the transform. DATADIR was already being handled.

- WINSETUP: Save setup state at the end of the execution phase as well as the end of the UI phase.
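The 6.6.22 working-set-size change can be sketched as a per-app-version high-water mark that stands in for tasks that have not run yet. A minimal sketch, assuming a simple greedy admission loop; the class and function names are hypothetical, not the client's data structures:

```python
from dataclasses import dataclass


@dataclass
class AppVersion:
    name: str
    max_wss: float = 0.0  # largest working-set size seen for any task of this version


@dataclass
class Task:
    app_version: AppVersion
    running: bool = False
    wss: float = 0.0  # measured working-set size (bytes); 0 until the task has run


def record_wss(task):
    """Update the per-app-version maximum from a running task's measurement."""
    av = task.app_version
    av.max_wss = max(av.max_wss, task.wss)


def estimated_wss(task):
    """Running tasks use their measured WSS; unstarted tasks borrow the version's max."""
    return task.wss if task.running else task.app_version.max_wss


def select_fitting(tasks, ram_bytes):
    """Greedily pick tasks, in order, whose estimated WSS still fits in RAM."""
    chosen, used = [], 0.0
    for t in tasks:
        need = estimated_wss(t)
        if used + need <= ram_bytes:
            chosen.append(t)
            used += need
    return chosen


av = AppVersion("example_app")
t0 = Task(av, running=True, wss=600e6)
record_wss(t0)
t1, t2 = Task(av), Task(av)  # not started yet: no measured WSS
print(len(select_fitting([t0, t1, t2], ram_bytes=1300e6)))  # 2
```

Without the borrowed estimate, t1 and t2 would count as 0 bytes and all three tasks would be started, which is the "start a lot of tasks, then find each one doesn't fit" situation the changelog describes.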


Changes for 6.6.23

- client: for coproc jobs, don't start a job while a quit is pending. Otherwise the new job may fail on memory allocation.

- client: instead of scheduling coproc jobs EDF:
* first schedule jobs projected to miss deadline in EDF order
* then schedule remaining jobs in FIFO order

This is intended to reduce the number of preemptions of coproc jobs, and hence (since they are always preempted by quit) to reduce the wasted time due to checkpoint gaps.

- client: the CPU scheduling policy made use of the number of deadline misses in various places. This should include only the deadline misses of CPU jobs. So move "deadlines_missed" from RR_SIM_STATUS and PROJECT to RSC_PROJECT_WORK_FETCH so that we have separate counts for CPU and coproc jobs, and use the count for CPU jobs.

- GUI RPC: removed the rr_sim_deadlines_missed field from project descriptor. This is no longer meaningful, and it didn't seem to be used anywhere.

- GUI RPC and manager: send slot and show it in task properties rather than slot path (slot_path is defined only for apps with graphics app).

- client: put back the call to res_init() on lookup failure. Apparently it worked after all.

- client: Fix spelling mistake in Windows environment.

- client: for each app version, keep track of the largest WSS of tasks using it. In checking whether tasks fit in RAM, use this as an estimate for tasks that haven't started yet. This avoids a situation where the client starts a lot of tasks in sequence, only to find that each one doesn't fit in RAM.
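The 6.6.23 coproc ordering (deadline-endangered jobs in EDF order, everything else FIFO) is a two-part sort. A minimal sketch, assuming each job already carries a `projected_miss` flag; how the client actually projects deadline misses is not modeled, and all names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class CoprocJob:
    name: str
    deadline: float  # report deadline (absolute time)
    received: float  # arrival time, used as the FIFO key
    projected_miss: bool  # assumed input, e.g. from a deadline simulation


def order_coproc_jobs(jobs):
    """Deadline-endangered jobs first (EDF), then everything else in FIFO order."""
    at_risk = sorted((j for j in jobs if j.projected_miss), key=lambda j: j.deadline)
    others = sorted((j for j in jobs if not j.projected_miss), key=lambda j: j.received)
    return at_risk + others


jobs = [
    CoprocJob("a", deadline=50, received=1, projected_miss=False),
    CoprocJob("b", deadline=30, received=2, projected_miss=True),
    CoprocJob("c", deadline=10, received=3, projected_miss=False),
    CoprocJob("d", deadline=20, received=4, projected_miss=True),
]
print([j.name for j in order_coproc_jobs(jobs)])  # ['d', 'b', 'a', 'c']
```

Plain EDF would give c, d, b, a. Under the new rule "c" waits behind "a" because it is not projected to miss its deadline, so a running "a" is not preempted just because an earlier-deadline job arrived; fewer preemptions mean fewer quits and less time lost to checkpoint gaps.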


Extra change for Mac OSX:

- Mac: Add -lresolv to XCode linker flags for client, manager, boinccmd, screensaver; add #include of <resolv.h> to non-Windows network.cpp

Changes for 6.6.24

- client: eliminate the need to write the state file on each checkpoint.
Instead, write the info into a file in the slot directory, and check for these files on startup. This should reduce the overhead of state-file writing on machines with lots of cores. There will still be a flurry of writes each time a job finishes, but reducing that overhead would be a larger job.

- client: make sure we write the state file after a failed RPC.

- SS: launch default screensaver graphics app as user and group boinc_project, not boinc_master.

- Fix compiler errors (from Sascha Manns)

- Unix build: make it work if "diff" is missing (??)
(from Michael Tughan)

- Add ICU to the BOINC depends list. It is needed for SQLite3, which will be needed for reading Firefox 3.x cookies.

- Mac MGR: Add keyboard shortcuts command-shift-S, command-shift-A to switch views

- client: fixed a crash caused by using %f to write working-set size into a fixed-size buffer. use %e instead. TODO: figure out why WSS was huge.

- fix app_plan crash (fixes #874)

- client: if a project is detached, adjust debts and trigger CPU scheduling and work fetch.

- Mac MGR: Changes to build with full Unicode support with wxWidgets-2.8.10

- Fix GPL License

- MGR: Put keyboard shortcuts CTRL+SHIFT+S and CTRL+SHIFT+A in View Menu so their functionality is not hidden.

- client: improve CPU sched debug messages (say what kind of job and why we're scheduling it)

- client: log messages describing GPUs: one line per GPU; fixes #879

- client: new approach to handling multiple GPUs.

old: find fastest GPU, and pretend that others are the same.
Problem: other GPUs might be less capable, and not able to handle jobs sent by server.

new: find the most "capable" GPU, use others that are equivalent, don't use those that are not.
"Capable" is defined by

* compute capability (i.e., hardware version)
* driver version
* memory size
* FLOPs
in that priority order.

- client: fix crash bug in CUDA init

- client: when a preemptable task wasn't preempted (e.g. because it hadn't finished its time slice),
we were failing to mark it as scheduled.
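The 6.6.24 "most capable GPU" rule can be sketched as a lexicographic comparison over the four criteria, in the stated priority order. A minimal sketch, assuming that "equivalent" means identical on all four criteria (the changelog does not spell this out); the `Gpu` class, its fields and the example values are hypothetical, not BOINC's coproc structures:

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    compute_capability: tuple  # (major, minor) hardware version
    driver_version: int
    memory_mb: int
    flops: float


def capability_key(gpu):
    # Tuples compare element by element, which gives the priority order listed
    # above: compute capability, then driver, then memory, then FLOPs.
    return (gpu.compute_capability, gpu.driver_version, gpu.memory_mb, gpu.flops)


def select_gpus(gpus):
    """Return (most capable GPU, GPUs treated as equivalent to it)."""
    best = max(gpus, key=capability_key)
    usable = [g for g in gpus if capability_key(g) == capability_key(best)]
    return best, usable


# Illustrative values only.
gpus = [
    Gpu("gpu0", (1, 1), 18250, 512, 336e9),
    Gpu("gpu1", (1, 3), 18250, 896, 715e9),
    Gpu("gpu2", (1, 3), 18250, 896, 715e9),
]
best, usable = select_gpus(gpus)
print(best.name, [g.name for g in usable])  # gpu1 ['gpu1', 'gpu2']
```

Under this rule gpu0 is simply left idle; the later `<use_all_gpus>` option exists for exactly that case.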


Changes for 6.6.25

- client: message tweak

- client: tweak to 4/21 checkin.

After finding the "most capable" GPU, ignore FLOPS in deciding what GPUs are equivalent to it. This opens up the possibility that the client will get jobs that it won't be able to finish in time. But it still avoids getting jobs that will crash.

- fix typo in compare_cuda()

- client: show message when user does a project or task op (suspend, resume, update, etc.)

- client: add <use_all_gpus> config option. If set, use GPUs even if they're not equivalent to the most capable one.
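The 6.6.25 tweak changes only the equivalence test: the most capable GPU is still picked as before, but FLOPs no longer count when deciding which other GPUs match it, and `<use_all_gpus>` bypasses the test entirely. A minimal sketch with the same hypothetical `Gpu` fields (the `use_all_gpus` parameter stands in for the config option; it is not the client's real API):

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    compute_capability: tuple  # (major, minor)
    driver_version: int
    memory_mb: int
    flops: float


def capability_key(gpu):
    # The most capable GPU is still chosen with FLOPs as the final tie-breaker.
    return (gpu.compute_capability, gpu.driver_version, gpu.memory_mb, gpu.flops)


def equivalent(a, b):
    # 6.6.25: FLOPs are ignored when deciding equivalence.
    return (a.compute_capability, a.driver_version, a.memory_mb) == (
        b.compute_capability, b.driver_version, b.memory_mb)


def usable_gpus(gpus, use_all_gpus=False):
    if use_all_gpus:  # models the <use_all_gpus> config option
        return list(gpus)
    best = max(gpus, key=capability_key)
    return [g for g in gpus if equivalent(g, best)]


# Illustrative values: gpu0 and gpu1 differ only in FLOPs.
gpus = [
    Gpu("gpu0", (1, 3), 18250, 896, 715e9),
    Gpu("gpu1", (1, 3), 18250, 896, 650e9),
    Gpu("gpu2", (1, 1), 18250, 512, 336e9),
]
print([g.name for g in usable_gpus(gpus)])  # ['gpu0', 'gpu1']
print([g.name for g in usable_gpus(gpus, use_all_gpus=True)])  # ['gpu0', 'gpu1', 'gpu2']
```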


Changes for 6.6.26

- Removed 10 corrupted languages that hadn't seen an update since the conversion to SVN, and replaced them with current translations.


- removed outdated translation files; updated template

- Changes to get the client to build on IRIX:
don't use the variable name "sgi"; include <xxx.h> instead of <cxxx>; the latter just adds overloaded functions that we avoid.

- MGR: Turn GetViewName() into the unlocalized version of the view name, so the configuration group name is consistent across all languages and does not cause conversion issues on platforms where configuration information is treated differently when compiled as Unicode vs. ANSI.


Changes for 6.6.27

- removed outdated translation files; updated template

- client: view two GPUs as equivalent if their memory sizes differ by less than 30% (a GPU may otherwise have been excluded because its memory differed slightly from the most capable one's).

- client: simplify enforce_schedule(), and maybe fix bugs.
New approach: take the "ordered_schedule_results" list, add running jobs that haven't finished their time slice, and order the result appropriately. Then run jobs in order until CPUs are filled. Simpler and clearer than the old way.

- client: fix compiler warning
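The new enforce_schedule() approach in 6.6.27 reduces to building one ordered list and filling CPUs from it. A minimal sketch: the changelog does not say how the combined list is ordered, so putting running jobs that are still inside their time slice first is an assumption, as are the `Job` fields; GPU jobs and their fractional CPU usage are not modeled:

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    ncpus: float = 1.0
    running: bool = False
    slice_done: bool = True  # has the job used up its current time slice?


def enforce_schedule(ordered_schedule_results, all_jobs, ncpus):
    """Return (jobs to run, running jobs to preempt). Simplified sketch."""
    # Running jobs still inside their time slice are added to the list.
    # Assumption: they go to the front so they are not preempted early.
    protected = [j for j in all_jobs if j.running and not j.slice_done]
    seen = {j.name for j in protected}
    run_list = protected + [j for j in ordered_schedule_results if j.name not in seen]

    to_run, used = [], 0.0
    for job in run_list:  # walk the list in order, taking jobs while CPUs remain
        if used + job.ncpus <= ncpus:
            to_run.append(job)
            used += job.ncpus
    chosen = {j.name for j in to_run}
    to_preempt = [j for j in all_jobs if j.running and j.name not in chosen]
    return to_run, to_preempt


a = Job("A", running=True, slice_done=False)
b, c = Job("B"), Job("C")
d = Job("D", running=True, slice_done=True)
to_run, to_preempt = enforce_schedule([b, c], [a, b, c, d], ncpus=2)
print([j.name for j in to_run], [j.name for j in to_preempt])  # ['A', 'B'] ['D']
```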


Changes for 6.6.28

- client: enforce_schedule() wasn't starting GPU jobs

- Update Translations


Changes for 6.6.29 (for Mac)

- MGR: Add comments and slightly reorder code for clarity

- client: write message (and show new config info) when config file reread

- client: improve cpu_sched_debug messages

- web translation: code wasn't handling multi-line tokens

- client, Mac: don't do res_init(). It causes a crash.

- client (Unix): if client crashes while benchmark processes are going, make sure they detect this and exit.

- Client: don't do res_init() on Mac, if client crashes during benchmarks, exit benchmarks on UNIX.

- Mac: Remove -lresolv from XCode linker flags for client, manager, boinccmd and screensaver

- Mac client: fix parent died test in benchmark_time_to_stop()

Based on this thread
____________
POLISH NATIONAL TEAM - Join! Crunch! Win!

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10009 - Posted: 20 May 2009 | 16:02:49 UTC

6.6.29 also released for Linux and Linux 64.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10021 - Posted: 21 May 2009 | 10:12:21 UTC

Not perfect, but better than 6.6.20 for sure. Can someone summarize which serious, confirmed problems people have to expect from this, compared to 6.5.0 / 6.4.7? Anything other than the "long term debts out of whack"?

MrS
____________
Scanning for our furry friends since Jan 2002

Renato
Joined: 22 Oct 08
Posts: 4
Credit: 5,819,617
RAC: 0
Message 10025 - Posted: 21 May 2009 | 11:00:50 UTC - in response to Message 10021.

Hello

Every BOINC Manager version after 6.6.20 suspends GPUGRID tasks
and runs only 8 CPU tasks with QMC.

System: Windows 7 64-bit, i7-920, NV260

Renato

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10030 - Posted: 21 May 2009 | 11:44:10 UTC - in response to Message 10025.

link

MrS
____________
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10038 - Posted: 21 May 2009 | 15:22:09 UTC

I guess you mean me? Well, if so:

** The opt-in flag for GPU use is now opt-out, so you must change your preferences to enable use of the GPU.

** Long-term debt (LTD) grows asymmetrically, which means that at some point the expected queue of work for either the GPU or the CPU projects will suffer. This depends on the project mix, but it is driven more by the speed ratio between the GPU's capabilities and the CPU's. My Q9300 with a GTX280 took a couple of weeks to get out of sorts, while my i7 with two GTX295 cards lasts at most 4 days or so. In my case it is almost always the GPU side that stops filling the queue.

** The Resource Scheduler is more stable and respects Task Switch Interval (TSI) under most conditions but is still prone to put tasks into High Priority mode and preempt other running tasks inappropriately.

** There is a timing bug (more common on SaH) where the download of a task can cause the preemption of another running task, and yet the preempting task immediately fails because it was not initialized (I hope I have summarized this one correctly; Richard Haselgrove reported it). (New Trac 897: If a CUDA task is scheduled to run immediately, an existing CUDA task may be pre-empted to make way for it. A delay has been introduced to allow the preempted task to exit fully and release allocated memory. If this delay is invoked, the scheduled task is eventually called with a "Resume" instead of an "Initial" status, and fails because the data files are not available.)

** Work fetch can ask for the wrong classes of work from projects: for example, asking for CPU work from GPU Grid and GPU work from IBERCIVIS. The expectation is that over time the inappropriate requests will slow to one per day because of exponential back-off. The usual message may indicate that work was requested but no work is available, but you have to turn on debug logs to see that the wrong class of work was requested. (New Trac 896: If a client has a CUDA work shortfall, but no CUDA project is fetchable, it may issue a CPU work fetch instead. If there is no CPU shortfall, the work fetch is issued for 0.00 seconds.)

** Duration Correction Factor (DCF) is not properly calculated under a variety of conditions. In the case of GPU Grid the Kashi class tasks that run for twice as long as normal will bias the DCF (though GDF indicated recent changes to the tasks)

** The new test for CUDA card compatibility may cause the second of any set of installed GPUs to be locked out. You have to use the cc_config flag (<use_all_gpus>) to enable the use of all GPUs.

** Other projects may start or end tasks in ways that cause long delays while the task spins up. These events may cause the BOINC Client to miss heartbeat signals, and if it does, the other running science applications will terminate. Once the slow task is running, the BOINC Client restarts the other applications. No log event indicates that this has happened; the primary symptom is a "no heartbeat" message in the STDOUT of completed tasks.

** I have seen an install bug where the manager will not launch from the installer. Closing the error pop-up, exiting the installer and starting BOINC from my Start button works (a reboot works too).
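The claim that inappropriate work requests will "slow to one per day" describes a capped exponential back-off. A minimal sketch, assuming a doubling interval that starts at a hypothetical 60 s and is capped at 86,400 s (one day); the client's real back-off parameters and any randomization are not modeled:

```python
def backoff_schedule(n_requests, min_interval=60.0, max_interval=86400.0):
    """Wait (seconds) before each successive request when every request comes back empty.

    Hypothetical parameters: the wait doubles from min_interval and is capped at one day.
    """
    return [min(max_interval, min_interval * 2 ** k) for k in range(n_requests)]


waits = backoff_schedule(15)
print(waits[:4])  # [60.0, 120.0, 240.0, 480.0]
print(waits[-1])  # 86400.0 -> at most one request per day
```

With these assumed numbers the cap is reached on the 12th consecutive empty request; after that the wrong-class request costs one scheduler contact per day, which is why it is easy to miss without debug logging.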

_hiVe*
Joined: 18 Feb 09
Posts: 12
Credit: 13,424,069
RAC: 15,818
Message 10098 - Posted: 23 May 2009 | 18:18:14 UTC - in response to Message 10038.

Well, based on what Paul summarized, it seems like a pretty crappy release to me (again).
Btw, is it just me, or are all these late BOINC releases generally seriously flawed? -.-

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10099 - Posted: 23 May 2009 | 18:47:46 UTC - in response to Message 10098.

Well, based on what Paul summarized, it seems like a pretty crappy release to me (again).
Btw, is it just me, or are all these late BOINC releases generally seriously flawed? -.-

I could make the argument that almost all the BOINC versions have major flaws. Going all the way back. It is mostly an exercise in which flaws do you want to live with and is the version "good enough" for work...

My major complaint, such as it is, is that there is no serious effort to harness the abilities of the community to address the weaknesses and to put forth efforts to correct the flaws. Ageless in MW took me to task with an observation that I am unduly hard on Dr. Anderson and without him blah, blah ... (a paraphrase of what he said)

Yes, and no... If I call Dr. Anderson out, it is because he has assumed for himself the mantle of the final arbiter of BOINC. When you do that ... you own it all ... the good, the bad, the indifferent ... Sadly, and in my personal opinion, his leadership would not even make a good doorstop.

Back onto your point ... 6.6.28 is good enough to get work done. Its work fetch and work scheduling are still pretty bad. Saddest is that I don't see major efforts going into code changes that would correct the known issues.

_hiVe*
Joined: 18 Feb 09
Posts: 12
Credit: 13,424,069
RAC: 15,818
Message 10101 - Posted: 23 May 2009 | 19:35:24 UTC - in response to Message 10099.
Last modified: 23 May 2009 | 19:37:03 UTC

Fortunately I don't concern myself all that much with it...:)
But still, it makes me wonder why it is the way it is: why not iron out the "bugs" first and then open up the ground for new problems...
In the end, however, based on the insufficient information at my disposal (and the lack of effort to find out more^^), I reach the conclusion that perhaps there's more to it than what we volunteers can deduce.

Aah, yet it would be great to have a flawless platform that does not hinder the progress of the projects being run on it; hopefully we'll have one soon~

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10105 - Posted: 23 May 2009 | 20:25:34 UTC

To me the worst issue is that they don't factor in which project uses which compute resource. Thus either the GPU or the CPU accumulates so much long-term debt that the corresponding resource runs dry. Sorry, but "ridiculous" does not even get remotely close to describing this.
And it leads me to the bold statement that anything built on such a flawed foundation is not going to work well.

MrS
____________
Scanning for our furry friends since Jan 2002

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 155,858
Message 10112 - Posted: 23 May 2009 | 22:00:49 UTC - in response to Message 10105.

For at least the 6.6.20 version (which I'm getting ready to replace with 6.6.28 on my machine), if you use the BOINC Manager to look at the properties of various projects, all of them currently have Non CPU intensive set to No. This looks like a likely name for a setting for projects that mostly use the GPU, so are you interested in looking for more information on just what it is intended to do?

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10117 - Posted: 24 May 2009 | 0:14:44 UTC - in response to Message 10112.

For at least the 6.6.20 version (which I'm getting ready to replace with 6.6.28 on my machine), if you use the BOINC Manager to look at the properties of various projects, all of them currently have Non CPU intensive set to No. This looks like a likely name for a setting for projects that mostly use the GPU, so are you interested in looking for more information on just what it is intended to do?

Actually it is a setting that only indicates that the project does not use much CPU. Granted, GPU Grid falls into that category, but not by the BOINC definition. The only projects running right now that are NCI are QCN and FreeHAL. To run QCN you need either a laptop of the right kind with an accelerometer or a stationary machine for which you buy an accelerometer (supported OSes are OS X, Windows, and Linux).

FreeHAL is a project that is doing text search and classification and has (most of the time, unless bugs creep in) a very low load.

Almere Grid and the Almere Test Grid both should be NCI projects but are not so classed.

Sadly, Dr. Anderson has decreed that NCI projects can only have one task on the machine at a time (per project) and that you cannot queue work. FreeHAL has modified the scheduler to violate that rule so that you can run multiple tasks, but you still cannot effectively queue work, because the Resource Scheduler in the client will always run all existing NCI tasks on the machine. So if you download 25 tasks at a time from FreeHAL, you will run all 25 at the same time.
