New batch of QC tasks (QMML)

Message boards : Multicore CPUs : New batch of QC tasks (QMML)

Author	Message
Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48356 - Posted: 13 Dec 2017 \| 17:30:24 UTC
	These are called QMML, and rather experimental (more dependencies). Let's see how they work.
	ID: 48356 \| Rating: 0 \| rate: / Reply Quote

captainjack Send message Joined: 9 May 13 Posts: 171 Credit: 3,570,519,747 RAC: 18,202,645 Level Scientific publications	Message 48358 - Posted: 13 Dec 2017 \| 21:21:51 UTC
	Toni, I have one of these that has been running for over 2 hours and so far looks like it is only using one CPU (thread). It has 4 threads allocated to each task. There are also a number of warnings messages in the stderr.txt that look like this: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/envs/qmml/lib/python3.6/site-packages/tables/path.py:112: NaturalNameWarning: object name is not a valid Python identifier: '122'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though NaturalNameWarning) You should be able to see all of them when the task uploads. Let me know if you need more info.
	ID: 48358 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48359 - Posted: 13 Dec 2017 \| 22:10:31 UTC - in response to Message 48358. Last modified: 13 Dec 2017 \| 22:14:22 UTC
	Thanks, the warnings are expected and harmless. The thread allocation has some bug. I would have expected to use more threads than allocated, not less, but hey. I'll be debugging. Also I hope suspend/resume and the progress bar work (more or less), unlike the old "plain" QC tasks.
	ID: 48359 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48362 - Posted: 14 Dec 2017 \| 0:44:20 UTC
	Subj: Observation of QC CPU WU's Core Utilization Regarding the last go around several weeks ago with the QC CPU WU's that ran successfully on my 8 and 4 core cpu's, the following was observed. 4-core (Phenom II 3 GHz) cpu: core utilization was consistantly near 100% all 4 cores. Work completed in about 40 minutes per WU calender time. 8-core (FX-8350 4 GHz) cpu's (2): 4-cores (alternatively) were utilized at 100% with the remaining cores at much lesser utilization. Work completed with about 20 minutes real time per WU. With the most recent QC WU's (12/13/2017), one FX-8350 errors out consistantly and will not run the WU's (maybe it requires software or drivers not currently installed) and the other FX-8350 runs them but at a substanially less core utilization rate than the prior WU batches. So far, the first WU in progress is currently less than 40% complete with about 2 hours calender time invested. Not sure if these recent WU's are of the same length as the earlier ones however. I have uploaded photos of ksysguard graphic cpu utilization that can be observed if interested with addresses below. Basically, as I am sure that this is a work in progress and bugs need to be resolved but I would conclude so far that these WU's process efficiently on a 4-core system but that all 8-cores should be fully utilized to make it worth while sacrificing 8-cores to a single WU when 8 WU's from other projects use all 8 at 100% thereby being much more efficient. Haven't tried the suspend/resume yet but will when the opprotunity is available. The previous poster appears correct re thread utilization being one issue. Screenshots: http://members.toast.net/obc/computing/grid_computing/images/QC_4-core_cpu.png http://members.toast.net/obc/computing/grid_computing/images/QC_8-core_cpu.png http://members.toast.net/obc/computing/grid_computing/images/QC-12-13-2017.png (Sorry, crude way to present photos but wanted to get them up for anyone interested)
	ID: 48362 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48363 - Posted: 14 Dec 2017 \| 0:52:44 UTC Last modified: 14 Dec 2017 \| 0:54:04 UTC
	Mine is also just a bit less than 1 thread with 7 available for CPU usage. Good thing I have another client available to keep the CPU busy. If the progress bar is correct it will take about 9.5 hours on 1950x at 3.75 GHz.
	ID: 48363 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48364 - Posted: 14 Dec 2017 \| 8:44:09 UTC - in response to Message 48363.
	The next batch (QMML313a) should respect the number of threads requested by your client.
	ID: 48364 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48365 - Posted: 14 Dec 2017 \| 11:26:38 UTC
	The Multiple threaded work units that were sent out last month worked fine for me with no issues. These new ones however are all failing (so far on two different 64bit Linux machines) with this error ERROR conda.core.link:_execute_actions(337): An error occurred while installing package 'psi4::gcc-5-5.2.0-1'. LinkError: post-link script failed for package psi4::gcc-5-5.2.0-1 running your command again with `-v` will provide additional information location of failed script: /home/Conan/BOINC/projects/www.gpugrid.net/miniconda/envs/qmml/bin/.gcc-5-post-link.sh When checking this path I found that there is nothing in the /envs/ folder, which is probably where the job is failing. Conan
	ID: 48365 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48366 - Posted: 14 Dec 2017 \| 11:58:44 UTC
	Hmm 1st one completed for me and the 2nd one is at around 86%.
	ID: 48366 \| Rating: 0 \| rate: / Reply Quote

captainjack Send message Joined: 9 May 13 Posts: 171 Credit: 3,570,519,747 RAC: 18,202,645 Level Scientific publications	Message 48370 - Posted: 14 Dec 2017 \| 19:36:39 UTC
	Toni said, The next batch (QMML313a) should respect the number of threads requested by your client. It looks like this one does respect the number of threads requested by my client. My app_config specifies 4 threads and it looks to be using 4 threads. Let me know if you need more info.
	ID: 48370 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48371 - Posted: 14 Dec 2017 \| 19:56:33 UTC
	I am not able to get QC on my Ryzen 1700 machine running Ubuntu 17.10. I just get a "No tasks are available for Quantum Chemistry" message when I request them. However, I am able to get QC on my i7 3770 machine running Ubuntu 16.04 (both machines have BOINC 7.8.3). Both machines are set to the same profile (work), so they should be treated identically. But I see that some people with AMD machines get work. Is this a bug or a feature?
	ID: 48371 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48373 - Posted: 14 Dec 2017 \| 23:43:10 UTC - in response to Message 48371.
	The reason why otherwise similar machines get/do not get work completely baffles me. I don't think it's related to the maker of the CPU. Perhaps with the history of tasks/host reliability or somesuch. With this respects BOINC is of no help.
	ID: 48373 \| Rating: 0 \| rate: / Reply Quote

bcavnaugh Send message Joined: 8 Nov 13 Posts: 56 Credit: 1,002,640,163 RAC: 0 Level Scientific publications	Message 48374 - Posted: 15 Dec 2017 \| 0:11:25 UTC - in response to Message 48356. Last modified: 15 Dec 2017 \| 0:12:32 UTC
	These are called QMML, and rather experimental (more dependencies). Let's see how they work. I would really like to get some tasks but seems they are not being given out ATM http://www.gpugrid.net/show_host_detail.php?hostid=457056 Been Trying for awhile now. Intel(R) Core(TM) i7-3970X ____________ Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.
	ID: 48374 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48375 - Posted: 15 Dec 2017 \| 2:10:38 UTC - in response to Message 48373.
	The reason why otherwise similar machines get/do not get work completely baffles me. I have seen many instances of it myself over the years, but had hoped the latest BOINC clients were past that. Unfortunately not.
	ID: 48375 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48376 - Posted: 15 Dec 2017 \| 2:54:01 UTC Last modified: 15 Dec 2017 \| 3:00:20 UTC
	My 1950x also on 17.10 is getting tasks. Were the old 3.13 tasks producing bad data as my processing time was just wasted by the server canceling them in the middle of processing them? Thats the absolute worst thing a project admin can do. Cancel ones not started but don't ever take a crap on donated resources. Its still not working right. In 22.5min of the task running it has used 1:37min of CPU time when the task is limited to 3 cores. That's over 4 cores of CPU usage. And they just had a computation error. At least 3.13 worked. http://www.gpugrid.net/result.php?resultid=16767178 Prob a good thing tasks are being sent to some AMD CPUs. Damn seg fault.
	ID: 48376 \| Rating: 0 \| rate: / Reply Quote

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,719,275,486 RAC: 2,163,447 Level Scientific publications	Message 48378 - Posted: 15 Dec 2017 \| 4:25:33 UTC
	Most of the Quantum Chemistry v3.14 (mt) fail on my AMD 1700x. v3.13 (mt) worked more or less. As an example: http://www.gpugrid.net/result.php?resultid=16767309 I use an app_config to limit the use to 4 cores for each work unit (WU) and runs 3 WUs in parallel. Two cores are reserved for the GPU. I had to change the configuration to accept only GPU Work Unites as the computer crashed twice today. Hope this helps.
	ID: 48378 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48379 - Posted: 15 Dec 2017 \| 8:58:27 UTC - in response to Message 48378. Last modified: 15 Dec 2017 \| 8:59:36 UTC
	I understand that seing cancelled WUs is not nice, but it saves future crunching and network bandwidth (both server and client) that would otherwise be lost. Also, I thought that the function we use only cancelled UNSENT or un started wus.
	ID: 48379 \| Rating: 0 \| rate: / Reply Quote

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 48380 - Posted: 15 Dec 2017 \| 15:20:57 UTC Last modified: 15 Dec 2017 \| 15:24:44 UTC
	Hey friends, in case you need some more machines for testing, I can set up another one that is Linux based. As I have seen in the below conversation, there might be some issues with the CPU type. So .. I have both brands available for testing. Which one would help you most, Intel or AMD? If you want me to, I could even let you choose the generation. From older Sandy/Ivy Bridge to new Skylake to Ryzen. Lust let me know and I will make one available on short notice. Edit: I can even contribute a very slow Celeron or Pentium, if that would give you some information on how older and slower systems will perform later... as there still are many older units out there. ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 48380 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48381 - Posted: 15 Dec 2017 \| 16:15:44 UTC
	Quite a few errors on 3.14 so I stopped running QC. Seg faults on AMD and Intel machines.
	ID: 48381 \| Rating: 0 \| rate: / Reply Quote

bcavnaugh Send message Joined: 8 Nov 13 Posts: 56 Credit: 1,002,640,163 RAC: 0 Level Scientific publications	Message 48383 - Posted: 15 Dec 2017 \| 16:43:14 UTC
	So I did get some Tasks but it seems that only AMD Processors can run them. http://www.gpugrid.net/show_host_detail.php?hostid=179948 A nice setting in the Preference for this Tasks would be to allow us to set the number of Cores; Like on a 32 Core Host you could set 2 Tasks running 16 Cores Each. Or even 2 Tasks running 8 Cores Each. ____________ Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.
	ID: 48383 \| Rating: 0 \| rate: / Reply Quote

el_gallo_azul Send message Joined: 14 Jun 14 Posts: 9 Credit: 28,094,797 RAC: 0 Level Scientific publications	Message 48384 - Posted: 16 Dec 2017 \| 8:20:48 UTC Last modified: 16 Dec 2017 \| 8:21:40 UTC
	I received ~60 new WUs yesterday, but I didn't see what happened with them. I was surprised when I went back to the computer an hour or so later and they had all disappeared. I received another batch of ~60 WUs today, and this time I see that they all resulted in "Computation error". Intel Xeon E5-2680 x 2 (ie. 32 hyperthreading cores).
	ID: 48384 \| Rating: 0 \| rate: / Reply Quote

bormolino Send message Joined: 16 May 13 Posts: 41 Credit: 88,126,864 RAC: 833 Level Scientific publications	Message 48386 - Posted: 16 Dec 2017 \| 11:53:50 UTC
	I'm not getting any WUs, neither on my AMD, nor on my Intel CPUs.
	ID: 48386 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48388 - Posted: 16 Dec 2017 \| 17:22:23 UTC - in response to Message 48386.
	Let me summarize the current status. We are making tests in view of a large production run. The WUs which are out now are called QMML314long and last several hours. This longer test have a couple of new failure modes which I think are related to restarts, and can be fixed. Another different problem is task distribution by the BOINC scheduler. First of all, as said above, some hosts are ignored for no reason I can fathom. Another is that some hosts are "soaking up" dozens of WUs, which means they are not available to others. I am hoping that both problems will sort out by themselves with a sufficiently large batch. Final notes: (a) CPU maker is irrelevant. (b) disappeared WUs were tests which I cancelled from the server.
	ID: 48388 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48390 - Posted: 17 Dec 2017 \| 2:57:50 UTC - in response to Message 48365. Last modified: 17 Dec 2017 \| 2:58:31 UTC
	The Multiple threaded work units that were sent out last month worked fine for me with no issues. These new ones however are all failing (so far on two different 64bit Linux machines) with this error ERROR conda.core.link:_execute_actions(337): An error occurred while installing package 'psi4::gcc-5-5.2.0-1'. LinkError: post-link script failed for package psi4::gcc-5-5.2.0-1 running your command again with `-v` will provide additional information location of failed script: /home/xxxxxxxx/BOINC/projects/www.gpugrid.net/miniconda/envs/qmml/bin/.gcc-5-post-link.sh When checking this path I found that there is nothing in the /envs/ folder, which is probably where the job is failing. Conan Just an update as I am still am getting these errors, this is the full error CondaValueError: prefix already exists: /home/xxxxxxxx/BOINC/projects/www.gpugrid.net/miniconda/envs/qmml ERROR conda.core.link:_execute_actions(337): An error occurred while installing package 'psi4::gcc-5-5.2.0-1'. LinkError: post-link script failed for package psi4::gcc-5-5.2.0-1 running your command again with `-v` will provide additional information location of failed script: /home/xxxxxxxx/BOINC/projects/www.gpugrid.net/miniconda/envs/qmml/bin/.gcc-5-post-link.sh ==> script messages <== <None> Attempting to roll back. LinkError: post-link script failed for package psi4::gcc-5-5.2.0-1 running your command again with `-v` will provide additional information location of failed script: /home/xxxxxxxx/BOINC/projects/www.gpugrid.net/miniconda/envs/qmml/bin/.gcc-5-post-link.sh ==> script messages <== <None> Traceback (most recent call last): File "pre_script.py", line 20, in <module> raise Exception("Error installing psi4 dev") Exception: Error installing psi4 dev 10:18:33 (23979): $PROJECT_DIR/miniconda/bin/python exited; CPU time 69.668408 10:18:33 (23979): app exit status: 0x1 10:18:33 (23979): called boinc_finish(195) Conan
	ID: 48390 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48392 - Posted: 17 Dec 2017 \| 14:04:51 UTC - in response to Message 48390.
	@conan: do you have "gcc" installed in your system? If not, can you try to install it?
	ID: 48392 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48393 - Posted: 17 Dec 2017 \| 15:15:43 UTC - in response to Message 48392. Last modified: 17 Dec 2017 \| 15:18:09 UTC
	@conan: do you have "gcc" installed in your system? If not, can you try to install it? It was installed on on one computer with Fedora 25, but was not installed on the other two with Fedora 16 and Fedora 21, all 64 bit. Have installed now and await to see what happens. Versions range from 4.6.3-2 (Fedora 16), 4.9.2-6 (Fedora 21) to 6.4.1-1 (Fedora 25). Thanks Conan
	ID: 48393 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48394 - Posted: 17 Dec 2017 \| 19:28:37 UTC - in response to Message 48388.
	Another different problem is task distribution by the BOINC scheduler. First of all, as said above, some hosts are ignored for no reason I can fathom. Another is that some hosts are "soaking up" dozens of WUs, which means they are not available to others. I am hoping that both problems will sort out by themselves with a sufficiently large batch. Final notes: (a) CPU maker is irrelevant. (b) disappeared WUs were tests which I cancelled from the server. Yes! I just got some QC on my Ryzen 1700. All good things come to those that wait. (The first four errored out after a couple of minutes, but the fifth one is running fine after 50 minutes and I think it will fly, running two cores on each WU.)
	ID: 48394 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48395 - Posted: 17 Dec 2017 \| 19:36:10 UTC - in response to Message 48394. Last modified: 17 Dec 2017 \| 19:37:00 UTC
	I may have understood the problem of hosts not getting WUs. I was sending tasks at a high priority, which means they crossed the threshold to only go to "reliable hosts" -- a questionable heuristic. 100 tasks named "s*-QMML314long" I made at a lower priority seem to have been sent quickly.
	ID: 48395 \| Rating: 0 \| rate: / Reply Quote

bormolino Send message Joined: 16 May 13 Posts: 41 Credit: 88,126,864 RAC: 833 Level Scientific publications	Message 48396 - Posted: 17 Dec 2017 \| 20:07:46 UTC
	I got my first WU today. Unfortunately the WU needs 4,7 GB of ram. Can you optimise that?
	ID: 48396 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48397 - Posted: 17 Dec 2017 \| 23:41:55 UTC - in response to Message 48393.
	@conan: do you have "gcc" installed in your system? If not, can you try to install it? It was installed on on one computer with Fedora 25, but was not installed on the other two with Fedora 16 and Fedora 21, all 64 bit. Have installed now and await to see what happens. Versions range from 4.6.3-2 (Fedora 16), 4.9.2-6 (Fedora 21) to 6.4.1-1 (Fedora 25). Thanks Conan The Fedora 16 host still has the same error, but the Fedora 21 host is processing a work unit now for the last 8 hours 21 minutes and 68% done, so it looks good at this point. My Fedora 25 host has not received any work yet so can't say about that one. My WU is using 1.5 GB of RAM. Thanks Conan
	ID: 48397 \| Rating: 0 \| rate: / Reply Quote

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,719,275,486 RAC: 2,163,447 Level Scientific publications	Message 48398 - Posted: 18 Dec 2017 \| 2:58:48 UTC - in response to Message 48395.
	I may have understood the problem of hosts not getting WUs. I was sending tasks at a high priority, which means they crossed the threshold to only go to "reliable hosts" -- a questionable heuristic. 100 tasks named "s*-QMML314long" I made at a lower priority seem to have been sent quickly. This solved it for my second computer. Works on a USB Stick with Lubuntu 17.04. Unfortunatelly, crashed: http://www.gpugrid.net/result.php?resultid=16776102
	ID: 48398 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48400 - Posted: 18 Dec 2017 \| 5:31:13 UTC - in response to Message 48397.
	@conan: do you have "gcc" installed in your system? If not, can you try to install it? It was installed on on one computer with Fedora 25, but was not installed on the other two with Fedora 16 and Fedora 21, all 64 bit. Have installed now and await to see what happens. Versions range from 4.6.3-2 (Fedora 16), 4.9.2-6 (Fedora 21) to 6.4.1-1 (Fedora 25). Thanks Conan The Fedora 16 host still has the same error, but the Fedora 21 host is processing a work unit now for the last 8 hours 21 minutes and 68% done, so it looks good at this point. My Fedora 25 host has not received any work yet so can't say about that one. My WU is using 1.5 GB of RAM. Thanks Conan This WU on the Fedora 21 host worked and completed successfully, my first of this batch. Conan
	ID: 48400 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48401 - Posted: 18 Dec 2017 \| 7:48:05 UTC - in response to Message 48398.
	@klepel - can you try installing gcc (if not already there)? tks
	ID: 48401 \| Rating: 0 \| rate: / Reply Quote

Dayle Diamond Send message Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0 Level Scientific publications	Message 48402 - Posted: 18 Dec 2017 \| 9:56:27 UTC
	https://drive.google.com/file/d/1bKmSXT4IAVTR8b-fpiGdC6Gm4szduk0X/view?usp=sharing Running one now. 1950x, Linux 17.10. Average time taken is two hours, fifteen minutes per task. At the time of the screenshot, the work unit is around fourty percent done. I'm watching my CPU usage hit 100%, stay there for a while, then...waves. I don't think it's thermal throttling. It's not overclocked and WCG tasks only make those patterns when tasks are starting/finishing. If it's working as intended, ok.
	ID: 48402 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48403 - Posted: 18 Dec 2017 \| 10:10:17 UTC - in response to Message 48401. Last modified: 18 Dec 2017 \| 10:20:18 UTC
	What are the requirements for the running of Psi4? My Fedora 16 computer after installing "gcc" is still getting the same error that it failed to install, but the Fedora 21 computer is now running fine. Is there a certain "glibc", "gcc" or Linux kernel that is required to install this programme? My older Fedora 16 install may not meet the requirements perhaps? I still can't get any work on my Intel Xeon running Fedora 25, keeps saying that there is no work available when in fact there is, but that is another issue. Thanks Conan
	ID: 48403 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48405 - Posted: 18 Dec 2017 \| 12:44:38 UTC - in response to Message 48403.
	@dayle - oscillating CPU% is expected and due to the parts of the calculation which are not parallelized. Thermal throttling is unlikely imho (and I imagine it would manifests as a decrease in CPU clock, not CPU%). @conan - in principle a system with GLIBC>=2.14 should be capable to run; Fedora 16 seemed to have it but it is old, so probably something else is missing. Sorry. I've made another thread with information which may be useful.
	ID: 48405 \| Rating: 0 \| rate: / Reply Quote

bormolino Send message Joined: 16 May 13 Posts: 41 Credit: 88,126,864 RAC: 833 Level Scientific publications	Message 48406 - Posted: 18 Dec 2017 \| 13:18:30 UTC - in response to Message 48397.
	My WU is using 1.5 GB of RAM. 18-Dec-2017 14:07:22 [GPUGRID] Quantum Chemistry needs 4768.37 MB RAM but only 3469.24 MB is available for use.
	ID: 48406 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48407 - Posted: 18 Dec 2017 \| 13:33:47 UTC - in response to Message 48406. Last modified: 18 Dec 2017 \| 13:34:25 UTC
	We are talking about 3 different memory use figures: A. The amount of memory actually used (which varies with time), which Conan measuread as 1.5 GB B. The amount "requested" by the workunit, currently 4 GB C. The maximum amount your boinc client allows to use (you can configure this to some extent) The following should hold: A < B < C In your case, B>C and therefore the WU was not allowed to start (I guess).
	ID: 48407 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48409 - Posted: 18 Dec 2017 \| 14:23:14 UTC - in response to Message 48405.
	@dayle - oscillating CPU% is expected and due to the parts of the calculation which are not parallelized. Thermal throttling is unlikely imho (and I imagine it would manifests as a decrease in CPU clock, not CPU%). That one is a little confusing. The remaining time also increases as a consequence. I think I aborted some unnecessarily when it appeared that they were stuck. I now just let them run. Maybe you should make a big sticky on it to catch people's attention?
	ID: 48409 \| Rating: 0 \| rate: / Reply Quote

langfod Send message Joined: 15 Dec 17 Posts: 2 Credit: 5,577,735 RAC: 0 Level Scientific publications	Message 48413 - Posted: 18 Dec 2017 \| 17:50:23 UTC
	Did tasks just get aborted by the system? Name s51-TONI_QMML314long-0-1-RND1523_1 Workunit 12930407 Exit status 202 (0xca) EXIT_ABORTED_BY_PROJECT <core_client_version>7.8.3</core_client_version> <![CDATA[ <message> aborted by project - no longer usable</message> <stderr_txt>
	ID: 48413 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48414 - Posted: 18 Dec 2017 \| 18:02:01 UTC - in response to Message 48413. Last modified: 18 Dec 2017 \| 18:03:03 UTC
	Did tasks just get aborted by the system? Yes. I just had a bunch aborted at 13:44 UTC. But there are now new ones in the pipeline.
	ID: 48414 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48415 - Posted: 18 Dec 2017 \| 18:32:13 UTC - in response to Message 48414. Last modified: 18 Dec 2017 \| 18:32:27 UTC
	Can you please confirm that those WUs were cancelled while running and not just while waiting to start?
	ID: 48415 \| Rating: 0 \| rate: / Reply Quote

langfod Send message Joined: 15 Dec 17 Posts: 2 Credit: 5,577,735 RAC: 0 Level Scientific publications	Message 48416 - Posted: 18 Dec 2017 \| 18:41:43 UTC - in response to Message 48415.
	All I can tell is that the one task I had running was a couple hours from completion the last I looked. Then I checked the task list and saw the cancellation: 16776352 12930407 457243 18 Dec 2017 \| 6:16:48 UTC 18 Dec 2017 \| 14:17:56 UTC Cancelled by server 28,813.53 238,856.00 --- Quantum Chemistry v3.14 (mt)
	ID: 48416 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48417 - Posted: 18 Dec 2017 \| 19:11:56 UTC - in response to Message 48415. Last modified: 18 Dec 2017 \| 19:17:36 UTC
	Can you please confirm that those WUs were cancelled while running and not just while waiting to start? On my i7-4770 machine, there were 13 aborted at 13:43:51 UTC. Twelve of them show 0 elapsed time, but the other one shows 05:02:06 (19:52:01) elapsed time. They are all listed as "cancelled by server". And on an i7-3770 machine, three of them completed just after that, at 13:45:09 UTC, after running for around 24 hours or more each, and all show "cancelled by server". Finally, on my Ryzen 1700 machine, two of them completed at 13:52:55 UTC and show "cancelled by server" after running about 18 to 19 hours. So it works. EDIT: But BoincTasks shows the i7-4770 and the Ryzen 1700 machines as "Reported: OK+", so it is only on the GPUGrid status page that the true story is told apparently.
	ID: 48417 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48418 - Posted: 18 Dec 2017 \| 19:30:51 UTC - in response to Message 48417.
	It's true that running tasks are being killed. This is not what I expected. By the way: these WUs should not run 10+ hours on modern CPUs. That's strange.
	ID: 48418 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48419 - Posted: 18 Dec 2017 \| 20:59:20 UTC - in response to Message 48418.
	By the way: these WUs should not run 10+ hours on modern CPUs. That's strange. The i7-3770 machine and the Ryzen 1700 machine were running only 2 cores per work unit, while the i7-4770 was running 4 cores per work unit.
	ID: 48419 \| Rating: 0 \| rate: / Reply Quote

Dayle Diamond Send message Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0 Level Scientific publications	Message 48420 - Posted: 18 Dec 2017 \| 22:21:25 UTC
	It's true that running tasks are being killed. This is not what I expected. So far 76 tasks on my machine have been canceled. Please continue killing any task in progress if you don't want the data. No point squandering precious CPU cycles when the science/programming has moved on to a newer revision. Happy Holidays!
	ID: 48420 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48421 - Posted: 19 Dec 2017 \| 3:17:38 UTC - in response to Message 48415.
	Can you please confirm that those WUs were cancelled while running and not just while waiting to start? Yes I had one that had been running for 33,977 seconds (CPU time 140,002 seconds) and it was cancelled, as well as 2 that had not started. Just an aside to my Fedora 16 Host problems running these work units that are all failing, it is running 'gcc' 4.6. I did some reading on Psi4 and found that it seems to need gcc 4.9 or later in order to run. I have since installed this 'gcc' version on that computer and am awaiting a work unit to see if it works or not. There may still be something missing. I may just have to update Fedora 16 to something more recent. Conan
	ID: 48421 \| Rating: 0 \| rate: / Reply Quote

Petr Kriz Send message Joined: 22 Feb 09 Posts: 3 Credit: 114,900 RAC: 0 Level Scientific publications	Message 48422 - Posted: 19 Dec 2017 \| 8:05:25 UTC
	I have still problems with task miniconda-installer reached time limit 360. Tried 4 tasks today with same result (other 2 task I cancelled). Have standard Fedora 26, nothing special. I don't think the problem is in firewall or slow connection (as suggested in another thread). Is miniconda-installer really downloading something? I rather think, that there is something wrong with installation of files already downloaded on hard-drive. I think I will now wait some time, and come back later (maybe month or two). Hopefully, it will be resolved.
	ID: 48422 \| Rating: 0 \| rate: / Reply Quote

[VENETO] boboviz Send message Joined: 10 Sep 10 Posts: 158 Credit: 388,132 RAC: 0 Level Scientific publications	Message 48424 - Posted: 19 Dec 2017 \| 9:16:06 UTC
	I downloaded 3 wus 3.14 on my vbox linux. They don't start....."Waiting to run". No message on boinc manager.
	ID: 48424 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48425 - Posted: 19 Dec 2017 \| 10:28:30 UTC - in response to Message 48422.
	I have still problems with task miniconda-installer reached time limit 360. Tried 4 tasks today with same result (other 2 task I cancelled). Have standard Fedora 26, nothing special. I don't think the problem is in firewall or slow connection (as suggested in another thread). Is miniconda-installer really downloading something? I rather think, that there is something wrong with installation of files already downloaded on hard-drive. I think I will now wait some time, and come back later (maybe month or two). Hopefully, it will be resolved. Check that SELINUX is not blocking any files from running. I had this problem on my Fedora 25 install and had to create an exception for it. Also make sure your 'gcc' packages are up to date dnf install gcc, or dnf install gcc-c++, should help if you haven't already done so. Conan
	ID: 48425 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48426 - Posted: 19 Dec 2017 \| 10:32:19 UTC - in response to Message 48424. Last modified: 19 Dec 2017 \| 10:33:32 UTC
	@petr - miniconda is downloaded from our servers (~50 MB). After that, at the beginning, psi4 and other packages are downloaded from Anaconda's servers (only the first time). If you suspect a mixup, feel free to "reset" the GPUGRID project and everything should be deleted (and downloaded again at the next WU). Beware that it would kill running tasks! @conan - in principle part of the run is indeed to download its dependencies from Anaconda, including a GCC 5 version which is installed in the project's directory. However, to complete its installation, a library is needed which is generally shipped with... the system's GCC. It's indeed a bit confusing. Maybe your tweak solves the problem, maybe not.
	ID: 48426 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48428 - Posted: 19 Dec 2017 \| 12:16:24 UTC Last modified: 19 Dec 2017 \| 12:22:40 UTC
	@ Toni, when these work units run do they get to a certain point then just idle along on a single core for hours on end? I finally got a work unit to download to my Fedora 25 host, and it ran fine up to about 6 hours or so run time and 78.698% completed. After this it has been running on a single core for almost 8 hours now and the progress is still locked at 78.698%. The time to completion has increased from 1 hour 39 minutes to 3 hours 44 minutes and counting. What happened to the Multi-Threading I thought these work units were supposed to do? Run time is now approaching 14 hours and the % done has not moved, this is on a 16 core computer. Conan
	ID: 48428 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48429 - Posted: 19 Dec 2017 \| 12:47:20 UTC - in response to Message 48428.
	Run time is now approaching 14 hours and the % done has not moved, this is on a 16 core computer. I seem to recall problems on LHC/ATLAS when running on more than 8 cores, though I was not involved with the problem myself as I run only 7 cores there anyway. But you could try an app_config.xml to limit it to 8 cores.
	ID: 48429 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48430 - Posted: 19 Dec 2017 \| 13:14:27 UTC - in response to Message 48428.
	The computation is done looping over several molecules (~60 if i remember correctly). A checkpoint is written after each loop. Inside a loop there is a part which is multithreaded, and a part which is not. The relative sizes are different. So it's not strange that thread occupancy oscillates. Limiting the number of cores to, say, 4, via the client is ok.
	ID: 48430 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48431 - Posted: 19 Dec 2017 \| 14:25:37 UTC - in response to Message 48429.
	Run time is now approaching 14 hours and the % done has not moved, this is on a 16 core computer. I seem to recall problems on LHC/ATLAS when running on more than 8 cores, though I was not involved with the problem myself as I run only 7 cores there anyway. But you could try an app_config.xml to limit it to 8 cores. Someone did tests there. Atlas runs best around 3/4/5 threads. More threads are not utilized very well. I don't recall any mt BOINC app utilizing all threads at 8+. It would probably have to be a straight math project that calculates more #s in parallel to do that.
	ID: 48431 \| Rating: 0 \| rate: / Reply Quote

Petr Kriz Send message Joined: 22 Feb 09 Posts: 3 Credit: 114,900 RAC: 0 Level Scientific publications	Message 48433 - Posted: 19 Dec 2017 \| 16:20:43 UTC
	@Toni, Conan: Thanks to both of you. Looks like the SELINUX was blocking it. It's kind of black box for me. I have found this procedure, which I applied on my system. I hope, that I didn't open the Pandora's box instead :). But after this, the wu started to download additional pkgs and now the computation is running. I will see, if it will succeed, but so far all 6 cores (12 threads) runs at full speed (with some small slowdowns from time to time). So again thx for help.
	ID: 48433 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48438 - Posted: 19 Dec 2017 \| 22:18:59 UTC - in response to Message 48428. Last modified: 19 Dec 2017 \| 22:36:38 UTC
	@ Toni, when these work units run do they get to a certain point then just idle along on a single core for hours on end? I finally got a work unit to download to my Fedora 25 host, and it ran fine up to about 6 hours or so run time and 78.698% completed. After this it has been running on a single core for almost 8 hours now and the progress is still locked at 78.698%. The time to completion has increased from 1 hour 39 minutes to 3 hours 44 minutes and counting. What happened to the Multi-Threading I thought these work units were supposed to do? Run time is now approaching 14 hours and the % done has not moved, this is on a 16 core computer. Conan Well I got sick of waiting for this one to finish (it had now been running for over 22 hours still on 1 core) so I created an "app_config.xml" file and inserted that in the project folder and restarted the BOINC Client and Manager. The WU reset itself to 1.098% completed, 5 hours run time and 18 days 23 hours to completion. So 17 to 18 hours run time disappeared and all processing went as well, so apparently no checkpoints. It is now running on 8 cpus instead of 16 which had stopped other work for a day. Will now see what happens. EDIT:: Just after I posted the WU has jumped to 81.741% done and to completion has now dropped to 1 day 11 minutes. So appears to be working heaps better. Conan
	ID: 48438 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2343 Credit: 16,206,655,749 RAC: 261,147 Level Scientific publications	Message 48440 - Posted: 20 Dec 2017 \| 1:28:12 UTC - in response to Message 48438.
	It is now running on 8 cpus instead of 16 which had stopped other work for a day. Will now see what happens. Your host has 2 CPUs, both have 4 cores hyperthreaded, so the performance scaling will drop rapidly if you run more than 8 threads of Floating Point calculations (most of the science projects are using FP). To all multi-threaded CPU crunchers: Hyperthreaded CPUs have half as many cores as BOINC reports, so you should limit the threads utilized by the app to obtain optimal performance / reliability.
	ID: 48440 \| Rating: 0 \| rate: / Reply Quote

Dayle Diamond Send message Joined: 5 Dec 12 Posts: 84 Credit: 1,663,883,415 RAC: 0 Level Scientific publications	Message 48441 - Posted: 20 Dec 2017 \| 1:32:50 UTC
	If these workunits are gonna take an average of Fifty five hours of CPU time, they really shouldn't crash when I reboot the system. https://www.gpugrid.net/workunit.php?wuid=12932734 Needed to apply updates. Waited until one finished uploading before risking it. Glad I waited. This one was active for less than five minutes. Select Language▼ Twitter Facebook Follow us on: \| Server status \| Dayle Diamond [log out] logo About Science Volunteers Performance Forum Join us Donate Name c448-TONI_QMML314rst-0-1-RND2050_0 Workunit 12932734 Created 18 Dec 2017 \| 17:32:38 UTC Sent 18 Dec 2017 \| 21:17:59 UTC Received 20 Dec 2017 \| 1:26:41 UTC Server state Over Outcome Computation error Client state Compute error Exit status 195 (0xc3) EXIT_CHILD_FAILED Computer ID 453935 Report deadline 23 Dec 2017 \| 21:17:59 UTC Run time 10.36 CPU time 6.37 Validate state Invalid Credit 0.00 Application version Quantum Chemistry v3.14 (mt) Stderr output <core_client_version>7.8.3</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61)</message> <stderr_txt> 17:21:56 (70558): wrapper (7.7.26016): starting 17:21:56 (70558): wrapper (7.7.26016): starting 17:21:56 (70558): wrapper: running ../../projects/www.gpugrid.net/Miniconda3-4.3.30-Linux-x86_64.sh (-b -f -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda) Python 3.6.3 :: Anaconda, Inc. 17:22:04 (70558): miniconda-installer exited; CPU time 6.370986 17:22:04 (70558): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py) CondaValueError: prefix already exists: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/envs/qmml forrtl: error (78): process killed (SIGTERM) Image PC Routine Line Source libpcm.so.1 00007F3B35B1D725 Unknown Unknown Unknown libpcm.so.1 00007F3B35B1B347 Unknown Unknown Unknown libpcm.so.1 00007F3B35A32AA2 Unknown Unknown Unknown libpcm.so.1 00007F3B35A328F6 Unknown Unknown Unknown libpcm.so.1 00007F3B35A00EFD Unknown Unknown Unknown libpcm.so.1 00007F3B35A04298 Unknown Unknown Unknown libpthread.so.0 00007F3B48D8E150 Unknown Unknown Unknown libmkl_def.so 00007F3B2293D916 Unknown Unknown Unknown forrtl: severe (174): SIGSEGV, segmentation fault occurred Image PC Routine Line Source libpcm.so.1 00007F3B35A04B9A Unknown Unknown Unknown libpthread.so.0 00007F3B48D8E150 Unknown Unknown Unknown Stack trace terminated abnormally. SIGSEGV: segmentation violation Stack trace (11 frames): ../../projects/www.gpugrid.net/wrapper_26198_x86_64-pc-linux-gnu[0x457672] /lib/x86_64-linux-gnu/libpthread.so.0(+0x13150)[0x7fa252998150] ../../projects/www.gpugrid.net/wrapper_26198_x86_64-pc-linux-gnu[0x494313] ../../projects/www.gpugrid.net/wrapper_26198_x86_64-pc-linux-gnu[0x4905b5] /lib/x86_64-linux-gnu/libc.so.6(+0x37140)[0x7fa2525dc140] /lib/x86_64-linux-gnu/libc.so.6(nanosleep+0x58)[0x7fa25267db98] /lib/x86_64-linux-gnu/libc.so.6(usleep+0x44)[0x7fa2526b0134] ../../projects/www.gpugrid.net/wrapper_26198_x86_64-pc-linux-gnu[0x467f2f] ../../projects/www.gpugrid.net/wrapper_26198_x86_64-pc-linux-gnu[0x40b1a1] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fa2525c61c1] ../../projects/www.gpugrid.net/wrapper_26198_x86_64-pc-linux-gnu[0x407ca2] Exiting... 17:24:57 (1324): wrapper (7.7.26016): starting 17:24:57 (1324): wrapper (7.7.26016): starting 17:24:57 (1324): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py) CondaValueError: prefix already exists: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/envs/qmml CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.continuum.io/pkgs/main/linux-64/repodata.json.bz2> Elapsed: - An HTTP error occurred when trying to retrieve this URL. HTTP errors are often intermittent, and a simple retry will get you on your way. ConnectionError(MaxRetryError("HTTPSConnectionPool(host='repo.continuum.io', port=443): Max retries exceeded with url: /pkgs/main/linux-64/repodata.json.bz2 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f19ff5e8898>: Failed to establish a new connection: [Errno -2] Name or service not known',))",),) Traceback (most recent call last): File "pre_script.py", line 13, in <module> raise Exception("Error installing h5py") Exception: Error installing h5py 17:24:58 (1324): $PROJECT_DIR/miniconda/bin/python exited; CPU time 0.257581 17:24:58 (1324): app exit status: 0x1 17:24:58 (1324): called boinc_finish(195) </stderr_txt> ]]> About Science Volunteers Performance Forum Join us Contact Google+ Facebook Twitter © 2017 Universitat Pompeu Fabra
	ID: 48441 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48444 - Posted: 20 Dec 2017 \| 4:23:12 UTC
	I finally finished a task so I can post now. Can someone explain what the QC app shows for Status in the BOINC Manager. I had a app_config.xml loaded to limit the number of cpu cores it was supposed to use to 4. However in the Status column it showed 16C for the number of cores allotted. Is that normal? Is that just how it describes itself to BOINC or was it really using all 16 cores? This was my app_confg.xml <app_config> <app> <name>acemdlong</name> <max_concurrent>1</max_concurrent> <gpu_versions> <gpu_usage>1</gpu_usage> <cpu_usage>1</cpu_usage> </gpu_versions> </app> <app> <name>acemdshort</name> <max_concurrent>1</max_concurrent> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>1</cpu_usage> </gpu_versions> </app> <app> <name>QC</name> <max_concurrent>1</max_concurrent> </app> <app_version> <app_name>QC</app_name> <plan_class>mt</plan_class> <avg_ncpus>4</avg_ncpus> <cmdline>--nthreads 4</cmdline> </app_version> </app_config> Does anyone see anything wrong with the app_config?
	ID: 48444 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48445 - Posted: 20 Dec 2017 \| 9:07:10 UTC - in response to Message 48444. Last modified: 20 Dec 2017 \| 9:09:36 UTC
	I finally finished a task so I can post now. Can someone explain what the QC app shows for Status in the BOINC Manager. I had a app_config.xml loaded to limit the number of cpu cores it was supposed to use to 4. However in the Status column it showed 16C for the number of cores allotted. Is that normal? Is that just how it describes itself to BOINC or was it really using all 16 cores? This was my app_confg.xml <app_config> <app> <name>acemdlong</name> <max_concurrent>1</max_concurrent> <gpu_versions> <gpu_usage>1</gpu_usage> <cpu_usage>1</cpu_usage> </gpu_versions> </app> <app> <name>acemdshort</name> <max_concurrent>1</max_concurrent> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>1</cpu_usage> </gpu_versions> </app> <app> <name>QC</name> <max_concurrent>1</max_concurrent> </app> <app_version> <app_name>QC</app_name> <plan_class>mt</plan_class> <avg_ncpus>4</avg_ncpus> <cmdline>--nthreads 4</cmdline> </app_version> </app_config> Does anyone see anything wrong with the app_config? Try <avg_ncpus>4.000000</avg_ncpus> The red highlighted line above, it works for me. Conan
	ID: 48445 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48446 - Posted: 20 Dec 2017 \| 9:08:52 UTC - in response to Message 48444. Last modified: 20 Dec 2017 \| 9:21:33 UTC
	Can someone explain what the QC app shows for Status in the BOINC Manager. I had a app_config.xml loaded to limit the number of cpu cores it was supposed to use to 4. However in the Status column it showed 16C for the number of cores allotted. Is that normal? Is that just how it describes itself to BOINC or was it really using all 16 cores? The app_config looks the same as mine. Did you reboot in order to activate it? In some of these multi-core projects, the Status is not updated until the next group of work units comes in after you have set the app_config. But a reboot usually fixes it.
	ID: 48446 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48449 - Posted: 20 Dec 2017 \| 10:46:22 UTC - in response to Message 48446.
	To clarify run times: all the QMML314rst wus are the same length. Even on a single core, they should not take longer than 20h maximum (on a relatively modern PC). The HTTP messages indicate a connectivity problem of course. I hope they cause a failure soon rather than remaining stuck. Re SElinux... I hope it leaves us in peace.
	ID: 48449 \| Rating: 0 \| rate: / Reply Quote

Sebastian M. Bobrecki Send message Joined: 4 Oct 09 Posts: 6 Credit: 110,801,812 RAC: 0 Level Scientific publications	Message 48454 - Posted: 20 Dec 2017 \| 13:27:54 UTC Last modified: 20 Dec 2017 \| 13:38:38 UTC
	After about 10h and reaching 69.568% app started to use only one core. What's worst it stays in that state for another 10h and perf is indicating that it's in OMP spinlock: 83.49% python libiomp5.so [.] __kmp_wait_yield_4 6.76% python libiomp5.so [.] __kmp_eq_4 5.74% python libiomp5.so [.] __kmp_yield 0.66% python [kernel.vmlinux] [k] entry_SYSCALL_64 ... Edit: On second machine it looks similar but after 6h and 78.698% it stays in that state for about 11h now. Perf: 84.60% python libiomp5.so [.] __kmp_wait_yield_4 6.80% python libiomp5.so [.] __kmp_eq_4 5.77% python libiomp5.so [.] __kmp_yield 0.59% python [kernel.vmlinux] [k] entry_SYSCALL_64 0.37% python [kernel.vmlinux] [k] __schedule ...
	ID: 48454 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48455 - Posted: 20 Dec 2017 \| 13:33:18 UTC - in response to Message 48449.
	Even on a single core, they should not take longer than 20h maximum (on a relatively modern PC). They are not behaving that well at all. I did not have any work units complete yesterday on four machines. That was running two cores each on two i7-3770s and four cores each on an i7-4770 and a Ryzen 1700. These machines are all Ubuntu 16/17, and run 24/7. http://www.gpugrid.net/results.php?userid=90514 They must loop back at some point, but I will let them run for another couple of days. By the way, posting is difficult as the website is often unaccessible for a few minutes at a time. Maybe that is related to some of problems some people are having, but I have not looked into it further.
	ID: 48455 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48458 - Posted: 20 Dec 2017 \| 14:54:10 UTC - in response to Message 48455.
	Two have just completed on my Ryzen 1700 (4 cores each). The elapsed time shows as 4 hours 10 minutes, but the CPU time is over two days. http://www.gpugrid.net/results.php?hostid=452287
	ID: 48458 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48460 - Posted: 20 Dec 2017 \| 17:10:57 UTC - in response to Message 48446.
	Can someone explain what the QC app shows for Status in the BOINC Manager. I had a app_config.xml loaded to limit the number of cpu cores it was supposed to use to 4. However in the Status column it showed 16C for the number of cores allotted. Is that normal? Is that just how it describes itself to BOINC or was it really using all 16 cores? The app_config looks the same as mine. Did you reboot in order to activate it? In some of these multi-core projects, the Status is not updated until the next group of work units comes in after you have set the app_config. But a reboot usually fixes it. I reloaded the app_config via the Manager. I was afraid to reboot the machine because I had read earlier in the thread that the tasks would restart and I would lose the processing up to that point. It is normal for BOINC to identify downloaded tasks with the existing cpu/gpu resource usage at time of download. But I could swear I had the app_config in place before I finally snagged my first two tasks. Will wait and see when I can get my next task.
	ID: 48460 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48464 - Posted: 20 Dec 2017 \| 21:02:56 UTC - in response to Message 48460.
	But I could swear I had the app_config in place before I finally snagged my first two tasks. Will wait and see when I can get my next task. You have to activate the app_config. If you have BoincTasks, there is a way to read all the cc_config and app_config files for any connected machine. (I don't have it in front of me at the moment). Otherwise, a reboot will be necessary. I am having all sorts of problems with the work units, and a reboot is probably not worse than anything else at the moment.
	ID: 48464 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48465 - Posted: 20 Dec 2017 \| 23:11:28 UTC - in response to Message 48464.
	The official BOINC Manager has an option to reread config files as well. I use BOINC Tasks and a new/updated file is picked up without a reboot or client restart.
	ID: 48465 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48467 - Posted: 21 Dec 2017 \| 3:40:51 UTC
	There is also an option in the BOINC code that allows for the number of cpus that you want to use per host. Each host can have a different setting. Ask over at Amicable Numbers as they have done that there. Then you wont need app_config.xml files at all. Conan
	ID: 48467 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48469 - Posted: 21 Dec 2017 \| 6:26:09 UTC - in response to Message 48467.
	Setting CPU % in BOINC is system and project wide. Not very good for fine tuning per project. The app_config was specifically introduced for specific project tuning and is the preferred method to control gpu and cpu usage per application. I have the cpu cores limited in my app_config for both the ACEMD and QC apps. I was wondering why it wasn't picked up after all config files were re-read.
	ID: 48469 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48471 - Posted: 21 Dec 2017 \| 12:02:09 UTC - in response to Message 48467. Last modified: 21 Dec 2017 \| 12:03:40 UTC
	There is also an option in the BOINC code that allows for the number of cpus that you want to use per host. Each host can have a different setting. Ask over at Amicable Numbers as they have done that there. Then you wont need app_config.xml files at all. Conan Setting CPU % in BOINC is system and project wide. Not very good for fine tuning per project. The app_config was specifically introduced for specific project tuning and is the preferred method to control gpu and cpu usage per application. I have the cpu cores limited in my app_config for both the ACEMD and QC apps. I was wondering why it wasn't picked up after all config files were re-read. Keith Myers I am not referring to the Boinc Client on your personal computer. My comments were aimed at the BOINC Server Code and therefore are relevant to fine tuning per project. The option I am referring to is meant for Multiple Threading, so you can set the number of cores that you want to run a MT work unit on. Over at Amicable Numbers I have normal for my 4 Core host, 5 cores for my 6 core host and 8 cores for my 16 core host (allowing 2 work units to run at the same time), so that none of the computers have the same setting but could if I wanted them to. Had the same issue I found here with the 16 core machine and that is why I set it to 8 cores. App_config.xml works and works well, I was offering an option especially for those of us that are not too good at creating these xml files, and to show that BOINC does have an option in its code to cover this situation. Conan
	ID: 48471 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48472 - Posted: 21 Dec 2017 \| 12:35:13 UTC - in response to Message 48471.
	I am not referring to the Boinc Client on your personal computer. My comments were aimed at the BOINC Server Code and therefore are relevant to fine tuning per project. The option I am referring to is meant for Multiple Threading, so you can set the number of cores that you want to run a MT work unit on. They have that at LHC too, for the ATLAS project. And they used to do something similar at WCG for the CEP2 project (though that was not mt), in order to limit the high number of writes to the disk drive. I think it would be very valuable here, since it appears that limiting the number of cores will be needed for many people, and not everyone will be willing to use app_config.xml files.
	ID: 48472 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48473 - Posted: 21 Dec 2017 \| 13:04:51 UTC - in response to Message 48472.
	We'll try to limit the number of cores indeed. It requires server-side changes so may not be soon & may not work.
	ID: 48473 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48476 - Posted: 21 Dec 2017 \| 19:11:10 UTC - in response to Message 48471.
	I understood what you are saying. The only way that works is if you set up different venues for different projects. I work primarily at SETI and Einstein. The venue mechanism does not work correctly and will likely never be updated. Very low chance that any major rework of the BOINC server code happens in the future with the lack of developers. I am very comfortable with writing and editing app_info and app_config. Been doing it for a very long while. App_config is the simplest way to tune for individual projects as long as you are using a later version of the Client. I also run more than one project simultaneously which makes your solution unworkable.
	ID: 48476 \| Rating: 0 \| rate: / Reply Quote

ETQuestor Send message Joined: 11 Jul 09 Posts: 27 Credit: 1,000,618,568 RAC: 0 Level Scientific publications	Message 48477 - Posted: 21 Dec 2017 \| 19:41:07 UTC Last modified: 21 Dec 2017 \| 19:44:41 UTC
	I just had to restart the BOINC client and the QMML work unit started back from 0% "fraction done" even though it had a checkpoint time of ~130000 and was at ~65%. Boo. name: c457-TONI_QMML314rst-0-1-RND3080_4 WU name: c457-TONI_QMML314rst-0-1-RND3080 project URL: http://www.gpugrid.net/ received: Thu Dec 21 01:43:45 2017 report deadline: Tue Dec 26 01:43:44 2017 ready to report: no got server ack: no final CPU time: 0.000000 state: downloaded scheduler state: scheduled exit_status: 0 signal: 0 suspended via GUI: no active_task_state: EXECUTING app version num: 314 checkpoint CPU time: 135211.300000 current CPU time: 138876.400000 fraction done: 0.010989 swap size: 1236 MB working set size: 304 MB estimated CPU time remaining: 747697.676876
	ID: 48477 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48483 - Posted: 22 Dec 2017 \| 21:45:12 UTC - in response to Message 48477.
	Can anybody please try if a couple of tasks can run in simultaneously?
	ID: 48483 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48500 - Posted: 24 Dec 2017 \| 17:15:46 UTC - in response to Message 48483.
	If another one would be made available, I could try. I only ever see one task ready to be snagged. Just got one. Happy to report the change in allowed cores limit was properly applied after I changed 4 to 4.0000.
	ID: 48500 \| Rating: 0 \| rate: / Reply Quote

bormolino Send message Joined: 16 May 13 Posts: 41 Credit: 88,126,864 RAC: 833 Level Scientific publications	Message 48511 - Posted: 26 Dec 2017 \| 13:56:46 UTC
	Why is there such a big credit difference? 16794335 12932706 25 Dec 2017 \| 10:46:18 UTC 26 Dec 2017 \| 5:21:03 UTC Fertig und Bestätigt 66,830.38 199,504.30 440.38 Quantum Chemistry v3.14 (mt) 16792051 12932673 24 Dec 2017 \| 10:37:41 UTC 25 Dec 2017 \| 5:22:21 UTC Fertig und Bestätigt 67,429.09 200,424.90 1,627.77 Quantum Chemistry v3.14 (mt)
	ID: 48511 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48512 - Posted: 27 Dec 2017 \| 3:04:31 UTC - in response to Message 48483.
	I just grabbed 2 QC tasks and I will attempt to run them simultaneously tomorrow during the SETI outage.
	ID: 48512 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48514 - Posted: 27 Dec 2017 \| 18:20:32 UTC
	These two WU's ran concurrently on one of my FX8350 using 4 cores each. They were started within about twenty minutes of each other and finished about 4 minutes apart. 16795026 12932509 426610 25 Dec 2017 \| 15:47:53 UTC 27 Dec 2017 \| 12:23:12 UTC Completed and validated 53,395.91 188,995.80 3,034.10 Quantum Chemistry v3.14 (mt) 16795025 12932492 426610 25 Dec 2017 \| 15:47:53 UTC 27 Dec 2017 \| 12:19:03 UTC Completed and validated 53,957.92 193,510.10 3,066.04 Quantum Chemistry v3.14 (mt)
	ID: 48514 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48520 - Posted: 28 Dec 2017 \| 9:07:14 UTC - in response to Message 48512.
	I just grabbed 2 QC tasks and I will attempt to run them simultaneously tomorrow during the SETI outage. I just finished these two QC tasks run concurrently. 16798843 12932712 456812 27 Dec 2017 \| 3:00:54 UTC 28 Dec 2017 \| 7:57:47 UTC Completed and validated 26,162.47 103,683.40 4,277.76 Quantum Chemistry v3.14 (mt) 16798838 12932759 456812 27 Dec 2017 \| 2:58:51 UTC 28 Dec 2017 \| 7:57:47 UTC Completed and validated 26,201.39 103,805.90 4,284.12 Quantum Chemistry v3.14 (mt) I started them within a minute of each other using 4 cores each. I also had 3 Einstein GPU tasks running concurrently with them. System is a AMD Ryzen 1800X 16 core CPU and three Nvidia GTX 970's. Didn't appear to have any problems. Tasks ran right through with about 70% CPU utilization.
	ID: 48520 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48521 - Posted: 28 Dec 2017 \| 9:11:41 UTC - in response to Message 48520.
	Thanks @keith, thanks @starbase!
	ID: 48521 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48527 - Posted: 28 Dec 2017 \| 17:31:07 UTC - in response to Message 48521. Last modified: 28 Dec 2017 \| 17:31:41 UTC
	Toni, well there's your answer if you discount the low population sample. I doubt that two successful runs were due to the brand of cpu, but could be wrong. As long as you are using a Client later than 7.0.40 you can use an app_config.xml file to tune the number of cores you allow the task to run on.
	ID: 48527 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48532 - Posted: 29 Dec 2017 \| 0:16:02 UTC - in response to Message 48527. Last modified: 29 Dec 2017 \| 0:19:27 UTC
	My Ryzen 1700 machine has certainly done better than my two i7-3770 PCs (all machines on Ubuntu, run 24/7 and otherwise set up the same): Ryzen 1700: http://www.gpugrid.net/results.php?hostid=452287 i7-3770: http://www.gpugrid.net/results.php?hostid=433866 http://www.gpugrid.net/results.php?hostid=448995 EDIT: My i7-4770 PC also tended to hang or otherwise fail. http://www.gpugrid.net/results.php?hostid=357332 I have often gotten hung work units on the Intel machines, but seldom or never on the AMD. And looking around at the other users who fail the work units, they seem to be predominantly Intel, while the ones that succeed seem to be AMD, though I have not done a count myself. Presumably Toni can get those figures.
	ID: 48532 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48534 - Posted: 29 Dec 2017 \| 2:22:44 UTC
	Thanks for the post. Interesting. I could discount the fact that the Ryzen's have actual 8 physical cores so on paper a good head start over the Intel 4 core cpu's. But the FX-9350 earlier in the thread had a good result too with only 4 physical cores too and much more handicapped FFT registers in its modules compared to Ryzen and Intel. Need a lot more samples to definitively clarify I think.
	ID: 48534 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48535 - Posted: 29 Dec 2017 \| 2:38:32 UTC - in response to Message 48534.
	Need a lot more samples to definitively clarify I think. Yes. I am not looking at the output, but really only the error rate. All the machines are now running two cores per work unit, and only one work unit per machine, though earlier I had been running four cores on the AMD machine. And both the Intel and AMD cores have about the same speed, so the output should be comparable anyway now. I have changed one of the i7-3770 machines (GTX-1070-PC) from Ubuntu 17.10 (and BOINC 7.8.3) back to Ubuntu 16.04 (and BOINC 7.6.31). I doubt that it will make much difference, but I will let it run for a couple of weeks. If I continue to get more errors on Intel, I think I will go with just the Ryzen PC.
	ID: 48535 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48537 - Posted: 29 Dec 2017 \| 8:07:09 UTC - in response to Message 48535.
	Good experiment. I was just thinking that spreading the compute load over 4 cores is less hard compared to 100% workload over 4 cores on Intel. If the test on 2 and one cores is equally stable, just slower, it might suggest something bothersome on Intel architecture. The different OS platform could have a big effect too.
	ID: 48537 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48550 - Posted: 29 Dec 2017 \| 20:52:48 UTC Last modified: 29 Dec 2017 \| 21:05:30 UTC
	It appears one cannot start two or more of these type of WU's simultaneously. One of the two errored as shown below. 16803962 12952984 426610 29 Dec 2017 \| 16:16:51 UTC 29 Dec 2017 \| 17:15:51 UTC Completed and validated 916.51 6,109.03 35.20 Quantum Chemistry v3.14 (mt) 16803949 12952971 426610 29 Dec 2017 \| 16:16:51 UTC 29 Dec 2017 \| 17:26:30 UTC Error while computing 5.04 0.00 --- Quantum Chemistry v3.14 (mt) So I started two more but with a five second delay and they both are now happily processing together on one of my FX8350's 4 cores each. With that in mind, the boinc client may not be relied upon to run these cpu jobs unattended using a split cpu core configuration to allow multiple WU processing lest two or more were to start simultaneously causing a possible failure on at least one WU. Edit: Will try a simultaneous start with my last two WU's to see if this is repeatable. Edit2: Yep, happened again. Will copy the pair to this post when the one in progress finishes.
	ID: 48550 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48551 - Posted: 29 Dec 2017 \| 22:02:47 UTC
	These are the last two, one of which errored: 16804044 12953066 426610 29 Dec 2017 \| 17:28:46 UTC 29 Dec 2017 \| 21:02:26 UTC Error while computing 6.06 0.00 --- Quantum Chemistry v3.14 (mt) 16804026 12953048 426610 29 Dec 2017 \| 17:28:46 UTC 29 Dec 2017 \| 21:24:44 UTC Completed and validated 1,393.35 5,255.89 60.62 Quantum Chemistry v3.14 (mt)
	ID: 48551 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48552 - Posted: 30 Dec 2017 \| 12:14:42 UTC - in response to Message 48550.
	It appears one cannot start two or more of these type of WU's simultaneously. That is curious. Thanks for the report. The new DOMINIKs run fine on my two i7-3770s thus far, probably since they are shorter than the TONIs and don't get to the point of hanging up. And the Ryzen 1700 continues to do well. But there must be some selection process going on at the server, since it is the getting only the TONIs. They are all reissues now, but it has handled them all thus far, even a _8. That is a good idea, since it makes optimum use of each CPU type. If things continue this way, I will just let all the machines run.
	ID: 48552 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48553 - Posted: 30 Dec 2017 \| 14:22:06 UTC - in response to Message 48552.
	Looks like that multiple WUs together is ok, but starting exactly at the same time is not. I hope it is a relatively rare occurrence. In principle I could put a locking mechanism, but I am not enthusiastic because that would be inviting more failure modes (e.g. stale locks) to solve a relatively rare case.
	ID: 48553 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48554 - Posted: 30 Dec 2017 \| 14:27:07 UTC - in response to Message 48553.
	Looks like that multiple WUs together is ok, but starting exactly at the same time is not. I hope it is a relatively rare occurrence. In principle I could put a locking mechanism, but I am not enthusiastic because that would be inviting more failure modes (e.g. stale locks) to solve a relatively rare case. So every new member crunching these will get an error on their 1st task. Genius. It absolutely should be fixed.
	ID: 48554 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48555 - Posted: 30 Dec 2017 \| 15:02:08 UTC - in response to Message 48554. Last modified: 30 Dec 2017 \| 15:18:46 UTC
	Looks like that multiple WUs together is ok, but starting exactly at the same time is not. I hope it is a relatively rare occurrence. In principle I could put a locking mechanism, but I am not enthusiastic because that would be inviting more failure modes (e.g. stale locks) to solve a relatively rare case. So every new member crunching these will get an error on their 1st task. Genius. It absolutely should be fixed. I got two errors at first also, but none since and I did not look at the reason. But it is over and done with, and not a problem. I think if you look hard at the logic of lock mechanisms, it is a logical impossibility to fix simultaneity problems. That is, any delay you put it will match some other starting situation, and result in an error also. You can try, but I don't think it is worth the effort either. EDIT: I would look to see if it happens again with the next batch. If so, then I would investigate something, whatever it is. But the errors were very short for me, and no real time lost.
	ID: 48555 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48556 - Posted: 30 Dec 2017 \| 16:19:59 UTC
	I would speculate that the probability of two or more WU's starting at exactly the same time in an unattended environment would be low however, those running multiple projects competing for the same GPU/CPU resources are usually switched at a specified time interval to give each project process time. With that in mind, it is possible for two QC jobs to finish and the boinc client nearing the end of a switch app interval, change to the other project and then when it came time to switch back to run the cpu gpugrid WU's with a queue full of QC jobs start two or more simultaneously. The only way I can see avoiding the possibility of such an error completely would be to keep all cores dedicated to one QC job on unattended machines (headless crunchers in my case). When the next batch is available, I will cut my switch app time down to say 5 minutes and see if I can get a feel for how the client handles things and the possibility of simultaneous starts with the QC jobs. The two jobs that failed due to simultaneous starts both ended with exit code 195 EXIT_CHILD_FAILED. Not sure if that means the app failed to spawn a child thread or close one for one of the two WU's.
	ID: 48556 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48557 - Posted: 30 Dec 2017 \| 16:30:49 UTC - in response to Message 48556.
	You are right that BOINC is not really a random environment, and if it happens in a more-or-less predictable manner, it should be possible to prevent it. We will see how often that is.
	ID: 48557 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48569 - Posted: 31 Dec 2017 \| 19:02:23 UTC - in response to Message 48557.
	You are right that BOINC is not really a random environment, and if it happens in a more-or-less predictable manner, it should be possible to prevent it. We will see how often that is. Agreed, predicable is the key. I really need to find out how the projects I crunch for work unattended because all my little headless crunchers are in process of being converted to diskless/headless cluster nodes with one of the FX system being the master. This should all prove interesting as this is the first project I've worked with that uses multicores for a single WU.
	ID: 48569 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48571 - Posted: 1 Jan 2018 \| 0:34:29 UTC
	When there is work or the GPUGrid servers sent out work they come in batches. Two tasks end up starting at once since they have short deadlines. I also said new members so again multiple tasks starting at once. Not everyone runs the same project all the time so that tasks have a chance to get off sequence.
	ID: 48571 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48578 - Posted: 2 Jan 2018 \| 19:34:34 UTC - in response to Message 48535.
	I have changed one of the i7-3770 machines (GTX-1070-PC) from Ubuntu 17.10 (and BOINC 7.8.3) back to Ubuntu 16.04 (and BOINC 7.6.31). I doubt that it will make much difference, but I will let it run for a couple of weeks. If I continue to get more errors on Intel, I think I will go with just the Ryzen PC. I just completed two TONI work units, one on each of my i7-3770 PCs (2 cores per work unit): http://www.gpugrid.net/workunit.php?wuid=12932866 http://www.gpugrid.net/workunit.php?wuid=12932333 They had each errored out on other PCs, and I don't know why they worked on mine. But I do know that they each got stuck at 78.698% until I rebooted, and then they completed normally. However, the total Run time shown does not include the time they were stuck, which was about two hours in each case. This is no way to get work done; I can't be rebooting for each work unit. So I will have to just stop on the i7-3770 machines and continue only with the Ryzen 1700, which continues to work fine. Note that the new DOMINIK work units are no problem - if they could send only those to the Intel machines, I think the problem would be solved.
	ID: 48578 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48586 - Posted: 3 Jan 2018 \| 23:31:07 UTC
	I just ran a DOMINIK QC task and it ran very fast. Task 16815024 Anybody else run one of these yet? I see that there are a TON of them available. Server Status
	ID: 48586 \| Rating: 0 \| rate: / Reply Quote

PappaLitto Send message Joined: 21 Mar 16 Posts: 511 Credit: 4,672,242,755 RAC: 0 Level Scientific publications	Message 48587 - Posted: 3 Jan 2018 \| 23:39:37 UTC
	50,000 WUs... Holy ****. If only those were GPU WUs
	ID: 48587 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48588 - Posted: 4 Jan 2018 \| 0:14:48 UTC - in response to Message 48587.
	50,000 WUs... Holy ****. If only those were GPU WUs I saw that too and downloaded 4 on a computer I haven't run any of these on. Started at the same time and boom all went to crap. I even tried to pause some to stop the error. http://www.gpugrid.net/results.php?hostid=458003
	ID: 48588 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48589 - Posted: 4 Jan 2018 \| 1:02:26 UTC Last modified: 4 Jan 2018 \| 1:05:56 UTC
	So far from what I have learned starting more than one multiple cpu job at a time in a split core senerio is that they need not be started at exactly the same time for one to error. Given not an exact start at the same time, the one started first always errors and the second one started processes to successful completion with the next WU in the queue started to also complete successfully. Four of four tries, near simultaneous starts the one started first ended up failing. Secondly, controlling the time to when the second WU is allowed to start following the first start time is up to 5 seconds as tested so far. Third, when the boinc client switches between projects, the QC WU's so far observed are completed in pairs leaving no single job left in progress (suspended) to stagger start the times. This unfortunate characteristic means "simultaneous (or nearly so) starts" cause an error whenever the client switches back to the gpugrid cpu jobs with a queue larger than one WU. Guess the only way to prevent this behavior is to not split cores, especially for unattended clients. Edit: correct my lousy spelling, stupid keyboard :)
	ID: 48589 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48590 - Posted: 4 Jan 2018 \| 1:15:38 UTC - in response to Message 48589.
	I didn't have that experience with the two TONI tasks I started simultaneously. Or within the 5 second window you described. Both completed successfully. I am limiting core usage to four with an app_config file. Limiting the max_concurrent to 1 now since I also crunch SETI cpu tasks on that computer. I ran the two concurrent jobs when Toni requested users to try that experiment.
	ID: 48590 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48591 - Posted: 4 Jan 2018 \| 2:02:50 UTC
	Wow, the credit awarded is all over the place for these DOMINIK tasks. Obviously NOT tied to computation time or resources used for compute. Task 16862848 3715 seconds CPU time Credit awarded 21 Task 16815159 3687 seconds CPU time Credit awarded 161
	ID: 48591 \| Rating: 0 \| rate: / Reply Quote

FredoGuan Send message Joined: 29 Dec 16 Posts: 2 Credit: 1,397 RAC: 0 Level Scientific publications	Message 48592 - Posted: 4 Jan 2018 \| 2:12:21 UTC
	I just got a task and it finished on a dual e5-2450l 32g ram server fine. resultid=16815237 Keep developing this, please. This is quite nice and I would really like to see this as part of GPUGRID permanently.
	ID: 48592 \| Rating: 0 \| rate: / Reply Quote

FredoGuan Send message Joined: 29 Dec 16 Posts: 2 Credit: 1,397 RAC: 0 Level Scientific publications	Message 48593 - Posted: 4 Jan 2018 \| 2:57:04 UTC - in response to Message 48592.
	resultid=16815470
	ID: 48593 \| Rating: 0 \| rate: / Reply Quote

Dominik Send message Joined: 15 Dec 17 Posts: 9 Credit: 0 RAC: 0 Level Scientific publications	Message 48597 - Posted: 4 Jan 2018 \| 11:10:04 UTC
	Hello Keith, Wow, the credit awarded is all over the place for these DOMINIK tasks. Obviously NOT tied to computation time or resources used for compute. Did you observe this behavior multiple times? Really strange to be honest. Thanks for helping out everyone!
	ID: 48597 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48603 - Posted: 4 Jan 2018 \| 13:07:24 UTC - in response to Message 48597.
	All tasks were erroring out in 2 minutes due to the app using gcc5.5 This got it to go to farther and start being multithreaded. We'll see if it actually completes. sudo apt-get install gcc-5 g++-5
	ID: 48603 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48608 - Posted: 4 Jan 2018 \| 14:08:09 UTC - in response to Message 48603.
	Yup, it completed. http://www.gpugrid.net/result.php?resultid=16817115
	ID: 48608 \| Rating: 0 \| rate: / Reply Quote

Dominik Send message Joined: 15 Dec 17 Posts: 9 Credit: 0 RAC: 0 Level Scientific publications	Message 48610 - Posted: 4 Jan 2018 \| 14:14:33 UTC
	Great! Thank you very much
	ID: 48610 \| Rating: 0 \| rate: / Reply Quote

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,719,275,486 RAC: 2,163,447 Level Scientific publications	Message 48613 - Posted: 4 Jan 2018 \| 16:01:40 UTC - in response to Message 48401.
	@klepel - can you try installing gcc (if not already there)? tks I tried it yesterday. I installed gcc-5 and gcc-6. And it worked on the computer http://www.gpugrid.net/results.php?hostid=452211
	ID: 48613 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48616 - Posted: 4 Jan 2018 \| 16:49:17 UTC - in response to Message 48590.
	These are the two WU's that were started about 5 seconds apart on an AMD FX-8350 with the first one started failing. Stdoutdea.txt: 03-Jan-2018 17:20:14 [GPUGRID] [css] running e113s22_e86s4p0f123-PABLO_p53_PHEX10P_IDP-0-1-RND2720_0 (0.987 CPUs + 1 NVIDIA GPU) 03-Jan-2018 17:20:14 [GPUGRID] Starting task c00000_00024-DOMINIK_QMML2_m0000000055-0-1-RND3244_0 03-Jan-2018 17:20:14 [GPUGRID] [cpu_sched] Starting task c00000_00024-DOMINIK_QMML2_m0000000055-0-1-RND3244_0 using QC version 314 (mt) in slot 9 03-Jan-2018 17:20:14 [GPUGRID] [css] running c00000_00024-DOMINIK_QMML2_m0000000055-0-1-RND3244_0 (4 CPUs) 03-Jan-2018 17:20:20 [GPUGRID] task c06475_06499-DOMINIK_QMML2_m0000000054-0-1-RND3067_0 resumed by user 03-Jan-2018 17:20:21 [GPUGRID] [css] running e113s22_e86s4p0f123-PABLO_p53_PHEX10P_IDP-0-1-RND2720_0 (0.987 CPUs + 1 NVIDIA GPU) 03-Jan-2018 17:20:21 [GPUGRID] [css] running c00000_00024-DOMINIK_QMML2_m0000000055-0-1-RND3244_0 (4 CPUs) 03-Jan-2018 17:20:21 [GPUGRID] Starting task c06475_06499-DOMINIK_QMML2_m0000000054-0-1-RND3067_0 03-Jan-2018 17:20:21 [GPUGRID] [cpu_sched] Starting task c06475_06499-DOMINIK_QMML2_m0000000054-0-1-RND3067_0 using QC version 314 (mt) in slot 10 03-Jan-2018 17:20:21 [GPUGRID] [css] running c06475_06499-DOMINIK_QMML2_m0000000054-0-1-RND3067_0 (4 CPUs) 03-Jan-2018 17:20:25 [GPUGRID] [sched_op] Deferring communication for 00:01:39 03-Jan-2018 17:20:25 [GPUGRID] [sched_op] Reason: Unrecoverable error for task c00000_00024-DOMINIK_QMML2_m0000000055-0-1-RND3244_0 03-Jan-2018 17:20:25 [GPUGRID] Computation for task c00000_00024-DOMINIK_QMML2_m0000000055-0-1-RND3244_0 finished 03-Jan-2018 17:20:25 [GPUGRID] [css] running e113s22_e86s4p0f123-PABLO_p53_PHEX10P_IDP-0-1-RND2720_0 (0.987 CPUs + 1 NVIDIA GPU) 03-Jan-2018 17:20:25 [GPUGRID] [css] running c06475_06499-DOMINIK_QMML2_m0000000054-0-1-RND3067_0 (4 CPUs) I didn't have that experience with the two TONI tasks I started simultaneously. Or within the 5 second window you described. Both completed successfully. I am limiting core usage to four with an app_config file. Limiting the max_concurrent to 1 now since I also crunch SETI cpu tasks on that computer. I ran the two concurrent jobs when Toni requested users to try that experiment. Since this hasn't been the case with your Intel's implies this could be a cpu related phenomenom (architecture/scheduling differences). Perhaps the Intel's can handle initial start up processes faster than the FX series AMD, (might spring for a Ryen7 soon just to check them as well). Regardless, the issue is resolved with my systems by limiting concurrent QC jobs to one and use the other four cores to run WCG as to date I have not experienced a concurrent issue with the WCG WU's.
	ID: 48616 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48617 - Posted: 4 Jan 2018 \| 18:14:01 UTC - in response to Message 48597.
	Hello Keith, Wow, the credit awarded is all over the place for these DOMINIK tasks. Obviously NOT tied to computation time or resources used for compute. Did you observe this behavior multiple times? Really strange to be honest. Thanks for helping out everyone! Yes, the first completed tasks got reasonable credit. Then when I downloaded more, all the credit for them nosedived. Once I saw that they weren't worth running I set NNT. 16862848 12962574 456812 3 Jan 2018 \| 23:38:10 UTC 4 Jan 2018 \| 1:20:46 UTC Completed and validated 1,000.69 3,715.60 21.12 Quantum Chemistry v3.14 (mt) 16815333 12963093 456812 4 Jan 2018 \| 1:00:00 UTC 4 Jan 2018 \| 2:57:45 UTC Completed and validated 1,020.00 3,812.63 27.38 Quantum Chemistry v3.14 (mt) 16815332 12963092 456812 4 Jan 2018 \| 0:59:23 UTC 4 Jan 2018 \| 2:40:47 UTC Completed and validated 990.38 3,697.44 25.78 Quantum Chemistry v3.14 (mt) 16815320 12963080 456812 4 Jan 2018 \| 1:00:37 UTC 4 Jan 2018 \| 3:14:28 UTC Completed and validated 997.66 3,731.61 27.28 Quantum Chemistry v3.14 (mt) 16815307 12963067 456812 4 Jan 2018 \| 1:07:01 UTC 4 Jan 2018 \| 4:04:01 UTC Completed and validated 1,009.28 3,642.48 26.40 Quantum Chemistry v3.14 (mt) 16815275 12963035 456812 4 Jan 2018 \| 1:07:38 UTC 4 Jan 2018 \| 4:21:09 UTC Completed and validated 1,033.63 3,668.31 26.52 Quantum Chemistry v3.14 (mt) 16815264 12963024 456812 4 Jan 2018 \| 1:06:24 UTC 4 Jan 2018 \| 3:46:54 UTC Completed and validated 969.15 3,592.69 25.72 Quantum Chemistry v3.14 (mt) 16815248 12963008 456812 4 Jan 2018 \| 0:58:46 UTC 4 Jan 2018 \| 2:24:17 UTC Completed and validated 935.31 3,503.52 23.59 Quantum Chemistry v3.14 (mt) 16815234 12962994 456812 4 Jan 2018 \| 1:05:48 UTC 4 Jan 2018 \| 3:30:44 UTC Completed and validated 981.30 3,616.45 26.43 Quantum Chemistry v3.14 (mt) 16815171 12962931 456812 3 Jan 2018 \| 23:37:35 UTC 4 Jan 2018 \| 1:04:10 UTC Completed and validated 929.82 3,523.44 18.45 Quantum Chemistry v3.14 (mt) 16815159 12962919 456812 3 Jan 2018 \| 23:35:43 UTC 4 Jan 2018 \| 0:16:22 UTC Completed and validated 986.88 3,687.96 161.36 Quantum Chemistry v3.14 (mt)
	ID: 48617 \| Rating: 0 \| rate: / Reply Quote

klepel Send message Joined: 23 Dec 09 Posts: 189 Credit: 4,719,275,486 RAC: 2,163,447 Level Scientific publications	Message 48618 - Posted: 4 Jan 2018 \| 19:31:13 UTC Last modified: 4 Jan 2018 \| 19:36:13 UTC
	I have to report back on the AMD Ryzen 1700x Computer: http://www.gpugrid.net/results.php?hostid=420971 If I run 3 instances, the WUs crashes after about 200 seconds, and after that the computer crashes completely. If I am running one (01) instance (WU), the computer runs without any problem. However, as BOINC downloads several of this Quantum Chemistry v3.14 (mt) WUs, BOINC thinks my CPU cache is full and refuses to download additional CPU WUs from PRIMGRID. So after a while the CPU is only loaded with one QC WU (4 threads) and the rest of the cores are idle - Not very efficient. Sorry, Dominik unter diesen Umständen kann ich keine weiteren QC WUs für diesen Computer herunterladen. Komme aber gerne zurück, wenn wir ohne Probleme mehrere MultiCores WUs gleichzeitig bearbeiten können.
	ID: 48618 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48619 - Posted: 4 Jan 2018 \| 19:45:44 UTC - in response to Message 48618. Last modified: 4 Jan 2018 \| 20:02:00 UTC
	However, as BOINC downloads several of this Quantum Chemistry v3.14 (mt) WUs, BOINC thinks my CPU cache is full and refuses to download additional CPU WUs from PRIMGRID. So after a while the CPU is only loaded with one QC WU (4 threads) and the rest of the cores are idle - Not very efficient. If you temporarily suspend the QC jobs (except perhaps the one in progress), you should download more work from your other projects and once downloaded, resume the QC jobs and let boinc take over running the various projects as you have them configured. You may need to "update" the projects you want more work from under the "projects" tab to initiate the downloads right away. Edit: Close quote and add last sentence.
	ID: 48619 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48624 - Posted: 5 Jan 2018 \| 2:50:48 UTC
	Seti@home has been down all day so ran out of work. Decided to give the QC Dominik tasks another try. Thought that possibly the last batch were unique or one-offs or something different about the computer from when I first ran them. Nope. Even worse credit for the batch I ran this afternoon. Credits awarded = 6. If you expect to get anybody to want to run these, you are going to have to make them more appealing, credit-wise at least. For me, not worth the electricity to run them. Would rather let the computer go cold and give my power bill a temporary reprieve.
	ID: 48624 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,621,228,413 RAC: 10,749,388 Level Scientific publications	Message 48632 - Posted: 5 Jan 2018 \| 11:58:50 UTC Last modified: 5 Jan 2018 \| 12:07:27 UTC
	Yeah credit took a dump on the last two I completed. Run Time-----CPU Time-----Credit 2,389.43-----26,206.90-----549.62 3,034.92-----36,710.33-----94.32
	ID: 48632 \| Rating: 0 \| rate: / Reply Quote

bormolino Send message Joined: 16 May 13 Posts: 41 Credit: 88,126,864 RAC: 833 Level Scientific publications	Message 48639 - Posted: 5 Jan 2018 \| 20:01:49 UTC
	I got 5.97 points for 32 minutes calculation on my fx 6100. That's ridiculous!
	ID: 48639 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48642 - Posted: 5 Jan 2018 \| 21:16:29 UTC
	If you care to learn about why the low credit or why certain tasks get hi-middle-low credit, read my post here
	ID: 48642 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48643 - Posted: 5 Jan 2018 \| 21:21:52 UTC
	The project is using the "old" BOINC credit algorithm. If I remember how it works, the very first few tasks on a new application that BOINC sees gives very high credit. Then when more tasks are returned the algorithm tunes the credit downwards. The APR for the application stabilizes after 11 valid tasks have been returned. This project utilizes the application APR function of BOINC. It is up to each project to decide whether to follow BOINC standards or write their own. So far I have seen credit awards of 160, 21 and 7. That must be because those were 3 different molecules.
	ID: 48643 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 48653 - Posted: 7 Jan 2018 \| 1:31:32 UTC Last modified: 7 Jan 2018 \| 1:37:35 UTC
	Seems to me that credit should be based on CPU time only. One WU should be worth a specific amount of credit regardless of how fast or slowly processed. Probably require a large bureaucratic committee to attempt to quantify such a value but that's my 2 cents and maybe it would be worth it if we get an answer within a few years :). Edit: Added last sentence, sorry, couldn't resist.
	ID: 48653 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1341 Credit: 7,659,123,724 RAC: 13,383,585 Level Scientific publications	Message 48654 - Posted: 7 Jan 2018 \| 1:47:05 UTC - in response to Message 48653.
	Seems to me that credit should be based on CPU time only. One WU should be worth a specific amount of credit regardless of how fast or slowly processed. Probably require a large bureaucratic committee to attempt to quantify such a value but that's my 2 cents and maybe it would be worth it if we get an answer within a few years :). Edit: Added last sentence, sorry, couldn't resist. That's how Einstein handles credit. They don't use the BOINC algorithm and decide how much credit any application awards independent of run_time.
	ID: 48654 \| Rating: 0 \| rate: / Reply Quote

Conan Send message Joined: 25 Mar 09 Posts: 25 Credit: 582,385 RAC: 0 Level Scientific publications	Message 48733 - Posted: 24 Jan 2018 \| 8:39:38 UTC
	Just posting to say that I have finally gotten my 6 core Phenom CPU working on this project. It has taken me about 3 months of errors and an OS upgrade from Fedora 16 to 21, before the error message changed, saying that now qmml/bin/python was not there. So after comparing another working system and a huge number of updates, I noticed that a number of symlinks were missing from my computer. I installed 7 symlinks, all python related. Existing files........Missing files pydoc3.6 >>>>>>>>>>> pydoc pydoc3.6 >>>>>>>>>>> pydoc3 python3.6 >>>>>>>>>> python python3.6 >>>>>>>>>> python3 pyvenv-3.6 >>>>>>>>> pyvenv python3.6m-config >> python3.6-config python3.6m-config >> python3-config After installing all of these links it now has begun to work. So Toni, hopefully no more errors. Conan
	ID: 48733 \| Rating: 0 \| rate: / Reply Quote

Post to thread

Message boards : Multicore CPUs : New batch of QC tasks (QMML)