
Message boards : Multicore CPUs : New QC app

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 732
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 49759 - Posted: 2 Jul 2018 | 10:18:54 UTC
Last modified: 2 Jul 2018 | 10:52:36 UTC

Dears, after a hot weekend during which I accidentally cancelled QC WUs, we are ready to start again with a new app. As soon as we get it right, we should be able to run on more machines (gcc no longer a requirement).

There will be a largish download the first time you run app 329. If you want to free up some disk space, please reset the project (recommended, but not urgent).

kain
Send message
Joined: 3 Sep 14
Posts: 139
Credit: 217,157,977
RAC: 236,869
Level
Leu
Scientific publications
watwatwatwatwat
Message 49760 - Posted: 2 Jul 2018 | 11:29:25 UTC

Ready for action ;)

tullio
Send message
Joined: 8 May 18
Posts: 102
Credit: 8,462,438
RAC: 88,166
Level
Ser
Scientific publications
wat
Message 49766 - Posted: 2 Jul 2018 | 16:30:41 UTC

First 330 task completed and validated. The second one is waiting for memory alongside a GPU task, even though that task is using only 4% of the total 8 GB of RAM.
Tullio

Stefan
Volunteer moderator
Project developer
Project scientist
Send message
Joined: 5 Mar 13
Posts: 311
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 49778 - Posted: 4 Jul 2018 | 8:47:07 UTC - in response to Message 49766.

I will let the WUs run out for a day because I want to see if something weird is happening on my side (the WUs are calculating fine, don't worry). I'll submit more once they are completed tomorrow.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,726,038,170
RAC: 1,104,572
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50349 - Posted: 30 Aug 2018 | 13:43:42 UTC

@ Toni, @ Stefan

There's an error report in Number Crunching (This computer has finished a daily quota of 31 tasks) which suggests that the maximum upload size for a batch of QC tasks has been set too low.

Task name is 6955_1_15_16_18_dd130713_n00001-SDOERR_SELE2-0-1-RND2528

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 732
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50404 - Posted: 5 Sep 2018 | 14:34:10 UTC - in response to Message 50349.
Last modified: 5 Sep 2018 | 14:34:49 UTC

I think the actual error is a segmentation fault, which leaves large temporary files behind, and their transfer is attempted. Let's see if the situation improves with the new version.

If in doubt, please reset the project.

Zalster
Avatar
Send message
Joined: 26 Feb 14
Posts: 57
Credit: 2,019,205,574
RAC: 4,624,706
Level
Phe
Scientific publications
watwatwat
Message 50412 - Posted: 6 Sep 2018 | 5:32:45 UTC - in response to Message 50404.
Last modified: 6 Sep 2018 | 5:33:01 UTC

Just curious: is there someplace that shows how well the QC apps are running overall, like the server status page does for the GPU apps? I can look at my own machines and see the errors there, but is there an overall view somewhere?

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,726,038,170
RAC: 1,104,572
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50415 - Posted: 6 Sep 2018 | 8:25:58 UTC - in response to Message 50404.

I think the actual error is a segmentation fault, which leaves large temporary files behind, and their transfer is attempted. Let's see if the situation improves with the new version.

BOINC won't attempt to upload a temporary file unless its name is specified with an upload URL in the workunit template.

I'll be able to advise better when you release the Windows app, and I can see any problems happening on my own machines.
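For the record, that can be checked on any Linux host: the <file_info> entries the scheduler sends (generated from those templates) end up in client_state.xml, so the entry for an upload file should show both its upload URL and its <max_nbytes> limit. A minimal sketch, assuming the default data directory /var/lib/boinc-client; the file name is a placeholder:

FILE=name_from_the_upload_error     # placeholder: paste the exact upload file name here
grep -B 2 -A 10 "<name>$FILE</name>" /var/lib/boinc-client/client_state.xml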

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 732
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50417 - Posted: 6 Sep 2018 | 8:46:57 UTC - in response to Message 50415.

May well be restricted to a few machines.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,726,038,170
RAC: 1,104,572
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50418 - Posted: 6 Sep 2018 | 9:42:25 UTC - in response to Message 50417.

May well be restricted to a few machines.

If you want to count me in, I'll do my best to report on any issues that may arise (in beta mode if necessary). I'm primarily Windows 7, so the WSL approach would be difficult except for one dual-boot test machine with Windows 10 ready to run.

tullio
Send message
Joined: 8 May 18
Posts: 102
Credit: 8,462,438
RAC: 88,166
Level
Ser
Scientific publications
wat
Message 50419 - Posted: 6 Sep 2018 | 10:21:20 UTC

My main Linux box is crunching SELE6 on its Opteron 1210. If necessary, I have a Windows 10 PC with an AMD A10-6700 and 22 GB RAM.
Tullio
____________

[VENETO] boboviz
Send message
Joined: 10 Sep 10
Posts: 96
Credit: 252,222
RAC: 512
Level

Scientific publications
wat
Message 50420 - Posted: 6 Sep 2018 | 12:24:27 UTC

again

<message>
upload failure: <file_xfer_error>
<file_name>5516_14_15_18_19_8125b500_n00001-SDOERR_SELE6-0-1-RND8343_0_1</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>


Do I have to reset the project?

Profile Chilean
Avatar
Send message
Joined: 8 Oct 12
Posts: 86
Credit: 151,726,480
RAC: 165,117
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwat
Message 50421 - Posted: 6 Sep 2018 | 12:26:27 UTC

The CPU apps are not working very well: lots of idle time. It may be because of the 48 threads... some of the python processes use 4 cores, the rest just 1, and not all the time. It's got to be an I/O issue (?).
____________

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 732
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50422 - Posted: 6 Sep 2018 | 13:18:05 UTC - in response to Message 50421.

Each task should use 4 threads max.

[VENETO] boboviz
Send message
Joined: 10 Sep 10
Posts: 96
Credit: 252,222
RAC: 512
Level

Scientific publications
wat
Message 50423 - Posted: 6 Sep 2018 | 15:16:03 UTC - in response to Message 50420.

again
<message>
upload failure: <file_xfer_error>
<file_name>5516_14_15_18_19_8125b500_n00001-SDOERR_SELE6-0-1-RND8343_0_1</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>


Do I have to reset the project?


Another clue: this happens when I reboot the virtual machine (and the WU restarts).

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50424 - Posted: 6 Sep 2018 | 17:11:33 UTC
Last modified: 6 Sep 2018 | 17:13:33 UTC

I have had good luck for the past day running QC on my i7-4770 (Ubuntu 16.04). That doesn't prove much, except that there is no fatal flaw affecting all the work units.

And I am limiting them to two cores per work unit, and three work units at a time. That gives me essentially the same output as four cores on two work units at a time, but leaves a little more CPU support for my GTX 1070 on Folding. All in all, it seems to be working fine.
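For anyone who wants to try a similar split, one way to do it is with an app_config.xml in the project directory. This is a sketch only: the app name below is a placeholder (copy the real one from the <app> entries in client_state.xml), the plan class is an assumption, and the project directory name may differ on your install.

cat > /var/lib/boinc-client/projects/www.gpugrid.net/app_config.xml <<'EOF'
<app_config>
    <app>
        <name>QC_APP_NAME</name>            <!-- placeholder: copy the real name from client_state.xml -->
        <max_concurrent>3</max_concurrent>  <!-- at most three QC work units at a time -->
    </app>
    <app_version>
        <app_name>QC_APP_NAME</app_name>
        <plan_class>mt</plan_class>         <!-- assumption: the QC app uses an mt plan class -->
        <avg_ncpus>2</avg_ncpus>            <!-- schedule two CPUs per work unit -->
    </app_version>
</app_config>
EOF
# then use "Options > Read config files" in the BOINC Manager (or restart the client);
# whether the app really drops to two threads is up to the app itself.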

(boboviz - I wouldn't draw conclusions from virtual machines. Even LHC has a hard time with their own stuff.)

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 883
Credit: 1,726,038,170
RAC: 1,104,572
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 50425 - Posted: 6 Sep 2018 | 17:32:31 UTC - in response to Message 50423.

(and the WU restarts)

I think that is indeed a clue. One of the mechanisms I considered was a WU re-starting, and appending a second result to an existing file, doubling its size.

Checking the 'headroom' between the typical result file size and <max_nbytes> was one of the tests I had in mind for the Windows version. Can anyone comment?
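For anyone on Linux who wants to look at that headroom on their own host now, a minimal sketch (assuming the standard data directory /var/lib/boinc-client and the usual www.gpugrid.net project directory; the file name is a placeholder taken from the upload error):

cd /var/lib/boinc-client/projects/www.gpugrid.net
FILE=name_from_the_file_xfer_error      # placeholder: paste the <file_name> from the error message
ls -l "$FILE"                           # actual size of the output file waiting for upload
grep -A 6 "<name>$FILE</name>" ../../client_state.xml | grep max_nbytes   # the limit it was given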

tullio
Send message
Joined: 8 May 18
Posts: 102
Credit: 8,462,438
RAC: 88,166
Level
Ser
Scientific publications
wat
Message 50426 - Posted: 6 Sep 2018 | 18:15:16 UTC

Most LHC users are Windows users and they run Scientific Linux research programs from CERN (not BOINC programs) using Virtual Machines.
Tullio

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50427 - Posted: 7 Sep 2018 | 11:41:06 UTC
Last modified: 7 Sep 2018 | 11:42:54 UTC

This is interesting. Twice overnight my PC crashed. Shut down. Didn't run.
But each time, I saw no errors in the BoincTasks log, or in the Folding log either. The Folding work unit just continued from where it left off.

But each time, a new set of (three) QC tasks started after starting up the PC.

And now I see the same old error message in the stderr.txt file:

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>2360_16_18_20_21_e0c95459_n00001-SDOERR_SELE6-0-1-RND3935_0_1</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>

http://www.gpugrid.net/result.php?resultid=18686175

The messages don't appear immediately after the crashes, but a few hours later.
And I just attached this machine to GPUGrid a couple of days ago. So it didn't take long.

[VENETO] boboviz
Send message
Joined: 10 Sep 10
Posts: 96
Credit: 252,222
RAC: 512
Level

Scientific publications
wat
Message 50430 - Posted: 7 Sep 2018 | 13:45:25 UTC - in response to Message 50424.

(boboviz - I wouldn't draw conclusions from virtual machines. Even LHC has a hard time with their own stuff.)


Do you think I have to use a physical Linux machine?

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50431 - Posted: 7 Sep 2018 | 13:59:07 UTC - in response to Message 50430.

(boboviz - I wouldn't draw conclusions from virtual machines. Even LHC has a hard time with their own stuff.)


Do you think I have to use a physical Linux machine?

Don't know. Real Linux hasn't fixed the problem for me. But the VMs just add another layer of complexity and, from what I understand (I am not an expert), hide various other problems. At least that is what they say on the LHC forum, where they would like to get away from VirtualBox if possible. That is why they developed native ATLAS, and they would like to do the same for the other projects if it were possible.

tullio
Send message
Joined: 8 May 18
Posts: 102
Credit: 8,462,438
RAC: 88,166
Level
Ser
Scientific publications
wat
Message 50432 - Posted: 7 Sep 2018 | 14:42:04 UTC - in response to Message 50431.
Last modified: 7 Sep 2018 | 14:48:30 UTC

I am running GPUGRID, both CPU and GPU, on a SuSE Linux box with a GTX 750 Ti board. I am running Atlas@Home from LHC on a Windows 10 PC with 4 cores (though Task Manager reports two cores and 4 logical processors on an AMD A10-6700 CPU). It has a GTX 1050 Ti board, but GPUGRID overheats it to 80 °C and it crashes, so on that PC I am running Atlas (no GPU, but VirtualBox 5.2.18), Einstein@home and SETI@home, both CPU and GPU.
Tullio
Atlas native runs only on Ubuntu Linux; it does not run on my SuSE Linux nor on Windows.
____________

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 732
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50433 - Posted: 7 Sep 2018 | 17:25:03 UTC - in response to Message 50432.

Restarts may be a problem. It's not really the output file size (which should be small), but the fact that temporary files are not deleted as a consequence of some other error.

Can someone reliably reproduce the problem (i.e., by stopping and restarting a WU)? If so, is the problem solved by enabling the "leave applications in memory" option?
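For anyone who wants to try that in a controlled way, boinccmd can stop and restart a single task. A sketch only: the task name is a placeholder, and the project URL should be whatever boinccmd --get_project_status reports for GPUGRID.

# run in the BOINC data directory so boinccmd can find the GUI RPC password
boinccmd --get_tasks | grep -E 'name:|active_task_state'    # find a running QC task
TASK=name_of_a_running_QC_task                              # placeholder
boinccmd --task http://www.gpugrid.net/ "$TASK" suspend
sleep 60    # long enough for the app to be removed from memory if the option is off
boinccmd --task http://www.gpugrid.net/ "$TASK" resume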

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50434 - Posted: 7 Sep 2018 | 17:59:00 UTC - in response to Message 50433.
Last modified: 7 Sep 2018 | 18:32:00 UTC

If so, is the problem solved by enabling the "leave applications in memory" option?

I don't think I can reliably reproduce it, but I have "Leave application in memory" enabled (as is my usual practice), and that does not prevent it.

EDIT: Also, I should point out that there were no other BOINC applications running, and my machine runs 24/7, so the QC work units were never being suspended anyway.

Zalster
Avatar
Send message
Joined: 26 Feb 14
Posts: 57
Credit: 2,019,205,574
RAC: 4,624,706
Level
Phe
Scientific publications
watwatwat
Message 50448 - Posted: 8 Sep 2018 | 16:46:20 UTC
Last modified: 8 Sep 2018 | 16:47:25 UTC

So I'm curious. When we see the results for a QC work unit, they list both a run time and a CPU time. Since we are used to run time and CPU time being a linear measurement, I was wondering if CPU time for QC is actually a combined total for all the CPU threads being used.

I've worked out how long the CPU time is and it's much higher than the actual run time I'm seeing on the machine. What would make sense is if it's the sum of 4 threads all running at the same time for a set amount of time.

CPU time = N threads x actual run time

Is this correct? Mostly for my own curiosity.
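A quick way to check it from the two numbers themselves (example values only, nothing project-specific):

run_time=1800     # seconds, as shown on the task's result page (example value)
cpu_time=6900     # seconds, from the same page (example value)
awk -v r="$run_time" -v c="$cpu_time" 'BEGIN { printf "average busy threads: %.2f\n", c / r }'
# a result close to 4 means the reported CPU time is indeed summed over the 4 threads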
____________

PappaLitto
Send message
Joined: 21 Mar 16
Posts: 399
Credit: 2,736,116,243
RAC: 1,021,958
Level
Phe
Scientific publications
watwat
Message 50449 - Posted: 8 Sep 2018 | 16:58:35 UTC

Since CPU utilization is so low on these WUs, I presume most of the run time does not count as CPU time because the CPU is waiting on the hard drive.
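One generic way to check whether the tasks really are I/O-bound (iostat comes with the sysstat package on most distros; nothing here is project-specific):

iostat -x 5 3    # three reports at 5-second intervals; high %iowait and a busy %util on the BOINC disk point to I/O waits
# the 'wa' column in top, or the 'b' column in vmstat 5, tells a similar story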

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50451 - Posted: 9 Sep 2018 | 14:22:48 UTC
Last modified: 9 Sep 2018 | 14:58:04 UTC

I thought I would try my Ryzen 1700 on QC (Ubuntu 18.04), to see if it would behave differently than my Intel machines (i7-4770, i7-8700).
The first work unit errored:

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.anaconda.com/pkgs/r/noarch/repodata.json.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

If your current network has https://www.anaconda.com blocked, please file
a support request with your network engineering team.

ConnectionError(MaxRetryError("HTTPSConnectionPool(host='repo.anaconda.com', port=443): Max retries exceeded with url: /pkgs/r/noarch/repodata.json.bz2 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f60af4b15c0>: Failed to establish a new connection: [Errno -2] Name or service not known',))",),)

A reportable application error has occurred. Conda has prepared the above report.
Upload did not complete.
10:07:33 (1830): /usr/bin/flock exited; CPU time 11.796829
10:07:33 (1830): app exit status: 0x1
10:07:33 (1830): called boinc_finish(195)


This may be a somewhat different error message than the others, but it seems to me that they are all communications-related. I suspect it has to do with the intermittent connections I have been getting to GPUGrid for the past several weeks/months, as previously discussed.
http://www.gpugrid.net/forum_thread.php?id=4806

EDIT: The next two are running OK, and it looks like they will complete normally. It is a very intermittent problem.
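For what it's worth, the failing URL can be probed outside BOINC to tell a local DNS problem from an upstream hiccup (plain curl and getent; the URL is the one from the error above):

curl -sSI https://repo.anaconda.com/pkgs/r/noarch/repodata.json.bz2 | head -n 1   # expect an HTTP 200 status line
getent hosts repo.anaconda.com    # empty output would point to name resolution, as in the error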

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 48
Credit: 641,934,862
RAC: 1,285,707
Level
Lys
Scientific publications
watwatwatwatwat
Message 50453 - Posted: 10 Sep 2018 | 4:20:16 UTC - in response to Message 50448.

So I'm curious. When we see the results for a QC work unit, they list both a run time and a CPU time. Since we are used to run time and CPU time being a linear measurement, I was wondering if CPU time for QC is actually a combined total for all the CPU threads being used.

I've worked out how long the CPU time is and it's much higher than the actual run time I'm seeing on the machine. What would make sense is if it's the sum of 4 threads all running at the same time for a set amount of time.

CPU time = N threads x actual run time

Is this correct? Mostly for my own curiosity.


I suspect that CPU time = (0.95 × N CPUs × run time) + any time the computer is in use, plus background overhead.

These darn newer WUs are so memory-hungry that the latency on all my machines running them is so long the computers are becoming useless to me. I may have to quit running these until the memory issue is addressed. Both my 8-core FX machines have 16 GB of RAM, and with one QC job running on just 4 cores, my swap usage is as high as 7% and it takes forever to get the machine to do what I need. Plus I had to repartition the root dirs on 6 machines to accommodate the increased disk-space demands.

Not sure how much longer I can hold out.

Stefan
Volunteer moderator
Project developer
Project scientist
Send message
Joined: 5 Mar 13
Posts: 311
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 50454 - Posted: 10 Sep 2018 | 7:35:24 UTC - in response to Message 50453.

The QC jobs should use 4 GB of RAM each. If you are swapping, just don't run as many in parallel. You will never finish them anyway if you end up swapping.

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 48
Credit: 641,934,862
RAC: 1,285,707
Level
Lys
Scientific publications
watwatwatwatwat
Message 50464 - Posted: 10 Sep 2018 | 18:55:35 UTC - in response to Message 50454.

The QC jobs should use 4 GB of RAM each. If you are swapping, just don't run as many in parallel. You will never finish them anyway if you end up swapping.


Yes, understood and confirmed. However:

These last two posts of mine are provided simply as FYI, hopefully for the benefit of the project; they are a summary of my experience crunching the QC WUs, not a complaint.

Before these newer QC WUs, I was able to run two mt jobs of 4 threads each (after the simultaneous-start bug was fixed), plus acemd or E@H concurrently, on my two FX machines without any memory or latency issues. With these newer jobs, I can only run a single WU on 4 threads (WCG and acemd on the remaining cores). After about 12 hours or so of run time, the swap file begins to be used, and of course latency (the delay when I try to use the computer) increases to several seconds before it responds. I have not found out which application(s) actually use the extra RAM that causes the swap to be invoked (probably not the QC app, because the tasks finish quickly, usually in around 15-30 minutes), but swap usage gradually increases with time, with 7% being the highest observed to date. The swap does not appear to be used consistently, but rather in short bursts, even when I am not using the machine. Is it possible that the QC app is returning most, but not all, of the memory it uses back to the memory pool as calculations are completed?

I have 6 computers running QC, 4 of which are 4-core headless crunchers; as long as they provide valid results I leave them alone. But on the two FX machines with consoles, the latency issue leaves me little choice but to consider either stopping the QC apps on the FXs or cutting them down to 2 or 3 threads to see if that works. I will try the latter before I stop QC on the FXs, but that is going to increase turnaround time and undo the benefit of using the extra RAM and multiple cores to speed things up, a catch-22 of sorts (2 concurrent WUs of 4 threads each take longer but return 2 WUs per unit of real time, versus a single 2-thread WU finishing quicker but returning only 1 WU per unit of real time).

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50465 - Posted: 10 Sep 2018 | 20:17:36 UTC - in response to Message 50464.

Is it possible that the QC app is returning most but not all memory it uses back to the memory pool as calc's are completed?

That is an interesting question, and could explain some of the random errors I have been getting. But QC is running OK now on an i7-3770, running four work units at a time with 2 cores per WU. I see memory usage up around 4 GB per work unit though, so it is fortunate I have 32 GB. That has not prevented the errors in the past on comparable machines, but I have found that when it works, don't touch it.

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 48
Credit: 641,934,862
RAC: 1,285,707
Level
Lys
Scientific publications
watwatwatwatwat
Message 50478 - Posted: 12 Sep 2018 | 2:43:32 UTC - in response to Message 50465.


That is an interesting question, and could explain some of the random errors I have been getting. But QC is running OK now on an i7-3770, running four work units at a time with 2 cores per WU. I see memory usage up around 4 GB per work unit though, so it is fortunate I have 32 GB. That has not prevented the errors in the past on comparable machines, but I have found that when it works, don't touch it.


Same here, don't mess with a working situation. I should have gone to the full 32 GiB when I last upgraded RAM. Darn, I went 4 x 2 GB initially and later added 4 x 2 GB more, so now I have to buy a full 32 GB (8 x 4 GB) rather than just add 16 GB more, to the tune of around 250 USD per FX box. My rule has been 2 GB per thread/core, but in this situation 4 GB per thread appears to be the minimum; in fact, it is all my ATX boards can take.

While running 4 WUs at 2 threads each, have you noticed any swap file usage with the 32 GB of RAM?

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50480 - Posted: 12 Sep 2018 | 10:29:26 UTC - in response to Message 50478.
Last modified: 12 Sep 2018 | 10:33:36 UTC

While running 4 WUs at 2 threads each, have you noticed any swap file usage with the 32 GB of RAM?

I have set swappiness to never use swap: sudo sysctl vm.swappiness=0

But I don't think I would notice it anyway, since it is a dedicated machine and I don't have a way to check it. But whenever I run "free", I always see plenty of free/available memory.

Currently, it is 3 GB free, and 22 GB available, but it varies a lot. However, I haven't seen less than 18 GB available.
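For reference, the standard checks look like this (nothing project-specific; the sysctl.conf line is only needed to make the setting survive a reboot, and note that swappiness=0 discourages swapping but does not forbid it):

free -h                         # memory and swap usage, human-readable
cat /proc/sys/vm/swappiness     # current swappiness value
echo 'vm.swappiness=0' | sudo tee -a /etc/sysctl.conf    # make the setting persistent across reboots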

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50481 - Posted: 12 Sep 2018 | 10:54:04 UTC - in response to Message 50478.

I should have gone to the full 32 GiB when I last upgraded RAM. Darn, I went 4 x 2 GB initially and later added 4 x 2 GB more, so now I have to buy a full 32 GB (8 x 4 GB) rather than just add 16 GB more, to the tune of around 250 USD per FX box. My rule has been 2 GB per thread/core, but in this situation 4 GB per thread appears to be the minimum; in fact, it is all my ATX boards can take.

Just leave it at the default 4 cores per work unit. I need the extra memory only because I am using 2 cores per work unit, and then running 4 work units at a time.

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 48
Credit: 641,934,862
RAC: 1,285,707
Level
Lys
Scientific publications
watwatwatwatwat
Message 50482 - Posted: 12 Sep 2018 | 18:28:50 UTC - in response to Message 50481.

Just leave it at the default 4 cores per work unit.


Experimenting, I went to 3 cores with only 1 QC job at a time, and that just about eliminated the user-latency issues. I can live with it now, but as expected the real times increased. Now, after a fresh boot, I have plenty of free memory, but over time it starts pushing toward the limit. It would be interesting to find out whether the QC app is faithfully returning all the memory it uses back to the system after each of the calculations.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50483 - Posted: 12 Sep 2018 | 21:43:24 UTC - in response to Message 50482.

It would be interesting to find out whether the QC app is faithfully returning all the memory it uses back to the system after each of the calculations.

Even though I show 3 GB of memory free and 21 GB available at the moment, it still shows 361 MB of the swap file in use (out of 2 GB total), despite swappiness being set to 0. I don't know what that means; the machine has not been rebooted for three days and still has plenty of memory left, but it could be using it up.

(It is getting hard to post again, due to all the browser timeouts. It is a wonder I am able to get work at all.)

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 48
Credit: 641,934,862
RAC: 1,285,707
Level
Lys
Scientific publications
watwatwatwatwat
Message 50484 - Posted: 12 Sep 2018 | 22:03:29 UTC - in response to Message 50483.
Last modified: 12 Sep 2018 | 22:06:22 UTC

(It is getting hard to post again, due to all the browser timeouts. It is a wonder I am able to get work at all.)


I hear you re the website. I have been having the same issues for several days, with browser timeouts and having to resend data just to complete a post.

On the swap file issue on your machine, try swapoff -a as root or sudo. That for sure disables swapping. I use it to flush the swap. Use swapon -a to re-enable swap.

Edit: I use an RPM distro, so I am not sure whether swapoff is available on a DEB-based Linux.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 614
Credit: 1,199,451,727
RAC: 134,958
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50485 - Posted: 12 Sep 2018 | 22:13:33 UTC - in response to Message 50484.

On the swap file issue on your machine, try swapoff -a as root or sudo. That for sure disables swapping. I use it to flush the swap. Use swapon -a to re-enable swap.

Very good. I did sudo swapoff -a, and now swap shows as zero. We will see how it goes.

tullio
Send message
Joined: 8 May 18
Posts: 102
Credit: 8,462,438
RAC: 88,166
Level
Ser
Scientific publications
wat
Message 50486 - Posted: 13 Sep 2018 | 3:53:03 UTC
Last modified: 13 Sep 2018 | 3:54:16 UTC

I have a GPU task and a CPU task running on my two-core Opteron 1210 with 8 GB of DDR2 RAM and a GTX 750 Ti at a 1202 MHz clock, at 61 °C. The OS is SuSE Leap 42.3. Swap space is 37% used out of 2 GB. My HP Linux laptop, also on SuSE (Leap 15.0), has a 7 GB swap space, unused. It is not running GPUGRID tasks because its BOINC space is only 30 GB, versus the 760 GB on my main Linux host, a 2008-vintage SUN workstation running 24/7 since January 2008. Hats off to SUN!
Tullio

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 48
Credit: 641,934,862
RAC: 1,285,707
Level
Lys
Scientific publications
watwatwatwatwat
Message 50490 - Posted: 13 Sep 2018 | 15:19:34 UTC - in response to Message 50485.

Very good. I did sudo swapoff -a, and now swap shows as zero. We will see how it goes.


Remember that when you reboot you will have to use the command again, unless you write a script and set it up to auto-run at boot.
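If anyone does want to automate that, one way is a small systemd unit instead of a script. A sketch only: swapoff's path may be /usr/sbin on some distros, and, as noted further down the thread, running with no swap at all has its own risks.

sudo tee /etc/systemd/system/swapoff.service >/dev/null <<'EOF'
[Unit]
Description=Turn swap off after boot
After=swap.target

[Service]
Type=oneshot
ExecStart=/sbin/swapoff -a

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now swapoff.service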

I have a GPU task and a CPU task running on my two-core Opteron 1210 with 8 GB of DDR2 RAM and a GTX 750 Ti at a 1202 MHz clock, at 61 °C. The OS is SuSE Leap 42.3. Swap space is 37% used out of 2 GB. My HP Linux laptop, also on SuSE (Leap 15.0), has a 7 GB swap space, unused.


How is your user latency (which I define as the delay between a user request and the computer's response)? One of my FX computers has been running QC jobs for about 20 hours since a fresh reboot, and swap usage is already at 4% (333 of 8047 MiB), even though I am only using 3 of the 8 available threads. When I open boincmgr it takes up to 10 seconds to communicate with localhost. I really do suspect that this newer QC app is leaving some small percentage of dirty RAM pages behind after completing its calculations, and that they add up over time to swap usage and user latency.
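One way to check whether it is the QC python processes themselves that grow, rather than something else eating the RAM (generic ps usage; use python3 if that is the process name on your system):

ps -C python -o pid,rss,etime,args --sort=-rss | head    # resident set size (KiB) and age of each python process
free -h                                                   # system-wide picture for comparison
# run both now and again a few hours later; a steadily growing RSS on long-lived processes would support the theory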

tullio
Send message
Joined: 8 May 18
Posts: 102
Credit: 8,462,438
RAC: 88,166
Level
Ser
Scientific publications
wat
Message 50491 - Posted: 13 Sep 2018 | 16:03:35 UTC - in response to Message 50490.
Last modified: 13 Sep 2018 | 16:24:42 UTC

I am using this computer to read my mail, browse the web and read the newspapers, including the NYTimes (which gives me ten free articles a month), and I feel no delay. I have a 30 Mbit/s mixed fiber/copper connection from Telecom Italy: the fiber reaches a cabinet not far from my home, and from there it is copper. My router is also connected by WiFi to a Windows 10 PC, a printer and a decoder that gives me SKY TV on the TV set, which doubles as the monitor of the Windows PC. I just had a Microsoft update on the Windows 10 PC and it restarted with its two Einstein@home tasks.
Tullio
I also have a smartphone running Android 7.1.1 on its 8-core 64-bit ARM CPU, connected by WiFi to the router. It is running SETI@home and Einstein@home CPU tasks.
____________

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 732
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50492 - Posted: 13 Sep 2018 | 17:24:36 UTC - in response to Message 50491.

Concerning returning the memory when the WU is over: that is guaranteed; the OS enforces it.

Concerning swapoff: I don't recommend it. A small amount of swap use is normal. If you remove this "safety valve", the only thing that will happen is that processes will simply fail (often in confusing ways).

My suggestion is to pay attention to your system's performance during heavy BOINC use. If it becomes sluggish/irritating/unusable, swap use is probably going up and you will have to run fewer tasks simultaneously. Removing the swap will just make the system kill them.

This said, QC tasks come in various sizes, so you may hit an "unfortunate" combination of large ones.
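A handy way to watch for that while crunching is vmstat (part of procps on most distros); non-zero si/so columns mean pages are actively moving in and out of swap rather than just sitting there:

vmstat 5    # one report every 5 seconds; watch the si (swap in) and so (swap out) columns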

T

Zalster
Avatar
Send message
Joined: 26 Feb 14
Posts: 57
Credit: 2,019,205,574
RAC: 4,624,706
Level
Phe
Scientific publications
watwatwat
Message 50493 - Posted: 13 Sep 2018 | 17:43:26 UTC

It doesn't happen a lot, but I am still getting these occasionally, and they cause errors.


</stderr_txt>
<message>
finish file present too long
</message>
]]>

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 732
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50494 - Posted: 13 Sep 2018 | 17:45:07 UTC - in response to Message 50493.

General question to Windows users: do you see "black windows", like a command prompt, coming up when running QC apps?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 108
Credit: 67,695,413
RAC: 558,048
Level
Thr
Scientific publications
wat
Message 50500 - Posted: 13 Sep 2018 | 21:32:53 UTC - in response to Message 50493.

It doesn't happen a lot, but I am still getting these occasionally, and they cause errors.


</stderr_txt>
<message>
finish file present too long
</message>
]]>

Those are my bane too. Nothing can be done about them. The usual advice is not to quit BOINC just as a task is finishing up, but it happens even when you never quit BOINC. For whatever reason, one task can finish up just as another project's task starts, and BOINC takes too long to report the task; hence the error.

captainjack
Send message
Joined: 9 May 13
Posts: 138
Credit: 951,578,780
RAC: 248,214
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50502 - Posted: 13 Sep 2018 | 23:26:43 UTC
Last modified: 13 Sep 2018 | 23:31:24 UTC

Toni asked:

General question to Windows users: do you see "black windows", like a command prompt, coming up when running QC apps?


I just started my first QC app on Windows. About 1 minute after the task started, a "black window" flashed up on the display and then immediately disappeared.


About 15 minutes after the task started, there is a "python" app listed in Windows Task Manager that I assume is the QC app. It is using all available threads on a 16-thread system. In BOINC Manager, the task shows that it should be using "4 CPUs".


Let me know if you need more info.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 732
Credit: 4,285,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 50505 - Posted: 14 Sep 2018 | 10:42:17 UTC - in response to Message 50502.
Last modified: 14 Sep 2018 | 10:43:18 UTC

Toni asked:

General question to Windows users: do you see "black windows", like a command prompt, coming up when running QC apps?


I just started my first QC app on Windows. About 1 minute after the task started, a "black window" flashed up on the display and then immediately disappeared.


About 15 minutes after the task started, there is a "python" app listed in Windows Task Manager that I assume is the QC app. It is using all available threads on a 16-thread system. In BOINC Manager, the task shows that it should be using "4 CPUs".


Let me know if you need more info.


OK, thanks. If the flashing is not annoying, I'd leave it as it is.

Regarding threads: the python app is indeed QC. Are you running multiple WUs simultaneously, or did just one WU occupy all 16 threads?

Thanks a lot

captainjack
Send message
Joined: 9 May 13
Posts: 138
Credit: 951,578,780
RAC: 248,214
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 50508 - Posted: 14 Sep 2018 | 12:48:36 UTC

Toni asked:

Regarding threads: the python app is indeed QC. Are you running multiple WUs simultaneously, or did just one WU occupy all 16 threads?

I was running one task, and it took all the threads. I will try to get another task and run it when one becomes available.


Message boards : Multicore CPUs : New QC app