"This computer has finished a daily quota of 32 tasks"

Message boards : Multicore CPUs : "This computer has finished a daily quota of 32 tasks"

Author	Message
Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50601 - Posted: 26 Sep 2018 \| 8:36:15 UTC
	My i7-8700 is left with nothing to do. http://www.gpugrid.net/results.php?hostid=475515 I will put it on Folding.
	ID: 50601 \| Rating: 0 \| rate: / Reply Quote

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 50602 - Posted: 26 Sep 2018 \| 10:01:23 UTC
	My Ryzen 1700 is still busy with plenty of QC tasks… and there are many more in the queue. How can it be that your 8700 doesn't get any? This system is also a Linux based one, is it not? ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 50602 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50603 - Posted: 26 Sep 2018 \| 12:57:56 UTC - in response to Message 50602. Last modified: 26 Sep 2018 \| 13:01:38 UTC
	Yes, that is the point, there are plenty of tasks available. It seems that they just place a limit on them. I think it is to guard against machines that produce a lot of errors, but mine doesn't. I think the limit should be increased. Note that my i7-8700 was running QC only, and ran through a lot of them per day. I have a Ryzen 1700 also, but run just four cores on QC (two work units running two cores each). That machine has no problem getting work, and I will let it run. But if they ever want to get their mountain of work done, they will have to let the high-productivity machines get them. The Androids won't do it.
	ID: 50603 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 50604 - Posted: 26 Sep 2018 \| 13:46:08 UTC - in response to Message 50603.
	If somebody has an idea of where the daily quota setting limit is, I'd like to hear.
	ID: 50604 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50605 - Posted: 26 Sep 2018 \| 13:54:58 UTC - in response to Message 50604.
	As you probably know, there was some discussion of it earlier, though it does not tell you much. http://www.gpugrid.net/forum_thread.php?id=4823 And Richard Hasselgrove (as usual) has the best handle on it: http://www.gpugrid.net/forum_thread.php?id=4825
	ID: 50605 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50606 - Posted: 26 Sep 2018 \| 14:36:14 UTC - in response to Message 50605.
	Jim, Your last work unit reported at 07:13 with an error. I don't see any others after that. Is it possible that server put your machine in "time out" until you reported a new work unit that validates? I've not seen a limit yet on the work units for CPU. I'm running a i7 6950X and it been steadily busy since I got it running under Ubuntu. I usually get about 24 at a time which corresponds to my daily limit of 0.5 days + 0.1 extra Z ____________
	ID: 50606 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50607 - Posted: 26 Sep 2018 \| 14:45:17 UTC - in response to Message 50606. Last modified: 26 Sep 2018 \| 14:55:29 UTC
	Your last work unit reported at 07:13 with an error. I don't see any others after that. Is it possible that server put your machine in "time out" until you reported a new work unit that validates? That could be it, but I don't know. If so, they need to increase the limit, or machines will be idle too often. I don't know of any other project that shuts down the supply of work after only one error (which could happen for a variety of causes). EDIT: I keep a 0.1 + 0.5 day buffer on all my machines, which is the default. It seems to be the reverse of yours, but it should not matter much. Second EDIT: There are a couple of errors. They say: CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.anaconda.com/pkgs/pro/linux-64/repodata.json.bz2> Elapsed: - An HTTP error occurred when trying to retrieve this URL. HTTP errors are often intermittent, and a simple retry will get you on your way. I think this must be due to the intermittent connections and timeouts I get with GPUGrid. There may be no cure for that, but at least they could increase whatever error limits they have.
	ID: 50607 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 50609 - Posted: 26 Sep 2018 \| 16:08:43 UTC - in response to Message 50607.
	I increase the daily quota because new QC jobs are short. Failures and successes will cause the quota to go up and down for your host, as per BOINC heuristics. The CondaHTTPError is a connection error between your host and Conda cloud, not GPUGRID.
	ID: 50609 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50610 - Posted: 26 Sep 2018 \| 16:20:59 UTC - in response to Message 50609.
	OK, I will try it again later and see how it goes.
	ID: 50610 \| Rating: 0 \| rate: / Reply Quote

PappaLitto Send message Joined: 21 Mar 16 Posts: 511 Credit: 4,672,242,755 RAC: 0 Level Scientific publications	Message 50611 - Posted: 27 Sep 2018 \| 1:04:51 UTC
	Found my r7 1700 system idling with a daily quota of 4 hit. Why would so many WUs fail?
	ID: 50611 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50612 - Posted: 27 Sep 2018 \| 2:03:50 UTC - in response to Message 50611.
	Found my r7 1700 system idling with a daily quota of 4 hit. Why would so many WUs fail? CondaHTTPError: HTTP 503 SERVICE UNAVAILABLE: BACK-END SERVER IS AT CAPACITY for url Been seeing that the last few errors I've had. Not sure what it means. ____________
	ID: 50612 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50613 - Posted: 27 Sep 2018 \| 2:06:48 UTC - in response to Message 50611.
	Found my r7 1700 system idling with a daily quota of 4 hit. Why would so many WUs fail? Good question. But it makes it difficult to devote an entire PC to it. You need to be running something else in case your quota is hit. I hope they can fix it.
	ID: 50613 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50614 - Posted: 27 Sep 2018 \| 12:57:56 UTC - in response to Message 50612.
	@Zalster: That conda was getting too many download requests from users so it refused to download the packages on your machine at that moment. Should work next time I assume.
	ID: 50614 \| Rating: 0 \| rate: / Reply Quote

AuxRx Send message Joined: 3 Jul 18 Posts: 22 Credit: 2,758,801 RAC: 0 Level Scientific publications	Message 50615 - Posted: 27 Sep 2018 \| 14:33:57 UTC
	My system gets random CondaHTTPErrors as well. From a layman's perspective this seems to be a bottleneck. Are volunteers possibly risking being blacklisted by thrashing the Conda Cloud? Is there another way for the project to distribute/cache the necessary packages?
	ID: 50615 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50616 - Posted: 27 Sep 2018 \| 17:25:04 UTC - in response to Message 50614.
	@Zalster: That conda was getting too many download requests from users so it refused to download the packages on your machine at that moment. Should work next time I assume. Yes it did, but in the meantime 40 QC units "erred out" . Only thing that saved me from a "time out" is that I had more QC units in the cache that validated later and help me avoid being locked out. I agree, it does seem like a bottleneck. If and when the Windows QC goes mainstream, I would expect to see a huge up spike in these "errors" and lockouts. ____________
	ID: 50616 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 50617 - Posted: 27 Sep 2018 \| 18:52:39 UTC - in response to Message 50616.
	Indeed the new short WUs probably contact the conda cloud too often. Even if there is no download, just checking for new versions (which I don't think we can really avoid) triggers the block. We may need to recreate the WUs as larger blocks.
	ID: 50617 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50618 - Posted: 27 Sep 2018 \| 20:04:13 UTC - in response to Message 50617. Last modified: 27 Sep 2018 \| 20:04:41 UTC
	Indeed the new short WUs probably contact the conda cloud too often. I was about to say the same thing, though on a different basis. My Ryzen 1700, running two work units (2 cores each) has no problem with the Conda server, but each work unit usually runs over 30 minutes. My i7-8700 was churning through them at 10 minutes (or less), and got the errors. I think we need to somehow back off, and larger work units make sense to me.
	ID: 50618 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50619 - Posted: 28 Sep 2018 \| 15:01:50 UTC
	I'll look into making the WUs larger next week. For the weekend I don't want to break stuff so it will keep on running as is, sorry.
	ID: 50619 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50620 - Posted: 28 Sep 2018 \| 15:02:22 UTC
	Can you give me an estimated runtime of these WUs to know how many of them to pack together?
	ID: 50620 \| Rating: 0 \| rate: / Reply Quote

PappaLitto Send message Joined: 21 Mar 16 Posts: 511 Credit: 4,672,242,755 RAC: 0 Level Scientific publications	Message 50621 - Posted: 28 Sep 2018 \| 15:07:11 UTC - in response to Message 50620.
	Can you give me an estimated runtime of these WUs to know how many of them to pack together? Hello Stefan, Linked below is my r7 1700 system running at 3.9ghz with 2933mhz ram. You can see all of the run times. http://www.gpugrid.net/results.php?hostid=424454
	ID: 50621 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50623 - Posted: 28 Sep 2018 \| 15:59:03 UTC - in response to Message 50620. Last modified: 28 Sep 2018 \| 16:01:08 UTC
	Can you give me an estimated runtime of these WUs to know how many of them to pack together? Don't know if this link will work but here's a list of my CPU tasks http://www.gpugrid.net/results.php?userid=103037&offset=0&show_names=0&state=0&appid=30 edit.. I run 4 threads per work unit. Currently only 1 work unit per machine. 2 machines.
	ID: 50623 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50625 - Posted: 28 Sep 2018 \| 16:56:21 UTC - in response to Message 50620. Last modified: 28 Sep 2018 \| 16:57:17 UTC
	Here is my i7-8700 http://www.gpugrid.net/results.php?hostid=475515 They were often less than 10 minutes each, and I was running three at a time. You could pack 10 of them together insofar as I am concerned (or at least 4).
	ID: 50625 \| Rating: 0 \| rate: / Reply Quote

AuxRx Send message Joined: 3 Jul 18 Posts: 22 Credit: 2,758,801 RAC: 0 Level Scientific publications	Message 50626 - Posted: 28 Sep 2018 \| 17:48:36 UTC
	Very interesting comparison of run times. Running intel myself, Ryzen seems to struggle. Tasks take approx 10mins now but I'd prefer to crunch tasks <60mins.
	ID: 50626 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50630 - Posted: 29 Sep 2018 \| 8:12:26 UTC
	Ok thanks for the reports! The problem is that the WU runtime scales quadratically to the number of electrons in the molecule so larger molecules will take longer. But I assume I can go at least 5x the current length for this batch.
	ID: 50630 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50632 - Posted: 29 Sep 2018 \| 21:46:37 UTC - in response to Message 50626.
	Running intel myself, Ryzen seems to struggle. My i7-8700 was running 4 cores per work unit, whereas my Ryzen 1700 was running only 2 cores per work unit. And the Ryzen has 16 virtual cores, while the i7-8700 has only 12, so you would expect more per core from the Intel. Still, I agree that Intel is a little faster, though not be a large amount. I would be comfortable using either or both.
	ID: 50632 \| Rating: 0 \| rate: / Reply Quote

AuxRx Send message Joined: 3 Jul 18 Posts: 22 Credit: 2,758,801 RAC: 0 Level Scientific publications	Message 50634 - Posted: 1 Oct 2018 \| 16:08:00 UTC - in response to Message 50632.
	The following is largely anecdotal, but I've found that 4-core tasks are more efficient than to two 2-core tasks. After 1 hour, 4-cores had accumulated (slightly) more credit, which includes start up time for each task and so on. My CPU does not support Hyper Threading, but it might be worth a separate test, if you're looking for best efficiency. With more cores memory and disk through put seem especially relevant for QC.
	ID: 50634 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50635 - Posted: 1 Oct 2018 \| 16:15:29 UTC - in response to Message 50634.
	With more cores memory and disk through put seem especially relevant for QC. That could be, especially with the new work units. I think we all should test that if possible. Thanks.
	ID: 50635 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50652 - Posted: 8 Oct 2018 \| 2:37:12 UTC - in response to Message 50630.
	Ok thanks for the reports! The problem is that the WU runtime scales quadratically to the number of electrons in the molecule so larger molecules will take longer. But I assume I can go at least 5x the current length for this batch. So I just checked and see the CPU work units are running longer. Longest so far was 1800 seconds. Are these the new work units you were talking about. Still much shorter than a GPU task. No errors so far (looks around for wood to knock on) ____________
	ID: 50652 \| Rating: 0 \| rate: / Reply Quote

tullio Send message Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level Scientific publications	Message 50653 - Posted: 8 Oct 2018 \| 7:21:31 UTC Last modified: 8 Oct 2018 \| 7:24:14 UTC
	I am getting errors on QC tasks "Disk limit exceeded". They are all SELE6. Tullio
	ID: 50653 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50664 - Posted: 9 Oct 2018 \| 18:35:01 UTC - in response to Message 50653.
	I'm starting to see those too. Just had 4 of them error out on my machine.
	ID: 50664 \| Rating: 0 \| rate: / Reply Quote

AuxRx Send message Joined: 3 Jul 18 Posts: 22 Credit: 2,758,801 RAC: 0 Level Scientific publications	Message 50665 - Posted: 10 Oct 2018 \| 5:44:34 UTC
	+1
	ID: 50665 \| Rating: 0 \| rate: / Reply Quote

tullio Send message Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level Scientific publications	Message 50668 - Posted: 10 Oct 2018 \| 11:56:12 UTC
	I am running SETI@home and Einstein@home on both Linux boxen and also on a Ulephone smartphone with Android 7.1.1, Atlas@home on my Windows 10 PC.Goodbye GPUGRID. Tullio
	ID: 50668 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50669 - Posted: 10 Oct 2018 \| 14:53:48 UTC - in response to Message 50653.
	The last three QC have all erred for me with "Disk usage limit exceeded" also. It is time to give it a rest until they can get it fixed, hopefully soon.
	ID: 50669 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50670 - Posted: 10 Oct 2018 \| 15:50:21 UTC - in response to Message 50669. Last modified: 10 Oct 2018 \| 15:50:41 UTC
	The last three QC have all erred for me with "Disk usage limit exceeded" also. It is time to give it a rest until they can get it fixed, hopefully soon. Yes it appears to be getting worse. Almost all are erring out now. I say almost all, a half dozen have finished where previously they erred on other's machines. ____________
	ID: 50670 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 50671 - Posted: 10 Oct 2018 \| 16:16:08 UTC
	I am getting a lot of the "Disk usage limit exceeded" errors now. Was getting a few several days ago but now it is nearly all that error out. It is unclear whether the error message refers to disk capacity or frequency of disk write/reads are exceeded. It would be nice if the project folks would let us know why the error and if there is anything that we can do to reduce the probability of encountering these errors. Basically, there is no point in continuing to run these WU's since nearly all are erring and these SELE6 jobs are thrashing all my machines regardless of number of threads allowed. I finally figured out a way to shrink the linux disk cache from eating all my ram leaving only < 1% free but even leaving at least 4% ram free doesn't stop the thrashing. Maybe if I spring for 32 GB on the 8 core machines with currently 16GB each, the thrashing will be reduced but that won't help the disk errors. Might try on one machine to see out of curiosity.
	ID: 50671 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50672 - Posted: 10 Oct 2018 \| 18:43:37 UTC - in response to Message 50671.
	I finally figured out a way to shrink the linux disk cache from eating all my ram leaving only < 1% free but even leaving at least 4% ram free doesn't stop the thrashing. Maybe if I spring for 32 GB on the 8 core machines with currently 16GB each, the thrashing will be reduced but that won't help the disk errors. I have a large write cache on all my Ubuntu machines, basically to protect the SSDs from the high write rates of some projects (not QC). But out of 32 GB memory on my Ryzen 1700, I set aside about 8 GB for a write cache, with a 2 hour latency. That allows all the writes to go to the main memory. It also cuts down on the amount written to the SSD, if a given memory location is over-written before the 2 hour latency period has expired. Each time I check it, there are always several GB of memory free or at least available. So, along with about 180 GB free on my SSD, I should not be exceeding any disk limits. But I allow four work units to run at a time maximum (using an app_config.xml); if I cut it down to two at a time, that might work, though I expect that the real problem is something else.
	ID: 50672 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50673 - Posted: 10 Oct 2018 \| 18:45:20 UTC - in response to Message 50672. Last modified: 10 Oct 2018 \| 18:47:12 UTC
	Please delete. Each time I edit something, it posts a new message.
	ID: 50673 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50674 - Posted: 10 Oct 2018 \| 20:48:33 UTC
	Yeah, I just stopped accepting new QC work units until they figure out what the problem is. ____________
	ID: 50674 \| Rating: 0 \| rate: / Reply Quote

tullio Send message Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level Scientific publications	Message 50675 - Posted: 11 Oct 2018 \| 4:07:37 UTC
	172,128 QC ready to send, 48 users. No comment. Tullio
	ID: 50675 \| Rating: 0 \| rate: / Reply Quote

Erich56 Send message Joined: 1 Jan 15 Posts: 1132 Credit: 10,307,632,676 RAC: 29,649,566 Level Scientific publications	Message 50676 - Posted: 11 Oct 2018 \| 4:52:37 UTC - in response to Message 50675.
	172,128 QC ready to send, 48 users. No comment. Tullio this imbalance will not change as long as there is no Windows app for QC. Too bad that it's so difficult come up with one :-(
	ID: 50676 \| Rating: 0 \| rate: / Reply Quote

captainjack Send message Joined: 9 May 13 Posts: 171 Credit: 3,630,480,236 RAC: 18,470,125 Level Scientific publications	Message 50677 - Posted: 11 Oct 2018 \| 15:18:16 UTC
	And now this: Thu 11 Oct 2018 10:10:44 AM CDT \| GPUGRID \| Aborting task 123_35_37_39_42_da3ae375_n00001-SDOERR_SELE6-0-1-RND5707_4: exceeded disk limit: 59944.94MB > 57220.46MB Looks like the project admins need to make some adjustments.
	ID: 50677 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 50678 - Posted: 11 Oct 2018 \| 16:06:54 UTC
	Yes, I have also reluctantly set preferences not to accept any more production QC tasks until the disk usage problem is identified and eliminated. I had 6 machines with 16 threads (1/2 of available threads) on the project but all these WU's are just thrashing my machines and producing errors after an hour of so wasted cpu time. I am configured to run QC beta should any fixes be attempted.
	ID: 50678 \| Rating: 0 \| rate: / Reply Quote

tullio Send message Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level Scientific publications	Message 50679 - Posted: 12 Oct 2018 \| 0:51:25 UTC
	Tried two more QC tasks, they all fail the same way. Complete silence from admins. Tullio
	ID: 50679 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50681 - Posted: 12 Oct 2018 \| 15:36:31 UTC - in response to Message 50679.
	Decided to give it a try again. Rough estimates are 1 valid for every 4 errors. Almost all validates were after another computer errored out but not due to size limits. So there are batches of work units out there that don't exceed the limit but failed for other reasons. However, on my computers, almost all of my errors were size related. So the original error for this thread still exist. ____________
	ID: 50681 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 50682 - Posted: 12 Oct 2018 \| 17:26:14 UTC
	Going back to Aug24, my QC completion record shows 194 errors out of 454 QC WU's processed. This is about a 42.7% failure rate and a small random sampling of the error causes reveals almost all due to 'disk usage limit exceeded.' Would be nice to get an explanation what specifically this error means.
	ID: 50682 \| Rating: 0 \| rate: / Reply Quote

tullio Send message Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level Scientific publications	Message 50683 - Posted: 12 Oct 2018 \| 18:34:40 UTC
	I am running other 4 BOINC projects, both on Linux and Windows 10. Some use also GPUs, some don't but use Virtual Box, so I have a vast experience on all kind of errors. But all give me a feedback by admins or other volunteers with similar experiences. Here only silence. Tullio
	ID: 50683 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,649,689,831 RAC: 10,244,922 Level Scientific publications	Message 50685 - Posted: 12 Oct 2018 \| 20:57:10 UTC
	The error has occurred on some other projects where the disk size usage went past a limit set by the app. It wasn't a limit on the PC running the task.
	ID: 50685 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50688 - Posted: 13 Oct 2018 \| 18:46:17 UTC
	of the few validating, this is the biggest so far http://www.gpugrid.net/workunit.php?wuid=14584472 4943 seconds credit 1279.49 ____________
	ID: 50688 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50696 - Posted: 16 Oct 2018 \| 12:16:53 UTC
	Ok I will ask Toni if he can increase the WU disk space. But at some point we will just write your whole disk full as it seems...
	ID: 50696 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50698 - Posted: 16 Oct 2018 \| 12:50:38 UTC - in response to Message 50696.
	Ok I will ask Toni if he can increase the WU disk space. But at some point we will just write your whole disk full as it seems... Thanks Stefan... I increased my SSD to 500 GB so I'm good for now but It's always easier to clone the OS to a larger SSD than it is to a smaller SSD. If we start to hit a limit on the SSD then I guess I have an excuse to look at 1 T SSD hahaha... ____________
	ID: 50698 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50699 - Posted: 16 Oct 2018 \| 13:47:29 UTC
	We decided to cancel them for the moment. I might redesign it at a later point and send more sensible WUs out. Sorry for the trouble, I was a bit out these days working on finishing up a project.
	ID: 50699 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50700 - Posted: 16 Oct 2018 \| 13:56:49 UTC
	I am making some new ones now to send out maybe by tomorrow.
	ID: 50700 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50701 - Posted: 16 Oct 2018 \| 14:44:59 UTC - in response to Message 50700.
	I am making some new ones now to send out maybe by tomorrow. Very good. But please don't compromise the project for that. I usually have at least 180 GB free these days with the 256 GB SSDs. I know not everyone can do that, so you could try separating into small and large.
	ID: 50701 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50702 - Posted: 16 Oct 2018 \| 14:57:03 UTC
	It's okay because we run two separate QM projects. So I'll stop the SELE WUs and restart the QMML ones but with larger batch sizes now to avoid spamming the conda server and getting blocked.
	ID: 50702 \| Rating: 0 \| rate: / Reply Quote

tullio Send message Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level Scientific publications	Message 50703 - Posted: 17 Oct 2018 \| 8:19:44 UTC - in response to Message 50696.
	I received a notice from the server that QC tasks require 77 GB. But I have more than 700 GB available to BOINC on my two Linux boxen. No other BOINC project requires that much space. Tullio
	ID: 50703 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50704 - Posted: 17 Oct 2018 \| 8:47:22 UTC - in response to Message 50703.
	The new ones won't really require any disk space because they are smaller molecules. Although I don't know how BOINC does it, like if it reserves the maximum space per WU or it just cuts the WU if it exceeds the max space.
	ID: 50704 \| Rating: 0 \| rate: / Reply Quote

mmonnin Send message Joined: 2 Jul 16 Posts: 337 Credit: 7,649,689,831 RAC: 10,244,922 Level Scientific publications	Message 50705 - Posted: 17 Oct 2018 \| 10:23:58 UTC - in response to Message 50703. Last modified: 17 Oct 2018 \| 10:27:21 UTC
	I received a notice from the server that QC tasks require 77 GB. But I have more than 700 GB available to BOINC on my two Linux boxen. No other BOINC project requires that much space. Tullio There is a disk size limit set by the server for tasks. The error is not that your own physical disk is out of space. The task filled its allotted amount of space. The new ones won't really require any disk space because they are smaller molecules. Although I don't know how BOINC does it, like if it reserves the maximum space per WU or it just cuts the WU if it exceeds the max space. The latter. Memory and disk usage will grow while crunching as the task requires until completion, it reaches the limit or reaches the BOINC Mgr disk limit percentage set in preferences.
	ID: 50705 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50706 - Posted: 17 Oct 2018 \| 11:49:39 UTC - in response to Message 50705.
	They work for me. I have downloaded 24 work units (a quota limit), and they take only 1 GB disk space in total. I am running three at a time (4 cores each), and they are taking only about 1 GB memory each. They run for about 40 to 60 minutes on my i7-8700, which is ideal.
	ID: 50706 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50708 - Posted: 17 Oct 2018 \| 12:44:50 UTC
	No problems on my computers. Seem to be running ok. ____________
	ID: 50708 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50709 - Posted: 17 Oct 2018 \| 14:10:25 UTC Last modified: 17 Oct 2018 \| 14:10:37 UTC
	Great! Since the last QMML ones were so short that they spammed the conda server I made these 5 times larger (so you calculate up to 50 conformation energies in each WU, in some cases less if I didn't have 50).
	ID: 50709 \| Rating: 0 \| rate: / Reply Quote

tullio Send message Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level Scientific publications	Message 50711 - Posted: 18 Oct 2018 \| 13:11:31 UTC - in response to Message 50709.
	It seems also they give a correct progress figure, not the usual 10%. Tullio
	ID: 50711 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 50712 - Posted: 18 Oct 2018 \| 13:58:48 UTC - in response to Message 50711.
	It seems also they give a correct progress figure, not the usual 10%. Tullio Indeed progress is computed on the fraction of conformations computed.
	ID: 50712 \| Rating: 0 \| rate: / Reply Quote

STARBASEn Send message Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0 Level Scientific publications	Message 50713 - Posted: 18 Oct 2018 \| 17:54:43 UTC
	Since I restarted QC, the QMML50 are running great. No memory or disk limits reached or exceeded. So far 32 in progress and 6 of 6 successful completions. They seem to do the 10% + 1.8% X 50 = 100%. Times so far are ranging a little less than 40 min to nearly two hours per WU with the longer completions on slower two thread at 2 GHz machines and the others running 4 threads at 4 GHz.
	ID: 50713 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50715 - Posted: 19 Oct 2018 \| 8:41:59 UTC Last modified: 19 Oct 2018 \| 8:43:08 UTC
	I mean, the larger problematic ones are obviously of more interest but I believe we should rethink the design of those large molecules and maybe try to break them down into their constituent components so that they don't take half a hard-drive to compute. It's not trivial but it seems like it's necessary if we want to keep running them on GPUGRID. Thank you all in any case for hanging on through all the troubles of the large WUs.
	ID: 50715 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50716 - Posted: 19 Oct 2018 \| 13:59:14 UTC - in response to Message 50715.
	Thank you all in any case for hanging on through all the troubles of the large WUs. As long as we know you are working on it, we will work on it. A few people could do the large ones; SSDs are cheap these days, though if you need 1000 crunchers on them, then I think you are right you will need to break them down.
	ID: 50716 \| Rating: 0 \| rate: / Reply Quote

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 50717 - Posted: 19 Oct 2018 \| 18:11:37 UTC - in response to Message 50715.
	I mean, the larger problematic ones are obviously of more interest but I believe we should rethink the design of those large molecules and maybe try to break them down into their constituent components In case the larger ones are of more interest for you.. and therefore more valuable for science... I would be interested in them even so and upgrade my machines accordingly. Maybe you could split QC into long and short runs like the GPU jobs? That would make it possible to choose. ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 50717 \| Rating: 0 \| rate: / Reply Quote

PappaLitto Send message Joined: 21 Mar 16 Posts: 511 Credit: 4,672,242,755 RAC: 0 Level Scientific publications	Message 50718 - Posted: 19 Oct 2018 \| 19:26:03 UTC - in response to Message 50717.
	Maybe you could split QC into long and short runs like the GPU jobs? That would make it possible to choose. +1
	ID: 50718 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50719 - Posted: 20 Oct 2018 \| 13:43:44 UTC - in response to Message 50715.
	I mean, the larger problematic ones are obviously of more interest but I believe we should rethink the design of those large molecules and maybe try to break them down into their constituent components so that they don't take half a hard-drive to compute. It's not trivial but it seems like it's necessary if we want to keep running them on GPUGRID. Thank you all in any case for hanging on through all the troubles of the large WUs. Yes I would also be interested in helping with the larger ones. How large of an SSDs are we talking about? 1, 2, 4 Terabytes? ____________
	ID: 50719 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50720 - Posted: 20 Oct 2018 \| 13:57:17 UTC - in response to Message 50719.
	Yes I would also be interested in helping with the larger ones. How large of an SSDs are we talking about? 1, 2, 4 Terabytes? I like people who think big.
	ID: 50720 \| Rating: 0 \| rate: / Reply Quote

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 50722 - Posted: 20 Oct 2018 \| 16:14:51 UTC - in response to Message 50720.
	Well, not really SSD. HDD would be just fine. Right now scratch files are kept wherever BOINC's "slot" directory is (as expected from a well-behaved application).
	ID: 50722 \| Rating: 0 \| rate: / Reply Quote

3de64piB5uZAS6SUNt1GFDU9d... Send message Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0 Level Scientific publications	Message 50723 - Posted: 20 Oct 2018 \| 16:53:53 UTC - in response to Message 50720. Last modified: 20 Oct 2018 \| 16:54:07 UTC
	Yes I would also be interested in helping with the larger ones. How large of an SSDs are we talking about? 1, 2, 4 Terabytes? +1 Would a 2TB HDD be enough? ____________ I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
	ID: 50723 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50724 - Posted: 20 Oct 2018 \| 17:44:50 UTC - in response to Message 50722.
	Well, not really SSD. HDD would be just fine. Right now scratch files are kept wherever BOINC's "slot" directory is (as expected from a well-behaved application). HDD cool. See amazon has a sale on WD Red 4TB NAS Hard Drive. Time to add onto the machine. ____________
	ID: 50724 \| Rating: 0 \| rate: / Reply Quote

AuxRx Send message Joined: 3 Jul 18 Posts: 22 Credit: 2,758,801 RAC: 0 Level Scientific publications	Message 50725 - Posted: 20 Oct 2018 \| 18:39:29 UTC - in response to Message 50723.
	Any SSD won't cut it. You'd need the expensive stuff with larger faster write buffers. Or use several HDDs (each on another connection, with several BOINC instances to spread the slots around). But really you should look into Optane, but it will cost a fortune.
	ID: 50725 \| Rating: 0 \| rate: / Reply Quote

AuxRx Send message Joined: 3 Jul 18 Posts: 22 Credit: 2,758,801 RAC: 0 Level Scientific publications	Message 50726 - Posted: 20 Oct 2018 \| 18:58:23 UTC - in response to Message 50715.
	The argument that QMML has little value is disturbing to me. Either there is scientific justification or not. And please stick with a plan. All these abrupt changes make me question the research goal. I like to feel involved too, but the community shouldn't get to change course. Further I fear splitting the project will just further complicate things for everyone. And please always communicate upfront how your tasks will impact the volunteer's systems (with TBW, conda requests and so on). Volunteers should not have to find out themselves.
	ID: 50726 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50727 - Posted: 20 Oct 2018 \| 20:05:49 UTC - in response to Message 50725.
	Any SSD won't cut it. You'd need the expensive stuff with larger faster write buffers. If you are referring to lifetime, I run three QC work units at a time (4 cores per WU) on an i7-8700. According to iostat, they are writing about 50 GB/day. That is not excessive; the SSD should last a normal lifetime. As a matter of practice, I also use a write cache (12 GB size, 2 hour latency), though that is not really necessary to protect the SSD. But since I have 32 GB main memory, I like to use it.
	ID: 50727 \| Rating: 0 \| rate: / Reply Quote

captainjack Send message Joined: 9 May 13 Posts: 171 Credit: 3,630,480,236 RAC: 18,470,125 Level Scientific publications	Message 50728 - Posted: 20 Oct 2018 \| 22:34:51 UTC
	On Oct. 17, Stephan wrote: "The new ones won't really require any disk space because they are smaller molecules." BTW, the requirement for having 57,220.46 MB of disk space for the CPU tasks is still in effect. Does it need to be?
	ID: 50728 \| Rating: 0 \| rate: / Reply Quote

tullio Send message Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level Scientific publications	Message 50729 - Posted: 21 Oct 2018 \| 4:30:49 UTC
	We are up to 74 Linux users. But recent QC tasks run well on my two Linux boxen. The one with a GTX 750 Ti GPU board is also running a GPU task alongside a CPU task. All this on an Opteron 1210 of 2008 vintage. But this SUN workstation is still my main host. Tullio
	ID: 50729 \| Rating: 0 \| rate: / Reply Quote

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 50731 - Posted: 21 Oct 2018 \| 18:15:32 UTC - in response to Message 50729. Last modified: 21 Oct 2018 \| 18:35:36 UTC
	The one with a GTX 750 Ti GPU board is also running a GPU task alongside a CPU task. I find that I can run a GTX 750 Ti without having to reserve a core for it on my i7-8700 without any noticeable affect on the QCs running on all 12 cores. It is the efficiency of CUDA, and the card shows only about 8 to 16% CPU core usage (or about 1% for the whole CPU). It also works well for the i7-4770 machine when I use that, with somewhat higher CPU percentages. But it makes a great combination. And the 750 Ti gets all the work done in under 24 hours, while not expending much power.
	ID: 50731 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50737 - Posted: 25 Oct 2018 \| 18:32:37 UTC
	So I have successfully installed and tested the new WD Red 4T NAS HDD into 2 of my computers. They are up are running again. So however the project decides to proceed with the larger molecules, I hope to be prepared. ____________
	ID: 50737 \| Rating: 0 \| rate: / Reply Quote

Erich56 Send message Joined: 1 Jan 15 Posts: 1132 Credit: 10,307,632,676 RAC: 29,649,566 Level Scientific publications	Message 50738 - Posted: 26 Oct 2018 \| 5:25:49 UTC - in response to Message 50737.
	the project decides to proceed with the larger molecules, I hope to be prepared. my hope is that some day, there will be a Windows version for the QC CPU tasks.
	ID: 50738 \| Rating: 0 \| rate: / Reply Quote

Stefan Project administrator Project developer Project tester Project scientist Send message Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 Level Scientific publications	Message 50739 - Posted: 26 Oct 2018 \| 9:41:10 UTC
	There will. It's a matter of priorities and time allocation. It's close to 90% completed but right now we have 0 time to dedicate to the windows build.
	ID: 50739 \| Rating: 0 \| rate: / Reply Quote

tullio Send message Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 Level Scientific publications	Message 50742 - Posted: 26 Oct 2018 \| 13:50:20 UTC - in response to Message 50739.
	The final 10% always takes 90% of the time. Tullio
	ID: 50742 \| Rating: 0 \| rate: / Reply Quote

Zalster Send message Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0 Level Scientific publications	Message 50856 - Posted: 13 Nov 2018 \| 8:10:03 UTC
	Apologies.. Looks like I trashed around 23 CPU work units. Internet was down for most of the day and I didn't notice it until just now. Not sure if the internet being down caused the errors or if it was something with the computer but it appears to be running normal. Will losing internet after the work units download but before they start to crunch cause them to error if they can't contact the server when they start?? ____________
	ID: 50856 \| Rating: 0 \| rate: / Reply Quote

AuxRx Send message Joined: 3 Jul 18 Posts: 22 Credit: 2,758,801 RAC: 0 Level Scientific publications	Message 50857 - Posted: 13 Nov 2018 \| 17:52:01 UTC - in response to Message 50856.
	Long answer: Yes.
	ID: 50857 \| Rating: 0 \| rate: / Reply Quote

Post to thread

Message boards : Multicore CPUs : "This computer has finished a daily quota of 32 tasks"

	About	Science	Volunteers	Performance	Forum	Join us	Donate