Advanced search

Message boards : Number crunching : Unsent tasks decreasing much more slowly

Author Message
WPrion
Send message
Joined: 30 Apr 13
Posts: 77
Credit: 1,034,112,811
RAC: 742
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 54312 - Posted: 12 Apr 2020 | 13:04:35 UTC

I've noticed that the number of Unsent Tasks is decreasing at a much slower rate even though the number of tasks in progress is growing and the Current GigaFLOPS is approaching record levels.

The number of unsent tasks had decreased from 300,000 to 250,000 in a few weeks, but it is now taking several days for it to decrease by only 1,000.

What changed? Are additional new tasks being added or are the tasks being crunched now more difficult?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54314 - Posted: 12 Apr 2020 | 14:14:37 UTC - in response to Message 54312.

Toni prioritized some batches earlier; those have run out. That made the number of unsent tasks decrease more rapidly.
Now it's back to the "normal" (almost zero) rate. It means that when these batches run out, the decrease will be 100 times faster than the previous faster rate.

WPrion
Send message
Joined: 30 Apr 13
Posts: 77
Credit: 1,034,112,811
RAC: 742
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 54317 - Posted: 13 Apr 2020 | 11:39:18 UTC - in response to Message 54314.

Thanks!

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54579 - Posted: 4 May 2020 | 19:27:18 UTC

On March 10th 2020 | 17:39:16 UTC, Retvari Zoltan wrote in message #53884:

I'm receiving many tasks which are the last one of their batch:

1nkvA00_450_0-TONI_MDADpr4sn-9-10-RND4090_0

Or near the end of their batch:
1gaxA04_348_0-TONI_MDADpr4sg-8-10-RND1850_0

10: the total number of tasks in the batch
9: the sequential number of the given task within the batch (numbering starts at 0)

I expect the number of unsent tasks in the queue will drop significantly over the next few days.
There are 305,826 unsent tasks as I write this.

At this time, the number of unsent tasks is 243,556, as can be seen on the Server status page.
The last tasks I'm currently receiving are similar to: 3tekA00_320_3-TONI_MDADpr4st-8-10-RND9554_0
As soon as the series reaches the 9-10 ones, it is predictable that the number of unsent tasks will decrease again at a higher rate... (?)
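For illustration, the sequence/total fields can be read straight out of a task name. This is only a sketch assuming the hyphen-separated layout shown above; the helper name is my own:

```python
def parse_batch_position(task_name):
    """Return (sequence, total, is_last) for a GPUGrid task name such as
    '1nkvA00_450_0-TONI_MDADpr4sn-9-10-RND4090_0'.
    The third and fourth hyphen-separated fields are the zero-based
    sequence number and the batch length."""
    fields = task_name.split("-")
    seq, total = int(fields[2]), int(fields[3])
    return seq, total, seq == total - 1  # last task of its batch?

# The two names quoted above:
print(parse_batch_position("1nkvA00_450_0-TONI_MDADpr4sn-9-10-RND4090_0"))  # (9, 10, True)
print(parse_batch_position("1gaxA04_348_0-TONI_MDADpr4sg-8-10-RND1850_0"))  # (8, 10, False)
```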

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54604 - Posted: 7 May 2020 | 6:01:35 UTC

As soon as the series reaches the 9-10 ones, it is predictable that the number of unsent tasks will decrease again at a higher rate... (?)

All the WUs I've received today are of this kind.
The current reading is 242,563 unsent tasks. We will soon be confirming or discarding this assumption.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54607 - Posted: 7 May 2020 | 9:45:18 UTC - in response to Message 54604.
Last modified: 7 May 2020 | 9:56:02 UTC

As soon as the series reaches the 9-10 ones, it is predictable that the number of unsent tasks will decrease again at a higher rate... (?)
All the WUs I've received today are of this kind.
The current reading is 242,563 unsent tasks. We will soon be confirming or discarding this assumption.
I'm sure that the number of unsent tasks will drop drastically in the next few days.
The only question is the bottom of that drop. It depends on the priority of the tasks in the queue. If it's uniform, the number of unsent tasks will drop near 0; only the tasks stuck on slow or inactive hosts will remain in the queue (~1,000 in this case). If there are lower-priority tasks than the ones we receive now, then we will receive those soon. We will know if that's the case, as they will have low sequence numbers (for example 3-10). In that case the number of unsent tasks will remain high. I guess there are no lower-priority tasks, so the number of unsent tasks will drop near 0.
The number of unsent tasks is 237,790 at the moment (a drop of 4,773, ~2%, in 3h 45m).

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 947
Credit: 4,353,973
RAC: 58
Level
Ala
Scientific publications
watwatwatwat
Message 54608 - Posted: 7 May 2020 | 11:39:02 UTC - in response to Message 54607.

I prioritised tasks ending with _0: 1gaxA04_348_0 over the others (_1 to _4)

T

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54612 - Posted: 7 May 2020 | 18:21:56 UTC - in response to Message 54604.
Last modified: 7 May 2020 | 18:36:17 UTC

The current reading is 242,563 unsent tasks. We will soon be confirming or discarding this assumption.
The current reading is 222,460; that is a drop of 20,103 (8.28%) in 12h 20m = 27.17/minute.
If this rate stays constant, the present supply will last for 5 days 16 hours 28 minutes and 50.8 seconds. :)
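For reference, the estimate can be reproduced in a few lines (a sketch using only the figures quoted in this post):

```python
# Figures quoted above: 20,103 tasks drained in 12h 20m,
# leaving 222,460 unsent tasks in the queue.
elapsed_min = 12 * 60 + 20                # observation window, in minutes
rate = 20103 / elapsed_min                # ~27.17 tasks/minute
remaining_min = 222460 / rate             # minutes until the queue empties at this rate

days, rest = divmod(remaining_min, 24 * 60)
hours, minutes = divmod(rest, 60)
print(f"~{rate:.2f}/min -> {int(days)}d {int(hours)}h {int(minutes)}m left")  # ~27.17/min -> 5d 16h 28m left
```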

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54613 - Posted: 8 May 2020 | 6:19:07 UTC - in response to Message 54612.
Last modified: 8 May 2020 | 6:24:10 UTC

The current reading is 242,563 unsent tasks. We will soon be confirming or discarding this assumption.
The current reading is 222,460; that is a drop of 20,103 (8.28%) in 12h 20m = 27.17/minute.
If this rate stays constant, the present supply will last for 5 days 16 hours 28 minutes and 50.8 seconds. :)
The current reading is 200,361; that is a decrease of 42,202 (17.4%) in 24h 10m = 29.10/minute.
The rate has increased slightly. At this new rate, the present supply will last 4 days 18 hours 44 minutes and 6.94 seconds from now. :)

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54618 - Posted: 8 May 2020 | 20:08:45 UTC - in response to Message 54613.

The current reading is 200,361; that is a decrease of 42,202 (17.4%) in 24h 10m = 29.10/minute.
The rate has increased slightly. At this new rate, the present supply will last 4 days 18 hours 44 minutes and 6.94 seconds from now. :)

-1) Mr. Zoltan: Thank you very much for making this fun.
I took screenshots that confirm your data.

[screenshots]

Reduction in unsent tasks: 41,926 in this roughly 24-hour lapse.

-2) Mr. Toni/GPUGrid's team: Thank you very much for your continuous support.
This high decrease rate has been greatly helped by exceptionally good communications since yesterday morning.
Whatever you did in the transition from May 6th to 7th, it meant a drastic change from extremely sluggish to very agile communications.
Please take note of the recipe.

At the moment of writing this, the scheduler is stopped.
I guess this high rate of returning results has caused another momentary disk buffer overflow...

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54621 - Posted: 8 May 2020 | 20:26:58 UTC - in response to Message 54618.
Last modified: 8 May 2020 | 20:30:03 UTC

The current reading is 200,361; that is a decrease of 42,202 (17.4%) in 24h 10m = 29.10/minute.
The rate has increased slightly. At this new rate, the present supply will last 4 days 18 hours 44 minutes and 6.94 seconds from now. :)

At the moment of writing this, the scheduler is stopped.
I guess this high rate of returning results has caused another momentary disk buffer overflow...

Note that the return rate was this high all along, hence the frequent disk buffer overflows. As new tasks are created from the returned ones, the number of unsent workunits remains constant, so the return rate stayed hidden from us until the batches reached their final sequence numbers.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54622 - Posted: 8 May 2020 | 20:54:34 UTC - in response to Message 54621.

Note that the return rate was this high all along, hence the frequent disk buffer overflows. As new tasks are created from the returned ones, the number of unsent workunits remains constant, so the return rate stayed hidden from us until the batches reached their final sequence numbers.

Yes, you're right, and I'm aware of it.
The frequent scheduler stops lately are most probably related to the Optimized bandwidth announcement and the significantly increased number of crunchers...
This combination has likely caused some bottleneck in the project's resources.

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 482
Credit: 554,467,553
RAC: 13,106
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54624 - Posted: 9 May 2020 | 3:59:19 UTC

It looks like the server status page needs something added - free disk space - at least for the disk areas that receive uploads.

That seems to be the current bottleneck in the project's resources.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54631 - Posted: 9 May 2020 | 10:53:25 UTC - in response to Message 54624.

One more conclusion that could be drawn:

- Taking Retvari Zoltan's current calculation: 29.1 returned WUs per minute, on average
- Taking some figures from this previous outage: 6.367 MB per returned WU, on average
This results in 185.28 MB of finished-WU data returned to the server per minute.
That is 260.55 GB of data to manage per day, counting only returned WUs' data (about 1 TB every 4 days).
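The arithmetic behind those figures, as a sketch (1024 MB per GB is assumed, which matches the numbers above):

```python
wu_per_min = 29.1    # average returned WUs per minute (Retvari Zoltan's figure)
mb_per_wu = 6.367    # average data per returned WU, in MB (from the earlier outage)

mb_per_min = wu_per_min * mb_per_wu        # MB of results arriving per minute
gb_per_day = mb_per_min * 60 * 24 / 1024   # per day, using 1024 MB per GB
days_per_tb = 1024 / gb_per_day            # how often a full TB accumulates

print(f"{mb_per_min:.2f} MB/min, {gb_per_day:.2f} GB/day, 1 TB every ~{days_per_tb:.1f} days")
# -> 185.28 MB/min, 260.55 GB/day, 1 TB every ~3.9 days
```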

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1003
Credit: 2,525,793,267
RAC: 3,287,654
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54632 - Posted: 9 May 2020 | 12:31:04 UTC - in response to Message 54631.

What we don't know - at least, I certainly don't know, and I've not seen it described here, ever - is what exactly the processing path of that data is after our raw results are returned to the server.

We do know that each of our tasks forms part of a sequence of (currently) 10 tasks making up the entire job, and that at least some of our returned data is used to assemble the starting data for the next task in the sequence.

Is it all used in that way? Once it's been used, does it need to be kept? If so, how long? Can it (any of it) be discarded once the next task in sequence has been created? Has been completed? Once the whole 10-task job has been completed?

People in other threads have mentioned SETI as a comparison. There, the process is that the scientific data returned by each task is assimilated into a gigantic, 20-year, scientific database. And that once assimilation has taken place, our raw, returned, data is erased (usually within 24 hours).

If we knew for certain that our returned data needed to be retained in quick-access online storage, say until the final paper had been accepted for publication following peer review, then I'd be prepared to contribute to a fundraising drive for additional disk spindles and a chassis to mount them in. But if the daily data is simply transferred over a slow link to an offsite backing store, then spindles aren't the answer: more drives would simply delay the need for an outage from a 5 day to a 10 day interval, and then extend that outage when it eventually arrived.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54640 - Posted: 10 May 2020 | 9:54:11 UTC
Last modified: 10 May 2020 | 9:54:52 UTC

The project's scheduler is just up again, with 174,874 tasks left ready to send!

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54641 - Posted: 10 May 2020 | 10:24:22 UTC

All my backed-up WUs have been reported as finished, and all the new WUs I've received (but one 8-10) are of the 9-10 kind.
So this topic is still on fire đŸ”ĨđŸ”ĨđŸ”Ĩ

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54642 - Posted: 10 May 2020 | 11:44:52 UTC

I have a couple of ghost tasks, so I suppose many other ghost tasks are waiting to pass their deadline, which means some 8-10 tasks will be re-sent to other hosts.
However, the present supply (171,016) will last for about 4 days from now.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 508
Credit: 525,681,602
RAC: 1,676,608
Level
Lys
Scientific publications
wat
Message 54650 - Posted: 11 May 2020 | 6:53:32 UTC

What is the ghost recovery procedure on this project?

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54651 - Posted: 11 May 2020 | 8:36:40 UTC - in response to Message 54650.

Ghost tasks are handled on GPUGRID's server side.
After the 5-day deadline has passed, the server will automatically clear ghost tasks from the original host and resend them to another one.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54652 - Posted: 11 May 2020 | 9:41:47 UTC - in response to Message 54650.

What is the ghost recovery procedure on this project?
I've tried the way it works for SETI, but it didn't work here.
Luckily GPUGrid has a much shorter deadline than SETI, so it's not a big problem.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54653 - Posted: 11 May 2020 | 9:46:56 UTC - in response to Message 54642.

the present supply (171,016) will last for about 4 days from now.
The rate of decline seems to have stabilized around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54654 - Posted: 11 May 2020 | 9:52:57 UTC - in response to Message 54651.
Last modified: 11 May 2020 | 9:54:26 UTC

Ghost tasks are handled on GPUGRID's server side.
After the 5-day deadline has passed, the server will automatically clear ghost tasks from the original host and resend them to another one.

[Clarification]

We call a "ghost task" one that the server counts as sent to a host but that, for whatever reason, was never actually received.
It doesn't interfere on the host side: BOINC Manager will not see these ghost tasks, and it will continue asking for new tasks until the task buffer is full or the "2 tasks per GPU" maximum is reached.
On the server side, ghost tasks are wrongly counted as "in progress" tasks, while they really are not.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54656 - Posted: 11 May 2020 | 9:59:54 UTC - in response to Message 54653.

The rate of decline seems to have stabilized around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)

What is coming next is a mystery...

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 508
Credit: 525,681,602
RAC: 1,676,608
Level
Lys
Scientific publications
wat
Message 54660 - Posted: 11 May 2020 | 16:13:33 UTC - in response to Message 54652.

What is the ghost recovery procedure on this project?
I've tried the way it works for SETI, but it didn't work here.
Luckily GPUGrid has a much shorter deadline than SETI, so it's not a big problem.

Thanks Zoltan, I tried my Seti ghost recovery protocol and it didn't work either.
I managed to pick up 10 ghosts and wanted to clear them.
Good thing the deadline here is so short compared to Seti.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 94
Credit: 131,197,652
RAC: 877,533
Level
Cys
Scientific publications
wat
Message 54662 - Posted: 11 May 2020 | 17:20:09 UTC

These ghost tasks seem to occur after the server runs out of disk space. Are they somehow related to that? 🤔
_____________________________

An unrelated item: Anybody else getting this error?

(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
01:29:38 (6776): wrapper (7.9.26016): starting
01:29:38 (6776): wrapper: running acemd3.exe (--boinc input --device 0)
EXCEPTIONAL CONDITION: src\mdio\bincoord.c, line 193: "nelems != 1"
01:29:40 (6776): acemd3.exe exited; CPU time 0.015625
01:29:40 (6776): app exit status:


When you track them, it apparently signals that the WU itself is bad. After getting six of them, I'm curious what the bug might be. Bad code?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 68
Credit: 929,678,608
RAC: 6,427,313
Level
Glu
Scientific publications
wat
Message 54663 - Posted: 11 May 2020 | 17:21:14 UTC - in response to Message 54662.

Yes, I saw a bunch of bad WUs. Checking the resends, they are all erroring out on other hosts as well.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 508
Credit: 525,681,602
RAC: 1,676,608
Level
Lys
Scientific publications
wat
Message 54664 - Posted: 11 May 2020 | 18:50:58 UTC

Looks like a lot of tasks lost their file references on the storage. Can't pull the correct data for the tasks.

<core_client_version>7.17.0</core_client_version>
<![CDATA[
<message>
ERROR: /home/user/conda/conda-bld/acemd3_1570536635323/work/src/mdsim/trajectory.cpp line 135: Simulation box has to be rectangular!
07:01:16 (1119448): acemd3 exited; CPU time 0.557061
07:01:16 (1119448): app exit status: 0x9e
07:01:16 (1119448): called boinc_finish(195)

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1003
Credit: 2,525,793,267
RAC: 3,287,654
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54665 - Posted: 11 May 2020 | 19:09:47 UTC - in response to Message 54664.

I'm interpreting that message as "file is present, but contains bad contents".

On another aspect of the 'error task' problem. I'm using a very ancient predecessor of BoincTasks. It (and I think BoincTasks itself), retains the concept of "CPU efficiency", which was withdrawn from BOINC Manager several years ago.

What I'm seeing for Windows tasks is that the ACEMD worker app crashes seconds after launch, but the Wrapper app doesn't notice for some time - the task as a whole is seen by BOINC as continuing to run. This shows up as a CPU efficiency of 0.0000 (helpfully colour coded) - no CPU time is being measured for the task as a whole, instead of the usual 96% - 97%.

That low efficiency warning prompts me to look at the workunit on the website, and see if there are any previous failures (the replication number is a good hint, as well). If it's a bad workunit, I can abort and move on with less wasted time overall.

It's a technique which some users might find helpful.

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 252
Credit: 9,791,563,847
RAC: 3,936,285
Level
Tyr
Scientific publications
wat
Message 54666 - Posted: 11 May 2020 | 19:19:39 UTC - in response to Message 54656.

The rate of decline seems to have stabilized around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)

What is coming next, is a mystery...

And what we're finishing now is a complete and utter mystery as well.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 94
Credit: 131,197,652
RAC: 877,533
Level
Cys
Scientific publications
wat
Message 54674 - Posted: 12 May 2020 | 18:32:49 UTC - in response to Message 54666.
Last modified: 12 May 2020 | 18:34:20 UTC

And what we're finishing now is a complete and utter mystery as well


I've only been able to glean that it is a vigorous attempt at mapping the simulation environment, which is meant to improve (or simplify?) future modeling methods.

If one of the admins would want to comment, we're all ears...
👂👂👂👂👂đŸĻģ👂😉

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54675 - Posted: 12 May 2020 | 19:05:46 UTC

New version of ACEMD: 73,631 Unsent tasks left

âŗī¸

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54686 - Posted: 14 May 2020 | 9:49:10 UTC - in response to Message 54653.

the present supply (171,016) will last for about 4 days from now.
The rate of decline seems to have stabilized around 30/minute, so the supply will last for about 3 days from now.
(exactly 2 days 22 hours 39 minutes and 42.9 seconds)
Three days have passed; there are 11,806 workunits left, so this supply will last for another 6-7 hours.

Aurum
Avatar
Send message
Joined: 12 Jul 17
Posts: 252
Credit: 9,791,563,847
RAC: 3,936,285
Level
Tyr
Scientific publications
wat
Message 54690 - Posted: 14 May 2020 | 18:28:49 UTC

They're all gone, so what now?

Ben
Send message
Joined: 28 Dec 14
Posts: 7
Credit: 100,987,550
RAC: 465,142
Level
Cys
Scientific publications
watwat
Message 54691 - Posted: 14 May 2020 | 18:47:42 UTC - in response to Message 54690.
Last modified: 14 May 2020 | 18:50:50 UTC

Our poor GPUs start getting hangry!! :)

And I was pushing so hard for the magic 100m milestone. :(

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,811,663
RAC: 915,511
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54693 - Posted: 14 May 2020 | 20:07:15 UTC - in response to Message 54691.

They're all gone, so what now?

I liked this expression:

...is a complete and utter mystery...

Familiar?
(I took note of it for a moment like this)

Now that the number of unsent tasks has reached zero and stuck there, the topic of this thread makes full sense again: Unsent tasks decreasing much more slowly
(Unless negative values are permitted, who knows?)

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2185
Credit: 15,824,047,857
RAC: 697,719
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54696 - Posted: 14 May 2020 | 21:19:32 UTC - in response to Message 54690.
Last modified: 14 May 2020 | 21:23:00 UTC

They're all gone, so what now?
It will take at least 5-10 days (or more) until all the workunits out in the field are finished (or timed out, and finished on another host).
I don't expect that another batch will be queued until then.
The exam period is coming, then the summer break, so perhaps there won't be much work queued soon.
Unless Toni has prepared some COVID-19 related work. Or perhaps we could help out the Acellera drug design people with their job.

Erich56
Send message
Joined: 1 Jan 15
Posts: 697
Credit: 3,294,399,981
RAC: 381,374
Level
Arg
Scientific publications
watwatwatwatwatwat
Message 54698 - Posted: 15 May 2020 | 5:38:07 UTC

The difference between the tasks of the current series and all the earlier ones is this:
whereas, before, tasks could still be downloaded once in a while as long as there were enough tasks "in progress", here that seems not to be the case.
Once the "unsent" queue is dry, no more tasks can be downloaded.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 508
Credit: 525,681,602
RAC: 1,676,608
Level
Lys
Scientific publications
wat
Message 54699 - Posted: 15 May 2020 | 6:56:25 UTC

I picked up 4 resends after the RTS buffer had hit zero today.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1003
Credit: 2,525,793,267
RAC: 3,287,654
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54700 - Posted: 15 May 2020 | 7:16:57 UTC - in response to Message 54699.

I picked up 4 resends after the RTS buffer had hit zero today.

Were they from the 'instant crashing' batch? I've had a few of those recently, though I haven't checked to see if I got any while I was asleep.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 947
Credit: 4,353,973
RAC: 58
Level
Ala
Scientific publications
watwatwatwat
Message 54702 - Posted: 15 May 2020 | 7:49:18 UTC - in response to Message 54700.

The large batch has essentially finished. If there are MDAD left, they are probably failing leftovers.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 508
Credit: 525,681,602
RAC: 1,676,608
Level
Lys
Scientific publications
wat
Message 54707 - Posted: 15 May 2020 | 15:42:33 UTC - in response to Message 54700.

No, 3 in fact were original-issue _1s from yesterday. They must have been the very last ones issued. One was a _2 resend from a user who aborted. None were from the badly formatted task run. I got lucky.

I was just surprised to see the cache increase after I had seen the RTS count on the SSP go to zero.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1003
Credit: 2,525,793,267
RAC: 3,287,654
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54709 - Posted: 15 May 2020 | 18:12:52 UTC - in response to Message 54707.

At this project, only the _0 are original issue. _1 is already a replacement, unlike projects which use comparison validation.

But I'm glad you got some meat off the bones.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 508
Credit: 525,681,602
RAC: 1,676,608
Level
Lys
Scientific publications
wat
Message 54710 - Posted: 15 May 2020 | 19:32:20 UTC

Thanks for correcting me Richard. I forgot about the workunits on this project with quorum of 1.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 94
Credit: 131,197,652
RAC: 877,533
Level
Cys
Scientific publications
wat
Message 54713 - Posted: 16 May 2020 | 2:57:26 UTC

I figure what we see trickle out for a while will be timed-out tasks that are recycled by Grosso. I'm curious as to how many tasks expire on how many hosts by the end of a run the size of this one.

Friends, don't dismiss the amount of raw computing power that the project we all have just accomplished represents! Even if Grosso isn't always as awesome as its nickname suggests, the DC network it supports is truly awesome in its potential power. That power finds its source in every one of us and our individual support.

And beside that, where else can we accrue BOINC cobblestones this fast? 😉

Hey, look at this little hiatus as "recess", where you get to go find out what the kids in the other classes are crunching. You might get a little case of hardware lust.

Just remember to practice virtual social distancing! đŸ’ģ🌎đŸ–Ĩ🌏đŸ–Ĩ🌍đŸ’ģ😎

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1003
Credit: 2,525,793,267
RAC: 3,287,654
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54714 - Posted: 16 May 2020 | 8:21:36 UTC

Looks like I'm participating in that trickle-down, too. Somebody let WU 19993861 slip past its deadline, so they tossed it back to me.

oemuser
Send message
Joined: 18 Sep 16
Posts: 8
Credit: 1,291,979
RAC: 0
Level
Ala
Scientific publications
wat
Message 54716 - Posted: 16 May 2020 | 10:17:46 UTC

Folding@home now has many GPU work units against the coronavirus, so that would be a good option.

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 482
Credit: 554,467,553
RAC: 13,106
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54718 - Posted: 16 May 2020 | 14:19:30 UTC - in response to Message 54716.

Folding@home now has many GPU work units against the coronavirus, so that would be a good option.

I signed up for folding@home at least a week ago, and then enabled GPU work for them. No GPU work downloaded so far, only CPU work.

Also, so far, I've been unable to log into their forums.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 733
Credit: 1,478,749,566
RAC: 117,028
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 54719 - Posted: 16 May 2020 | 15:23:38 UTC - in response to Message 54718.
Last modified: 16 May 2020 | 15:29:40 UTC

I signed up for folding@home at least a week ago, and then enabled GPU work for them. No GPU work downloaded so far, only CPU work.

Also, so far, I've been unable to log into their forums.

Things seem to be a bit strange on Folding at the moment. I can't get to the forums either, but I have been getting work regularly (both CPU and GPU) for a couple of weeks. But on some cards I don't get any work. It is not a difference in the cards, but some of their servers have more problems than others, due to the recent growing pains. If you try later, you will probably get some.

And make sure you are using their latest release. They have fixed a few bugs recently that could hang up getting work.
https://foldingathome.org/start-folding/

EDIT: I think this explains it.
https://foldingathome.org/2020/05/16/foldingforum-org-is-currently-out-of-service/

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 482
Credit: 554,467,553
RAC: 13,106
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54720 - Posted: 16 May 2020 | 19:09:27 UTC - in response to Message 54719.

OK, that site offers a more recent version.

I hope that updating it will not disturb work in progress - there doesn't seem to be a way to tell the previous version to finish any work in progress, but not start more.

Pop Piasa
Avatar
Send message
Joined: 8 Aug 19
Posts: 94
Credit: 131,197,652
RAC: 877,533
Level
Cys
Scientific publications
wat
Message 54721 - Posted: 17 May 2020 | 1:29:02 UTC - in response to Message 54720.

Robert, I had to adjust the slider to full power to get my GPUs to engage. As I recall, it will take a while to catch some available work the first time. Once you have GPU tasks you can run at any speed.
The app works great alongside BOINC, from my recent experience. I can multi-task my GPUs that way. Just remember: twice the tasks means half the speed for each task.

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 482
Credit: 554,467,553
RAC: 13,106
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 54724 - Posted: 17 May 2020 | 2:39:14 UTC

Pop Piasa,

Thanks. That started my first Folding@Home GPU task.

Post to thread

Message boards : Number crunching : Unsent tasks decreasing much more slowly