
Message boards : Number crunching : Report Results Immediately doesn't always work

Bedrich Hajek
Joined: 28 Mar 09
Posts: 411
Credit: 6,058,255,976
RAC: 533,904
Message 35774 - Posted: 20 Mar 2014 | 22:56:06 UTC

The 'report results immediately' setting doesn't always work. Here is an example: if you have multiple video cards in your computer and are uploading two work units simultaneously, and one finishes first and is ready to report while the other is still uploading, the first one will not report automatically until the other one finishes uploading.

Task     Work unit  Sent (UTC)           Reported (UTC)        Status                   Run time (s)  CPU time (s)  Credit     Application
7958785  5303530    20 Mar 2014 5:09:50  20 Mar 2014 13:43:41  Completed and validated  25,857.08     11,965.98     77,400.00  Long runs (8-12 hours on fastest card) v8.15 (cuda60)
7958756  5303512    20 Mar 2014 4:41:12  20 Mar 2014 13:43:41  Completed and validated  25,718.89     11,772.80     77,400.00  Long runs (8-12 hours on fastest card) v8.15 (cuda55)


They will report together when both finish uploading. Of course, you can always click "Update". I would guess the same thing happens if you have more than two work units uploading.

I know this is really a minor thing, and I don't know whether anybody else has noticed it, but it seems to be an error in the logic, and a rather curious little quirk.



Richard Haselgrove
Joined: 11 Jul 09
Posts: 1391
Credit: 3,479,463,183
RAC: 204,625
Message 35783 - Posted: 21 Mar 2014 | 1:21:18 UTC - in response to Message 35774.

There are deliberate provisions in recent BOINC clients to delay scheduler RPCs - whether to report completed work, or request new work - while uploads are active.

These are primarily aimed at projects with high volumes of short tasks: in computational terms, opening a database connection on the server is relatively expensive, and it is efficient to 'batch together' multiple database transactions over the same connection. If database actions are inhibited during uploads, it is more likely that reporting the recently completed task can be batched together with requesting a replacement - especially at projects like this one, which have a low limit of 'tasks in progress'.
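To make the batching argument concrete, here is a minimal sketch. It assumes, as a simplification, that every scheduler RPC costs exactly one database connection no matter how many results it carries, and it models batching as "everything that becomes ready within a short window goes into one RPC". The real client holds back while uploads are active rather than using a fixed window, and `Report` and `count_connections` are hypothetical names, not BOINC code.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """A completed task that is ready to be reported (hypothetical model)."""
    task_id: int
    ready_at: float  # seconds since start when the task became reportable


def count_connections(reports: list[Report], window: float) -> int:
    """Count scheduler RPCs (one DB connection each, by assumption) when
    every report that becomes ready within `window` seconds of the first
    pending one is sent in the same RPC. window=0 means no batching."""
    connections = 0
    batch_start = None
    for r in sorted(reports, key=lambda rep: rep.ready_at):
        if batch_start is None or r.ready_at - batch_start > window:
            connections += 1          # open a new RPC / DB connection
            batch_start = r.ready_at  # later reports may join this batch
    return connections


reports = [Report(1, 0.0), Report(2, 30.0), Report(3, 45.0), Report(4, 600.0)]
print(count_connections(reports, 0.0), count_connections(reports, 60.0))
```

With these four reports, sending each one separately costs four connections, while a 60-second window needs only two, because the first three tasks share one RPC.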

I can see that this could be a problem if two tasks frequently complete in quick succession, given the large upload file sizes here, especially if you have a slow internet connection. If you have a particular problem with the behaviour that this explanation doesn't cover, I can report it back to the developers.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 411
Credit: 6,058,255,976
RAC: 533,904
Message 35793 - Posted: 21 Mar 2014 | 22:40:28 UTC - in response to Message 35783.

I don't have any particular problems with this. It is just an oddity, and I thought I would just point it out. Is this something worth investigating or an insignificant anomaly? I can't say.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35798 - Posted: 21 Mar 2014 | 23:56:50 UTC - in response to Message 35793.
Last modified: 21 Mar 2014 | 23:59:43 UTC

Yet more settings being enforced upon all projects clandestinely!

BOINC needs to be hardware, project and app specific, not generic. BOINC should be controlled by the crunchers and the researchers, rather than by BOINC high command at the behest of one or a few mal-projects.

Just because one or two projects release 60-second tasks doesn't mean all projects should be prevented from reporting results when they finish.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

mikey
Joined: 2 Jan 09
Posts: 286
Credit: 567,888,276
RAC: 53,894
Message 35806 - Posted: 22 Mar 2014 | 11:33:12 UTC - in response to Message 35798.

Perhaps they could be persuaded to connect to report units ONLY every 60 seconds, and only IF there is one to report, rather than hammering on the door regardless. That way a unit finishes and is reported no more than 60 seconds later, and all the different projects could still use their units of varying length without being hammered by all the pointless connections.
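mikey's suggestion can be sketched as a simple policy, assuming a client that wakes up on a fixed 60-second tick and connects only when at least one finished task is waiting. `contact_times` is a hypothetical helper for illustration, not part of BOINC.

```python
import math


def contact_times(finish_times: list[float], interval: float = 60.0) -> list[float]:
    """Times at which a client that wakes every `interval` seconds (at 0,
    interval, 2*interval, ...) actually contacts the server, if it connects
    only when at least one finished task is waiting to be reported."""
    # Each task is reported at the first tick at or after it finishes;
    # ticks with nothing waiting produce no connection at all.
    return sorted({math.ceil(t / interval) * interval for t in finish_times})


print(contact_times([10.0, 50.0, 130.0]))
```

Each finished task waits at most one interval before being reported, and idle ticks cost the server nothing, which is the trade-off mikey is proposing.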

Jeremy Zimmerman
Joined: 13 Apr 13
Posts: 61
Credit: 726,605,417
RAC: 0
Message 35810 - Posted: 22 Mar 2014 | 12:36:04 UTC - in response to Message 35798.

Skgiven,

From the recommended config file settings, I never enabled 'report results immediately'. Every time I see a GPUGrid task finish, it immediately starts uploading. Tasks from other projects like Einstein, SETI, LHC, and Rosetta will queue up and then report. This is still true with BOINC 7.2.42 (XP 32-bit and Win7 64-bit). I never enabled it because I did not see a need. My GTX460 1GB cards, which take ~17 hours per task, still get the credit bonus every time (except for the occasional >1GB task) with a low 0.01-day work queue.

While I understand that a broken setting is a broken setting, I was curious whether your machines fail to report right away on GPUGrid, or whether I have just been lucky with some combination of settings that makes this work for GPUGrid.

Also, I wonder whether a project/WU setting determines how this works. I think it was said in the past that GPUGrid is very dependent on speed and WU turnaround.

Regards,
Jeremy

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35817 - Posted: 23 Mar 2014 | 11:18:50 UTC - in response to Message 35810.
Last modified: 23 Mar 2014 | 11:40:35 UTC

The root of this is that a project's credit database is separate from the WU database. So files first have to upload to the WU database, and then credit is reported to a second database. For some projects this secondary reporting to a second database is an issue (it uses up connection resources and bandwidth).

At GPUGrid reporting is not normally a resource problem (though it might be if a large batch of tasks all failed after 2 seconds), so work should normally be reported immediately, as requested by the project.

The cc_config setting is no longer needed. The setting is now controlled by the project (has been for a year or so). However IF the way Boinc behaves overrides or interferes with Project control then we are in trouble.

If you have one GPU, this should not be a problem, but if you have 2 or more GPUs it could be anything from an occasional oddity to a real nuisance, both for you and for the project. Uploads don't always proceed smoothly; they can start and stop several times, with delays before retrying. If you have 4 similar GPUs in one system and start work at the same time, tasks of the same type will finish at around the same time. If one task takes 10 min to upload, it will take about 40 min to upload all 4 tasks over the same connection, all at around the same time. When trying to upload more than one task at a time, you are significantly more likely to have to back off and retry later for one or more of them. So at times it could be several hours before all 4 tasks are uploaded.
Perhaps it would help if the back-off time of files waiting to transfer to the same project were reset to 3 seconds after a file uploaded?
Keeping a low cache of tasks (work buffer) would help alleviate the situation, as would running 24/7 and having faster GPUs, but if you have 4 GTX650s, run long tasks and have limited bandwidth (to the GPUGRID server), this could be the straw that breaks the camel's back.
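skgiven's proposal can be sketched as a per-project back-off that doubles after each failed transfer but drops back to 3 seconds as soon as any file for that project uploads. This is a hypothetical model: the doubling and the one-hour cap are assumptions for illustration, and `ProjectBackoff` is not BOINC's actual back-off code.

```python
from dataclasses import dataclass


@dataclass
class ProjectBackoff:
    """Per-project upload back-off (hypothetical model, not BOINC's code)."""
    minimum: float = 3.0     # seconds; the reset value suggested above
    maximum: float = 3600.0  # assumed cap on the wait
    delay: float = 3.0       # current wait before the next retry

    def on_failure(self) -> float:
        """Double the wait after a failed transfer, up to the cap."""
        self.delay = min(self.delay * 2, self.maximum)
        return self.delay

    def on_success(self) -> float:
        """Proposed tweak: once any file for this project uploads, the
        other files waiting for the same project retry almost at once."""
        self.delay = self.minimum
        return self.delay


b = ProjectBackoff()
print(b.on_failure(), b.on_failure(), b.on_success())
```

The point of the reset is that a successful upload is evidence the server and the link are working again, so the remaining files need not sit out a long back-off they accumulated earlier.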

mikey
Joined: 2 Jan 09
Posts: 286
Credit: 567,888,276
RAC: 53,894
Message 35819 - Posted: 23 Mar 2014 | 11:39:30 UTC - in response to Message 35810.

I think what you are seeing at most projects is the client reporting right away that it has successfully completed the unit, but not sending the actual results back until the next regular communication. Returning the results can take a while depending on the size of the data; some projects have a lot and some have just a little. Reporting that the unit was successfully crunched tells the server there is no need to resend it, because you did finish it.

The idea behind sending back the actual results right away is faster granting of credit: instead of waiting for your cache to need refilling, as we did in older versions of BOINC, you send the results back right away. All this was supposed to help your RAC go up faster, that is all. What it ends up doing, though, is tying up the server ports for every unit you finish, as opposed to doing it only when you have several units to report. The other possible reason was if your cache was so big that your units were always close to expiring, so returning the results as soon as you finish them helps ensure they get back on time. BUT I am not sure that is a valid argument, because you have already reported that you finished the unit and the server knows that; I do not know how much leeway there is after the deadline if the server knows the unit was crunched.

MANY years ago SETI figured it was taking 30 milliseconds for their server to receive your request to connect, open a port and get the data flowing. In server time that is A LOT of time. There are only so many ports available, and 30 milliseconds times 2.5 million people, each connecting several times or more per day, meant there was no way the server could keep up. Constantly trying to get a connection and retrying, retrying and retrying some more is called 'hammering', and it can also slow down server communications. That is why they introduced increasing back-off times when a connection fails, to stop people from saying 'LET ME IN' constantly. It STILL takes about the same 30 milliseconds for the server to say 'no, you can't come in', so hammering is STILL tying up the server.
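To put those figures into numbers: the 30 ms and 2.5 million come from the post, while "several times per day" is an assumption, taken here as 4 contacts per user per day. This is back-of-envelope arithmetic, not a measurement of any real server.

```python
SECONDS_PER_DAY = 86_400


def connection_setup_load(users: int, contacts_per_day: float,
                          setup_s: float) -> tuple[float, float]:
    """Return (total seconds per day spent setting up connections,
    average number of connections being set up at any instant)."""
    total_s = users * contacts_per_day * setup_s
    return total_s, total_s / SECONDS_PER_DAY


# Figures from the post, plus the assumed 4 contacts per user per day.
total, concurrent = connection_setup_load(2_500_000, 4, 0.030)
print(f"{total:,.0f} s of setup per day, ~{concurrent:.1f} in progress at once")
```

That comes to roughly 300,000 seconds of connection setup per day, about three and a half connections being set up at every instant on average, and that is before any retries; every 'hammering' retry adds another 30 ms of setup for no useful work.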

I too have only very rarely used the flag and do not use it currently as it isn't important to me.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1391
Credit: 3,479,463,183
RAC: 204,625
Message 35823 - Posted: 23 Mar 2014 | 12:25:37 UTC - in response to Message 35817.

skgiven wrote: "The root of this is that a project's credit database is separate from the WU database. So files have to first upload to the WU database and then credit is reported to a second database. For some projects this secondary reporting to a second database is an issue (it uses up connection resources and bandwidth)."

No, not true.

When a task completes (here or at any BOINC-based project), there are indeed two separate follow-up phases to be gone through, but they are not quite as you describe. They are (and have to be, in this order):

1) a 'data' phase
This is where the scientific results of the computation are returned. It's a straight file transfer, one hard disk (yours) to another (the project's). During this phase, BOINC Manager will show 'uploading' against the task name, and one or more files will be shown in the 'Transfers' tab. This phase is not visible on the 'Projects' tab. GPUGrid typically generates large upload files - 40MB-50MB for the long tasks - and they can take several minutes to upload over typical home DSL circuits (possibly less if you have a cable or fibre-optic connection).

2) a 'control' phase
This is where you let the project know that you have already uploaded the result file, so the project can proceed to check it (and validate it against another user's work, if needed). This is the point at which the project database - only one database - can be updated with the new status (success or error), the time of reporting, any credit due to be awarded, and all the other little housekeeping details.
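To put "several minutes" for the data phase into numbers, here is a small calculation. The 40MB-50MB range is from the post; the 1 and 10 Mbit/s upstream speeds are assumed examples of a basic DSL line and a cable or fibre line, and the estimate ignores protocol overhead and retries.

```python
def upload_minutes(size_mb: float, uplink_mbit_s: float) -> float:
    """Minutes to upload `size_mb` megabytes (taken as 10**6 bytes) over a
    link with `uplink_mbit_s` megabits per second of upstream bandwidth.
    Ignores protocol overhead, retries and other traffic on the link."""
    megabits = size_mb * 8
    return megabits / uplink_mbit_s / 60


for uplink in (1.0, 10.0):  # assumed example uplinks: basic DSL vs. cable/fibre
    low, high = upload_minutes(40, uplink), upload_minutes(50, uplink)
    print(f"{uplink:>4} Mbit/s: {low:.1f}-{high:.1f} min per task")
```

At an assumed 1 Mbit/s uplink that is roughly 5 to 7 minutes per task, and two tasks sharing the same link take about twice as long to clear.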

No single task can be 'reported' (the control data written) until all the data due to the project for that task has been transferred - checking that the files are present and correct is a necessary part of the reporting process. As has been correctly said, this project sets a flag to say that reporting - what I've called the control phase - should take place as soon as possible after the data transfer is complete. We don't need to do anything for that to happen.
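The ordering rule above (no control phase until the whole data phase is done) can be sketched as a per-task check. `FinishedTask` and its fields are hypothetical names for illustration; the real client tracks this state in its own data structures.

```python
from dataclasses import dataclass, field


@dataclass
class FinishedTask:
    """Hypothetical model of a completed task awaiting its two follow-up phases."""
    name: str
    # Output file name -> True once that file has been uploaded (data phase).
    uploads: dict[str, bool] = field(default_factory=dict)
    reported: bool = False

    def ready_to_report(self) -> bool:
        """The control phase may start only when every output file is uploaded."""
        return not self.reported and all(self.uploads.values())

    def report(self) -> None:
        """Mark the task reported; stands in for the project's database update."""
        if not self.ready_to_report():
            raise RuntimeError(f"{self.name}: data phase incomplete or already reported")
        self.reported = True


t = FinishedTask("task1", {"out_0": True, "out_1": False})
print(t.ready_to_report())  # out_1 is still uploading
```

Only after the last output file is marked uploaded does `ready_to_report()` become true and `report()` succeed.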

The confusion has arisen because of what can happen if two different tasks happen to complete in quick succession. In that particular case (only), task1 may have completed its file transfer and become 'ready to report', but task2 may still be transferring data. In that particular case, BOINC will delay the reporting of task1 until all the transfers for task2 are complete, and then report both tasks together in a single database transaction.

There are safeguards in place: if the second set of transfers fails, BOINC will go ahead and report task1 on its own anyway, and there is a timeout of something like 5 minutes at most for uploads which slow to a crawl but don't actually fail with an error message.
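The deferral and its safeguards can be sketched as a small decision function. This is an illustration under assumptions, not BOINC's client code: `Upload`, `reports_to_send` and `HOLD_LIMIT_S` are hypothetical names, and treating the roughly 5-minute safeguard as a cap on how long a report may be held is an interpretation of the post, not a documented rule.

```python
from dataclasses import dataclass

HOLD_LIMIT_S = 300.0  # assumption: models "something like a maximum 5 minutes"


@dataclass
class Upload:
    """An output-file transfer belonging to some other task (hypothetical model)."""
    task: str
    active: bool          # still transferring
    failed: bool = False  # gave up with an error


def reports_to_send(ready: list[str], uploads: list[Upload],
                    held_s: float) -> list[str]:
    """Pick the ready-to-report tasks to send in one scheduler RPC right now.

    Reports are held back while another task's upload is still active, so
    that they can share a single RPC, unless that upload has failed or the
    reports have already been held for HOLD_LIMIT_S seconds."""
    blocking = [u for u in uploads if u.active and not u.failed]
    if not blocking or held_s >= HOLD_LIMIT_S:
        return list(ready)
    return []


# task1 is uploaded and ready to report; task2 is still uploading.
print(reports_to_send(["task1"], [Upload("task2", active=True)], held_s=60))
print(reports_to_send(["task1"], [Upload("task2", active=True, failed=True)], held_s=60))
```

The first call holds task1 back; the second reports it alone because task2's upload failed. In the normal case, task2 finishes uploading and joins the ready list, so the next call returns both tasks together, which matches what was observed at the start of this thread.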

So, a task - 'task1' - may not be reported 'immediately' in this case, but it should be reported within 5 or 10 minutes, depending on the speed of your upload link. I don't think that's the end of the world?

Jeremy Zimmerman
Joined: 13 Apr 13
Posts: 61
Credit: 726,605,417
RAC: 0
Message 35824 - Posted: 23 Mar 2014 | 12:30:40 UTC - in response to Message 35819.

skgiven, since I have only been at GPUGrid for a year, that would explain my observations of how task uploads work.

mikey, that is exactly why I do not use the setting. I recall back in the day when our university email server was getting hit by every student checking their mail. They gave each department its own server, and those servers were then limited to contacting the main server every 5 minutes to collect the mail. More people than open ports.


Appreciate the feedback from both of you. Thank you.
