Message boards : Server and website : Problems uploading completed work units

wiyosaya
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Message 22235 - Posted: 8 Oct 2011 | 18:04:23 UTC

It seems every work unit that my machine finishes takes multiple tries to upload. I am using BOINC 6.12.34. Anyone else having this same problem?

It is extremely frustrating because some of the retry times go to 8 or more hours. I literally have to sit at my computer and press "retry now" multiple times over a period of perhaps 10 or 20 minutes to "force" a finished work unit to complete its upload.

Are there any known issues with uploading?
____________

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 22242 - Posted: 9 Oct 2011 | 23:26:37 UTC - in response to Message 22235.

I'm not aware of this issue. Are you using wireless?
You might want to report this to Berkeley.

Fortunately the uploads here allow you to continue from where you left off; if you uploaded 3MB then you would continue from 3MB (this is not normal on many Boinc projects that require you to restart).

I did notice that when trying to upload some CPU tasks (elsewhere) Boinc keeps adding the bandwidth used, so tasks go past the 100% mark. In such cases you definitely have to select try again to get anywhere. That is a Boinc issue, and not related to this project.

Using report tasks immediately might help.

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 22245 - Posted: 10 Oct 2011 | 10:55:14 UTC
Last modified: 10 Oct 2011 | 11:05:07 UTC

I had uploads going into backoff tonight as well. After hitting the retry button a few times they managed to get through. Is there a comms issue on the server end (or anywhere in between)?

Downloads seem fine.

1043 GPUGRID 10-10-2011 09:33 PM Temporarily failed upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_0: HTTP error
1044 GPUGRID 10-10-2011 09:33 PM Backing off 13 min 56 sec on upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_0
1045 GPUGRID 10-10-2011 09:33 PM Temporarily failed upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_1: HTTP error
1046 GPUGRID 10-10-2011 09:33 PM Backing off 13 min 27 sec on upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_1
1047 GPUGRID 10-10-2011 09:33 PM Started upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_2
1048 GPUGRID 10-10-2011 09:33 PM Started upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_3
1049 GPUGRID 10-10-2011 09:34 PM Temporarily failed upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_2: HTTP error
1050 GPUGRID 10-10-2011 09:34 PM Backing off 16 min 3 sec on upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_2
1051 GPUGRID 10-10-2011 09:34 PM Temporarily failed upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_3: HTTP error
1052 GPUGRID 10-10-2011 09:34 PM Backing off 19 min 45 sec on upload of s0r162-TONI_SH2MS3-46-100-RND4787_0_3

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 22246 - Posted: 10 Oct 2011 | 11:12:00 UTC - in response to Message 22245.

We are not aware of any connectivity issue, but we'll keep an eye on it.

wiyosaya
Message 22248 - Posted: 10 Oct 2011 | 17:21:33 UTC - in response to Message 22242.

skgiven wrote:
> I'm not aware of this issue. Are you using wireless?
> You might want to report this to Berkeley.
>
> Fortunately the uploads here allow you to continue from where you left off; if you uploaded 3MB then you would continue from 3MB (this is not normal on many Boinc projects that require you to restart).
>
> I did notice that when trying to upload some CPU tasks (elsewhere) Boinc keeps adding the bandwidth used, so tasks go past the 100% mark. In such cases you definitely have to select try again to get anywhere. That is a Boinc issue, and not related to this project.
>
> Using report tasks immediately might help.

I am not on wireless. I am also only running GPU WUs. I'll report to Berkeley; however, I have only observed this behavior with GPUGrid.

Even with uploads picking up where they left off, if the upload fails enough times, the WU could be returned well after the 24 hour deadline for 1.5 credits even though it completed processing long before that. Gianni's WUs take my 460 about 18 hours. I could easily see retry fails returning the finished WUs over 24 hours after initial download.

MarkJ wrote:
> I had uploads going into backoff tonight as well. After hitting the retry button a few times they managed to get through. Is there a comms issue on the server end (or anywhere in between)?
>
> Downloads seem fine.

This matches my experience, and I was thinking that there might be server problems, too.

Download, for me, typically happens at something like 500 kbps. However, upload seems to happen at less than 100 kbps. Also, what I have observed when uploading finished work units is this: typically (among other files) there is a file on the order of 20 or more MB, and when this uploads, simultaneous upload of other files fails. In particular, it always seems to be one file that is always 855 kB in size. It seems as if the server is not accepting more than one simultaneous connection from any one user, or perhaps the nature of the data in that file is somehow affecting the upload. It always seems to be only that 855 kB file that fails.

The next time I run a few work units, I'll post the upload logs. It happened again yesterday. In fact, it is a regular occurrence for me.


____________

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 22249 - Posted: 10 Oct 2011 | 21:17:21 UTC - in response to Message 22248.
Last modified: 10 Oct 2011 | 21:32:07 UTC

Well, for ADSL connections uploads are normally much slower than downloads (sometimes as low as 128 kbit/s). What *might* be happening is that large uploads hog the upload capacity, and cause smaller uploads to time out.

I'm not sure how timeouts are handled, but I doubt it's a parameter in the server.
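
To put rough numbers on the "large upload hogs the link" idea, here is a back-of-the-envelope sketch. It only does arithmetic with figures quoted in this thread (a file of about 20 MB, the 855 kB file, and an uplink of about 100 kbps). Treating MB and kB as decimal units and assuming two uploads split the link evenly are my assumptions, not something measured here.

```python
def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Seconds needed to move size_bytes at a steady rate_bits_per_s."""
    return size_bytes * 8 / rate_bits_per_s


# Figures quoted in the thread; decimal units are an assumption.
LARGE_FILE = 20e6   # "on the order of 20 or more MB"
SMALL_FILE = 855e3  # the 855 kB file that keeps failing
UPLINK = 100e3      # "less than 100 kbps" observed upload rate

if __name__ == "__main__":
    for label, size in (("large", LARGE_FILE), ("small", SMALL_FILE)):
        alone = transfer_seconds(size, UPLINK)
        # Hypothetical case: two uploads sharing the uplink evenly.
        shared = transfer_seconds(size, UPLINK / 2)
        print(f"{label}: {alone:.0f} s alone, {shared:.0f} s on a shared link")
```

Under these assumptions the small file needs a minute or two even on a shared link, while the large one occupies the uplink for the better part of an hour, so any other transfer started in that window competes with it the whole time.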

Edit: possibly these posts may be helpful-


http://climateprediction.net/board/viewtopic.php?p=94924#p94924
http://climateprediction.net/board/viewtopic.php?p=92119#p92119

skgiven
Volunteer moderator
Volunteer tester
Message 22250 - Posted: 10 Oct 2011 | 21:56:58 UTC - in response to Message 22249.

It's possible this is a routing problem. Turn your router off for a couple of minutes and then back on. Then restart your system, or if you're up to it just flush your DNS and renew your IP info (Start, Run, CMD, then ipconfig /release, ipconfig /flushdns, ipconfig /renew); this fixes many issues. ISPs tend to update their DNS routers and servers over weekends. The route from the USA to Europe could change by the minute, so if you are using old arp addresses this is likely to happen. Another thing is that some ISPs sneakily reduce your bandwidth, number of connections and contention; restarting resets this in many cases.

wiyosaya
Message 22289 - Posted: 18 Oct 2011 | 1:46:18 UTC - in response to Message 22250.
Last modified: 18 Oct 2011 | 1:47:26 UTC

I will most often turn off my "router" (a machine running SuSE Linux) overnight, so first failures are sometimes attributable to the network not being available. However, when I turn the router back on, I also select all pending uploads and hit "retry now." I have never had a similar issue with other projects unless they are off-line for some reason. The problem seems unique, for me, to GPUGrid. I do run a local caching name server; however, from the logs below, I doubt that is the problem.

As I previously stated, it always seems to be, relative to each individual WU, the same file. Is there something special about this file that would cause the upload to fail?

Anyway, I ran three WUs over the weekend. After filtering out the error messages for not having the router running, here are the pertinent entries for the same problem:

10/15/2011 11:05:22 | GPUGRID | Started upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_0
10/15/2011 11:05:22 | GPUGRID | Started upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1
10/15/2011 11:05:41 | GPUGRID | Finished upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_0
10/15/2011 11:05:41 | GPUGRID | Started upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_2
10/15/2011 11:06:00 | GPUGRID | Finished upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_2
10/15/2011 11:06:00 | GPUGRID | Started upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_3
10/15/2011 11:06:10 | GPUGRID | Temporarily failed upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1: HTTP error
10/15/2011 11:06:10 | GPUGRID | Backing off 18 min 38 sec on upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1
10/15/2011 11:06:10 | GPUGRID | Started upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_7
10/15/2011 11:06:11 | GPUGRID | Finished upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_3
10/15/2011 11:06:11 | GPUGRID | Finished upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_7
10/15/2011 11:06:13 | | Project communication failed: attempting access to reference site
10/15/2011 11:06:15 | | Internet access OK - project servers may be temporarily down.
10/15/2011 11:24:48 | GPUGRID | Started upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1
10/15/2011 11:25:17 | GPUGRID | Temporarily failed upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1: HTTP error
10/15/2011 11:25:17 | GPUGRID | Backing off 21 min 14 sec on upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1
10/15/2011 11:25:21 | | Project communication failed: attempting access to reference site
10/15/2011 11:25:23 | | Internet access OK - project servers may be temporarily down.
10/15/2011 11:46:32 | GPUGRID | Started upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1
10/15/2011 11:47:08 | GPUGRID | Temporarily failed upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1: HTTP error
10/15/2011 11:47:08 | GPUGRID | Backing off 53 min 24 sec on upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1
10/15/2011 11:47:10 | | Project communication failed: attempting access to reference site
10/15/2011 11:47:11 | | Internet access OK - project servers may be temporarily down.
10/15/2011 12:40:32 | GPUGRID | Started upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1
10/15/2011 12:40:39 | GPUGRID | Finished upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1
10/15/2011 12:40:42 | GPUGRID | Sending scheduler request: To report completed tasks.
10/15/2011 12:40:42 | GPUGRID | Reporting 1 completed tasks, not requesting new tasks
10/16/2011 8:13:42 | GPUGRID | Temporarily failed upload of I172R1-GIANNI_KKFREE5-38-100-RND6106_1_4: HTTP error
10/16/2011 8:13:42 | GPUGRID | Backing off 17 min 54 sec on upload of I172R1-GIANNI_KKFREE5-38-100-RND6106_1_4
10/16/2011 8:13:46 | | Project communication failed: attempting access to reference site
10/16/2011 8:13:48 | | Internet access OK - project servers may be temporarily down.
10/16/2011 8:31:37 | GPUGRID | Started upload of I172R1-GIANNI_KKFREE5-38-100-RND6106_1_4
10/16/2011 8:33:24 | GPUGRID | Finished upload of I172R1-GIANNI_KKFREE5-38-100-RND6106_1_4
10/16/2011 19:53:35 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_0
10/16/2011 19:53:35 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 19:53:43 | GPUGRID | Finished upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_0
10/16/2011 19:53:43 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_2
10/16/2011 19:53:57 | GPUGRID | Finished upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_2
10/16/2011 19:53:57 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_3
10/16/2011 19:54:12 | GPUGRID | Temporarily failed upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1: HTTP error
10/16/2011 19:54:12 | GPUGRID | Backing off 15 min 44 sec on upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 19:54:12 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_4
10/16/2011 19:54:16 | | Project communication failed: attempting access to reference site
10/16/2011 19:54:17 | | Internet access OK - project servers may be temporarily down.
10/16/2011 19:54:23 | GPUGRID | Temporarily failed upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_3: HTTP error
10/16/2011 19:54:23 | GPUGRID | Backing off 13 min 46 sec on upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_3
10/16/2011 19:54:23 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_7
10/16/2011 19:54:24 | GPUGRID | Finished upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_7
10/16/2011 19:54:26 | | Project communication failed: attempting access to reference site
10/16/2011 19:54:27 | | Internet access OK - project servers may be temporarily down.
10/16/2011 20:00:45 | GPUGRID | Finished upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_4
10/16/2011 20:08:10 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_3
10/16/2011 20:08:20 | GPUGRID | Finished upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_3
10/16/2011 20:09:58 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 20:10:26 | GPUGRID | Temporarily failed upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1: HTTP error
10/16/2011 20:10:26 | GPUGRID | Backing off 32 min 50 sec on upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 20:10:30 | | Project communication failed: attempting access to reference site
10/16/2011 20:10:32 | | Internet access OK - project servers may be temporarily down.
10/16/2011 20:10:52 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 20:11:17 | GPUGRID | Temporarily failed upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1: HTTP error
10/16/2011 20:11:17 | GPUGRID | Backing off 1 hr 12 min 11 sec on upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 20:11:21 | | Project communication failed: attempting access to reference site
10/16/2011 20:11:22 | | Internet access OK - project servers may be temporarily down.
10/16/2011 20:11:24 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 20:11:50 | GPUGRID | Temporarily failed upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1: HTTP error
10/16/2011 20:11:50 | GPUGRID | Backing off 1 hr 21 min 49 sec on upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 20:11:53 | | Project communication failed: attempting access to reference site
10/16/2011 20:11:54 | | Internet access OK - project servers may be temporarily down.
10/16/2011 20:12:02 | GPUGRID | Started upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 20:12:07 | GPUGRID | Finished upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_1
10/16/2011 20:12:11 | GPUGRID | Sending scheduler request: To report completed tasks.
10/16/2011 20:12:11 | GPUGRID | Reporting 1 completed tasks, not requesting new tasks
10/16/2011 20:12:13 | GPUGRID | Scheduler request completed
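
For anyone who wants to check the "it is always the same file" impression against their own Event Log, a minimal sketch follows. It assumes the log has been saved as text in the same "date | project | message" layout shown above; the function name and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "Temporarily failed upload of <task>_0_1: HTTP error"
# and captures the trailing file index ("1" in this example).
FAILED = re.compile(r"Temporarily failed upload of (\S+?)_(\d+): ")


def failures_by_suffix(log_lines):
    """Count 'Temporarily failed upload' events per trailing file index."""
    counts = Counter()
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            counts["_" + match.group(2)] += 1
    return counts


# A few lines copied from the log above, used only as sample input.
SAMPLE = [
    "10/15/2011 11:06:00 | GPUGRID | Started upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_3",
    "10/15/2011 11:06:10 | GPUGRID | Temporarily failed upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1: HTTP error",
    "10/15/2011 11:25:17 | GPUGRID | Temporarily failed upload of p5-IBUCH_5_nwEGFR_110919-14-20-RND1733_0_1: HTTP error",
    "10/16/2011 8:13:42 | GPUGRID | Temporarily failed upload of I172R1-GIANNI_KKFREE5-38-100-RND6106_1_4: HTTP error",
    "10/16/2011 19:54:23 | GPUGRID | Temporarily failed upload of s0r334-TONI_SH2MS3-58-100-RND1445_0_3: HTTP error",
]

if __name__ == "__main__":
    for suffix, count in failures_by_suffix(SAMPLE).most_common():
        print(suffix, count)
```

A tally like this does not explain the failures, but it turns "seems to be the _1 file" into a count that can be compared between hosts.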
____________

wiyosaya
Message 22290 - Posted: 18 Oct 2011 | 2:02:40 UTC - in response to Message 22249.

Toni wrote:
> Well, for ADSL connections uploads are normally much slower than downloads (sometimes as low as 128 kbit/s). What *might* be happening is that large uploads hog the upload capacity, and cause smaller uploads to time out.
>
> I'm not sure how timeouts are handled, but I doubt it's a parameter in the server.
>
> Edit: possibly these posts may be helpful-
>
> http://climateprediction.net/board/viewtopic.php?p=94924#p94924
> http://climateprediction.net/board/viewtopic.php?p=92119#p92119

So the first post has the admin removing two 0 byte files from the server of the same name - I'm not saying this is the problem, just pointing out that the admin found "something" on the server.

Also, though the first "time-outs" appear when simultaneous uploads are in progress - as in both threads - to me, that really does not explain why I have to hit retry on the "problem file" multiple times even when it is the only file that remains to upload.

My experience has been that even after all files have uploaded except the problem file, the "problem file" will still experience difficulty even when it is the only file uploading. Though there is an _02 file that had a problem when one of my WUs was uploading, it seems to most often be the _01 file that has the problem as with MarkJ's post above.
____________

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 22294 - Posted: 18 Oct 2011 | 12:52:59 UTC - in response to Message 22290.

Have you tried to add these to cc_config.xml ?


<cc_config>
<options>
<http_transfer_timeout>900</http_transfer_timeout>
</options>
</cc_config>



wiyosaya
Message 22343 - Posted: 23 Oct 2011 | 1:40:58 UTC - in response to Message 22294.

Toni wrote:
> Have you tried to add these to cc_config.xml ?
>
> <cc_config>
> <options>
> <http_transfer_timeout>900</http_transfer_timeout>
> </options>
> </cc_config>

I've searched the machine, and do not find this file. I am running Windows 7. Where should I put the file?

Thanks.
____________

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 22345 - Posted: 23 Oct 2011 | 2:31:44 UTC - in response to Message 22343.

You have to create the cc_config.xml file yourself, as it's not one of the files BOINC creates for you. It is kept in the BOINC data directory. In BOINC Manager, open the Event Log (Messages if running an older BOINC) and scroll to the top; 5 or 10 lines from the top it will tell you the path to the data directory. If it doesn't, then that line has "expired". In that case stop the BOINC client, restart it, open the Event Log and you'll see the location of the data directory.

To create the cc_config.xml file, start Notepad (not Wordpad, Word or an XML editor, just Notepad), copy and paste the XML code Toni gave you into Notepad, save the file in UTF-8 or ASCII format, and exit Notepad. Double-check the name of the file to make sure it is saved as cc_config.xml; sometimes Notepad tries to add the .txt extension. Then in BOINC Manager click Advanced -> Read Config File. Then look in the Event Log, at the bottom of the messages, where it should say something like "HTTP transfer timeout: 900". Such a message indicates you created and saved the file correctly.
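
If you would rather not fight Notepad over extensions and encodings, the same file can be written by a short script. This is only a sketch: the helper names are invented for illustration, the data directory must be whatever path your own Event Log reports, and the demo writes into a temporary folder rather than a real BOINC directory. It keeps one tag per line, the same layout as Toni's snippet, and writes plain ASCII.

```python
import tempfile
from pathlib import Path


def build_cc_config(options):
    """Return cc_config.xml text with one <name>value</name> line per option."""
    lines = ["<cc_config>", "<options>"]
    lines += [f"<{name}>{value}</{name}>" for name, value in options.items()]
    lines += ["</options>", "</cc_config>"]
    return "\n".join(lines) + "\n"


def write_cc_config(data_dir, options):
    """Write cc_config.xml into data_dir as plain ASCII and return its path."""
    path = Path(data_dir) / "cc_config.xml"
    path.write_text(build_cc_config(options), encoding="ascii")
    return path


if __name__ == "__main__":
    # Demo only: a temporary folder stands in for the BOINC data directory.
    demo_dir = tempfile.mkdtemp()
    written = write_cc_config(demo_dir, {"http_transfer_timeout": 900})
    print(written)
    print(written.read_text(encoding="ascii"))
```

After writing the real file, the Read Config File step described above still applies.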

skgiven
Volunteer moderator
Volunteer tester
Message 22351 - Posted: 23 Oct 2011 | 11:46:23 UTC - in response to Message 22345.
Last modified: 23 Oct 2011 | 11:46:54 UTC

From FAQ - Best configurations for GPUGRID:

    For Vista and Win7 create the file in this folder, C:\ProgramData\BOINC

    Add the following lines:
      <cc_config>
      <options>
      <report_results_immediately>1</report_results_immediately>
      <http_transfer_timeout>900</http_transfer_timeout>
      </options>
      </cc_config>


Boinc has to be closed and then opened again for the changes to take effect; reading the config file does just that: it reads the changes but does not implement them.

PS. The cc_config.xml file is not there by default in the Windows versions of Boinc, however it is there by default in Linux versions, and comes with a list of options and log flags.

wiyosaya
Message 22379 - Posted: 27 Oct 2011 | 0:42:04 UTC - in response to Message 22351.

I created the file in the c:\programdata\boinc directory with this content:

<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<http_transfer_timeout>900</http_transfer_timeout>
</options>
</cc_config>

I then saved the file as UTF-8 and restarted BOINC. The event log had a line saying that there was a missing "start" tag in the file. Apparently, someone has traced this to saving the file as UTF-8, so I saved again as ANSI.
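
One plausible explanation, which is an assumption and not something confirmed in this thread, is that Notepad's UTF-8 option puts a three-byte byte-order mark in front of the first tag, so the file no longer begins with `<cc_config>`. The sketch below checks for that mark and removes it; the function name is made up, and the demo runs against a throwaway file.

```python
import codecs
import tempfile
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 byte-order mark from the file, if present.

    Returns True when a mark was found and removed, False otherwise.
    """
    path = Path(path)
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


if __name__ == "__main__":
    # Demo only: build a file that starts with the mark, then clean it.
    demo = Path(tempfile.mkdtemp()) / "cc_config.xml"
    demo.write_bytes(codecs.BOM_UTF8 + b"<cc_config>\n</cc_config>\n")
    print("mark removed:", strip_utf8_bom(demo))
    print("first bytes now:", demo.read_bytes()[:11])
```

Saving as ANSI, as done here, sidesteps the question entirely, since that encoding carries no such mark.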

On restarting BOINC, I see nothing indicating that the http transfer timeout has been set to 900 seconds; however, there is a line indicating report results immediately has been set. So, should the line about the transfer timeout appear?

Thanks for the help.

Matthew
____________

skgiven
Volunteer moderator
Volunteer tester
Message 22384 - Posted: 27 Oct 2011 | 7:49:23 UTC - in response to Message 22379.
Last modified: 27 Oct 2011 | 8:18:36 UTC

The cc_config http_transfer_timeout option was introduced with Boinc version 6.12.27, so it should work with your 6.12.34. Your cc_config.xml contents look fine. When I tested using http_transfer_timeout the Event Log did not report anything either.
By default the timeout for file transfer is 300 seconds, so after 5 min of connection inactivity a transfer attempt would abort.

WirelessDude
Joined: 3 Aug 11
Posts: 21
Credit: 189,614,059
RAC: 0
Message 22405 - Posted: 28 Oct 2011 | 18:10:42 UTC - in response to Message 22235.

Just so that you don't feel all alone on the issue, I have a WU now and then not immediately uploading. But, it does eventually upload on its own...
---
WirelessDude

wiyosaya
Message 22414 - Posted: 29 Oct 2011 | 18:29:25 UTC

Yes, the WUs do eventually upload. I tend to run my computers just for the weekend, and periodically during the week. What I find annoying is that if there are failed uploads when I am trying to shut down for the weekend, I will sometimes have to hit retry numerous times and sometimes it takes 20 minutes to shut my machines down because the WU fails to complete its upload.

I am now running with the latest settings as suggested, and the problem is still evident. In addition, it looks like setting the http timeout to 900 seconds has not been helpful, as the _1 file timed out in approximately 33 seconds.

Here's the latest log of the upload process for a WU that completed this morning:

10/29/2011 11:19:31 | GPUGRID | Computation for task s0r694-TONI_SH2MS3-66-100-RND2873_0 finished
10/29/2011 11:19:42 | GPUGRID | Starting task I63R0-GIANNI_KKFREE5-56-100-RND9456_0 using acemdlong version 615
10/29/2011 11:19:43 | GPUGRID | Started upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_0
10/29/2011 11:19:43 | GPUGRID | Started upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_1
10/29/2011 11:19:47 | GPUGRID | Finished upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_0
10/29/2011 11:19:47 | GPUGRID | Started upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_2
10/29/2011 11:20:00 | GPUGRID | Finished upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_2
10/29/2011 11:20:00 | GPUGRID | Started upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_3
10/29/2011 11:20:08 | GPUGRID | Finished upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_3
10/29/2011 11:20:08 | GPUGRID | Started upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_4
10/29/2011 11:20:16 | GPUGRID | Temporarily failed upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_1: HTTP error
10/29/2011 11:20:16 | GPUGRID | Backing off 15 min 58 sec on upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_1

10/29/2011 11:20:16 | GPUGRID | Started upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_7
10/29/2011 11:20:17 | GPUGRID | Finished upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_7
10/29/2011 11:20:19 | | Project communication failed: attempting access to reference site
10/29/2011 11:20:20 | | Internet access OK - project servers may be temporarily down.
10/29/2011 11:25:57 | GPUGRID | Finished upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_4
10/29/2011 11:36:14 | GPUGRID | Started upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_1
10/29/2011 11:36:24 | GPUGRID | Finished upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_1
10/29/2011 11:36:27 | GPUGRID | Sending scheduler request: To report completed tasks.
10/29/2011 11:36:27 | GPUGRID | Reporting 1 completed tasks, not requesting new tasks
10/29/2011 11:36:29 | GPUGRID | Scheduler request completed
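
The 33-second figure quoted above can be recomputed from the timestamps rather than read off by eye. The sketch below pairs each "Started upload" with the next "Finished" or "Temporarily failed" entry for the same file and reports the elapsed time. It assumes the "date | GPUGRID | message" layout of this log; the function name and sample list are illustrative only.

```python
import re
from datetime import datetime

EVENT = re.compile(
    r"^(\d+/\d+/\d+ \d+:\d+:\d+) \| GPUGRID \| "
    r"(Started|Finished|Temporarily failed) upload of ([^\s:]+)"
)


def upload_attempts(log_lines):
    """Yield (file, outcome, seconds) for each attempt that has a start and an end."""
    started = {}
    for line in log_lines:
        match = EVENT.match(line.strip())
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
        kind, name = match.group(2), match.group(3)
        if kind == "Started":
            started[name] = when
        elif name in started:
            yield name, kind, (when - started.pop(name)).total_seconds()


# Lines copied from the log above, used only as sample input.
SAMPLE = [
    "10/29/2011 11:19:43 | GPUGRID | Started upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_1",
    "10/29/2011 11:20:16 | GPUGRID | Temporarily failed upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_1: HTTP error",
    "10/29/2011 11:36:14 | GPUGRID | Started upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_1",
    "10/29/2011 11:36:24 | GPUGRID | Finished upload of s0r694-TONI_SH2MS3-66-100-RND2873_0_1",
]

if __name__ == "__main__":
    for name, outcome, seconds in upload_attempts(SAMPLE):
        print(f"{name}: {outcome} after {seconds:.0f} s")
```

For the sample lines this gives a failure after 33 seconds and a successful retry in 10 seconds, which is far short of either the default or the 900-second transfer timeout.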


Note that as previously reported, the _1 file failed its initial upload attempt. AFAIK, I've done everything on my end to resolve the issue.

Personally, I have 20-years experience programming, and if it were me, I would be looking at the server logs to see if there is an indication of a problem on the server. The fact that it happens 99% of the time with the _1 file I would suspect is not coincidental and is a good clue to finding the issue.

If you have not done so already, please humor me and check the server and/or set up some monitoring to debug this. I'm not the only one who is experiencing this issue; perhaps as in the case of the Climate Prediction project, there is something unexpected happening on the server.

Thanks.
____________

MarkJ
Volunteer moderator
Volunteer tester
Message 22421 - Posted: 30 Oct 2011 | 3:30:19 UTC
Last modified: 30 Oct 2011 | 3:32:59 UTC

In your cc_config file, between the <options> tags try adding the following:

<http_1_0>1</http_1_0>

Set this flag to use HTTP 1.0 instead of 1.1 (this may be needed with some proxies).

You'll need to re-read the config file or restart BOINC to pick up the change.

If that doesn't work we'll probably have to turn on the debug flags to see what sort of error response it's coming back with. Let us know how this goes.

I don't have access to the server logs, but Toni and GDF would be able to see them.
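
As a sketch of what the edited file could look like, the script below takes the config already posted in this thread, adds the http_1_0 option, and adds a log_flags section for client-side debugging. The flag names http_debug and file_xfer_debug are given from memory of the BOINC client configuration documentation and should be checked against the docs for your client version; the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The config posted earlier in this thread.
EXISTING = """<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<http_transfer_timeout>900</http_transfer_timeout>
</options>
</cc_config>
"""


def render(root):
    """Serialize a two-level config with one tag per line."""
    lines = [f"<{root.tag}>"]
    for section in root:
        lines.append(f"<{section.tag}>")
        for flag in section:
            lines.append(f"<{flag.tag}>{(flag.text or '').strip()}</{flag.tag}>")
        lines.append(f"</{section.tag}>")
    lines.append(f"</{root.tag}>")
    return "\n".join(lines) + "\n"


def set_flags(xml_text, section, flags):
    """Return config text with each flag set inside <section>, creating it if needed."""
    root = ET.fromstring(xml_text)
    node = root.find(section)
    if node is None:
        node = ET.SubElement(root, section)
    for name, value in flags.items():
        child = node.find(name)
        if child is None:
            child = ET.SubElement(node, name)
        child.text = str(value)
    return render(root)


if __name__ == "__main__":
    updated = set_flags(EXISTING, "options", {"http_1_0": 1})
    # Assumed flag names; verify them before relying on this.
    updated = set_flags(updated, "log_flags", {"http_debug": 1, "file_xfer_debug": 1})
    print(updated)
```

The debug flags make the Event Log much noisier, so they are best switched back off once a failed upload has been captured.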
____________
BOINC blog

wiyosaya
Message 22589 - Posted: 26 Nov 2011 | 18:45:45 UTC - in response to Message 22421.

MarkJ wrote:
> In your cc_config file, between the <options> tags try adding the following:
>
> <http_1_0>1</http_1_0>
>
> Set this flag to use HTTP 1.0 instead of 1.1 (this may be needed with some proxies).
>
> You'll need to re-read the config file or restart BOINC to pick up the change.
>
> If that doesn't work we'll probably have to turn on the debug flags to see what sort of error response it's coming back with. Let us know how this goes.
>
> I don't have access to the server logs, but Toni and GDF would be able to see them.

I am not using a proxy. So, it sounds like this will not help?? Or should I do this anyway just to see if it will help?

Being someone in the software industry, I would take the action that you are suggesting, i.e., set the debug flags on the server. Why? Because multiple people are experiencing this as noted by others who have posted to the thread. To me, as a software industry professional, it makes no sense that it is on the user's side when multiple people are experiencing the problem.

So, please let me know whether it is worth it for me to adjust my config file again even though I am not using a proxy and I am not the only one experiencing the problem. If you think it will help, I'll do it. Everything that was suggested, to this point, I have implemented and it has not helped.

I've a wu on my machine right now that is showing the same problem.
____________

MarkJ
Volunteer moderator
Volunteer tester
Message 22594 - Posted: 27 Nov 2011 | 1:31:23 UTC - in response to Message 22589.

wiyosaya wrote:
> Being someone in the software industry, I would take the action that you are suggesting, i.e., set the debug flags on the server. Why? Because multiple people are experiencing this as noted by others who have posted to the thread. To me, as a software industry professional, it makes no sense that it is on the user's side when multiple people are experiencing the problem.
>
> <snipped>
>
> I've a wu on my machine right now that is showing the same problem.


The debug flags I referred to are on your client, using the same cc_config file. As I mentioned before, I don't have access to the server logs; that's something the project admins (GDF, Toni and Ignassi) would have to do.

The GPUgrid server(s) are usually pretty stable. We've had problems in the past when they have run out of disk space; the error message in your BOINC client logs would tell you if that were the case. As it is, it's showing only the fairly generic "HTTP error" message.

One other thing to try (apart from shutting down BOINC and restarting it) is to flush your DNS cache. At a command prompt type "ipconfig /flushdns". This tells Windows to flush its DNS cache so that it has to look up the DNS again instead of using its local cache.

skgiven
Volunteer moderator
Volunteer tester
Message 22596 - Posted: 27 Nov 2011 | 10:38:23 UTC - in response to Message 22594.
Last modified: 27 Nov 2011 | 10:43:15 UTC

I think the problem is most likely to do with router/ISP settings: timeout, bandwidth restrictions or contention, which may change at different times of the day. Could you run a speed test and a ping test between your location and Barcelona, and post the resulting images?
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 22597 - Posted: 27 Nov 2011 | 11:13:22 UTC - in response to Message 22596.
Last modified: 27 Nov 2011 | 11:24:21 UTC

Many users, especially those with mobile, are blessed by all kinds of "transparent" proxies and packet shaping courtesy of their ISPs.

wiyosaya
Message 22725 - Posted: 17 Dec 2011 | 19:45:51 UTC - in response to Message 22596.

I think the problem is most likely to do with router/ISP settings: timeouts, bandwidth restrictions or contention, which may change at different times of the day. Could you run a speed test and a ping test between your location and Barcelona, and post the resulting images?

Using Barcelona as the server, the ping is 137 ms, download is 9.39 Mbps, and upload is 0.75 Mbps. Uploads to GPUGrid have always seemed slow; the measured upload speed seems to confirm this, and suggests that the slowness is not caused by GPUGrid itself.

With Pingtest and Barcelona, packet loss is 0%, ping is 143 ms, and jitter is 5 ms. The overall grade is "B".
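
To put the 0.75 Mbps figure in perspective, the ideal transfer time for a file is simply its size in megabits divided by the link rate. This is only a back-of-the-envelope sketch: the function name is hypothetical, the 3 MB file size is an illustrative assumption rather than a measured result file, and real transfers add protocol overhead.

```python
def upload_seconds(size_megabytes, rate_mbps):
    """Ideal transfer time in seconds for a file of size_megabytes at rate_mbps."""
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    # 1 megabyte = 8 megabits; overhead and retransmissions are ignored.
    return size_megabytes * 8 / rate_mbps

# An assumed 3 MB file over the measured 0.75 Mbps uplink.
print(upload_seconds(3, 0.75))  # 32.0
```

So even on this uplink, a file of a few megabytes should finish in well under a minute in the ideal case, which is why repeated failures point at something other than raw bandwidth.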

I am going to add the HTTP 1.0 setting to my config file. The problem is still happening after the server upgrade, so that would seem to at least somewhat rule out the server. It took three attempts to upload the _1 file for this result. Perhaps there is a transparent proxy somewhere that is interfering.

To me, the perplexing thing about this is why it is always only the _1 file, which is relatively small compared to at least one of the other result files. Is there something special about this file? Answering these questions about the _1 files may give some clues.

As to the DNS cache, I regularly reboot this machine. It is not on all the time, and I have done nothing to persist the cache to file - if persisting the DNS cache to file is even something that can be done on a non-server version of Windows. In addition, I run a local DNS on my firewall that is on a separate machine that is also regularly rebooted. By regularly I mean that neither machine is usually up more than a few hours - except when I run GPUGrid. Since I set up the DNS on the firewall, I know that it does not persist its cache to file.

I appreciate your help and your patience.


____________

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 22727 - Posted: 17 Dec 2011 | 20:43:02 UTC - in response to Message 22725.
Last modified: 17 Dec 2011 | 21:04:54 UTC

Hi, some random notes, which unfortunately, I'm afraid, are not going to help:

* Slow upload (e.g. your rate) is typical of current ADSL lines, so the value you see is not strange.
* Errors on the _1 file are not easy to explain. File _2, for example, should be very similar in size and header.


My guess is that the ISP has put in some proxy or, more likely, a packet-dependent filter. For example, they may be mis-recognizing some bytes in the file as a who-knows-what that they want to filter.

The underlying problem in the broader scheme of things is that, nowadays, ISPs tend to see uploads as suspicious by default. After all, the internet works like a TV: why should one be uploading stuff? :)

By the way, you may try to search the internet for other complaints by users of the same ISP. I think there are even services to check for net neutrality. Could you try http://broadband.mpi-sws.org/transparency/ (and/or see how your ISP fares in their list) and post the results?

wiyosaya
Send message
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Level
Lys
Message 22739 - Posted: 19 Dec 2011 | 15:49:19 UTC - in response to Message 22727.

Hi, some random notes, which unfortunately, I'm afraid, are not going to help:

* Slow upload (e.g. your rate) is typical of current ADSL lines, so the value you see is not strange.
* Errors on the _1 file are not easy to explain. File _2, for example, should be very similar in size and header.


My guess is that the ISP has put in some proxy or, more likely, a packet-dependent filter. For example, they may be mis-recognizing some bytes in the file as a who-knows-what that they want to filter.

The underlying problem in the broader scheme of things is that, nowadays, ISPs tend to see uploads as suspicious by default. After all, the internet works like a TV: why should one be uploading stuff? :)

By the way, you may try to search the internet for other complaints by users of the same ISP. I think there are even services to check for net neutrality. Could you try http://broadband.mpi-sws.org/transparency/ (and/or see how your ISP fares in their list) and post the results?

I don't have ADSL, I have cable. Since my upload rates are fine elsewhere, there is something about the path to Barcelona that is likely the problem.

I'm backing out the HTTP 1.0 change. The problem with the _1 file still exists even with that setting. Moreover, as soon as I put that in, uploads for World Community Grid started failing regularly. Without the HTTP 1.0 flag, the WCG uploads are fine.

I'll check the net neutrality site when I get a chance.

Thanks.

____________

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 22783 - Posted: 21 Dec 2011 | 1:32:15 UTC - in response to Message 22739.
Last modified: 21 Dec 2011 | 1:33:07 UTC

Are recent WUs uploading smoothly, or does the problem persist?

wiyosaya
Send message
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Level
Lys
Message 22788 - Posted: 21 Dec 2011 | 15:20:53 UTC - in response to Message 22783.
Last modified: 21 Dec 2011 | 15:21:38 UTC

Unfortunately, the problem persists.

I have not yet had a chance to try the net neutrality links you posted previously. I may be able to try them tonight; if not, I will definitely get a chance this weekend.
____________

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 388,572
Level
Lys
Message 22797 - Posted: 22 Dec 2011 | 15:32:21 UTC - in response to Message 22351.

From FAQ - Best configurations for GPUGRID:
    For Vista and Win7 create the file in this folder, C:\ProgramData\BOINC

    Add the following lines:
      <cc_config>
      <options>
      <report_results_immediately>1</report_results_immediately>
      <http_transfer_timeout>900</http_transfer_timeout>
      </options>
      </cc_config>


BOINC has to be closed and then opened again for the changes to take effect. Re-reading the config file does just that: it reads the settings but does not apply them.

PS. The cc_config.xml file is not there by default in the Windows versions of Boinc, however it is there by default in Linux versions, and comes with a list of options and log flags.
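
For anyone who would rather generate the file than type it by hand, here is a rough sketch that builds the same cc_config structure quoted above with Python's standard XML library. The helper names `build_cc_config` and `to_xml` are made up for illustration; only the two option names and values come from the FAQ text. Parsing the result back is a cheap way to confirm it is well-formed before saving it as cc_config.xml.

```python
import xml.etree.ElementTree as ET

def build_cc_config(options):
    """Build a <cc_config><options>...</options></cc_config> tree from a dict."""
    root = ET.Element("cc_config")
    opts = ET.SubElement(root, "options")
    for name, value in options.items():
        # Each option becomes <name>value</name> inside <options>.
        ET.SubElement(opts, name).text = str(value)
    return root

def to_xml(root):
    """Serialize the tree to a text string."""
    return ET.tostring(root, encoding="unicode")

# The two settings quoted from the FAQ.
config = build_cc_config({
    "report_results_immediately": 1,
    "http_transfer_timeout": 900,
})
xml_text = to_xml(config)

# Round-trip through the parser to confirm the text is well-formed XML.
parsed = ET.fromstring(xml_text)
print(xml_text)
```

The printed text can then be saved as cc_config.xml in the BOINC data folder mentioned above.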



Is there a way to apply that only to GPUGRID under Vista and Windows 7? Some of the other BOINC projects my computers are participating in want to reduce their server loads by having the workunits reported in batches instead of immediately after they finish.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 22799 - Posted: 22 Dec 2011 | 16:40:01 UTC - in response to Message 22797.

The developers expressed an interest in applying the report_results_immediately option as a project default.

As it's presently only usable as a cc_config option, it would apply to all projects.

Boinc is developing towards improving individual project controls, so perhaps within a couple of months we will start to see this materialize.
I think a feature-rich 7.x client is not too far off.

wiyosaya
Send message
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Level
Lys
Message 22841 - Posted: 27 Dec 2011 | 3:59:15 UTC - in response to Message 22727.


...
By the way, you may try to search the internet for other complaints by users of the same ISP. I think there are even services to check for net-neutrality. Could you try http://broadband.mpi-sws.org/transparency/ (and/or see how you ISP fares in their list) and post the results?

I've run most of the tests at this site, and each one indicates that my ISP is not shaping traffic.

Interestingly enough, I've been running WU's over the holiday, and I am now noticing that the problem is "no longer limited" to the _1 files. It now seems to be happening randomly with the other files, too. I've never seen that behavior before. It has always been the _1 files only.

____________

wiyosaya
Send message
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Level
Lys
Message 22868 - Posted: 31 Dec 2011 | 14:19:30 UTC - in response to Message 22841.

One note on my last post. Every WU that I have run over the holiday has experienced this problem. Except for one WU that failed on other files, all have repeatedly failed on the _1 file. Note that all along, the file at least partially uploads, then fails. Sometimes it partially uploads, fails, uploads some more, fails, and the pattern repeats until the file finally uploads. In my opinion, this could be the result of the same kind of problem that the other project mentioned in this thread experienced, where there was a file hanging around on the server.

With the way that BOINC operates, increasing times before retry up to 10 or more hours, completed WUs could experience a substantial delay before the results are reported. Worst case, I could see completed WUs missing the deadline - though this has not yet happened to me. With the time sensitivity of this project, I think this problem is particularly annoying.
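
To illustrate how quickly growing retry intervals add up, here is a simplified sketch of a capped exponential backoff. This is NOT BOINC's actual retry algorithm; the 60-second starting delay, the doubling factor and the 10-hour cap are assumptions picked only to show the shape of the growth.

```python
def backoff_delays(attempts, base_seconds=60, factor=2, cap_seconds=10 * 3600):
    """Return the wait before each retry, growing by factor and capped at cap_seconds.

    All parameter defaults are hypothetical, not values taken from BOINC.
    """
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        delays.append(min(delay, cap_seconds))
        delay *= factor
    return delays

delays = backoff_delays(12)
print(delays)
# Total waiting time in hours if every retry fails.
print(sum(delays) / 3600)
```

Under these assumed numbers the wait reaches the 10-hour cap by the eleventh failed attempt, which is why a handful of consecutive failures can hold a finished result back for most of a day.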

I think I have demonstrated that this is not on my end, that my connection is reasonable, and that my ISP is not filtering. In other words, I think that I have done all I can do at this point. Perhaps it is time to take a look at the server end. I'm not sure anything can be done, but it might be worth a look.
____________

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 388,572
Level
Lys
Message 22959 - Posted: 13 Jan 2012 | 6:45:48 UTC - in response to Message 22868.
Last modified: 13 Jan 2012 | 6:53:42 UTC

I've read that the transatlantic internet connections are significantly slower than the highest speed on-land internet connections.

Wiyosaya, you're in the US, so won't that slow down any connections you make to GPUGRID?

I've seen some other BOINC projects use enough output files to limit the maximum size of these files to about 4 MB, and not show that problem, so how practical would it be for the developers to add a step for splitting the largest output files into multiple, easily recombined pieces, and then test if that helps get good uploads from users on other continents?
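
As a rough idea of what such a splitting step could look like, here is a minimal sketch that cuts a byte string into fixed-size pieces and joins them back. The function names and the piece size are hypothetical; this is not how the GPUGRID application handles its output files, only an illustration that the recombination is trivial.

```python
def split_bytes(data, piece_size):
    """Split data into consecutive pieces of at most piece_size bytes."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def join_pieces(pieces):
    """Recombine pieces, in order, into the original byte string."""
    return b"".join(pieces)

# Small stand-in for an output file; a real piece size might be around 4 MB.
data = bytes(range(256)) * 10
pieces = split_bytes(data, 1000)
restored = join_pieces(pieces)
print(len(pieces), restored == data)
```

Each piece could then be uploaded and retried on its own, so a failure would cost at most one piece rather than the whole file.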
