Message boards : Server and website : http error with HIV workunits
Author | Message |
---|---|
I have had two issues today where an HIV workunit (635688 and 582818) stopped downloading with an HTTP error, on all files in the package as far as I know. Even after letting it run its course, it did not download. Eventually I had to cancel the workunits to keep going. Is this a workunit-related issue? The connection with the server was fine; other packets right before and after it downloaded fine. What is the best course of action in cases like these? | |
ID: 11292 | Rating: 0 |
Two more cases today. I have noticed that the issue seemingly is caused by THREE download threads being started simultaneously whereas normally only TWO threads are allowed. Hope this provides some insight into the issue. | |
ID: 11305 | Rating: 0 |
I just had to abort transfer on 2 HIV workunits. Stalled in download with HTTP error. | |
ID: 11322 | Rating: 0 |
Same for me today. | |
ID: 11326 | Rating: 0 |
And you can add me to the list. Like the other guys, it's been doing this for the last couple of days. | |
ID: 11327 | Rating: 0 |
We stopped some HIV WUs two days ago, but they left behind remnants. Please abort them at will. | |
ID: 11329 | Rating: 0 |
Thanks. I'll have to because I'm being bombarded with them now.. :( and they block processing. | |
ID: 11344 | Rating: 0 |
We'll try to cancel them server-side asap, thanks for your patience. | |
ID: 11346 | Rating: 0 |
Thanks for that, much appreciated. | |
ID: 11347 | Rating: 0 |
"We'll try to cancel them server-side asap, thanks for your patience."

I have also had 20 WUs with errors: failed downloads, downloads stuck midway, computation errors, etc. A typical example is below:

27/07/2009 10:44:29 a.m. GPUGRID Finished download of 91-KASHIF_HIVPR_sub_so_ba1-5-91-KASHIF_HIVPR_sub_so_ba1-4-100-RND7343_2
27/07/2009 10:44:29 a.m. GPUGRID Started download of 91-KASHIF_HIVPR_sub_so_ba1-5-91-KASHIF_HIVPR_sub_so_ba1-4-100-RND7343_3
27/07/2009 10:45:17 a.m. GPUGRID Finished download of 91-KASHIF_HIVPR_sub_so_ba1-5-91-KASHIF_HIVPR_sub_so_ba1-4-100-RND7343_3
27/07/2009 10:45:17 a.m. GPUGRID Started download of 91-KASHIF_HIVPR_sub_so_ba1-5-pdb_file
27/07/2009 10:46:44 a.m. GPUGRID Finished download of 77-GIANNI_BINDX119-29-par_file
27/07/2009 10:46:44 a.m. GPUGRID Started download of 91-KASHIF_HIVPR_sub_so_ba1-5-psf_file
27/07/2009 10:46:44 a.m. GPUGRID [error] MD5 check failed for 77-GIANNI_BINDX119-29-par_file
27/07/2009 10:46:44 a.m. GPUGRID [error] expected c2605a4451ad8240f29215f84cb6de7e, got d8298542b27b3e9c7a3396c23444223c
27/07/2009 10:46:44 a.m. GPUGRID [error] Checksum or signature error for 77-GIANNI_BINDX119-29-par_file

Plus other strange behaviour. Is it all sorted out now?
Ross | |
ID: 11363 | Rating: 0 |
7/27/2009 5:21:29 AM GPUGRID [error] File 35-KASHIF_HIVPR_dim_ba3-26-35-KASHIF_HIVPR_dim_ba3-25-100-RND7138_1 has wrong size: expected 1210492, got 0 | |
ID: 11364 | Rating: 0 |
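The two errors quoted above ("MD5 check failed ... expected ..., got ..." and "has wrong size: expected 1210492, got 0") are integrity checks on a downloaded file. As a rough illustration only, the same two checks can be run by hand on a file in the project directory. This is a minimal Python sketch, not the BOINC client's code: the function name `check_download` is made up for this example, and the expected values are assumed to be copied from the error messages in your own log.

```python
import hashlib
import os


def check_download(path, expected_md5, expected_size):
    """Return a list of problems found with a downloaded file (empty if none).

    Hypothetical helper for illustration; it only mirrors the two checks
    named in the log messages (file size and MD5 digest).
    """
    problems = []

    # Size check: a stalled transfer typically leaves a short or empty file.
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(f"wrong size: expected {expected_size}, got {actual_size}")

    # MD5 check: hash the file in chunks so large inputs do not fill memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    actual_md5 = digest.hexdigest()
    if actual_md5 != expected_md5.lower():
        problems.append(f"MD5 check failed: expected {expected_md5}, got {actual_md5}")

    return problems
```

An empty list means the file matches both values; anything else suggests the transfer is worth aborting and retrying.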
And again this morning, but just on one host! Always the same. | |
ID: 11367 | Rating: 0 |
We cancelled the faulty WUs. Hopefully the change propagates fast to your clients. | |
ID: 11370 | Rating: 0 |
yep, change seems to have fixed the servers... | |
ID: 11371 | Rating: 0 |
I have 2 WUs almost completed. | |
ID: 11373 | Rating: 0 |
This would have to happen while I'm on vacation. Just got home to find 2 GPUs stuck on these bad WUs :-( | |
ID: 11415 | Rating: 0 |
Uhm.. that means that the clients do not really obey cancellation requests... | |
ID: 11444 | Rating: 0 |
I had to cancel them both manually. They probably didn't cancel because they were stuck with download errors. It's way worse than a normally bad WU though because they took the GPUs out of action until I got home to intervene. | |
ID: 11446 | Rating: 0 |
Ok, from my log files:

7/26/2009 12:03:03 AM|GPUGRID|Sending scheduler request: To report completed tasks. Requesting 82299 seconds of work, reporting 3 completed tasks
7/26/2009 12:03:08 AM|GPUGRID|Scheduler request completed: got 1 new tasks
7/26/2009 12:03:10 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE
7/26/2009 12:03:10 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-COPYRIGHT
7/26/2009 12:03:11 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE: HTTP error
7/26/2009 12:03:11 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE
7/26/2009 12:03:11 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-COPYRIGHT: HTTP error
7/26/2009 12:03:11 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-COPYRIGHT
7/26/2009 12:03:11 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_1
7/26/2009 12:03:11 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_2
7/26/2009 12:03:13 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_1: HTTP error
7/26/2009 12:03:13 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_1
7/26/2009 12:03:13 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_2: HTTP error
7/26/2009 12:03:13 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_2
7/26/2009 12:03:13 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_3
7/26/2009 12:03:13 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-pdb_file
7/26/2009 12:03:14 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_3: HTTP error
7/26/2009 12:03:14 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-3-KASHIF_HIVPR_dim_ba5-24-100-RND4733_3
7/26/2009 12:03:14 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-pdb_file: HTTP error
7/26/2009 12:03:14 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-pdb_file
7/26/2009 12:03:14 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-psf_file
7/26/2009 12:03:14 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-par_file
7/26/2009 12:03:14 AM|Docking@Home|Sending scheduler request: To fetch work. Requesting 120956 seconds of work, reporting 0 completed tasks
7/26/2009 12:03:15 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-psf_file: HTTP error
7/26/2009 12:03:15 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-psf_file
7/26/2009 12:03:15 AM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-myfile.enc
7/26/2009 12:03:17 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-myfile.enc: HTTP error
7/26/2009 12:03:17 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-myfile.enc
7/26/2009 12:03:18 AM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-par_file: HTTP error
7/26/2009 12:03:18 AM|GPUGRID|Backing off 1 min 0 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-par_file

My logfile is full of messages like:

7/28/2009 12:06:37 PM|GPUGRID|Started download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE
7/28/2009 12:06:38 PM|GPUGRID|Temporarily failed download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE: HTTP error
7/28/2009 12:06:38 PM|GPUGRID|Backing off 1 hr 22 min 54 sec on download of 3-KASHIF_HIVPR_dim_ba5-25-LICENSE

taken just now. I went in to abort the work units, but they were not on the Tasks page, so I manually aborted the transfers. That seemed to clean things up.

Interestingly, my quad core Linux box got:

07/28/09 12:21:48|GPUGRID|Sending scheduler request: To fetch work. Requesting 222626 seconds of work, reporting 0 completed tasks
07/28/09 12:21:58|GPUGRID|Scheduler request completed: got 5 new tasks
07/28/09 12:22:00|GPUGRID|Started download of acemd_6.66_x86_64-pc-linux-gnu__cuda
07/28/09 12:22:00|GPUGRID|Started download of libcufft.so.2.1
07/28/09 12:22:48|GPUGRID|Finished download of libcufft.so.2.1
07/28/09 12:22:48|GPUGRID|Started download of libcudart.so.2.1

5 tasks for one GPU? Come on, it was bad enough when it gave me 4, but 5?? | |
ID: 11447 | Rating: 0 |
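For anyone with a log as long as the one above, picking out the stuck transfers by eye is tedious. A small Python sketch along these lines could count the "Temporarily failed download" messages per file, which shows which transfers to abort manually. It assumes the pipe-delimited `timestamp|project|message` layout seen in the excerpt above; the function name `failed_downloads` and the regular expression are made up for this example and are not part of BOINC.

```python
import re
from collections import Counter

# Matches the failure message wording as it appears in the log excerpt above.
FAILED = re.compile(r"Temporarily failed download of (\S+): HTTP error")


def failed_downloads(log_lines, project="GPUGRID"):
    """Count HTTP download failures per file name for one project.

    Hypothetical helper: expects lines shaped like
    'timestamp|project|message' and ignores everything else.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3 or parts[1] != project:
            continue  # not a pipe-delimited entry, or another project
        match = FAILED.search(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file to it and calling `most_common()` on the result lists the files that failed most often first.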
Yet more failed downloads, starting from 29th 22:22hrs. Really is a pita! That's using client 6.6.36 | |
ID: 11491 | Rating: 0 |
There might be some WUs which survived the cancellation from the server. | |
ID: 11492 | Rating: 0 |