Message boards : Number crunching : have a lot of stuck tasks, abort some?

JStateson
Joined: 31 Oct 08
Posts: 164
Credit: 2,842,236,093
RAC: 1,398,335
Level
Phe
Message 53140 - Posted: 27 Nov 2019 | 4:34:55 UTC
Last modified: 27 Nov 2019 | 5:05:38 UTC

I have never seen this many stuck tasks before, other than when my router was turned off. My other systems are running fine, even those with GPUGRID tasks. It looks like all tasks completed without error but cannot upload. Restarting BOINC did not help.

GPUGRID initial_1344-ELISA_GSN4V1-8-100-RND6294_0_1 15.093 3817.50 K 00:23:01 - 16:48:28 0.00 Kbps Upload pending (Retry in: 02:31:01), retried: 8 JYSArea51
GPUGRID initial_1344-ELISA_GSN4V1-8-100-RND6294_0_2 20.116 3817.50 K 00:26:10 0.71 Kbps Uploading JYSArea51
GPUGRID initial_1344-ELISA_GSN4V1-8-100-RND6294_0_9 0.850 67761.89 K 00:22:58 - 17:47:06 0.00 Kbps Upload pending (Retry in: 03:29:39), retried: 8 JYSArea51
GPUGRID initial_1381-ELISA_GSN0V1-9-100-RND4251_0_1 1.683 3816.54 K 00:02:34 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1381-ELISA_GSN0V1-9-100-RND4251_0_2 1.683 3816.54 K 00:02:34 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1381-ELISA_GSN0V1-9-100-RND4251_0_9 0.094 68042.38 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1512-ELISA_GSN0V1-6-100-RND5965_0_1 3.359 3816.54 K 00:05:13 0.00 Kbps Upload pending, retried: 2 JYSArea51
GPUGRID initial_1512-ELISA_GSN0V1-6-100-RND5965_0_2 3.359 3816.54 K 00:05:12 0.00 Kbps Upload pending, retried: 2 JYSArea51
GPUGRID initial_1512-ELISA_GSN0V1-6-100-RND5965_0_9 0.094 67987.78 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1719-ELISA_GSN0V1-5-100-RND4368_0_1 1.683 3816.54 K 00:02:37 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1719-ELISA_GSN0V1-5-100-RND4368_0_2 1.683 3816.54 K 00:02:35 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID initial_1719-ELISA_GSN0V1-5-100-RND4368_0_9 0.095 67536.57 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID test265-TONI_GSNTEST3-11-100-RND0660_0_1 1.682 3817.50 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID test265-TONI_GSNTEST3-11-100-RND0660_0_2 1.682 3817.50 K 00:02:34 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID test265-TONI_GSNTEST3-11-100-RND0660_0_9 0.094 68081.72 K 00:02:36 0.00 Kbps Upload pending, retried: 1 JYSArea51
GPUGRID test360-TONI_GSNTEST3-6-100-RND5366_0_1 18.440 3817.50 K 00:23:32 0.71 Kbps Uploading JYSArea51
GPUGRID test360-TONI_GSNTEST3-6-100-RND5366_0_2 10.064 3817.50 K 00:15:35 0.00 Kbps Upload pending, retried: 6 JYSArea51
GPUGRID test360-TONI_GSNTEST3-6-100-RND5366_0_9 0.470 68081.16 K 00:12:57 0.00 Kbps Upload pending, retried: 5 JYSArea51


[EDIT] A reboot of Windows got things going. I suspect the first 67 MB files caused a problem that was compounded by subsequent files of the same size all trying to upload concurrently. I need to figure out a way to stop this. I have three GTX 1070 Ti boards, but the network can't seem to handle the large files when they all finish at nearly the same time.

In other news, I got my first Linux cuda100. It is running on a GTX 1660 Ti.

captainjack
Joined: 9 May 13
Posts: 160
Credit: 1,221,462,739
RAC: 10,070
Level
Met
Message 53155 - Posted: 27 Nov 2019 | 13:49:14 UTC - in response to Message 53140.

JStateson wrote:

I suspect the first 67 MB files caused a problem that was compounded by subsequent files of the same size all trying to upload concurrently. I need to figure out a way to stop this.


Just a thought: the cc_config.xml file has an option
<max_file_xfers_per_project>N</max_file_xfers_per_project>
that might help.

klepel
Joined: 23 Dec 09
Posts: 178
Credit: 3,088,333,011
RAC: 1,729,956
Level
Arg
Message 53157 - Posted: 27 Nov 2019 | 15:32:13 UTC

I am not alone! You are not alone!

I also observe slow uploads of finished WUs on all my computers. BOINC reports about 6 KBps.

I have particular problems with this computer: http://www.gpugrid.net/show_host_detail.php?hostid=512293 Uploads stall for hours! Yes, the computer also has some climateprediction.net files to upload, but other projects have no problems uploading and downloading at faster speeds.

It reminds me of the bandwidth problems GRIDCOIN had with their IT department (not giving sufficient bandwidth to GPUGRID) years ago. Could somebody from the project look into it? Perhaps make the WUs longer so the server does not get hammered by so many computers at the same time?

Keith Myers
Joined: 13 Dec 17
Posts: 508
Credit: 525,726,686
RAC: 1,676,357
Level
Lys
Message 53160 - Posted: 27 Nov 2019 | 16:16:43 UTC

Since I run multiple projects on the same hosts, I need to provide sufficient network communication threads for all the uploads/downloads.

It does not help that I have asymmetrical upload/download speeds because of ADSL2. My 1 Mbps upload link is not big enough to handle the large result files from GPUGrid without some strain. I am not having any issues uploading, though, as long as the project servers are accepting connections.

I know that they will take at least half an hour to upload. I use these parameters in cc_config.xml:

<max_file_xfers>16</max_file_xfers>
<max_file_xfers_per_project>8</max_file_xfers_per_project>
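
For a rough sense of why a 1 Mbps uplink strains under these result files, the ideal transfer time can be estimated from the sizes shown in the transfer listings earlier in this thread. This is only a sketch: it assumes the "K" column means kilobytes of 1000 bytes, that 1 Mbps is 1000 kilobits per second, and that the link is otherwise idle. `upload_seconds` is a made-up helper name, not part of BOINC.

```python
def upload_seconds(size_kb: float, link_kbps: float) -> float:
    """Ideal transfer time in seconds.

    size_kb   -- file size in kilobytes (assumed 1 K = 1000 bytes)
    link_kbps -- upload speed in kilobits per second
    """
    return size_kb * 8.0 / link_kbps


# Sizes (in K) of the three large result files of one task,
# taken from the transfer listing earlier in the thread.
task_files_kb = [3816.54, 3816.54, 68042.38]

# Assumption: a 1 Mbps uplink with nothing else using it.
link_kbps = 1000.0

total_seconds = sum(upload_seconds(size, link_kbps) for size in task_files_kb)
print(f"{total_seconds / 60:.1f} minutes at full line rate")
```

Under those assumptions, one task's three large files need about ten minutes of an otherwise idle link. Real uploads are slower because of protocol overhead, transfers from other projects and server load, and several tasks finishing together multiply the total.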

klepel
Joined: 23 Dec 09
Posts: 178
Credit: 3,088,333,011
RAC: 1,729,956
Level
Arg
Message 53162 - Posted: 27 Nov 2019 | 18:19:26 UTC - in response to Message 53160.

I use these parameters in cc_config.xml
<max_file_xfers>16</max_file_xfers>
<max_file_xfers_per_project>8</max_file_xfers_per_project>


So the cc_config.xml would look like:
<cc_config>
  <options>
    <max_file_xfers>16</max_file_xfers>
    <max_file_xfers_per_project>8</max_file_xfers_per_project>
  </options>
</cc_config>
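
Before restarting the client, a file like the one above can be checked for well-formedness and its transfer limits read back. The following is a minimal sketch using Python's standard xml.etree.ElementTree. `read_transfer_limits` is a hypothetical helper written for this illustration, not a BOINC tool, and the sample text is simply the file quoted above.

```python
import xml.etree.ElementTree as ET


def read_transfer_limits(xml_text: str) -> dict:
    """Parse cc_config.xml text and return the transfer limits it sets.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed and
    ValueError if the root element is not <cc_config>.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError(f"unexpected root element: {root.tag}")

    limits = {}
    options = root.find("options")
    if options is not None:
        for name in ("max_file_xfers", "max_file_xfers_per_project"):
            node = options.find(name)
            if node is not None and node.text:
                limits[name] = int(node.text.strip())
    return limits


# The configuration quoted in this post.
sample = """<cc_config>
<options>
<max_file_xfers>16</max_file_xfers>
<max_file_xfers_per_project>8</max_file_xfers_per_project>
</options>
</cc_config>"""

limits = read_transfer_limits(sample)
print(limits)
```

A missing closing tag or a typo in an element name shows up here as a parse error or a missing key, which is easier to spot than a client that silently keeps its defaults.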

JStateson
Joined: 31 Oct 08
Posts: 164
Credit: 2,842,236,093
RAC: 1,398,335
Level
Phe
Message 53164 - Posted: 27 Nov 2019 | 20:01:58 UTC

All tasks finally uploaded after a reboot.

I need to configure the maximum allowable number of transfers.

Thanks!

Dingo
Joined: 1 Nov 07
Posts: 20
Credit: 81,218,414
RAC: 194,565
Level
Thr
Message 53245 - Posted: 1 Dec 2019 | 10:39:07 UTC

I have a few tasks that have not uploaded for a while, and all show "Upload pending (Project backoff)". Do I just let them sit there and wait until they upload? I have tried stopping and starting BOINC, but that did not fix it.

GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_0 1.054 10.65 K 00:00:20 - 15:10:42 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_1 0.003 3816.54 K 00:00:18 - 14:50:55 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_2 0.003 3816.54 K 00:00:11 - 12:41:05 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_9 0.000 68065.03 K 00:00:07 - 12:29:29 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_10 100.000 0.27 K 00:00:39 - 12:37:38 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_0 1.057 10.62 K 00:00:42 - 12:39:17 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_1 0.003 3816.54 K 00:00:22 - 12:24:37 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_2 0.003 3816.54 K 00:00:21 - 12:20:38 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_9 0.000 68067.50 K 00:00:04 - 12:08:51 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_10 100.000 0.27 K 00:00:03 - 06:35:08 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_0 1.022 10.99 K 00:00:14 - 08:49:10 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_1 0.003 3816.54 K 00:00:16 - 07:01:25 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_2 0.003 3816.54 K 00:00:11 - 07:53:31 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_9 0.000 68066.14 K 00:00:08 - 06:20:56 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_10 100.000 0.27 K 00:00:07 - 06:05:22 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_0 1.031 10.90 K 00:00:20 - 16:13:14 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_1 0.003 3817.50 K 00:00:20 - 14:14:37 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_2 0.003 3817.50 K 00:00:11 - 13:01:50 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_9 0.000 68080.19 K 00:00:10 - 12:35:48 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_10 100.000 0.27 K 00:00:07 - 11:41:37 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3

stiwi
Joined: 18 Jun 12
Posts: 1
Credit: 75,657,535
RAC: 12,908
Level
Thr
Message 53246 - Posted: 1 Dec 2019 | 10:45:48 UTC - in response to Message 53245.

01.12.2019 10:51:50 | GPUGRID | [error] Error reported by file upload server: Server is out of disk space


We have to wait until they fix it :)

ServicEnginIC
Joined: 24 Sep 10
Posts: 198
Credit: 1,456,827,348
RAC: 912,695
Level
Met
Message 53247 - Posted: 1 Dec 2019 | 10:46:33 UTC - in response to Message 53245.

This is being discussed in this other thread:
http://www.gpugrid.net/forum_thread.php?id=5027

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 947
Credit: 4,353,973
RAC: 58
Level
Ala
Message 53250 - Posted: 1 Dec 2019 | 13:44:56 UTC - in response to Message 53247.

Please be patient; there is no need to abort.
