Server only allows one connection at a time from an IP? 30s cooldown is too short.

Ian&Steve C. Joined: 21 Feb 20 Posts: 1048 Credit: 40,228,383,983 RAC: 547,485	Message 54877 - Posted: 22 May 2020 | 14:40:42 UTC Last modified: 22 May 2020 | 14:42:18 UTC
	So I've been pulling my hair out trying to figure out why I've had such trouble loading the GPUGRID website or communicating with the project via BOINC. It seemed like only one computer could make a connection at a time: if that computer was running BOINC, no other system in the house could load the GPUGRID website or communicate via BOINC. In every instance I could successfully ping gpugrid.net from any system, so it wasn't a DNS problem.
	It's because of one or both of two things:
	1. On the GPUGRID server side, the network seems to allow only one connection at a time from a single IP address.
	2. The default 30-second cooldown after a scheduler request is not long enough to release the connection so that another computer behind the same IP can communicate, so all other attempts get blocked.
	I have solved my problem with a brute-force method: forcing cooldowns longer than the default, to 10 minutes. Now suddenly all systems can reach the project through the website and BOINC. But if I have just one system using the default 30-second cooldown, it hogs the connection and no one else can communicate.
	Please make your default cooldown longer and/or allow multiple connections from a single IP address. This wreaks havoc on people running multiple computers from the same location. There's no need to keep pinging for more work every 30 seconds when WUs run for 30 minutes to many hours.

Richard Haselgrove Joined: 11 Jul 09 Posts: 1589 Credit: 6,676,494,351 RAC: 9,088,187	Message 54878 - Posted: 22 May 2020 | 14:53:35 UTC - in response to Message 54877.
	This has been the case for many years - I've posted about it before. It's also aggravated by the way that all functions operate on a single server (Grosso). So uploads and downloads also trigger the lockout - I can see another machine trying to download a task while I type this. The current tasks are actually shorter than has been common at this project, which aggravates the problem, as (obviously) has the influx of new volunteers from SETI. Obviously, volunteers with just a single computer won't know what all the fuss is about!

Keith Myers Joined: 13 Dec 17 Posts: 1313 Credit: 6,017,267,459 RAC: 9,928,960	Message 54880 - Posted: 22 May 2020 | 15:32:14 UTC
	Count me in as afflicted also. Sure would like a solution from the project end.

Pop Piasa Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0	Message 54890 - Posted: 22 May 2020 | 22:17:51 UTC
	I only run 4 GPUs on 2 hosts and have had that happen. I wonder if it is a built-in defense against DoS attacks?

Keith Myers Joined: 13 Dec 17 Posts: 1313 Credit: 6,017,267,459 RAC: 9,928,960	Message 54893 - Posted: 22 May 2020 | 23:54:05 UTC
	That is what we postulated a long time ago. But it could just be that one lone server doing all functions can't support the number of HTTPS connections that the influx of new users requires, and that the server load has increased beyond what it was originally configured for.

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0	Message 54898 - Posted: 23 May 2020 | 6:03:16 UTC - in response to Message 54893.
	I am not aware of an explicit limit set in the server. It can be anywhere (including ISP throttling). Does it affect the web pages too?

Richard Haselgrove Joined: 11 Jul 09 Posts: 1589 Credit: 6,676,494,351 RAC: 9,088,187	Message 54899 - Posted: 23 May 2020 | 6:24:00 UTC - in response to Message 54898. Last modified: 23 May 2020 | 6:52:51 UTC
	Yes. One computer doing a task operation (upload, report, download) can prevent another computer accessing these message boards. Or reading/writing here can prevent another computer doing task operations.
	Edit - perhaps we should say that the problem happens at the 'connect' phase, when our device is attempting to open a TCP/IP connection to Grosso. I'm pretty sure it's something in the server operating system or web server components, long before any of the BOINC software comes into play.
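	One way to check the 'connect' phase in isolation, independent of BOINC, is to time a bare TCP connection attempt from a second machine while the first is talking to the project. The following is only a minimal sketch: the function name probe_connect, the default port 443 and the 10-second timeout are assumptions chosen for illustration, not anything the project documents.

```python
import socket
import time


def probe_connect(host: str, port: int = 443, timeout: float = 10.0) -> tuple[bool, float, str]:
    """Attempt a bare TCP connection and return (ok, elapsed_seconds, detail).

    Nothing is sent after the handshake, so a failure here points at the
    connect phase rather than at the web server or BOINC scheduler logic.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution errors.
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"
```

	Running probe_connect("gpugrid.net") on one host while another host on the same external IP is uploading or doing a scheduler request would show whether the attempt fails or times out before any HTTP traffic is exchanged.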

Aurum Joined: 12 Jul 17 Posts: 400 Credit: 13,823,914,882 RAC: 30,117,349	Message 54911 - Posted: 23 May 2020 | 20:56:40 UTC - in response to Message 54877.
	Quote (Ian&Steve C.): I have solved my problem using a brute force method to force cooldowns longer than default, to 10 mins. now suddenly all systems can reach the project through the website and BOINC. but if I have just one system using the default 30 second cooldown, it hogs the connection and no one else can communicate. please make your default cooldown longer and/or allow multiple connections from a single IP address.
	I can't find any cooldown command in my cc_config. How does one implement this fix???
	Should this be set to 0 or 1???
	<report_results_immediately>1</report_results_immediately>

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0	Message 54912 - Posted: 23 May 2020 | 21:28:59 UTC - in response to Message 54911.
	Quote (Aurum): Should this be set to 0 or 1??? <report_results_immediately>1</report_results_immediately>
	It should be set to 1 for GPUGrid, but this property is set by the GPUGrid project in the tasks (so this option has no effect on GPUGrid tasks, until it's set by the project). Look for <report_immediately/> in the client_state.xml and you'll find similar records:
	<result>
	    <name>2c1dB00_379_1-TONI_MDADex2sc-0-50-RND9291_0</name>
	    <final_cpu_time>0.000000</final_cpu_time>
	    <final_elapsed_time>0.000000</final_elapsed_time>
	    <exit_status>0</exit_status>
	    <state>2</state>
	    <platform>windows_x86_64</platform>
	    <version_num>210</version_num>
	    <plan_class>cuda101</plan_class>
	    <report_immediately/>
	    <wu_name>2c1dB00_379_1-TONI_MDADex2sc-0-50-RND9291</wu_name>
	    <report_deadline>1590700376.000000</report_deadline>
	    <received_time>1590268377.505087</received_time>
	    ...
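	For anyone who would rather not search client_state.xml by hand, a short Python sketch along these lines could list the tasks that carry the flag. It assumes the file parses as ordinary XML and that each <result> has a <name> child as in the record above; the function name is made up for this example.

```python
import xml.etree.ElementTree as ET


def results_reporting_immediately(client_state_xml: str) -> list[str]:
    """Return the <name> of every <result> that contains <report_immediately/>."""
    root = ET.fromstring(client_state_xml)
    return [
        result.findtext("name", default="?")
        for result in root.iter("result")
        # The flag is an empty element, so only its presence matters.
        if result.find("report_immediately") is not None
    ]
```

	To use it, read the contents of client_state.xml from the BOINC data directory (its location varies by platform and install) and pass the text to the function. It only reads the file, so it is safe to run while the client is up, although the client may rewrite the file at any moment.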

Keith Myers Joined: 13 Dec 17 Posts: 1313 Credit: 6,017,267,459 RAC: 9,928,960	Message 54913 - Posted: 23 May 2020 | 22:18:51 UTC - in response to Message 54911.
	Quote (Ian&Steve C.): I have solved my problem using a brute force method to force cooldowns longer than default, to 10 mins. now suddenly all systems can reach the project through the website and BOINC. but if I have just one system using the default 30 second cooldown, it hogs the connection and no one else can communicate. please make your default cooldown longer and/or allow multiple connections from a single IP address.
	Quote (Aurum): I can't find any cooldown command in my cc_config. How does one implement this fix??? Should this be set to 0 or 1??? <report_results_immediately>1</report_results_immediately>
	Ian was asking Toni to change the server default to a longer period. 31 seconds is just too often. Both Ian and I use a proprietary GPUUG client that offers a configurable cooldown period for any project through a special configuration file. With that we have been able to tame both Milkyway and GPUGrid. That is not available with the standard client.

Ian&Steve C. Joined: 21 Feb 20 Posts: 1048 Credit: 40,228,383,983 RAC: 547,485	Message 54916 - Posted: 23 May 2020 | 23:49:41 UTC - in response to Message 54913.
	You could probably script something to get the job done with boinccmd though. Like: project update, wait, disable networking, wait 10 mins, re-enable networking, project update, wait. Something like that. I'd have to look at all the command options available to see if there's a more elegant solution than this off-the-cuff guess.
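	A rough sketch of that loop in Python, wrapping boinccmd, might look like the following. It relies only on the boinccmd options --project URL update and --set_network_mode; everything else (the function name run_cycle, the 60-second settle time, the 600-second quiet period and the project URL) is an assumption for illustration. Note that disabling networking this way affects every project on the host, not just GPUGrid, and any transfer still running when the settle time expires is cut off until the next cycle.

```python
import subprocess
import time

# Assumption: use the project URL exactly as `boinccmd --get_project_status` prints it.
PROJECT_URL = "https://www.gpugrid.net/"


def run_cycle(project_url, settle_seconds=60, quiet_seconds=600,
              run=subprocess.run, sleep=time.sleep):
    """One cycle: network on, project update, short wait, network off, long wait."""
    # Re-enable networking according to the user's preferences.
    run(["boinccmd", "--set_network_mode", "auto"], check=True)
    # Ask the client to contact the project scheduler now.
    run(["boinccmd", "--project", project_url, "update"], check=True)
    # Give the scheduler request and any transfers time to finish.
    sleep(settle_seconds)
    # Block all network activity so this host stops contacting the server.
    run(["boinccmd", "--set_network_mode", "never"], check=True)
    sleep(quiet_seconds)
```

	Calling run_cycle(PROJECT_URL) in an endless loop would approximate a 10-minute cooldown on a stock client. boinccmd usually has to be run from the BOINC data directory or be given the GUI RPC password, and --set_network_mode also accepts an optional duration in seconds, which may make a tidier variant possible.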

robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 740,445,933 RAC: 18,577	Message 54919 - Posted: 24 May 2020 | 3:41:56 UTC
	I've seen a somewhat similar problem where if one of my computers was uploading a GPUGRID output file, my other computer was blocked from sending ANYTHING to the internet, such as a request to view a certain webpage. In my case, persuading my ISP to install a different brand of ADSL modem was what it took to fix this problem.

Keith Myers Joined: 13 Dec 17 Posts: 1313 Credit: 6,017,267,459 RAC: 9,928,960	Message 54921 - Posted: 24 May 2020 | 4:24:40 UTC - in response to Message 54919.
	I'd be curious to find out if this symptom is specific to hardware, as with your ADSL modem. I wonder if everyone who is afflicted is an ADSL subscriber. I'm sure there are plenty of folk over in Europe with fiber connections who are immune to the issue. Or those on cable modems in fact?

Erich56 Joined: 1 Jan 15 Posts: 1105 Credit: 7,822,620,176 RAC: 1,815,365	Message 54922 - Posted: 24 May 2020 | 4:49:04 UTC - in response to Message 54921.
	Quote (Keith Myers): Or those on cable modems in fact?
	I am on a cable modem, and I haven't noticed this problem so far.

Keith Myers Joined: 13 Dec 17 Posts: 1313 Credit: 6,017,267,459 RAC: 9,928,960	Message 54923 - Posted: 24 May 2020 | 5:29:57 UTC - in response to Message 54922. Last modified: 24 May 2020 | 5:34:56 UTC
	Quote (Keith Myers): Or those on cable modems in fact?
	Quote (Erich56): I am on a cable modem, and I haven't noticed this problem so far.
	Interesting. I'm assuming you have at least two of your hosts simultaneously crunching GPUGrid and using the same internet connection?
	[Edit] The only project website, or any website in fact, that I have this issue with is GPUGrid. But I am really curious now if only ADSL modem subscribers have the issue. @Ian, are you on an ADSL internet connection?

Erich56 Joined: 1 Jan 15 Posts: 1105 Credit: 7,822,620,176 RAC: 1,815,365	Message 54924 - Posted: 24 May 2020 | 5:42:58 UTC - in response to Message 54923.
	Quote (Keith Myers): Or those on cable modems in fact?
	Quote (Erich56): I am on a cable modem, and I haven't noticed this problem so far.
	Quote (Keith Myers): Interesting. I'm assuming you have at least two of your hosts simultaneously crunching GPUGrid and using the same internet connection?
	Four hosts simultaneously.

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0	Message 54927 - Posted: 24 May 2020 | 9:58:45 UTC - in response to Message 54924.
	Are you positive it's not the upload bandwidth being saturated?

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0	Message 54928 - Posted: 24 May 2020 | 10:28:00 UTC - in response to Message 54927.
	Quote (Toni): Are you positive it's not the upload bandwidth being saturated?
	I'm sure about it. I have this problem on my symmetrical 1Gbps fiber optics internet connection. Earlier I had an ADSL (through an old phone line copper wire) with 50Mbps download 15Mbps upload bandwidth, which had the same problem.

Aurum Joined: 12 Jul 17 Posts: 400 Credit: 13,823,914,882 RAC: 30,117,349	Message 54932 - Posted: 24 May 2020 | 13:26:55 UTC - in response to Message 54913.
	Quote (Keith Myers): ...use a proprietary GPUUG client that offers a configurable cooldown period for any project through a special configuration file. That is not available with the standard client.
	Ohh, it's an elitist thing :-)
	I had many idle computers this morning, all with Download Pending. I clicked Retry All from my BoincTasks Transfers page. Doing that too often seems to make the problem worse. Also, both http & https flavors suffer from this affliction.
	My cable modem in the US is an E31N2V1 Hitron Technologies. The spec sheet does not say ADSL: http://www.hitrontech.com/wp-content/uploads/2020/04/5338868b2233337bd4cb308c048cf211.pdf

Richard Haselgrove Joined: 11 Jul 09 Posts: 1589 Credit: 6,676,494,351 RAC: 9,088,187	Message 54934 - Posted: 24 May 2020 | 14:01:46 UTC - in response to Message 54927. Last modified: 24 May 2020 | 14:07:54 UTC
	Quote (Toni): Are you positive it's not the upload bandwidth being saturated?
	Yes.
	1) It's been happening for years, when things were quieter.
	2) The hourly project-specified scheduler update is sufficient to block other computers, even when no data needs to be transferred (limit reached or no tasks).
	3) Downloads also trigger it.
	My connection is hybrid, 'Fibre to the Cabinet' - optical trunk feed, VDSL for the final 500m. I'm getting Downstream 70.619 Mbps, Upstream 12.773 Mbps at the moment - the internet link from the UK doesn't allow me to saturate that.
	And yes - I had to take three goes even to get a preview of that post, because one of the machines in the spare bedroom upstairs decided to upload at the critical moment. I wasn't typing fast enough to saturate either your or my upload link!
	Edit - now typing from the machine upstairs. It had tried, but failed, to upload (log says 'connect() failed') - so no bandwidth used, except for the attempted handshake. It cleared on the retry, and downloaded a new task as well.

Ian&Steve C. Joined: 21 Feb 20 Posts: 1048 Credit: 40,228,383,983 RAC: 547,485	Message 54935 - Posted: 24 May 2020 | 15:26:31 UTC
	I have a gigabit up/down fiber connection at both locations that run my computers (separate external IPs) and experience the same problem at both locations if one system is running the default 30s cooldown.
	Changing the default cooldown on the project server side to something longer like 5-10 mins will largely solve this problem for everyone without the need for each user to run a custom client to work around the problem.
	Toni, please implement this on the project servers.

Richard Haselgrove Joined: 11 Jul 09 Posts: 1589 Credit: 6,676,494,351 RAC: 9,088,187	Message 54936 - Posted: 24 May 2020 | 16:12:02 UTC - in response to Message 54935.
	Ian, what are your current statistics? My two fastest machines are the two Linux boxes, each with 2x GTX 1660 Super or GTX 1660 Ti. I've got 321 valid tasks showing at the moment - since the start of the current run, probably. The runtimes are:
	Max	8,994.80 sec	149 minutes
	Min	1,180.58 sec	19 minutes
	Avg	3,114.27 sec	51 minutes
	I'm guessing your fastest will be better than 19 minutes - maybe we ought to ask Toni to start with a 5 minute delay, and see how we go, before upping it to 10 minutes if we have to?
	I'm also worrying about what happens if we get more bad batches - these machines spit out the error tasks in just 3 seconds. Blow two of those in succession, and I'm left waiting for the next scheduler contact.

Ian&Steve C. Joined: 21 Feb 20 Posts: 1048 Credit: 40,228,383,983 RAC: 547,485	Message 54939 - Posted: 24 May 2020 | 16:52:39 UTC - in response to Message 54936.
	The shortest I've seen on my 2080ti (PL 225W) is about 800s (13.3mins).
	The longest I've seen on my 2080ti (PL 225W) is about 3200s (53.3mins).
	The shortest I've seen on my 2070 (PL 150W) is about 1200s (20mins).
	The longest I've seen on my 2070 (PL 150W) is about 6000s (1.6hrs).
	They could also allow more than 2 WU per GPU, and increase the max in-progress to reflect that. But really, things like bad batches shouldn't be considered for figuring the cooldown IMO. Treat that as an edge case. Plan for things to work normally most of the time.

RFGuy_KCCO Joined: 13 Feb 14 Posts: 6 Credit: 1,056,951,005 RAC: 147	Message 54968 - Posted: 26 May 2020 | 16:30:36 UTC
	Please fix this issue, as it is clearly causing problems with receiving and sending work for many users. I am setting "No New Work" on this project until the issue is corrected.

Erich56 Joined: 1 Jan 15 Posts: 1105 Credit: 7,822,620,176 RAC: 1,815,365	Message 54969 - Posted: 26 May 2020 | 18:05:25 UTC
	Until now I had no connection problems. However, since this afternoon there have been many of them. This should be fixed ASAP.

Gunnar Hjern Joined: 22 May 20 Posts: 2 Credit: 22,042,067 RAC: 0	Message 54970 - Posted: 26 May 2020 | 19:10:59 UTC - in response to Message 54969. Last modified: 26 May 2020 | 20:10:45 UTC
	Hi! I just experienced the same problem: I have two old HP Z220s with a GTX-960 and a GTX-750Ti, and both were standing still with files that wouldn't download, and no new tasks because a download was pending on the current task. It didn't help to abort the stalled downloads, or to abort the whole task - it was STILL complaining about those downloads!! :-(
	In the end there was nothing to do but hit the "reset project" button on both machines, but that resulted in several hundred MB of downloading for each one! :-O
	Now both machines are up and running again - let's see how long it'll last. I hope the admins will sort this problem out as soon as possible, before the server lines get all bogged down.
	Happy crunching!!!
	//Gunnar

Retvari Zoltan Joined: 20 Jan 09 Posts: 2343 Credit: 16,201,255,749 RAC: 0	Message 54972 - Posted: 26 May 2020 | 22:43:03 UTC - in response to Message 54970.
	Quote (Gunnar Hjern): It didn't help to abort the stalled downloads, or to abort the whole task - it was STILL complaining about those downloads!!
	That's a different problem. These tasks were created before the http->https transition, so they still want to download through http, but that won't succeed. You have to abort the downloads, then restart the BOINC manager, or manually edit the client_state.xml file (see the "Warning: bad tasks re-appearing in the download queue" thread for details).

Gunnar Hjern Joined: 22 May 20 Posts: 2 Credit: 22,042,067 RAC: 0	Message 54973 - Posted: 26 May 2020 | 23:43:34 UTC - in response to Message 54972.
	Thanks for pointing that out! I didn't know about that problem as I'm pretty new on this project, and when the same thing happened on both my computers simultaneously, I thought it was related to this problem. :-)
	I hope those faulty tasks will be cleaned out of the database ASAP!! They are effectively locking up my machines and forcing me to reset the project manually.
	Happy crunching!!!
