
Message boards : Server and website : Project communication failed / project servers may be temporarily down

Author Message
ashes999
Joined: 28 May 10
Posts: 19
Credit: 3,135,753
RAC: 0
Message 40771 - Posted: 7 Apr 2015 | 11:25:34 UTC

Hi,

I added this project through BAM yesterday (and tried removing it and re-adding it today). I get the same two errors:


4/7/2015 7:26:02 AM | | Attaching to http://www.gpugrid.net/
4/7/2015 7:26:05 AM | | Project communication failed: attempting access to reference site
4/7/2015 7:26:06 AM | | Internet access OK - project servers may be temporarily down.


I read the HTTPS thread and thought it might be related, but BAM still lists the project website as HTTP.

After more than 24 hours, I still see the same error.

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 40772 - Posted: 7 Apr 2015 | 11:42:11 UTC - in response to Message 40771.

I'm connecting OK, so BAM may or may not be the problem.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1409
Credit: 3,494,159,449
RAC: 415,993
Message 40773 - Posted: 7 Apr 2015 | 14:15:02 UTC

I think we may have hit a problem with (lack of) url redirection.

If I paste "http://www.gpugrid.net" into a web browser address bar, the browser automatically redirects to "https://www.gpugrid.net". But I don't think the BOINC client automatically follows a redirect (probably a wise choice, for security reasons).
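The redirect theory can be checked from outside BOINC: request the old http master url without following redirects and look at where the server points. This is a minimal sketch using only the Python standard library; `NoRedirect`, `redirect_target` and `is_https_upgrade` are hypothetical helper names, and it only reports what the web server answers, not how the BOINC client itself handles the response.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx response itself can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx status


def redirect_target(url, timeout=10):
    """Return the Location header if `url` answers with a redirect, else None."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout):
            return None  # a direct 2xx answer: no redirect involved
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400:
            return err.headers.get("Location")
        raise  # 4xx/5xx are real failures, not redirects


def is_https_upgrade(original, target):
    """True if `target` has the same host and path as `original`, but over https."""
    a, b = urlsplit(original), urlsplit(target)
    return (
        a.scheme == "http"
        and b.scheme == "https"
        and a.hostname == b.hostname
        and (a.path or "/") == (b.path or "/")
    )
```

For example, `redirect_target("http://www.gpugrid.net/")` should return the https address if the server still redirects as described above, and `is_https_upgrade` then confirms that only the scheme changed, which is the situation where a client still holding the http master url would be stuck.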

My computers are all currently still attached to the http:// version of the master url:

<project>
<master_url>http://www.gpugrid.net/</master_url>

Most are still picking up work OK on demand, but one has gone into a permanent 24-hour backoff cycle because it can't access that old url, and it wants to refresh the scheduler.

The machines which are still working have picked up a double entry for the scheduler:

<scheduler_url>http://www.ps3grid.net/PS3GRID_cgi/cgi</scheduler_url>
<scheduler_url>https://www.gpugrid.net/PS3GRID_cgi/cgi</scheduler_url>

- presumably from a transitional page - but the machine in constant backoff only has a single scheduler entry.
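Which scheduler entries a host has can be listed without opening an editor. This is a minimal sketch with Python's standard-library ElementTree; `project_urls` is a hypothetical helper, and it assumes client_state.xml parses as well-formed XML with `<project>` elements holding `<master_url>` and `<scheduler_url>` children, as in the excerpts above. Run it on a copy of the file, not the live one.

```python
import xml.etree.ElementTree as ET


def project_urls(client_state_xml, name_fragment):
    """Map each matching project's master_url to its list of scheduler_url values.

    `name_fragment` is matched case-insensitively against <master_url>,
    so "gpugrid" finds the http://www.gpugrid.net/ entry.
    """
    root = ET.fromstring(client_state_xml)
    found = {}
    for project in root.iter("project"):
        master = (project.findtext("master_url") or "").strip()
        if name_fragment.lower() in master.lower():
            found[master] = [
                (s.text or "").strip() for s in project.findall("scheduler_url")
            ]
    return found
```

A host in the permanent backoff would show a single scheduler entry here; an empty list means no scheduler url is recorded for the project at all.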

We obviously need to pass an official message to BAM, asking Willy to update the master url: but in the meantime, it might be possible to get backed-off clients (especially directly-attached clients) to fetch work by manually adding that second scheduler url.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1409
Credit: 3,494,159,449
RAC: 415,993
Message 40774 - Posted: 7 Apr 2015 | 14:34:47 UTC

Afterthought - if anyone does manage to get a newly-attached host to run by using that 'double scheduler url' trick, I'd be interested to hear what initial runtime estimate they get for their first task (as per my post two days ago).

ashes999
Joined: 28 May 10
Posts: 19
Credit: 3,135,753
RAC: 0
Message 40775 - Posted: 7 Apr 2015 | 15:20:20 UTC - in response to Message 40774.

@Richard if you can explain how to manually add that double-entry (which file?), I will happily be your guinea pig.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1409
Credit: 3,494,159,449
RAC: 415,993
Message 40776 - Posted: 7 Apr 2015 | 16:25:26 UTC - in response to Message 40775.

ashes999 wrote:
"@Richard if you can explain how to manually add that double-entry (which file?), I will happily be your guinea pig."

OK - easily described, but please be aware that you will be editing BOINC's most critical data file. Please read carefully and follow the instructions exactly. I'm writing for Windows only, since that's what your older computers were running.

BOINC keeps all its data files in a single data folder - you need to locate that folder. The default locations might be:

Windows 2000/XP: C:\Documents and Settings\All Users\Application Data\BOINC
Windows Vista/Windows 7: C:\ProgramData\BOINC

In both cases, the folder may be hidden. Or you may have chosen your own location when you installed BOINC. In any event, the current location is displayed in the Event Log every time BOINC starts up, on about the 4th line.
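The lookup order described here (trust what the Event Log reports at start-up, otherwise try the usual default) can be sketched in a few lines of Python. This is a hedged illustration: the exact start-up wording is assumed to be `Data directory: <path>` and should be checked against your own log, and `data_dir_from_log` and `locate_data_dir` are hypothetical names.

```python
import re

# Default BOINC data folders quoted in the post; your install may differ.
DEFAULT_DATA_DIRS = {
    "xp": r"C:\Documents and Settings\All Users\Application Data\BOINC",
    "vista+": r"C:\ProgramData\BOINC",
}

# Assumed wording of the start-up line, e.g. "... | | Data directory: C:\ProgramData\BOINC".
_DATA_DIR_LINE = re.compile(r"Data directory:\s*(.+?)\s*$", re.MULTILINE)


def data_dir_from_log(event_log_text):
    """Return the data folder reported in a pasted Event Log, or None."""
    match = _DATA_DIR_LINE.search(event_log_text)
    return match.group(1) if match else None


def locate_data_dir(event_log_text="", is_xp=False):
    """Prefer the folder the Event Log reports; fall back to the usual default."""
    reported = data_dir_from_log(event_log_text)
    if reported:
        return reported
    return DEFAULT_DATA_DIRS["xp" if is_xp else "vista+"]
```

Pasting the first few Event Log lines into `locate_data_dir` returns the reported folder; with no log text it simply returns the Vista/Windows 7 default.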

Open that folder so you can see the files it contains. The one we will be working on is

client_state.xml

(or it might simply be displayed as 'client_state', with the Type 'XML document' shown separately)

Now, make sure that BOINC is not running - shut it down fully.

Now, open client_state.xml for editing, using a simple text editor - Notepad (available on all Windows systems) is fine. Right-click on the file name: choose 'Edit' (if shown), or 'Open with...' and look for Notepad.

Once you have the file open, search (ctrl-F) for 'gpugrid'. The first hit you find should be the 'master url', just after the word <project> (as shown in the first example in my post). Leave that one alone.

Look a little bit further down the file, for <scheduler_url> - just before the <code_sign_key>. You'll probably just have the one line, pointing to www.ps3grid.net

Add the second line below it, as in my second example last time (https://www.gpugrid.net etc.). That's the only change - don't change anything else. Save the file and restart BOINC. You're done.
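For anyone who prefers a script, the same one-line edit can be made programmatically. This is a hedged Python sketch, not an official BOINC tool: `add_scheduler_url` and `patch_client_state` are hypothetical names, the file is edited as plain text so every other line stays exactly as it was, a `.bak` copy is kept, and BOINC must be fully shut down first, just as with the manual edit. Where a project has no `<scheduler_url>` line at all, the sketch inserts one just before `</project>`, on the assumption that tag order inside `<project>` does not matter to the client.

```python
import shutil

NEW_SCHEDULER = "https://www.gpugrid.net/PS3GRID_cgi/cgi"


def add_scheduler_url(text, master_fragment="gpugrid", new_url=NEW_SCHEDULER):
    """Return `text` with one extra <scheduler_url> line in the matching <project>.

    The text is returned unchanged if no project's <master_url> contains
    `master_fragment`, or if that project already lists `new_url`.
    """
    lines = text.splitlines(keepends=True)
    start = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped == "<project>":
            start = i
        elif stripped == "</project>" and start is not None:
            block = lines[start:i]
            masters = [l for l in block if "<master_url>" in l]
            if masters and master_fragment.lower() in masters[0].lower():
                if any(new_url in l for l in block):
                    return text  # already patched; nothing to do
                sched = [j for j in range(start, i) if "<scheduler_url>" in lines[j]]
                # Copy indentation and line ending from a neighbouring line.
                ref = lines[sched[-1]] if sched else lines[start + 1]
                indent = ref[: len(ref) - len(ref.lstrip(" \t"))]
                newline = "\r\n" if ref.endswith("\r\n") else "\n"
                # Below the last scheduler line; before </project> if there is none
                # (assumes the client does not care about tag order in <project>).
                pos = sched[-1] + 1 if sched else i
                lines.insert(pos, f"{indent}<scheduler_url>{new_url}</scheduler_url>{newline}")
                return "".join(lines)
            start = None
    return text


def patch_client_state(path, **kwargs):
    """Back up client_state.xml to .bak, then add the scheduler line if needed."""
    shutil.copy2(path, path + ".bak")
    # latin-1 maps every byte to one character, so the round trip is byte-exact;
    # newline="" keeps the file's existing line endings.
    with open(path, encoding="latin-1", newline="") as f:
        original = f.read()
    patched = add_scheduler_url(original, **kwargs)
    if patched != original:
        with open(path, "w", encoding="latin-1", newline="") as f:
            f.write(patched)
    return patched != original
```

`patch_client_state(r"C:\ProgramData\BOINC\client_state.xml")` returns True if a line was added and False if the file already had it. As the follow-up posts show, the extra entry did not help every host, so treat it as an experiment rather than a fix.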

ashes999
Joined: 28 May 10
Posts: 19
Credit: 3,135,753
RAC: 0
Message 40778 - Posted: 7 Apr 2015 | 16:50:22 UTC

I didn't realize the machine I have access to now doesn't use BAM. Interestingly, when I registered it with BAM, it still shows HTTP (not HTTPS), but it can connect to the project server.

I'll try the first machine later today and post back my results.

ashes999
Joined: 28 May 10
Posts: 19
Credit: 3,135,753
RAC: 0
Message 40785 - Posted: 8 Apr 2015 | 2:23:29 UTC
Last modified: 8 Apr 2015 | 2:27:01 UTC

Interestingly, I didn't have any <scheduler_url> tags for GPUGRID.

I tried adding both to match what you have. I get an error like:

4/7/2015 10:29:10 PM | GPUGRID | Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates


The code_sign_key section appeared, though, which is interesting. I still get the same message about project communication failing.

I tried re-deleting both scheduler URLs, but it didn't make a difference.

Deleting and re-adding the project through BAM didn't make a difference.

I wonder if I'm missing a Windows Update with some new root authority or intermediate CA certs?
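One way to narrow this down is to repeat the TLS handshake outside BOINC, once against the system's trusted roots and once against the CA bundle BOINC uses. This is a hedged Python sketch: `make_context` and `check_certificate` are hypothetical names, and the location of BOINC's bundle (often a `ca-bundle.crt` file in the BOINC program folder) is an assumption to confirm on your own install. If the system store succeeds but BOINC's bundle fails, that would point at the bundle; if both fail, a missing root or intermediate certificate becomes the more likely suspect.

```python
import socket
import ssl


def make_context(cafile=None):
    """Build a verifying TLS context; cafile=None uses the system trust store."""
    return ssl.create_default_context(cafile=cafile)


def check_certificate(host, cafile=None, port=443, timeout=10):
    """Try a TLS handshake with `host`; return None on success or the error text.

    Network errors (DNS failure, refused connection) are raised, not returned.
    """
    context = make_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as err:
        return f"certificate not trusted: {err.verify_message}"
```

For example, compare `check_certificate("www.gpugrid.net")` with `check_certificate("www.gpugrid.net", cafile=r"C:\Program Files\BOINC\ca-bundle.crt")`, where that path is only a guess at a typical install.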

kain
Joined: 3 Sep 14
Posts: 152
Credit: 641,182,245
RAC: 0
Message 41048 - Posted: 7 May 2015 | 20:17:04 UTC

Same problem here: I can't add a new computer - servers down etc...

Lost Cavallero
Joined: 6 May 13
Posts: 4
Credit: 50,548,030
RAC: 0
Message 41110 - Posted: 19 May 2015 | 5:47:53 UTC
Last modified: 19 May 2015 | 6:01:52 UTC

I have the same problem, with the same message in the BOINC log.

I just want to know: is it me, or is it a server error?

UPDATE:

I removed and re-added the project, and now it is working fine!

Robert Gammon
Joined: 28 May 12
Posts: 63
Credit: 714,535,121
RAC: 0
Message 41130 - Posted: 23 May 2015 | 20:40:41 UTC

I can see WUs returned through the first of May 2015,

but there is a BIG gap in the data returned, as the next WUs are from April 2014.

My data reported to this site and to the various stats sites does not show this error.

My reported results show a near-continuous rise over time.
