Message boards : Server and website : Welcome back Gpugrid

Author Message
Betting Slip
Joined: 5 Jan 09
Posts: 574
Credit: 1,910,112,875
RAC: 1,775,295
Level
His
Message 46106 - Posted: 9 Jan 2017 | 11:17:25 UTC

Nice to see you reconnected.

PappaLitto
Joined: 21 Mar 16
Posts: 240
Credit: 968,126,081
RAC: 3,516,538
Level
Glu
Message 46107 - Posted: 9 Jan 2017 | 11:29:58 UTC

My house is nice and chilly now, -22C outside

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,455,399,277
RAC: 2,665,023
Level
Met
Message 46108 - Posted: 9 Jan 2017 | 11:39:16 UTC

I am asking just out of curiosity: what was the reason for this lengthy outage?

Profile Logan Carr
Joined: 12 Aug 15
Posts: 193
Credit: 25,979,525
RAC: 19
Level
Val
Message 46109 - Posted: 9 Jan 2017 | 13:16:37 UTC - in response to Message 46108.

I checked the boinc project website last night and it said that Boinc was down. This is probably the reason gpugrid was down. Also, I don't know if anybody's said this but my upload was pending saying "project backoff" something like that.

Hope this is helpful

-Logan

Riaan
Joined: 16 Dec 10
Posts: 4
Credit: 19,812,500
RAC: 0
Level
Pro
Message 46111 - Posted: 9 Jan 2017 | 18:15:25 UTC

I can't seem to find any reason for the down time nor an apology for it.

Maybe my GPU cycles are better spent on a project that monitors their systems over a weekend and has better up-time.

Since they don't seem to look after their own systems, what would they care about my hard worked data?

Or at least point me at the post that shows you care about us.

Profile Beyond
Joined: 23 Nov 08
Posts: 1073
Credit: 4,465,023,904
RAC: 418,532
Level
Arg
Message 46112 - Posted: 9 Jan 2017 | 18:43:26 UTC - in response to Message 46109.

I checked the boinc project website last night and it said that Boinc was down. This is probably the reason gpugrid was down. Also, I don't know if anybody's said this but my upload was pending saying "project backoff" something like that. Hope this is helpful -Logan

Logan, the BOINC site being down has nothing to do with GPUGrid. Apparently the GPUGrid server crashed during the weekend and nobody noticed. It sure did create a crazy backlog of WUs trying to upload. :-(

Profile Logan Carr
Joined: 12 Aug 15
Posts: 193
Credit: 25,979,525
RAC: 19
Level
Val
Message 46113 - Posted: 9 Jan 2017 | 19:55:35 UTC - in response to Message 46112.
Last modified: 9 Jan 2017 | 19:56:53 UTC

I checked the boinc project website last night and it said that Boinc was down. This is probably the reason gpugrid was down. Also, I don't know if anybody's said this but my upload was pending saying "project backoff" something like that. Hope this is helpful -Logan

Logan, the BOINC site being down has nothing to do with GPUGrid. Apparently the GPUGrid server crashed during the weekend and nobody noticed. It sure did create a crazy backlog of WUs trying to upload. :-(



Ah alright. Thanks for letting me know! The timing must have been just right then, haha.

My assumption is rather that the scientists might have taken the weekend off and maybe that's why it wasn't noticed right away. We all need to step away from our jobs sometimes, so maybe that's what they did.

Either way, the website is back up and that's all that counts, right? Let's all try to think positively about these situations. Also, if the website is ever down for much longer, check GPUGrid's Twitter account. They post useful stuff there.



I have no intention of lecturing if it appears that way. I'm just trying to make positive vibes :)
____________
Cruncher/Learner in progress.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 332
Credit: 3,759,688,409
RAC: 392,906
Level
Arg
Message 46115 - Posted: 9 Jan 2017 | 22:51:42 UTC - in response to Message 46108.

I am asking just out of curiosity: what was the reason for this lengthy outage?



I would like to know as well.


And somehow, I received 5 ghost units during this outage!



Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,455,399,277
RAC: 2,665,023
Level
Met
Message 46117 - Posted: 10 Jan 2017 | 6:21:23 UTC - in response to Message 46113.

My assumption is rather that the scientists might have taken the weekend off and maybe that's why it wasn't noticed right away.

No problem if the scientists themselves had taken the weekend off; they produced plenty of WUs during the last week anyway.
However, I was a little surprised that there was not even one IT person on some kind of standby who would have noticed on Saturday evening that there was a problem, one that got even worse by Sunday morning.

John
Joined: 15 Oct 11
Posts: 16
Credit: 73,362,928
RAC: 39,902
Level
Thr
Message 46118 - Posted: 10 Jan 2017 | 15:57:24 UTC - in response to Message 46117.

My assumption is rather that the scientists might have taken the weekend off and maybe that's why it wasn't noticed right away.

No problem if the scientists themselves had taken the weekend off; they produced plenty of WUs during the last week anyway.
However, I was a little surprised that there was not even one IT person on some kind of standby who would have noticed on Saturday evening that there was a problem, one that got even worse by Sunday morning.


Was wondering the same... nobody checking on the server(s) for approx. 2 days...
Did not get my 24 hr bonus because of this.
I know, small potatoes... :)

Stefan
Volunteer moderator
Project developer
Project scientist
Joined: 5 Mar 13
Posts: 258
Credit: 0
RAC: 0
Level

Message 46140 - Posted: 10 Jan 2017 | 22:56:54 UTC

There was a server crash. Sometimes it can take us a day to notice if we are not currently actively monitoring everything. Sorry for any inconvenience caused by it. Maybe best send us a mail if it happens again.

Wiyosaya
Joined: 22 Nov 09
Posts: 108
Credit: 143,790,703
RAC: 356,990
Level
Cys
Message 46157 - Posted: 12 Jan 2017 | 15:43:41 UTC - in response to Message 46140.

Thanks for the update.

Unfortunately, the WU that I had gotten before the crash and had finished without error uploaded and was not credited. That has happened before, but not that often, and these were extenuating circumstances, so I am not all that concerned.

captainjack
Joined: 9 May 13
Posts: 109
Credit: 734,440,997
RAC: 49,537
Level
Lys
Message 46160 - Posted: 12 Jan 2017 | 17:17:51 UTC

Stefan said,


Maybe best send us a mail if it happens again.


Where do we send the email when the GPUGRID site is unavailable?

PappaLitto
Joined: 21 Mar 16
Posts: 240
Credit: 968,126,081
RAC: 3,516,538
Level
Glu
Message 46168 - Posted: 13 Jan 2017 | 12:34:40 UTC

I had two BNBS2 WUs run for 100k+ seconds and had a validation error, can anyone explain this?

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,956,147,444
RAC: 6,322,828
Level
Tyr
Message 46174 - Posted: 14 Jan 2017 | 17:36:28 UTC - in response to Message 46168.
Last modified: 14 Jan 2017 | 17:38:02 UTC

I had two BNBS2 WUs run for 100k+ seconds and had a validation error, can anyone explain this?

Perhaps your host had a power outage, and these GPUGrid tasks restarted from 0%.
In such cases it is practical to abort the workunits manually, as there's no point in spending time and electricity crunching them.

Here are two excerpts from the stderr.txt of your failed tasks:
1st:
# GPU 2 : 73C
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 970

2nd:
# GPU 0 : 73C
# GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 1 :
# Name : GeForce GTX 690

Note that in each excerpt there's no line explaining the reason for the application's exit between the 1st line (the last temperature reading) and the 2nd line (the application starting up again), which is usually the sign of a dirty shutdown.
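
For anyone who wants to check their own stderr output the same way, here is a rough Python sketch of that check. It is only an illustration: the two regular expressions are guesses based on the two excerpts above, the function name is made up, and a real stderr.txt may contain other line formats that this does not handle.

```python
import re

# Assumed patterns, inferred only from the two excerpts quoted in this post.
BANNER = re.compile(r"^#\s*GPU \[.+\] Platform \[.+\]")  # application start-up header
TEMPERATURE = re.compile(r"^#\s*GPU \d+ : \d+C")         # temperature reading


def find_unexplained_restarts(stderr_text):
    """Return positions (1-based, counting non-blank lines) of start-up
    headers that come directly after a temperature reading, i.e. with no
    line in between that explains why the application exited."""
    lines = [line.strip() for line in stderr_text.splitlines() if line.strip()]
    hits = []
    for index in range(1, len(lines)):
        if BANNER.match(lines[index]) and TEMPERATURE.match(lines[index - 1]):
            hits.append(index + 1)
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]",
        "# SWAN Device 0 :",
        "# GPU 2 : 72C",
        "# GPU 2 : 73C",
        "# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]",
        "# SWAN Device 0 :",
    ])
    print(find_unexplained_restarts(sample))  # prints [5]
```

A non-empty result only hints at a dirty shutdown; it does not prove one, so it is still worth reading the surrounding lines by hand.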

PappaLitto
Joined: 21 Mar 16
Posts: 240
Credit: 968,126,081
RAC: 3,516,538
Level
Glu
Message 46194 - Posted: 16 Jan 2017 | 2:29:01 UTC

How did you get that information, Zoltan? I've been curious to see some of your WUs.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,956,147,444
RAC: 6,322,828
Level
Tyr
Message 46229 - Posted: 18 Jan 2017 | 22:32:40 UTC - in response to Message 46194.

How did you get that information, Zoltan? I've been curious to see some of your WUs.

Every host computer has a list of workunits. If you click on the ID of a finished task (or its name in the other view; either way it's the first column of the task list), you can see detailed information about that task. The second part of that page is the "stderr output", which is generated by the task while it is running.
