Message boards : Graphics cards (GPUs) : Cancelled by server

frankhagen
Joined: 18 Sep 08
Posts: 65
Credit: 3,037,414
RAC: 0
Message 9001 - Posted: 27 Apr 2009 | 17:22:03 UTC

is this on purpose, or what?

initial replication 10 and killing running jobs???

wuid=415152

590172 32612 27 Apr 2009 9:28:08 UTC 2 May 2009 9:28:08 UTC In progress --- New --- --- ---
590173 29707 27 Apr 2009 9:27:30 UTC 27 Apr 2009 15:58:46 UTC Over Redundant result Cancelled by server 3,358.81 3,946.78 ---
590174 22935 27 Apr 2009 9:27:59 UTC 2 May 2009 9:27:59 UTC In progress --- New --- --- ---
590175 18304 27 Apr 2009 9:28:07 UTC 27 Apr 2009 16:03:36 UTC Over Redundant result Cancelled by server 0.00 --- ---
590176 23183 27 Apr 2009 9:28:20 UTC 2 May 2009 9:28:20 UTC In progress --- New --- --- ---
590177 33634 27 Apr 2009 9:29:38 UTC 2 May 2009 9:29:38 UTC In progress --- New --- --- ---
590178 30738 27 Apr 2009 9:29:00 UTC 27 Apr 2009 16:15:14 UTC Over Redundant result Cancelled by server 0.00 --- ---
590179 28591 27 Apr 2009 9:31:41 UTC 2 May 2009 9:31:41 UTC In progress --- New --- --- ---
590180 19103 27 Apr 2009 9:27:25 UTC 27 Apr 2009 15:55:04 UTC Over Redundant result Cancelled by server 0.00 --- ---
590181 16930 27 Apr 2009 9:29:06 UTC 27 Apr 2009 16:13:09 UTC Over Redundant result Cancelled by server 11.40 3,946.78 ---

Fuzzy Duck
Joined: 28 Mar 09
Posts: 6
Credit: 6,972,294
RAC: 0
Message 9003 - Posted: 27 Apr 2009 | 18:30:31 UTC - in response to Message 9001.
Last modified: 27 Apr 2009 | 18:39:45 UTC

I just had a similar problem, with a partially completed WU cancelled after 10 hours (approx 80% complete).

27/04/2009 19:19:54 GPUGRID Message from server: Result p1380000-GIANNI_pYIpYVk12204-6-10-RND8950_0 is no longer usable
27/04/2009 19:19:55 GPUGRID Computation for task p1380000-GIANNI_pYIpYVk12204-6-10-RND8950_0 finished

Is this by design or is it an error???

Here is the WU in question.
http://www.gpugrid.net/workunit.php?wuid=413724

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 9015 - Posted: 27 Apr 2009 | 20:49:07 UTC - in response to Message 9003.
Last modified: 28 Apr 2009 | 9:00:23 UTC

We cancelled a set (hopefully small) of running WUs. This was related to a fix for the download error reported in another thread. With the huge number of GPUs that you generously donate, we struggle from time to time with the unpredictable...

Replication is 1 for most jobs and 2 for a few. Some were created with 10 in a prototype we were testing to "push" late WUs.

frankhagen
Joined: 18 Sep 08
Posts: 65
Credit: 3,037,414
RAC: 0
Message 9016 - Posted: 27 Apr 2009 | 20:53:57 UTC - in response to Message 9015.

hmm - if things like this happen, you should at least grant some credit for the killed WUs. it's not funny to lose several hours of crunching time due to a faulty scheduler setup.

DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 122,995,082
RAC: 0
Message 9041 - Posted: 28 Apr 2009 | 3:40:42 UTC - in response to Message 9016.
Last modified: 28 Apr 2009 | 3:41:34 UTC

hmm - if things like this happen, you should at least grant some credit for the killed WUs. it's not funny to lose several hours of crunching time due to a faulty scheduler setup.

I saw this on a WU belonging to one of my teammates.
The problem is that when a WU is marked as redundant, the reported result's log doesn't record how long it ran, and she said hers was already at 50%! So it isn't even possible to grant partial credit.
____________
Member of BOINC@Heidelberg and ATA!

mscharmack
Joined: 20 Aug 07
Posts: 18
Credit: 1,319,274
RAC: 0
Message 9044 - Posted: 28 Apr 2009 | 4:05:05 UTC

That has happened to me too: 90% plus done (about 18 hours of work) and chopped off at the knees. A big fat "GOOSE EGG" for the credit. I've started aborting work units where the initial replication is more than 1.

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 9074 - Posted: 28 Apr 2009 | 16:05:29 UTC - in response to Message 9044.

This is not a replication problem. Replication does not cancel a workunit while you are running it.

The problem is that WUs were manually canceled to eliminate the ones with download problems, and some of these had the files and were actually running.

gdf
