Message boards : Graphics cards (GPUs) : output file missing

pharrg
Message 10403 - Posted: 3 Jun 2009 | 1:15:44 UTC
Last modified: 3 Jun 2009 | 1:16:19 UTC

One of my work units today produced "output file missing" messages. I don't know of any errors that occurred, as there are no error messages, but here are the lines that did appear in the Messages tab.

6/2/2009 9:36:27 AM GPUGRID Started download of p395000-IBUCH_pYEpYIk1_2105-9-LICENSE
6/2/2009 9:36:28 AM GPUGRID Finished download of p395000-IBUCH_pYEpYIk1_2105-9-LICENSE
6/2/2009 9:36:28 AM GPUGRID Started download of p395000-IBUCH_pYEpYIk1_2105-9-COPYRIGHT
6/2/2009 9:36:29 AM GPUGRID Finished download of p395000-IBUCH_pYEpYIk1_2105-9-COPYRIGHT
6/2/2009 9:36:29 AM GPUGRID Started download of p395000-IBUCH_pYEpYIk1_2105-9-p395000-IBUCH_pYEpYIk1_2105-8-10-RND1107_1
6/2/2009 9:36:31 AM GPUGRID Finished download of p1160000-IBUCH_phYIphYI_rpdb_2905-5-psf_file
6/2/2009 9:36:31 AM GPUGRID Started download of p395000-IBUCH_pYEpYIk1_2105-9-p395000-IBUCH_pYEpYIk1_2105-8-10-RND1107_2
6/2/2009 9:36:32 AM GPUGRID Starting p1160000-IBUCH_phYIphYI_rpdb_2905-5-10-RND4180_0
6/2/2009 9:36:33 AM GPUGRID Starting task p1160000-IBUCH_phYIphYI_rpdb_2905-5-10-RND4180_0 using acemd version 664
6/2/2009 9:36:37 AM GPUGRID Finished download of p395000-IBUCH_pYEpYIk1_2105-9-p395000-IBUCH_pYEpYIk1_2105-8-10-RND1107_1
6/2/2009 9:36:37 AM GPUGRID Started download of p395000-IBUCH_pYEpYIk1_2105-9-p395000-IBUCH_pYEpYIk1_2105-8-10-RND1107_3
6/2/2009 9:36:43 AM GPUGRID Finished download of p395000-IBUCH_pYEpYIk1_2105-9-p395000-IBUCH_pYEpYIk1_2105-8-10-RND1107_2
6/2/2009 9:36:43 AM GPUGRID Started download of p395000-IBUCH_pYEpYIk1_2105-9-pdb_file
6/2/2009 9:36:44 AM GPUGRID Finished download of p395000-IBUCH_pYEpYIk1_2105-9-p395000-IBUCH_pYEpYIk1_2105-8-10-RND1107_3
6/2/2009 9:36:44 AM GPUGRID Started download of p395000-IBUCH_pYEpYIk1_2105-9-psf_file
6/2/2009 9:37:01 AM GPUGRID Finished download of p395000-IBUCH_pYEpYIk1_2105-9-pdb_file
6/2/2009 9:37:01 AM GPUGRID Started download of p395000-IBUCH_pYEpYIk1_2105-9-par_file
6/2/2009 9:37:06 AM GPUGRID Finished download of p395000-IBUCH_pYEpYIk1_2105-9-par_file
6/2/2009 9:37:06 AM GPUGRID Started download of p395000-IBUCH_pYEpYIk1_2105-9-p395000
6/2/2009 9:37:07 AM GPUGRID Finished download of p395000-IBUCH_pYEpYIk1_2105-9-p395000
6/2/2009 9:37:11 AM GPUGRID Finished download of p395000-IBUCH_pYEpYIk1_2105-9-psf_file
6/2/2009 9:37:12 AM GPUGRID Starting p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0
6/2/2009 9:37:12 AM GPUGRID Starting task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 using acemd version 664

6/2/2009 10:41:37 AM GPUGRID Computation for task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 finished
6/2/2009 10:41:37 AM GPUGRID Output file p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0_1 for task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 absent
6/2/2009 10:41:37 AM GPUGRID Output file p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0_2 for task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 absent
6/2/2009 10:41:37 AM GPUGRID Output file p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0_3 for task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 absent
6/2/2009 10:41:39 AM GPUGRID Started upload of p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0_0
6/2/2009 10:41:42 AM GPUGRID Finished upload of p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0_0

The other work unit I had running at the time completed without issue.

pharrg
Message 10404 - Posted: 3 Jun 2009 | 1:37:58 UTC

Update: I just found this over at the BOINC client forums. It's part of the change log for the just-released 6.6.33 version. I've been running 6.6.28, so perhaps this is what I saw. If so, maybe it's fixed now.
----------------

I want to stress one change though, especially for the CUDA users:

- client: fixed nasty bug that caused GPU jobs to crash on startup when they're preempting another GPU job. The problem was as follows:

* job A is chosen to preempt job B
* we tell job B to quit, and initialize job A but don't start it; however, we set its scheduler state to SCHEDULED (rather than UNINITIALIZED)
* job B exits, and we start job A. Since its state is not UNINITIALIZED, we don't set up its slot dir.
* job A runs in an empty slot dir, doesn't find its files, and bombs out.

- client: add <slot_debug> option (prints messages about allocation of slots, creating/removing files in slot dirs).

-----------------------

I'll install the new version and watch to see if it happens again.
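
The preemption bug described in the change log above can be illustrated with a small sketch. This is not BOINC's actual source; the class, the state names beyond SCHEDULED and UNINITIALIZED, and the file names are all hypothetical, chosen only to mirror the four steps in the list.

```python
from enum import Enum, auto


class SchedState(Enum):
    UNINITIALIZED = auto()
    SCHEDULED = auto()


class Job:
    """Hypothetical stand-in for a client-side GPU job."""

    def __init__(self, name):
        self.name = name
        self.state = SchedState.UNINITIALIZED
        self.slot_files = []  # stands in for the contents of the slot dir

    def start(self):
        # As in the change log: the slot dir is only set up
        # when the job is still UNINITIALIZED.
        if self.state is SchedState.UNINITIALIZED:
            self.slot_files = ["input_1", "input_2"]  # made-up file names
        self.state = SchedState.SCHEDULED
        if not self.slot_files:
            return "crashed: empty slot dir"
        return "running"


def preempt(job_a, buggy):
    # Job A is initialized while job B is still being told to quit.
    if buggy:
        job_a.state = SchedState.SCHEDULED  # the premature state change
    # ... job B exits here, then job A is started ...
    return job_a.start()


print(preempt(Job("A"), buggy=True))   # slot dir never set up
print(preempt(Job("A"), buggy=False))  # slot dir set up on start
```

The point of the sketch is only that the crash happens at start, before any computation, which is why such a task would show no run time.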

Paul D. Buck
Message 10407 - Posted: 3 Jun 2009 | 5:39:07 UTC
Last modified: 3 Jun 2009 | 5:40:06 UTC

It is not related. The bug addressed in the change would be experienced as a zero time crash. The task starts and IMMEDIATELY dies. It will have no run time on the clock at all.

The scenario: you have a GPUGRID task running and download a new task with an earlier report deadline. The running task is stopped, the new task is started, and it dies immediately.

I don't think we see this issue here, because work is issued differently than at SaH (SETI@home). There you can be running tasks with deadlines two weeks out and then download tasks with a deadline in a week ... those will preempt the running tasks ...

{edit}

In your case I would be looking to the "standard" causes, other applications running, games, heat, drivers, imps, trolls, mice, and other evil spirits ... :)
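
One rough way to check whether a task was a zero-time crash is to compare the "Starting task" and "Computation ... finished" timestamps in the message log. The sketch below does that for the two relevant lines quoted earlier in this thread. It assumes the timestamp format shown in those lines and measures wall-clock time only, which is just an approximation of the run time BOINC reports; the function names are made up for illustration.

```python
from datetime import datetime

TASK = "p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0"

LOG = """\
6/2/2009 9:37:12 AM GPUGRID Starting task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 using acemd version 664
6/2/2009 10:41:37 AM GPUGRID Computation for task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 finished
"""


def parse_time(line):
    # The first three whitespace-separated fields are date, time, AM/PM.
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")


def run_time(log, task):
    """Return wall-clock time between start and finish, or None."""
    start = end = None
    for line in log.splitlines():
        if f"Starting task {task} " in line:
            start = parse_time(line)
        elif f"Computation for task {task} finished" in line:
            end = parse_time(line)
    if start is None or end is None:
        return None
    return end - start


elapsed = run_time(LOG, TASK)
print(elapsed)
```

For the task in question this gives a little over an hour between start and finish, so it does not look like the immediate crash described above.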

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 10506 - Posted: 12 Jun 2009 | 20:49:08 UTC

"output file missing" is not the error itself. You got some error (probably "Incorrect function. (0x1) - exit code 1 (0x1)" here), so the GPU-Grid app terminated itself and didn't write all result files, because it never got to actually calculating those results.

MrS
____________
Scanning for our furry friends since Jan 2002
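
Since the "absent" lines are a symptom rather than the error, it can help to list which tasks have them and then look up the real error for those tasks. A minimal sketch, assuming the message format quoted in the first post; the function name and the regular expression are illustrative, not part of any BOINC tool.

```python
import re
from collections import defaultdict

# Matches lines like: "Output file <file> for task <task> absent"
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")

SAMPLE = """\
6/2/2009 10:41:37 AM GPUGRID Output file p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0_1 for task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 absent
6/2/2009 10:41:37 AM GPUGRID Output file p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0_2 for task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 absent
6/2/2009 10:41:37 AM GPUGRID Output file p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0_3 for task p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0 absent
6/2/2009 10:41:39 AM GPUGRID Started upload of p395000-IBUCH_pYEpYIk1_2105-9-10-RND1107_0_0
"""


def absent_outputs(log):
    """Map each task name to the output files reported absent for it."""
    missing = defaultdict(list)
    for line in log.splitlines():
        match = ABSENT.search(line)
        if match:
            missing[match.group(2)].append(match.group(1))
    return dict(missing)


for task, files in absent_outputs(SAMPLE).items():
    print(task, "is missing", len(files), "output file(s)")
```

The task names it returns are the ones whose result pages are worth checking for the actual exit code.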
