Message boards : Number crunching : ubuntu cuda100 not surviving restart of client
Author | Message |
---|---|
Restarted the client and lost all 3 Linux cuda 100 tasks. Did not realize this was a problem. | |
ID: 53165 | Rating: 0 | |
Restarted the client and lost all 3 Linux cuda 100 tasks. Did not realize this was a problem.

The reason for this error is in the stderr output of the task:
<core_client_version>7.16.1</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
09:41:49 (11866): wrapper (7.7.26016): starting
09:41:49 (11866): wrapper (7.7.26016): starting
09:41:49 (11866): wrapper: running acemd3 (--boinc input --device 1)
13:57:59 (13231): wrapper (7.7.26016): starting
13:57:59 (13231): wrapper (7.7.26016): starting
13:57:59 (13231): wrapper: running acemd3 (--boinc input --device 0)
ERROR: /home/user/conda/conda-bld/acemd3_1570536635323/work/src/mdsim/context.cpp line 322:
Cannot use a restart file on a different device!
13:58:05 (13231): acemd3 exited; CPU time 5.243312
13:58:05 (13231): app exit status: 0x9e
13:58:05 (13231): called boinc_finish(195)
</stderr_txt>
]]>
This can happen only on hosts with multiple GPUs (it is a known bug of the ACEMD3 app). In the output above, the task was started on device 1 and restarted on device 0, so its restart file was rejected. To resolve this you should:
1. Make a note of each task-device pair.
2. Suspend all GPUGrid tasks, starting with the ones that are not running ("ready to start").
3. Restart your host.
4. Resume your GPUGrid tasks in order of device number (the task that was running on device 0 should be resumed first, and so on). | |
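For anyone who wants to check their own tasks, here is a minimal Python sketch of the idea, not an official tool. It assumes the wrapper lines in stderr look like the ones quoted above (`wrapper: running acemd3 (--boinc input --device N)`). The function names and the task names in the usage lines are made up for illustration.

```python
import re

# Matches wrapper lines such as:
#   "13:57:59 (13231): wrapper: running acemd3 (--boinc input --device 0)"
# The pattern is an assumption based on the stderr output quoted in this thread.
DEVICE_RE = re.compile(r"wrapper: running acemd3 \(.*--device (\d+)\)")


def devices_used(stderr_text):
    """Return the device numbers in the order the wrapper launched acemd3."""
    return [int(m.group(1)) for m in DEVICE_RE.finditer(stderr_text)]


def device_changed(stderr_text):
    """True if the task was launched on more than one distinct device."""
    return len(set(devices_used(stderr_text))) > 1


def resume_order(task_devices):
    """Given {task_name: device_number}, return task names sorted by device.

    This is the order in which step 4 says to resume the tasks.
    """
    return sorted(task_devices, key=task_devices.get)


if __name__ == "__main__":
    log = (
        "09:41:49 (11866): wrapper: running acemd3 (--boinc input --device 1)\n"
        "13:57:59 (13231): wrapper: running acemd3 (--boinc input --device 0)\n"
    )
    print(devices_used(log))    # device numbers, oldest launch first
    print(device_changed(log))  # whether the restart moved the task

    # Hypothetical task names, noted down in step 1.
    notes = {"task_a": 2, "task_b": 0, "task_c": 1}
    print(resume_order(notes))
```

The sketch only reads text and sorts names. Suspending and resuming still has to be done by hand in the BOINC manager as described in the steps.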
ID: 53173 | Rating: 0 | |
This could happen only on hosts with multiple GPUs (this is a known bug of the ACEMD3 app).

Thanks, was not aware of that! It is going to be a real problem, as there is a Windows 10 "feature 1909" update pending. However, Ubuntu will be unaffected.

Not sure if you noticed, but my "El Cheapo" P102-100 mining card "D1" is far and away faster than the 1660Ti "D0" and especially the GTX-1070 "D2":

GPUGRID 2.10 New version of ACEMD (cuda100) 0.983C + 1NV (d1) 99.87 02:30:22 (02:30:10) 04:16:50 57.000 Running tb85-nvidia test449-TONI_GSNTEST3-6-100-RND1891_0 12/2/2019 9:53:34 AM JStateson
GPUGRID 2.10 New version of ACEMD (cuda100) 0.983C + 1NV (d0) 99.91 02:30:20 (02:30:12) 04:40:43 53.000 Running tb85-nvidia initial_1911-ELISA_GSN4V1-9-100-RND1684_0 12/2/2019 11:52:22 AM JStateson
GPUGRID 2.10 New version of ACEMD (cuda100) 0.983C + 1NV (d2) 99.89 02:30:19 (02:30:09) 05:28:30 45.000 Running tb85-nvidia initial_1243-ELISA_GSN4V1-1-100-RND2537_0 12/2/2019 1:44:26 PM JStateson

All 3 tasks above had been running for about 2:30:19, within 3 seconds of each other. My guess is that the mining card will finish an hour ahead of the 1660Ti and 2 hours ahead of the GTX-1070. | |
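A rough way to compare the three cards from the listing above is a linear extrapolation. This Python sketch assumes (it is not stated in the post) that the `02:30:22`-style field is elapsed time and the `57.000`-style field is percent done, and that progress stays linear. The helper names are made up for illustration.

```python
def hms_to_seconds(hms):
    """Convert "HH:MM:SS" to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def estimated_remaining(elapsed_hms, progress_percent):
    """Seconds left if the task keeps its average rate so far (an assumption)."""
    elapsed = hms_to_seconds(elapsed_hms)
    total = elapsed * 100.0 / progress_percent
    return total - elapsed


# (elapsed, percent done) per device, copied from the listing above.
tasks = {
    "d1": ("02:30:22", 57.0),  # P102-100
    "d0": ("02:30:20", 53.0),  # 1660Ti
    "d2": ("02:30:19", 45.0),  # GTX-1070
}

remaining = {dev: estimated_remaining(*fields) for dev, fields in tasks.items()}

if __name__ == "__main__":
    # Print devices from soonest to latest estimated finish.
    for dev in sorted(remaining, key=remaining.get):
        print(dev, round(remaining[dev] / 60), "minutes left (estimate)")
```

This is only an estimate from one snapshot, so it may not match the time-left figures that the client itself reports.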
ID: 53174 | Rating: 0 | |