Message boards : Number crunching : KASHIF_HIVPR Errors?

Author Message
DigitalDingus
Joined: 2 Jun 09
Posts: 10
Credit: 21,969,126
RAC: 0
Level
Pro
Message 18563 - Posted: 8 Sep 2010 | 2:58:53 UTC

I've had several of these give an Error While Computing. Anyone else? These WUs also seem to be estimated at almost twice the computing time I normally see.
____________

Siegfried Niklas
Joined: 23 Feb 09
Posts: 39
Credit: 144,654,294
RAC: 0
Level
Cys
Message 18564 - Posted: 8 Sep 2010 | 7:59:04 UTC

I reported it 4 days ago for G92 cards (compute capability 1.1) like 9800GT, 8800 GT (G92)...

http://www.gpugrid.net/forum_thread.php?id=2274

Old man
Joined: 24 Jan 09
Posts: 42
Credit: 16,676,387
RAC: 0
Level
Pro
Message 18565 - Posted: 8 Sep 2010 | 8:21:34 UTC

Here is another one:

http://www.gpugrid.net/result.php?resultid=2935402

My card is a GTX 460.

stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 460"
# Clock rate: 1.55 GHz
# Total amount of global memory: 804847616 bytes
# Number of multiprocessors: 7
# Number of cores: 56
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Message 18566 - Posted: 8 Sep 2010 | 8:49:56 UTC - in response to Message 18565.

What drivers are you using?

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18567 - Posted: 8 Sep 2010 | 9:38:02 UTC - in response to Message 18566.
Last modified: 8 Sep 2010 | 9:48:28 UTC

DigitalDingus is using two 9600 GSO (767MB) cards with driver: 19745
(Q9450, XP x86).
The fail times look random:

2935235 1870438 8 Sep 2010 7:06:12 UTC 8 Sep 2010 7:32:16 UTC Error while computing 1,496.16 11.69 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2934119 1869838 8 Sep 2010 2:53:40 UTC 8 Sep 2010 7:02:58 UTC Error while computing 14,446.09 23.11 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2934086 1869814 8 Sep 2010 1:40:10 UTC 8 Sep 2010 2:53:40 UTC Error while computing 2,728.41 11.33 6,322.41 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2931920 1868719 7 Sep 2010 12:15:59 UTC 8 Sep 2010 0:21:49 UTC Error while computing 20,453.97 14.77 6,322.41 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2930618 1868078 7 Sep 2010 4:36:49 UTC 12 Sep 2010 4:36:49 UTC In progress --- --- --- --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2930026 1867745 7 Sep 2010 4:03:02 UTC 7 Sep 2010 4:36:49 UTC Error while computing 1,912.63 12.89 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2928799 1867124 6 Sep 2010 19:51:29 UTC 6 Sep 2010 22:04:25 UTC Error while computing 7,864.14 8.88 6,016.70 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2928286 1866896 6 Sep 2010 15:19:01 UTC 7 Sep 2010 18:40:38 UTC Completed and validated 72,823.73 1,372.66 4,535.61 5,669.51 ACEMD2: GPU molecular dynamics v6.05 (cuda)
2927745 1866582 6 Sep 2010 15:19:01 UTC 6 Sep 2010 16:51:46 UTC Error while computing 5,424.13 41.77 6,016.70 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2925177 1865300 5 Sep 2010 21:53:33 UTC 6 Sep 2010 15:19:01 UTC Error while computing 36,642.95 80.09 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2924932 1865162 5 Sep 2010 20:14:39 UTC 6 Sep 2010 15:19:01 UTC Error while computing 42,419.78 43.20 6,322.41 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)

I would suggest you try the latest driver, 258.96. If you keep getting failures, try to find out what else (if anything) is running when these tasks crash.

Tapio, your task failed after 4sec GPU time. Some tasks seem to fail within 20sec. These are not very significant and do not reduce your contribution by much. Your card seems to be running well.
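The run times of the failed tasks in the list above can be pulled out with a short script to check how widely they are spread. This is only a sketch: it assumes the first number after the status text is the run time in seconds (the usual column order on a BOINC task page), and the function name `failed_run_times` is made up for illustration. Only a few of the rows are copied in to keep it short.

```python
import re

# A few rows copied from the task list above. Assumption: the first number
# after the status text is the run time in seconds.
ROWS = """\
2935235 1870438 8 Sep 2010 7:06:12 UTC 8 Sep 2010 7:32:16 UTC Error while computing 1,496.16 11.69 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2934119 1869838 8 Sep 2010 2:53:40 UTC 8 Sep 2010 7:02:58 UTC Error while computing 14,446.09 23.11 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2930618 1868078 7 Sep 2010 4:36:49 UTC 12 Sep 2010 4:36:49 UTC In progress --- --- --- --- ACEMD2: GPU molecular dynamics v6.05 (cuda)
2928286 1866896 6 Sep 2010 15:19:01 UTC 7 Sep 2010 18:40:38 UTC Completed and validated 72,823.73 1,372.66 4,535.61 5,669.51 ACEMD2: GPU molecular dynamics v6.05 (cuda)
"""

# Status text followed by the run time (thousands separated by commas).
ROW_RE = re.compile(
    r"UTC (Error while computing|Completed and validated) ([\d,]+\.\d+)"
)


def failed_run_times(text):
    """Return the run times (seconds) of all rows marked as errors."""
    times = []
    for line in text.splitlines():
        match = ROW_RE.search(line)
        if match and match.group(1) == "Error while computing":
            times.append(float(match.group(2).replace(",", "")))
    return times


if __name__ == "__main__":
    for seconds in failed_run_times(ROWS):
        print(f"failed after {seconds / 3600:.2f} h")
```

Rows that are still in progress or that validated are skipped, so only the failure times remain for comparison.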

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18568 - Posted: 8 Sep 2010 | 10:43:59 UTC - in response to Message 18567.

@skgiven,

I had the same problems with Windows XP Pro + GTS 250 + driver 258.96, after many hours of processing. See the other thread.

Success
____________
Ton (ftpd) Netherlands

DigitalDingus
Joined: 2 Jun 09
Posts: 10
Credit: 21,969,126
RAC: 0
Level
Pro
Message 18572 - Posted: 8 Sep 2010 | 13:53:11 UTC - in response to Message 18568.
Last modified: 8 Sep 2010 | 13:54:16 UTC

Will try the newer nVidia drivers, if any exist. Just upgraded to the latest BOINC in case it made a difference, but it did not. Other than that, I'll be crunching Collatz for a while I think.

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18573 - Posted: 8 Sep 2010 | 16:22:43 UTC - in response to Message 18572.

Driver 258.96 exists for this card.
Please try it!

Good luck
____________
Ton (ftpd) Netherlands

Profile Olivier
Joined: 12 Jun 09
Posts: 1
Credit: 2,063,022
RAC: 0
Level
Ala
Message 18588 - Posted: 9 Sep 2010 | 18:33:01 UTC - in response to Message 18563.

Same problem here, unfortunately. There's something wrong with those KASHIF units ...

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18602 - Posted: 10 Sep 2010 | 8:57:20 UTC

@skgiven

Hi Kev,

Again, processing aborted after several hours (6). Windows XP Pro, GTS 250, driver 258.96.
It also gives a Windows message that waits for an answer, so there was no further processing during the night. I do not like this kind of error. Please do not send these tasks to this type of GPU card anymore.
Good luck.


____________
Ton (ftpd) Netherlands

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18605 - Posted: 10 Sep 2010 | 10:29:07 UTC - in response to Message 18602.

The HIVPR_n1_bound tasks seem very troublesome on CC1.1 cards. I made suggestions to allow crunchers to opt out of crunching some task types. It would involve some work for the scientists on the project design and server layout. If GDF can get it implemented it would allow crunchers to deselect troublesome projects, which would make it useful for other problems too.
Did an update try to automatically install on your system overnight?
I think the issue primarily relates to crunching those tasks, and only occasionally appears for other tasks, so perhaps this can be worked around by the programmers; you managed to crunch two revlo_TRYP work units in the last couple of days, so the card is still a useful, working card. We just need you to crunch the good tasks for that type of card.

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18607 - Posted: 10 Sep 2010 | 11:21:03 UTC - in response to Message 18605.

The error from GPUGrid (HIVPR) caused a Windows error message, which was waiting for a reply (send or don't send to Microsoft). So all GPU tasks were waiting during the night.
Keep on crunching!
____________
Ton (ftpd) Netherlands

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18608 - Posted: 10 Sep 2010 | 14:39:07 UTC - in response to Message 18607.

I expect the Microsoft Error was along the lines of,
acemd2_6.05_windows_intelx86__cuda *32 has stopped working.
If you are sitting at the computer and see this error message pop up, sometimes you can press the system restart button (on the computer case), and when it restarts the task is often able to pick up from the last checkpoint, so you don't lose the task. That would not work after a minute, never mind sometime overnight.
I'm guessing you have already restarted the system.

Do you know from the logs (error logs) whether a system update occurred at the time of the error message, or whether some backup, defrag or other heavy CPU app ran? Just in case something other than the task/driver is at fault here.

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 18609 - Posted: 10 Sep 2010 | 14:47:29 UTC - in response to Message 18608.

Hi Kev,

I use this machine only for crunching 24/7, so no backups, no updates etc.
Just GPUGrid and RNA, Ibercivis or FreeHAL. I do not have to restart this system.

Success!
____________
Ton (ftpd) Netherlands

Tom Philippart
Joined: 12 Feb 09
Posts: 57
Credit: 23,376,686
RAC: 0
Level
Pro
Message 18624 - Posted: 11 Sep 2010 | 10:15:18 UTC

I have the same problems with this card:

NVIDIA GPU 0: GeForce 9600 GT (driver version 25721, CUDA version 3010, compute capability 1.1, 496MB, 218 GFLOPS peak)

here's an example:
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [999]
Assertion failed: 0, file swanlib_nv.cpp, line 121

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18628 - Posted: 11 Sep 2010 | 11:01:43 UTC - in response to Message 18624.

Thanks for reporting the error. The same error has been posted several times now, and the developers are aware of it.
A driver bug is catching out the applications when they run on CC1.1 cards. It does not always occur but is a concern. With long, complex GPU calculations the odd error is always expected, but these tasks are more problematic than others.
Several suggestions and potential workarounds have been made.

Siegfried Niklas
Joined: 23 Feb 09
Posts: 39
Credit: 144,654,294
RAC: 0
Level
Cys
Message 18630 - Posted: 11 Sep 2010 | 13:51:24 UTC - in response to Message 18608.

I expect the Microsoft Error was along the lines of,
acemd2_6.05_windows_intelx86__cuda *32 has stopped working.
If you are sitting at the computer and see this error message pop up, sometimes you can press the system restart button (on the computer case), and when it restarts the task is often able to pick up from the last checkpoint, so you don't lose the task. That would not work after a minute, never mind sometime overnight.


I did this trick several times over the last month (four 9800GT cards).
A system restart without clicking away the error message pop-up mostly worked for me, even hours after the error happened.

With the current KASHIF_HIVPR_*_bound* (*_unbound*) errors it never worked.

Profile Fred J. Verster
Joined: 1 Apr 09
Posts: 58
Credit: 35,833,978
RAC: 0
Level
Val
Message 18632 - Posted: 11 Sep 2010 | 20:53:51 UTC - in response to Message 18630.

Computer ID 78963
Report deadline 15 Sep 2010 15:54:10 UTC
Run time 11402.593746
CPU time 736.2813
stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
MDIO ERROR: cannot open file "restart.coor"
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
# Time per step (avg over 275000 steps): 11.463 ms
# Approximate elapsed time for entire WU: 11462.898 s
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 6322.41203703704
Granted credit 9483.61805555556
application version ACEMD2: GPU molecular dynamics v6.11 (cuda31)


With a 9800GTX+, it didn't work either.

____________

Knight Who Says Ni N!

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 18636 - Posted: 12 Sep 2010 | 10:15:59 UTC - in response to Message 18632.

Fred ... you posted results from a good run on a 480, and it does not look like you are even running a 9800 anymore, so I'm not sure where you were going with that.
____________
Thanks - Steve

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18637 - Posted: 12 Sep 2010 | 11:49:15 UTC - in response to Message 18636.

Fred used to have a GTX470, and is now using a GTX480. That task completed on his 480 but failed on a GTX460 (not a 9800GTX+). I did see a 9800 failure against one of his GTX470 successes.

Fred, keep your good cards hooked up to GPUGrid, a GTX480 would be wasted anywhere else.

Profile Fred J. Verster
Joined: 1 Apr 09
Posts: 58
Credit: 35,833,978
RAC: 0
Level
Val
Message 18639 - Posted: 12 Sep 2010 | 12:08:11 UTC - in response to Message 18637.
Last modified: 12 Sep 2010 | 12:21:14 UTC

Since the 9800GTX+ started making 'trouble', like overheating, which resulted
in faults, I first got a GTX470, which I received in trade for repairing a PII (Compaq).
Then I could buy a display model, which I had seen working
(all kinds of simulations). I bought it for €275 (normally €485 plus VAT).
I found out that these 'monsters' need a 650W PSU (minimum); 850W is better.
It draws 17A from its 8-pin and 17A from its 6-pin connectors, and an additional ~6 - 10A from the mainboard (ASUS P5E).
Now I have to find a way to get the 470 to work..........
But I'm glad I made the change: for GPUGrid it's working like a charm, and on
SETI@Home I can now run 3 MultiBeams (0.04 CPU + 0.33 GPU) at a time, so sometimes
BOINC 6.10.58, 64-bit, runs 7 SETI tasks and/or a mix of Einstein and other projects.

I use driver 258.96 and CUDA 3.1.
And it looks like those KASHIF_HIVPR WUs need compute capability
2.0 (2.1?).
____________

Knight Who Says Ni N!

mwgiii
Joined: 22 Jan 09
Posts: 8
Credit: 988,332,833
RAC: 0
Level
Glu
Message 18663 - Posted: 13 Sep 2010 | 23:48:11 UTC - in response to Message 18639.

All of the KASHIF_HIVPR are generating errors on both of my machines.

Out of the first two pages of my Tasks (40 work units), I have had 24 work units error out, all KASHIF_HIVPR. It is killing my contributions: as ftpd said, the GPU crunching halts until I notice the error message.
____________

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18664 - Posted: 14 Sep 2010 | 0:13:36 UTC - in response to Message 18663.

Probably best to do a system restart and then abort the download of any KASHIF_HIVPR tasks that you pick up.
Hopefully you will pick up other work units.

mwgiii
Joined: 22 Jan 09
Posts: 8
Credit: 988,332,833
RAC: 0
Level
Glu
Message 18665 - Posted: 14 Sep 2010 | 2:01:32 UTC - in response to Message 18664.

I reboot around every other day. If I see any more KASHIF tasks, I will abort them immediately.
____________

ralle030583
Joined: 19 Aug 10
Posts: 19
Credit: 830,540
RAC: 0
Level
Gly
Message 18693 - Posted: 15 Sep 2010 | 18:45:12 UTC - in response to Message 18665.
Last modified: 15 Sep 2010 | 18:46:53 UTC

It also seems that all KASHIF tasks fail on my GeForce 9800 GT :-/
(OK, currently everything has failed because of an OC attempt, but KASHIF tasks didn't work before the OC either ^^)
____________

zenitur
Joined: 25 Sep 10
Posts: 2
Credit: 285,845
RAC: 0
Level

Message 18781 - Posted: 29 Sep 2010 | 10:17:57 UTC

I have the same error:

http://www.gpugrid.net/result.php?resultid=3030293
http://www.gpugrid.net/result.php?resultid=3028306

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char*, int3, int3, size_t, ...): Assertion `0' failed.
SIGABRT: abort called
Stack trace (17 frames):
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
/lib/libc.so.6(+0x324c0)[0x7f4e7810d4c0]
/lib/libc.so.6(gsignal+0x35)[0x7f4e7810d445]
/lib/libc.so.6(abort+0x180)[0x7f4e7810e860]
/lib/libc.so.6(__assert_fail+0xf1)[0x7f4e781064e1]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45feae]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x46032f]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45db09]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45b400]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f4e780f9d2d]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]

Exiting...

</stderr_txt>
]]>

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [700]
acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char*, int3, int3, size_t, ...): Assertion `0' failed.
SIGABRT: abort called
Stack trace (14 frames):
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
/lib/libc.so.6(+0x324c0)[0x7f1c49b544c0]
/lib/libc.so.6(gsignal+0x35)[0x7f1c49b54445]
/lib/libc.so.6(abort+0x180)[0x7f1c49b55860]
/lib/libc.so.6(__assert_fail+0xf1)[0x7f1c49b4d4e1]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45d3f9]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f1c49b40d2d]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]

Exiting...

</stderr_txt>
]]>

Only on KASHIF tasks. TONI tasks always work fine.
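For comparing reports like the two above, the failing kernel name and the bracketed code can be extracted from the `SWAN : FATAL` line of a stderr dump. This is a minimal sketch that relies only on the line format seen in this thread; the helper name `swan_failures` is hypothetical and not part of ACEMD or BOINC.

```python
import re

# Matches lines such as:
#   SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
SWAN_RE = re.compile(
    r"SWAN : FATAL : Failure executing kernel sync \[(\w+)\] \[(\d+)\]"
)


def swan_failures(stderr_text):
    """Return (kernel_name, code) pairs for every SWAN fatal line found."""
    return [(m.group(1), int(m.group(2))) for m in SWAN_RE.finditer(stderr_text)]


if __name__ == "__main__":
    sample = (
        'MDIO ERROR: cannot open file "restart.coor"\n'
        "SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]\n"
    )
    for kernel, code in swan_failures(sample):
        print(kernel, code)
```

Run over several stderr dumps, this makes it easy to see whether the same kernels keep appearing across cards and operating systems.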

zenitur
Joined: 25 Sep 10
Posts: 2
Credit: 285,845
RAC: 0
Level

Message 18807 - Posted: 2 Oct 2010 | 19:26:01 UTC - in response to Message 18781.

I found the reason for my error: automatic suspend. After a restart, KASHIF tasks produce an error.
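One rough way to check whether a task was suspended and restarted is to count the device banners in its stderr output. In the dumps posted in this thread the application appears to print a line like `# There is 1 device supporting CUDA` each time it starts, so more than one banner suggests at least one restart. That interpretation is an assumption drawn from these logs, not documented behaviour, and `count_app_starts` is a made-up helper name.

```python
def count_app_starts(stderr_text):
    """Count device banners, assuming one banner is printed per app start."""
    starts = 0
    for line in stderr_text.splitlines():
        line = line.strip()
        # Only the singular "# There is 1 device ..." form appears in this
        # thread; the plural form is accepted here as a guess.
        if line.startswith(("# There is", "# There are")) and line.endswith(
            "supporting CUDA"
        ):
            starts += 1
    return starts


if __name__ == "__main__":
    sample = (
        "# There is 1 device supporting CUDA\n"
        '# Device 0: "GeForce 9800 GT"\n'
        'MDIO ERROR: cannot open file "restart.coor"\n'
        "# There is 1 device supporting CUDA\n"
        '# Device 0: "GeForce 9800 GT"\n'
    )
    restarts = count_app_starts(sample) - 1
    print(f"{restarts} restart(s) before the task ended")
```

A count of one means the task ran in a single session, so a failure there cannot be blamed on a suspend/resume cycle.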

Profile Saenger
Joined: 20 Jul 08
Posts: 134
Credit: 23,657,183
RAC: 0
Level
Pro
Message 18916 - Posted: 11 Oct 2010 | 12:30:47 UTC

I just had this one wrecked:

stderr out
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.34 GHz
# Total amount of global memory: 536150016 bytes
# Number of multiprocessors: 12
# Number of cores: 96
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>


I don't have the faintest idea why it was restarted (or what "restart.coor" is good for at all), I don't run other projects on the GPU in parallel, and I wasn't doing anything on the machine at that time.

____________
Gruesse vom Saenger

For questions about Boinc look in the BOINC-Wiki
