KASHIF_HIVPR Errors?

Message boards : Number crunching : KASHIF_HIVPR Errors?

Author	Message
DigitalDingus Joined: 2 Jun 09 Posts: 10 Credit: 21,969,126 RAC: 0	Message 18563 - Posted: 8 Sep 2010 \| 2:58:53 UTC
	I've had several of these end with "Error while computing". Anyone else? These WUs also seem to estimate almost twice the computing time I normally see.

Siegfried Niklas Joined: 23 Feb 09 Posts: 39 Credit: 144,654,294 RAC: 0	Message 18564 - Posted: 8 Sep 2010 \| 7:59:04 UTC
	I reported this 4 days ago for G92 cards (compute capability 1.1) such as the 9800 GT and 8800 GT (G92): http://www.gpugrid.net/forum_thread.php?id=2274

Old man Joined: 24 Jan 09 Posts: 42 Credit: 16,676,387 RAC: 0	Message 18565 - Posted: 8 Sep 2010 \| 8:21:34 UTC
	Here's another one (my card is a GTX 460): http://www.gpugrid.net/result.php?resultid=2935402
	stderr out:
	<core_client_version>6.10.58</core_client_version>
	<![CDATA[
	<message>
	Incorrect function. (0x1) - exit code 1 (0x1)
	</message>
	<stderr_txt>
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 460"
	# Clock rate: 1.55 GHz
	# Total amount of global memory: 804847616 bytes
	# Number of multiprocessors: 7
	# Number of cores: 56
	MDIO ERROR: cannot open file "restart.coor"
	</stderr_txt>
	]]>

ignasi Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0	Message 18566 - Posted: 8 Sep 2010 \| 8:49:56 UTC - in response to Message 18565.
	What drivers are you using?

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 18567 - Posted: 8 Sep 2010 \| 9:38:02 UTC - in response to Message 18566. Last modified: 8 Sep 2010 \| 9:48:28 UTC
	DigitalDingus is using two 9600 GSO (767MB) cards with driver 197.45 (Q9450, XP x86). The failure times look random (every task below is ACEMD2: GPU molecular dynamics v6.05 (cuda)):
	| Task ID | Work unit ID | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit |
	|---|---|---|---|---|---|---|---|---|
	| 2935235 | 1870438 | 8 Sep 2010 7:06:12 UTC | 8 Sep 2010 7:32:16 UTC | Error while computing | 1,496.16 | 11.69 | 6,409.23 | --- |
	| 2934119 | 1869838 | 8 Sep 2010 2:53:40 UTC | 8 Sep 2010 7:02:58 UTC | Error while computing | 14,446.09 | 23.11 | 6,409.23 | --- |
	| 2934086 | 1869814 | 8 Sep 2010 1:40:10 UTC | 8 Sep 2010 2:53:40 UTC | Error while computing | 2,728.41 | 11.33 | 6,322.41 | --- |
	| 2931920 | 1868719 | 7 Sep 2010 12:15:59 UTC | 8 Sep 2010 0:21:49 UTC | Error while computing | 20,453.97 | 14.77 | 6,322.41 | --- |
	| 2930618 | 1868078 | 7 Sep 2010 4:36:49 UTC | 12 Sep 2010 4:36:49 UTC | In progress | --- | --- | --- | --- |
	| 2930026 | 1867745 | 7 Sep 2010 4:03:02 UTC | 7 Sep 2010 4:36:49 UTC | Error while computing | 1,912.63 | 12.89 | 6,409.23 | --- |
	| 2928799 | 1867124 | 6 Sep 2010 19:51:29 UTC | 6 Sep 2010 22:04:25 UTC | Error while computing | 7,864.14 | 8.88 | 6,016.70 | --- |
	| 2928286 | 1866896 | 6 Sep 2010 15:19:01 UTC | 7 Sep 2010 18:40:38 UTC | Completed and validated | 72,823.73 | 1,372.66 | 4,535.61 | 5,669.51 |
	| 2927745 | 1866582 | 6 Sep 2010 15:19:01 UTC | 6 Sep 2010 16:51:46 UTC | Error while computing | 5,424.13 | 41.77 | 6,016.70 | --- |
	| 2925177 | 1865300 | 5 Sep 2010 21:53:33 UTC | 6 Sep 2010 15:19:01 UTC | Error while computing | 36,642.95 | 80.09 | 6,409.23 | --- |
	| 2924932 | 1865162 | 5 Sep 2010 20:14:39 UTC | 6 Sep 2010 15:19:01 UTC | Error while computing | 42,419.78 | 43.20 | 6,322.41 | --- |
	I would suggest you try the latest drivers (258.96). If you keep getting failures, try to find out what else (if anything) is running when these tasks crash.
	Tapio, your task failed after 4 seconds of GPU time. Some tasks seem to fail within 20 seconds. These are not very significant and do not reduce your contribution by much. Your card seems to be running well.
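To check skgiven's reading that the failure points look random, the task list in his post can be tallied. A minimal Python sketch (not something posted in the thread) that types in the rows from that list, counts tasks per status, and collects the run times (in seconds, as shown on the task page) of the failed ones:

```python
from collections import Counter

# Rows copied from skgiven's task list: (task ID, status, run time in seconds or None).
TASKS = [
    (2935235, "Error while computing", 1496.16),
    (2934119, "Error while computing", 14446.09),
    (2934086, "Error while computing", 2728.41),
    (2931920, "Error while computing", 20453.97),
    (2930618, "In progress", None),
    (2930026, "Error while computing", 1912.63),
    (2928799, "Error while computing", 7864.14),
    (2928286, "Completed and validated", 72823.73),
    (2927745, "Error while computing", 5424.13),
    (2925177, "Error while computing", 36642.95),
    (2924932, "Error while computing", 42419.78),
]


def summarize(tasks):
    """Count tasks per status and collect the run times of the failed ones."""
    counts = Counter(status for _, status, _ in tasks)
    failed = [rt for _, status, rt in tasks if status == "Error while computing"]
    return counts, failed


counts, failed = summarize(TASKS)
print(dict(counts))
print(f"failed: {len(failed)}, run time lost: {sum(failed) / 3600:.1f} h")
print(f"earliest failure: {min(failed):.0f} s, latest failure: {max(failed):.0f} s")
```

With these rows it reports 9 failures, 1 validated task and 1 still in progress, with the failures landing anywhere from about 25 minutes to almost 12 hours into a run.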

ftpd Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0	Message 18568 - Posted: 8 Sep 2010 \| 10:43:59 UTC - in response to Message 18567.
	@skgiven, I had the same problems with Windows XP Pro + GTS 250 + driver 258.96 after many hours of processing. See the other thread. Success! ____________ Ton (ftpd) Netherlands

DigitalDingus Joined: 2 Jun 09 Posts: 10 Credit: 21,969,126 RAC: 0	Message 18572 - Posted: 8 Sep 2010 \| 13:53:11 UTC - in response to Message 18568. Last modified: 8 Sep 2010 \| 13:54:16 UTC
	I'll try newer NVIDIA drivers, if any exist. I just upgraded to the latest BOINC in case it made a difference, but it did not. Otherwise, I think I'll be crunching Collatz for a while.

ftpd Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0	Message 18573 - Posted: 8 Sep 2010 \| 16:22:43 UTC - in response to Message 18572.
	Driver 258.96 exists for this card. Please try it! Good luck ____________ Ton (ftpd) Netherlands

Olivier Joined: 12 Jun 09 Posts: 1 Credit: 2,063,022 RAC: 0	Message 18588 - Posted: 9 Sep 2010 \| 18:33:01 UTC - in response to Message 18563.
	Same problem here, unfortunately. There's something wrong with those KASHIF units...

ftpd Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0	Message 18602 - Posted: 10 Sep 2010 \| 8:57:20 UTC
	@skgiven Hi Kev, again a task aborted after several hours (6) of processing. Windows XP Pro, GTS 250, driver 258.96. It also raises a Windows error message and waits for an answer, so there was no further processing during the night. I do not like this kind of error. Could you please stop sending them to this type of GPU card? Good luck. ____________ Ton (ftpd) Netherlands

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 18605 - Posted: 10 Sep 2010 \| 10:29:07 UTC - in response to Message 18602.
	The HIVPR_n1_bound tasks seem very troublesome on CC1.1 cards. I have suggested letting crunchers opt out of some task types. It would involve some work for the scientists on the project design and server layout. If GDF can get it implemented, crunchers could deselect troublesome task types, which would be useful for other problems too. Did an update try to install automatically on your system overnight? I think the issue mainly relates to crunching those tasks and only occasionally appears with other tasks, so perhaps the programmers can work around it. You managed to crunch two revlo_TRYP work units in the last couple of days, so the card is still a useful, working card; we just need you to crunch the tasks that suit that type of card.

ftpd Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0	Message 18607 - Posted: 10 Sep 2010 \| 11:21:03 UTC - in response to Message 18605.
	The error from GPUGrid (HIVPR) causes a Windows error message that waits for a reply (send or don't send to Microsoft), so all GPU tasks were left waiting during the night. Keep on crunching! ____________ Ton (ftpd) Netherlands

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 18608 - Posted: 10 Sep 2010 \| 14:39:07 UTC - in response to Message 18607.
	I expect the Microsoft error was along the lines of "acemd2_6.05_windows_intelx86__cuda 32 has stopped working". If you are sitting at the computer and see this error message pop up, you can sometimes press the system restart button (on the computer case); when the system comes back, the task is often able to pick up from the last checkpoint, so you don't lose the task. That would not work after a minute, never mind sometime overnight. I'm guessing you have already restarted the system. Do you know from the error logs whether a system update occurred at the time of the error message, or whether a backup, defrag or other heavy CPU app ran? Just in case something other than the task/driver is at fault here.

ftpd Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0	Message 18609 - Posted: 10 Sep 2010 \| 14:47:29 UTC - in response to Message 18608.
	Hi Kev, I use this machine only for crunching 24/7, so no backups, no updates, etc. Just GPUGrid and RNA, Ibercivis or Freehal. I do not have to restart this system. Success! ____________ Ton (ftpd) Netherlands

Tom Philippart Joined: 12 Feb 09 Posts: 57 Credit: 23,376,686 RAC: 0	Message 18624 - Posted: 11 Sep 2010 \| 10:15:18 UTC
	I have the same problems with this card: NVIDIA GPU 0: GeForce 9600 GT (driver version 25721, CUDA version 3010, compute capability 1.1, 496MB, 218 GFLOPS peak). Here's an example:
	MDIO ERROR: cannot open file "restart.coor"
	SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [999]
	Assertion failed: 0, file swanlib_nv.cpp, line 121
	This application has requested the Runtime to terminate it in an unusual way.
	Please contact the application's support team for more information.
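When collecting reports like this one, it helps to pull out which kernel failed and the number SWAN prints after it. A minimal Python sketch, assuming the `SWAN : FATAL` line always has the exact shape seen in this thread; the thread does not explain the trailing number ([999] here, [700] in the Linux reports), so the sketch only records it without interpreting it:

```python
import re

# Matches lines such as:
# SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [999]
SWAN_FATAL = re.compile(
    r"SWAN : FATAL : Failure executing kernel sync \[(?P<kernel>\w+)\] \[(?P<code>\d+)\]"
)


def swan_failures(stderr_text):
    """Return (kernel name, number) for every SWAN fatal line in a task's stderr."""
    return [(m.group("kernel"), int(m.group("code"))) for m in SWAN_FATAL.finditer(stderr_text)]


example = (
    'MDIO ERROR: cannot open file "restart.coor"\n'
    "SWAN : FATAL : Failure executing kernel sync "
    "[PmeRealSpace_compute_forces_kernel] [999]\n"
    "Assertion failed: 0, file swanlib_nv.cpp, line 121\n"
)
print(swan_failures(example))  # [('PmeRealSpace_compute_forces_kernel', 999)]
```

Run over the Linux reports later in the thread, the same pattern picks out transpose_float2 and PmeRealSpace_compute_forces_kernel, both with [700].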

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 18628 - Posted: 11 Sep 2010 \| 11:01:43 UTC - in response to Message 18624.
	Thanks for reporting the error. The same error has been posted several times now, and the developers are aware of it. A driver bug is catching out the applications when they run on CC1.1 cards. It does not always occur, but it is a concern. With long, complex GPU calculations the odd error is always expected, but these tasks are more problematic than others. Several suggestions and potential workarounds have been made.

Siegfried Niklas Joined: 23 Feb 09 Posts: 39 Credit: 144,654,294 RAC: 0	Message 18630 - Posted: 11 Sep 2010 \| 13:51:24 UTC - in response to Message 18608.
	> I expect the Microsoft Error was along the lines of, acemd2_6.05_windows_intelx86__cuda 32 has stopped working.* If you are sitting at the compter and see this error message pop-up, sometimes you can press the system restart button (on the computer case) and when it restarts the task is often able to pickup where from the last checkpoint; so you don't loose the task. That would not work after a minute never mind sometime overnight.
	I have used this trick several times over the last month (four 9800GT cards). Restarting the system without clicking away the "error message pop-up" mostly worked for me, even hours after the error happened. With the current KASHIF_HIVPR__bound (_unbound) errors, it never worked.

Fred J. Verster Joined: 1 Apr 09 Posts: 58 Credit: 35,833,978 RAC: 0	Message 18632 - Posted: 11 Sep 2010 \| 20:53:51 UTC - in response to Message 18630.
	Computer ID: 78963
	Report deadline: 15 Sep 2010 15:54:10 UTC
	Run time: 11402.593746
	CPU time: 736.2813
	stderr out:
	<core_client_version>6.10.58</core_client_version>
	<![CDATA[
	<stderr_txt>
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 480"
	# Clock rate: 1.40 GHz
	# Total amount of global memory: 1610153984 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	MDIO ERROR: cannot open file "restart.coor"
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 480"
	# Clock rate: 1.40 GHz
	# Total amount of global memory: 1610153984 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 480"
	# Clock rate: 1.40 GHz
	# Total amount of global memory: 1610153984 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 480"
	# Clock rate: 1.40 GHz
	# Total amount of global memory: 1610153984 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	# Time per step (avg over 275000 steps): 11.463 ms
	# Approximate elapsed time for entire WU: 11462.898 s
	called boinc_finish
	</stderr_txt>
	]]>
	Validate state: Valid
	Claimed credit: 6322.41203703704
	Granted credit: 9483.61805555556
	Application version: ACEMD2: GPU molecular dynamics v6.11 (cuda31)
	With a 9800GTX+, it didn't work either. ____________ Knight Who Says Ni N!
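Fred's successful GTX 480 log shows the device banner four times, but the MDIO line only once, right after the first banner. A minimal Python sketch for reading logs this way, assuming (not confirmed anywhere in the thread) that each `# There is 1 device supporting CUDA` banner marks one (re)start of the application and that an MDIO line belongs to the start printed just before it; it expects the log one entry per line, as BOINC stores it, rather than flattened as in the forum posts:

```python
import re

# Assumption: each banner like this marks one (re)start of the application.
BANNER = re.compile(r"^# There (?:is|are) \d+ devices? supporting CUDA")
NO_CHECKPOINT = 'MDIO ERROR: cannot open file "restart.coor"'


def starts(stderr_text):
    """Return one bool per application start: True if that start could not open restart.coor."""
    runs = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if BANNER.match(line):
            runs.append(False)  # a new start begins
        elif line == NO_CHECKPOINT and runs:
            runs[-1] = True  # attribute the MDIO line to the most recent start
    return runs


# Fred's GTX 480 log, with the per-device detail lines trimmed.
fred = "\n".join([
    "# Using device 0",
    "# There is 1 device supporting CUDA",
    'MDIO ERROR: cannot open file "restart.coor"',
    "# Using device 0",
    "# There is 1 device supporting CUDA",
    "# Using device 0",
    "# There is 1 device supporting CUDA",
    "# Using device 0",
    "# There is 1 device supporting CUDA",
    "# Time per step (avg over 275000 steps): 11.463 ms",
    "called boinc_finish",
])
print(starts(fred))  # [True, False, False, False]
```

Read that way, Fred's task was started four times and only the first start found no restart.coor; in zenitur's first Linux log further down, both starts print the MDIO line before the crash.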

Snow Crash Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0	Message 18636 - Posted: 12 Sep 2010 \| 10:15:59 UTC - in response to Message 18632.
	Fred... you posted results from a good run on a 480, and it doesn't look like you are even running a 9800 anymore, so I'm not sure where you were going with that. ____________ Thanks - Steve

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 18637 - Posted: 12 Sep 2010 \| 11:49:15 UTC - in response to Message 18636.
	Fred used to have a GTX470 and is now using a GTX480. That task completed on his 480 but failed on a GTX460 (not a 9800GTX+). I did see a 9800 failure against one of his GTX470 successes. Fred, keep your good cards hooked up to GPUGrid; a GTX480 would be wasted anywhere else.

Fred J. Verster Joined: 1 Apr 09 Posts: 58 Credit: 35,833,978 RAC: 0	Message 18639 - Posted: 12 Sep 2010 \| 12:08:11 UTC - in response to Message 18637. Last modified: 12 Sep 2010 \| 12:21:14 UTC
	Since the 9800GTX+ started causing trouble, such as overheating, which resulted in faults, I first got a GTX470, which I traded for repairing a PII (Compaq). Then I was able to buy a display model that I had seen working (all kinds of simulations). I bought it for €275 (normally €485 incl. VAT). I found out that these 'monsters' need a PSU of at least 650W; 850W is better. It draws 17A from its 8-pin and 17A from its 6-pin connectors, plus an additional ~6-10A from the mainboard (ASUS P5E). Now I have to find a way to get the 470 working... But I'm glad I made the change: for GPUGrid it's working like a charm, and on SETI@Home I can now run 3 MultiBeam tasks at a time (0.04 CPU + 0.33 GPU each), so BOINC 6.10.58 (64-bit) sometimes runs 7 SETI tasks and/or a mix of Einstein and other projects. I use driver 258.96 and CUDA 3.1. And it looks like those KASHIF_HIVPR WUs need compute capability 2.0 (2.1?). ____________ Knight Who Says Ni N!

mwgiii Joined: 22 Jan 09 Posts: 8 Credit: 988,332,833 RAC: 0	Message 18663 - Posted: 13 Sep 2010 \| 23:48:11 UTC - in response to Message 18639.
	All of the KASHIF_HIVPR tasks are generating errors on both of my machines. Of the first two pages of my tasks (40 work units), 24 errored out, all of them KASHIF_HIVPR. It is killing my contribution: as ftpd said, GPU crunching halts until I notice the error message.

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 18664 - Posted: 14 Sep 2010 \| 0:13:36 UTC - in response to Message 18663.
	It's probably best to do a system restart and then abort any KASHIF_HIVPR tasks that you pick up. Hopefully you will pick up other work units.
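Aborting KASHIF_HIVPR tasks by hand can also be scripted with BOINC's command-line tool. A minimal Python sketch, assuming `boinccmd` is on the PATH and that `boinccmd --get_tasks` prints each task's `name:` line before its `project URL:` line; matching on the substring KASHIF_HIVPR in the task name is an assumption based on the names used in this thread. It only prints the `boinccmd --task <URL> <name> abort` commands unless `dry_run=False` is passed, and depending on the client setup boinccmd may also need the GUI RPC password (`--passwd`):

```python
import shutil
import subprocess


def parse_tasks(text):
    """Parse `boinccmd --get_tasks` output into (project URL, task name) pairs."""
    tasks, name = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("name:"):  # "WU name:" does not match this prefix
            name = line[len("name:"):].strip()
        elif line.startswith("project URL:") and name is not None:
            tasks.append((line[len("project URL:"):].strip(), name))
            name = None
    return tasks


def kashif_tasks(text, marker="KASHIF_HIVPR"):
    """Keep only tasks whose name contains the marker substring."""
    return [(url, name) for url, name in parse_tasks(text) if marker in name]


def abort(tasks, dry_run=True):
    """Abort each task via `boinccmd --task <URL> <name> abort`; only print when dry_run."""
    for url, name in tasks:
        cmd = ["boinccmd", "--task", url, name, "abort"]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    if shutil.which("boinccmd") is None:
        print("boinccmd not found on PATH")
    else:
        listing = subprocess.run(
            ["boinccmd", "--get_tasks"], capture_output=True, text=True, check=True
        ).stdout
        abort(kashif_tasks(listing), dry_run=True)
```

Keeping the dry run as the default means the first pass just shows which tasks would be dropped, so nothing is aborted by accident if the name filter matches more than intended.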

mwgiii Joined: 22 Jan 09 Posts: 8 Credit: 988,332,833 RAC: 0	Message 18665 - Posted: 14 Sep 2010 \| 2:01:32 UTC - in response to Message 18664.
	I reboot about every other day. If I see any more KASHIF tasks, I will abort them immediately.

ralle030583 Joined: 19 Aug 10 Posts: 19 Credit: 830,540 RAC: 0	Message 18693 - Posted: 15 Sep 2010 \| 18:45:12 UTC - in response to Message 18665. Last modified: 15 Sep 2010 \| 18:46:53 UTC
	It seems that all KASHIF tasks also fail on my GeForce 9800 GT :-/ (OK, currently everything fails because of an overclocking attempt, but KASHIF tasks didn't work before the OC either ^^)

zenitur Joined: 25 Sep 10 Posts: 2 Credit: 285,845 RAC: 0	Message 18781 - Posted: 29 Sep 2010 \| 10:17:57 UTC
	I have the same error: http://www.gpugrid.net/result.php?resultid=3030293 and http://www.gpugrid.net/result.php?resultid=3028306
	<core_client_version>6.10.58</core_client_version>
	<![CDATA[
	<message>
	process exited with code 193 (0xc1, -63)
	</message>
	<stderr_txt>
	# There is 1 device supporting CUDA
	# Device 0: "GeForce 9800 GT"
	# Clock rate: 1.50 GHz
	# Total amount of global memory: 536543232 bytes
	# Number of multiprocessors: 14
	# Number of cores: 112
	MDIO ERROR: cannot open file "restart.coor"
	# There is 1 device supporting CUDA
	# Device 0: "GeForce 9800 GT"
	# Clock rate: 1.50 GHz
	# Total amount of global memory: 536543232 bytes
	# Number of multiprocessors: 14
	# Number of cores: 112
	MDIO ERROR: cannot open file "restart.coor"
	SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
	acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char, int3, int3, size_t, ...): Assertion `0' failed.
	SIGABRT: abort called
	Stack trace (17 frames):
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
	/lib/libc.so.6(+0x324c0)[0x7f4e7810d4c0]
	/lib/libc.so.6(gsignal+0x35)[0x7f4e7810d445]
	/lib/libc.so.6(abort+0x180)[0x7f4e7810e860]
	/lib/libc.so.6(__assert_fail+0xf1)[0x7f4e781064e1]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45feae]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x46032f]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45db09]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45b400]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
	/lib/libc.so.6(__libc_start_main+0xfd)[0x7f4e780f9d2d]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]
	Exiting...
	</stderr_txt>
	]]>
	<core_client_version>6.10.58</core_client_version>
	<![CDATA[
	<message>
	process exited with code 193 (0xc1, -63)
	</message>
	<stderr_txt>
	# There is 1 device supporting CUDA
	# Device 0: "GeForce 9800 GT"
	# Clock rate: 1.50 GHz
	# Total amount of global memory: 536543232 bytes
	# Number of multiprocessors: 14
	# Number of cores: 112
	MDIO ERROR: cannot open file "restart.coor"
	SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [700]
	acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char, int3, int3, size_t, ...): Assertion `0' failed.
	SIGABRT: abort called
	Stack trace (14 frames):
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
	/lib/libc.so.6(+0x324c0)[0x7f1c49b544c0]
	/lib/libc.so.6(gsignal+0x35)[0x7f1c49b54445]
	/lib/libc.so.6(abort+0x180)[0x7f1c49b55860]
	/lib/libc.so.6(__assert_fail+0xf1)[0x7f1c49b4d4e1]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45d3f9]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
	/lib/libc.so.6(__libc_start_main+0xfd)[0x7f1c49b40d2d]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]
	Exiting...
	</stderr_txt>
	]]>
	It happens only on KASHIF tasks; TONI tasks always work fine.

zenitur Joined: 25 Sep 10 Posts: 2 Credit: 285,845 RAC: 0	Message 18807 - Posted: 2 Oct 2010 \| 19:26:01 UTC - in response to Message 18781.
	I found the cause of my error: automatic suspend. After a restart, KASHIF tasks end with an error.

Saenger Joined: 20 Jul 08 Posts: 134 Credit: 23,657,183 RAC: 0	Message 18916 - Posted: 11 Oct 2010 \| 12:30:47 UTC
	I just had this one wrecked:
	stderr out:
	<core_client_version>6.10.17</core_client_version>
	<![CDATA[
	<message>
	process exited with code 1 (0x1, -255)
	</message>
	<stderr_txt>
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GT 240"
	# Clock rate: 1.34 GHz
	# Total amount of global memory: 536150016 bytes
	# Number of multiprocessors: 12
	# Number of cores: 96
	MDIO ERROR: cannot open file "restart.coor"
	</stderr_txt>
	]]>
	I don't have the faintest idea why it was restarted (or what "restart.coor" is good for at all): I don't run other projects on the GPU in parallel, and I wasn't doing anything on the machine at the time. ____________ Greetings from Saenger. For questions about BOINC, look in the BOINC-Wiki
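Nobody in the thread explains what restart.coor is. The logs hint at it: in Fred's successful GTX 480 run the MDIO line appears only on the first start, which fits a checkpoint file that simply does not exist yet when a task starts fresh. A minimal Python sketch of that generic checkpoint/restart pattern, under that assumption only; the file name, JSON format and step counts are hypothetical and are not ACEMD's actual code or file layout:

```python
import json
import os
import tempfile


def run(total_steps, checkpoint_path, checkpoint_every=100, stop_after=None):
    """Toy step loop that resumes from a checkpoint file when one exists.

    stop_after simulates the process being killed after that many steps in this run.
    Returns the step reached when this run ends.
    """
    try:
        with open(checkpoint_path) as f:
            step = json.load(f)["step"]
    except FileNotFoundError:
        # First start: no checkpoint yet, analogous to the MDIO line in the logs.
        print(f'cannot open file "{os.path.basename(checkpoint_path)}", starting at step 0')
        step = 0
    done = 0
    while step < total_steps:
        step += 1  # stands in for one simulation step
        done += 1
        if step % checkpoint_every == 0:
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"step": step}, f)
            os.replace(tmp, checkpoint_path)  # swap in the complete file in one step
        if stop_after is not None and done >= stop_after:
            return step  # "killed": progress since the last checkpoint is lost
    return step


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "checkpoint.json")
    print(run(1000, path, stop_after=250))  # killed at step 250; last checkpoint was step 200
    print(run(1000, path))                   # resumes from step 200 and finishes
```

In this toy model, a message about a missing checkpoint on the first start is harmless; what loses work is a crash, since everything after the last checkpoint has to be recomputed.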

Post to thread

Message boards : Number crunching : KASHIF_HIVPR Errors?

	About	Science	Volunteers	Performance	Forum	Join us	Donate

Author	Message
DigitalDingus Send message Joined: 2 Jun 09 Posts: 10 Credit: 21,969,126 RAC: 0 Level Scientific publications	Message 18563 - Posted: 8 Sep 2010 \| 2:58:53 UTC
	I've had several of these give an Error While Computing. Anyone else? These WU's seem to estimate at almost twice the computing time as I normally have. ____________
	ID: 18563 \| Rating: 0 \| rate: / Reply Quote

Siegfried Niklas Send message Joined: 23 Feb 09 Posts: 39 Credit: 144,654,294 RAC: 0 Level Scientific publications	Message 18564 - Posted: 8 Sep 2010 \| 7:59:04 UTC
	I reported it 4 days ago for G92 cards (compute capability 1.1) like 9800GT, 8800 GT (G92)... http://www.gpugrid.net/forum_thread.php?id=2274
	ID: 18564 \| Rating: 0 \| rate: / Reply Quote

Old man Send message Joined: 24 Jan 09 Posts: 42 Credit: 16,676,387 RAC: 0 Level Scientific publications	Message 18565 - Posted: 8 Sep 2010 \| 8:21:34 UTC
	Here are also one: http://www.gpugrid.net/result.php?resultid=2935402 My card are gtx 460 stderr out <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 460" # Clock rate: 1.55 GHz # Total amount of global memory: 804847616 bytes # Number of multiprocessors: 7 # Number of cores: 56 MDIO ERROR: cannot open file "restart.coor" </stderr_txt> ]]>
	ID: 18565 \| Rating: 0 \| rate: / Reply Quote

ignasi Send message Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0 Level Scientific publications	Message 18566 - Posted: 8 Sep 2010 \| 8:49:56 UTC - in response to Message 18565.
	What drivers are you using?
	ID: 18566 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 18567 - Posted: 8 Sep 2010 \| 9:38:02 UTC - in response to Message 18566. Last modified: 8 Sep 2010 \| 9:48:28 UTC
	DigitalDingus is using two 9600 GSO (767MB) cards with driver: 19745 (Q9450, XP x86). The fail times look random: 2935235 1870438 8 Sep 2010 7:06:12 UTC 8 Sep 2010 7:32:16 UTC Error while computing 1,496.16 11.69 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda) 2934119 1869838 8 Sep 2010 2:53:40 UTC 8 Sep 2010 7:02:58 UTC Error while computing 14,446.09 23.11 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda) 2934086 1869814 8 Sep 2010 1:40:10 UTC 8 Sep 2010 2:53:40 UTC Error while computing 2,728.41 11.33 6,322.41 --- ACEMD2: GPU molecular dynamics v6.05 (cuda) 2931920 1868719 7 Sep 2010 12:15:59 UTC 8 Sep 2010 0:21:49 UTC Error while computing 20,453.97 14.77 6,322.41 --- ACEMD2: GPU molecular dynamics v6.05 (cuda) 2930618 1868078 7 Sep 2010 4:36:49 UTC 12 Sep 2010 4:36:49 UTC In progress --- --- --- --- ACEMD2: GPU molecular dynamics v6.05 (cuda) 2930026 1867745 7 Sep 2010 4:03:02 UTC 7 Sep 2010 4:36:49 UTC Error while computing 1,912.63 12.89 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda) 2928799 1867124 6 Sep 2010 19:51:29 UTC 6 Sep 2010 22:04:25 UTC Error while computing 7,864.14 8.88 6,016.70 --- ACEMD2: GPU molecular dynamics v6.05 (cuda) 2928286 1866896 6 Sep 2010 15:19:01 UTC 7 Sep 2010 18:40:38 UTC Completed and validated 72,823.73 1,372.66 4,535.61 5,669.51 ACEMD2: GPU molecular dynamics v6.05 (cuda) 2927745 1866582 6 Sep 2010 15:19:01 UTC 6 Sep 2010 16:51:46 UTC Error while computing 5,424.13 41.77 6,016.70 --- ACEMD2: GPU molecular dynamics v6.05 (cuda) 2925177 1865300 5 Sep 2010 21:53:33 UTC 6 Sep 2010 15:19:01 UTC Error while computing 36,642.95 80.09 6,409.23 --- ACEMD2: GPU molecular dynamics v6.05 (cuda) 2924932 1865162 5 Sep 2010 20:14:39 UTC 6 Sep 2010 15:19:01 UTC Error while computing 42,419.78 43.20 6,322.41 --- ACEMD2: GPU molecular dynamics v6.05 (cuda) I would suggest you try the latest drivers 25896. If you keep getting failures try to find out what else is running when these tasks crash (if anything). 
Tapio, your task failed after 4sec GPU time. Some tasks seem to fail within 20sec. These are not very significant and do not reduce your contribution by much. Your card seems to be running well.
	ID: 18567 \| Rating: 0 \| rate: / Reply Quote

ftpd Send message Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level Scientific publications	Message 18568 - Posted: 8 Sep 2010 \| 10:43:59 UTC - in response to Message 18567.
	@skgiven, I had the same problems with windows xp pro + gts250 + 258.96 driver after a lot of hours processing. See other thread. Success ____________ Ton (ftpd) Netherlands
	ID: 18568 \| Rating: 0 \| rate: / Reply Quote

DigitalDingus Send message Joined: 2 Jun 09 Posts: 10 Credit: 21,969,126 RAC: 0 Level Scientific publications	Message 18572 - Posted: 8 Sep 2010 \| 13:53:11 UTC - in response to Message 18568. Last modified: 8 Sep 2010 \| 13:54:16 UTC
	Will try the newer nVidia drivers, if any exist. Just upgraded to the latest BOINC in case it made a difference, but it did not. Other than that, I'll be crunching Collatz for a while I think.
	ID: 18572 \| Rating: 0 \| rate: / Reply Quote

ftpd Send message Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level Scientific publications	Message 18573 - Posted: 8 Sep 2010 \| 16:22:43 UTC - in response to Message 18572.
	Driver 258.96 exists for this card. Please try it! Good luck ____________ Ton (ftpd) Netherlands
	ID: 18573 \| Rating: 0 \| rate: / Reply Quote

Olivier Send message Joined: 12 Jun 09 Posts: 1 Credit: 2,063,022 RAC: 0 Level Scientific publications	Message 18588 - Posted: 9 Sep 2010 \| 18:33:01 UTC - in response to Message 18563.
	Same problem here unfortunatly. Theres something wrong with those kashif units ...
	ID: 18588 \| Rating: 0 \| rate: / Reply Quote

ftpd Send message Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level Scientific publications	Message 18602 - Posted: 10 Sep 2010 \| 8:57:20 UTC
	@skgiven Hi Kev, Again after several hours (6) processing aborted. Windows XP-pro - gts250 258.96. Gives also windows-message and waiting for answer, so no further processing during the night. I do not like this kind of errors. Do not send them anymore to this type of gpu-cards, please? Good luck. ____________ Ton (ftpd) Netherlands
	ID: 18602 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 18605 - Posted: 10 Sep 2010 \| 10:29:07 UTC - in response to Message 18602.
	The HIVPR_n1_bound tasks seem very troublesome on CC1.1 cards. I made suggestions to allow crunchers to opt out of crunching some task types. It would involve some work for the scientists on the project design and server layout. If GDF can get it implemented it would allow crunchers to deselect troublesome projects, which would make it useful for other problems too. Did an update try to automatically install on your system overnight? I think the issue primarily relates to crunching those tasks, and only occasionally appears for other tasks, so perhaps this can be worked around by the programmers; you managed to crunch two revlo_TRYP work units in the last couple of days, so the card is still a useful, working card. We just need you to crunch the good tasks for that type of card.
	ID: 18605 \| Rating: 0 \| rate: / Reply Quote

ftpd Send message Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level Scientific publications	Message 18607 - Posted: 10 Sep 2010 \| 11:21:03 UTC - in response to Message 18605.
	The error from GPUgrid (HIVPR) causes a windows-error-message, which was waiting for a reply (send or no send to Microsoft). So all GPU-tasks were waiting during the night. Keep on crunching! ____________ Ton (ftpd) Netherlands
	ID: 18607 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 18608 - Posted: 10 Sep 2010 \| 14:39:07 UTC - in response to Message 18607.
	I expect the Microsoft Error was along the lines of, acemd2_6.05_windows_intelx86__cuda 32 has stopped working.* If you are sitting at the compter and see this error message pop-up, sometimes you can press the system restart button (on the computer case) and when it restarts the task is often able to pickup where from the last checkpoint; so you don't loose the task. That would not work after a minute never mind sometime overnight. I'm guessing you have already restarted the system. Do you know from the logs if a system update occured at that time of the error message (error logs), or some backup, defrag or other heavy CPU app ran - just in case something other than the task/driver is at fault here?
	ID: 18608 \| Rating: 0 \| rate: / Reply Quote

ftpd Send message Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0 Level Scientific publications	Message 18609 - Posted: 10 Sep 2010 \| 14:47:29 UTC - in response to Message 18608.
	Hi Kev, I use this machine only for crunching 24/7, so no back-up, no updates etc. Just Gpugrid and RNA or Ibercivis ore Freehal. I do no have to restart this system. Success! ____________ Ton (ftpd) Netherlands

Tom Philippart Joined: 12 Feb 09 Posts: 57 Credit: 23,376,686 RAC: 0	Message 18624 - Posted: 11 Sep 2010 \| 10:15:18 UTC
	I have the same problems with this card:
	NVIDIA GPU 0: GeForce 9600 GT (driver version 25721, CUDA version 3010, compute capability 1.1, 496MB, 218 GFLOPS peak)
	Here's an example:
	MDIO ERROR: cannot open file "restart.coor"
	SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [999]
	Assertion failed: 0, file swanlib_nv.cpp, line 121
	This application has requested the Runtime to terminate it in an unusual way.
	Please contact the application's support team for more information.
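When comparing failure reports like the ones in this thread, it can help to pull out the kernel named in each `SWAN : FATAL` line. The helper below is a hypothetical sketch: the regular expression is written against the exact lines quoted in this thread and may not match every ACEMD message, and the meaning of the bracketed number after the kernel name is not documented here, so the code only records it.

```python
import re
from collections import Counter

# Modelled on lines quoted in this thread, e.g.
# SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [999]
SWAN_FATAL = re.compile(r"SWAN : FATAL : Failure executing kernel sync \[(\w+)\] \[(\d+)\]")


def fatal_kernels(stderr_text):
    """Return (kernel name, bracketed number) pairs for every SWAN fatal line."""
    return [(name, int(code)) for name, code in SWAN_FATAL.findall(stderr_text)]


def tally(stderr_texts):
    """Count how often each kernel appears in a fatal line across several task logs."""
    counts = Counter()
    for text in stderr_texts:
        counts.update(name for name, _ in fatal_kernels(text))
    return counts
```

Fed the three fatal lines posted in this thread (one from a 9600 GT, two from a 9800 GT), `tally` would count `PmeRealSpace_compute_forces_kernel` twice and `transpose_float2` once.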

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 18628 - Posted: 11 Sep 2010 \| 11:01:43 UTC - in response to Message 18624.
	Thanks for reporting the error. The same error has been posted several times now, and the developers are aware of it. A driver bug is catching out the application when it runs on CC1.1 cards. It does not always occur, but it is a concern. With long, complex GPU calculations the odd error is always to be expected, but these tasks are more problematic than others. Several suggestions and potential workarounds have been made.

Siegfried Niklas Joined: 23 Feb 09 Posts: 39 Credit: 144,654,294 RAC: 0	Message 18630 - Posted: 11 Sep 2010 \| 13:51:24 UTC - in response to Message 18608.
	Quote (skgiven, Message 18608): "I expect the Microsoft error was along the lines of 'acemd2_6.05_windows_intelx86__cuda 32 has stopped working'. If you are sitting at the computer and see this error message pop up, sometimes you can press the system restart button (on the computer case), and when it restarts the task is often able to pick up from the last checkpoint, so you don't lose the task. That would not work after a minute, never mind sometime overnight."
	I did this trick several times over the last month (four 9800GT cards). Restarting the system without clicking away the error pop-up mostly worked for me, even hours after the error happened. With the current KASHIF_HIVPR__bound (_unbound) errors it never worked.

Fred J. Verster Joined: 1 Apr 09 Posts: 58 Credit: 35,833,978 RAC: 0	Message 18632 - Posted: 11 Sep 2010 \| 20:53:51 UTC - in response to Message 18630.
	Computer ID 78963
	Report deadline 15 Sep 2010 15:54:10 UTC
	Run time 11402.593746
	CPU time 736.2813
	stderr out
	<core_client_version>6.10.58</core_client_version>
	<![CDATA[
	<stderr_txt>
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 480"
	# Clock rate: 1.40 GHz
	# Total amount of global memory: 1610153984 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	MDIO ERROR: cannot open file "restart.coor"
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 480"
	# Clock rate: 1.40 GHz
	# Total amount of global memory: 1610153984 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 480"
	# Clock rate: 1.40 GHz
	# Total amount of global memory: 1610153984 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 480"
	# Clock rate: 1.40 GHz
	# Total amount of global memory: 1610153984 bytes
	# Number of multiprocessors: 15
	# Number of cores: 120
	# Time per step (avg over 275000 steps): 11.463 ms
	# Approximate elapsed time for entire WU: 11462.898 s
	called boinc_finish
	</stderr_txt>
	]]>
	Validate state: Valid
	Claimed credit 6322.41203703704
	Granted credit 9483.61805555556
	Application version ACEMD2: GPU molecular dynamics v6.11 (cuda31)
	With a 9800GTX+, it didn't work either.
	____________ Knight Who Says Ni N!

Snow Crash Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0	Message 18636 - Posted: 12 Sep 2010 \| 10:15:59 UTC - in response to Message 18632.
	Fred, you posted results from a good run on a 480, and it doesn't look like you are even running a 9800 anymore, so I'm not sure where you were going with that. ____________ Thanks - Steve
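Reading a log the way Snow Crash does can be made mechanical. In the logs posted in this thread, each start or restart of the application seems to print a device banner ("# There is 1 device supporting CUDA"), a clean finish ends with `called boinc_finish`, and failures show a `SWAN : FATAL` line. The sketch below relies only on those observations, so treat it as a heuristic under that assumption, not an official way to parse ACEMD output; the `RunSummary` type and `summarise` function are hypothetical names.

```python
import re
from dataclasses import dataclass

# Heuristics drawn from the stderr excerpts in this thread (assumptions, not a spec).
BANNER = re.compile(r"# There (?:is|are) \d+ devices? supporting CUDA")
FINISHED = "called boinc_finish"
FATAL = "SWAN : FATAL"


@dataclass
class RunSummary:
    starts: int     # device banners seen: the first start plus any restarts
    finished: bool  # the log reaches "called boinc_finish"
    fatal: bool     # a SWAN fatal error was logged


def summarise(stderr_text):
    """Summarise one task's stderr text using the heuristics above."""
    return RunSummary(
        starts=len(BANNER.findall(stderr_text)),
        finished=FINISHED in stderr_text,
        fatal=FATAL in stderr_text,
    )
```

Applied to Fred's GTX 480 log, this would report four starts, a clean finish and no fatal error; the "cannot open file" line printed at the first start did not stop that task from completing and validating.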

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 18637 - Posted: 12 Sep 2010 \| 11:49:15 UTC - in response to Message 18636.
	Fred used to have a GTX470 and is now using a GTX480. That work unit completed on his 480 but failed on another cruncher's GTX460 (not a 9800GTX+). I did see a 9800 failure against one of his GTX470 successes. Fred, keep your good cards hooked up to GPUGrid; a GTX480 would be wasted anywhere else.

Fred J. Verster Joined: 1 Apr 09 Posts: 58 Credit: 35,833,978 RAC: 0	Message 18639 - Posted: 12 Sep 2010 \| 12:08:11 UTC - in response to Message 18637. Last modified: 12 Sep 2010 \| 12:21:14 UTC
	Since the 9800GTX+ started causing trouble, like overheating, which resulted in faults, I first got a GTX470, which I received in exchange for repairing a PII (Compaq). Then I could buy a 'show model' (GTX480), which I had seen working (all kinds of simulations). I bought it for €275 (normally €485 + VAT). I found out that these 'monsters' need a PSU of at least 650W; 850W is better. It draws 17A from its 8-pin and 17A from its 6-pin connector, plus an additional ~6-10A from the mainboard (ASUS P5E). Now I have to find a way to get the 470 working. But I'm glad I made the change: for GPUGrid it's working like a charm, and on SETI@Home I can now run 3 MultiBeam tasks at a time (0.04 CPU + 0.33 GPU each), so BOINC 6.10.58 (64-bit) sometimes runs 7 SETI tasks and/or a mix of Einstein and other projects. I use driver 258.96 and CUDA 3.1. And it looks like those KASHIF_HIVPR WUs need compute capability 2.0 (2.1?).

mwgiii Joined: 22 Jan 09 Posts: 8 Credit: 988,332,833 RAC: 0	Message 18663 - Posted: 13 Sep 2010 \| 23:48:11 UTC - in response to Message 18639.
	All of the KASHIF_HIVPR tasks are generating errors on both of my machines. Of the first two pages of my tasks (40 work units), 24 errored out, all of them KASHIF_HIVPR. It is killing my contributions: as ftpd said, GPU crunching halts until I notice the error message.

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 18664 - Posted: 14 Sep 2010 \| 0:13:36 UTC - in response to Message 18663.
	It's probably best to restart the system and then abort any KASHIF_HIVPR tasks that you download. Hopefully you will pick up other work units.
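For anyone who prefers to script the aborts rather than click through BOINC Manager, the BOINC command-line tool `boinccmd` can list tasks (`--get_tasks`) and abort one (`--task <project URL> <task name> abort`). The sketch below is a hypothetical helper, not part of BOINC or GPUGRID: it assumes each task block in the `--get_tasks` output has a `name:` line followed later by a `project URL:` line, which you should check against your own client. It only builds the commands, so you can review them before running anything.

```python
import subprocess

PATTERN = "KASHIF_HIVPR"  # substring that identifies the problem work units


def fetch_tasks():
    """Return the text printed by `boinccmd --get_tasks` (needs a running BOINC client)."""
    return subprocess.run(["boinccmd", "--get_tasks"],
                          capture_output=True, text=True, check=True).stdout


def parse_tasks(get_tasks_output):
    """Collect (project URL, task name) pairs.

    Assumes each task block has a "name:" line before its "project URL:" line;
    verify this layout against your own client's output.
    """
    tasks, name = [], None
    for line in get_tasks_output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # banner and separator lines carry no "key: value" pair
        if key == "name":
            name = value
        elif key == "project URL" and name is not None:
            tasks.append((value, name))
            name = None
    return tasks


def abort_commands(tasks, pattern=PATTERN):
    """Build `boinccmd --task <URL> <name> abort` argument lists for matching tasks."""
    return [["boinccmd", "--task", url, name, "abort"]
            for url, name in tasks if pattern in name]
```

Usage: `for cmd in abort_commands(parse_tasks(fetch_tasks())): print(" ".join(cmd))` prints the commands as a dry run; each argument list can then be passed to `subprocess.run` once you are happy with it.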

mwgiii Joined: 22 Jan 09 Posts: 8 Credit: 988,332,833 RAC: 0	Message 18665 - Posted: 14 Sep 2010 \| 2:01:32 UTC - in response to Message 18664.
	I reboot about every other day. If I see any more KASHIF tasks, I will abort them immediately.

ralle030583 Joined: 19 Aug 10 Posts: 19 Credit: 830,540 RAC: 0	Message 18693 - Posted: 15 Sep 2010 \| 18:45:12 UTC - in response to Message 18665. Last modified: 15 Sep 2010 \| 18:46:53 UTC
	It seems all KASHIF tasks also fail on my GeForce 9800 GT :-/ (OK, currently everything fails because of an overclocking attempt, but KASHIF tasks didn't work before the OC either ^^)

zenitur Joined: 25 Sep 10 Posts: 2 Credit: 285,845 RAC: 0	Message 18781 - Posted: 29 Sep 2010 \| 10:17:57 UTC
	I have the same error:
	http://www.gpugrid.net/result.php?resultid=3030293
	http://www.gpugrid.net/result.php?resultid=3028306
	<core_client_version>6.10.58</core_client_version>
	<![CDATA[
	<message>
	process exited with code 193 (0xc1, -63)
	</message>
	<stderr_txt>
	# There is 1 device supporting CUDA
	# Device 0: "GeForce 9800 GT"
	# Clock rate: 1.50 GHz
	# Total amount of global memory: 536543232 bytes
	# Number of multiprocessors: 14
	# Number of cores: 112
	MDIO ERROR: cannot open file "restart.coor"
	# There is 1 device supporting CUDA
	# Device 0: "GeForce 9800 GT"
	# Clock rate: 1.50 GHz
	# Total amount of global memory: 536543232 bytes
	# Number of multiprocessors: 14
	# Number of cores: 112
	MDIO ERROR: cannot open file "restart.coor"
	SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
	acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char\*, int3, int3, size_t, ...): Assertion `0' failed.
	SIGABRT: abort called
	Stack trace (17 frames):
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
	/lib/libc.so.6(+0x324c0)[0x7f4e7810d4c0]
	/lib/libc.so.6(gsignal+0x35)[0x7f4e7810d445]
	/lib/libc.so.6(abort+0x180)[0x7f4e7810e860]
	/lib/libc.so.6(__assert_fail+0xf1)[0x7f4e781064e1]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45feae]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x46032f]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45db09]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45b400]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
	/lib/libc.so.6(__libc_start_main+0xfd)[0x7f4e780f9d2d]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]
	Exiting...
	</stderr_txt>
	]]>
	<core_client_version>6.10.58</core_client_version>
	<![CDATA[
	<message>
	process exited with code 193 (0xc1, -63)
	</message>
	<stderr_txt>
	# There is 1 device supporting CUDA
	# Device 0: "GeForce 9800 GT"
	# Clock rate: 1.50 GHz
	# Total amount of global memory: 536543232 bytes
	# Number of multiprocessors: 14
	# Number of cores: 112
	MDIO ERROR: cannot open file "restart.coor"
	SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [700]
	acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char\*, int3, int3, size_t, ...): Assertion `0' failed.
	SIGABRT: abort called
	Stack trace (14 frames):
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
	/lib/libc.so.6(+0x324c0)[0x7f1c49b544c0]
	/lib/libc.so.6(gsignal+0x35)[0x7f1c49b54445]
	/lib/libc.so.6(abort+0x180)[0x7f1c49b55860]
	/lib/libc.so.6(__assert_fail+0xf1)[0x7f1c49b4d4e1]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45d3f9]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
	/lib/libc.so.6(__libc_start_main+0xfd)[0x7f1c49b40d2d]
	../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]
	Exiting...
	</stderr_txt>
	]]>
	This happens only on KASHIF tasks; TONI tasks always work fine.

zenitur Joined: 25 Sep 10 Posts: 2 Credit: 285,845 RAC: 0	Message 18807 - Posted: 2 Oct 2010 \| 19:26:01 UTC - in response to Message 18781.
	I found the reason for my error: automatic suspend. After restarting, KASHIF tasks error out.

Saenger Joined: 20 Jul 08 Posts: 134 Credit: 23,657,183 RAC: 0	Message 18916 - Posted: 11 Oct 2010 \| 12:30:47 UTC
	I just had this one wrecked:
	stderr out
	<core_client_version>6.10.17</core_client_version>
	<![CDATA[
	<message>
	process exited with code 1 (0x1, -255)
	</message>
	<stderr_txt>
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GT 240"
	# Clock rate: 1.34 GHz
	# Total amount of global memory: 536150016 bytes
	# Number of multiprocessors: 12
	# Number of cores: 96
	MDIO ERROR: cannot open file "restart.coor"
	</stderr_txt>
	]]>
	I don't have the faintest idea why it was restarted (or what "restart.coor" is good for at all). I don't run other projects on the GPU in parallel, and I wasn't doing anything on the machine at that time.
	____________ Greetings from Saenger. For questions about BOINC, look in the BOINC-Wiki