Message boards : Graphics cards (GPUs) : 6 Errors Today [Problems with "KASHIF_HIVPR" and "IBUCH_KID"-WUs]

Author	Message
dataman Send message Joined: 18 Sep 08 Posts: 36 Credit: 100,352,867 RAC: 0 Level Scientific publications	Message 9384 - Posted: 6 May 2009 \| 19:07:45 UTC
	Everything has been running well, but I had 6 errors today across 3 different cards (9800GTs). 1 of these: ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 84: cufftExecC2C (gridCalc2.2) 1 of these: Cuda error: Kernel [shake_step_2] failed in file 'shake.cu' in line 128 : unknown error. 4 of these: Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error. What's going on?
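For anyone collecting these reports, a small script can tally which kernels fail most often. This is only a sketch: it assumes the error lines keep the exact wording quoted in the post above, and the function name `tally_kernel_failures` and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Cuda error: Kernel [shake_step_2] failed in file 'shake.cu' in line 128 : unknown error.
KERNEL_ERROR = re.compile(
    r"Cuda error: Kernel \[(?P<kernel>\w+)\] failed in file "
    r"'(?P<file>[^']+)' in line (?P<line>\d+)"
)


def tally_kernel_failures(stderr_text):
    """Count failed kernels by name in a block of stderr text."""
    return Counter(m.group("kernel") for m in KERNEL_ERROR.finditer(stderr_text))


# Sample lines copied from the post above (one shake error, two PME errors).
sample = "\n".join([
    "Cuda error: Kernel [shake_step_2] failed in file 'shake.cu' in line 128 : unknown error.",
    "Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.",
    "Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.",
])

counts = tally_kernel_failures(sample)
for kernel, n in counts.most_common():
    print(f"{n} x {kernel}")
```

The `ERROR: ... cufftExecC2C` lines use a different format and are deliberately not matched here; they would need their own pattern.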

palmss Send message Joined: 28 Aug 08 Posts: 7 Credit: 60,897,550 RAC: 0 Level Scientific publications	Message 9385 - Posted: 6 May 2009 \| 19:13:44 UTC
	I have a "PmeRealSpace" error too, with an 8800GT, here: http://www.gpugrid.net/result.php?resultid=631932

K1atOdessa Send message Joined: 25 Feb 08 Posts: 249 Credit: 370,320,941 RAC: 0 Level Scientific publications	Message 9391 - Posted: 6 May 2009 \| 19:55:41 UTC
	Same here, PmeRealSpace error, running an 8800GT. "IBUCH_KID" WU's. Do I see a pattern forming, or just a coincidence? Error WU 634715

[boinc.at] Nowi Send message Joined: 4 Sep 08 Posts: 44 Credit: 3,685,033 RAC: 0 Level Scientific publications	Message 9400 - Posted: 6 May 2009 \| 21:25:38 UTC - in response to Message 9391.
	I have the same error on three WUs. The GPU is an 8800GT....

dataman Send message Joined: 18 Sep 08 Posts: 36 Credit: 100,352,867 RAC: 0 Level Scientific publications	Message 9404 - Posted: 6 May 2009 \| 22:49:01 UTC
	Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : unknown error. More errors ... :(

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9405 - Posted: 6 May 2009 \| 22:54:55 UTC - in response to Message 9400.
	I had three fail in quick succession within a 40-minute period today on a 9800GTX+. The errors were similar to the above. Two were the same: Cuda error: Kernel [shake_step_1] failed in file 'shake.cu' in line 79. The third was: Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error. I have had a replacement running for about three hours with no problems so far; we shall see in the morning :) Regards Zy

schizo1988 Send message Joined: 16 Dec 08 Posts: 16 Credit: 10,644,256 RAC: 0 Level Scientific publications	Message 9414 - Posted: 7 May 2009 \| 1:59:15 UTC - in response to Message 9405.
	I have a thread about failed jobs as well. One machine lost 5 jobs and I thought it was machine-specific, but then one of my other machines got the same error, and also had some results that were valid but listed warning messages that seem related to the actual errors. You only see this after a task has finished, but a real-time system would be impossible, not to mention useless, unless you could sit and monitor your apps 24/7. They have come out with quite a few new software updates and problems can always arise, and not making it mandatory to use the new version would not work either. The best we can do is post the errors and make the people who actually understand the software aware of them; I have found this site to be about the best for getting help when you do encounter any type of problem.

loki126 Send message Joined: 18 Nov 08 Posts: 14 Credit: 30,687,791 RAC: 0 Level Scientific publications	Message 9415 - Posted: 7 May 2009 \| 4:11:56 UTC
	Same here. It's the new 7000-credit WUs, IBUCH_KID_shao. Here are the failed tasks: 1 and 2. I guess they don't get along well with OC.

K1atOdessa Send message Joined: 25 Feb 08 Posts: 249 Credit: 370,320,941 RAC: 0 Level Scientific publications	Message 9416 - Posted: 7 May 2009 \| 4:11:59 UTC Last modified: 7 May 2009 \| 4:19:42 UTC
	I really think there is some issue related to "IBUCH_KID" and "KASHIF_HIVPR" WU's. I have had 4 errors today and those have also errored out for other users. My Tasks. Error tasks: KASHIF_HIVPR, IBUCH_KID, IBUCH_KID, IBUCH_KID. <edit> I've turned the clocks back to stock to see if that matters. I've had them OC'd for 8 months, but we'll see if the new WU's are more sensitive. </edit>

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9418 - Posted: 7 May 2009 \| 6:47:32 UTC - in response to Message 9416.
	[quote] I really think there is some issue related to "IBUCH_KID" and "KASHIF_HIVPR" WU's. I have had 4 errors today and those have also errored out for other users. My Tasks. Error tasks: KASHIF_HIVPR, IBUCH_KID, IBUCH_KID, IBUCH_KID. <edit> I've turned the clocks back to stock to see if that matters. I've had them OC'd for 8 months, but we'll see if the new WU's are more sensitive. </edit> [/quote] I have had errors with this series [IBUCH KID] of work units also. My cards run stock. The same cards seem to run the HIV ones OK. ____________ mike

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9419 - Posted: 7 May 2009 \| 7:32:41 UTC - in response to Message 9418.
	Another one last night: ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 84: cufftExecC2C (gridCalc2.2) There is an issue lurking somewhere with these WUs. For me it started when the new ones with the Amber facility came out; shortly after that, the failures started. I am trying one more. If that fails, I stop until this is resolved. Zy

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9420 - Posted: 7 May 2009 \| 7:50:32 UTC
	There can be bad "batches" or tasks within a batch that are just plain bad. The good news, such as it is, is that here at GPU Grid the tasks tend to die fairly quickly. I will note that they have just changed and are using some new tool and this may be part of the problem. I have seen similar issues in other projects where a change in direction can lead to significant issues with tasks failing. When Rosetta started up the effort on Mini-Rosetta, so many tasks failed that I left the project for a long time as far as major support goes. Now they have most of the bugs out and I am back again. Keep reporting the bad tasks and I am sure they will figure it out ...

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9421 - Posted: 7 May 2009 \| 8:21:49 UTC - in response to Message 9415.
	[quote] Same here. It's the new 7000-credit WUs, IBUCH_KID_shao. Here are the failed tasks: 1 and 2. I guess they don't get along well with OC. [/quote] I had a similar issue. It went away when I went back to the 182.50 drivers. You seem to be running beta drivers. ____________ BOINC blog

Snow Crash Send message Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0 Level Scientific publications	Message 9422 - Posted: 7 May 2009 \| 8:22:28 UTC - in response to Message 9420.
	I got a bunch of errors also and was wondering: if we add system specs (including driver version), would it help narrow down where the real issue is? i7-920 HT, 4 GHz on P6T Corsair Dominator 1600 2Gx3 EVGA GTX 295 (626/1496/1036) 185.81 Corsair TX750W, WD Caviar Black 1TB Cool Master HAF 932 Xigmatek Dark Knight-S1283V BOINC 6.6.20 for WCG + GPUGrid 24/7/365 Steve

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9423 - Posted: 7 May 2009 \| 8:23:54 UTC - in response to Message 9404.
	[quote] Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : unknown error. More errors ... :( [/quote] If you have beta drivers installed (your computers are hidden so I can't look) try the 182.50 drivers. ____________ BOINC blog

ignasi Send message Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0 Level Scientific publications	Message 9424 - Posted: 7 May 2009 \| 9:04:14 UTC - in response to Message 9423.
	On the new IBUCH_KID batch errors... They don't fail completely, but the error rate is apparently higher. We are stopping them for safety at the moment. thanks for your patience, ignasi

Bender10 Send message Joined: 3 Dec 07 Posts: 167 Credit: 8,368,897 RAC: 0 Level Scientific publications	Message 9426 - Posted: 7 May 2009 \| 10:27:54 UTC - in response to Message 9422.
	Yes Steve WCG, posting the specs (driver version, BOINC version, GPU, GPU overclock, OS) helps to narrow down where your issue may be. But 'un-hiding' your computers so the MODS can look at your output files also helps when you have a problem (they may ask for this sometimes). That, and enabling 'debugging' if you have a pesky problem... ____________ Consciousness: That annoying time between naps...... Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it.

Snow Crash Send message Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0 Level Scientific publications	Message 9431 - Posted: 7 May 2009 \| 13:25:14 UTC - in response to Message 9426.
	Specs including versions are in my sig. I will also try to provide more specifics when I post about errors, but it sounds like this round is semi-global so I doubt they need any more info at this time. If mods want details of my logs all they need to do is ask and I will "unhide". Interesting way to phrase that ... I prefer to think of it as "Public" or "Private", and in general I like to keep things "Private" as much as possible. ____________ Thanks - Steve

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9432 - Posted: 7 May 2009 \| 13:33:09 UTC - in response to Message 9431.
	[quote] Specs including versions are in my sig. I will also try to provide more specifics when I post about errors, but it sounds like this round is semi-global so I doubt they need any more info at this time. If mods want details of my logs all they need to do is ask and I will "unhide". Interesting way to phrase that ... I prefer to think of it as "Public" or "Private", and in general I like to keep things "Private" as much as possible. [/quote] I'll show mine if you'll show me yours :D ____________ mike

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9433 - Posted: 7 May 2009 \| 13:43:26 UTC - in response to Message 9420.
	[quote] Keep reporting the bad tasks and I am sure they will figure it out ... [/quote] Absolutely - I am totally behind them in trying to find out what's wrong; it could be at my end, I don't know. It's no good just pumping out errored ones though; there are only so many they need to track an issue. Meanwhile, by stopping for a while I can put the hardware through proper testing, just to eliminate that side of the equation. Having said all that, at present the one I started this morning is still running fine, 63% done, which, given the others that failed on mine, is illogical on the face of it. Regards Zy

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 9435 - Posted: 7 May 2009 \| 14:33:09 UTC
	My first 2 errors ever AFAIK, the 1st a 76-KASHIF_HIVPR WU and the 2nd one of the infamous 76-IBUCH_KID WUs. Two different cards, both 9600 GSO. Notice a similarity in the error messages?: <core_client_version>6.6.24</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce 9600 GSO" # Clock rate: 1674000 kilohertz # Total amount of global memory: 402325504 bytes # Number of multiprocessors: 12 # Number of cores: 96 # Amber: readparm : Reading parm file parameters # PARM file in AMBER 7 format # Encounter 10-12 H-bond term WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term. WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term. MDIO ERROR: cannot open file "restart.coor" ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 50: cufftExecC2C (gridcalc2.1) called boinc_finish </stderr_txt> ]]> <core_client_version>6.6.20</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce 9600 GSO" # Clock rate: 1458000 kilohertz # Total amount of global memory: 804978688 bytes # Number of multiprocessors: 12 # Number of cores: 96 # Amber: readparm : Reading parm file parameters # PARM file in AMBER 7 format # Encounter 10-12 H-bond term WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term. WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term. MDIO ERROR: cannot open file "restart.coor" ERROR: c:\cy </stderr_txt> ]]>
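To compare the two hosts in the output above side by side, the `#` header lines can be pulled into a dictionary. This is a rough sketch that assumes the header labels appear exactly as in that post; `parse_device_header` is a made-up helper name, not part of BOINC or the GPUGrid application.

```python
import re

# Header entries look like "# Clock rate: 1674000 kilohertz". Whitespace is
# collapsed in forum posts, so match on the labels rather than on line breaks.
FIELDS = {
    "name": r'# Device \d+: "([^"]+)"',
    "clock_khz": r"# Clock rate: (\d+) kilohertz",
    "global_mem_bytes": r"# Total amount of global memory: (\d+) bytes",
    "multiprocessors": r"# Number of multiprocessors: (\d+)",
    "cores": r"# Number of cores: (\d+)",
}


def parse_device_header(stderr_text):
    """Extract the device description fields that are present in stderr_text."""
    info = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, stderr_text)
        if match is None:
            continue  # field missing, e.g. because the output was truncated
        value = match.group(1)
        info[key] = value if key == "name" else int(value)
    return info


# Header text copied from the two results quoted above.
first = parse_device_header(
    '# Using CUDA device 0 # Device 0: "GeForce 9600 GSO" '
    "# Clock rate: 1674000 kilohertz "
    "# Total amount of global memory: 402325504 bytes "
    "# Number of multiprocessors: 12 # Number of cores: 96"
)
second = parse_device_header(
    '# Using CUDA device 0 # Device 0: "GeForce 9600 GSO" '
    "# Clock rate: 1458000 kilohertz "
    "# Total amount of global memory: 804978688 bytes "
    "# Number of multiprocessors: 12 # Number of cores: 96"
)

for host in (first, second):
    mib = host["global_mem_bytes"] / 2**20
    print(f'{host["name"]}: {host["clock_khz"] / 1000:.0f} MHz, {mib:.0f} MiB')
```

On these two results the parsed values show the same card model with different clocks and different memory sizes, which is worth keeping in mind when weighing the overclock and memory-size theories in this thread.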

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9444 - Posted: 7 May 2009 \| 18:14:11 UTC - in response to Message 9435.
	Got one through OK, then the next went bang after 30 mins. The successful one was a GIANNI: http://www.gpugrid.net/result.php?resultid=636960 The one that failed this time was a KASHIF_HIVPR: http://www.gpugrid.net/result.php?resultid=639025 ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3) With this one I was at the PC when it went. There was a system warning popup message. I didn't get it word for word, as I only saw a flash as it disappeared: "something something could not be contacted, video driver restarted". Don't hang your hat on that word for word, but essentially it looks as though the video driver lost connection and the system auto-restarted it; when it did that, instant computation error. I will ferret in the log files. I have the PC logged to death, so hopefully I can dig something up about it. Two more downloaded, a GIANNI and a KASHIF. I suspended the GIANNI and will try another KASHIF to see what happens. Regards Zy

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9446 - Posted: 7 May 2009 \| 19:22:20 UTC - in response to Message 9444. Last modified: 7 May 2009 \| 19:31:33 UTC
	The KASHIF lasted 37 mins and went bang. A GIANNI is now running. The failed KASHIF: http://www.gpugrid.net/result.php?resultid=640997 Error was: Cuda error: Kernel [fft_data_swizzle_out] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 94 : unknown error. (Not seen a "swizzle_out" error before.) Started this one, a GIANNI, and on past performance it will probably go through OK: http://www.gpugrid.net/result.php?resultid=641393 [Edit] If there is any debugging switch or log file, whatever, that I can enable at this end that will help, please let me know and I will. If you want me to run a series of suspect ones (etc.), let me know how and I will. [/Edit] Regards Zy

[boinc.at] Nowi Send message Joined: 4 Sep 08 Posts: 44 Credit: 3,685,033 RAC: 0 Level Scientific publications	Message 9451 - Posted: 7 May 2009 \| 20:59:37 UTC
	I have gotten another error on a 2-KASHIF_HIVPR WU (result). The error appeared after more than 16 hours of computation on an 8800GT. Now I have three errors in a row. In my opinion this is unacceptable!

(_KoDAk_) Send message Joined: 18 Oct 08 Posts: 43 Credit: 6,924,807 RAC: 0 Level Scientific publications	Message 9459 - Posted: 8 May 2009 \| 7:34:55 UTC Last modified: 8 May 2009 \| 7:35:47 UTC
	boinc 6.6.24 x64 By KoDAkthebest and some ERRORS ( http://www.gpugrid.net/results.php?hostid=31714

ignasi Send message Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0 Level Scientific publications	Message 9461 - Posted: 8 May 2009 \| 8:53:00 UTC - in response to Message 9459.
	We are digging into these problems. thanks, ignasi

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9462 - Posted: 8 May 2009 \| 9:12:01 UTC - in response to Message 9461. Last modified: 8 May 2009 \| 9:15:20 UTC
	Hi Ignasi, I had a look at all my computation-error ones this morning now that most have finally gone through. All the KASHIF ones, when crunched by a 9800GTX+ or below, go bang. If the wingman is a 260 or above, they go through. I am aware this is a crude deduction on my part, as I have a very limited overview of the problems; however, it does now seem pretty solid that KASHIFs don't go through on cards rated 9800GTX+ and below. If that's starting to be the case, do you still want the cards of 9800GTX+ and below to run the KASHIFs? If you do, fine; I just hate running ones that will go bang, as it only delays their crunching by cards that can do it. If you don't, I can just abort a KASHIF if I spot one coming through. Regards Zy

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9464 - Posted: 8 May 2009 \| 9:33:49 UTC - in response to Message 9462. Last modified: 8 May 2009 \| 9:53:02 UTC
	Am I right to say that all the problems are related to older cards, like the 8800, 9800 and so on? Did anyone experience repeated failures on those workunits with a 260, 275, 295 or 285? gdf

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9465 - Posted: 8 May 2009 \| 9:47:32 UTC - in response to Message 9464. Last modified: 8 May 2009 \| 9:51:01 UTC
	Additional to my post at 9444 above. Just remembered, and it's only a part of it (it's really annoying that I only got a flash of it as it went away): the error message referred to a file "nv???????". It may be a DLL reference, I can't remember. NV is probably no stunning revelation, but there it is for what it's worth. Whatever the final full name, the error message claimed it had "stopped", and the system had restarted it. Instantly I had the WU go bang. All CPU-based models for other projects I run have been unaffected by all this, whether during normal running or when the KASHIFs go bang. I seem to remember another post about a week ago where there was a suspicion voiced about the memory size possibly being too small for these, i.e. at present maybe it needs 1GB cards, and goes bang on 512MB cards? Regards Zy

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9467 - Posted: 8 May 2009 \| 10:05:25 UTC - in response to Message 9464.
	Just had another KASHIF go bang; it lasted 57 mins: http://www.gpugrid.net/result.php?resultid=643475 Error message: Cuda error: Kernel [fft_data_swizzle_out] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 94 : unknown error. swizzle_out is starting to be a common one for me. Got to go out now and meet a client, won't be back until around 4pm UTC. Regards Zy

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9468 - Posted: 8 May 2009 \| 10:41:34 UTC
	I have had random failures on all my cards[8800gt/9600gso/9800gt/gts250] except the gtx260-192/216. Some fail in a short period others linger much longer. ____________ mike

SkyeHunter Send message Joined: 7 Mar 09 Posts: 12 Credit: 1,254,285 RAC: 0 Level Scientific publications	Message 9469 - Posted: 8 May 2009 \| 11:02:27 UTC Last modified: 8 May 2009 \| 11:06:56 UTC
	Yup, similar issue here. Yesterday I got a WU that got stuck at 18% on my 8800GT. No error messages though; the BOINC manager thought the process was still running, but it remained at the same progress for at least 12 hours... I cancelled the WU manually and started another one 18 hours ago. Usually WUs tend to take a little less than 13 hours, and the current one hasn't reported yet (nor has a new WU been uploaded; I keep my queue very short...). Probably this evening I will see a similar issue.

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9470 - Posted: 8 May 2009 \| 12:09:45 UTC - in response to Message 9465.
	[quote] Additional to my post at 9444 above. Just remembered, and it's only a part of it (it's really annoying that I only got a flash of it as it went away): the error message referred to a file "nv???????". It may be a DLL reference, I can't remember. NV is probably no stunning revelation, but there it is for what it's worth. Whatever the final full name, the error message claimed it had "stopped", and the system had restarted it. Instantly I had the WU go bang. All CPU-based models for other projects I run have been unaffected by all this, whether during normal running or when the KASHIFs go bang. I seem to remember another post about a week ago where there was a suspicion voiced about the memory size possibly being too small for these, i.e. at present maybe it needs 1GB cards, and goes bang on 512MB cards? Regards Zy [/quote] My GTS250's are only 512MB and they seem to work with KASHIF WUs. I did suggest the driver version as a culprit. I was having problems last week on my GTX260's, and uninstalling the driver (a 185 variant) and going back to 182.50 seemed to cure the problems. ____________ BOINC blog

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9471 - Posted: 8 May 2009 \| 12:14:16 UTC - in response to Message 9469.
	[quote] Yup, similar issue here. Yesterday I got a WU that got stuck at 18% on my 8800GT. No error messages though; the BOINC manager thought the process was still running, but it remained at the same progress for at least 12 hours... I cancelled the WU manually and started another one 18 hours ago. Usually WUs tend to take a little less than 13 hours, and the current one hasn't reported yet (nor has a new WU been uploaded; I keep my queue very short...). Probably this evening I will see a similar issue. [/quote] Ahh the "never ending wu" bug. What version of BOINC are you running? It seems to have been fixed in 6.6.23 onwards. ____________ BOINC blog

dyeman Send message Joined: 21 Mar 09 Posts: 35 Credit: 591,434,551 RAC: 0 Level Scientific publications	Message 9472 - Posted: 8 May 2009 \| 12:24:45 UTC - in response to Message 9471.
	See this thread also. I had hanging WUs using 6.6.17, and installing 6.6.23 didn't help. Installing Nvidia driver 185.85 fixed the hanging problem, but I haven't had a WU process successfully since (though it may not be a driver issue: I am currently running a GIANNI WU, which is at 67% and looking OK).

dataman Send message Joined: 18 Sep 08 Posts: 36 Credit: 100,352,867 RAC: 0 Level Scientific publications	Message 9473 - Posted: 8 May 2009 \| 13:14:10 UTC - in response to Message 9464.
	[quote] Am I right to say that all the problems are related to older cards, like the 8800, 9800 and so on? Did anyone experience repeated failures on those workunits with a 260, 275, 295 or 285? gdf [/quote] I have 7 9800GT's and one 8800GT. All have experienced failures. I'm on 6.6.20 and 185.85. I'm shutting them down until this problem is fixed. Good Luck!

[boinc.at] Nowi Send message Joined: 4 Sep 08 Posts: 44 Credit: 3,685,033 RAC: 0 Level Scientific publications	Message 9474 - Posted: 8 May 2009 \| 14:09:50 UTC - in response to Message 9468.
	[quote] I have had random failures on all my cards[8800gt/9600gso/9800gt/gts250] except the gtx260-192/216. [/quote] All of these are GPUs below G200. Maybe this is a clue.
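The "older cards" theory is easy to check against a list of reports by grouping results by chip family. The sketch below is illustrative only: the `GPU_FAMILY` table is an assumption based on the commonly cited chips for the cards named in this thread, and the sample reports loosely mirror mike047's post rather than any real task database.

```python
from collections import defaultdict

# Assumed chip families for the cards mentioned in this thread.
GPU_FAMILY = {
    "8800GT": "G92",
    "9600GSO": "G92",
    "9800GT": "G92",
    "9800GTX+": "G92",
    "GTS250": "G92",
    "GTX260": "GT200",
    "GTX275": "GT200",
    "GTX285": "GT200",
    "GTX295": "GT200",
}


def failures_by_family(reports):
    """reports: iterable of (card, failed) pairs -> {family: [failed, total]}."""
    summary = defaultdict(lambda: [0, 0])
    for card, failed in reports:
        # Normalise "gtx 260" / "GTX260" style spellings before the lookup.
        family = GPU_FAMILY.get(card.upper().replace(" ", ""), "unknown")
        summary[family][0] += int(failed)
        summary[family][1] += 1
    return dict(summary)


# Illustrative data only: one entry per card type, True meaning "had failures".
reports = [
    ("8800gt", True),
    ("9600gso", True),
    ("9800gt", True),
    ("gts250", True),
    ("gtx260", False),
]

summary = failures_by_family(reports)
for family, (failed, total) in sorted(summary.items()):
    print(f"{family}: {failed}/{total} card types reported failures")
```

A handful of reports like this cannot separate the chip family from other factors such as driver version or overclocking, so the split is a hint rather than proof.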

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9476 - Posted: 8 May 2009 \| 14:48:13 UTC
	I hate to be a wet blanket, but my 9800GT has five (5) total successful runs on just page one of my task list, so it is NOT the card, unless it is related to memory, as this card has 1 GB VRAM ... I am using driver 182.50, so it may be THAT ... Win XP Pro, 32-bit is the other variant that may be an issue. BOINC version 6.5.0 ... The 6.6.x versions did have some scheduler problems from something in the teens at least to 6.6.22 ... 6.6.23 and later seem to have cured that issue.

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9478 - Posted: 8 May 2009 \| 15:53:18 UTC - in response to Message 9476.
	Above I mentioned a file that was "stopped" and restarted at the same moment the WU went bang. I found the error message for it. I have no idea whether it means anything to the current problem, or what it means in itself ...... however, posted for completeness as it did happen at the exact moment the WU went bang. "nvlddmkm" was what I was struggling to remember on the system error message at the time the WU went bang. The error message reads: "The description for Event ID 4101 from source Display cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer. If the event originated on another computer, the display information had to be saved with the event. The following information was included with the event: nvlddmkm " It was located in: Event Viewer/Custom Views/Administrative Events Source: display. At the time it said it was "restarted" presumably referring to nvlddmkm - whatever that is :) Regards Zy
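For context, Event ID 4101 from source Display naming nvlddmkm is, as far as I know, what Windows logs when the NVIDIA display driver stops responding and is reset, which would fit a running CUDA task failing at that same moment. Rather than hunting through Event Viewer by hand, the events can be listed with the built-in `wevtutil` tool. The sketch below only builds and runs that command; it assumes the events land in the System log, and the helper names are made up.

```python
import subprocess


def build_event_query(event_id=4101, log="System", count=5):
    """Return a wevtutil command listing the newest `count` events with this ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", log,
        f"/q:{xpath}",   # XPath filter on the event ID
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable text output
    ]


def recent_driver_resets(count=5):
    """Run the query and return its text output (Windows only)."""
    result = subprocess.run(
        build_event_query(count=count),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Show the command that would be run; call recent_driver_resets() on Windows.
command = build_event_query()
print(" ".join(command))
```

Comparing the timestamps of these events with the times the tasks errored out would show whether every failure coincides with a driver reset or only some of them.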

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9480 - Posted: 8 May 2009 \| 16:24:22 UTC - in response to Message 9473.
	[quote] [quote] Am I right to say that all the problems are related to older cards, like the 8800, 9800 and so on? Did anyone experience repeated failures on those workunits with a 260, 275, 295 or 285? gdf [/quote] I have 7 9800GT's and one 8800GT. All have experienced failures. I'm on 6.6.20 and 185.85. I'm shutting them down until this problem is fixed. Good Luck! [/quote] I'll give it one more day, maybe two, and I will do likewise. I am very surprised at the admin/developers this time. Usually there is a little more input/concern shown. Have I missed a thread from the project that explains what is happening and their concern?? ____________ mike

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9481 - Posted: 8 May 2009 \| 16:31:50 UTC Last modified: 8 May 2009 \| 16:38:38 UTC
	I have had my fair share of those also, and installed all the latest drivers: Win7, 185.85 (which includes CUDA 2.2) on this machine, and BOINC 6.6.28. To my surprise I now see in BOINC that my 9600 GT seems only to be able to do CUDA 1.0 instructions. So maybe the errors created by these workunits are related to instructions which can only be performed by the newest 2x5 models, since none of them seem to have many errors on these units. But somehow I have had fewer problems with my machine since the latest drivers were installed; it runs kind of rock solid (only BF2 and GameGuard games are an issue). BUT I'll remind you guys that everything I run is BETA, so problems can occur. That it runs almost without a problem on my machine is no guarantee it will on yours. I guess if you have a 2X5 card you will probably see a gain in processing speed if some of the CUDA 2.2 instructions can be and/or are implemented.

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9482 - Posted: 8 May 2009 \| 17:12:29 UTC - in response to Message 9481.
	Some positives for comparison as the KASHIFs are going bang with me, I've left the hardware/software setup alone so there is fair comparison. GIANNIs seem to run fine. I am 7hrs into a TONI_HIVPR, so touch wood that seems like it will go through, will finish in about 5/6 hours. I have a IBUCH_HIVPR lined up as the next to go. Regards Zy

naja002 Send message Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0 Level Scientific publications	Message 9484 - Posted: 8 May 2009 \| 19:47:18 UTC Last modified: 8 May 2009 \| 20:00:00 UTC
	I have aborted all KASHIF_HIVPR and IBUCH_KID WUs and will continue to do so. I have 5x 8800GS and 1x 8800GT; those WUs do not complete on my rigs and most of them hang. Yesterday I completed ONE WU instead of 9-11: 5K ppd instead of 50K ppd. I was on 6.6.17, with 3 rigs on 185.26 and 1 rig on 182.50. As of last night all rigs are on 6.6.28, with 3x 185.26 and 1x 185.85, which seems to have helped some. This is an "across the farm" thing for me now. Problems initially started on the dual-GPU rigs, but now it's across the board.... My rigs are not hidden. The Phunam-PC is a new setup; the initial errors are from setup, OCing, etc. I understand those. The new ones are part of this mess. Hoping it gets sorted out soon.... EDIT: I have kept 1 KASHIF_HIVPR that appears to be running OK on a single-GPU rig. However, at the first sign of trouble it's history.....

Aardvark Send message Joined: 27 Nov 08 Posts: 28 Credit: 82,362,324 RAC: 0 Level Scientific publications	Message 9488 - Posted: 8 May 2009 \| 21:46:09 UTC Last modified: 8 May 2009 \| 21:54:19 UTC
	Likewise here, failures on KASHIF_HIVPR and IBUCH_KID Two different machines. One with 32 bit Vista, 8800GT (O/C), client 6.6.20 & 185.86 driver. The other with 64 bit Vista, 9800 GX2 (Not O/C), client 6.6.20 & 182.50 driver. I have now updated both drivers to 185.85, which is latest release.

SkyeHunter Send message Joined: 7 Mar 09 Posts: 12 Credit: 1,254,285 RAC: 0 Level Scientific publications	Message 9492 - Posted: 8 May 2009 \| 22:58:30 UTC - in response to Message 9471.
	[quote] Ahh the "never ending wu" bug. What version of BOINC are you running? It seems to have been fixed in 6.6.23 onwards. [/quote] Indeed, nice description of what happened here. I installed BOINC 6.5.0 and the WU picked up nicely where it had blocked ... Although it was a KASHIF WU, it apparently was the scheduler that was to blame ....

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9493 - Posted: 8 May 2009 \| 23:31:56 UTC
	Well, I again had a unit error out after 13 hours of work, and it looks like the big gun machines run them all fine. I can't go on like this; I have lost hundreds of hours of time and money for nothing. For the time being I am also shutting down GPUGRID till this issue is solved.
	ID: 9493 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9495 - Posted: 8 May 2009 \| 23:50:11 UTC - in response to Message 9461.
	I am aware that there is hard work going on re finding the cause/fix. If it's possible for someone to take a 2-minute timeout to advise us all whether you still want the KASHIFs run by lower-end cards, I suspect it would help enormously, as we could then abort them and leave them to the big guns, knowing it's not going to cause issues in the bug-finding, while we carry on with the other WUs. At present it seems lots are shutting down from doing anything in the absence of any advice, understandably, but the other WUs seem OK. Just a gentle suggestion ... Regards Zy
	ID: 9495 \| Rating: 0 \| rate: / Reply Quote

naja002 Send message Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0 Level Scientific publications	Message 9499 - Posted: 9 May 2009 \| 4:27:01 UTC Last modified: 9 May 2009 \| 4:51:52 UTC
	The last KASHIF_HIVPR did in fact error out.....No more for me. I'm just going to have to check my rigs 1-2x/day and send them back.... I am aware that there is hard work going on re finding the cause/fix. If it's possible for someone to take a 2-minute timeout to advise us all whether you still want the KASHIFs run by lower-end cards, I suspect it would help enormously, as we could then abort them and leave them to the big guns, knowing it's not going to cause issues in the bug-finding, while we carry on with the other WUs. At present it seems lots are shutting down from doing anything in the absence of any advice, understandably, but the other WUs seem OK. Just a gentle suggestion ... Regards Zy My guess would be that they are still releasing them because they run on the higher end cards. They can still get the work completed. However, if that's the case, then I think the server needs to be set up to issue specific WUs to specific cards. The server gets plenty of info from our rigs, so I don't see why that can't be done....
	ID: 9499 \| Rating: 0 \| rate: / Reply Quote

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9504 - Posted: 9 May 2009 \| 7:23:36 UTC
	Nothing will likely be done until sometime Monday. I am also at No New Work until the problem is resolved. ____________ mike
	ID: 9504 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9506 - Posted: 9 May 2009 \| 8:34:26 UTC - in response to Message 9504.
	The real problem is that we do not understand why these WUs crash. There are several Kashif_XXX workunits and only a subset of them crashes on some machines. We will stop the crashing WUs, as more testing did not really help. gdf
	ID: 9506 \| Rating: 0 \| rate: / Reply Quote

Bymark Send message Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0 Level Scientific publications	Message 9515 - Posted: 9 May 2009 \| 10:34:36 UTC Last modified: 9 May 2009 \| 10:40:31 UTC
	I have a big problem with my new asus 260: hostid=35303 I downgraded all drivers, and now waiting to get more task. "reached daily quota of 4 results" heh ;), Any suggestion? Seti gpus working fine....... ____________ "Silakka" Hello from Turku > Åbo.
	ID: 9515 \| Rating: 0 \| rate: / Reply Quote

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9521 - Posted: 9 May 2009 \| 10:57:09 UTC Last modified: 9 May 2009 \| 11:06:44 UTC
	Sadly yes, the famous units which we are discussing all over the forum.
	ID: 9521 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9525 - Posted: 9 May 2009 \| 11:28:04 UTC - in response to Message 9515. Last modified: 9 May 2009 \| 11:30:34 UTC
	I have a big problem with my new asus 260: hostid=35303 I downgraded all drivers, and now waiting to get more task. "reached daily quota of 4 results" heh ;), Any suggestion? Seti gpus working fine....... The ones crashing on that machine are not the suspect WUs that they have now stopped issuing, those crashing on that machine usually run fine. He also has a 260 which is outside the problems, its the lower cards that did have issues in the past. Something else lurketh. No idea what personally, over to the Gurus for that. Regards Zy
	ID: 9525 \| Rating: 0 \| rate: / Reply Quote

Sandro Send message Joined: 19 Aug 08 Posts: 22 Credit: 3,660,304 RAC: 0 Level Scientific publications	Message 9528 - Posted: 9 May 2009 \| 11:59:27 UTC - in response to Message 9464.
	I am right to say that all the problems are related to older cards, like 8800,9800 and so on? Did anyone experience repeated failures on those workunits with a 260,275,295 or 285? gdf
	Yes. My GTX 260 running under 64bit Ubuntu also crashes WUs
	<core_client_version>6.4.5</core_client_version>
	<![CDATA[
	<message> process got signal 11 </message>
	<stderr_txt>
	# Using CUDA device 0
	# Device 0: "GeForce GTX 260"
	# Clock rate: 1242000 kilohertz
	# Total amount of global memory: 938803200 bytes
	# Number of multiprocessors: 27
	# Number of cores: 216
	# Amber: readparm : Reading parm file parameters
	# PARM file in AMBER 7 format
	# Encounter 10-12 H-bond term
	WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
	WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
	MDIO ERROR: cannot open file "restart.coor"
	</stderr_txt>
	]]>
	exit status: 11 (0xb)
	<core_client_version>6.4.5</core_client_version>
	<![CDATA[
	<message> process got signal 11 </message>
	<stderr_txt>
	# Using CUDA device 0
	# Device 0: "GeForce GTX 260"
	# Clock rate: 1242000 kilohertz
	# Total amount of global memory: 938803200 bytes
	# Number of multiprocessors: 27
	# Number of cores: 216
	MDIO ERROR: cannot open file "restart.coor"
	</stderr_txt>
	]]>
	ID: 9528 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9530 - Posted: 9 May 2009 \| 12:22:08 UTC Last modified: 9 May 2009 \| 12:33:49 UTC
	Let's gather some of that information:
	- all failures reported here affect G92 and G9x-class chips
	- G200 usually runs them just fine
	- there are some errors with G200 as well, but this could just be the normal error rate
	- Paul's G92 runs fine (and hopefully others) -> it's a bug which is triggered by a special client configuration
	- BOINC 6.6.x, 6.5.0 and 6.4.7 are definitely affected -> the version likely doesn't matter
	- driver 185.8x, 185.6x and 182.50 are reported to be affected, but 182.50 for XP32 works for Paul -> did anyone try older drivers? E.g. 182.08, which has a very solid track record
	- Paul's card has 1 GB of memory, whereas most G92 cards have 512 MB or less
	Do we have any other reports of G9x cards which run these tasks fine? Could anyone check the memory consumption of these WUs with RivaTuner?
	EDIT: only certain WUs of the "IBUCH_KID" and "KASHIF_HIVPR" series are affected. Do we know which ones? Are the ones which work for Paul's card, by pure coincidence, all of the type which works? For example my 9800GTX+ 512MB on Vista 64, 185.66 and 6.5.0 finished:
	88-KASHIF_HIVPR_dim_ba2-2-100-RND8763_0
	7-KASHIF_HIVPR_mon_ba5-6-100-RND3602_1
	57-KASHIF_HIVPR_mon_ba4-4-100-RND1833_1
	and failed:
	79-KASHIF_HIVPR_n1_for_ba1-4-100-RND9984_0
	175-IBUCH_KID_shao_ba1-1-100-RND4198_2
	93-IBUCH_KID_shao_ba2-0-100-RND9546_1
	MrS ____________ Scanning for our furry friends since Jan 2002
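	To help answer "do we know which ones?", the finished and failed task names from this post can be grouped by sub-series. The following is only a rough sketch: the name layout in the regular expression is an assumption read off the six names listed here, not a documented GPUGRID format, and `series_of` and `tally` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Task names exactly as listed in the post.
FINISHED = [
    "88-KASHIF_HIVPR_dim_ba2-2-100-RND8763_0",
    "7-KASHIF_HIVPR_mon_ba5-6-100-RND3602_1",
    "57-KASHIF_HIVPR_mon_ba4-4-100-RND1833_1",
]
FAILED = [
    "79-KASHIF_HIVPR_n1_for_ba1-4-100-RND9984_0",
    "175-IBUCH_KID_shao_ba1-1-100-RND4198_2",
    "93-IBUCH_KID_shao_ba2-0-100-RND9546_1",
]

# Assumed layout: <index>-<series>_ba<N>-<step>-<total>-RND<digits>_<copy>
NAME_RE = re.compile(r"^\d+-(?P<series>.+)_ba\d+-\d+-\d+-RND\d+_\d+$")


def series_of(task_name):
    """Return the sub-series part of a task name, or None if it does not match."""
    match = NAME_RE.match(task_name)
    return match.group("series") if match else None


def tally(finished, failed):
    """Count finished and failed tasks per sub-series."""
    counts = defaultdict(lambda: {"finished": 0, "failed": 0})
    for name in finished:
        counts[series_of(name)]["finished"] += 1
    for name in failed:
        counts[series_of(name)]["failed"] += 1
    return dict(counts)


if __name__ == "__main__":
    for series, count in sorted(tally(FINISHED, FAILED).items()):
        print(series, count)
```

	With only six tasks this proves nothing, but the same tally over more hosts would show whether the failures cluster in particular sub-series.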
	ID: 9530 \| Rating: 0 \| rate: / Reply Quote

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9534 - Posted: 9 May 2009 \| 13:09:50 UTC
	I am on 6.4.5 and use either 177.82 or 180.22 on Ubuntu 64. I have had many failures on all cards Except my 260's[192/216] ____________ mike
	ID: 9534 \| Rating: 0 \| rate: / Reply Quote

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9538 - Posted: 9 May 2009 \| 13:39:04 UTC - in response to Message 9530.
	Let's gather some of that information: - all failures reported here affect G92 and G9x-class chips - G200 usually runs them just fine - there are some errors with G200 as well, but this could just be the normal error rate - Pauls G92 runs fine (and hopefully others) -> it's a bug which is triggered by a special client configuration - BOINC 6.6.x, 6.5.0 and 6.4.7 are definitely affected -> the version likely doen't matter - driver 185.8x, 185.6x and 182.50 are reported to be affected, but 182.50 for XP32 works for Paul -> did anyone try older drivers? E.g. 182.08, which has a very solid track record - Pauls card has 1 GB of memory, whereas most G92 cards have 512 MB or less Do we have any other reports of G9x cards, which run these tasks fine? Could anyone check the memory consumption of these WUs with RivaTuner? EDIT: only certain WUs of the "IBUCH_KID" and "KASHIF_HIVPR" series are affected. Do we know which ones? Are the ones which work for Pauls card by pure coincidence all of the type which works? For example my 9800GTX+ 512MB on Vista 64, 185.66 and 6.5.0 finished: 88-KASHIF_HIVPR_dim_ba2-2-100-RND8763_0 7-KASHIF_HIVPR_mon_ba5-6-100-RND3602_1 57-KASHIF_HIVPR_mon_ba4-4-100-RND1833_1 and failed 79-KASHIF_HIVPR_n1_for_ba1-4-100-RND9984_0 175-IBUCH_KID_shao_ba1-1-100-RND4198_2 93-IBUCH_KID_shao_ba2-0-100-RND9546_1 MrS I have 4 machines with GTS250's (512Mb). They are running under XP32 with 182.50 drivers and seem fine. I have an i7 with dual GTX260's. It is running under XP32 with 182.50 drivers and also seems fine. I had problems a week ago with 185.xx (beta) drivers and uninstalled them before reinstalling 182.50 drivers. Problems seemed to go away after that. All machines currently running BOINC 6.6.28. I had one IBUCH_KID wu, which I aborted after seeing post from GDF regarding them being in error. KASHIF_HIVPR seem fine. ____________ BOINC blog
	ID: 9538 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9540 - Posted: 9 May 2009 \| 13:45:59 UTC - in response to Message 9538.
	Oh, so it also affects Linux. Maybe there's not much point searching for Windows and driver versions then. I had one IBUCH_KID wu, which I aborted after seeing post from GDF regarding them being in error. KASHIF_HIVPR seem fine. Some WUs of both series are affected, but not on G200 based cards (GTX 2xx). MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9540 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9545 - Posted: 9 May 2009 \| 14:15:43 UTC
	Well, I just had a crash on the i7 with 67-KASHIF_HIVPR_n1_for_ba3-2-100-RND8737; this is a task that died at least twice before. The thing is, I was playing a game at the time. Low intensity turn based strategy game. But, I cannot say if that had any effect. The game seemed to die and the graphics driver crashed. That said, the other tasks in progress seemed to stay OK ... More interesting is that there were three different errors ... Of course, the task was run on three different classes of card. And I am running BOINC 6.6.28 on that machine ... still 182.50 drivers though.
	ID: 9545 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9549 - Posted: 9 May 2009 \| 14:32:07 UTC - in response to Message 9545. Last modified: 9 May 2009 \| 14:34:33 UTC
	I have been having a closer look at my errors, and a few from others. This bears some checking, but it appears on the face of it that the crashed ones do have a common element: "signal 11". The "h-bond" message is a red herring here, as it refers to the "Amber" processes (is that right?). No matter the detail, it was cleared up in another thread as a non-issue, just a text message re the internal processes in the WU, not its validity as a successful WU. "Signal 11" does appear virtually every time in the ones I looked at. I am aware signal 11 is an issue way down in the Communication Layer, which in itself rings a bell considering the way the current problems affect some cards and not others, some operating systems and not others, but I have no idea where to take that logic further, or even if indeed it has validity; I don't have that level of knowledge. Signal 11, I am aware, can appear for many, many reasons, and it can be difficult to work out what the reason is, but if it's the case this time, at least it's the start down the right road. Regards Zy
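	Whether "signal 11" really is the common element can be checked mechanically rather than by eye. For reference, on Linux signal 11 is conventionally SIGSEGV (a segmentation fault in the application), and the "process got signal 11" output quoted earlier in this thread comes from an Ubuntu host. A minimal sketch, assuming the result page text has been saved locally and contains the same "process got signal N" wording as that quoted output; `signal_number` is a hypothetical helper name:

```python
import re

# Matches the wording seen in the result output quoted earlier in the thread.
SIGNAL_RE = re.compile(r"process got signal\s+(\d+)")


def signal_number(result_text):
    """Return the signal number reported in a result's output, or None."""
    match = SIGNAL_RE.search(result_text)
    return int(match.group(1)) if match else None


# Shortened copy of the Ubuntu output quoted earlier in the thread.
LINUX_SAMPLE = """<message> process got signal 11 </message>
<stderr_txt>
# Using CUDA device 0
MDIO ERROR: cannot open file "restart.coor"
</stderr_txt>"""

# A kernel failure message of the kind reported by other posters.
KERNEL_SAMPLE = (
    "Cuda error: Kernel [nb_k] failed in file 'nb.cu' "
    "in line 202 : unknown error."
)

if __name__ == "__main__":
    print(signal_number(LINUX_SAMPLE))
    print(signal_number(KERNEL_SAMPLE))
```

	Results that only show a "Cuda error: Kernel [...] failed" line carry no signal number, so the helper returns None for them and they would need to be counted separately.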
	ID: 9549 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9553 - Posted: 9 May 2009 \| 15:20:44 UTC
	@Zydor: I don't see "signal 11", neither in my nor in your latest results. @Paul: that's number 3 of these tasks which have failed on a G200 card. But the circumstances were slightly unusual.. not sure if it means anything. @all: ouch, 2 more errors for me: - "30-KASHIF_HIVPR_dim_ba3-4-100-RND0655_0" - seems "normal" - "p2690000-IBUCH_pYIpYVkp01_0705-2-10-RND1281_1" - not normal The second task registered only 3s cpu time, so it may have happened while the driver was still restarting. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9553 \| Rating: 0 \| rate: / Reply Quote

Bymark Send message Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0 Level Scientific publications	Message 9559 - Posted: 9 May 2009 \| 16:27:50 UTC - in response to Message 9525. Last modified: 9 May 2009 \| 16:50:40 UTC
	I have a big problem with my new asus 260: hostid=35303 I downgraded all drivers, and now waiting to get more task. "reached daily quota of 4 results" heh ;), Any suggestion? Seti gpus working fine....... The ones crashing on that machine are not the suspect WUs that they have now stopped issuing, those crashing on that machine usually run fine. He also has a 260 which is outside the problems, its the lower cards that did have issues in the past. Something else lurketh. No idea what personally, over to the Gurus for that. Regards Zy Now I have exactly the same drivers, BOINC etc. as on my fine working ati 260. Still waiting for new WUs; Seti is working fine, same power (550 W), all should be identical. Maybe it's a hardware problem, but then I don't understand why the Seti GPU tasks work without failure. Running one Seti GPU task (Seti account for the same computer):
	Hardware monitor
	-----------------------------------------------------
	AMD Athlon 64 X2 5600+ hardware monitor
	Temperature sensor 0 33°C (91°F) [0x149] (Core #0)
	Temperature sensor 1 38°C (99°F) [0x15A] (Core #1)
	Dump hardware monitor
	Hardware monitor
	-----------------------------------------------------
	GeForce GTX 260 hardware monitor
	Temperature sensor 0 71°C (159°F) [0x47] (GPU Core)
	____________ "Silakka" Hello from Turku > Åbo.
	ID: 9559 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9560 - Posted: 9 May 2009 \| 18:08:31 UTC - in response to Message 9559.
	Well, you also got >6 errors a day, but your problem is totally unrelated to what is being discussed in this thread. It might help to ask in a separate thread if you need further assistance. Do 3D Mark and/or Furmark run on your card? Seti stresses the hardware less than GPU-Grid. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9560 \| Rating: 0 \| rate: / Reply Quote

[AF] Profanateur Send message Joined: 25 Oct 08 Posts: 42 Credit: 42,812,268 RAC: 0 Level Scientific publications	Message 9562 - Posted: 9 May 2009 \| 19:03:31 UTC - in response to Message 9560.
	And for my pbs ? with driver other than 182.5.
	ID: 9562 \| Rating: 0 \| rate: / Reply Quote

Aardvark Send message Joined: 27 Nov 08 Posts: 28 Credit: 82,362,324 RAC: 0 Level Scientific publications	Message 9567 - Posted: 10 May 2009 \| 0:34:22 UTC
	Success on 52-KASHIF_HIVPR_mon_ba3-7-100-RND3244_0. 64 bit Vista, 9800 GX2 (Not O/C), client 6.6.20 & 182.85 driver.
	ID: 9567 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9578 - Posted: 10 May 2009 \| 8:17:23 UTC
	Aardvark, so far the "KASHIF_HIVPR_mon" have also been fine for my machine. Thanks for the info.. seems like these are indeed not the troublemakers. Profanateur, if I remember correctly you have a separate thread regarding your problem elsewhere. And since on your machine all WUs error, you are facing a different problem than what is discussed here. I think I wrote some suggestions in that other thread.. well, I hope. At least I wanted to write something ;) What do you mean by pbs? MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9578 \| Rating: 0 \| rate: / Reply Quote

[AF] Profanateur Send message Joined: 25 Oct 08 Posts: 42 Credit: 42,812,268 RAC: 0 Level Scientific publications	Message 9581 - Posted: 10 May 2009 \| 8:49:47 UTC
	pbs =problems=failure. sorry but I'm french.
	ID: 9581 \| Rating: 0 \| rate: / Reply Quote

boincwoman Send message Joined: 9 May 09 Posts: 1 Credit: 2,096,817 RAC: 0 Level Scientific publications	Message 9585 - Posted: 10 May 2009 \| 11:31:08 UTC
	I'm new here. I have errors with these:
	75-IBUCH_HIVPR_mon_ba8-4-100-RND5234 id: 451357
	100-KASHIF_HIVPR_n1_for_ba4-4-100-RND3172 id: 448737
	Shuttle XPC, Vista Enterprise 64 bit, 2 GB RAM, AMD Opteron 2.4 GHz model 180, GeForce 9400GT 1 GB RAM (newly bought), BOINC 6.6.20. ComputerID: 35365
	The Boincwoman
	ID: 9585 \| Rating: 0 \| rate: / Reply Quote

refla Send message Joined: 12 Feb 09 Posts: 9 Credit: 385,357 RAC: 0 Level Scientific publications	Message 9586 - Posted: 10 May 2009 \| 12:12:50 UTC - in response to Message 9530. Last modified: 10 May 2009 \| 12:15:06 UTC
	xp/32 + 9600GT@181.20 + BOINC6.4.5 cannot survive!
	ID: 9586 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9588 - Posted: 10 May 2009 \| 12:25:02 UTC - in response to Message 9586.
	Refla, not sure what you mean. You only have successful WUs and others which are listed as "aborted by user". Sure, they can't survive if you abort them ;) Boincwoman, your machine has not completed any WUs so far. So I'm not sure if we can attribute your failure of the "IBUCH_HIVPR" to the error discussed here. If your card is passively cooled it may be overheating (check with GPU-Z and report temperatures). Otherwise your setup should be fine. However, the card is very slow: it has 16 shaders ("stream processors"), whereas at least 50 are officially recommended (FAQ). You'll have problems meeting the GPU-Grid deadlines and you may want to take a look at Seti for your GPU. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9588 \| Rating: 0 \| rate: / Reply Quote

[AF] Profanateur Send message Joined: 25 Oct 08 Posts: 42 Credit: 42,812,268 RAC: 0 Level Scientific publications	Message 9605 - Posted: 10 May 2009 \| 20:18:45 UTC
	Errors today:
	10/05/2009 10:53:19 GPUGRID Output file p1760000-IBUCH_pYIpYVkp01_0705-4-10-RND5135_0_1 for task p1760000-IBUCH_pYIpYVkp01_0705-4-10-RND5135_0 absent
	10/05/2009 16:56:28 GPUGRID Output file p2750000-IBUCH_pYIpYVkp01_0705-4-10-RND5064_1_1 for task p2750000-IBUCH_pYIpYVkp01_0705-4-10-RND5064_1 absent
	ID: 9605 \| Rating: 0 \| rate: / Reply Quote

refla Send message Joined: 12 Feb 09 Posts: 9 Credit: 385,357 RAC: 0 Level Scientific publications	Message 9608 - Posted: 10 May 2009 \| 21:11:16 UTC - in response to Message 9588.
	ETA: I aborted them because the WUs' progress had not advanced in a long time (at least more than 1 hour). The situation did not change even after I rebooted my computer. After 2 such WUs, I reckon that if the last number in the task name is more than zero, it is probably a bad WU. Details in http://www.gpugrid.net/forum_thread.php?id=1041 My English is not good enough, I hope you can understand what I mean. :)
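	The "last number in the task name" rule of thumb can be written down as a tiny check. This is a sketch under an assumption: that the trailing `_N` is the copy index BOINC appends to the workunit name, so N > 0 means earlier copies had already been sent out. A resend is only a hint, not proof of a bad WU, since an earlier copy may simply have been aborted or timed out; for instance 7-KASHIF_HIVPR_mon_ba5-6-100-RND3602_1 is reported earlier in this thread as finished. `copy_index` and `is_resend` are hypothetical helper names.

```python
import re

# Trailing "_<digits>" at the very end of the task name.
SUFFIX_RE = re.compile(r"_(\d+)$")


def copy_index(task_name):
    """Return the trailing copy index of a task name, or None if absent."""
    match = SUFFIX_RE.search(task_name)
    return int(match.group(1)) if match else None


def is_resend(task_name):
    """True if the name carries a copy index greater than zero."""
    index = copy_index(task_name)
    return index is not None and index > 0


if __name__ == "__main__":
    for name in (
        "159-IBUCH_KID_shao_ba1-0-100-RND5509_1",
        "88-KASHIF_HIVPR_dim_ba2-2-100-RND8763_0",
    ):
        print(name, copy_index(name), is_resend(name))
```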
	ID: 9608 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9613 - Posted: 10 May 2009 \| 21:48:37 UTC
	Profanateur, your problem is not related to what is being discussed here. Very many of your WUs error, this is different from the "KASHIF_HIVPR" and "IBUCH_KID" issue. You actually completed some, so your software should be fine. However, you are running a very new driver and two overclocked cards, which are very different. All of these or their combination could lead to problems. I suggest you start a new thread (instead of posting a little in different threads), write down your current config (software versions, clocks, GPU temperatures) and then change some parameters, document the changes and see if it helps. By that I mean
	- run only 1 of the cards to see if one is broken
	- reduce all clocks to standard values
	- run other stability tests
	- try well-tested drivers like 182.50 or 182.08
	- maybe more
	If you do that we (or you yourself ;) should be able to get you going. Regards, MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9613 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9615 - Posted: 10 May 2009 \| 21:57:31 UTC - in response to Message 9608.
	refla, that's strange. You're running 6.4.5, so you shouldn't be affected by the slow-6.6.20-bug. Also most of your canceled WUs may belong to the critical "KASHIF_HIVPR" and "IBUCH_KID" series, but some were also "IBUCH_pYIpYVkp01", which have not been reported to fail massively. Furthermore your WUs are crunched just fine on G200-based cards, whereas no G9x returned any of them. Sorry, don't know what this means.. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9615 \| Rating: 0 \| rate: / Reply Quote

refla Send message Joined: 12 Feb 09 Posts: 9 Credit: 385,357 RAC: 0 Level Scientific publications	Message 9627 - Posted: 11 May 2009 \| 3:55:09 UTC - in response to Message 9615.
	ETA, please tell me how to avoid/recover the case that WU's progress freezes. You can see not only me who met this case. Before I abandoned them, other GPUGriders have done the same operation.
	ID: 9627 \| Rating: 0 \| rate: / Reply Quote

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9630 - Posted: 11 May 2009 \| 10:12:05 UTC - in response to Message 9627.
	ETA, please tell me how to avoid/recover the case that WU's progress freezes. You can see not only me who met this case. Before I abandoned them, other GPUGriders have done the same operation. @refla: I would suggest you switch to BOINC 6.6.23. Your driver version is not shown, but as ETA has said above I would suggest 182.50 drivers as they seem to be reliable. ____________ BOINC blog
	ID: 9630 \| Rating: 0 \| rate: / Reply Quote

palmss Send message Joined: 28 Aug 08 Posts: 7 Credit: 60,897,550 RAC: 0 Level Scientific publications	Message 9631 - Posted: 11 May 2009 \| 10:41:28 UTC
	Hi I have another error(Kernel [nb_k] failed in file 'nb.cu' in line 202 : unknown error) on a new type of WU http://www.gpugrid.net/result.php?resultid=645509
	ID: 9631 \| Rating: 0 \| rate: / Reply Quote

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9632 - Posted: 11 May 2009 \| 11:08:39 UTC - in response to Message 9631.
	Hi I have another error(Kernel [nb_k] failed in file 'nb.cu' in line 202 : unknown error) on a new type of WU http://www.gpugrid.net/result.php?resultid=645509 What driver version are you using? ____________ BOINC blog
	ID: 9632 \| Rating: 0 \| rate: / Reply Quote

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9633 - Posted: 11 May 2009 \| 11:23:09 UTC Last modified: 11 May 2009 \| 11:23:33 UTC
	Have the "EVIL" work units been disabled or deleted? I have stopped work on 8[250's and below] of my cards. The two 260s are doing OK. ____________ mike
	ID: 9633 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9635 - Posted: 11 May 2009 \| 12:25:39 UTC - in response to Message 9633. Last modified: 11 May 2009 \| 12:27:45 UTC
	Yes they stopped issuing the suspect ones on Saturday, its not all KASHIF's that are suspect, there are several types of KASHIF WUs, it was only one particular type of KASHIF WU that was giving grief. See http://www.gpugrid.net/forum_thread.php?id=1034&nowrap=true#9506 Regards Zy
	ID: 9635 \| Rating: 0 \| rate: / Reply Quote

[AF] Profanateur Send message Joined: 25 Oct 08 Posts: 42 Credit: 42,812,268 RAC: 0 Level Scientific publications	Message 9644 - Posted: 11 May 2009 \| 17:20:39 UTC - in response to Message 9613. Last modified: 11 May 2009 \| 17:21:29 UTC
	Profanateur, your problem is not related to what is being discussed here. Very many of your WUs error, this is different from the "KASHIF_HIVPR" and "IBUCH_KID" issue. You actually completed some, so your software should be fine. However, you are running a very new driver and two overclocked cards, which are very different. All of these or their combination could lead to problems. I suggest you start a new thread (instead of posting a little in different threads), write down your current config (software versions, clocks, GPU temperatures) and then change some parameters, document the changes and see if it helps. By that I mean - run only 1 of the cards to see if one is broken - reduce all clocks to standard values - run other stability tests - try well-tested drivers like 182.50 or 182.08 - maybe more If you do that we (or you yourself ;) should be able to get you going. Regards, MrS I have no errors with 182.50. I said that from beginning.
	ID: 9644 \| Rating: 0 \| rate: / Reply Quote

Bymark Send message Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0 Level Scientific publications	Message 9645 - Posted: 11 May 2009 \| 18:11:52 UTC - in response to Message 9560. Last modified: 11 May 2009 \| 18:56:22 UTC
	My solution on the 260 was Boinc 6.4.7 and driver 178.28, now working as a train. Slow but getting faster, like a first mosquito this summer today in Turku Finland. Thomas Bymark ____________ "Silakka" Hello from Turku > Åbo.
	ID: 9645 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9649 - Posted: 11 May 2009 \| 20:48:02 UTC - in response to Message 9644.
	Profanateur wrote: I have no errors with 182.50. I said that from beginning. Actually you said "And for my pbs ? with driver other than 182.5." Which I understand as "I'm not interested in my problems with 182.50, only in the problems with other drivers". Well, no. Actually when I read that post I thought something like "Isn't that the guy with many errors and the exotic setup? What does he want to say?" Now that I know I understand you. So if you know 182.50 works, why don't you use it? MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9649 \| Rating: 0 \| rate: / Reply Quote

[AF] Profanateur Send message Joined: 25 Oct 08 Posts: 42 Credit: 42,812,268 RAC: 0 Level Scientific publications	Message 9653 - Posted: 11 May 2009 \| 21:11:30 UTC
	'cause I want the latest release to have ambient occlusion in games.
	ID: 9653 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9654 - Posted: 11 May 2009 \| 21:15:55 UTC - in response to Message 9653.
	Then you'll be glad to hear about this ;) MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9654 \| Rating: 0 \| rate: / Reply Quote

Andrew Send message Joined: 9 Dec 08 Posts: 29 Credit: 18,754,468 RAC: 0 Level Scientific publications	Message 9655 - Posted: 11 May 2009 \| 21:43:53 UTC Last modified: 11 May 2009 \| 21:47:27 UTC
	I had 2 fail on my 8800GT, one on 5th May, and one right now. My screen actually went black for a few seconds and I briefly saw windows error reporting in process explorer! Driver version 182.50 I believe. Card was stock clocks at the time (fine). 5th May one was 159-IBUCH_KID_shao_ba1-0-100-RND5509_1: and the one just now was 53-KASHIF_HIVPR_n1_for_ba1-2-100-RND0722_1: which had the swizzle error others have described.
	ID: 9655 \| Rating: 0 \| rate: / Reply Quote

palmss Send message Joined: 28 Aug 08 Posts: 7 Credit: 60,897,550 RAC: 0 Level Scientific publications	Message 9658 - Posted: 11 May 2009 \| 22:46:02 UTC - in response to Message 9632.
	Hi I have another error(Kernel [nb_k] failed in file 'nb.cu' in line 202 : unknown error) on a new type of WU http://www.gpugrid.net/result.php?resultid=645509 What driver version are you using? I have the version 181.22 driver
	ID: 9658 \| Rating: 0 \| rate: / Reply Quote

refla Send message Joined: 12 Feb 09 Posts: 9 Credit: 385,357 RAC: 0 Level Scientific publications	Message 9663 - Posted: 12 May 2009 \| 4:37:40 UTC - in response to Message 9630.
	ETA, please tell me how to avoid/recover the case that WU's progress freezes. You can see not only me who met this case. Before I abandoned them, other GPUGriders have done the same operation. @refla: I would suggest you switch to BOINC 6.6.23. Your driver version is not shown, but as ETA has said above I would suggest 182.50 drivers as they seem to be reliable. MarkJ: Thanks, I will test it. :)
	ID: 9663 \| Rating: 0 \| rate: / Reply Quote

naja002 Send message Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0 Level Scientific publications	Message 9697 - Posted: 13 May 2009 \| 1:32:41 UTC - in response to Message 9649. Last modified: 13 May 2009 \| 1:40:24 UTC
	Well, no. Actually when I read that post I thought something like "Isn't that the guy with many errors and the exotic setup? What does he want to say?" MrS That may be me. If so, the many errors are from 2 sources: my fault and not my fault ;) Some of these WUs are a nightmare and I don't accept responsibility for that. However, I have had an issue or 3 on my end...those things I understand and accept responsibility for...;) The i7 upgrade produced a lot of initial errors, because of driver compatibility. I've produced 1 successful WU after another for long periods of time. When I start to develop issues--I try to sort it out and get it straight, but when the issues are really not on my end...there's not much that I can do except ride it out. But I can say that I've used the 185.26 driver on 3 rigs (initially 4) for a month before all these issues arose. So, the issue is with the WUs being incompatible with the driver v. the Driver being incompatible with the WUs. In other words, any incompatibility change is in the WUs....not the driver. I cannot speak for any other version of 185.xx though... Also, IBUCH_pYIpYVkp01_0705-4-10 seems to be another WU with issues, but I think that is already known.... HTH
	ID: 9697 \| Rating: 0 \| rate: / Reply Quote

ignasi Send message Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0 Level Scientific publications	Message 9745 - Posted: 14 May 2009 \| 9:50:14 UTC - in response to Message 9697.
	Also, IBUCH_pYIpYVkp01_0705-4-10 seems to be another WU with issues, but I think that is already known.... HTH This runs fine actually. Are you referring to any error in particular? ignasi
	ID: 9745 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9748 - Posted: 14 May 2009 \| 11:23:34 UTC Last modified: 14 May 2009 \| 11:24:13 UTC
	Just had a KASHIF go bang. It appears to be the old hassles on the face of it - just highlighting it for the record due to recent hassles with some KASHIFs. http://www.gpugrid.net/result.php?resultid=667592 It had been running for 11hrs15, so it was different from the others I had go - they were early, this was late in processing, almost finished when it went. "One of those things" I suspect. The network connection was down at the time, a major BT Network fault that had been extant for nearly 24 hrs. The latter should have had no effect, just mentioned for completeness as it was down when the WU went bang. Regards Zy
	ID: 9748 \| Rating: 0 \| rate: / Reply Quote

naja002 Send message Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0 Level Scientific publications	Message 9750 - Posted: 14 May 2009 \| 12:53:20 UTC - in response to Message 9745.
	Also, IBUCH_pYIpYVkp01_0705-4-10 seems to be another WU with issues, but I think that is already known.... HTH This runs fine actually. Are you referring to any error in particular? ignasi p3400000-IBUCH_pYIpYVkp01_0705-4-10-RND9113 p1390000-IBUCH_pYIpYVkp01_0705-3-10-RND2928 p2200000-IBUCH_pYIpYVk52804-9-10-RND5157 I'm not sure what the Quadro cards are equivalent to....8 series, 9, 200.....
	ID: 9750 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9754 - Posted: 14 May 2009 \| 14:23:29 UTC - in response to Message 9750.
	Please look at the driver thread. gdf
	ID: 9754 \| Rating: 0 \| rate: / Reply Quote
