Recent problems for WUs on older GPUs


GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9642 - Posted: 11 May 2009 \| 17:05:36 UTC
	We are having problems with several workunits on GPUs that are NOT 260/275/285/295. Because we test on newer cards, we had not spotted the problem before. The problem appears only for workunits using the Amber format (all the KASHIF ones). We have now removed all that we could, but left some KASHIF workunits out because they run fine on newer cards. We are testing KASHIF_HIV_* on two 8800 cards under Windows and Linux; they are running fine so far. We will keep you updated. gdf

Blackbird74 Send message Joined: 20 Nov 08 Posts: 3 Credit: 362,118 RAC: 0 Level Scientific publications	Message 9670 - Posted: 12 May 2009 \| 12:38:36 UTC - in response to Message 9642. Last modified: 12 May 2009 \| 12:39:03 UTC
	I had a bunch of compute errors on my 8800GT, but then the latest KASHIF_HIVPR completed OK over a couple of days. Full task list: http://www.gpugrid.net/results.php?userid=9833 Latest KASHIF_HIVPR WU completed fine: http://www.gpugrid.net/workunit.php?wuid=449234 Comp specs: http://www.gpugrid.net/show_host_detail.php?hostid=17613 Doesn't seem much rhyme nor reason to the fails other than the recent probs with WUs in general (blackout).

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9673 - Posted: 12 May 2009 \| 13:17:43 UTC - in response to Message 9670. Last modified: 12 May 2009 \| 13:36:30 UTC
	Today the error rate for Kashif wus is lower, so it could have been a problem with drivers. In the next few days we will perform a server update and application updates to use CUDA2.2. gdf

TomaszPawel Send message Joined: 18 Aug 08 Posts: 121 Credit: 59,836,411 RAC: 0 Level Scientific publications	Message 9678 - Posted: 12 May 2009 \| 19:46:57 UTC - in response to Message 9673.
	2009-05-12 15:08:32 GPUGRID Starting task 4-KASHIF_HIVPRFE_dim_ba1-2-4-RND6858_0 using acemd version 664. Hmmm, it is now at 39.8% after 6:40 h, and it says that 10 h remain.... Is that normal on a GTX260 with 182.08, 6.6.20 and XP 32? ____________ POLISH NATIONAL TEAM - Join! Crunch! Win!
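	The "10 h remaining" figure in the post above is about what a plain linear extrapolation gives. Here is a minimal sketch of that arithmetic, assuming progress is linear in time; BOINC's own estimator may work differently, and the function name is made up for illustration.

```python
def estimate_remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: total = elapsed / fraction, remaining = total - elapsed."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


# Figures quoted in the post: 39.8% done after 6 h 40 min.
elapsed = 6 + 40 / 60
remaining = estimate_remaining_hours(elapsed, 0.398)
total = elapsed + remaining
print(f"estimated total: {total:.1f} h, remaining: {remaining:.1f} h")
```

	With those inputs the extrapolated total is close to 17 hours, with about 10 still to go, so the client's estimate was at least consistent with the slow progress rather than being a display glitch.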

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 9682 - Posted: 12 May 2009 \| 20:36:42 UTC - in response to Message 9673. Last modified: 12 May 2009 \| 20:39:12 UTC
	"Today the error rate for Kashif wus is lower, so it could have been a problem with drivers." Or people avoiding them like the plague. People on our team have been reporting stuck and failed WUs like never before. "In the next few days we will perform a server update and application updates to use CUDA2.2." Will we still be able to use our older non-CUDA2.2 cards?

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9687 - Posted: 12 May 2009 \| 21:34:54 UTC - in response to Message 9682.
	"Will we still be able to use our older non-CUDA2.2 cards?" That's just the software version and depends on the driver. There's also the CUDA hardware capability, which is the critical one. This one should stay as it was before (minimum of 1.1 required). Tomasz, your GTX 260 is not exactly an older card (as stated in the first post of this thread). MrS ____________ Scanning for our furry friends since Jan 2002
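	To make the distinction concrete: the CUDA toolkit version (2.2) is a software property tied to the driver, while the compute capability is fixed by the chip. Below is a small sketch of a capability check against the 1.1 minimum mentioned above. The card-to-capability table is illustrative only, filled in from NVIDIA's published listings as far as I know them, and the names are hypothetical; check NVIDIA's own table for your exact card.

```python
# Minimum hardware compute capability stated in the thread.
MIN_COMPUTE_CAPABILITY = (1, 1)

# Illustrative (major, minor) values only; verify against NVIDIA's listings.
EXAMPLE_CARDS = {
    "GeForce 8800 GTX": (1, 0),
    "GeForce 8800 GT": (1, 1),
    "GeForce GTX 260": (1, 3),
}


def meets_minimum(capability, minimum=MIN_COMPUTE_CAPABILITY):
    """Compare (major, minor) tuples; Python orders tuples element by element."""
    return tuple(capability) >= tuple(minimum)


for card, capability in EXAMPLE_CARDS.items():
    status = "ok" if meets_minimum(capability) else "below minimum"
    print(f"{card}: {capability[0]}.{capability[1]} -> {status}")
```

	Comparing tuples rather than floats avoids surprises such as a hypothetical capability 1.10 comparing as smaller than 1.9.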

TomaszPawel Send message Joined: 18 Aug 08 Posts: 121 Credit: 59,836,411 RAC: 0 Level Scientific publications	Message 9689 - Posted: 12 May 2009 \| 22:09:10 UTC - in response to Message 9687.
	It is as clear as crystal ... But it usually crunches a WU in 7-8 h, not 17!!! And this type of WU is mentioned in this thread, so maybe it is relevant? ____________ POLISH NATIONAL TEAM - Join! Crunch! Win!

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9691 - Posted: 12 May 2009 \| 22:23:59 UTC - in response to Message 9689.
	Alright.. could be the usual 6.6.20 bug. MrS ____________ Scanning for our furry friends since Jan 2002

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9693 - Posted: 12 May 2009 \| 23:21:21 UTC - in response to Message 9691.
	"Alright.. could be the usual 6.6.20 bug. MrS" Sadly I may have seen it on a task processed by 6.6.23. That means the real problem has not been addressed; the changes in 6.6.23 and later make it better, but have not cured it.

The Brain QC Send message Joined: 27 Oct 08 Posts: 27 Credit: 3,211,916 RAC: 0 Level Scientific publications	Message 9699 - Posted: 13 May 2009 \| 8:58:22 UTC - in response to Message 9693.
	I have 5-KASHIF_HIVPR_dim_ba1-4-100-RND6112_0 using acemd version 664, running for 21 hours on a 9800GX2 and 68% done. I have never had such a long WU on GPUGRID; usually I complete 3/4 WUs in 21 hours. Hope the credit will be as great as the time it takes to compute ;).

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9701 - Posted: 13 May 2009 \| 9:47:29 UTC - in response to Message 9673. Last modified: 13 May 2009 \| 9:58:36 UTC
	"In the next few days we will perform a server update and application updates to use CUDA2.2. gdf" So do we need to upgrade to 185.85 drivers and the CUDA 2.2 DLLs? Or will the app work out which CUDA version is present and only use the instruction set that is supported? Will GPUgrid download the CUDA 2.2 DLLs, or will we need to put them somewhere (like the projects\gpugrid folder) when the new app is released? Oh, and seeing as you are changing the app, is there a chance you could report the driver version and the CUDA version in the WU info? It might help with the debugging.
	<core_client_version>6.6.28</core_client_version>
	<![CDATA[
	<stderr_txt>
	# Using CUDA device 0
	# Device 0: "GeForce GTS 250"
	# Clock rate: 1836000 kilohertz
	# Total amount of global memory: 536543232 bytes
	# Number of multiprocessors: 16
	# Number of cores: 128
	# Amber: readparm : Reading parm file parameters
	# PARM file in AMBER 7 format
	# Encounter 10-12 H-bond term
	WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
	WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
	MDIO ERROR: cannot open file "restart.coor"
	# Time per step: 46.163 ms
	# Approximate elapsed time for entire WU: 46163.094 s
	called boinc_finish
	</stderr_txt>
	]]>
	____________ BOINC blog
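	For anyone comparing results across hosts, the stderr text above is regular enough to pick apart with a few regular expressions. This is only a sketch: the sample is an abridged copy of the log in the post, the field names are my own, and other acemd versions may print different lines.

```python
import re

# Abridged copy of the stderr text quoted in the post above.
STDERR_SAMPLE = """\
# Using CUDA device 0
# Device 0: "GeForce GTS 250"
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
# Time per step: 46.163 ms
# Approximate elapsed time for entire WU: 46163.094 s
"""

# Hypothetical field names mapped to patterns for the lines shown in the log.
PATTERNS = {
    "device": r'# Device \d+: "([^"]+)"',
    "ms_per_step": r"# Time per step: ([\d.]+) ms",
    "total_seconds": r"# Approximate elapsed time for entire WU: ([\d.]+) s",
}


def parse_stderr(text):
    """Return the fields found in the text; missing lines are simply skipped."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    return fields


info = parse_stderr(STDERR_SAMPLE)
total_seconds = float(info["total_seconds"])
steps = total_seconds / (float(info["ms_per_step"]) / 1000.0)
hours = total_seconds / 3600.0
print(f'{info["device"]}: about {steps:,.0f} steps, {hours:.1f} h for the whole WU')
```

	Dividing the two timing lines suggests the workunit is roughly a million steps, so at 46 ms per step this GTS 250 would need a little under 13 hours for it.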

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9707 - Posted: 13 May 2009 \| 12:12:15 UTC Last modified: 13 May 2009 \| 12:18:19 UTC
	Well, I downloaded, as always, both the driver and the CUDA toolkit from the NVIDIA site. After the initial pause on GPUGRID I can report that I have not had a failing unit for a few days now. Not sure if anyone else downloads the CUDA toolkit or just the driver. I am almost done with the test unit, which finishes in about half an hour or so. I hope the newly received IBUCH ones will also finish without issues. If they all finish without error I will start to get the feeling the problems are solved .... I hope :D

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 9711 - Posted: 13 May 2009 \| 13:42:36 UTC - in response to Message 9673. Last modified: 13 May 2009 \| 14:07:26 UTC
	"Today the error rate for Kashif wus is lower, so it could have been a problem with drivers. In the next few days we will perform a server update and application updates to use CUDA2.2. gdf" Just finished looking at a LOT of KASHIF_HIVPR WUs. The situation is not improving at all and is not a driver issue. What happens is these WUs are downloaded and either fail or are aborted repeatedly until they happen to be assigned to a GTX 260 or above, then they complete. The problem is not fixed and is not improving. IMO it needs to be dealt with ASAP. Here's just a few examples: http://www.gpugrid.net/workunit.php?wuid=440561 http://www.gpugrid.net/workunit.php?wuid=442250 http://www.gpugrid.net/workunit.php?wuid=454479 http://www.gpugrid.net/workunit.php?wuid=449101 http://www.gpugrid.net/workunit.php?wuid=457871 http://www.gpugrid.net/workunit.php?wuid=458509

TomaszPawel Send message Joined: 18 Aug 08 Posts: 121 Credit: 59,836,411 RAC: 0 Level Scientific publications	Message 9713 - Posted: 13 May 2009 \| 14:40:23 UTC - in response to Message 9711. Last modified: 13 May 2009 \| 14:41:25 UTC
	"2009-05-12 15:08:32 GPUGRID Starting task 4-KASHIF_HIVPRFE_dim_ba1-2-4-RND6858_0 using acemd version 664 hmmm it is now 39.8% after 6:40H, and it's says that it remains 10H.... Is it normal on GTX260 and 182.08 and 6.6.20 and XP 32 ?" Well, now it has crunched that WU for 18 h and it is at 83%; it says 3 h 30 min remaining... ____________ POLISH NATIONAL TEAM - Join! Crunch! Win!

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9714 - Posted: 13 May 2009 \| 14:52:02 UTC - in response to Message 9713.
	CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version. gdf

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9715 - Posted: 13 May 2009 \| 15:01:22 UTC - in response to Message 9714.
	"CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version. gdf" Are you saying that without 185 version drivers we will not be able to successfully do GPU Grid work? I have card/box combinations that will not accept that version and run properly. If 185 version driver and above is "required" to crunch here, I will be taking my farm to FAH. ____________ mike

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9716 - Posted: 13 May 2009 \| 15:06:05 UTC
	The test unit also ended without problems, and new IBUCHs are on the way. I haven't had any cancelled other than one that sat in the queue for almost 2 days, so nothing special there. I am still running 185.85 and BOINC 6.6.28. Except for the usual BOINC issues like work fetch, it runs stable for me; my slow 9600 GT seems to do well. But I had to lower the clock on my CPU since I had to disable my watercooling; the 9850 BE is a hothead, because even with the huge cooler on it reaches 55 C. But it is extremely warm here today; I measured 31 C ambient temperature in the room.

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 9717 - Posted: 13 May 2009 \| 15:31:00 UTC - in response to Message 9716.
	"The test unit ended also without problem and new ibuchs on the way. I haven't had any cancelled other than one being in queue for almost 2 days so nothing special on that. I am still running the 185.85 and boinc 6.6.28 Except the usual problems with boinc issues like fetch and such it runs stable for me, my slow 9600 Gt seems to do well. But i had to lower my clock on my cpu since i had to disable my watercooling, the 9850 BE is a hothead because with the huge cooler on it becomes 55 C. But its today extremely warm here i measured 31 C in the room ambient temp." Your computers are hidden so how can we verify?

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 9719 - Posted: 13 May 2009 \| 17:06:28 UTC - in response to Message 9711. Last modified: 13 May 2009 \| 17:09:56 UTC
	GDF: "Today the error rate for Kashif wus is lower, so it could have been a problem with drivers. In the next few days we will perform a server update and application updates to use CUDA2.2. gdf" Beyond: "Just finished looking at a LOT of KASHIF_HIVPR WUs. The situation is not improving at all and is not a driver issue. What happens is these WUs are downloaded and either fail or are aborted repeatedly until they happen to be assigned to a GTX 260 or above, then they complete. The problem is not fixed and is not improving. IMO it needs to be dealt with ASAP. Here's just a few examples: http://www.gpugrid.net/workunit.php?wuid=440561 http://www.gpugrid.net/workunit.php?wuid=442250 http://www.gpugrid.net/workunit.php?wuid=454479 http://www.gpugrid.net/workunit.php?wuid=449101 http://www.gpugrid.net/workunit.php?wuid=457871 http://www.gpugrid.net/workunit.php?wuid=458509" Here's a new KASHIF_HIVPR that was just downloaded to me (and I aborted). Notice that it just caused an error on a GTX 260 (after running a long time, I might add). http://www.gpugrid.net/workunit.php?wuid=459189 That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR. Take a look for yourself: http://www.gpugrid.net/results.php?hostid=32169 It sure looks like the KASHIF_HIVPR problem also bites the faster cards, just not as often. Our team members have also been reporting the same problem on the GTX 260 and above. So it's documented. Any chance of getting this fixed?

TomaszPawel Send message Joined: 18 Aug 08 Posts: 121 Credit: 59,836,411 RAC: 0 Level Scientific publications	Message 9720 - Posted: 13 May 2009 \| 18:14:44 UTC - in response to Message 9713. Last modified: 13 May 2009 \| 18:16:05 UTC
	"2009-05-12 15:08:32 GPUGRID Starting task 4-KASHIF_HIVPRFE_dim_ba1-2-4-RND6858_0 using acemd version 664 hmmm it is now 39.8% after 6:40H, and it's says that it remains 10H.... Is it normal on GTX260 and 182.08 and 6.6.20 and XP 32 ?" "whell, now it crunch that WU 18H and it is 83%, it says 3h30min remaining..." lol, after 24 h of crunching - 3600 points... ____________ POLISH NATIONAL TEAM - Join! Crunch! Win!

Bymark Send message Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0 Level Scientific publications	Message 9721 - Posted: 13 May 2009 \| 18:20:00 UTC - in response to Message 9719.
	Yep, the best combination for a 260 is BOINC 6.4.7, driver 178.28 and CUDA 2. Working fine......... GDF: "Today the error rate for Kashif wus is lower, so it could have been a problem with drivers. In the next few days we will perform a server update and application updates to use CUDA2.2. gdf" Beyond: "Just finished looking at a LOT of KASHIF_HIVPR WUs. The situation is not improving at all and is not a driver issue. What happens is these WUs are downloaded and either fail or are aborted repeatedly until they happen to be assigned to a GTX 260 or above, then they complete. The problem is not fixed and is not improving. IMO it needs to be dealt with ASAP. Here's just a few examples: http://www.gpugrid.net/workunit.php?wuid=440561 http://www.gpugrid.net/workunit.php?wuid=442250 http://www.gpugrid.net/workunit.php?wuid=454479 http://www.gpugrid.net/workunit.php?wuid=449101 http://www.gpugrid.net/workunit.php?wuid=457871 http://www.gpugrid.net/workunit.php?wuid=458509" Beyond: "Here's a new KASHIF_HIVPR that was just downloaded to me (and I aborted). Notice that it just caused an error on a GTX 260 (after running a long time I might add). http://www.gpugrid.net/workunit.php?wuid=459189 That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR. Take a look for yourself: http://www.gpugrid.net/results.php?hostid=32169 It sure looks like the KASHIF_HIVPR problem also bites the faster cards, just not as often. Our team members have also been reporting the same problem on the GTX 260 and above. So it's documented. Any chance of getting this fixed?" ____________ "Silakka" Hello from Turku > Åbo.

Alain Maes Send message Joined: 8 Sep 08 Posts: 63 Credit: 1,437,484,959 RAC: 9,643 Level Scientific publications	Message 9722 - Posted: 13 May 2009 \| 18:31:01 UTC - in response to Message 9714. Last modified: 13 May 2009 \| 18:31:51 UTC
	Ubuntu 9.04 comes standard with driver version 180.44, which so far has avoided the need to fiddle with manual interventions. Will they follow before or after GPUGRID decides to require the 185 version drivers? If a manual update is required of the Linux community, please advise in advance. Many thanks. Kind regards, Alain

K1atOdessa Send message Joined: 25 Feb 08 Posts: 249 Credit: 370,320,941 RAC: 0 Level Scientific publications	Message 9723 - Posted: 13 May 2009 \| 19:06:03 UTC - in response to Message 9719.
	"That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." It looks like you have a GTX 260 and an 8800GT. All three tasks failed while running on the 8800GT (device 1), not on the GTX 260.

K1atOdessa Send message Joined: 25 Feb 08 Posts: 249 Credit: 370,320,941 RAC: 0 Level Scientific publications	Message 9724 - Posted: 13 May 2009 \| 19:12:45 UTC
	In light of the issues with the older GPUs and the KASHIF_HIVPR WUs, what is the best version of the NVIDIA driver to use? I have been aborting them when I see them, to get them over to a 200 series card as quickly as possible. I don't think it is beneficial for the project for me to let them sit in my queue for 12 hours, then run for another several before failing anyway. I'd prefer not to babysit, so should I roll back my current 185.66 to the last WHQL approved non-185.xx driver, which is 182.50? I guess I could just try this and report the results, but I wanted to know if anyone has already tried this 182.50 driver with an older (non-200-series) card.

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 9725 - Posted: 13 May 2009 \| 20:21:33 UTC - in response to Message 9723.
	Beyond: "That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." K1atOdessa: "It looks like you have a GTX 260 and an 8800GT. All three tasks failed while running on the 8800GT (device 1), not on the GTX 260." You're right. Not my machine and I didn't see the 2 cards. But OK, here's an example from a machine with only a GTX 260: http://www.gpugrid.net/result.php?resultid=663665

K1atOdessa Send message Joined: 25 Feb 08 Posts: 249 Credit: 370,320,941 RAC: 0 Level Scientific publications	Message 9727 - Posted: 13 May 2009 \| 21:16:45 UTC - in response to Message 9725.
	Beyond: "That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." K1atOdessa: "It looks like you have a GTX 260 and an 8800GT. All three tasks failed while running on the 8800GT (device 1), not on the GTX 260." Beyond: "You're right. Not my machine and I didn't see the 2 cards. But OK here's an example from a machine with only a GTX 260: http://www.gpugrid.net/result.php?resultid=663665" :-) That one reports as "Aborted by user". So I don't think it errored out under normal circumstances -- it was manually aborted.

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9728 - Posted: 13 May 2009 \| 21:36:49 UTC - in response to Message 9715.
	GDF: "CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version. gdf" mike047: "Are you saying that without 185 version drivers we will not be able to successfully do GPU Grid work. I have card/box combinations that will not accept that version and run properly. If 185 version driver and above is 'required' to crunch here, I will be taking my farm to FAH." Is this query unworthy of an answer? ____________ mike

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9730 - Posted: 13 May 2009 \| 21:56:37 UTC - in response to Message 9714.
	"CUDA 2.2 libs will be distributed with the application, but you will need to upgrade the driver to the latest 185 version. gdf" Thanks. I'd suggest a note in the news section on the home page. That way people can start organising things. I have already set GPUgrid to "no new work" so I can finish off what I have before doing the driver upgrades. I've got a few machines to do :) ____________ BOINC blog

Aardvark Send message Joined: 27 Nov 08 Posts: 28 Credit: 82,362,324 RAC: 0 Level Scientific publications	Message 9731 - Posted: 13 May 2009 \| 22:07:07 UTC
	Task ID 665546 had been running well along with another task. As I was about to run a program that would "use" the GPU, I decided to suspend all tasks and exit BOINC. Once I had completed my task I launched BOINC; all tasks still appeared suspended. So far so good. I then resumed all tasks, and task 665546 immediately went to "compute error". I also had another task, 652947, that had been running for 29 out of about 30 hours and failed (different machine). When I get the time I will compile a list of the failures and successes over the past few days.

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9734 - Posted: 13 May 2009 \| 22:38:33 UTC
	Which card/machine combinations cannot use the 185.85 version, may I ask, mike047?

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 9736 - Posted: 13 May 2009 \| 23:19:18 UTC - in response to Message 9727. Last modified: 13 May 2009 \| 23:20:16 UTC
	Beyond: "That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." K1atOdessa: "It looks like you have a GTX 260 and an 8800GT. All three tasks failed while running on the 8800GT (device 1), not on the GTX 260." Beyond: "You're right. Not my machine and I didn't see the 2 cards. But OK here's an example from a machine with only a GTX 260: http://www.gpugrid.net/result.php?resultid=663665" K1atOdessa: ":-) That one reports as 'Aborted by user'. So I don't think it errored out under normal circumstances -- it's was manually aborted." The user is one of my team members and he reported it as being stuck. It had processed for over twice as long as his other WUs and showed no progress. He was using BOINC client v6.6.28, not v6.6.20, so that wasn't the problem. :-)

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 9737 - Posted: 13 May 2009 \| 23:49:00 UTC - in response to Message 9727.
	Beyond: "That same GTX 260 has only 3 recent failed WUs, all of them KASHIF_HIVPR." K1atOdessa: "It looks like you have a GTX 260 and an 8800GT. All three tasks failed while running on the 8800GT (device 1), not on the GTX 260." Beyond: "You're right. Not my machine and I didn't see the 2 cards. But OK here's an example from a machine with only a GTX 260: http://www.gpugrid.net/result.php?resultid=663665" K1atOdessa: ":-) That one reports as 'Aborted by user'. So I don't think it errored out under normal circumstances -- it's was manually aborted." Here's a bunch more for your viewing pleasure: http://www.gpugrid.net/result.php?resultid=659111 http://www.gpugrid.net/result.php?resultid=664645 http://www.gpugrid.net/result.php?resultid=666952 http://www.gpugrid.net/result.php?resultid=647270 http://www.gpugrid.net/result.php?resultid=660927 http://www.gpugrid.net/result.php?resultid=666863 Certainly not as common as with the slower cards, but not at all hard to find. The last 2 are test WUs...

K1atOdessa Send message Joined: 25 Feb 08 Posts: 249 Credit: 370,320,941 RAC: 0 Level Scientific publications	Message 9738 - Posted: 14 May 2009 \| 0:00:01 UTC - in response to Message 9737.
	@Beyond - I didn't doubt you. :-) @GDF/Admin: Given that these KASHIF_HIVPR WUs seem to error out a lot, especially with "older, slower" cards, but occasionally with the new 200-series as well (as shown by Beyond), are no new ones going to be created? I can understand cleaning out the queue, but I have gotten several today, and with my cards I almost certainly expect them to error out. If I catch them in my queue, I try to abort them so they can move to a 200-series card with a better chance of finishing in a timely manner. Is there any analysis from the project on why these particular WUs are an issue? I've read comments about the drivers possibly being an issue, but given that the CUDA 2.2 software on the server will require these 185.xx drivers, I expect to continue having issues with these WUs if they are still in the queue. All others work fine.

dataman Send message Joined: 18 Sep 08 Posts: 36 Credit: 100,352,867 RAC: 0 Level Scientific publications	Message 9739 - Posted: 14 May 2009 \| 2:10:31 UTC
	As GPUGrid clearly does not want to put in much effort to support 8 and 9 series cards, I'm done here for now. I'd rather shut them down than to waste time and electricity in an endless circle jerk of BOINC versions and drivers. But hey, 3.7 million credits was a good run for me here. There will be a new GPU project out soon. Sad really, as I think some of the science was worth doing here. :) Ciao. ____________

mike047 Send message Joined: 21 Dec 08 Posts: 47 Credit: 7,330,049 RAC: 0 Level Scientific publications	Message 9742 - Posted: 14 May 2009 \| 6:51:56 UTC - in response to Message 9734.
	"which card/machine combinations are not possible to use the 185.85 version may i ask mike047 ?" I don't have that information at hand presently. Basically I use Ubuntu 8.04 LTS. The 260 and 250 cards have no trouble using 180.22 and might be able to use a higher driver without issue. Some of my 8800/9600gso/9800 cards will not accept any driver above 177.82. All motherboards are Gigabyte P35/45. I don't know what the issues are with this project and I am willing "to do" a little work to be able to run this project. BUT, I am unwilling to babysit and periodically change drivers to suit a project that is becoming unwilling to respond to my queries and the queries of others. Unfortunately I have invested in many Nvidia cards that at present cannot be used elsewhere in BOINC. FAH is the only other place that can use my cards. I have one box working there now and it has run absolutely trouble free with NO intervention on my part. The + to FAH is that my internet is not shut down when it has to upload; the 50+m uploads from here shut my internet down...I know that is not a project fault but it is an issue for me. This is a good project with good science, but it has gotten away from communicating with the participants in a timely manner. IMHO the project has slipped badly from where it was several months ago. ____________ mike

JockMacMad TSBT Send message Joined: 26 Jan 09 Posts: 31 Credit: 3,877,912 RAC: 0 Level Scientific publications	Message 9743 - Posted: 14 May 2009 \| 8:15:20 UTC - in response to Message 9742. Last modified: 14 May 2009 \| 8:20:00 UTC
	I can confirm my BFG GTX-260 192 Shader card is also getting a lot of these errors with 185.81. One example ____________

JockMacMad TSBT Send message Joined: 26 Jan 09 Posts: 31 Credit: 3,877,912 RAC: 0 Level Scientific publications	Message 9751 - Posted: 14 May 2009 \| 13:16:48 UTC - in response to Message 9743.
	Oh and SETI has nVidia support so there is another BOINC project. ____________

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9752 - Posted: 14 May 2009 \| 14:10:05 UTC - in response to Message 9751. Last modified: 14 May 2009 \| 14:18:20 UTC
	We have tested with drivers 185.xx on an 8800GT. All the WUs fail. With driver 180.xx all WUs are fine. So for now we can only suggest downgrading to the older drivers (180.xx), which seem to work. We have reported the issue to Nvidia. gdf
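	For people scripting checks across a farm, here is a tiny sketch that classifies a driver version string by its series, based only on what GDF reports above for the 8800GT (185.xx failing, 180.xx working). The function and set names are hypothetical, and series nobody has reported on (182.xx, for example) are deliberately left as unknown rather than guessed.

```python
# Driver series as reported in the post above for an 8800GT; not an official list.
REPORTED_FAILING_SERIES = {185}
REPORTED_WORKING_SERIES = {180}


def driver_series(version: str) -> int:
    """Return the major series of a driver version string such as '185.85'."""
    return int(version.strip().split(".")[0])


def thread_report(version: str) -> str:
    """Summarise what the project's test said about this driver series."""
    series = driver_series(version)
    if series in REPORTED_FAILING_SERIES:
        return "reported failing"
    if series in REPORTED_WORKING_SERIES:
        return "reported working"
    return "no report"


for version in ("185.85", "180.44", "182.50"):
    print(version, "->", thread_report(version))
```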

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9761 - Posted: 14 May 2009 \| 16:06:59 UTC - in response to Message 9736. Last modified: 14 May 2009 \| 16:13:19 UTC
	"The user is one of my team members and he reported it as being stuck. It had processed for over twice as long as his other WUs and showed no progress. He was using BOINC client v6.6.28, not v6.6.20 so that wasn't the problem. :-)" Yes, and maybe no ... 6.6.20 stunk in this regard... it really sucked swamp water ... 6.6.23 and later, I for one thought, fixed it ... now I am not so sure. What I *THINK* happened is that most of the causes have been cleaned up ... but sometimes something bad happens. And THEN, you get a task that runs long. There are still issues with the way that the resource scheduling is done. I am banging my head on the wall about things that I think I can clearly demonstrate, only to be patted on the head and told to go 'way, you bother me ... I mean, just last night I had five tasks all start and die in less than a second. At the moment the answer is that this is not possible. My 2,200+ log file of those two seconds notwithstanding ... Anyway, ... I am far less sanguine about how "fixed" we are ... {edit} An example: 12-TONI_HIVPR_mon_ba20-7-100-RND1398_0 and that was run on a 6.6.25 client ... 182.50 drivers I think at the time. 115 ms step size ...

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9830 - Posted: 16 May 2009 \| 9:38:33 UTC - in response to Message 9761.
	We have managed to replicate the problem on one of our machines. This should lead to a solution soon. Be patient. gdf

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9836 - Posted: 16 May 2009 \| 10:42:06 UTC - in response to Message 9830.
	"We have managed to replicate the problem on one of our machines. This should lead to a solution soon. Be patient." Oh, now we have to be patient too???? :) It's good news GDF ... thanks for the note.

Toby Broom Send message Joined: 11 Dec 08 Posts: 25 Credit: 360,187,443 RAC: 183,715 Level Scientific publications	Message 9843 - Posted: 16 May 2009 \| 11:54:10 UTC
	I worked out the numbers on my computers; they all run 182.50 drivers.
	ID: 30829 (8800GT 256Mb) - 11% failure rate
	ID: 33373 (9800GX2 512Mb) - 46% failure rate
	ID: 26481 (9800GX2 & GTX260) - 29% failure rate
	ID: 34636 (9800GX2 & 8800GT) - 18% failure rate
	It seems strange that the 8800GT is the most reliable card, given the issues. 26481 did have an issue that I know was my fault, so that's a little higher than expected.
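	A minimal sketch of how per-host failure rates like the ones above could be worked out from a task list. The host names and counts below are made up for illustration; they are not the real numbers behind the figures in this post.

```python
def failure_rate(errors: int, total: int) -> float:
    """Return the percentage of returned tasks that ended in an error."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


# Hypothetical (errors, total tasks) per host, NOT taken from the thread
hosts = {"host_a": (1, 10), "host_b": (3, 12)}

# Round to whole percent, the same precision used in the post
rates = {name: round(failure_rate(e, t)) for name, (e, t) in hosts.items()}
```

	With only a dozen or so tasks per host, a single extra error moves the rate by several percent, so small differences between hosts should not be over-read.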
	ID: 9843 \| Rating: 0 \| rate: / Reply Quote

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9844 - Posted: 16 May 2009 \| 11:55:45 UTC Last modified: 16 May 2009 \| 11:57:23 UTC
	Hmm, I am not convinced it's just the drivers. I started under Win XP with the 182.50 driver and BOINC 6.6.28, but again I see the IBUCH unit hang at 64.688% for more than an hour after 13 hours of calculation. So I am starting to believe this one is going to crash as well.
	ID: 9844 \| Rating: 0 \| rate: / Reply Quote

Matteo Send message Joined: 30 Mar 09 Posts: 1 Credit: 176,953 RAC: 0 Level Scientific publications	Message 9849 - Posted: 16 May 2009 \| 12:25:17 UTC
	My card is a 9800GTX with the 185.82 driver and BOINC 6.6.20. I don't want to downgrade drivers, so, in the meantime, I have suspended all WUs for GPUGRID. I hope to see good news asap. Sorry for my bad English... Greetings, Matteo
	ID: 9849 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9855 - Posted: 16 May 2009 \| 13:18:55 UTC - in response to Message 9844.
	Hmm i am not convinced its just the drivers GDF said the problems appear with 185.xx and don't show up with some 180.xx, which apparently no one else is still using. This does not mean that 182.xx is fine and I think the usual "KASHIF_HIVPR" and "IBUCH_KID" problems definitely affect 182.50. It seems to be a problem with the driver, triggered by some new WUs. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9855 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9858 - Posted: 16 May 2009 \| 14:11:17 UTC - in response to Message 9855.
	Hmm i am not convinced its just the drivers GDF said the problems appear with 185.xx and don't show up with some 180.xx, which apparently no one else is still using. This does not mean that 182.xx is fine and I think the usual "KASHIF_HIVPR" and "IBUCH_KID" problems definitely affect 182.50. It seems to be a problem with the driver, triggered by some new WUs. Well, I have some of these named tasks running on my 9800GT and the GTX295s ... but they don't seem to want to run on the new GTX260 or my GTX280 ... As far as I know, at the moment I am running 182.50 everywhere ... I suppose I could roll back to the 180.xx to see if I can get a task and if it dies ... heck, nothing else seems to be bothering this problem.
	ID: 9858 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9860 - Posted: 16 May 2009 \| 14:20:20 UTC - in response to Message 9858.
	Sorry, not very specific post. Not all WUs with those names are affected, e.g. see here. "KASHIF_HIVPR_mon" and "KASHIF_HIVPR_dim" have been fine for me. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9860 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9861 - Posted: 16 May 2009 \| 14:32:34 UTC - in response to Message 9860.
	Sorry, not very specific post. Not all WUs with those names are affected, e.g. see here. "KASHIF_HIVPR_mon" and "KASHIF_HIVPR_dim" have been fine for me. MrS Well, I just rolled the driver back to 180.4 and still got an invalid function. The tasks die immediately. Getting very depressed ... can't tell if it is my new systems or bad tasks ...
	ID: 9861 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9863 - Posted: 16 May 2009 \| 14:58:51 UTC Last modified: 16 May 2009 \| 15:01:12 UTC
	This task: p1480000-RAUL_pYEpYI1605-0-10-RND5295_0 started up and I have 5:10 or so on the clock ... so, unlike all the rest, finally got one running. It is running on the new MB, but the old GPU. So, this batch of tasks is so bad that most of them won't run on anything ... though my GTX 295s seem to be rolling on ... {edit} I was wrong ... it is on one of the new GTX 260 cards ...
	ID: 9863 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9867 - Posted: 16 May 2009 \| 17:29:21 UTC - in response to Message 9863. Last modified: 16 May 2009 \| 17:30:11 UTC
	GIANNI_FB's have come in for some flak lately , so thought I would post a successful one as comparator. The stop/starts in there were me, due to non-BOINC related stuff. http://www.gpugrid.net/result.php?resultid=677172 Regards Zy
	ID: 9867 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9878 - Posted: 16 May 2009 \| 21:07:46 UTC
	Glad your new rig made it through one WU successfully! The others don't look too well, though. They error on most other hosts as well, but 3 have been finished by other GT200 cards. One of them uses 185.85, but I can't see the others. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9878 \| Rating: 0 \| rate: / Reply Quote

Beyond Send message Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 Level Scientific publications	Message 9883 - Posted: 16 May 2009 \| 21:37:49 UTC - in response to Message 9867. Last modified: 16 May 2009 \| 21:39:14 UTC
	GIANNI_FB's have come in for some flak lately , so thought I would post a successful one as comparator. The stop/starts in there were me, due to non-BOINC related stuff. http://www.gpugrid.net/result.php?resultid=677172 Regards Zy And here's a 205-GIANNI_FB that failed on the same machine after running a LONG time: http://www.gpugrid.net/result.php?resultid=677771
	ID: 9883 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9891 - Posted: 17 May 2009 \| 0:52:05 UTC - in response to Message 9878.
	Glad your new rig made it through one WU successfully! The others don't look too well, though. They error on most other hosts as well, but 3 have been finished by other GT200 cards. One of them uses 185.85, but I can't see the others. MrS I think I had TWO problems. One was that OC got turned on by mistake and the automatic OC mode probably tried to do too much. What it broke is not entirely clear to me. It may also have been the BIOS ... I flashed that with the latest and turned off the OC mode at the same time, so it is hard to know which it was. The second problem was of course the bad tasks, which would have failed with the other error messages if I had not had problem one on both rigs. Now I am running into power limits (again) ... I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30A 115 supply...
	ID: 9891 \| Rating: 0 \| rate: / Reply Quote

[AF>Amis des Lapins]Gillo... Send message Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0 Level Scientific publications	Message 9896 - Posted: 17 May 2009 \| 2:34:55 UTC
	The same for me, http://www.gpugrid.net/result.php?resultid=678214 http://www.gpugrid.net/result.php?resultid=664263
	ID: 9896 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9897 - Posted: 17 May 2009 \| 2:44:10 UTC - in response to Message 9896.
	The same for me, http://www.gpugrid.net/result.php?resultid=678214 http://www.gpugrid.net/result.php?resultid=664263 I don't understand ... you don't like valid tasks?
	ID: 9897 \| Rating: 0 \| rate: / Reply Quote

[AF>Amis des Lapins]Gillo... Send message Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0 Level Scientific publications	Message 9898 - Posted: 17 May 2009 \| 3:06:09 UTC - in response to Message 9897.
	Hello, oops http://www.gpugrid.net/workunit.php?wuid=466073 http://www.gpugrid.net/workunit.php?wuid=458046 give me 5500 points for 17/24 hours of crunch (260GTX 216 SPU O/C stable )
	ID: 9898 \| Rating: 0 \| rate: / Reply Quote

Mark Henderson Send message Joined: 21 Dec 08 Posts: 51 Credit: 26,320,167 RAC: 0 Level Scientific publications	Message 9899 - Posted: 17 May 2009 \| 4:24:17 UTC Last modified: 17 May 2009 \| 5:03:20 UTC
	I may be lucky, but I am having very few problems. 185.85 drivers, XP64, 2 EVGA 260s, BOINC 6.6.28. I had 1 compute error yesterday, but that was my fault for suspending right as it started and unsuspending a couple of seconds later, plus a couple of others that everyone else in the quorum errored out on. I have heard of hanging WUs but have never had one of those either. 99 percent of the time it runs great. I always take great care to run Driver Sweeper in safe mode after uninstalling Nvidia drivers and before updating. I do not know if this matters that much though. I also never let the GPU temps get over 65C with moderate OC. Also, 4 CPU units of either SETI Astropulse, Einstein or ABC are always running alongside at the same time. I had a 9800GT in this computer for about a month that ran well too. Replaced it with a 260 this week.
	ID: 9899 \| Rating: 0 \| rate: / Reply Quote

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9900 - Posted: 17 May 2009 \| 9:59:36 UTC Last modified: 17 May 2009 \| 10:03:20 UTC
	I have now been able to save a few hanging units. What seems to work for me: first make sure to disable "keep units in memory" under options. I pause all other available units, then I pause the unit whose progress does not move and then resume it. I know it costs a lot of time, because it jumps back to some earlier point. Until now I had 4 units which stayed at a certain % and did not move for more than half an hour, so I started messing with them. When I woke up this morning I saw a 92-kashif_hivpr_dim unit reporting 0.700 % done in 7 hours, so I paused it. Of course it jumped back to 0.426 % when it started over, but it has now done 1.5 % in half an hour. So the reason seems to be that the units get stuck in the calculations and finally error out if this takes too long. But I can tell you it's a pain in the ass: when they hang you hardly notice, and we don't have time to watch all day whether the units progress or not.
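	Since nobody can watch the progress bar all day, the "has this unit moved in the last half hour?" check described above could in principle be automated. The following is only a sketch under stated assumptions: the function name, the sample format and the 30-minute window are hypothetical, and how the progress samples are collected from the BOINC client is deliberately left out.

```python
from typing import List, Tuple


def is_stalled(samples: List[Tuple[float, float]], window_s: float = 1800.0) -> bool:
    """Decide whether a task looks hung.

    samples: (timestamp_seconds, progress_percent) pairs, oldest first.
    Returns True if progress has not increased over the last window_s seconds.
    """
    if not samples:
        return False
    last_t, last_p = samples[-1]
    cutoff = last_t - window_s
    # Progress values recorded at least window_s before the latest sample
    older = [p for t, p in samples if t <= cutoff]
    if not older:
        return False  # not enough history yet to judge
    # Compare against the most recent of those older samples
    return last_p <= older[-1]


# A unit sitting at 64.688 % for an hour, sampled every 30 minutes
hung = is_stalled([(0, 64.688), (1800, 64.688), (3600, 64.688)])
# A unit that moved from 0.426 % to 1.9 % in half an hour
moving = is_stalled([(0, 0.426), (1800, 1.9)])
```

	Because a paused and resumed unit jumps back to its last checkpoint, the sample history would need to be cleared after every restart; otherwise the drop in percentage would itself look like a stall.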
	ID: 9900 \| Rating: 0 \| rate: / Reply Quote

[boinc.at] Nowi Send message Joined: 4 Sep 08 Posts: 44 Credit: 3,685,033 RAC: 0 Level Scientific publications	Message 9901 - Posted: 17 May 2009 \| 10:27:29 UTC
	Now I have the fourth error WU in a row. :-((( http://www.gpugrid.net/result.php?resultid=678849 http://www.gpugrid.net/result.php?resultid=679319 http://www.gpugrid.net/result.php?resultid=680211 http://www.gpugrid.net/result.php?resultid=680860 It wastes a lot of GPU-time for scientific knowledge! It costs a lot of credits... It costs a lot of fun... Is GPUGRID going to be used only with newer cards? Attention! Sarcasm! Is there a hidden deal with NVIDIA to push cards with G200...? My System Q9550 @ 3.4 8800 GT @ stock 4 GB Windows 7 RC 64 Bit 185.85
	ID: 9901 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9903 - Posted: 17 May 2009 \| 12:22:29 UTC - in response to Message 9901.
	Nowi, take a look further up in this thread. [AF>EDLS>BIOMED], take a look here. I edited the title to make it more clear that this problem also affects previous versions. Mark, I also get few errors, but if I look at my tasks I see that these are "friendly" WUs, almost none of the trouble makers. This makes it harder to blame it on config differences.. uBronan, I think what you're doing is in the end similar to a BOINC restart. It's good to know that this helps, but still it's irritating that it seems to happen so often. Which BOINC version do you run? The thing is, I'm running 6.5.0, 185.66 and Vista 64 and from looking at my results I think I did not have a single hanging WU. Every day 2 successful returns, except when errors occurred or with the one "kashif_hivpr_dim" that I had. It registered a runtime of 89839s = 24:57h and gave 10096 credits. The interval between the previous result and this one is 24:55h, so I don't think it was hanging at all. Of course, just because I ran one of them alright does not mean the problem doesn't exist. I just can't see the pattern.. is it the 6.6.x clients? It's not all of the WUs, it's not all of the 185 drivers, it's not all of the G9x GPUs. What's left? Paul, I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30A 115 supply... Do you think that's a good idea? I don't know your 230V, but at 115V the power supplies lose efficiency compared to 230V. 30A @ 115V is 3.5kW, quite massive :D I know we can draw at least 2kW over the regular 230V, whereas I heard the US net may deliver something around 1.5kW at 110V. Our 3 phase plugs are 380V and I think you can get 5 - 6 kW from them.. but you're not talking about these, right? MrS ____________ Scanning for our furry friends since Jan 2002
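	The two pieces of arithmetic in this post (the runtime conversion and the circuit power) can be checked with a short sketch. The helper names are made up for illustration, and the power figure is the plain volts-times-amps product, ignoring power factor and any derating an electrician would apply.

```python
def seconds_to_hhmm(seconds: int) -> str:
    """Format a runtime in seconds as hours:minutes, dropping leftover seconds."""
    hours, rem = divmod(seconds, 3600)
    return f"{hours}:{rem // 60:02d}"


def power_kw(volts: float, amps: float) -> float:
    """Simple V x I power in kilowatts (no power factor, no derating)."""
    return volts * amps / 1000.0


runtime = seconds_to_hhmm(89839)   # the kashif_hivpr_dim runtime quoted above
circuit = power_kw(115, 30)        # the 30A @ 115V line under discussion
```

	The runtime comes out as 24:57, matching the post, and the circuit as 3.45 kW, which the post rounds to 3.5 kW.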
	ID: 9903 \| Rating: 0 \| rate: / Reply Quote

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9906 - Posted: 17 May 2009 \| 12:48:21 UTC
	Sadly I was not paying attention, so the last one did error out again, but to be honest I was expecting it to fail anyway, since I had to restart it 3 times in a row before seeing progress. I am on Win XP Pro with the 182.50 driver and BOINC 6.6.28. For me there was indeed some gain with 185.85, but I just wanted to make sure the drivers aren't the issue. The newer driver gave a slightly faster finishing time: the old one was 20 - 27 hours and 185.85 between 19 - 23 hours. I have been trying to test the older 180.XX driver, but it made my system unstable for some reason, so I cleared out all Nvidia stuff and reinstalled the 182.50 WHQL version. I am now going to change BOINC back to 6.5.0.
	ID: 9906 \| Rating: 0 \| rate: / Reply Quote

[AF>Amis des Lapins]Gillo... Send message Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0 Level Scientific publications	Message 9908 - Posted: 17 May 2009 \| 13:24:06 UTC
	Thank you for the link. I opened the web page of my PC; for the GTX 260 in this PC I am on BOINC 6.6.20, which has satisfied me, and Nvidia 182.08 on Win XP Pro 64. http://www.gpugrid.net/hosts_user.php?userid=1695 On the contrary for points over 24h00:10.000 points on GPU 260 O/C:( all GPU 280/285GTX @+
	ID: 9908 \| Rating: 0 \| rate: / Reply Quote

[AF>Amis des Lapins]Gillo... Send message Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0 Level Scientific publications	Message 9909 - Posted: 17 May 2009 \| 13:29:32 UTC
	Drivers Nvidia 1XX.XX http://www.nvidia.fr/Download/Find.aspx?lang=fr
	ID: 9909 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9910 - Posted: 17 May 2009 \| 15:11:06 UTC - in response to Message 9908.
	I am with boinc 6.6.20 who satisfied me Except for the fact that some of your tasks take longer than they should? MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9910 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9912 - Posted: 17 May 2009 \| 15:27:21 UTC - in response to Message 9903. Last modified: 17 May 2009 \| 15:31:13 UTC
	I really gotta call that electrician to change my old 3 phase 230 V UPS socket into a 30A 115 supply... Do you think that's a good idea? I don't know your 230V, but at 115V the power supplies lose efficiency compared to 230V. 30A @ 115V is 3.5kW, quite massive :D I know we can draw at least 2kW over the regular 230V, whereas I heard the US net may deliver something around 1.5kW at 110V. Our 3 phase plugs are 380V and I think you can get 5 - 6 kW from them.. but you're not talking about these, right? Yes it does, the problem is that a 230V UPS is about twice as expensive as a normal one ... the last time I looked, one about the size I would need would be about 3K ... The problem is that I can tell that I am pulling way high on the circuits in use ... if I change to another dedicated line, well, then I can leave some on the current room sockets and the rest on the dedicated line. The only point of the exercise is to get more power to the room ... I think adding new GPUs is pushing me up to the line again ... at least I got rid of the power hungry systems that were slower than dirt. In a month or so I will likely get an upgrade card to replace the 9800GT, though I will likely keep it in the closet for that time when I upgrade to a wider MB and might need a slot filler ...
	ID: 9912 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9915 - Posted: 17 May 2009 \| 17:05:16 UTC - in response to Message 9912.
	OK, except cost there's nothing to argue against a dedicated line :) MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9915 \| Rating: 0 \| rate: / Reply Quote

[AF>Amis des Lapins]Gillo... Send message Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0 Level Scientific publications	Message 9918 - Posted: 17 May 2009 \| 18:20:11 UTC - in response to Message 9910. Last modified: 17 May 2009 \| 18:57:45 UTC
	Yes really 84000s instead of 42000s for 14-KASHIF_HIVPR_dim_ba3-8-100-RND7871_1 http://www.gpugrid.net/result.php?Resultid=680472
	ID: 9918 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 9921 - Posted: 17 May 2009 \| 19:29:15 UTC - in response to Message 9918.
	OK, to put it more clearly: you don't like the long runtime, but you say 6.6.18/20 satisfied you. The post I linked to says that the long runtime is caused by an error in 6.6.20 and some previous clients. So something doesn't add up and you may want to up- or downgrade ;) MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 9921 \| Rating: 0 \| rate: / Reply Quote

[AF>Amis des Lapins]Gillo... Send message Joined: 21 Mar 08 Posts: 7 Credit: 24,394,688 RAC: 0 Level Scientific publications	Message 9928 - Posted: 17 May 2009 \| 20:45:20 UTC Last modified: 17 May 2009 \| 20:50:33 UTC
	I have switched to BOINC 6.6.28. It is possible that SETI Beta is responsible for this problem. Thanks for your help; I will keep PS3GRID posted on what follows.
	ID: 9928 \| Rating: 0 \| rate: / Reply Quote

Aardvark Send message Joined: 27 Nov 08 Posts: 28 Credit: 82,362,324 RAC: 0 Level Scientific publications	Message 9932 - Posted: 17 May 2009 \| 22:47:37 UTC - in response to Message 9921.
	I rolled back my drivers from 185.85 to 182.50, with Windows Vista 64 bit and BOINC client 6.6.28. Since then I have returned three successful results, one of which had run for 30 hours on one core of my 9800 GX2 and gave me just over 10,000 credits :-) So at present this rollback of the driver is working for me (touch wood). I also rolled back the driver on my other machine from 185.85 to 180.48, with Windows Vista 32 bit and BOINC client 6.6.20 (Yes, I know :-) ). This has so far returned one result, plus another well on its way. I realise that neither of these is a large sample, but it looks promising given the quantity of failures I had seen just prior to changing drivers. I will now leave things alone for a few days and see how they turn out.
	ID: 9932 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9933 - Posted: 18 May 2009 \| 1:03:27 UTC
	I am finding it hard to tell what is going on... I seem to be getting tasks out of order so that they don't sort well on the results pages. As I watch the computers they seem to be returning mostly good results ... with occasional errors. Well, I guess I will have to wait till Monday when the staff comes back in and fixes the universe ... :)
	ID: 9933 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9937 - Posted: 18 May 2009 \| 8:46:07 UTC
	I still cannot make heads nor tails of the pattern of errors. One of the problems of course is the difficulty of gathering data about the failures. Some of the older tasks that failed on one of my systems passed on another system that is very much alike. I thought I was onto something about memory size, where some of my cards have that 895 instead of 1G and the tasks passed on the 1G cards. Alas, I quickly found another case where it failed on mine and passed on someone else's card and they too had only 895 M VRAM. Driver version 182.50 on my systems failed, but the systems where the task passed were also running the same version. The tasks are of all name classes... Even my i7 with the pair of GTX295 cards finally had one fail (http://www.gpugrid.net/result.php?resultid=685755); the message is singularly unhelpful.
	ID: 9937 \| Rating: 0 \| rate: / Reply Quote

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9938 - Posted: 18 May 2009 \| 8:55:14 UTC
	I upgraded just the drivers on all my machines to 185.85. I had a couple of machines start getting errors. Interestingly, Seti doesn't get errors with their app. However, as I'm using an app_info for them I dropped in the latest DLLs. It may just be that their app is more compatible, or maybe it is the combination of the current driver with the cuda 2.2 DLLs that makes it work. Has anyone tried updating the DLLs to see if that cures the problem? The only way I could see to do this is to set up an app_info so that you don't get issues with the file signatures. I'll downgrade Maul (it has 2 x GTX260s) to 182.50 once it has knocked over its current cuda work. At least it can get back to being productive while this issue gets worked out. My other machines can concentrate on Seti for a while. ____________ BOINC blog
	ID: 9938 \| Rating: 0 \| rate: / Reply Quote

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9942 - Posted: 18 May 2009 \| 13:19:05 UTC - in response to Message 9928. Last modified: 18 May 2009 \| 13:27:35 UTC
	I have switched to BOINC 6.6.28. It is possible that SETI Beta is responsible for this problem. Thanks for your help; I will keep PS3GRID posted on what follows. Believe me, you don't want to run SETI beta together with other projects. SETI itself has also been crashing my GPUGRID units, though sometimes it ran without problems. SETI seems to use only cuda 1.0 instructions with no optimisations if you don't use the optimized apps. The optimized KWSN application has caused me failures on GPUGRID as well, but that's probably because SETI was running at the same time as GPUGRID while I have only 1 cuda device. I advise you not to use SETI and GPUGRID at the same time; I have seen it crash many units. Although sometimes it looks like nothing is wrong, I found that some units keep the memory locked, so when they finish the RAM is not released properly, causing other projects (GPUGRID) to error out. Another one which can give you problems together with GPUGRID is CPDN, which has units that eat at least 1,5 GB of memory, so for me 4 units with 1,5 GB minimum meant a load of 7,2 GB of RAM being used :D Now believe me, that makes trouble. If I had booted under Win 64 I probably could have run them, since I have 8 GB of memory, but since I run 32-bit Windows it only uses 3,2 GB. Has anyone tried to use updated DLLs Believe me, I tried all combinations of drivers, BOINC and cuda versions. Every time the same result: in the end some units simply crash. Even when babysitting them they seem to know when I am busy doing other tasks and crash ;) So it looks to me like a unit that gets locked will die if you are not in time to pause and restart it. By that I mean: if a unit has been locked at x,xxx % for at least an hour without the % moving, you can try the pause/restart trick, but some units will still crash no matter what I do. Now make sure not to restart it too quickly after it has started again, because that will surely crash it also !!
	ID: 9942 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9958 - Posted: 18 May 2009 \| 22:37:42 UTC - in response to Message 9942.
	we are running this set of workunits called x-GIANNI_newFB-... If they go on ok, then we have isolated the problem with G90 chips. It is not solved yet but still at least we would know where to look. gdf
	ID: 9958 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9961 - Posted: 19 May 2009 \| 9:19:15 UTC - in response to Message 9942.
	The CPDN memory limit of 1.5GB is set that way to allow for four running on a quad, with enough left over for op sys etc within the quoted figure of 1.5GB. Each of the larger CPDN WUs takes up 210-220MB in memory, therefore four of them will eat around 850MB, with a comfortable margin for opsys etc, within the stated 1.5GB. It's not 1.5GB for each WU; that figure, which they state as advisory, is total memory on the PC, not per WU. Most CPDN models are much smaller - either side of 100MB - albeit on the larger side compared with most BOINC WUs. I have happily run four of the biggest ones on my quad without issues and GPUGRID on the 9800GTX+. Usually I have two of the bigger CPDN ones running with two SETI Astropulse on the quad and GPUGRID on the 9800GTX+, they run fine with no issues. Regards Zy
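	The memory budget described above can be written out as a quick check. This is only a sketch of the arithmetic in this post: the constant and function names are made up, 1 GB is taken as 1024 MB, and the per-task figures are the 210-220 MB quoted here, not measured values.

```python
ADVISORY_TOTAL_MB = 1536  # the 1.5 GB advisory figure, assuming 1 GB = 1024 MB


def total_footprint_mb(per_task_mb: int, tasks: int) -> int:
    """Combined memory footprint of several tasks of the same size."""
    return per_task_mb * tasks


# Four of the larger CPDN work units on a quad core
low = total_footprint_mb(210, 4)
high = total_footprint_mb(220, 4)

# What is left of the advisory figure for the operating system and other apps
headroom = ADVISORY_TOTAL_MB - high
```

	Under these assumptions the four tasks need 840 to 880 MB, in line with the "around 850MB" estimate, and leave about 650 MB of the advisory figure unused.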
	ID: 9961 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9964 - Posted: 19 May 2009 \| 11:24:57 UTC - in response to Message 9961.
	We should be able to test a fix by tomorrow. It's a test, as the problem is not completely understood. gdf
	ID: 9964 \| Rating: 0 \| rate: / Reply Quote

uBronan Send message Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0 Level Scientific publications	Message 9967 - Posted: 19 May 2009 \| 13:20:26 UTC Last modified: 19 May 2009 \| 13:24:15 UTC
	Zydor, did you actually select the big units on your account page? By default they are not loaded, and you really should read what is stated there. Those units need a minimum of 1,5 GB of memory, nothing else; the warning is clear >.< Sadly I forgot to take a screenshot when I was running 4 of those biggest units at the same time. Of course I think the chance that you get 1 or even more than 2 of those big ones is very small. I have not seen any of those big units recently, so it could have been a freak moment that I received 4 of them at once. It may also be that these huge units are only sent to x64 machines; I have not been following any news about them. The only thing I can say about you running SETI and GPUGRID together is that in my case it ended up several times crashing my PC or the unit, but again I was having more problems with SETI beta than with normal SETI. That it does not happen to you does not mean other people will be so lucky that all goes well.
	ID: 9967 \| Rating: 0 \| rate: / Reply Quote

Paul D. Buck Send message Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0 Level Scientific publications	Message 9969 - Posted: 19 May 2009 \| 15:10:24 UTC
	I just had a case where a suspended CPDN task caused two GPU Grid tasks to go into the "waiting for memory" state. I had to stop BOINC and restart it so that the CPDN task (only 300K) would be swapped out ... As usual I reported it so that there is another bug for UCB to ignore ... :)
	ID: 9969 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 9981 - Posted: 19 May 2009 \| 21:32:48 UTC - in response to Message 9967. Last modified: 19 May 2009 \| 21:36:08 UTC
	Task Manager reported the larger ones taking up 220MB min, going up to 400MB at times, and four did run fine. You can get four by setting preferences for only those units. I thought the same as you re the 1.5GB, but also thought it strange they would produce one that size even these days; it would cripple many PCs, not a good thing for general release. I therefore checked it out with CPDN. The response was that they take up at most 500MB, and four of them would fit on a PC with 3GB with no issues. When I ran the four, Task Manager reported 220-400MB in use; it may well have gone to 500MB when I was not watching, I didn't log it. The post and responses are at: http://climateprediction.net/board/viewtopic.php?f=21&t=8675 I had no doubts you had issues running the two. It's also the case that reports of combinations that succeed can often produce as much info to help debug as failures, since a comparator can help isolate an issue. I've often cursed when something hasn't worked, then scratched my head when I discovered others were having some success - it helped me. Regards Zy
	ID: 9981 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 9991 - Posted: 20 May 2009 \| 8:16:08 UTC - in response to Message 9981.
	So, it seems that there is a bug in the compiler/hardware which appears only on pre G200 cards. We found a way to avoid it for now, but it limits what we can do, so it is not a solution. gdf
	ID: 9991 \| Rating: 0 \| rate: / Reply Quote

jrobbio Send message Joined: 13 Mar 09 Posts: 59 Credit: 324,366 RAC: 0 Level Scientific publications	Message 9992 - Posted: 20 May 2009 \| 8:46:27 UTC - in response to Message 9991.
	So, it seems that there is a bug in the compiler/hardware which appears only on pre G200 cards. We found a way to avoid it for now, but it limits what we can do, so it is not a solution. gdf Well at least it isn't a mystery any more. When you are on the bleeding edge, one should expect some cuts. Hope it gets resolved in the not too distant future. Rob
	ID: 9992 \| Rating: 0 \| rate: / Reply Quote

MarkJ Volunteer moderator Volunteer tester Send message Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 Level Scientific publications	Message 9995 - Posted: 20 May 2009 \| 11:27:05 UTC - in response to Message 9991.
	So, it seems that there is a bug in the compiler/hardware which appears only on pre G200 cards. We found a way to avoid it for now, but it limits what we can do, so it is not a solution. gdf How come the G200 based cards also get failures? Will there be an updated app for the non-G200 machines, or perhaps all machines? Will this be a cuda 2.2 app or stick with the old version for the time being? Can we use the 185.85 drivers now or with the new app (assuming there will be one)? ____________ BOINC blog
	ID: 9995 \| Rating: 0 \| rate: / Reply Quote

KWSN-Sir Papa Smurph Send message Joined: 17 Mar 09 Posts: 5 Credit: 7,136,253 RAC: 0 Level Scientific publications	Message 9998 - Posted: 20 May 2009 \| 12:00:52 UTC
	I am running an 8800GT and three 9800 GTX+ cards. I have had zero completed Kashif WUs. Could you make a way for me to "opt out" of that type of unit? I still get occasional errors with other units, but the Kashif ones have a 100% failure rate for me (3 different machines). I am really unable to babysit my machines as I am away from home for days at a time.....
	ID: 9998 \| Rating: 0 \| rate: / Reply Quote

Scott Brown Send message Joined: 21 Oct 08 Posts: 144 Credit: 2,973,555 RAC: 0 Level Scientific publications	Message 10000 - Posted: 20 May 2009 \| 12:48:35 UTC - in response to Message 9958.
	We are running this set of workunits called x-GIANNI_newFB-... If they go on ok, then we have isolated the problem with G90 chips. It is not solved yet but still at least we would know where to look. gdf These hang on my Pentium D 830 with a 9600GSO. See here for a hung result that was aborted after more than 24 hours of no progress (hung at about 21%).
	ID: 10000 \| Rating: 0 \| rate: / Reply Quote

rbpeake Send message Joined: 30 Jul 08 Posts: 17 Credit: 80,343,188 RAC: 0 Level Scientific publications	Message 10002 - Posted: 20 May 2009 \| 14:40:55 UTC - in response to Message 9998.
	Just as a data point of reference, I have had 100% success on all work units using a GTX 260 Core 216 card, running CUDA 2.2 and 185.85 driver, even on work units that have had failures previously.
	ID: 10002 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 10004 - Posted: 20 May 2009 \| 15:16:35 UTC - in response to Message 9991.
	There is another way ..... Tap the well-heeled benefactor you have tucked away for a mere $400,000 worth of vouchers to upgrade crunchers' pre-200s to 300GTXs - a snip at the price.... And ..... added value!! ..... You'd also solve cruncher recruitment for a while, they'd queue round to the next street, let alone the next block, for that one ..... sigh ........ always nice to dream once in a while :) Regards Zy
	ID: 10004 \| Rating: 0 \| rate: / Reply Quote

rbpeake Send message Joined: 30 Jul 08 Posts: 17 Credit: 80,343,188 RAC: 0 Level Scientific publications	Message 10005 - Posted: 20 May 2009 \| 15:24:44 UTC - in response to Message 10004.
	There is another way ..... Tap your well heeled benefactor.... Regards Zy Believe me, I am tapping my heels that my card continues to function as well as it has! I just bought it, so it would be a big disappointment if there were issues so soon....but the issue fix would appear to be possible without a card upgrade, hopefully....(although NVIDIA I would guess is tapping its heels that many will upgrade...ouch! ;)
	ID: 10005 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 10020 - Posted: 21 May 2009 \| 10:09:42 UTC - in response to Message 9995.
	How come the G200 based cards also get failures? Let's see what the fix can do and who still gets failures afterwards. Mind you, there's also the "regular failure rate", some kind of "noise floor" which affects all cards. Will there be an updated app for the non-G200 machines, or perhaps all machines? Will this be a CUDA 2.2 app or stick with the old version for the time being? Can we use the 185.85 drivers now or with the new app (assuming there will be one)? Not speaking officially, but I wouldn't rush to introduce another variable in the current situation. Wait until the dust settles and we're confident that the problems have been solved. 185.66 has been running fine for me with non-troublemaker WUs, so I'll keep using it until I see problems. I do have a WU issued today and it appears to use client 6.64, so it looks like there is no new app for now. But this could be tied to an old type of WU as well. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 10020 \| Rating: 0 \| rate: / Reply Quote

Zydor Send message Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0 Level Scientific publications	Message 10071 - Posted: 22 May 2009 \| 21:31:39 UTC - in response to Message 10020.
	Just had a Kashif go bang: ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3) http://www.gpugrid.net/result.php?resultid=699822 I have had a problem on that PC with Office, and got it back online 5 hours ago. However, I don't think it was that issue; it looks like the old WU problem surfacing - maybe in one of the older WUs still in the system?? Regards Zy
	ID: 10071 \| Rating: 0 \| rate: / Reply Quote

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 10088 - Posted: 23 May 2009 \| 14:38:09 UTC - in response to Message 10071.
	Looks like the old problem, and the WU was created after 20 May 16:44 CEST, when the fix was applied. I think it would be better to post such observations in the new thread, so they don't get lost. MrS ____________ Scanning for our furry friends since Jan 2002
	ID: 10088 \| Rating: 0 \| rate: / Reply Quote
