Message boards : Number crunching : Error while computing

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 17639 - Posted: 16 Jun 2010 | 15:59:50 UTC

Almost all my GPUGRID WUs fail after 5 seconds with "Computation Error".

Boinc 6.10.56
wxWidgets 2.8.10
Nvidia GTX275 driver 8.17.11.9745

Some WUs still compute correctly.

What can this be? I did not update anything recently.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 17643 - Posted: 16 Jun 2010 | 22:09:10 UTC - in response to Message 17639.

At least you are completing the odd task, but your problem is at least 16 days old, going by your tasks. You seem to complete tasks if they actually run for any length of time, but most fail after a few seconds, so you could be sitting idle for long periods (after too many failures)!

Maybe a different driver will work. You could try the most recent one, or perhaps a much older driver such as 195.xx.

A few weeks back I had the same problem with my GTX260 on Win 7 x64 (same as you). In the end I gave up and put it into an XP system! It now works fine.

The problem may be related to the RAM size reported on Win7 systems versus the size expected by the app or Boinc. Yours is reported as:
NVIDIA GeForce GTX 275 (877MB) driver: 19745
- I'm guessing it actually has 896MB

So I would suggest you try the 257.21 driver released in the last day or so. If that fails try an older driver.

NVidia

Good luck,

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 17658 - Posted: 17 Jun 2010 | 18:36:54 UTC - in response to Message 17643.

Gosh... I must have been on the NVidia site just a split second before this new driver came out... I updated the driver, but I am idle (reached the limit of 5 tasks per day), so I must wait a while to see if it makes a difference.

I will keep an eye on it for the coming days.

The best proof that I did not change a thing is that this problem started during my vacation. No automatic updates are carried out on my system, so I am pretty sure that it is not because of changes in my system.

Will see what happens with the new driver

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 17667 - Posted: 18 Jun 2010 | 15:04:54 UTC - in response to Message 17658.

Updated the driver.... BTW, I do not do any overclocking or the like...

These messages are from the last WU... the problem is still the same.

18/06/2010 07:13:56 GPUGRID Starting h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0
18/06/2010 07:13:56 GPUGRID Starting task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 using acemd2 version 605
18/06/2010 07:14:14 GPUGRID Computation for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 finished
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_1 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_2 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_3 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
18/06/2010 07:14:14 GPUGRID Starting h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0
18/06/2010 07:14:14 GPUGRID Starting task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 using acemd2 version 605
18/06/2010 07:14:15 GPUGRID Started upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_0
18/06/2010 07:14:15 GPUGRID Started upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_4
18/06/2010 07:14:16 GPUGRID Finished upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_0
18/06/2010 07:14:16 GPUGRID Finished upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_4
18/06/2010 07:14:16 GPUGRID Started upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_7
18/06/2010 07:14:17 GPUGRID Finished upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_7
18/06/2010 07:14:32 GPUGRID Computation for task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 finished
18/06/2010 07:14:32 GPUGRID Output file h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_1 for task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 absent
18/06/2010 07:14:32 GPUGRID Output file h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_2 for task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 absent
18/06/2010 07:14:32 GPUGRID Output file h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_3 for task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 absent
18/06/2010 07:14:33 GPUGRID Started upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_0
18/06/2010 07:14:33 GPUGRID Started upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_4
18/06/2010 07:14:34 GPUGRID Finished upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_0
18/06/2010 07:14:34 GPUGRID Finished upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_4
18/06/2010 07:14:34 GPUGRID Started upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_7
18/06/2010 07:14:35 GPUGRID Finished upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_7


Any suggestions? (Meanwhile I will try an older driver.)
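
A minimal Python sketch, assuming the message-log format shown above (DD/MM/YYYY timestamps followed by the project name), for measuring how long each task ran and counting its absent output files. The names summarize_log and SAMPLE are illustrative only and are not part of BOINC.

```python
import re
from datetime import datetime

# Matches lines such as "18/06/2010 07:13:56 GPUGRID Starting task ...".
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) GPUGRID (.+)$")

# Four lines copied from the log above.
SAMPLE = """\
18/06/2010 07:13:56 GPUGRID Starting task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 using acemd2 version 605
18/06/2010 07:14:14 GPUGRID Computation for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 finished
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_1 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_2 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
"""


def summarize_log(log_text):
    """Return {task_name: {"runtime_s": float or None, "absent": int}}."""
    started = {}
    tasks = {}
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not a GPUGRID message line in the assumed format
        when = datetime.strptime(match.group(1), "%d/%m/%Y %H:%M:%S")
        words = match.group(2).split()
        msg = match.group(2)
        if msg.startswith("Starting task "):
            name = words[2]
            started[name] = when
            tasks.setdefault(name, {"runtime_s": None, "absent": 0})
        elif msg.startswith("Computation for task ") and msg.endswith(" finished"):
            name = words[3]
            entry = tasks.setdefault(name, {"runtime_s": None, "absent": 0})
            if name in started:
                entry["runtime_s"] = (when - started[name]).total_seconds()
        elif msg.startswith("Output file ") and msg.endswith(" absent"):
            name = words[5]  # "Output file <file> for task <task> absent"
            tasks.setdefault(name, {"runtime_s": None, "absent": 0})["absent"] += 1
    return tasks


if __name__ == "__main__":
    for task, info in summarize_log(SAMPLE).items():
        print(task, info)
```

A task that finishes within seconds and has absent output files matches the failure pattern described in this thread; a task that runs for hours does not.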

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 17668 - Posted: 18 Jun 2010 | 15:17:44 UTC - in response to Message 17667.

As expected.... an older driver gives the same result as before.

So it is likely not the driver, but something else...

Suggestions still welcome.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 17672 - Posted: 19 Jun 2010 | 0:58:11 UTC - in response to Message 17668.

Use XP or Linux, if you can.

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 17680 - Posted: 21 Jun 2010 | 6:07:17 UTC - in response to Message 17672.

Not possible...

Why does GPUGRID not make a more stable application?

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 17681 - Posted: 21 Jun 2010 | 6:26:00 UTC - in response to Message 17680.

Hi Barts

Can you look in Device Manager, right-click on your card, select Properties and then Details, and check whether the pull-down list has an entry called "Install Error" anywhere in it?



____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 17684 - Posted: 21 Jun 2010 | 13:20:16 UTC - in response to Message 17681.
Last modified: 21 Jun 2010 | 13:24:28 UTC

I would encourage you to try Linux on a stick: http://www.gpugrid.net/forum_thread.php?id=2203

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 17687 - Posted: 22 Jun 2010 | 20:53:54 UTC - in response to Message 17681.
Last modified: 22 Jun 2010 | 21:18:39 UTC

Hi, no install errors in that driver section; however, I do see multiple entries {3ab22e31-8264-4b4e-9af5-a8d2d8e33e62}
[1]..[17] and [25] behind it.

About Linux... uhm... I have nothing against Linux, although the support for Nvidia is 'difficult'..

So again... why is GPUGRID not making the application more stable? Mine was running OK, and without any driver updates it started going bad.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 17697 - Posted: 25 Jun 2010 | 14:42:42 UTC - in response to Message 17687.
Last modified: 25 Jun 2010 | 15:04:56 UTC

I'm getting runaway errors for TONI_CAPBIND on my quad GT240 system. The other tasks work fine. Each TONI_CAPBIND fails after about 20sec.
Vista Ult x64, driver 19621.

I have now stopped picking up new tasks; communication deferred for 7h.
I cannot change the operating system, it gets used too much.
Did a system restart. One task (TONI_HERG) is due to complete in about 90min, so I will see if the restart made any difference.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 17698 - Posted: 25 Jun 2010 | 16:27:17 UTC - in response to Message 17697.

When I manually reported the finished TONI_HERG work unit, Boinc picked up 2 new tasks :) Fortunately they are TONI_KID work units and both have made it to 1% (about 7min).

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 17707 - Posted: 26 Jun 2010 | 20:37:49 UTC - in response to Message 17684.

I would encourage GPUGRID to do a decent round of debugging on these tasks, which seem to be highly unstable, or to come out with a good and clear report on why these tasks fail. Then we could do something about it.... and GPUGRID would not have all those failed tasks.

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 740,445,933
RAC: 45,306
Level
Lys
Message 17715 - Posted: 27 Jun 2010 | 20:04:48 UTC
Last modified: 27 Jun 2010 | 20:25:06 UTC

GPUGRID has an option to participate in testing of new software versions that have passed server site testing but still need additional testing on a wider variety of computers. You may want to check if you have unknowingly set your account to participate.

Also, there is a BOINC CPU project you may want to avoid for now, since it does not seem fully compatible with GPUGRID: PrimeGrid.

A few comments on why one of my workunits failed:

6/27/2010 7:15:15 AM GPUGRID Computation for task D273r4-TONI_HERGunb1-59-100-RND6573_0 finished
6/27/2010 7:15:15 AM GPUGRID Output file D273r4-TONI_HERGunb1-59-100-RND6573_0_1 for task D273r4-TONI_HERGunb1-59-100-RND6573_0 absent
6/27/2010 7:15:15 AM GPUGRID Output file D273r4-TONI_HERGunb1-59-100-RND6573_0_2 for task D273r4-TONI_HERGunb1-59-100-RND6573_0 absent
6/27/2010 7:15:15 AM GPUGRID Output file D273r4-TONI_HERGunb1-59-100-RND6573_0_3 for task D273r4-TONI_HERGunb1-59-100-RND6573_0 absent

The failure happened just after I had enabled getting workunits from the PrimeGrid BOINC project, and got several workunits with the completion time overestimated enough that two of them went into high-priority mode immediately. A third CPU core was already running a workunit from The Lattice Project in high-priority mode. I had set BOINC not to use the fourth CPU core. It looks like the GPUGRID workunit was simply not able to recover from having all the CPU cores BOINC could use in high-priority mode at once.

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 17739 - Posted: 28 Jun 2010 | 22:04:33 UTC - in response to Message 17715.

This has nothing to do with the number of CPU cores. I have AQUA running as well, taking all my cores, and GPUGRID can still run. ACEMD2: GPU molecular dynamics runs fine here..

It is really a very unstable TONI_* or one of the other new WUs.

Better that GPUGRID has a look at this instability before they send out more of those WUs.

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 17741 - Posted: 28 Jun 2010 | 23:28:17 UTC - in response to Message 17739.

Hey barts ... I took a look at about 10 of your errored WUs, and what I noticed is that they are all different WU types and most of them have already been successfully completed by other computers (no multiple errors on different machines). Maybe double-check around a little before just throwing the blame blanket on GPUGrid and claiming there is an unstable WU type.

Might I suggest a clean install of the driver? Uninstall, boot to safe mode (F8), run Driver Sweeper to clean up any old remnants, boot again to safe mode and install the driver you want to use. Now reboot one more time and see how it goes.

Do you have your BOINC directories excluded from AV scanning? Both the data and Program directories.
____________
Thanks - Steve

=Lupus=
Joined: 10 Nov 07
Posts: 10
Credit: 12,777,491
RAC: 0
Level
Pro
Message 17742 - Posted: 28 Jun 2010 | 23:30:40 UTC

I am observing that in the last few days some TONI_CAPBINDs were failing on my machine... 2 cancelled by server (OK, that's not an error), 3 with exit code 98, one just finished OK... There seems to be a problem with them ^.^

BOINC_64_6.10.56 on Vista64,
"27.06.2010 19:00:33 NVIDIA GPU 0: GeForce GTX 260 (driver version 19107, CUDA version 2030, compute capability 1.3, 896MB, 582 GFLOPS peak)"

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 17750 - Posted: 29 Jun 2010 | 9:26:23 UTC - in response to Message 17742.

Some of the work units must be different in some way that causes them to fail, usually after a few seconds. Some tasks just won’t run for me while others work fine. This is mostly the case on Vista and Win7, so it is operating system related, depends on your exact GPU, and in the recent past (last few months) definitely driver related too (I found some drivers work for some tasks, while other drivers fail all tasks). So it is just down to getting the correct driver for the tasks (if you can). Otherwise the only choice is to change operating system. XP and Linux seem to work the best.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 17751 - Posted: 29 Jun 2010 | 9:32:58 UTC - in response to Message 17739.

This has nothing to do with the number of CPU cores. I have AQUA running as well, taking all my cores, and GPUGRID can still run. ACEMD2: GPU molecular dynamics runs fine here..

It is really a very unstable TONI_* or one of the other new WUs.

Better that GPUGRID has a look at this instability before they send out more of those WUs.



Barts,
these workunits seem to work just fine for us. Try the USB key (see the join link); this will allow you to run faster and leave your home system untouched.

gdf

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 17849 - Posted: 3 Jul 2010 | 11:11:21 UTC - in response to Message 17751.

If it starts running unstably while the PC is untouched (I was on holiday when this started to happen), then it can only be something in GPUGRID causing this. "Error while computing" as an error message does not give me any information, so maybe a GPUGRID member can investigate the real reason why the WUs have an error. If it is in my system, I will know what to fix; if it is in GPUGRID, they can fix it.

I don't see the point of running another OS especially for GPUGRID. Many other projects (e.g. MilkyWay) like my GPU as well....

So please come up with some real reasons why these errors happen, not just "try another OS".

slicedbread
Joined: 23 Jul 09
Posts: 2
Credit: 332,582
RAC: 0
Level

Message 18146 - Posted: 23 Jul 2010 | 15:18:51 UTC - in response to Message 17849.
Last modified: 23 Jul 2010 | 15:31:12 UTC

Try turning off TDR.

http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx


1. Make a text file called update.reg; make sure it does not end in a .txt extension.
2. Edit it and add these lines:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrLevel"=dword:00000000

3. Run update.reg and select yes when asked to update the registry.
4. Restart.
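
A minimal Python sketch of the steps above, for anyone who prefers to generate update.reg rather than type it. It assumes the version 5.00 header line that regedit expects at the top of a .reg file, and writes the file as UTF-16 with a byte-order mark, which to my knowledge is how regedit exports such files. The function names build_tdr_reg and write_reg are illustrative, not part of any tool. Disabling TDR is described by Microsoft as a testing setting, so treat this as an experiment rather than a permanent fix.

```python
def build_tdr_reg(tdr_level=0):
    """Return the text of a .reg file that sets TdrLevel to tdr_level."""
    lines = [
        "Windows Registry Editor Version 5.00",  # header regedit looks for
        "",
        r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]",
        '"TdrLevel"=dword:{:08x}'.format(tdr_level),
        "",
    ]
    return "\r\n".join(lines)


def write_reg(path, text):
    """Write the .reg text to path without altering the CRLF line endings."""
    with open(path, "w", encoding="utf-16", newline="") as handle:
        handle.write(text)


if __name__ == "__main__":
    # Only prints the file content; call write_reg("update.reg", ...) to save it.
    print(build_tdr_reg(0))
```

The generated file still has to be run and confirmed by hand, followed by a restart, exactly as in steps 3 and 4.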

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 18147 - Posted: 23 Jul 2010 | 16:58:21 UTC - in response to Message 18146.

Makes for interesting reading ... even though it says specifically to only use these reg keys for testing, I wonder if your suggestion of disabling detection and recovery would actually improve performance, because (hopefully) the OS will no longer be spending as many cycles watching what the GPU is doing?

slicedbread ... have you tried this yourself?
____________
Thanks - Steve

slicedbread
Joined: 23 Jul 09
Posts: 2
Credit: 332,582
RAC: 0
Level

Message 18150 - Posted: 23 Jul 2010 | 19:36:57 UTC - in response to Message 18147.

Yes, I've tried this because I had errors. It works on Windows 7.

Not sure if this will give you a performance boost. :/

bigtuna
Volunteer moderator
Joined: 6 May 10
Posts: 80
Credit: 98,784,188
RAC: 0
Level
Thr
Message 18153 - Posted: 24 Jul 2010 | 9:13:46 UTC - in response to Message 17849.

If it starts running unstably while the PC is untouched (I was on holiday when this started to happen), then it can only be something in GPUGRID causing this. "Error while computing" as an error message does not give me any information, so maybe a GPUGRID member can investigate the real reason why the WUs have an error. If it is in my system, I will know what to fix; if it is in GPUGRID, they can fix it.

This has effectively already been done. When a work unit fails, an identical task is automatically reissued to a different computer. Comparing your results to the results of others is an excellent troubleshooting technique. If a work unit fails on your system and also fails on other systems, the work unit is most likely "bad". OTOH if a work unit fails on your system but other volunteers complete the work unit without errors, the problem is most likely your system.
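
The comparison rule above can be written down as a small decision function. This is a minimal Python sketch; the function name classify_failure and the list-of-booleans input are assumptions made for illustration, and the actual results have to be read by hand from the work unit page on the project site.

```python
def classify_failure(failed_here, completed_elsewhere):
    """Apply the cross-host comparison rule to one work unit.

    failed_here: True if the task errored on this host.
    completed_elsewhere: one boolean per other host that received the
        same work unit, True if that host completed it without error.
    """
    if not failed_here:
        return "no failure on this host"
    if not completed_elsewhere:
        return "undetermined: no other host has reported yet"
    if any(completed_elsewhere):
        return "most likely this host"
    return "most likely a bad work unit"


if __name__ == "__main__":
    # Failed here, but two other hosts completed the same work unit.
    print(classify_failure(True, [True, True]))
    # Failed here and on every other host that tried it.
    print(classify_failure(True, [False, False, False, False, False]))
```

One errored task proves little on its own; the rule becomes useful when applied to a batch of failed work units, since a host-side problem shows up as failures on work that other hosts finish.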

I don't see the point of running another OS especially for GPUGRID. Many other projects (e.g. MilkyWay) like my GPU as well....

The point of running a different OS is to differentiate between hardware and software issues. That, and FatDog-64 is totally cool and easy (including the nVidia drivers, they install with a single click). If your system works perfectly with one OS and less than perfectly with a different OS, it is likely that there is some sort of software issue.

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 18220 - Posted: 1 Aug 2010 | 13:41:47 UTC - in response to Message 18153.

So you're asking me to throw away my current OS with my current programs solely for GPUGRID's sake. Too bad that most programs I use are not available for Linux.

OTOH, the system has been running without problems from the beginning. While no hardware changes and no software changes were made, only (and solely) GPUGRID started to run unstably. It is a pity that problems are pinned on the (volunteering) users. For the next batch of GPU tasks, can you print a message inside the BOINC message list saying WHY there is an "error in computing"?

There is a reason for the computation failing, GPUGRID is able to detect it, and it just says "Error in computing"... It would be handy if it gave the real reason for the failure instead of a phrase that does not mean anything to anyone.

"Workunit Corrupt", "NVIDIA Driver incompatible" or another such message would be at least a little handy.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18229 - Posted: 2 Aug 2010 | 10:16:46 UTC - in response to Message 18220.

Barts, more error info would probably help the scientists too.

GPUGrid has to use NVidia drivers, CUDA from NVidia and Boinc. If there is a problem with the drivers, a CUDA bug or an issue with Boinc it makes things difficult to trace and fix.

Differences in card designs also make it more difficult, so one GTX275 will work fine, but another fails tasks and the only difference seems to be the amount of RAM on the card. Under Win7 my Palit GTX260-216 worked, then started to fail more and more task types (no matter which driver I used); possibly a CUDA bug. When I installed XP it worked fine again and when I installed Linux it ran equally well.

You could dual boot the system with Linux, all you need is a Linux CD and some space on your existing drive or a USB stick.

I would first try the latest Boinc Beta version along with the latest drivers; the Boinc Beta says it fixed a CUDA leak so it might help.

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 18238 - Posted: 3 Aug 2010 | 13:25:51 UTC - in response to Message 18229.

I know all about being able to do dual boot, but it won't be more than just a test adding another 'PC' into my account with again another starting date etc.

I will give the beta boinc a try... meanwhile I just leave my OS as it is, my PC is not dedicated GPUGRID only, I use it for other things too

barts
Joined: 28 Aug 09
Posts: 12
Credit: 4,537,060
RAC: 0
Level
Ala
Message 18250 - Posted: 5 Aug 2010 | 18:30:09 UTC - in response to Message 18238.

The beta does not work either.

Milkyway = Running correctly - no failures
Collatz Conjecture = Running correctly - no failures
GPUGRID = Failing 85% of the WUs

For me 1+1=2... there must be something wrong in GPUGRID.

Back to the latest release version of BOINC. The beta does not show the message tab anymore, which makes it even harder to find out whether there are failures or not.

Does anyone have ideas for getting GPUGRID running again as it was before May 2010? (Before that time it was running correctly!! And no, it did not start failing because of changes in PC OS, SW or drivers, as it started failing during a holiday!!)

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Message 18251 - Posted: 5 Aug 2010 | 21:14:29 UTC - in response to Message 18250.

The beta does not work either.

Milkyway = Running correctly - no failures
Collatz Conjecture = Running correctly - no failures
GPUGRID = Failing 85% of the WUs

For me 1+1=2... there must be something wrong in GPUGRID.

Back to the latest release version of BOINC. The beta does not show the message tab anymore, which makes it even harder to find out whether there are failures or not.

Does anyone have ideas for getting GPUGRID running again as it was before May 2010? (Before that time it was running correctly!! And no, it did not start failing because of changes in PC OS, SW or drivers, as it started failing during a holiday!!)


I can understand your frustration, but if you take a look through the "Top Hosts" listing you can find lots of 275 cards that are returning error-free.

Not only that, but the very WUs that are erroring on your machine are completing successfully on others.

Maybe your card is starting to go bad? Milkyway and Collatz do not exercise your card as much as GPUGrid, so I don't think they are good bellwethers for determining a card's functionality/stability.

Have you tried running any of the standard GPU benchmark programs lately?
Furmark, OCCT, etc.
____________
Thanks - Steve

jjwhalen
Joined: 23 Nov 09
Posts: 29
Credit: 17,591,899
RAC: 0
Level
Pro
Message 18252 - Posted: 5 Aug 2010 | 21:15:36 UTC

In case anyone is tracking broken workunits, taskID 2778863, a TONI_CAPBIND, threw an unhandled exception after 1.01sec. I see it also crashed on (all 5) other hosts. The stderr looks very complete, including runtime debugger output.

This is the first WU crash I've had since upgrading to a GTX 465SC and figuring out what overclock was tolerable. The computerID is 57387.
____________

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18256 - Posted: 5 Aug 2010 | 23:46:43 UTC - in response to Message 18252.
Last modified: 5 Aug 2010 | 23:51:52 UTC

barts, the only way you are going to know for sure if your card is stuffed is if you try it on Linux or XP running GPUGrid tasks; a 7min task elsewhere will not tell you much.

jjwhalen,
6 Failures now, so it is a bad task/bug:

errors Too many errors (may have bug)

Profile KPX
Joined: 29 Sep 09
Posts: 5
Credit: 116,222,589
RAC: 0
Level
Cys
Message 18322 - Posted: 11 Aug 2010 | 15:45:16 UTC

I have this "Error while computing" problem as well. In my case, it seems GPUGrid is not detecting my graphics card... I thought installing the latest nVidia driver would fix this, but it didn't. Any idea what's wrong? I am posting the failed WU details, and the computer details below that:
-------------------------------------------------------------------------------
Name h232f99r168-TONI_CAPBINDsp2-72-100-RND1083_0
Workunit 1789399
Created 11 Aug 2010 5:21:12 UTC
Sent 11 Aug 2010 5:47:17 UTC
Received 11 Aug 2010 5:48:51 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -40 (0xffffffffffffffd8)
Computer ID 71984
Report deadline 16 Aug 2010 5:47:17 UTC
Run time 0
CPU time 0
stderr out

<core_client_version>6.10.57</core_client_version>
<![CDATA[
<message>
- exit code -40 (0xffffffd8)
</message>
<stderr_txt>
# Using device 0
# There is no device supporting CUDA.
# Device 0: "Device Emulation (CPU)"
# Clock rate: 1.35 GHz
# Total amount of global memory: -1 bytes
# Number of multiprocessors: 16
# Number of cores: 128
SWAN: FATAL : No device found

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 0
Granted credit 0
application version ACEMD2: GPU molecular dynamics v6.05 (cuda)

-------------------------------------------------------------------------------
CPU type GenuineIntel
Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz [Family 6 Model 23 Stepping 10]
Number of processors 4
Coprocessors NVIDIA GeForce GT 240 (474MB) driver: 25896
Operating System Microsoft Windows 7
Ultimate x64 Edition, (06.01.7600.00)
BOINC client version 6.10.57
Memory 4095.12 MB
Cache 6144 KB
Swap space 8188.38 MB
Total disk space 149.05 GB
Free Disk Space 101.51 GB
Measured floating point speed 2849.9 million ops/sec
Measured integer speed 8782.37 million ops/sec
Average upload rate 32.48 KB/sec
Average download rate 300.82 KB/sec
Average turnaround time 0.97 days
Maximum daily WU quota per CPU 1/day
Tasks 33
Number of times client has contacted server 286
Last time contacted server 11 Aug 2010 5:48:51 UTC
% of time BOINC client is running 99.9352 %
While BOINC running, % of time host has an Internet connection 100 %
While BOINC running, % of time work is allowed 99.9917 %
Task duration correction factor 2.510605

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18323 - Posted: 11 Aug 2010 | 16:24:11 UTC - in response to Message 18322.
Last modified: 11 Aug 2010 | 16:27:48 UTC

Your GT240 has 96 shaders and not 128, so the driver that is installed needs to be uninstalled.
Then restart in Safe Mode and install the correct driver.
After that restart again.

-Update Boinc while you are at it.
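
The check described above (comparing the core count in the task's stderr with what the card should have) can be sketched in Python. This assumes the stderr layout posted earlier in the thread; the function names and SAMPLE_STDERR are illustrative, and the expected count of 96 for a GT 240 is the figure given in this reply. The second helper shows why the same exit status appears as both -40 and 0xffffffd8.

```python
import re

# Copied from the failed task's stderr posted in this thread.
SAMPLE_STDERR = """\
# Using device 0
# There is no device supporting CUDA.
# Device 0: "Device Emulation (CPU)"
# Clock rate: 1.35 GHz
# Total amount of global memory: -1 bytes
# Number of multiprocessors: 16
# Number of cores: 128
SWAN: FATAL : No device found
"""


def parse_device_report(stderr_txt):
    """Extract the device name, core count and 'no CUDA device' flag."""
    cores = re.search(r"Number of cores:\s*(\d+)", stderr_txt)
    name = re.search(r'Device \d+: "([^"]*)"', stderr_txt)
    return {
        "no_cuda_device": "There is no device supporting CUDA" in stderr_txt,
        "device_name": name.group(1) if name else None,
        "cores": int(cores.group(1)) if cores else None,
    }


def cores_match(report, expected_cores):
    """True only if the reported core count equals the expected one."""
    return report["cores"] == expected_cores


def to_unsigned32(exit_code):
    """Show a negative exit code as the 32-bit value printed in hex."""
    return exit_code & 0xFFFFFFFF


if __name__ == "__main__":
    report = parse_device_report(SAMPLE_STDERR)
    print(report)
    print("matches GT 240 (96 cores):", cores_match(report, 96))
    print(hex(to_unsigned32(-40)))
```

A device name of "Device Emulation (CPU)" together with the "no device supporting CUDA" line suggests the application never saw the GPU at all, which fits the advice to reinstall the driver.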

Profile KPX
Joined: 29 Sep 09
Posts: 5
Credit: 116,222,589
RAC: 0
Level
Cys
Message 18334 - Posted: 13 Aug 2010 | 0:40:28 UTC - in response to Message 18323.

You are right, the number of shaders is detected incorrectly. But what do you mean by correct driver? I have installed the latest one from the nvidia website... why is that not correct?

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18337 - Posted: 13 Aug 2010 | 9:41:45 UTC - in response to Message 18334.
Last modified: 13 Aug 2010 | 9:46:03 UTC

I see you have not updated Boinc yet and still have 112 shaders.

Uninstall Boinc, restart, uninstall the present (Probably corrupt) driver, restart to Safe Mode. Install the latest (25896) driver. Restart, install Boinc and restart again before trying any tasks.

Terry
Joined: 9 Mar 09
Posts: 1
Credit: 42,239
RAC: 0
Level

Message 18384 - Posted: 22 Aug 2010 | 4:54:46 UTC - in response to Message 18337.

I'm getting computational errors now as well on my Win7 64-bit machine; I believe this just started. I'll let the project run a few more days, and if it continues I'll just drop the project. It's not worth the hassle for me to troubleshoot this, since these are home computers that I set up to run projects while not in use.

If you want additional information from the error messages, I'd be happy to post what I get.

Regards.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 18385 - Posted: 22 Aug 2010 | 15:58:40 UTC - in response to Message 18384.
Last modified: 22 Aug 2010 | 16:21:36 UTC

You have a G210M graphics card.
With only 16 shaders this card is not up to running GPUGRID tasks - even if it did not crash tasks it would probably take 4 days to complete.
You should stop trying to use it with GPUGRID as all your tasks are failing and the card is too slow to complete in a reasonable time.
It may be of some use to other GPU projects (SETI, Einstein, Folding@home, Collatz) but not all; it will not work on MilkyWay.

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 740,445,933
RAC: 45,306
Level
Lys
Message 18757 - Posted: 23 Sep 2010 | 1:37:35 UTC
Last modified: 23 Sep 2010 | 1:38:56 UTC

One idea on a possible cause for the errors: On my computer, they appear to happen only if all three of the following programs are running at once:


A GPUGRID workunit.

Norton Internet Security 2010, in full scan mode, especially if manually started in this mode. BOINC directories excluded from scanning.

Windows Live Mail version 2009 (Build 14.0.8117.0416) - the current version for 64-bit Vista; in newsgroups mode.


When the error occurs, many flashing dots appear on the screen - too many to read the screen well; and the GPUGRID workunit tries to restart but eventually fails.

How close is this combination to what others are running when they see failures?

9/21/2010 3:06:14 PM Starting BOINC client version 6.10.56 for windows_x86_64
9/21/2010 3:06:14 PM log flags: file_xfer, sched_ops, task
9/21/2010 3:06:14 PM Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
9/21/2010 3:06:14 PM Data directory: C:\ProgramData\BOINC
9/21/2010 3:06:14 PM Running under account Bobby
9/21/2010 3:06:16 PM Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz [Family 6 Model 23 Stepping 10]
9/21/2010 3:06:16 PM Processor: 6.00 MB cache
9/21/2010 3:06:16 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe
9/21/2010 3:06:16 PM OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
9/21/2010 3:06:16 PM Memory: 8.00 GB physical, 16.11 GB virtual
9/21/2010 3:06:16 PM Disk: 919.67 GB total, 723.13 GB free
9/21/2010 3:06:16 PM Local time is UTC -5 hours
9/21/2010 3:06:42 PM NVIDIA GPU 0: GeForce 9800 GT (driver version 19621, CUDA version 3000, compute capability 1.1, 1024MB, 336 GFLOPS peak)
9/21/2010 3:06:43 PM GPUGRID URL http://www.gpugrid.net/; Computer ID 48221; resource share 35

About a dozen other BOINC projects, but all other GPU-using projects disabled when the errors occurred.
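
For anyone comparing setups, the GPU line in a startup log like the one above carries most of what matters (driver version, CUDA version, compute capability, memory). Below is a minimal sketch of pulling those fields out with a regular expression. The pattern is modelled only on the single line quoted in this post, and the function name `parse_gpu_line` is made up for illustration; other BOINC versions may word the line differently.

```python
import re

# Pattern modelled on the one startup line quoted in this thread (assumption:
# other BOINC client versions may format this line differently).
GPU_LINE = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \("
    r"driver version (?P<driver>\d+), "
    r"CUDA version (?P<cuda>\d+), "
    r"compute capability (?P<cc>\d+\.\d+), "
    r"(?P<mem_mb>\d+)MB, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpu_line(line):
    """Return a dict of GPU fields, or None if the line does not match."""
    m = GPU_LINE.search(line)
    if m is None:
        return None
    return {
        "index": int(m.group("index")),
        "name": m.group("name"),
        "driver": m.group("driver"),  # kept as text, e.g. "19621"
        "cuda": int(m.group("cuda")),
        "compute_capability": tuple(int(p) for p in m.group("cc").split(".")),
        "mem_mb": int(m.group("mem_mb")),
        "gflops_peak": int(m.group("gflops")),
    }


line = (
    "9/21/2010 3:06:42 PM NVIDIA GPU 0: GeForce 9800 GT (driver version 19621, "
    "CUDA version 3000, compute capability 1.1, 1024MB, 336 GFLOPS peak)"
)
info = parse_gpu_line(line)
print(info)
```

Keeping the driver number as text avoids guessing how BOINC's five-digit form maps to NVIDIA's dotted version numbers.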

Speedy
Joined: 19 Aug 07
Posts: 42
Credit: 28,391,082
RAC: 0
Level
Val
Message 19102 - Posted: 29 Oct 2010 | 21:51:35 UTC

I had a task, p35-IBUCH_1_TRYP_101025-3-4-RND1655_0, fail after 4.38 hours with the following errors:
MDIO ERROR: cannot open file "restart.coor"
ERROR: file tclutil.cpp line 31: get_Dvec() element 0 (b)
called boinc_finish
I'm running Win7 64-bit, Boinc 6.10.58, with a GTX 470 and driver 260.89. Link to result 3205760. Exit status 98 (0x62)
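
The task pages show the exit status twice, as a decimal number and as hex in parentheses. A small sketch of how the two forms relate, assuming the hex is simply the two's-complement bit pattern of the signed code at a given width; the helper names `format_exit_status` and `hex_to_signed` are invented here and are not part of BOINC.

```python
def format_exit_status(code, bits=32):
    """Render a signed exit code as 'decimal (0xhex)' via two's complement."""
    mask = (1 << bits) - 1
    return f"{code} (0x{code & mask:x})"


def hex_to_signed(hex_text, bits=32):
    """Convert a hex pattern such as '0x62' back to a signed integer."""
    value = int(hex_text, 16) & ((1 << bits) - 1)
    # If the top bit is set, the pattern encodes a negative number.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


print(format_exit_status(98))
print(hex_to_signed("0x62"))
```

For a positive code like 98 the width makes no difference; it only matters for negative codes, where the same value prints as a longer run of f's at 64 bits than at 32.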

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 19105 - Posted: 30 Oct 2010 | 0:04:16 UTC - in response to Message 19102.

Update to 26099 from 26089 - different issue but you should still do it.

Don't know the reason for this specific IBUCH error; only one of the scientists could tell you (unless it is a driver issue).

You might want to read this thread, http://www.gpugrid.net/forum_thread.php?id=2123

GPU crunching is folly at times, better luck with your next task.

Profile Fred J. Verster
Joined: 1 Apr 09
Posts: 58
Credit: 35,833,978
RAC: 0
Level
Val
Message 19128 - Posted: 30 Oct 2010 | 18:52:50 UTC - in response to Message 19105.
Last modified: 30 Oct 2010 | 18:54:08 UTC

A lot of 'older' NVidia cards can sometimes be used with success.
IIRC, the requirements for the GPU are more demanding (FERMI, TESLA).
GTS250 didn't work in my rig. GT240,269-16,275,285,295 are OK, I heard, but Compute Cap. has to be 1.3 minimum, 2.0 recommended, and CUDA 3.1.
GTX480 failiar.

Probably already asked a thousand times: is there also an application for the ATI 5000 series, especially the 5850, 5870 and 5970 (2x5870)?

SETI, which will swap 2 servers (the BOINC database and its replica) to handle a (much) increased load, a kind of permanent DDoS attack, you could say, has proved that Distributed Computing works, without a doubt!

Now it's 'cracking' under its heavy (user) load: 1 million people, each using ~3 hosts. The exception doesn't prove the rule here.

And I'm pleased with the bonus added when you return a task within 12 or 24 hours? Will have a look :)
____________

Knight Who Says Ni N!

Profile KPX
Joined: 29 Sep 09
Posts: 5
Credit: 116,222,589
RAC: 0
Level
Cys
Message 20367 - Posted: 7 Feb 2011 | 18:09:12 UTC

I started getting a new error on my remotely accessed GTX 570. Any idea what might be causing it or how to fix it?

Name p15-IBUCH_7_mutEGFR_110124-14-20-RND5105_0
Workunit 2302680
Created 7 Feb 2011 13:42:46 UTC
Sent 7 Feb 2011 13:48:21 UTC
Received 7 Feb 2011 13:51:04 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -40 (0xffffffffffffffd8)
Computer ID 71052
Report deadline 12 Feb 2011 13:48:21 UTC
Run time 0
CPU time 0
stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code -40 (0xffffffd8)
</message>
<stderr_txt>
# Using device 0
# There are 3 devices supporting CUDA
# Device 0: "�"
# Clock rate: 0.00 GHz
# Total amount of global memory: 4475442 bytes
# Number of multiprocessors: 1615004
# Number of cores: 12920032
# Device 1: "�"
# Clock rate: 0.00 GHz
# Total amount of global memory: 4475442 bytes
# Number of multiprocessors: 1615004
# Number of cores: 12920032
# Device 2: "�"
# Clock rate: 0.00 GHz
# Total amount of global memory: 4475442 bytes
# Number of multiprocessors: 1615004
# Number of cores: 12920032
SWAN: FATAL : No device found

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 0
Granted credit 0
application version ACEMD2: GPU molecular dynamics v6.13 (cuda31)
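
The device block in that stderr output is clearly garbage (0.00 GHz clock, about 4 MB of memory, over a million multiprocessors), which suggests the application never saw a real CUDA device. Below is a rough sketch of flagging such values automatically. The thresholds and the function name `suspicious_fields` are illustrative assumptions, not anything taken from the GPUGRID application.

```python
import re


def suspicious_fields(stderr_text, max_multiprocessors=1024,
                      min_memory_bytes=64 * 1024 * 1024):
    """Return notes on device values that look like garbage.

    The default limits are arbitrary illustrative guesses (assumption).
    """
    checks = [
        (r"Clock rate: (\d+(?:\.\d+)?) GHz", float,
         lambda v: v <= 0.0, "clock rate is not positive"),
        (r"Total amount of global memory: (\d+) bytes", int,
         lambda v: v < min_memory_bytes, "global memory is implausibly small"),
        (r"Number of multiprocessors: (\d+)", int,
         lambda v: v > max_multiprocessors,
         "multiprocessor count is implausibly large"),
    ]
    notes = []
    for pattern, convert, is_bad, label in checks:
        for match in re.finditer(pattern, stderr_text):
            value = convert(match.group(1))
            if is_bad(value):
                notes.append(f"{label}: {value}")
    return notes


# One device block copied from the stderr output above (device name replaced).
sample = """\
# Device 0: "?"
# Clock rate: 0.00 GHz
# Total amount of global memory: 4475442 bytes
# Number of multiprocessors: 1615004
# Number of cores: 12920032
"""
for note in suspicious_fields(sample):
    print(note)
```

All three checks fire on this block, which fits the "SWAN: FATAL : No device found" line that follows it in the log.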

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level
Asp
Message 20368 - Posted: 7 Feb 2011 | 18:16:12 UTC - in response to Message 20367.

@KPX,

I think you have to download the new NVIDIA driver 266.58 and try again!

Good luck,


____________
Ton (ftpd) Netherlands

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1589
Credit: 6,560,769,351
RAC: 5,635,230
Level
Tyr
Message 20369 - Posted: 7 Feb 2011 | 19:32:35 UTC - in response to Message 20367.

I started getting a new error on my remotely accessed GTX 570. Any idea what might be causing it or how to fix it?

What do you mean by 'remotely accessed'? Both your machines here run Windows 7. If you use, in addition, the Windows "Remote Desktop" program, your tasks are bound to fail. That's not just GPUGrid, or BOINC GPU tasks in general, but all CUDA-based programs.

This is because of the new security model used for Windows 7 (and Vista) video drivers. When you use RDP, the NVidia driver - any version - is swapped out, and a Microsoft RDP driver, which is not CUDA compatible, is swapped in instead.

Other remote access products, such as VNC or LogMeIn, do not suffer from this drawback.
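
One quick way to check which kind of session you are logged into is a sketch like the one below. It relies on the assumption that Windows sets the `SESSIONNAME` environment variable to `Console` for a local logon and to something like `RDP-Tcp#0` for a Remote Desktop logon; the function name is made up for illustration. It only describes the session the script itself runs in, not what the BOINC client sees.

```python
import os


def looks_like_rdp_session(environ=None):
    """Guess whether the current logon is a Remote Desktop session.

    Assumption: SESSIONNAME starts with "RDP-" under Remote Desktop and is
    "Console" for a local logon. A missing variable is treated as "not RDP".
    """
    env = os.environ if environ is None else environ
    return env.get("SESSIONNAME", "").upper().startswith("RDP-")


if looks_like_rdp_session():
    print("This looks like a Remote Desktop session.")
else:
    print("No Remote Desktop session detected from SESSIONNAME.")
```

Passing a plain dict instead of the real environment makes the check easy to try out on any machine.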

Profile KPX
Joined: 29 Sep 09
Posts: 5
Credit: 116,222,589
RAC: 0
Level
Cys
Message 20370 - Posted: 7 Feb 2011 | 20:12:11 UTC - in response to Message 20369.
Last modified: 7 Feb 2011 | 20:12:58 UTC

Yes, I am accessing a Win7 computer over the internet from a Win7 computer. I am using LogMeIn and alternatively testing TeamViewer, as I am already aware of the Windows 7 Remote Desktop problem with CUDA. However, the units are still failing. Is there anything I need to do to disable some of the Windows services related to Remote Desktop, or is there anything to set up in LogMeIn or TeamViewer? Or do I need to forget both these easy programs and learn VNC (which is bloody complicated...)?
