barts
Joined: 28 Aug 09 Posts: 12 Credit: 4,537,060 RAC: 0
Almost all my GPUGRID WUs fail after 5 seconds with "Computation Error".
Boinc 6.10.56
wxWidgets 2.8.10
Nvidia GTX275 driver 8.17.11.9745
Some WUs still compute correctly.
What can this be? I have not updated anything recently.

skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
At least you are completing the odd task, but your problem is at least 16 days old, going by your tasks. You seem to complete tasks if they actually run for any length of time, but most fail after a few seconds, so you could be sitting idle for long periods (after too many failures)!
Maybe a different driver will work. You could try the most recent one, or perhaps a much older driver such as 195.xx.
A few weeks back I had the same problem with my GTX260 on Win 7 x64 (same as you). In the end I gave up and put it into an XP system! It now works fine.
The problem may be related to the reported RAM size on Win7 systems, and expected size by the app or Boinc. Yours is reported as,
NVIDIA GeForce GTX 275 (877MB) driver: 19745
- I'm guessing it actually has 896MB
So I would suggest you try the 257.21 driver released in the last day or so. If that fails try an older driver.
NVidia
Good luck,

barts
Gosh... I must have been on the NVidia site just a split second before this new driver came out... I updated the driver, but I am idle (reached the limit of 5 tasks per day), so I must wait a while to see if it makes a difference.
I will keep an eye on it for the coming days.
The best proof that I did not change a thing is that this problem started during my vacation. No automatic updates are carried out on my system, so I am pretty sure that it is not because of changes in my system.
Will see what happens with the new driver.

barts
Updated the driver... BTW I do not do any overclocking or the like.
These are the messages from the last WUs; the problem is still the same:
18/06/2010 07:13:56 GPUGRID Starting h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0
18/06/2010 07:13:56 GPUGRID Starting task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 using acemd2 version 605
18/06/2010 07:14:14 GPUGRID Computation for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 finished
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_1 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_2 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_3 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
18/06/2010 07:14:14 GPUGRID Starting h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0
18/06/2010 07:14:14 GPUGRID Starting task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 using acemd2 version 605
18/06/2010 07:14:15 GPUGRID Started upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_0
18/06/2010 07:14:15 GPUGRID Started upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_4
18/06/2010 07:14:16 GPUGRID Finished upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_0
18/06/2010 07:14:16 GPUGRID Finished upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_4
18/06/2010 07:14:16 GPUGRID Started upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_7
18/06/2010 07:14:17 GPUGRID Finished upload of h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_7
18/06/2010 07:14:32 GPUGRID Computation for task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 finished
18/06/2010 07:14:32 GPUGRID Output file h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_1 for task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 absent
18/06/2010 07:14:32 GPUGRID Output file h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_2 for task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 absent
18/06/2010 07:14:32 GPUGRID Output file h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_3 for task h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0 absent
18/06/2010 07:14:33 GPUGRID Started upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_0
18/06/2010 07:14:33 GPUGRID Started upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_4
18/06/2010 07:14:34 GPUGRID Finished upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_0
18/06/2010 07:14:34 GPUGRID Finished upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_4
18/06/2010 07:14:34 GPUGRID Started upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_7
18/06/2010 07:14:35 GPUGRID Finished upload of h232f99r116-TONI_CAPBINDsp2-4-100-RND6115_0_7
Any suggestions? (Meanwhile I will try an older driver.)
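For anyone who wants to summarize a pasted message log like the one above, here is a minimal Python sketch. It assumes the `dd/mm/yyyy hh:mm:ss PROJECT message` layout shown in this post (other posts in this thread use a different date format), and the function name `summarize` is just a placeholder, not part of BOINC.

```python
import re
from datetime import datetime

# Matches lines such as:
# 18/06/2010 07:13:56 GPUGRID Starting task <name> using acemd2 version 605
LINE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")

def summarize(log_text):
    """Return {task: {"seconds": run time or None, "absent": missing output files}}."""
    started, summary = {}, {}
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%d/%m/%Y %H:%M:%S")
        msg = m.group(3)
        if (s := re.match(r"Starting task (\S+) using", msg)):
            started[s.group(1)] = when
            summary.setdefault(s.group(1), {"seconds": None, "absent": 0})
        elif (f := re.match(r"Computation for task (\S+) finished", msg)):
            entry = summary.setdefault(f.group(1), {"seconds": None, "absent": 0})
            if f.group(1) in started:
                entry["seconds"] = (when - started[f.group(1)]).total_seconds()
        elif (a := re.match(r"Output file \S+ for task (\S+) absent", msg)):
            summary.setdefault(a.group(1), {"seconds": None, "absent": 0})["absent"] += 1
    return summary

sample = """\
18/06/2010 07:13:56 GPUGRID Starting task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 using acemd2 version 605
18/06/2010 07:14:14 GPUGRID Computation for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 finished
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_1 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
18/06/2010 07:14:14 GPUGRID Output file h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0_2 for task h232f99r83-TONI_CAPBINDsp2-7-100-RND6332_0 absent
"""
print(summarize(sample))
```

A task that "finishes" within seconds and has absent output files matches the failure pattern described in this post.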

barts
As expected, an older driver gives the same result as before.
So it is likely not the driver, but something else...
Suggestions still welcome.

skgiven (Volunteer moderator, Volunteer tester)
Use XP or Linux, if you can.

barts
Not possible...
Why does GPUGRID not make a more stable application?

Hi Barts
Can you look in Device Manager: right-click on your card, select Properties and then Details, and check whether the pull-down list has an entry called "Install Error" anywhere?
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0
I would encourage you to try Linux on a stick: http://www.gpugrid.net/forum_thread.php?id=2203

barts
Hi, no install errors in that driver section; however, I do see multiple entries {3ab22e31-8264-4b4e-9af5-a8d2d8e33e62}
with [1]..[17] and [25] behind them.
About Linux... uhm... I have nothing against Linux, although the support for Nvidia is 'difficult'.
So again... why is GPUGRID not making the application more stable? Mine was running OK, and without any driver updates it started going bad.

skgiven (Volunteer moderator, Volunteer tester)
I'm getting runaway errors for TONI_CAPBIND on my quad GT240 system. The other tasks work fine. Each TONI_CAPBIND fails after about 20 sec.
Vista Ult x64, driver 19621.
I have now stopped picking up new tasks; communication deferred for 7h.
I cannot change the operating system, it gets used too much.
Did a system restart. One task (TONI_HERG) is due to complete in about 90 min, so I will see if the restart made any difference.

skgiven (Volunteer moderator, Volunteer tester)
When I manually reported the finished TONI_HERG work unit, Boinc picked up 2 new tasks :) Fortunately they are TONI_KID work units and both have made it to 1% (about 7min).

barts
I would encourage GPUGRID to do a decent round of debugging on these tasks, which seem to be highly unstable, or to come out with a good, clear report on why these tasks fail. Then we could do something about it, and GPUGRID would not have all those failed tasks.

GPUGRID has an option to participate in testing of new software versions that have passed server-side testing but still need additional testing on a wider variety of computers. You may want to check if you have unknowingly set your account to participate.
Also, there is a BOINC CPU project you may want to avoid for now, since it does not seem fully compatible with GPUGRID: PrimeGrid.
A few comments on why one of my workunits failed:
6/27/2010 7:15:15 AM GPUGRID Computation for task D273r4-TONI_HERGunb1-59-100-RND6573_0 finished
6/27/2010 7:15:15 AM GPUGRID Output file D273r4-TONI_HERGunb1-59-100-RND6573_0_1 for task D273r4-TONI_HERGunb1-59-100-RND6573_0 absent
6/27/2010 7:15:15 AM GPUGRID Output file D273r4-TONI_HERGunb1-59-100-RND6573_0_2 for task D273r4-TONI_HERGunb1-59-100-RND6573_0 absent
6/27/2010 7:15:15 AM GPUGRID Output file D273r4-TONI_HERGunb1-59-100-RND6573_0_3 for task D273r4-TONI_HERGunb1-59-100-RND6573_0 absent
The failure happened just after I had enabled getting workunits from the PrimeGrid BOINC project, and got several workunits with the completion time overestimated enough that two of them went into high-priority mode immediately. A third CPU core was already running a workunit from The Lattice Project in high-priority mode. I had set BOINC not to use the fourth CPU core. It looks like the workunit was simply not able to recover from having all the CPU cores BOINC could use in high-priority mode at once.

barts
This has nothing to do with the number of CPU cores. I have AQUA running as well, taking all my cores, and GPUGRID can still run. ACEMD2: GPU molecular dynamics runs fine here.
It really is a very unstable TONI_* or one of the other new WUs.
GPUGRID had better have a look at this instability before they send out more of those WUs.

Hey barts ... I took a look at about 10 of your errored WUs, and what I noticed is that they are all different WU types and most of them have already been successfully completed by other computers (no multiple errors on different machines). Maybe, before claiming there is an unstable WU type, please double-check around a little before just throwing the blame blanket on GPUGrid.
Might I suggest a clean install of the driver? Uninstall, boot to safe mode (F8), run Driver Sweeper to clean up any old remnants, boot again to safe mode and install the driver you want to use. Now reboot one more time and see how it goes.
Do you have your BOINC directories excluded from AV scanning? Both the data and Program directories.
____________
Thanks - Steve

=Lupus=
Joined: 10 Nov 07 Posts: 10 Credit: 12,777,491 RAC: 0
I have noticed that in the last few days some TONI_CAPBINDs have been failing on my machine: 2 cancelled by server (OK, that's not an error), 3 with exit code 98, and one just finished OK... There seems to be a problem with them ^.^
BOINC_64_6.10.56 on Vista64,
"27.06.2010 19:00:33 NVIDIA GPU 0: GeForce GTX 260 (driver version 19107, CUDA version 2030, compute capability 1.3, 896MB, 582 GFLOPS peak)"

skgiven (Volunteer moderator, Volunteer tester)
Some of the work units must be different in some way that causes them to fail, usually after a few seconds. Some tasks just won’t run for me while others work fine. This is mostly the case on Vista and Win7, so it is operating system related, depends on your exact GPU, and in the recent past (last few months) definitely driver related too (I found some drivers work for some tasks, while other drivers fail all tasks). So it is just down to getting the correct driver for the tasks (if you can). Otherwise the only choice is to change operating system. XP and Linux seem to work the best.

GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
This has nothing to do with the number of CPU cores. I have AQUA running as well, taking all my cores, and GPUGRID can still run. ACEMD2: GPU molecular dynamics runs fine here.
It really is a very unstable TONI_* or one of the other new WUs.
GPUGRID had better have a look at this instability before they send out more of those WUs.
Barts,
these workunits seem to work just fine for us. Try the USB key (see the join link); this will allow you to run faster and leave your home system untouched.
gdf

barts
If it starts running unstable while the PC is untouched (I was on holiday when this started to happen), then it can only be something in GPUGRID causing this. "Error while computing" as an error message does not give me any information, so maybe a GPUGRID member can investigate the real reason why the WUs fail. If it is in my system, I will know what I can fix; if it is in GPUGRID, they can fix it.
I don't see the point of running another OS especially for GPUGRID. Many other projects (e.g. MilkyWay) like my GPU too...
So please come up with some real reasons why these errors happen, not just "try another OS".

Try turning off TDR.
http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
1. Make a text file and call it update.reg; make sure it does not end in .txt.
2. Edit it and add these lines (a .reg file must also begin with the header line "Windows Registry Editor Version 5.00"):
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrLevel"=dword:00000000
3. Run update.reg and select Yes when asked to update the registry.
4. Restart.
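The file described in the steps above could be generated with a short Python sketch like the one below. The function name `build_tdr_reg` is made up for illustration, and the sketch only builds the text; importing it into the registry remains the manual step described in the post. It assumes the standard "Windows Registry Editor Version 5.00" header line that regedit expects at the top of a .reg file.

```python
def build_tdr_reg(tdr_level=0):
    """Return the text of a .reg file that sets TdrLevel under GraphicsDrivers."""
    if not 0 <= tdr_level <= 0xFFFFFFFF:
        raise ValueError("TdrLevel must fit in a 32-bit DWORD")
    return "\r\n".join([
        "Windows Registry Editor Version 5.00",  # header line regedit expects
        "",
        r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]",
        f'"TdrLevel"=dword:{tdr_level:08x}',     # 0 turns timeout detection off
        "",
    ])

# Print the text so it can be reviewed before saving it as update.reg.
print(build_tdr_reg(0))
```

Keeping the output as plain text makes it easy to check the key and value before touching the registry, and to delete the value again later.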

Makes for interesting reading ... even though it says specifically to only use these reg keys for testing, I wonder if your suggestion of disabling detection and recovery would actually improve performance, because (hopefully) the OS will no longer be spending as many cycles watching what the GPU is doing?
slicedbread ... have you tried this yourself?
____________
Thanks - Steve

Yes, I've tried this because I had errors. It works on Windows 7.
Not sure if this will give you a performance boost. :/

bigtuna (Volunteer moderator)
Joined: 6 May 10 Posts: 80 Credit: 98,784,188 RAC: 0
If it starts running unstable while the PC is untouched (I was on holiday when this started to happen), then it can only be something in GPUGRID causing this. "Error while computing" as an error message does not give me any information, so maybe a GPUGRID member can investigate the real reason why the WUs fail. If it is in my system, I will know what I can fix; if it is in GPUGRID, they can fix it.
This has effectively already been done. When a work unit fails, an identical task is automatically reissued to a different computer. Comparing your results to the results of others is an excellent troubleshooting technique. If a work unit fails on your system and also fails on other systems, the work unit is most likely "bad". OTOH, if a work unit fails on your system but other volunteers complete it without errors, the problem is most likely your system.
I don't see the point of running another OS especially for GPUGRID. Many other projects (e.g. MilkyWay) like my GPU too...
The point of running a different OS is to differentiate between hardware and software issues. That, and FatDog-64 is totally cool and easy (including the nVidia drivers, they install with a single click). If your system works perfectly with one OS and less than perfectly with a different OS, it is likely that there is some sort of software issue.
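The cross-host comparison described above can be written down as a small decision rule. This is only a sketch of that reasoning in Python; the function name `diagnose` and the result strings are invented here, and the outcomes would have to be read off the work unit page by hand.

```python
def diagnose(my_result_ok, other_results_ok):
    """Classify a failure using the outcomes of the same work unit on other hosts.

    my_result_ok: True if the task completed on this host.
    other_results_ok: list of booleans, one per other host that ran the work unit.
    """
    if my_result_ok:
        return "no problem"
    if not other_results_ok:
        return "unknown: no other results yet"
    if any(other_results_ok):
        # Someone else completed the identical work unit without errors.
        return "likely this host"
    # Every host that tried it failed.
    return "likely a bad work unit"

print(diagnose(False, [True, True]))   # failed here, fine elsewhere
print(diagnose(False, [False] * 5))    # failed everywhere
```

A single successful result elsewhere is treated as enough to point at the local system, which mirrors the argument in the post; it says nothing about whether the cause is hardware, driver, or OS.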

barts
So you're asking me to throw away my current OS with my current programs solely for GPUGRID's sake. Too bad that most programs I use are not available for Linux.
OTOH, the system has been running without problems from the beginning. With no hardware changes AND no software changes made, only (and solely) GPUGRID started to run unstable. It is a pity that problems are pinned on the (volunteering) users. For the next batch of GPU tasks, can you print a message in the BOINC message list saying WHY there is an "error in computing"?
There is a reason the computation fails, GPUGRID is able to detect it, and it just says "Error in computing"... It would be handy if it gave the real reason for the failure instead of a phrase that does not mean anything to anyone.
"Workunit Corrupt", "NVIDIA Driver incompatible" or another such message would be at least a little handy.

skgiven (Volunteer moderator, Volunteer tester)
Barts, more error info would probably help the scientists too.
GPUGrid has to use NVidia drivers, CUDA from NVidia and Boinc. If there is a problem with the drivers, a CUDA bug or an issue with Boinc it makes things difficult to trace and fix.
Differences in card designs also make it more difficult, so one GTX275 will work fine, but another fails tasks and the only difference seems to be the amount of RAM on the card. Under Win7 my Palit GTX260-216 worked, then started to fail more and more task types (no matter which driver I used); possibly a CUDA bug. When I installed XP it worked fine again and when I installed Linux it ran equally well.
You could dual boot the system with Linux, all you need is a Linux CD and some space on your existing drive or a USB stick.
I would first try the latest Boinc Beta version along with the latest drivers; the Boinc Beta says it fixed a CUDA leak so it might help.

barts
I know all about being able to dual boot, but it won't be more than just a test, adding another 'PC' to my account with yet another starting date etc.
I will give the beta Boinc a try... meanwhile I will just leave my OS as it is; my PC is not dedicated to GPUGRID only, I use it for other things too.

barts
The beta does not work either.
Milkyway = Running correctly - no failures
Collatz Conjecture = Running correctly - no failures
GPUGRID = Failing 85% of the WUs
For me 1+1=2... there must be something wrong in GPUGRID.
Back to the latest release version of BOINC. The beta does not show the message tab anymore, which makes it even harder to find out whether there are failures or not.
Does anyone have options for getting GPUGRID running again as it was before May 2010 (before that time it was running correctly!! And no, it did not start failing because of changes in PC OS, SW or drivers, as it started failing during a holiday!!)?

The beta does not work either.
Milkyway = Running correctly - no failures
Collatz Conjecture = Running correctly - no failures
GPUGRID = Failing 85% of the WUs
For me 1+1=2... there must be something wrong in GPUGRID.
Back to the latest release version of BOINC. The beta does not show the message tab anymore, which makes it even harder to find out whether there are failures or not.
Does anyone have options for getting GPUGRID running again as it was before May 2010 (before that time it was running correctly!! And no, it did not start failing because of changes in PC OS, SW or drivers, as it started failing during a holiday!!)?
I can understand your frustration, but if you take a look through the "Top Hosts" listing you can find lots of 275 cards that are returning error free.
Not only that, but the very WUs that are erroring on your machine are completing successfully on others.
Maybe your card is starting to go bad? Milkyway and Collatz do not exercise your card as much as GPUGrid, so I don't think they are good bellwethers for determining a card's functionality/stability.
Have you tried running any of the standard GPU benchmark programs lately?
Furmark, OCCT, etc.
____________
Thanks - Steve

jjwhalen
Joined: 23 Nov 09 Posts: 29 Credit: 17,591,899 RAC: 0
In case anyone is tracking broken workunits, taskID 2778863, a TONI_CAPBIND, threw an unhandled exception after 1.01sec. I see it also crashed on (all 5) other hosts. The stderr looks very complete, including runtime debugger output.
This is the first WU crash I've had since upgrading to a GTX 465SC and figuring out what overclock was tolerable. The computerID is 57387.

skgiven (Volunteer moderator, Volunteer tester)
barts, the only way you are going to know for sure if your card is stuffed is if you try it on Linux or XP running GPUGrid tasks; a 7min task elsewhere will not tell you much.
jjwhalen,
6 Failures now, so it is a bad task/bug:
errors Too many errors (may have bug)

KPX
Joined: 29 Sep 09 Posts: 5 Credit: 116,222,589 RAC: 0
I have this "Error while computing" problem as well. In my case, it seems GPUGrid is not detecting my graphics card... I thought installing the latest nVidia driver would fix this, but it didn't. Any idea what's wrong? I am posting the failed WU details, and the computer details below that:
-------------------------------------------------------------------------------
Name h232f99r168-TONI_CAPBINDsp2-72-100-RND1083_0
Workunit 1789399
Created 11 Aug 2010 5:21:12 UTC
Sent 11 Aug 2010 5:47:17 UTC
Received 11 Aug 2010 5:48:51 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -40 (0xffffffffffffffd8)
Computer ID 71984
Report deadline 16 Aug 2010 5:47:17 UTC
Run time 0
CPU time 0
stderr out
<core_client_version>6.10.57</core_client_version>
<![CDATA[
<message>
- exit code -40 (0xffffffd8)
</message>
<stderr_txt>
# Using device 0
# There is no device supporting CUDA.
# Device 0: "Device Emulation (CPU)"
# Clock rate: 1.35 GHz
# Total amount of global memory: -1 bytes
# Number of multiprocessors: 16
# Number of cores: 128
SWAN: FATAL : No device found
</stderr_txt>
]]>
Validate state Invalid
Claimed credit 0
Granted credit 0
application version ACEMD2: GPU molecular dynamics v6.05 (cuda)
-------------------------------------------------------------------------------
CPU type GenuineIntel
Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz [Family 6 Model 23 Stepping 10]
Number of processors 4
Coprocessors NVIDIA GeForce GT 240 (474MB) driver: 25896
Operating System Microsoft Windows 7
Ultimate x64 Edition, (06.01.7600.00)
BOINC client version 6.10.57
Memory 4095.12 MB
Cache 6144 KB
Swap space 8188.38 MB
Total disk space 149.05 GB
Free Disk Space 101.51 GB
Measured floating point speed 2849.9 million ops/sec
Measured integer speed 8782.37 million ops/sec
Average upload rate 32.48 KB/sec
Average download rate 300.82 KB/sec
Average turnaround time 0.97 days
Maximum daily WU quota per CPU 1/day
Tasks 33
Number of times client has contacted server 286
Last time contacted server 11 Aug 2010 5:48:51 UTC
% of time BOINC client is running 99.9352 %
While BOINC running, % of time host has an Internet connection 100 %
While BOINC running, % of time work is allowed 99.9917 %
Task duration correction factor 2.510605

skgiven (Volunteer moderator, Volunteer tester)
Your GT240 has 96 shaders, not 128, so the driver that is installed needs to be uninstalled.
Then restart in Safe Mode and install the correct driver.
After that restart again.
- Update Boinc while you are at it.
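The warning signs in the stderr output posted above (CPU emulation instead of a GPU, a negative memory size, the fatal "No device found") can be picked out mechanically. Below is a rough Python sketch that scans a pasted `<stderr_txt>` section for exactly those strings; `check_stderr` is a hypothetical helper, and the strings are taken from this one task report, so other application versions may word things differently.

```python
import re

def check_stderr(stderr_txt):
    """Return a list of warning strings for a pasted <stderr_txt> section."""
    warnings = []
    if "There is no device supporting CUDA" in stderr_txt:
        warnings.append("CUDA runtime reports no usable device")
    if "Device Emulation (CPU)" in stderr_txt:
        warnings.append("device 0 is the CPU emulation fallback, not a GPU")
    mem = re.search(r"Total amount of global memory:\s*(-?\d+) bytes", stderr_txt)
    if mem and int(mem.group(1)) <= 0:
        warnings.append("reported global memory is not positive")
    if "SWAN: FATAL : No device found" in stderr_txt:
        warnings.append("application aborted: no device found")
    return warnings

sample = """\
# Using device 0
# There is no device supporting CUDA.
# Device 0: "Device Emulation (CPU)"
# Clock rate: 1.35 GHz
# Total amount of global memory: -1 bytes
# Number of multiprocessors: 16
# Number of cores: 128
SWAN: FATAL : No device found
"""
for line in check_stderr(sample):
    print(line)
```

All four checks fire on the sample, which is consistent with the advice above: the application never saw a real CUDA device, so the driver installation is the first thing to redo.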

KPX
You are right, the number of shaders is detected incorrectly. But what do you mean by correct driver? I have installed the latest one from the nvidia website... why is that not correct?

skgiven (Volunteer moderator, Volunteer tester)
I see you have not updated Boinc yet and still have 112 shaders.
Uninstall Boinc, restart, uninstall the present (probably corrupt) driver, restart to Safe Mode. Install the latest (25896) driver. Restart, install Boinc and restart again before trying any tasks.

Terry
Joined: 9 Mar 09 Posts: 1 Credit: 42,239 RAC: 0
I'm getting computational errors now as well on my Win7 64-bit machine; I believe this just started. I'll let the project run a few more days and if it continues then I'll just drop the project. It's not worth the hassle for me to troubleshoot this since these are home computers that I set up to run projects while not in use.
If you want me to provide additional information about the error, I'd be happy to post what I get.
Regards.

skgiven (Volunteer moderator, Volunteer tester)
You have a G210M graphics card.
With only 16 shaders this card is not up to running GPUGRID tasks - even if it did not crash tasks it would probably take 4 days to complete.
You should stop trying to use it with GPUGRID as all your tasks are failing and the card is too slow to complete in a reasonable time.
It may be of some use to other GPU projects (SETI, Einstein, Folding@home, Collatz) but not all; it will not work on MilkyWay.

One idea on a possible cause for the errors: On my computer, they appear to happen only if all three of the following programs are running at once:
A GPUGRID workunit.
Norton Internet Security 2010, in full scan mode, especially if manually started in this mode. BOINC directories excluded from scanning.
Windows Live Mail version 2009 (Build 14.0.8117.0416) - the current version for 64-bit Vista; in newsgroups mode.
When the error occurs, many flashing dots appear on the screen - too many to read the screen well; and the GPUGRID workunit tries to restart but eventually fails.
How close is this combination to what others are running when they see failures?
9/21/2010 3:06:14 PM Starting BOINC client version 6.10.56 for windows_x86_64
9/21/2010 3:06:14 PM log flags: file_xfer, sched_ops, task
9/21/2010 3:06:14 PM Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
9/21/2010 3:06:14 PM Data directory: C:\ProgramData\BOINC
9/21/2010 3:06:14 PM Running under account Bobby
9/21/2010 3:06:16 PM Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz [Family 6 Model 23 Stepping 10]
9/21/2010 3:06:16 PM Processor: 6.00 MB cache
9/21/2010 3:06:16 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe
9/21/2010 3:06:16 PM OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
9/21/2010 3:06:16 PM Memory: 8.00 GB physical, 16.11 GB virtual
9/21/2010 3:06:16 PM Disk: 919.67 GB total, 723.13 GB free
9/21/2010 3:06:16 PM Local time is UTC -5 hours
9/21/2010 3:06:42 PM NVIDIA GPU 0: GeForce 9800 GT (driver version 19621, CUDA version 3000, compute capability 1.1, 1024MB, 336 GFLOPS peak)
9/21/2010 3:06:43 PM GPUGRID URL http://www.gpugrid.net/; Computer ID 48221; resource share 35
About a dozen other BOINC projects are attached, but all other GPU-using projects were disabled when the errors occurred.

Speedy
Joined: 19 Aug 07 Posts: 43 Credit: 38,741,082 RAC: 700,964
I had a task, p35-IBUCH_1_TRYP_101025-3-4-RND1655_0, fail after 4.38 hours with the following errors:
MDIO ERROR: cannot open file "restart.coor"
ERROR: file tclutil.cpp line 31: get_Dvec() element 0 (b)
called boinc_finish
I'm running Win7 64-bit, Boinc 6.10.58, with a GTX 470 on driver 260.89. Link to result 3205760. Exit status 98 (0x62).

skgiven (Volunteer moderator, Volunteer tester)
Update to 26099 from 26089 - different issue but you should still do it.
I don't know the reason for this specific IBUCH error; only one of the scientists could tell you (unless it is a driver issue).
You might want to read this thread, http://www.gpugrid.net/forum_thread.php?id=2123
GPU crunching is folly at times, better luck with your next task.

A lot of 'older' NVidia cards can sometimes be used with success.
IIRC, the requirements for the GPU are more demanding: FERMI, TESLA.
GTS250 didn't work in my rig. GT240, 269-16, 275, 285, 295 are OK, I heard, but Compute Cap. has to be 1.3 minimum, 2.0 recommended, and CUDA 3.1.
GTX480 failiar.
Probably already asked a thousand times: is there also an application for the
ATI 5000 series, especially the 5850, 5870 and 5970 (2x5870)?
SETI, which will swap 2 servers, the BOINC database and its replica, to handle a (much) increased load, a kind of permanent DDoS attack, you could say, has proved that Distributed Computing works, without a doubt!
Now it's 'cracking' under its heavy (user) load: 1 million people, each using ~3
hosts; the exception doesn't prove the rule here.
And I'm pleased with the bonus added when you return a task within 12 or 24 hours? Will have a look :)
____________
Knight Who Says Ni N!

KPX
I started getting a new error on my remotely accessed GTX 570. Any idea what might be causing it or how to fix it?
Name p15-IBUCH_7_mutEGFR_110124-14-20-RND5105_0
Workunit 2302680
Created 7 Feb 2011 13:42:46 UTC
Sent 7 Feb 2011 13:48:21 UTC
Received 7 Feb 2011 13:51:04 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -40 (0xffffffffffffffd8)
Computer ID 71052
Report deadline 12 Feb 2011 13:48:21 UTC
Run time 0
CPU time 0
stderr out
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code -40 (0xffffffd8)
</message>
<stderr_txt>
# Using device 0
# There are 3 devices supporting CUDA
# Device 0: "�"
# Clock rate: 0.00 GHz
# Total amount of global memory: 4475442 bytes
# Number of multiprocessors: 1615004
# Number of cores: 12920032
# Device 1: "�"
# Clock rate: 0.00 GHz
# Total amount of global memory: 4475442 bytes
# Number of multiprocessors: 1615004
# Number of cores: 12920032
# Device 2: "�"
# Clock rate: 0.00 GHz
# Total amount of global memory: 4475442 bytes
# Number of multiprocessors: 1615004
# Number of cores: 12920032
SWAN: FATAL : No device found
</stderr_txt>
]]>
Validate state Invalid
Claimed credit 0
Granted credit 0
application version ACEMD2: GPU molecular dynamics v6.13 (cuda31)
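A side note on reading the report above: the exit status is shown once as -40 and twice as a hex number (0xffffffd8 and 0xffffffffffffffd8). These are the same value, written as a two's-complement integer at 32 and 64 bits. The Python sketch below shows the conversion in both directions; the helper names are arbitrary.

```python
def as_unsigned(code, bits=32):
    """Two's-complement view of a signed exit code, as a hex string."""
    return format(code & ((1 << bits) - 1), "#x")

def as_signed(value, bits=32):
    """Inverse: interpret an unsigned value as a signed integer."""
    return value - (1 << bits) if value >= 1 << (bits - 1) else value

print(as_unsigned(-40))           # 0xffffffd8
print(as_unsigned(-40, bits=64))  # 0xffffffffffffffd8
print(as_signed(0xFFFFFFD8))      # -40
print(as_unsigned(98))            # 0x62
```

So "exit code -40 (0xffffffd8)" carries no extra information beyond the -40 itself, and the positive exit status 98 reported elsewhere in this thread is simply 0x62.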

ftpd
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
@KPX,
I think you have to download the new NVidia driver 266.58 and try again!
Good luck,
____________
Ton (ftpd) Netherlands

I started getting a new error on my remotely accessed GTX 570. Any idea what might be causing it or how to fix it?
What do you mean by 'remotely accessed'? Both your machines here run Windows 7. If you use, in addition, the Windows "Remote Desktop" program, your tasks are bound to fail. That's not just GPUGrid, or BOINC GPU tasks in general, but all CUDA-based programs.
This is because of the new security model used for Windows 7 (and Vista) video drivers. When you use RDP, the NVidia driver - any version - is swapped out, and a Microsoft RDP driver, which is not CUDA compatible, is swapped in its place.
Other remote access products, such as VNC or LogMeIn, do not suffer from this drawback.
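One rough way to tell whether a given login is a Remote Desktop session is to look at the `SESSIONNAME` environment variable. The Python sketch below assumes the usual Windows behaviour where that variable reads `Console` for a local login and starts with `RDP-` for a Remote Desktop session; that is an assumption about the environment, not something stated in this thread, and `looks_like_rdp` is a made-up name.

```python
import os

def looks_like_rdp(env=None):
    """Guess whether the current login is a Remote Desktop session.

    Assumes SESSIONNAME is 'Console' locally and 'RDP-...' over RDP.
    """
    env = os.environ if env is None else env
    return env.get("SESSIONNAME", "").upper().startswith("RDP-")

print(looks_like_rdp({"SESSIONNAME": "RDP-Tcp#0"}))  # True
print(looks_like_rdp({"SESSIONNAME": "Console"}))    # False
```

A check like this only describes the session it runs in; it does not show which display driver is currently loaded, so it is a hint rather than proof.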

KPX
Yes, I am accessing a Win7 computer over the internet from a Win7 computer. I am using LogMeIn and alternatively testing TeamViewer, as I am already aware of the Windows 7 Remote Desktop problem with CUDA. However, the units are still failing. Is there anything I need to do to disable some of the Windows services related to Remote Desktop, or is there anything to set up in LogMeIn or TeamViewer? Or do I need to forget both these easy programs and learn VNC (which is bloody complicated...)?