Message boards : Graphics cards (GPUs) : All tasks erroring out

Mark Henderson
Joined: 21 Dec 08
Posts: 51
Credit: 26,320,167
RAC: 0
Message 22199 - Posted: 2 Oct 2011 | 0:54:15 UTC

Could someone look at my tasks, please? They error out as soon as they try to start. I ran GPUGrid fine about 4 months ago, but I have just started again and it will not run anymore.
Thanks

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 22201 - Posted: 2 Oct 2011 | 19:08:24 UTC - in response to Message 22199.

Are you using SLI, by any chance?

When you can pick up new tasks, do a system restart, make sure you are using default clocks and that the GPU temps are fine, suspend other tasks, and see if you can get a task to pass 10 seconds.

The system cannot find the path specified. (0x3) - exit code 3 (0x3)

SWAN: FATAL : swanMemcpyDtoH failed

Assertion failed: 0, file swanlib_nv.c, line 390

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Mark Henderson
Message 22211 - Posted: 4 Oct 2011 | 1:36:58 UTC

Problem solved. I took out card 2 and it ran fine with just the one card. I put card 2 back in and it works now??

skgiven
Message 22212 - Posted: 4 Oct 2011 | 10:09:32 UTC - in response to Message 22211.

I have seen this odd behavior before. Re-seating a card sometimes helps. Might have been a physical connection issue or a driver problem.

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 22284 - Posted: 16 Oct 2011 | 16:13:14 UTC - in response to Message 22201.

Are you using SLI, by any chance?

When you can pick up new tasks, do a system restart, make sure you are using default clocks and that the GPU temps are fine, suspend other tasks, and see if you can get a task to pass 10 seconds.

The system cannot find the path specified. (0x3) - exit code 3 (0x3)

SWAN: FATAL : swanMemcpyDtoH failed

Assertion failed: 0, file swanlib_nv.c, line 390

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

I have a similar problem on my two i7 Win 7 x64 systems. Both have one ATI HD6970 (main LCD) and two GTX570s (dummy plugs), not SLI. They are now on the 280.26 driver, but this error happened a lot on previous versions too.

I get several different errors trying GPUgrid with these. I have tried both short and long runs.

The GPUs are at factory clock speeds, but I have even tried underclocking both processor and memory by 10%, and the errors continue.

Everything else runs fine, including other projects. Temps are good, now down to 45-54C. Even in the summer, when the cards ran hotter, other projects had no problems.

During my last batch of tests, each system somehow ran one acemd2 task to completion and validation; otherwise they all fail within a few seconds. I can't figure out anything I did at the time to make that happen, and it happened at a different time on each system. I've even watched GPU usage when a task starts: it doesn't get much above 1%, so I'm mostly convinced it is not a clock or temperature issue. I'm more convinced this is some software conflict or setup issue, but I can't figure out what, and nothing seems to help.

Most of the errors are exit code 3, but I've seen 1 and 193 too.

<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>

So the big question is: what path is this application looking for? It would be helpful if the error printed what it was looking for and what it found; then it might be easier to debug.
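One hedged reading of the question above is that there may be no missing path at all. The numbers reported are process exit codes, and the text printed next to them looks like the Windows system message for the error with the same number. A failed assertion in a program built with the Microsoft C runtime ends in abort(), which exits with code 3, and system error 3 happens to be ERROR_PATH_NOT_FOUND. The sketch below maps the three codes seen in this thread to their standard Win32 names. The helper name `describe_exit_code` is made up for illustration, and whether BOINC builds its message exactly this way is an assumption.

```python
# Standard Win32 system error codes matching the exit codes seen in this thread.
WIN32_ERRORS = {
    1: ("ERROR_INVALID_FUNCTION", "Incorrect function."),
    3: ("ERROR_PATH_NOT_FOUND", "The system cannot find the path specified."),
    193: ("ERROR_BAD_EXE_FORMAT", "%1 is not a valid Win32 application."),
}

# The Microsoft C runtime's abort() terminates the process with exit code 3.
# Assumption: the science application reaches abort() via its failed assertion.
ABORT_EXIT_CODE = 3


def describe_exit_code(code: int) -> str:
    """Return a one-line description of an exit code (hypothetical helper)."""
    name, text = WIN32_ERRORS.get(code, ("UNKNOWN", "no entry in this small table"))
    note = ""
    if code == ABORT_EXIT_CODE:
        note = " (also the exit code of abort() after a failed assertion)"
    return f"exit code {code} (0x{code:x}): {name}: {text}{note}"


if __name__ == "__main__":
    for code in (3, 1, 193):
        print(describe_exit_code(code))
```

If this reading is right, the useful clue in the task output is the assertion and the SWAN: FATAL line in stderr, not the "path" message.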

Krunchin-Keith [USA]
Message 22309 - Posted: 20 Oct 2011 | 23:35:34 UTC

Based on some suggestions by skgiven, I tried some things.

First I backed up 6.13.1, then uninstalled it and deleted the folder.

With a fresh copy of 6.12.34 installed and attached only to GPUgrid, so there could be no interference from any other project, I got an immediate failure. I first tried the current location, which is on my B drive. After that I uninstalled it and installed 6.12.34 to the C drive in the default locations; that failed too. 6.12.34 runs GPUgrid on 3 other computers, although there it is the x32 version, not the x64 version. Next I installed the x32 version and attached only to GPUgrid; that too produced an error immediately.

It is something else, but I still wonder what the application error is looking for. What path?

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using device 0
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 570"
# Clock rate: 1.59 GHz
# Total amount of global memory: 1309212672 bytes
# Number of multiprocessors: 15
# Number of cores: 120
# Device 1: "GeForce GTX 570"
# Clock rate: 1.59 GHz
# Total amount of global memory: 1309212672 bytes
# Number of multiprocessors: 15
# Number of cores: 120
MDIO: cannot open file "restart.coor"
SWAN: FATAL : swanMemcpyDtoH failed

Assertion failed: 0, file swanlib_nv.c, line 390

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>
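When comparing many failed tasks, it can help to pull the same few fields out of each log instead of reading them by eye. The following is a minimal sketch, assuming the log has the layout pasted above; the function name `summarize_task_log` and the trimmed `SAMPLE_LOG` are illustrative and not part of BOINC or GPUGrid.

```python
import re

# Trimmed copy of the task output above, used only as sample input.
SAMPLE_LOG = """\
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using device 0
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 570"
# Device 1: "GeForce GTX 570"
MDIO: cannot open file "restart.coor"
SWAN: FATAL : swanMemcpyDtoH failed

Assertion failed: 0, file swanlib_nv.c, line 390
</stderr_txt>
"""


def summarize_task_log(log: str) -> dict:
    """Extract exit code, CUDA devices, fatal messages and failed assertions."""
    exit_match = re.search(r"exit code (\d+)", log)
    devices = re.findall(r'^# Device (\d+): "([^"]+)"', log, flags=re.MULTILINE)
    fatal = re.findall(r"^SWAN: FATAL : (.+)$", log, flags=re.MULTILINE)
    asserts = re.findall(
        r"^Assertion failed: .*?file (\S+), line (\d+)", log, flags=re.MULTILINE
    )
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "devices": [(int(index), name) for index, name in devices],
        "fatal": fatal,
        "assertions": [(filename, int(line)) for filename, line in asserts],
    }


if __name__ == "__main__":
    for key, value in summarize_task_log(SAMPLE_LOG).items():
        print(f"{key}: {value}")
```

Judging by its name, swanMemcpyDtoH is a device-to-host memory copy, so the fatal line points at data coming back from the GPU rather than at a file path.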

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 22313 - Posted: 21 Oct 2011 | 9:39:27 UTC
Last modified: 21 Oct 2011 | 9:43:07 UTC

Krunchin-Keith, are these the GTX570 + ATI card machines? I know at one point NVIDIA put something into their drivers to disable the ATI driver.

Can you try them as single GTX570 card machines and see how they go? Maybe remove the ATI card, uninstall the ATI driver, run Driver Sweeper (or equivalent), and then do a clean install of the NVIDIA driver before trying GPUgrid again.

Make sure you didn't install BOINC in PAE (protected application execution, also known as service) mode. Win 7 and Vista don't allow the graphics drivers to be referenced by a service.

Lastly, you did install the x64 drivers, didn't you? The BOINC version doesn't matter so much, but the drivers may.

Krunchin-Keith [USA]
Message 22315 - Posted: 21 Oct 2011 | 17:09:29 UTC - in response to Message 22313.

Krunchin-Keith, are these the GTX570 + ATI card machines? I know at one point NVIDIA put something into their drivers to disable the ATI driver.

I don't quite understand that. The ATI runs fine and the two GTX570s run fine with other projects, both together.

Can you try them as single GTX570 card machines and see how they go? Maybe remove the ATI card, uninstall the ATI driver, run Driver Sweeper (or equivalent), and then do a clean install of the NVIDIA driver before trying GPUgrid again.

Not at this time. I have run them with one ATI and one GTX570, but I forget the results; I was working on another problem at the time.

I also have a teammate running pretty much the same configuration and GPU models with some version of Windows 7. I don't know all the other details of his system, but he does run GPUgrid OK.

I can't pull them apart now, but at some point in the spring I plan to change them to liquid cooling, for which I have to pull everything out. I may try more tests with a single GPU then, if I cannot find any other solution.

Since I have plenty of other projects that run fine, GPUgrid just has to suffer with not much work output from me.

Make sure you didn't install BOINC in PAE (protected application execution, also known as service) mode. Win 7 and Vista don't allow the graphics drivers to be referenced by a service.

I never do that. I've been doing this for 7 years now; I know about that.

Lastly, you did install the x64 drivers, didn't you? The BOINC version doesn't matter so much, but the drivers may.

Yes, all drivers are x64, no question. I check carefully when downloading.

It would be nice to know what this application error means.

skgiven
Message 22322 - Posted: 21 Oct 2011 | 23:02:25 UTC - in response to Message 22315.

Both ATI and NVidia cards can dynamically allocate themselves system memory. If these allocations overlap (and no doubt ATI and NVidia did not sit down together to make sure this doesn't happen), one card could be corrupting the data of the other.

Did you try disabling the ATI/AMD GPU, rebooting and running GPUGrid tasks?

I have not tried this under W7, but you can set up hardware profiles (which you can select from during boot).

In the spring, when you are rebuilding your systems, you might want to do a clean install of the OS and use Comodo Programs Manager. It's a tool that tracks installations and allows you to fully uninstall them. Very useful with video drivers.

GL

Krunchin-Keith [USA]
Message 22495 - Posted: 10 Nov 2011 | 12:19:12 UTC

I'm back !!!

After many trials I think I have found a solution to this.

Since these two systems were new builds, I had not played with any ROM settings yet, leaving them at the factory settings. After lots of failures, while doing something else unrelated yesterday, I decided to look into the ROM settings on one unit. Almost all the voltage-type settings were set to "Auto". I changed all of these to "Standard" or "Normal" to keep them stable, to keep them from using any of the level 1 or level 2 savings levels, and to be sure none were overclocked. I also found one other thing: the PCIE Frequency on #2 was set to 95MHz, and the ROM said 100MHz was the standard, so I changed that too. Later I found that on #1 it was set to Auto, so I set #1 the same way: voltages to normal and the PCIE Frequency to 100MHz. This may explain why one system, in the past, trashed more work than the other: its frequency might have been further off standard than the other's.

After reboots and waiting for BOINC to load, I re-enabled GPUgrid on #2. #1 was still active but had only been completing 1 task out of 12; all the others errored within the first 12 seconds.

Both systems have one ATI HD6970 GPU and two GTX570 GPUs.

#2 got 4 tasks and began to run 2 on the GTX570s. To my surprise, both passed the 12-second mark and kept going. After a while, 1 finished successfully and a 3rd started, also getting past 12 seconds.

By bedtime #1 had retrieved work and was running 2 long runs, having passed the error point.

So as of this morning BOTH systems have had NO Errors on GPUgrid.

#1 has completed (valid and credited) 2 long runs and is running 2 more.

#2 has completed (valid and credited) 4 short and 1 long and is running 2 more long.

This represents the longest consecutive run WITHOUT error for either host. Long runs are taking 6.3 to 7.3 hours.

It is not a driver conflict between ATI and NVIDIA, nor a memory problem, nor a GPU overclocking issue. I'm still at the standard GPU speeds for these factory-overclocked models, and even underclocking them by 10% did not help. Only rarely did a task run and complete at the standard stock overclock speed.

Different BOINC versions made no difference during testing, even with a fresh install. Running one project only made no difference either, on several BOINC versions, so it is not a project conflict.

I'm now on ATI Catalyst 11.9 and NVIDIA 285.62, but during all this I had many other combinations, starting at about 11.4 or 270.XX. Those never made such a difference as the last change to voltages/frequency.

I will keep monitoring this to see if the problem returns. It will be a while before I go back into the ROM and make any changes, so I'm not 100% sure whether it was a particular voltage for CPU, memory or GPU, or the PCIE frequency.

At least for now I'm back in business and producing valid work here at one of my favorite valued projects.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 851
Message 22497 - Posted: 10 Nov 2011 | 14:42:10 UTC - in response to Message 22495.
Last modified: 10 Nov 2011 | 14:43:40 UTC

PCIe does not tolerate frequency changes (it runs at 100MHz). Do not over/underclock the PCIe bus. Doing so will decrease the stability of the PCIe bus, which is crucial for GPUGrid tasks running for hours. I've posted similar advice here.
Unlike CPU or GPU core / shader clocks, the PCIe bus has a strict standard for its speed. Even lowering the PCIe clock will make it unstable. It's quite misleading that some BIOSes allow the PCIe clock to be changed. It's very dangerous to do that.
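To put a number on how far 95MHz is from the standard, the sketch below expresses a PCIe reference clock setting as a deviation in parts per million. The tolerance of 300 ppm is the figure commonly cited for the 100MHz PCIe reference clock and is treated here as an assumption; the function names are illustrative.

```python
NOMINAL_MHZ = 100.0
# Assumption: commonly cited accuracy limit for the PCIe reference clock.
TOLERANCE_PPM = 300.0


def deviation_ppm(measured_mhz: float, nominal_mhz: float = NOMINAL_MHZ) -> float:
    """Deviation of a clock from nominal, in parts per million."""
    return (measured_mhz - nominal_mhz) * 1_000_000 / nominal_mhz


def within_tolerance(measured_mhz: float, tolerance_ppm: float = TOLERANCE_PPM) -> bool:
    """True if the clock is inside the assumed tolerance band."""
    return abs(deviation_ppm(measured_mhz)) <= tolerance_ppm


if __name__ == "__main__":
    for mhz in (100.0, 95.0):
        print(f"{mhz} MHz: {deviation_ppm(mhz):+.0f} ppm, "
              f"within tolerance: {within_tolerance(mhz)}")
```

Under that assumption, a 95MHz setting is 5% low, which is 50,000 ppm and far outside the band.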

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 22499 - Posted: 10 Nov 2011 | 17:21:03 UTC - in response to Message 22497.
Last modified: 10 Nov 2011 | 17:21:45 UTC

That's a nice find, KKeith. Maybe it should be added to the FAQs. Thanks for reporting.

Krunchin-Keith [USA]
Message 22500 - Posted: 10 Nov 2011 | 17:38:40 UTC - in response to Message 22497.

PCIe does not tolerate frequency changes (it runs at 100MHz). Do not over/underclock the PCIe bus. Doing so will decrease the stability of the PCIe bus, which is crucial for GPUGrid tasks running for hours. I've posted similar advice here.
Unlike CPU or GPU core / shader clocks, the PCIe bus has a strict standard for its speed. Even lowering the PCIe clock will make it unstable. It's quite misleading that some BIOSes allow the PCIe clock to be changed. It's very dangerous to do that.

I did not originally do any under/overclocking on the PCIe bus or anything else in the ROM, not even any voltage changes or tweaking. The settings I found yesterday were how it was set when I got the mobos. I set them back to the 100MHz that the ROM said was the standard. I have no idea why one was set to 95MHz and the other was on Auto. It was not done by me or anyone at my house, as no one else has access to my systems. Now both are set to 100MHz.

Retvari Zoltan
Message 22504 - Posted: 11 Nov 2011 | 0:07:06 UTC - in response to Message 22500.

I did not originally do any under/overclocking on the PCIe bus or anything else in the ROM, not even any voltage changes or tweaking. The settings I found yesterday were how it was set when I got the mobos.

I believe you. Given the PCIe standard, it is quite strange that you got the mobos with those settings.

I set them back to the 100MHz that the ROM said was the standard. I have no idea why one was set to 95MHz and the other was on Auto. It was not done by me or anyone at my house, as no one else has access to my systems. Now both are set to 100MHz.

You've done it right. Maybe someone tampered with the mobos' BIOS settings before you got them.

skgiven
Message 22506 - Posted: 11 Nov 2011 | 9:19:33 UTC - in response to Message 22504.

The BIOS can link the FSB and PCIE frequencies, but this is not normally the default setting! Even on similar boards the FSB is not always the same, but a 5% difference is quite high (+/- 1% would be normal).

I've only seen such large changes on boards that struggle to support the CPU, and these tend to downclock. Sometimes a BIOS update can rectify this problem.

If this is not the case, then perhaps the GPU or operating system was influencing the PCIE frequency on the board, but that would be a new one to me. More likely the board was not new and someone had a play with it before you got it, or the BIOS just lost its configuration (seen more often on routers, but can happen on plenum boards).
When building systems it's always a good idea to physically reset the BIOS to factory defaults before configuring. I always check and configure the BIOS before loading an OS.