Message boards : Graphics cards (GPUs) : 2 GTX295's installed

Rapture
Joined: 16 Jan 09
Posts: 9
Credit: 99,593
RAC: 0
Message 5667 - Posted: 16 Jan 2009 | 9:16:57 UTC

I have two nVidia GTX295 cards installed in my machine.

BOINC/GPUGRID are using only two of the four GPU cores. I'm on Vista x64 and have four monitors attached, with the desktop extended onto all four. I have SLI and PhysX disabled, and I removed the bridge between the cards, but the result is still the same.

My machine is a 4.2GHz water cooled QX9770 processor, 4GB 1800MHz RAM, 1TB disk space.

Do instructions exist for getting 4 cores to work in 9800GX2's or GTX295's?

Thanks!

Jason

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1947
Credit: 629,356
RAC: 0
Message 5668 - Posted: 16 Jan 2009 | 9:43:49 UTC - in response to Message 5667.

Please complete one WU at least, so that we can have some info from the application.

gdf

Rapture
Message 5680 - Posted: 16 Jan 2009 | 18:50:16 UTC - in response to Message 5668.

Some completed now it appears.

localizer
Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Message 5681 - Posted: 16 Jan 2009 | 19:12:01 UTC - in response to Message 5680.

........ wow Rapture - just ordered - so I'm a few days behind you. Are you pleased with the configuration?

P.

GDF
Message 5682 - Posted: 16 Jan 2009 | 19:30:54 UTC - in response to Message 5681.

The application is seeing only one card.
Do you have SLI enabled? In this case disable it.

gdf

Clooney
Joined: 27 Oct 07
Posts: 4
Credit: 1,193,734
RAC: 0
Message 5685 - Posted: 16 Jan 2009 | 20:25:57 UTC - in response to Message 5682.

Disable SLI, then it works.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5686 - Posted: 16 Jan 2009 | 20:30:30 UTC

Well, he said SLI was disabled. Sounds similar to this issue, which has something to do with new 181.xx drivers apparently breaking something.

MrS
____________
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5689 - Posted: 16 Jan 2009 | 22:18:41 UTC

Or he booted up, disabled SLI, but did not restart BOINC ...

BOINC would see the post boot SLI mode and not know that he deselected SLI ...
____________

Donnie
Joined: 13 Nov 08
Posts: 11
Credit: 11,185,470
RAC: 0
Message 5691 - Posted: 16 Jan 2009 | 22:45:55 UTC

I also have a machine that operates on Vista 64 and on every restart, Windows never recognizes the 2 graphic cards I have in the machine.

I set the option to prevent BOINC from starting on boot so I can go fiddle with the NVIDIA & desktop settings. In the NVIDIA control panel I have to disable the PhysX capability, then go to the desktop display settings and extend the monitor into monitor 3.

I believe that if the PhysX capability isn't disabled, the settings won't hold (extend into monitor X). After I have the monitor extended, I then go back into NVIDIA control panel and enable PhysX then select multiple monitors.

NVIDIA then recognizes both monitors and I start BOINC which will indicate
GPU(2) and supply the work for both cards.

I believe the last driver update I downloaded from NVIDIA was 180.84. Don't know if this helps, but this is the only 64 bit OS machine I have with dual cards.

My 32 bit Windows OS has no problem detecting both cards on boot up.

Rapture
Message 5697 - Posted: 17 Jan 2009 | 2:59:19 UTC - in response to Message 5691.
Last modified: 17 Jan 2009 | 3:00:35 UTC

SLI is disabled, PhysX is disabled. I tried without the SLI strap installed, rolling back to 180.87 drivers, removing the four GTX295's from the Device Manager and letting Windows Vista x64 reinstall them and then reinstalling the drivers... Still only two GPU's. Apparently the first GPU in each card is being recognized for CUDA applications.

Windows sees four devices, Quad-SLI works fine for gaming...

I can remove one of the cards and Windows sees two GPU's and instead of Quad-SLI in the nVidia Control Panel I get "Multi-GPU Mode".

Windows sees the GPU's but I can't get BOINC *or* Folding at Home to see the second two GPU's.

jaf

Paul D. Buck
Message 5698 - Posted: 17 Jan 2009 | 3:15:03 UTC - in response to Message 5697.
Last modified: 17 Jan 2009 | 3:22:42 UTC

SLI is disabled, PhysX is disabled. I tried without the SLI strap installed, rolling back to 180.87 drivers, removing the four GTX295's from the Device Manager and letting Windows Vista x64 reinstall them and then reinstalling the drivers... Still only two GPU's. Apparently the first GPU in each card is being recognized for CUDA applications.

Windows sees four devices, Quad-SLI works fine for gaming...

I can remove one of the cards and Windows sees two GPU's and instead of Quad-SLI in the nVidia Control Panel I get "Multi-GPU Mode".

Windows sees the GPU's but I can't get BOINC *or* Folding at Home to see the second two GPU's.

jaf


Yea!

He found a bug ...

Um, sorry ...

It sure looks like a bug to me ... now, the question is where?

{edit - add}
I will post on the BOINC Dev ... ETA, could you see if you can get the devs here to prod UCB ... like as not they won't say anything to my post ...

{Add}
Sent this:

We seem to have a situation with the new multi-processor cards, especially when you have two (and probably for more than two) cards installed. The cards in question are the new GTX295 with two GPU processors ...

As reported on the GPU Grid forums:

==========
SLI is disabled, PhysX is disabled. I tried without the SLI strap installed, rolling back to 180.87 drivers, removing the four GTX295's from the Device Manager and letting Windows Vista x64 reinstall them and then reinstalling the drivers... Still only two GPU's. Apparently the first GPU in each card is being recognized for CUDA applications.

Windows sees four devices, Quad-SLI works fine for gaming...

I can remove one of the cards and Windows sees two GPU's and instead of Quad-SLI in the nVidia Control Panel I get "Multi-GPU Mode".

Windows sees the GPU's but I can't get BOINC *or* Folding at Home to see the second two GPU's.
==========

Because this issue seems to affect both BOINC *AND* Folding, it may be with the CUDA drivers instead of BOINC ... but, heads up guys this needs to be looked at ...
____________

Paul D. Buck
Message 5699 - Posted: 17 Jan 2009 | 3:23:49 UTC

On reflection, since it affects both BOINC and Folding, it is, as I said, more likely to be an issue with the CUDA drivers ... are you sure you have the latest?
____________

Rapture
Message 5700 - Posted: 17 Jan 2009 | 4:22:32 UTC - in response to Message 5699.

Paul,

I appreciate the help, I actually have three GTX295 cards and if I can get two to work I'm going to try and get the third fired up.

What's really weird is that if I put just a single one of my GTX295 boards in my machine, neither BOINC nor F@H will use the second core. F@H will try but it always immediately returns with UNSTABLE_MACHINE.

I've tried every nVidia driver version that supports the GTX295 that is publicly available:

180.87 (first known GTX295 driver)
181.20 (current WHQL)
181.22 (beta made available tonight (1/16/09))
185.20 (leaked beta available on guru3d.com)

All of these drivers perform the same for me on my EVGA/nVidia 790i Ultra motherboard.

I've installed Windows 7 BETA on a second partition with 181.20 WHQL drivers. Same behavior.

I have spent hours in my BIOS playing with various settings and slowing my system way down just to be sure. No joy.

I'm not a CUDA dev so I can't know for sure, but everything seems to point at CUDA. However, I've read forum posts from others who have been able to get multiple GPU's working with F@H; some with EVGA boards like mine and some with other vendors' cards. That's why I spent so much time in my BIOS.

It's not a power problem. This machine has two supplies, a 650W Antec for my case fans and water pumps and a 1,200W Thermaltake Toughpower for the motherboard & video cards. Prior to the GTX295's I had water cooled tri-sli GTX280's in this machine that ran fine under full F@H load. Now I can't get a single GTX295 to fold on its second GPU for even a split second.

jaf

Paul D. Buck
Message 5701 - Posted: 17 Jan 2009 | 4:54:06 UTC

To my suspicious mind the key is that the system seems to recognize up to TWO processors ...

Why not four I don't know ...

But, it does if you only have one card.

Why NOT try the third card just for grins and see if it sees it at all?

It might surprise us and see all 6 processors then ... can't see 4, but can see 6 may be another clue (if it happens) ...

If it does not appear to BOINC THEN, well, at least we have another clue.

I doubt if it is settings for the simple reason that the OS and the tools see the two CPUs on the GPU cards. That is why I smell trouble with the drivers ... but who the heck knows ... you could send the cards to me to try ... and I am sure I would get around to sending them back in a year or two ... :)
____________

Paul D. Buck
Message 5702 - Posted: 17 Jan 2009 | 4:59:09 UTC

exchange twixt me and Rom:


Rom,

If the guy has only one card installed it is seen as two processors ...

If he installs the second, only one processor on each card is seen ...

SLI is not installed ...

Windows reports 4 CPUs on the two GPUs ...

The thread we are discussing this on is: http://www.gpugrid.net/forum_thread.php?id=653

Should you want to join in there ...


On Jan 16, 2009, at 8:01 PM, Rom Walton wrote:

Paul, thanks for the bug report.

To clarify, the GTX295 only reports itself as a single CUDA device, and you and others expect it to report itself as two CUDA devices.

Something is tickling the back of my brain that says that could be expected behavior if the internal stream/multiprocessor count doubles between the single processor vs. multiprocessor cards, since there is already an internal SLI connection between the two graphics processors on a single card.

I would be interested to see what the coprocessor XML structure reported by BOINC shows for the single processor version vs. the multiprocessor version of the same chip.

I'm not in front of my computer at the moment, I'm on my cell phone, so I can't look up the source to see the exact field name.

----- Rom

____________

Paul D. Buck
Message 5703 - Posted: 17 Jan 2009 | 5:04:02 UTC
Last modified: 17 Jan 2009 | 5:07:38 UTC

One more depressing thought ... bad card ... you said you have three?

Try all three one at a time ... make sure they all get seen as we expect.

Card A, then Card B, Then card C

Try pairs to see if any of the three pairs show up as expected,

AB
AC
BC

Lastly, try all three (as: ABC) to see how they show up ...
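
The test order above can also be generated rather than written out by hand. This is only a minimal Python sketch, assuming the cards are simply labelled A, B and C as in this post; the function name `build_test_plan` is made up for illustration and is not part of BOINC or any other tool.

```python
from itertools import combinations

def build_test_plan(cards):
    """List every non-empty group of cards, smallest groups first."""
    plan = []
    for size in range(1, len(cards) + 1):
        # combinations() keeps the input order, so A comes before B, AB before AC
        for combo in combinations(cards, size):
            plan.append("".join(combo))
    return plan

# Hypothetical labels for the three physical cards
print(build_test_plan(["A", "B", "C"]))
```

For three cards this yields the seven configurations listed above: each card alone, each pair, then all three together.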

We got feedback from Rom and he is one of the good guys (Paul's opinion) so, maybe he can get us somewhere ...

{edited for clarity}

{add}
Are you trying to run Folding or BOINC with GPU Grid?

Now I am confused when I review earlier posts ...
____________

Rom Walton (BOINC)
Joined: 17 Jan 09
Posts: 1
Credit: 0
RAC: 0
Message 5704 - Posted: 17 Jan 2009 | 6:07:45 UTC

Could you post your <coprocs> xml block for each of the configurations you have listed? It can be found in your sched_request*.xml file that is sent to GPUGRID.

It contains the information the CUDA system returns to BOINC during detection.

It should look something like this:


<coprocs>
<coproc_cuda>
<count>1</count>
<name>GeForce 9800 GTX/9800 GTX+</name>
<req_secs>0.000000</req_secs>
<req_instances>0</req_instances>
<totalGlobalMem>536870912</totalGlobalMem>
<sharedMemPerBlock>16384</sharedMemPerBlock>
<regsPerBlock>8192</regsPerBlock>
<warpSize>32</warpSize>
<memPitch>262144</memPitch>
<maxThreadsPerBlock>512</maxThreadsPerBlock>
<maxThreadsDim>512 512 64</maxThreadsDim>
<maxGridSize>65535 65535 1</maxGridSize>
<totalConstMem>65536</totalConstMem>
<major>1</major>
<minor>1</minor>
<clockRate>1688000</clockRate>
<textureAlignment>256</textureAlignment>
<deviceOverlap>0</deviceOverlap>
<multiProcessorCount>16</multiProcessorCount>
</coproc_cuda>
</coprocs>
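
For anyone comparing several configurations, the interesting fields can be pulled out of a pasted block with a few lines of Python. This is only a sketch: it assumes the `<coprocs>` block has been copied out of the request file as text, the sample is a trimmed copy of the block above, and the function name `summarize_coprocs` is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example block above (only the fields used below)
SAMPLE = """<coprocs>
<coproc_cuda>
<count>1</count>
<name>GeForce 9800 GTX/9800 GTX+</name>
<multiProcessorCount>16</multiProcessorCount>
</coproc_cuda>
</coprocs>"""

def summarize_coprocs(xml_text):
    """Return (name, count, multiprocessors) for each <coproc_cuda> entry."""
    root = ET.fromstring(xml_text)
    summary = []
    for dev in root.iter("coproc_cuda"):
        name = dev.findtext("name", default="?")
        # Missing fields fall back to 0 instead of raising
        count = int(dev.findtext("count", default="0"))
        mp = int(dev.findtext("multiProcessorCount", default="0"))
        summary.append((name, count, mp))
    return summary

print(summarize_coprocs(SAMPLE))
```

The `count` value is the number to compare between the one-card and two-card setups.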

Paul D. Buck
Message 5705 - Posted: 17 Jan 2009 | 7:09:46 UTC - in response to Message 5704.

Could you post your <coprocs> xml block for each of the configurations you have listed? It can be found in your sched_request*.xml file that is sent to GPUGRID.

It contains the information the CUDA system returns to BOINC during detection.
....


*I* would like to see it with one card, then with two ... and three while I am asking for the moon ...

____________

Rapture
Message 5708 - Posted: 17 Jan 2009 | 7:30:07 UTC - in response to Message 5704.

Created a new partition and installed Windows XP Professional x32 and fully patched up to SP3.

Boom. BOINC/GPUGRID worked the first time with one card installed, I haven't stuck a second one in yet, but I'm betting it'll work.

This time when I booted BOINC the message was this:

1/17/2009 1:04:06 AM||CUDA devices found
1/17/2009 1:04:06 AM||Coprocessor: GeForce GTX 295 (2)

It's the first time it's ever said "GeForce GTX 295 (2)", always before it did not have the " (2)".
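
For comparing startup logs across OS installs, the device count can be read off that line with a small script. This is a rough Python sketch, not anything BOINC ships: the line format is taken only from the two log lines quoted above, the function name `parse_coprocessor_line` is invented, and treating a missing " (N)" suffix as a count of 1 is an assumption based on the behavior described here.

```python
import re

# Matches "Coprocessor: <name>" with an optional trailing " (<count>)"
LINE_RE = re.compile(r"Coprocessor: (?P<name>.+?)(?: \((?P<count>\d+)\))?$")

def parse_coprocessor_line(line):
    """Return (device name, device count), or None if the line does not match."""
    m = LINE_RE.search(line.strip())
    if m is None:
        return None
    # Assumption: no " (N)" suffix means a single device was detected
    count = int(m.group("count")) if m.group("count") else 1
    return m.group("name"), count

print(parse_coprocessor_line("1/17/2009 1:04:06 AM||Coprocessor: GeForce GTX 295 (2)"))
print(parse_coprocessor_line("1/17/2009 1:04:06 AM||CUDA devices found"))
```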

Embarrassingly enough, I cannot find the work files for GPUGRID. I found BOINC, but where are your files for GPUGRID?

Sticking a second card in now.

jaf

Rapture
Message 5709 - Posted: 17 Jan 2009 | 7:39:51 UTC - in response to Message 5708.

Yep it worked:

1/17/2009 1:38:17 AM||CUDA devices found
1/17/2009 1:38:17 AM||Coprocessor: GeForce GTX 295 (4)

putting the third card in now. ;)

jaf

Rapture
Message 5711 - Posted: 17 Jan 2009 | 9:05:22 UTC - in response to Message 5709.

Argh, I damaged the third card. In all the handling I lightly scratched the back of the card and knocked off a tiny little resistor or capacitor (it's so small I can't tell which it is).

I'll ship it in for an RMA, but the 6 GPU test will have to wait.

jaf

ExtraTerrestrial Apes
Message 5716 - Posted: 17 Jan 2009 | 11:05:55 UTC

Oh dear, may it rest in peace *taking hat off*

Regarding the problem: I think this should be reported to nVidia, as it really looks like their newer drivers for Vista(+) broke something. Or maybe it's by design and BOINC has to adapt to it, but then we'd have to know about this.

Further supporting this theory is this thread, where the OP uses Vista and was crunching along happily on both GPUs of his 9800GX2 with driver 180.48 until he installed 181.20. Since then he can not get BOINC to detect both devices any more.

MrS
____________
Scanning for our furry friends since Jan 2002

Paul D. Buck
Message 5720 - Posted: 17 Jan 2009 | 13:23:32 UTC

Rom should swing by in the morning, but if he does not I will pop him a reminder e-mail that we may have isolated the issue and ask him to look here ... but probably won't have to ... he good guy ... :)

Sadly we had the one death in the family ...

But, you cannot make eggs without hatching a few omelets ... or something like that ...
____________

Rapture
Message 5732 - Posted: 17 Jan 2009 | 17:13:57 UTC - in response to Message 5720.

All four GPU's have been cranking along merrily overnight. Worst core temp is 81°C, which is hot for sure but I can tell from the sound that the fans are not at 100% yet.

I now have a very good test bed to help isolate and debug this problem and more when my third card is swapped out. If you guys would like me to help in some way just let me know. I have driver 181.20 working swimmingly on four GPU's in Windows XP Pro x32, and on the same machine a different partition that fails in Windows Vista x64.

Interestingly, I did a clean install of Windows 7 Beta x64 the other night and installed the 64-bit 181.20 WHQL driver, and Windows 7 behaved exactly the same as my Vista install. This leads me to believe that a clean install of Vista x64 would not resolve the problem.

My current hypothesis is an x64-specific problem in CUDA with detecting second GPU's in multi-GPU slot cards.

jaf

Paul D. Buck
Message 5735 - Posted: 17 Jan 2009 | 18:49:58 UTC - in response to Message 5732.

My current hypothesis is an x64-specific problem in CUDA with detecting second GPU's in multi-GPU slot cards.


Yea! We think the same ...

Never did like tham thar 64-bit wangdoodles ...
____________

Rapture
Message 5769 - Posted: 18 Jan 2009 | 22:27:19 UTC - in response to Message 5735.

A couple of days ago I sent in a support request to EVGA. At the time I was thinking that perhaps the BIOS in the EVGA cards was at fault. Here's my original message:

I purchased three EVGA GTX295 Plus cards for the purposes of gaming and running Folding@Home's GPU client. While the cards work perfectly well for gaming, I'm finding it impossible to get the F@H client to run on the second GPU of the cards despite a tremendous effort debugging the problem (found here: viewtopic.php?f=52&t=7874). Others seem to be having similar difficulties with the EVGA boards, but there's at least one guy with XFX boards that is having no problems. The most noticeable difference is the slight overclock of the EVGA boards and the fact that the XFX BIOS numbers end in .72 and .73 while my EVGA cards end in .90 and .92. Some detail can be found here: viewtopic.php?f=52&t=7936.

I realize that this is a very esoteric issue, but is there a possibility that I can be notified somehow when EVGA updates the GTX295 Plus's BIOS? Thanks for any help you guys can offer and thanks for really great products and service.


Here is their response:

I’ve received an update from my PM team regarding folding at home:

- NVIDIA is aware of the issue and is working on it.
- XFX will suffer this problem as well, it is a driver bug.

Once NVIDIA notifies us of an update, we will make sure our folding team is informed immediately. Unfortunately we do not have an ETA on such drivers either.
