
Message boards : Graphics cards (GPUs) : Quadro 1700 Discrepancy

Tixx
Joined: 15 Jan 09
Posts: 7
Credit: 1,766,054
RAC: 0
Message 7654 - Posted: 20 Mar 2009 | 0:32:33 UTC

Hi all,

I've got a peculiar issue here. I'm running 4 x Quadro 1700 cards on 4 almost identical systems, yet only 2 of my boxes regularly report GG WUs on time, while the other two always fail to finish by the deadline. All 4 cards are identical in make and model. The CPUs are all Core 2 Duos (some faster than others) with the same amount of RAM. I'm also using the same NVIDIA drivers (latest) and BOINC 6.6.15. Any reason why this might be the case?

Cheers

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 7655 - Posted: 20 Mar 2009 | 1:42:43 UTC - in response to Message 7654.

Everything looks identical except in the OS. All the machines with these cards are XP 32-bit, but there is one difference with the cards that aren't working. They list the OS as:

Microsoft Windows XP
Professional x86 Edition, Service Pack 3, v.3311, (05.01.2600.00)

The v.3311 portion is not in the OS listing for machines with cards that are working. Indeed, these cards are doing the same work about 20 hours faster or more than the cards in machines with the v.3311 addition. This is the only thing that I can see as different and that might be causing some slowdown in the cards. What is the difference in the XP installs?
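A quick way to spot this kind of difference across several hosts is to look for a build tag such as `v.3311` in the OS string each host reports. A minimal Python sketch, assuming the strings are copied by hand from the host pages; the host names, the first OS string (reconstructed by dropping the tag), and the `find_build_tag` helper are all hypothetical:

```python
import re

# OS strings as they might appear on the host detail pages.
# Host names are made up; the "box1" string is an assumed tag-free variant.
hosts = {
    "box1": "Microsoft Windows XP Professional x86 Edition, Service Pack 3, (05.01.2600.00)",
    "box2": "Microsoft Windows XP Professional x86 Edition, Service Pack 3, v.3311, (05.01.2600.00)",
}

def find_build_tag(os_string):
    """Return a tag like 'v.3311' if the OS string contains one, else None."""
    match = re.search(r"\bv\.\d+\b", os_string)
    return match.group(0) if match else None

for name, os_string in hosts.items():
    tag = find_build_tag(os_string)
    print(name, "->", tag or "no build tag")
```

Hosts that print a tag are the ones worth comparing against the others first.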




Bender10
Joined: 3 Dec 07
Posts: 167
Credit: 8,368,897
RAC: 0
Message 7658 - Posted: 20 Mar 2009 | 2:18:06 UTC - in response to Message 7655.

Microsoft Windows XP
Professional x86 Edition, Service Pack 3, v.3311, (05.01.2600.00)


This looks like a new build or Release Candidate for XP SP3...maybe..

Did you just patch your XP boxes..?

____________


Consciousness: That annoying time between naps......

Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it.

Tixx
Joined: 15 Jan 09
Posts: 7
Credit: 1,766,054
RAC: 0
Message 7659 - Posted: 20 Mar 2009 | 2:56:17 UTC - in response to Message 7658.

Good pickup, fellas. I totally forgot to check the service pack numbers.
I never thought they would make a 20 hr difference per WU crunch.

I will update the ones that have the RC version of SP3 to the new one and see what happens.

Cheers

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 47,698,744
RAC: 118,540
Message 7660 - Posted: 20 Mar 2009 | 3:34:47 UTC - in response to Message 7659.

Another thing to check is what those computers are doing ('real' work, not BOINC).

While the configurations might be the same, if some boxes are heavily loaded and others aren't, that will drastically affect the run time, since BOINC tasks run at the lowest priority. The CPU needs to funnel data to the GPU, so if the CPU isn't servicing the GG task because something else is running instead, the GPU will get starved for work, resulting in longer run times.

Mike

Tixx
Joined: 15 Jan 09
Posts: 7
Credit: 1,766,054
RAC: 0
Message 7661 - Posted: 20 Mar 2009 | 3:56:53 UTC - in response to Message 7660.

Well, all 4 are running the same BOINC projects and nothing else, i.e. they are dedicated crunchers.

Another peculiar thing is that the 2 computers that are finishing WUs on time are using 0.03% CPU + 1 CUDA, whereas the 2 slow ones are using 0.02% CPU + 1 CUDA.

Is there any way to force the two slow ones to use the extra 0.01 CPU?


ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 7671 - Posted: 20 Mar 2009 | 20:12:21 UTC - in response to Message 7661.

Another peculiar thing is that the 2 computers that are finishing WUs on time are using 0.03% CPU + 1 CUDA, whereas the 2 slow ones are using 0.02% CPU + 1 CUDA.


Not sure why the numbers would be different, but they should not make any difference, i.e. they don't affect how the BOINC client runs the apps. Interesting: on my machine with 6.5.0 it shows 0.14 CPU.

MrS
____________
Scanning for our furry friends since Jan 2002

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Message 7687 - Posted: 21 Mar 2009 | 8:32:30 UTC

Can you check with CPU-Z what sort of CPU they have, and especially look at the instruction set extensions like MMX, SSE, SSE2, SSE3 and SSSE3? The more, the faster.
I get the feeling that even though huge amounts are calculated on the GPU, some parts are still done by the CPU before being fed to the GPU.
So those extensions can make a huge difference. That's my 2 cents :D

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 7701 - Posted: 21 Mar 2009 | 12:37:00 UTC - in response to Message 7687.

I get the feeling that even though huge amounts are calculated on the GPU, some parts are still done by the CPU before being fed to the GPU.
So those extensions can make a huge difference. That's my 2 cents :D


These instruction set extensions are no magic bullets. You'd need to carefully hand-optimize and vectorize your code, something compilers are not very good at and which is not always possible. Or use libraries where other people already did that job for you.

Anyway, the cpu part of GPU-Grid has so far been very insensitive to CPU speed. And his cards are not very fast (-> need less CPU), plus his CPUs are fast: they're all C2Ds with at least SSE3.

Thinking outside the box is nice, but I'd be very surprised if you found the right direction here ;)
It looks more like a software problem, maybe some debug options were enabled in that pre-release SP3.

MrS
____________
Scanning for our furry friends since Jan 2002

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 7745 - Posted: 22 Mar 2009 | 13:26:43 UTC - in response to Message 7654.
Last modified: 22 Mar 2009 | 14:01:25 UTC

Hi all,

I've got a peculiar issue here. I'm running 4 x Quadro 1700 cards on 4 almost identical systems, yet only 2 of my boxes regularly report GG WUs on time, while the other two always fail to finish by the deadline. All 4 cards are identical in make and model. The CPUs are all Core 2 Duos (some faster than others) with the same amount of RAM. I'm also using the same NVIDIA drivers (latest) and BOINC 6.6.15. Any reason why this might be the case?
Cheers


The reason for the difference is that you don't have identical CPUs; they are markedly different. There are two Core2 Duo E-series and two Core2 6400s:

Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz [x86 Family 6 Model 15 Stepping 11] 5640.97 million ops/sec
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz [x86 Family 6 Model 23 Stepping 6] 6804.65 million ops/sec
Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 6] 4690.81 million ops/sec
Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 2] 4623.15 million ops/sec

The integer processing speed is the key. If you look at the slower results, they are from the boxes with the slower integer ratings, which is "normal" in that integer processing is the key part of crunching.

They also don't have the same RAM: two have 3 GB and two have 2 GB.
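To put the benchmark figures listed above side by side, this small Python sketch computes each CPU's integer score relative to the fastest one. The numbers are copied from the list; the short labels are added for readability. It only compares the benchmarks and says nothing about whether they explain the GPU run times.

```python
# Integer benchmark results (million ops/sec) for the four hosts listed above.
benchmarks = {
    "E6750 @ 2.66GHz": 5640.97,
    "E8500 @ 3.16GHz": 6804.65,
    "6400 @ 2.13GHz (stepping 6)": 4690.81,
    "6400 @ 2.13GHz (stepping 2)": 4623.15,
}

# Express every score as a fraction of the fastest host's score.
fastest = max(benchmarks.values())
relative = {cpu: score / fastest for cpu, score in benchmarks.items()}

for cpu, ratio in sorted(relative.items(), key=lambda item: -item[1]):
    print(f"{cpu:30s} {ratio:.0%} of the fastest")
```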

Regards
Zy

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 7746 - Posted: 22 Mar 2009 | 13:56:24 UTC - in response to Message 7745.
Last modified: 22 Mar 2009 | 14:11:45 UTC

Looked a bit closer at the results. The slower boxes do crunch slower, as expected, but they do complete in time. Looking across all four boxes, there is a large number of User Aborts after a few hours or after 24 hours, paradoxically on the faster machines in particular.

What issues have you faced that caused the User Aborts well before the 4 day completion time?

Another thought - have you been overclocking any of them? Particularly the two faster boxes.

There is a significant number of compute errors well before the 4 day completion time. Are the machines used for other tasks that have - maybe - crashed, taking the GPU WU with them?

You can get compute errors when the BOINC Manager is closed down before stopping the local host link (e.g. closing down or rebooting the PC without properly closing BOINC/Models). What version of BOINC Manager do you use?

Regards
Zy

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 7748 - Posted: 22 Mar 2009 | 14:15:07 UTC - in response to Message 7745.
Last modified: 22 Mar 2009 | 14:17:17 UTC

The reason for the difference is that you don't have identical CPUs; they are markedly different.

...

The integer processing speed is the key.
... in that integer processing is the key part of crunching.


How do you know that? Sorry, but this sounds just plain wrong. Even on GTX 280-class cards the CPU-speed does not matter (within sane limits) and these guys need the CPU every ~25 ms. His cards have times per step of 400 - 500 ms, that's one CPU-interaction every half second!
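The arithmetic behind this can be sketched in a few lines of Python. The ~25 ms figure and the 400 - 500 ms step times are taken from the paragraph above; treating one step as exactly one CPU interaction is a simplifying assumption, and the function name is made up for illustration.

```python
def interactions_per_second(step_time_ms):
    """CPU interactions per second, assuming one interaction per step."""
    return 1000.0 / step_time_ms

gtx280_rate = interactions_per_second(25)    # GTX 280-class card, ~25 ms per step
quadro_fast = interactions_per_second(400)   # Quadro, fastest quoted step time
quadro_slow = interactions_per_second(500)   # Quadro, slowest quoted step time

print(f"GTX 280-class: about {gtx280_rate:.0f} interactions per second")
print(f"Quadro 1700:   about {quadro_slow:.1f} to {quadro_fast:.1f} interactions per second")
```

Under that assumption the Quadro hosts ask far less of the CPU than a GTX 280-class host does.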

They also don't have the same RAM: two have 3 GB and two have 2 GB.


The amount of memory is only important if you run out of it. On my machine the acemd-task uses 24 MB of main mem. How likely is it that these 24 MB are just too much for a machine with 3 GBs and windows can not swap anything else to disk?

Just let the OP sort out this software question first, i.e. install the proper SP3.

Edit: as Zydor just mentioned clock speeds.. I've seen 400 MHz shader clock in one result instead of the usual 900 MHz. That would cause a significant slow-down.
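As a rough estimate of how much that could matter, the Python sketch below assumes run time scales inversely with shader clock. That is a simplification (memory-bound parts of the work would scale differently), and the helper name is hypothetical.

```python
def estimated_slowdown(normal_mhz, observed_mhz):
    """Estimated run-time factor if run time scales inversely with shader clock."""
    return normal_mhz / observed_mhz

# 900 MHz is the usual shader clock, 400 MHz the value seen in one result.
factor = estimated_slowdown(900, 400)
print(f"Estimated run time: {factor:.2f}x the normal duration")
```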

MrS
____________
Scanning for our furry friends since Jan 2002

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 7752 - Posted: 22 Mar 2009 | 17:46:32 UTC

There appears to be some issue with NVIDIA display drivers and XP's RC v.3311. See here.

