
Message boards : Graphics cards (GPUs) : New application version

Profile GDF
Project administrator
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 5674 - Posted: 16 Jan 2009 | 16:25:16 UTC

We are having problems with the Windows application, which seems to fail on a certain type of workunit.
So we have also delayed the Linux application so that we can release them together. Probably by the weekend.

Sorry for the delay.

gdf

Profile K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 370,320,941
RAC: 0
Message 5675 - Posted: 16 Jan 2009 | 16:31:58 UTC

Thanks for the update. Will this version reduce the Windows CPU usage?

Profile GDF
Project administrator
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 5677 - Posted: 16 Jan 2009 | 16:45:56 UTC - in response to Message 5675.

No, it will just increase process priority, as this version already contains enough other changes.
If all works, the reduced-CPU version will follow in two days.

gdf

Profile K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 370,320,941
RAC: 0
Message 5678 - Posted: 16 Jan 2009 | 17:10:09 UTC - in response to Message 5677.

No, it will just increase process priority, as this version already contains enough other changes.
If all works, the reduced-CPU version will follow in two days.

gdf


That's certainly fine, thanks.

Profile DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 122,995,082
RAC: 0
Message 5679 - Posted: 16 Jan 2009 | 18:15:15 UTC - in response to Message 5677.

If all works, the reduced-CPU version will follow in two days.

Yes, this is what I'm waiting for.
I hope it will be much better than the current one; it's a mess on a slow card like mine. ;-)
____________
Member of BOINC@Heidelberg and ATA!

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5684 - Posted: 16 Jan 2009 | 20:09:16 UTC - in response to Message 5679.

If all works, the reduced-CPU version will follow in two days.

Yes, this is what I'm waiting for.
I hope it will be much better than the current one; it's a mess on a slow card like mine. ;-)


Won't change a thing for the slow cards ...

It will only reduce the load on the CPU, allowing more production for CPU-intensive projects if you are running them.

So, for example, my productivity on the i7 and the quad should rise some, because I run n+1, and with the extra load on the processor supporting GPU Grid I "lose" some capacity for other projects ...
____________

ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5687 - Posted: 16 Jan 2009 | 20:35:01 UTC - in response to Message 5684.

I think DrNow means that the high CPU load especially hurts on his slow card, as the "well, you're getting enough credits from your GPU to compensate" argument doesn't work as well as it does with, e.g., a GTX 260.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 122,995,082
RAC: 0
Message 5706 - Posted: 17 Jan 2009 | 7:19:23 UTC - in response to Message 5684.
Last modified: 17 Jan 2009 | 7:21:47 UTC

Won't change a thing for the slow cards ...

You would be surprised...
Having a 9600GT on an X2 and at the same time an app with high CPU load greatly reduces the ability to crunch other WUs!
I'll give you some examples from my WU list:
This was the last one with app 6.45; look at the CPU load, not even 5 minutes!!!
With this app I made the most credits, because with the 2+1 setting the GPU crunched almost on its own; it worked bravely. :-)
6.48 already made the CPU load higher and massively reduced my overall crunching; the 2+1 setting didn't work that well anymore, it was the same as not using it.
The CPU load with 6.52 was better, but still too high.
And with 6.55 it didn't change much, but due to my problem with the "unspecified launch failure" I had to reduce the clock rate, and I guess this influences the CPU load a bit as well.
The only positive thing I see with the higher CPU load is the GPU temperature; it isn't as high as it used to be.
I'm curious how the new app will work... ;-)
____________
Member of BOINC@Heidelberg and ATA!

Profile Edboard
Joined: 24 Sep 08
Posts: 72
Credit: 12,410,275
RAC: 0
Message 5737 - Posted: 17 Jan 2009 | 22:18:10 UTC

Yesterday I ran my first WU with application version 6.60. It made less use of the GPU (the temperature was 7° lower than for the same GPU processing a v6.55 WU) and generated ~380 points per hour, versus ~550 with v6.55 (so with the older WUs I get ~50% more points).

Rabinovitch
Joined: 25 Aug 08
Posts: 143
Credit: 64,937,578
RAC: 0
Message 5743 - Posted: 18 Jan 2009 | 8:46:19 UTC

Hmmm... Still get a 6.55 WU...

Profile Clooney
Joined: 27 Oct 07
Posts: 4
Credit: 1,193,734
RAC: 0
Message 5746 - Posted: 18 Jan 2009 | 13:58:13 UTC - in response to Message 5743.

only get the 6.55 too

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5749 - Posted: 18 Jan 2009 | 15:27:02 UTC - in response to Message 5746.

only get the 6.55 too


I don't think it is a released application, which is a shame, as I am eagerly awaiting the upgrade so that I can get higher production on my other projects. The shame of it for me is that the GPU Grid connected machines are my number 1 and 3 producers ...

But, I can wait ... I guess ... sigh ... :)
____________

Profile Edboard
Joined: 24 Sep 08
Posts: 72
Credit: 12,410,275
RAC: 0
Message 5752 - Posted: 18 Jan 2009 | 15:44:29 UTC

I'm now crunching my second v6.60 WU. The other three WUs in my cache are v6.55. This one looks like it will take about as long as the one I already did (~8.5 hours on an OC GTX 280). If I suspend the v6.60 being crunched and switch to a v6.55, within five minutes the GPU goes from 58° to 65° with the fan fixed at 70%.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5767 - Posted: 18 Jan 2009 | 21:22:34 UTC - in response to Message 5752.

I'm now crunching my second v6.60 WU. The other three WUs in my cache are v6.55. This one looks like it will take about as long as the one I already did (~8.5 hours on an OC GTX 280). If I suspend the v6.60 being crunched and switch to a v6.55, within five minutes the GPU goes from 58° to 65° with the fan fixed at 70%.


Well, someone is lucky ... all I am getting are 6.55 tasks ... :)

Maybe we have to clear out the queue of the tasks generated with 6.55?

Or is this just a limited test run to see if it is working?

I guess I have to wait, sniff, sniff ...
____________

Profile Edboard
Joined: 24 Sep 08
Posts: 72
Credit: 12,410,275
RAC: 0
Message 5772 - Posted: 19 Jan 2009 | 8:57:01 UTC - in response to Message 5767.

...Well, someone is lucky...


Lucky? I suppose you are joking.... I'm happy that now I only have v6.55 WUs!

Profile GDF
Project administrator
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 5777 - Posted: 19 Jan 2009 | 11:28:20 UTC - in response to Message 5772.

All the 6.60 WUs crashed on our Windows system, so we have removed the application for now.


gdf

STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 323,472,298
RAC: 584,066
Message 5779 - Posted: 19 Jan 2009 | 14:44:49 UTC - in response to Message 5777.
Last modified: 19 Jan 2009 | 14:45:27 UTC

All the 6.60 WUs crashed on our Windows system, so we have removed the application for now.


gdf


Thanks for that, GDF. Some projects don't bother to test new applications and just release the junk WUs, letting you spend countless hours processing them before you catch it ...

Profile GDF
Project administrator
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 5836 - Posted: 21 Jan 2009 | 15:54:17 UTC - in response to Message 5779.

New applications uploaded.

- credits are now more consistent among WUs
- higher process priority in Windows
- several other minor changes
gdf

Wolfram1
Joined: 24 Aug 08
Posts: 45
Credit: 3,431,862
RAC: 0
Message 5839 - Posted: 21 Jan 2009 | 17:03:13 UTC - in response to Message 5836.

New applications uploaded.

- credits are now more consistent among WUs
- higher process priority in Windows
- several other minor changes
gdf



How can we recognize the new applications? Do they show a new version number?

localizer
Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Message 5840 - Posted: 21 Jan 2009 | 17:25:54 UTC - in response to Message 5839.

..... I have a couple showing version 6.61.

Wolfram1
Joined: 24 Aug 08
Posts: 45
Credit: 3,431,862
RAC: 0
Message 5841 - Posted: 21 Jan 2009 | 18:00:17 UTC - in response to Message 5840.

..... I have a couple showing version 6.61.


Ok, shall I kill my 2 "old" WUs in the queue?

localizer
Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Message 5843 - Posted: 21 Jan 2009 | 18:23:29 UTC - in response to Message 5841.

.... Hmm - just watching my first 6.61 WU going thru - it is making more use of the CPU than the 6.55 WUs. About a 10% uplift.

P.

Wolfram1
Joined: 24 Aug 08
Posts: 45
Credit: 3,431,862
RAC: 0
Message 5845 - Posted: 21 Jan 2009 | 19:19:18 UTC - in response to Message 5843.

.... Hmm - just watching my first 6.61 WU going thru - it is making more use of the CPU than the 6.55 WUs. About a 10% uplift.

P.



Do you think the overall duration will be shorter?

ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5851 - Posted: 21 Jan 2009 | 22:37:02 UTC - in response to Message 5845.

The higher priority -> higher cpu usage should result in better GPU utilization, which would mean slightly shorter calculation times, especially with 2+1 / 4+1 on a dual or quad core, respectively. Is anyone already seeing higher GPU temps? I'm still working through my 6.55s.. or am letting my GPU do the dirty work for me ;)

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5853 - Posted: 21 Jan 2009 | 23:09:41 UTC - in response to Message 5851.

The higher priority -> higher cpu usage should result in better GPU utilization, which would mean slightly shorter calculation times, especially with 2+1 / 4+1 on a dual or quad core, respectively. Is anyone already seeing higher GPU temps? I'm still working through my 6.55s.. or am letting my GPU do the dirty work for me ;)


Me too... I have 6.61 tasks queued, but none in work yet ...

Um, not to be difficult, but the goal for many of us was to REDUCE the CPU usage. With my 280 card I am fine with the processing time, but am not so fine with the high CPU usage ... especially since, to the best of my understanding, it is mostly doing a polling loop, which might be necessary but is hardly productive use of the CPU. Increasing the priority of the task and wasting more CPU time polling seems to me to be a step backwards.

It does help GPU Grid slightly, but at the cost of every other BOINC project I am running (or will be running) alongside the GPU tasks.

While on the subject, it occurs to me that the architecture of the GPU Grid application is not correct. There should be one polling thread that then dispatches to service threads as needed. What I mean is this: on a system with 2 or more GPUs you will launch (based on earlier observations) two application instances, each of which polls its individual GPU ... Given that this is not really productive use of the CPU, having two tasks doing nothing gets ugly, and with more than that it is REALLY bad.

With the dispatch scheme, one thread sits in the polling loop and polls the GPUs one after the other, checking to see if a GPU needs grooming; if so, it wakes the service thread to service that GPU ... less overhead, in that only one thread is in idle-poll mode ...
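To make that concrete, here is a minimal sketch of such a dispatch scheme (hypothetical names, nothing like the actual ACEMD code; it assumes one CUDA stream per GPU and, as in later CUDA runtimes, that a single host thread may query streams on several devices):

    #include <cuda_runtime.h>
    #include <chrono>
    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <vector>

    struct GpuSlot {                        // one per GPU core
        cudaStream_t stream;                // stream carrying the in-flight kernels
        std::mutex m;
        std::condition_variable cv;
        bool needsService = false;
    };

    // The single poller: the only thread that ever loops while idle.
    void pollerLoop(std::vector<GpuSlot*>& slots) {
        for (;;) {
            for (GpuSlot* s : slots)
                if (cudaStreamQuery(s->stream) == cudaSuccess) {   // GPU ran dry?
                    std::lock_guard<std::mutex> lk(s->m);
                    s->needsService = true;
                    s->cv.notify_one();          // wake that GPU's service thread
                }
            std::this_thread::sleep_for(std::chrono::microseconds(200));
        }
    }

    // Per-GPU service thread: sleeps until woken, then refills its GPU.
    void serviceLoop(GpuSlot* s) {
        for (;;) {
            std::unique_lock<std::mutex> lk(s->m);
            s->cv.wait(lk, [s]{ return s->needsService; });
            s->needsService = false;
            // ... enqueue the next chunk of work on s->stream here ...
        }
    }

However many GPUs there are, only the poller consumes CPU while idle; the service threads cost nothing until they are woken.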

While on the subject, I have 1G cards for the most part and the memory load seems to run about 50% ... again, is it possible that this could be tailored to available memory, so that those with more than 512M would have larger loads and thus fewer demands for re-fills?

These musings are from when I had two GPUs in the i7 system and there was so much usage of the CPU that the i7 was running in essentially 7 + 2 mode though it was trying to run as 8 + 2 ... I know GPU Grid wants to maximize the productivity of this project, but it should not do so at the expense of the other BOINC projects ... I am not sure what the CPU load is for SETI@Home (anyone out there running both projects?) but if it is minimal, perhaps we need to collaborate with them?



____________

Profile [SG]Arsenic
Joined: 19 Oct 08
Posts: 5
Credit: 2,217,455
RAC: 0
Message 5854 - Posted: 22 Jan 2009 | 1:28:19 UTC - in response to Message 5853.
Last modified: 22 Jan 2009 | 1:31:12 UTC

6.61 (CUDA) app uses 40% of my CPU (dual core, so 80% of a single core) - even worse than 6.55!! We had a version with acceptable CPU usage (6.56); release that via app_info.xml (as has been requested numerous times). 40% CPU usage is no longer within an acceptable limit for me, so until that is sorted out, I'll only run it on my PS3.
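For reference, running an older app version this way means BOINC's anonymous-platform mechanism: an app_info.xml in the project directory pointing at an executable you supply. A minimal sketch only; the app name and file name below are assumptions and would have to match the actual 6.56 executable kept from the earlier release:

    <app_info>
        <app>
            <name>acemd</name>
        </app>
        <file_info>
            <name>acemd_6.56_windows_intelx86.exe</name>
            <executable/>
        </file_info>
        <app_version>
            <app_name>acemd</app_name>
            <version_num>656</version_num>
            <file_ref>
                <file_name>acemd_6.56_windows_intelx86.exe</file_name>
                <main_program/>
            </file_ref>
        </app_version>
    </app_info>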

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5855 - Posted: 22 Jan 2009 | 2:35:33 UTC - in response to Message 5851.

The higher priority -> higher cpu usage should result in better GPU utilization, which would mean slightly shorter calculation times, especially with 2+1 / 4+1 on a dual or quad core, respectively. Is anyone already seeing higher GPU temps? I'm still working through my 6.55s.. or am letting my GPU do the dirty work for me ;)

MrS

More thoughts ...

I think I agree with Digi421 ... 6.55 was high on CPU, the next release lowered CPU usage, and now it looks like we are going to see a substantial increase ... though I am still doing my 6.55 tasks ...

Next question: why are we polling and not using the IRQ? If we used an IRQ raised when the GPU was in need, there would be no need for an idle poll loop.

Suggestion: An option to select "nice" or GPU performance.

I posted this on BOINC Dev:



There is a little bit of a conundrum going on and I thought I would post the question here ...

The "ideal" situation with BOINC running tasks on the GPU is that the actual CPU load would be negligible. GPU Grid seems to be having issues with application versions where the CPU load fluctuates as they make point releases. For those dedicated to the single GPU only project the CPU load is not an issue. For those with mixed loads the impact on CPU class projects may be less than desirable if the CPU load imposed by GPU tasks is too high.

For example, I had two GPUs in my i7 box, and with BOINC Manager 6.5.0 I was running 8 + 2 as expected. Well, not quite: the 8th CPU task got 0 CPU because the two GPU tasks took up too much CPU time.

My first question is why do we need to use a polling loop instead of an IRQ?

Second question: if we HAVE to use a polling loop, why are we not using a single poll loop to query all GPUs and then dispatch to support threads to service those GPUs in need of grooming?

Third, why can we not have an option in BOINC Manager to select "Nice" vs. GPU Performance? In "Nice", the stress would be on using the GPU, but not at the expense of the huge overhead of a polling loop that does essentially no useful work, with the trade-off that the GPU will sometimes be idle while waiting for service. This has the added advantage, for those with stressed systems, that the GPU would have time to "cool" while waiting for servicing. In Performance, the GPU poll loop would be higher priority and the emphasis would be on keeping the GPU at 100% usage ... The default in BOINC Manager would be "Nice" (as in: GPU projects play nice with CPU projects).

I have not yet done a comparison test with SETI@Home to see what their usage is, but at GPU Grid, while running their application 6.55, my CPU usage is:

Q9300, 4 cores, single GPU: 5-11%
i7, 4 cores, HT, 8 virtual CPUs: 3-6%

Another participant reports: "6.61 (CUDA) app uses 40% of my CPU (dualcore, so 80% of a single core)"

Though GPU processing can be faster, when you factor in the overhead of also consuming CPU resources at the same time, it may not be such a bargain.
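The "Nice" option proposed above could be as small as a configurable poll interval; a sketch of the trade-off (hypothetical, not the project's code, assuming the app waits on a CUDA stream):

    #include <cuda_runtime.h>
    #include <chrono>
    #include <thread>

    enum class GpuMode { Nice, Performance };

    void waitForGpu(cudaStream_t stream, GpuMode mode) {
        // Nice: longer naps free the CPU for other BOINC tasks, at the
        // price of the GPU sitting idle for up to the nap length.
        // Performance: short naps keep the GPU near 100% busy, at the
        // price of a near-continuous polling load on one core.
        auto interval = (mode == GpuMode::Nice)
                            ? std::chrono::microseconds(5000)   // 5 ms
                            : std::chrono::microseconds(50);
        while (cudaStreamQuery(stream) == cudaErrorNotReady)
            std::this_thread::sleep_for(interval);
    }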


____________

Profile K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 370,320,941
RAC: 0
Message 5857 - Posted: 22 Jan 2009 | 5:15:03 UTC

I am currently crunching two 6.61 WUs, and they are using 20% and 23% of total CPU on my quad. That equates to 80-92% of a single core each, as opposed to the 6.55 WUs, which were anywhere from 40%-80% of a single core. So, from my experience, the best case CPU usage of 6.61 is about the worst case for 6.55.

I'll have to check the run times, credits / hr, etc. to determine what to make of it.

Profile DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 122,995,082
RAC: 0
Message 5858 - Posted: 22 Jan 2009 | 6:27:33 UTC - in response to Message 5854.
Last modified: 22 Jan 2009 | 6:36:11 UTC

6.61 (CUDA) app uses 40% of my CPU (dual core, so 80% of a single core) - even worse than 6.55!!

Indeed...
Just started my first 6.61 WU and am bitterly disappointed.
My system (AMD X2 5200 and 9600GT) uses almost a complete core (about 90%!) for crunching the GPUGrid task and becomes even more sluggish than before. :-(
I think I'll stop crunching here 'til a new version comes out; it's not worth it.
____________
Member of BOINC@Heidelberg and ATA!

Profile rebirther
Joined: 7 Jul 07
Posts: 53
Credit: 3,048,781
RAC: 0
Message 5859 - Posted: 22 Jan 2009 | 8:24:39 UTC - in response to Message 5858.

6.61 (CUDA) app uses 40% of my CPU (dual core, so 80% of a single core) - even worse than 6.55!!

Indeed...
Just started my first 6.61 WU and am bitterly disappointed.
My system (AMD X2 5200 and 9600GT) uses almost a complete core (about 90%!) for crunching the GPUGrid task and becomes even more sluggish than before. :-(
I think I'll stop crunching here 'til a new version comes out; it's not worth it.


Yes, this is going in the wrong direction; we need an app that uses only a little bit of a CPU core.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5860 - Posted: 22 Jan 2009 | 8:33:22 UTC - in response to Message 5859.

Yes, this is going in the wrong direction; we need an app that uses only a little bit of a CPU core.


Actually, we need the choice ... I prefer to run a mix of CPU / GPU tasks with the emphasis on CPU tasks, as that is my current weighting ... but some who are GPU Grid exclusive may want the mode of operation that gets the most out of the GPU card. I can make an argument for either direction, which is why there should be an option.

Personally, I would give up GPU performance for less load on the CPU ... a really elegant solution would actually have three or four levels of performance ... anyway, though it is early days, I still see lots of room for improvement, and I think I need to make some tests with SaH's application to see how theirs loads the system ... with the thought, of course: if they can have a light load, why can we not have the same here?

The answer is likely to be "because" ... but ... this is one reason why I can hardly wait for other projects to start to provide GPU applications, so we can start to have some CHOICE ... and vote with our feet ... of course, I have yet to see a project that really takes the desires of participants to heart ... I mean, I asked CPDN for an option to only DL one task per computer so I would not have the things hanging around for years ... their answer was to abort them ... sigh ... waste ...
____________

localizer
Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Message 5861 - Posted: 22 Jan 2009 | 9:35:20 UTC
Last modified: 22 Jan 2009 | 9:39:18 UTC

Oops. GDF - it looks like we need 6.55 back or preferably 6.56 - or even 6.next if you can fix 6.56 for all....

I'm looking for the GPU task to use less than 50% of a CPU so that I can run 3+2 on my box.

Profile Lazarus-uk
Joined: 16 Nov 08
Posts: 29
Credit: 122,821,515
RAC: 0
Message 5862 - Posted: 22 Jan 2009 | 9:44:41 UTC - in response to Message 5861.



Just noticed a new app (6.59) running on my Linux install. That has also increased CPU usage, from ~10-12% of a CPU to ~50%. Does this mean that I'm going to have to go back to running 3 + 1, even on Linux? Ah well, I may as well go back to Windoze... at least I know what I'm doing there.


Profile rebirther
Joined: 7 Jul 07
Posts: 53
Credit: 3,048,781
RAC: 0
Message 5865 - Posted: 22 Jan 2009 | 12:10:20 UTC

I have checked my latest result with 6.61 and there is no speed increase; one bad issue is that the system is getting sluggish.

Profile [AF>Libristes>Jip] Elgran...
Joined: 16 Jul 08
Posts: 45
Credit: 78,618,001
RAC: 0
Message 5866 - Posted: 22 Jan 2009 | 13:03:10 UTC

New Linux app (6.59):
On a Q6600 ~3 GHz with a GTX 280 GPU, CPU usage reaches 12% (9% with Linux app version 6.58).
On a Celeron 420 ~1.6 GHz with an 8800 GTS 512 GPU, CPU usage goes to 32%.

These figures aren't very good, especially for my low-end computer.

Profile GDF
Project administrator
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 5869 - Posted: 22 Jan 2009 | 13:29:46 UTC - in response to Message 5866.

As said in this thread, the Windows low-CPU version will come in a few days.
For Linux there should be no changes at all.

gdf

Profile Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 135,911,881
RAC: 68
Message 5871 - Posted: 22 Jan 2009 | 15:37:28 UTC - in response to Message 5869.

...
For Linux there should be no changes at all.

gdf


Actually, the change in Linux CPU usage started with the new WU types...
Some of them need more CPU than the old GPUTEST WUs, and some less or the same as the GPUTEST ones.
____________

pixelicious.at - my little photoblog

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5873 - Posted: 22 Jan 2009 | 17:02:49 UTC - in response to Message 5869.

As said in this thread, the Windows low-CPU version will come in a few days.
For Linux there should be no changes at all.

gdf


Well, the change caught us off guard, and I missed the note that we would get a high-CPU-usage version to test the changes on all systems (though it seems there are still problems with 64-bit Windows XP at least; see PoorBoy's other thread).

Going WAYYYYY back I did find the little teensy tiny note you wrote ...

When ETA announced with glee the high-CPU-use version, well, the glee seemed to be for the wrong reason ... or at least it struck me as such. I want to support GPU Grid, but not at the cost of one CPU per GPU ... that said, did you see my other questions below?

I am running my first 6.61 tasks (on the i7) and the CPU use is up, though not as bad as I feared: from 3-6% to a solid 6%, with an occasional 7% load.

I have a 295 card arriving tomorrow, and I was going to try it in a Linux box that I started up for some other reasons, to see what the load is there ... (if I can get it to work at all; as someone else mentioned, my Linux skills are, ahem, sub-par) ...
____________

Profile Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Message 5874 - Posted: 22 Jan 2009 | 17:06:26 UTC

On a GTX 280 in 3+1 I get these times for a 6.61 WU with 2478 credits, compared to a 6.55 WU with 2435 credits:

6.61 # Approximate elapsed time for entire WU: 22036.278 s
6.55 # Approximate elapsed time for entire WU: 19793.840 s

From this point of view there is no advantage for the 6.61.
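(Working the numbers: 2478 credits / 22036 s ≈ 0.112 credits per second for 6.61, versus 2435 / 19794 ≈ 0.123 credits per second for 6.55, so the 6.55 WU paid out roughly 9% more per hour of GPU time.)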

____________

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5875 - Posted: 22 Jan 2009 | 18:38:20 UTC - in response to Message 5874.
Last modified: 22 Jan 2009 | 18:55:51 UTC

On a GTX 280 in 3+1 I get these times for a 6.61 WU with 2478 credits, compared to a 6.55 WU with 2435 credits:

6.61 # Approximate elapsed time for entire WU: 22036.278 s
6.55 # Approximate elapsed time for entire WU: 19793.840 s

From this point of view there is no advantage for the 6.61.


On my first task completed with 6.61 I have similar results ...

{edit - add}
CPU usage on the Q9300 has doubled.

Now it takes almost a full core. An interesting question... does CUDA work best on systems using HT, because the CPU can make more rational decisions on the use of its capacity?

On the Q9300, of course, there is no HT, so use of a core on the chip is more all-or-nothing ...

Not sure what to make of this yet ... but it is an interesting side note...

I am still curious about the load on Linux with a 295 card and its two GPU cores ... on the i7 it would take a full core to support the two GPU cores with the current application. With the older application there would have been at least a few percent "free" so that another task could still run.
____________

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Message 5879 - Posted: 22 Jan 2009 | 19:35:51 UTC

The new Windows application 6.61 certainly seems to be sucking up huge amounts of CPU time.

On the machine with the GTX 260 card, switching on the monitor just gave a black screen - the only way to get the display back was to reboot, having first suspended BOINC activity from the other machine.

The other machine, with the slower 8600GTS card, didn't suffer from any black screens - I'm not sure why that should be the case. Both machines drive the same monitor via a KVM switch.

On both machines I've noticed one of the other projects' tasks getting a much smaller share of the CPU and, as a result, taking 50% longer than normal to complete.

Accordingly, I've had to switch both machines to 1 + 3 mode by setting % of processors to 99. Something I've been able to avoid doing until this point.
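(On a quad, the 99% preference effectively works out to floor(0.99 × 4) = 3 CPU tasks, which is what frees a core for the GPU app here.)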

Phoneman1

ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5883 - Posted: 22 Jan 2009 | 21:08:22 UTC
Last modified: 22 Jan 2009 | 21:10:31 UTC

Well, I didn't announce anything; I just interpreted GDF's comments on 6.61. It seems I may have been wrong regarding a possible speed-up due to the higher priority.. but then it was a question to the guys already running it, not a statement.

And Kokomiko, you couldn't possibly get a speedup (due to higher priority) if you're already running 3+1 on a quad. It could only help if you're running 4+1 and your GPU is underutilized.

Why do we have to use polling at all? Sure, the concept seems very ill-conceived, but that's actually what nVidia proposed for CUDA.
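For what it's worth, the CUDA runtime does expose a blocking alternative to the spin; a sketch (illustrative only, and the flag name is the one from later CUDA releases, so treat its availability for the 2.x-era driver the project targets as an assumption):

    #include <cuda_runtime.h>

    // Call once at startup, before the context does any real work.
    void initBlockingSync() {
        cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    }

    void spinWait(cudaStream_t s) {      // what a polling app effectively does
        while (cudaStreamQuery(s) == cudaErrorNotReady)
            ;                            // one core pegged near 100%
    }

    void blockingWait(cudaStream_t s) {  // the "IRQ-style" alternative
        cudaStreamSynchronize(s);        // with the flag set, the thread sleeps
    }                                    // until the driver signals completion

The catch is wake-up latency: the sleeping thread has to be rescheduled by the OS instead of spinning on the result, which is presumably why a latency-sensitive app polls.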

@Stefan: the new WU types feature more complex models, so if the time per step goes up on your GTX 260 you approach the performance region of the older cards with the old WUs, where things get sluggish.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 370,320,941
RAC: 0
Message 5889 - Posted: 22 Jan 2009 | 23:24:52 UTC

Lots of reported errors with 6.61.

Lot's of Errors

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5890 - Posted: 22 Jan 2009 | 23:30:09 UTC - in response to Message 5883.

Why we have to use polling at all? Sure, the concept seems very ill-conceived, but that's actually what nVidia proposed for CUDA.


Not sure I was trying to give the impression that it was ill-conceived. Sometimes a fast-acting polling loop with a large amount of sleeping time is a good enough solution to the problem. Now that we have the realities and some experience, I was asking the question so that maybe we can move in the direction of a more sophisticated mode of operation.

If you look closely, I was asking broader questions than that. I know it CAN seem like carping if you are in the mood to read it that way. What I am doing is brainstorming what I can see and what I can infer, and comparing that with what I have experienced as a systems engineer ...

Like the memory load. I have been pondering the memory load and the CPU load, and I mused about the difference betwixt SaH and GPU Grid and speculated that the SaH CPU load was a lot lower. That is likely because the whole of a SaH task can be loaded into VRAM and then the program executed. GPU Grid appears larger, and thus we have load and store actions. THAT said, I also pondered whether, on the faster cards with 1G or more of VRAM, we could lower that CPU load by using bigger slices of data, sized to the amount of available VRAM.

Using an IRQ, or a single polling task which dispatches to "grooming" tasks that do the other operations, means that only one task sucks up CPU while polling ALL of the GPUs; the other threads run only when grooming needs to be done. Operationally, when BOINC launched a task it would check to see if there was a polling task; if there was not, one would be launched, otherwise the existing one would be used ... in other words, with one GPU we have two tasks running ... for two, three ... and for that lucky guy with 3 GTX 295s, 7 ...

Anyway, all this is in an attempt to make things better for all of us. Wasted CPU is wasted CPU ... if this IS the only way to get er done ... then we will live with it ... but I think we should want to look at all of the possibilities ...
____________

Profile K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 370,320,941
RAC: 0
Message 5893 - Posted: 23 Jan 2009 | 1:00:43 UTC - in response to Message 5889.

Lots of reported errors with 6.61.

Lot's of Errors


Sorry, wrong link. Correct link

GDF's Response to issue

JKuehl2
Joined: 18 Jul 08
Posts: 33
Credit: 3,233,174
RAC: 0
Message 5904 - Posted: 23 Jan 2009 | 10:33:20 UTC
Last modified: 23 Jan 2009 | 10:35:40 UTC

Finally a version that runs fine with 4+1 tasks (4 for the CPU, 1 for ACEMD).

BOINC 6.6.2 + ACEMD 6.61 does the trick.
CPU utilisation is at about 12% (Quad Q6600 @ 3.4 GHz).

Finally a version that produces the same heat on the GTX 260 as 3+1 tasks (one core dedicated to GPUGRID via "processlasso").

4+1 with 6.6.2 and 6.5.0 produced about 18% CPU usage.

Way to go, guys - I'm really impressed that you nearly got it working now. The only thing left is the implementation of less time-consuming polling.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5908 - Posted: 23 Jan 2009 | 10:52:32 UTC - in response to Message 5904.

Finally a version that runs fine with 4+1 tasks (4 for the CPU, 1 for ACEMD).

BOINC 6.6.2 + ACEMD 6.61 does the trick.
CPU utilisation is at about 12% (Quad Q6600 @ 3.4 GHz).


You are lucky ...

I just tried it and my results stay the same ...

22% CPU usage ... which means essentially one full core used just for babysitting ...
____________

Profile Venturini Dario[VENETO]
Joined: 26 Jul 08
Posts: 44
Credit: 4,832,360
RAC: 0
Message 5919 - Posted: 23 Jan 2009 | 17:48:40 UTC

I can just point out:

Before: 1 hour of CPU time per WU.
Now: 6 hours.

NOT the way to go. I strongly encourage a solution.

P.S. On Linux

ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5924 - Posted: 23 Jan 2009 | 20:17:12 UTC

Hi Paul,

about the "ill-conceived": that was actually coming from me, I was nto trying to read it into your comment. I think it's a wrong concept that the CPU has to ask the GPU if it's ready. The GPU should be able to tell the CPU about this. Don't know if there is an IRQ for this.. seems too straight forward if it was ;)

And I forgot about your comment regarding one polling loop for all GPUs. It surely seems like a good idea, but it may not be necessary if they can get the CPU usage down as expected. The only problem I see with that approach is that in BOINC every task is insulated from the others. This is by design, and it's actually a strength.. it lets you utilize multiple cores rather easily and protects you from all kinds of strange interactions (which you usually get when you parallelize outside a core). One would have to program this carefully, but it seems manageable nevertheless.

Regarding the VRAM: right now the WUs use about 70 - 80 MB of VRAM. More complex models could be simulated, which would require more memory, but then the time per step would go up and the screen would get really sluggish even on GT200 cards. I guess that's not what people want right now ;) Similarly, one could tell the GPU to compute several steps within one polling interval (to reduce the polling frequency), but I don't know if that would be possible at all. Bottom line: I don't see potential for improvement in using more VRAM.

Regards,
MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5925 - Posted: 23 Jan 2009 | 20:26:13 UTC

Oh, and my 6.61 result for a 2479 credit WU, running BOINC 6.5.0 under XP 32 and 4+1:

app 6.61
CPU time | time per step
28500 | 67.8

app 6.55
CPU time | time per step
14700 | 77.5
21200 | 72.3
20600 | 75.6
17300 | 87.0 (odd)
21200 | 70.4
19200 | 72.8

If I discard the odd one I get an average of 19400s and 73.7 ms/step for the old app. So it seems I am seeing an increase in GPU speed and an increase in CPU time. Based on my current RAC of 5000 that would give me 5430 RAC. Not bad.

MrS
____________
Scanning for our furry friends since Jan 2002

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Message 5926 - Posted: 23 Jan 2009 | 20:42:03 UTC - in response to Message 5925.

... an increase in CPU time. Based on my current RAC of 5000 that would give me 5430 RAC. Not bad.


But has your other project(s) suffered more than GPU has gained???

You rarely get anything for nothing!

Phoneman1

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5928 - Posted: 23 Jan 2009 | 21:01:20 UTC - in response to Message 5926.

... an increase in CPU time. Based on my current RAC of 5000 that would give me 5430 RAC. Not bad.


But has your other project(s) suffered more than GPU has gained???

You rarely get anything for nothing!

Phoneman1


Yes ...

And this is especially acute if GPU Grid is not a high-interest project for you. I don't know, there are, what, 211 protein folding projects out there? If not that many, maybe only 197 ... :)

Anyway, the question is not directly winners and losers on a project by project basis, but science as a whole ...

The change from 6.55 to 6.61 increased the load on the CPUs by 50%, which means that for every hour of GPU Grid I lose half an hour on some other project just due to idle polling. I know they are working on it and I am NOT ranting ... I am just expressing a little frustration with the way things are working, and though I have been dedicating more resources here than I normally would, it is because this is a new technology I want to see get off the ground.

THAT said, I will also say that at this time I like the project's responsiveness, with ETA and "G" whatsis-name stopping by to see what is on our minds (if anything; sometimes mine is a blank) ... but I do not want it to be forgotten that "playing nice" is important ... Just as an example, if GPU Grid cannot get the CPU load down for their project tasks ... well, I stop work for projects all the time ... or reduce the share they get ... the only advantage they have at the moment is that they are the only science project around using the GPU on BOINC ... SETI@Home is not doing science, they are exploring, a whole nother animal ...

Anyway, in the last month I have gone from a 9800 GT to adding a GTX 280 AND just now a GTX 295 ... an experience I will talk about in another thread soon ...
____________

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5929 - Posted: 23 Jan 2009 | 21:12:20 UTC - in response to Message 5924.

Hi Paul,

about the "ill-conceived": that was actually coming from me, I was nto trying to read it into your comment. I think it's a wrong concept that the CPU has to ask the GPU if it's ready. The GPU should be able to tell the CPU about this. Don't know if there is an IRQ for this.. seems too straight forward if it was ;)


Hi ...

No harm, no foul (fowl?) ... I worry about communication sometimes, because people sometimes read into what I write ... and I know I miss a lot of the detail because I am so literal-minded ... a regular Star Trek Mr. Spock type ... that is me ...

The video cards do have an IRQ assigned to them which is why I asked about using the IRQs instead.

And I forgot about your comment regarding one polling loop for all GPUs. It surely seems like a good idea, but it may not be necessary if they can get the CPU usage down as expected. The only problem I see with that approach is that in BOINC every task is insulated from the others. This is by design, and it's actually a strength.. it lets you utilize multiple cores rather easily and protects you from all kinds of strange interactions (which you usually get when you parallelize outside a core). One would have to program this carefully, but it seems manageable nevertheless.


If they are going to continue to use a polling loop, this is still a good idea. The point is that we reduce the number of polling loops to the minimum required, which is one. As for BOINC and task isolation, this is not necessarily a show-stopper. The first task starts and checks to see if there is a polling task alive; if not, it starts one, otherwise it continues after registering with the live polling loop. Between the times it services its GPU, the task would sleep ... the polling loop would not strictly be a BOINC task; the BOINC task would be the task that you currently have, sans the polling loop ...

Regarding the VRAM: right now the WUs use about 70 - 80 MB of VRAM. More complex models could be simulated, which would require more memory, but then the time per step would go up and the screen would get really sluggish even on GT200 cards. I guess that's not what people want right now ;) Similarly, one could tell the GPU to compute several steps within one polling interval (to reduce the polling frequency), but I don't know if that would be possible at all. Bottom line: I don't see potential for improvement in using more VRAM.


Just a thought ... when I look at the application using one of the nvidia tools I see about a 50% memory load ... sometimes the way to reduce loads like that is to upload more stuff if there is room ... the bottom line is I want more efficiency ... :)

____________

J.D.
Joined: 2 Jan 09
Posts: 40
Credit: 16,762,688
RAC: 0
Message 5931 - Posted: 23 Jan 2009 | 21:44:20 UTC - in response to Message 5928.


The change from 6.55 to 6.61 increased the load on the CPUs by 50%, which means that for every hour of GPU Grid I lose half an hour on some other project just due to idle polling. I know they are working on it and I am NOT ranting ... I am just expressing a little frustration


When the beastly software gets out of control, don't fear to slay it. ;-)

Meanwhile, I wonder why the Windows port of GPUGRID loads the CPU so...

Today I see loads of 12% to 14% per core on a 2.4 GHz Phenom under Linux 2.6.28 and nvidia-kernel 180.22 with GPUGRID application version 6.59 and GTX 260 GPU hardware.

Could it be that my GPUs are underutilized?
Or is the Windows software that inefficient?

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5933 - Posted: 24 Jan 2009 | 0:30:17 UTC - in response to Message 5931.

Could it be that my GPUs are underutilized?
Or is the Windows software that inefficient?


You are now a witness to a great truth ... Microsoft does not actually write software that is all that good. :)

Though the load of 12-14% is also pretty high as these things SHOULD be going. On a quad that is a 50% CPU load (25% of the total power of the system) ...

On my Windows quad I am seeing about a 22% load, and on the i7 about 7% per GPU task (I now have 3 GPU cores running: two in the new GTX 295, plus the GTX 280) ...

This is why I was suggesting a single polling loop for all GPU Grid processes, with the processes registered with BOINC being the ones that groom the individual GPU cores.

I tried to get a GPU going on the Linux box, and I guess I am too stupid, because I was not able to do so. When I D/L'd the driver package it told me I needed other things, and it was just too much of a mess, so I said the heck with it for the moment. I may play with it later, but I fell out of love with DOS-style command lines back about the time I bought my Lisa ...

Anyway, you are losing half the processing power of a core just to see if the GPU needs attention. The real cost should be about 1% ... anything above that is actually quite high. And if it is per GPU core, that 3% figure on my system, for example, would mean 6% of really, really wasted overhead. I don't mind 3% overhead that much, but 3% times 3 cores is 9% of the system CPU in an idle detection loop ... not good ...

So, I am back to my IRQ, single polling thread, or the ability to select "Nice" vs Performance for the GPU ...
____________

Rabinovitch
Joined: 25 Aug 08
Posts: 143
Credit: 64,937,578
RAC: 0
Message 5934 - Posted: 24 Jan 2009 | 0:44:48 UTC

Got a 6.61 WU. 30-40% of CPU on Athlon X2. GTX 260. BOINC 6.5.0. It's terrible.

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 5956 - Posted: 24 Jan 2009 | 15:37:07 UTC

Tried a few 6.61 units on my Athlon X2 3800+, 9500GT, BOINC 6.50, 32-bit Vista Home Premium. The GPU tasks took over one full core (never dropping below 47% of total CPU usage, going as high as 73%, and most typically in the range of 52-56%). With a GPU task running, the machine is virtually unusable: it takes several seconds to switch between simple tasks such as a Mozilla web page tab and an already open BOINC manager. Unfortunately, I am forced to suspend crunching GPUGRID on this box, at least until another app version is available.

mscharmack
Joined: 20 Aug 07
Posts: 18
Credit: 1,319,274
RAC: 0
Message 5959 - Posted: 24 Jan 2009 | 19:14:33 UTC
Last modified: 24 Jan 2009 | 19:15:35 UTC

Something is going on with the project.

Athlon 64 X2 6000
Win Vista Home Premium 32 bit
SLI: 2x XFX 9600 GSO with 798 MB DDR2 RAM each (almost 1.6 GB total)
2 Gig PC6400 DDR2

This started just a couple of days ago. When working on multiple BOINC projects along with GPU@Home, the work units under GPU@Home operate normally; however, the other projects just seem to crawl along. It's been taking some 8 actual seconds to complete one CPU second of work. A couple of days ago there were no conflicts in any BOINC project. How much actual memory is this project taking to cause such a slowdown? It just seems to go against the whole point of CUDA and GPU processing.
____________

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 5961 - Posted: 24 Jan 2009 | 19:33:38 UTC - in response to Message 5959.

Something is going on with the project.

Athlon 64 X2 6000
Win Vista Home Premium 32 bit
SLI: 2x XFX 9600 GSO with 798 MB DDR2 RAM each (almost 1.6 GB total)
2 Gig PC6400 DDR2

This started just a couple of days ago. When working on multiple BOINC projects along with GPU@Home, the work units under GPU@Home operate normally; however, the other projects just seem to crawl along. It's been taking some 8 actual seconds to complete one CPU second of work. A couple of days ago there were no conflicts in any BOINC project. How much actual memory is this project taking to cause such a slowdown? It just seems to go against the whole point of CUDA and GPU processing.


The project is still in its early days, and this is a major complaint about the GPU Grid application: the CPU usage is much higher than desired / expected. The project is aware and is promising a newer version of the application that uses less CPU, though that has not happened yet.

The latest release, 6.61, almost doubled the CPU usage of 6.55, where we had hoped the opposite would happen: that the usage of 6.55 would be halved. The idea was that the GPU run times would be lowered, though that has not been my experience to this point.

Most of us agree that the theoretical ideal would be negligible CPU use by the GPU tasks, on the order of 1% to 3% of a core, whereas the lowest I have experienced is 3% of total system resource usage.

Sadly, we are stuck in a situation where, if we want to keep the GPU spinning, we have to pay for it with significant reductions in the productivity of the CPU projects we support.
____________

ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5962 - Posted: 24 Jan 2009 | 21:25:13 UTC

Scott, are your CPU usage numbers set to 100% = 1 core? Otherwise they would look strange. And the very slow responsiveness you're seeing is due to your relatively slow card (32 shaders vs. the recommended 50+). Nothing has changed here between the different client versions. But the WUs have changed: we now also crunch more complex models, which take even longer to process (and thus make the lag worse).

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5964 - Posted: 24 Jan 2009 | 21:45:58 UTC - in response to Message 5931.

Meanwhile, I wonder why the Windows port of GPUGRID loads the CPU so...

Could it be that my GPUs are underutilized?
Or is the Windows software that inefficient?


No, your Linux should be fine!

Trying a short explanation: the GPU crunches for some tenths of milliseconds, and when it's done the CPU needs to intervene; if it doesn't, the GPU runs dry. Under Linux the project uses "nanosleep" (don't know if it's a function or a library or whatever); anyway, it allows the CPU task to be sent into sleep mode and woken up at a precisely controlled time.
There is no nanosleep for Windows, and the underlying reason seems to be that the smallest time step the Windows scheduler knows is 1 ms, so the timing control is much less efficient (and the polling has to be more aggressive, otherwise GPU performance suffers).
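A sketch of that difference (illustrative only; the helper is hypothetical, not the project's code):

    #ifdef _WIN32
    #include <windows.h>
    void pause_between_polls(void) {
        // The smallest step the Windows scheduler honours is ~1 ms, and in
        // practice often 10-15 ms unless something has called
        // timeBeginPeriod(1); far too coarse to pace a GPU precisely.
        Sleep(1);
    }
    #else
    #include <time.h>
    void pause_between_polls(void) {
        // nanosleep can nap for ~100 microseconds, so the Linux app can
        // sleep between polls instead of spinning.
        struct timespec ts = { 0, 100000 };  /* 100 us = 100,000 ns */
        nanosleep(&ts, NULL);
    }
    #endif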

MrS
____________
Scanning for our furry friends since Jan 2002

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 5970 - Posted: 24 Jan 2009 | 23:26:13 UTC - in response to Message 5962.

Scott, are your CPU usage numbers set to 100% = 1 core? Otherwise they would look strange.


Was writing the earlier posts fairly quickly before having to leave home to do some work. Anyway, the percentage usage is from the Windows Task Manager process list.

And the very slow responsiveness you're seeing is due to your relatively slow card (32 shaders vs. the recommended 50+). Nothing has changed here between the different client versions. But the WUs have changed: we now also crunch more complex models, which take even longer to process (and thus make the lag worse).

MrS


While the effects of former app versions were more evident on the slower card (vs., for example, my 9600GSO), these were always relatively modest effects. Also, with the 6.55 app version, this box crunched a few of the different types of work (US, USPME, GPUTEST, JAN) with no real differences to note other than ms/time step and overall run time. It is only with the 6.61 app version that the machine is so badly affected.

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 5976 - Posted: 25 Jan 2009 | 5:37:48 UTC

Well, here is an interesting twist:

http://www.gpugrid.net/workunit.php?wuid=184242

I thought that work was tied to the application version, but I guess not. The two machines that errored out this unit before I got it list the app version as 6.59 and 6.61, respectively?


ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 5996 - Posted: 25 Jan 2009 | 15:31:02 UTC - in response to Message 5976.

6.59 is the linux equivalent of 6.61 for win ;)

Anyway, the percentage usage is from the windows task manager processes list.


You have a dual-core CPU, so the default Windows behaviour should be to show 100% with both cores under full load and 50% with one core under full load. To exceed 50%, an app has to go multithreaded. When you say you see CPU usages of up to 73%, that would mean one and a half cores are used. Nobody else has reported something like that before for GPU-Grid.

And regarding the other issue: so it seems the different WU types are not causing your massive slowdown, and it looks like the new client is really to blame for this.

MrS
____________
Scanning for our furry friends since Jan 2002

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 5998 - Posted: 25 Jan 2009 | 16:38:14 UTC - in response to Message 5996.

6.59 is the linux equivalent of 6.61 for win ;)


Ah...got it!


You have a dual-core CPU, so the default Windows behaviour should be to show 100% with both cores under full load and 50% with one core under full load. To exceed 50%, an app has to go multithreaded. When you say you see CPU usages of up to 73%, that would mean one and a half cores are used. Nobody else has reported something like that before for GPU-Grid.

And regarding the other issue: so it seems the different WU types are not causing your massive slowdown, and it looks like the new client is really to blame for this.

MrS


I have tried to reproduce the more-than-50% load and haven't been able to do so. Maybe it was just something screwy with that particular workunit. I downloaded a new one, and it has a fairly constant load of 46-50%.

Unfortunately, it still has the continued slow-down problems. Also, I think you were correct in your initial assessment of more difficult work. The last unit I downloaded gives time-to-completion estimates that are twice what any of the pre-6.61 units were. Old work averaged about 2.5% complete per hour on my 9500GT, but the current work is running at less than half that speed.



ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 6001 - Posted: 25 Jan 2009 | 17:48:09 UTC - in response to Message 5998.

They're trying to compensate for the increased model complexity by including fewer steps in each WU, so that the total completion time is more or less constant (12 h on a 9800GT). This may or may not work well on different hardware, i.e. slower cards could have a disproportionately slower memory interface or something like that. But I guess a factor of 2 is not expected.

MrS
____________
Scanning for our furry friends since Jan 2002

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 6002 - Posted: 25 Jan 2009 | 18:24:27 UTC - in response to Message 6001.

...i.e. slower cards could have a disproportionately slower memory interface or something like that. But I guess a factor of 2 is not expected.


I thought about that as a possibility, especially since the 9500GT is a 128-bit card. However, that should mean that on my 9600GSO (192-bit) I would see a more modest effect, but so far it seems no different than before the 6.61 app (both have GDDR3, with the 9500GT having a slightly faster memory clock).

I am actually wondering if there could be a CPU pattern emerging. It seems that several of the reports above in this thread, including mine, are from AMD machines and not Intel ones (indeed, my 9600GSO is in an Intel box). There are some significant architectural differences between the two, but I have no idea how they might play into differences in CPU-GPU interactions.



ExtraTerrestrial Apes
Volunteer moderator
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 6003 - Posted: 25 Jan 2009 | 19:01:54 UTC - in response to Message 6002.

I'd say the GPU-CPU interaction is, although time-critical, rather limited, i.e. any differences would emerge from differences in the chipsets and drivers rather than the CPUs themselves.

And I'd say we need more data, statistically relevant numbers, to be certain about such performance changes. Not to disregard your finding and your concerns, but watching the progress bar of one WU is not yet enough ;)

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 6004 - Posted: 25 Jan 2009 | 19:05:36 UTC - in response to Message 5970.
Last modified: 25 Jan 2009 | 19:06:20 UTC

While the effects of former app versions were more evident on the slower card (vs., for example, my 9600GSO), these were always relatively modest effects. Also, with the 6.55 app version, this box crunched a few of the different types of work (US, USPME, GPUTEST, JAN) with no real differences to note other than ms/time step and overall run time. It is only with the 6.61 app version that the machine is so badly affected.

V6.61 is a BIG step backwards here too: more CPU usage, noticeably poorer video responsiveness. Instead of improving, things are getting worse. V6.56 worked great on Win32; everything since (including the downgrade to v6.55) has been a step backward for the users, IMO.

Rabinovitch
Joined: 25 Aug 08
Posts: 143
Credit: 64,937,578
RAC: 0
Message 6018 - Posted: 26 Jan 2009 | 7:24:10 UTC

I wish the developers luck and smarts, no less than the F@H founders! Wish GPUGRID luck in this New Year!

And I wish to crunch on a 6.62 app version, with CPU usage of less than 10% of one core......

Rabinovitch
Joined: 25 Aug 08
Posts: 143
Credit: 64,937,578
RAC: 0
Message 6048 - Posted: 27 Jan 2009 | 5:53:46 UTC

So, what about "low CPU" app for Windowzz?

Scott Brown
Joined: 21 Oct 08
Posts: 144
Credit: 2,973,555
RAC: 0
Message 6146 - Posted: 29 Jan 2009 | 2:04:52 UTC

Well, it appears to have definitely been the 6.61 app. The 6.62 app is now crunching away happily on the 9500GT at the same or better speeds than were observed with 6.55. The CPU load is around 3%, and the machine is fairly smooth in use, the severe slowdown seen under 6.61 now absent.

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 6149 - Posted: 29 Jan 2009 | 3:00:24 UTC

My machine is now usable again since v6.62. With v6.61 it was useless for anything except crunching. CPU usage has gone from 22% on the quad to 1%. Thanks, v6.62 is GREAT!

Neil A
Joined: 9 Oct 08
Posts: 50
Credit: 12,676,739
RAC: 0
Message 6151 - Posted: 29 Jan 2009 | 3:07:29 UTC

6.62 is amazing... tasks are using only +/- 3-4 mins of Q9950 CPU time with 2x GTX 260 Core 216 Superclocked video cards. My CPU contributions are climbing again. 6.62 looks like a winner to me! Thanks GDF & Co.


BTW, I am averaging 9K points per card per day and still climbing.

Neil

____________
Crunching for the benefit of humanity and in memory of my dad and other family members.
