Message boards : Graphics cards (GPUs) : Wasted CPU time - a simple fix

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 5078 - Posted: 30 Dec 2008 | 16:01:39 UTC

This was asked about in the v6.56 thread and never answered, so I'll try again in more detail. Why are we not using the v6.56 application for 32bit Windows? Sure, there are issues with 64bit machines, but it's simple to set the BOINC server to download different apps to 32bit and 64bit boxes.

1) v6.56 uses less than 1/100 the CPU time compared to v6.55 (at least on acemd WUs on my machine)

2) Setting the BOINC server to download v6.56 to 32bit boxes and 6.55 to 64bit boxes should be simple

3) By NOT doing this the CPU time wasted is massive, with no upside at all

4) Maybe it's time to separate the 32bit and 64bit apps anyway and not try to make one app work for everything

Can anyone think of a valid reason not to do this? Comments are warmly invited.
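
As a rough illustration of point 2, the per-platform split could be sketched like this. It is a hypothetical lookup, not the BOINC server's actual scheduler code or configuration: `APP_VERSION_BY_PLATFORM` and `pick_app_version` are made-up names, and the platform strings are assumed to follow BOINC's usual platform naming.

```python
from typing import Optional

# Hypothetical sketch: choose an app version from the host's platform name.
# The mapping is an assumption for illustration, not the project's real config.
APP_VERSION_BY_PLATFORM = {
    "windows_intelx86": "6.56",  # 32bit Windows: the low-CPU build
    "windows_x86_64": "6.55",    # 64bit Windows: stay on the older build
}


def pick_app_version(platform: str) -> Optional[str]:
    """Return the version to send to `platform`, or None if it is not listed."""
    return APP_VERSION_BY_PLATFORM.get(platform)


if __name__ == "__main__":
    for name in ("windows_intelx86", "windows_x86_64"):
        print(name, "->", pick_app_version(name))
```

Platforms that are not listed return `None`, so anything outside the two Windows cases is left alone.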

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 5082 - Posted: 30 Dec 2008 | 16:23:33 UTC

I did not see the full explanation as to why the 6.56 version was pulled ...

However, were I running the software world here I would have likely done the same thing. If a version is not working correctly on one platform, depending on the problem it also may suggest the results from the other platform may be compromised.

Secondly, the results from the different versions may not allow proper "batching" because the processing is so different ...

We could go on with speculation.

THAT aside, this is an early-days project and, as far as CUDA goes, still in serious development. Issues such as this are to be expected. In another thread a new version is projected for release in about a week, and to *ME* this is a reasonable timeframe, especially with the holidays intervening.

These are my thoughts ...

If it bothers you too much, I suggest you consider just pausing the project for a week or so ...

I am here to test CUDA and to learn. In GENERAL, this class of project is not of much interest to me. Never did like Biology. But I am madder at SaH so ... if I want to do CUDA, I need to do it here ... :)

In the meantime, take a gander at the BOINC Dev mailing list and comment there, on the thread Dr. Anderson is running to solicit comments on work fetch policy, about the issue of a project over-subscribing CPU usage. I say this though I know he wants to focus only on work fetch; I don't see how that can be done in isolation from Resource Share, task execution scheduling, and consumption monitoring ...

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 5087 - Posted: 30 Dec 2008 | 17:41:21 UTC - in response to Message 5082.

I did not see the full explanation as to why the 6.56 version was pulled ...

However, were I running the software world here I would have likely done the same thing. If a version is not working correctly on one platform, depending on the problem it also may suggest the results from the other platform may be compromised.

It was not an issue of compromised results. The issue was that the app crashed on 64bit machines. It ran better on 32bit than v6.55: v6.56 had faster completion times and massively less CPU usage. v6.56 was a HUGE step forward for the client, it's just too bad that it's not being used where it provided so much benefit.

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 5088 - Posted: 30 Dec 2008 | 17:51:46 UTC - in response to Message 5082.
Last modified: 30 Dec 2008 | 17:53:23 UTC

These are my thoughts ...

If it bothers you too much, I suggest you consider just pausing the project for a week or so ...

That sounds like, don't make suggestions. If you don't like it leave. I don't think that's the attitude of the developers here. They seem very responsive. I think they welcome our suggestions.

I am here to test CUDA and to learn. In GENERAL, this class of project is not of much interest to me. Never did like Biology. But I am madder at SaH so ... if I want to do CUDA, I need to do it here ... :)

That's fine for you, although running a project because you're mad at another one is questionable. I like this project. I'm here because I want to help the project by donating GPU time and by helping the development of the project by providing beta feedback. That's the point of being in beta IMO. That said I'd also like to help other projects by using CPU time efficiently, not wasting it needlessly.

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Level
Gly
Message 5090 - Posted: 30 Dec 2008 | 18:03:14 UTC - in response to Message 5082.


Secondly, the results from the different versions may not allow proper "batching" because the processing is so different ...


Quite so but currently we have a different version for Linux....

I'm definitely in favour of an early return to 6.56 if the project team can engineer it even if it is just for Win 32 bit in the interim.

I've just analysed my results from the machine with the GTX260 (since the card was installed on Dec 9th).
Version 6.56 - Total 5 WUs completed OK
4 WUs worth 1887.97 credits - CPU secs 2803.49 to 3481.52 (*)
1 WU worth 2435.94 credits - CPU secs 2153.80 (*)

Version 6.55 - Total 26 WUs completed OK
6 WUs worth 1887.97 credits - CPU secs 11563.18 to 11701.56 (*)
13 WUs worth 2435.94 credits - CPU secs 4991.89 to 6386.68 (*)
7 WUs worth 3232.06 credits - CPU secs 4114.21 to 5110.20

Version 6.53 - Total 16 WUs completed OK
6 WUs worth 2435.94 credits - CPU secs 5120.47 to 5553.50
10 WUs worth 3232.06 credits - CPU secs 3736.47 to 21641.35

Version 6.52 - Total 5 WUs completed OK
5 WUs worth 3232.06 credits - CPU secs 2753.04 to 4302.24

If you compare the two pairs of lines marked (*) you'll see why I am keen to get back to 6.56...

The numbers also show that the 1887.97 credit units are particularly demanding of CPU time in comparison to the other units, as mentioned elsewhere on this forum.

Phoneman1
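
To put a number on that comparison, here is a small sketch that turns the CPU-second figures above into ratio ranges for the two credit values reported under both 6.55 and 6.56. The names `cpu_secs`, `ratio_range` and `compare` are illustrative, and with only 5 WUs on 6.56 the ranges should be read as rough indications rather than measurements.

```python
# CPU-second ranges (low, high) copied from the figures above, keyed by credit value.
cpu_secs = {
    "6.56": {"1887.97": (2803.49, 3481.52), "2435.94": (2153.80, 2153.80)},
    "6.55": {"1887.97": (11563.18, 11701.56), "2435.94": (4991.89, 6386.68)},
}


def ratio_range(old, new):
    """Smallest and largest old/new ratio given two (low, high) ranges."""
    return old[0] / new[1], old[1] / new[0]


def compare(credit):
    """Ratio range of 6.55 CPU time to 6.56 CPU time for one credit value."""
    return ratio_range(cpu_secs["6.55"][credit], cpu_secs["6.56"][credit])


if __name__ == "__main__":
    for credit in ("1887.97", "2435.94"):
        low, high = compare(credit)
        print(f"{credit} credits: 6.55 used {low:.1f}x to {high:.1f}x the CPU time of 6.56")
```

On these figures 6.55 needs roughly 3.3 to 4.2 times the CPU time of 6.56 for the 1887.97 credit units, and roughly 2.3 to 3.0 times for the 2435.94 credit units.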

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 5091 - Posted: 30 Dec 2008 | 18:14:48 UTC
Last modified: 30 Dec 2008 | 18:19:48 UTC

Phoneman1, thanks for the analysis. At least it looks like the GTX260 is doing better CPU-wise than the slower cards.
Here's the analysis for my 9600 GSO:


Version 6.56 - Total 2 WUs completed OK

WUs worth 3232.06 credits - CPU secs: 334.95 to 338.27 each


Version 6.55 - Total 25 WUs completed OK

WUs worth 3232.06 credits - CPU secs: 34,341.28 to 51,533.36 each

That's over a 100 to 1 difference.

Profile Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Level
Val
Message 5092 - Posted: 30 Dec 2008 | 18:23:02 UTC - in response to Message 5087.
Last modified: 30 Dec 2008 | 18:25:07 UTC

I'm running Vista 64 on both my rigs and 6.56 ran flawlessly. I don't think I had any bad WUs. So I don't know what the issue was.

Pat

Update: I had 1 WU go bad.

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 5093 - Posted: 30 Dec 2008 | 18:32:55 UTC - in response to Message 5092.
Last modified: 30 Dec 2008 | 18:35:48 UTC

From the v6.56 thread:

"Every single report of this error was a machine with either XP-64 (SP1 or 2) or Server 2003."

Apparently Vista wasn't affected.

Update: I had 1 WU go bad.

Was your one error "error 212"?

Profile Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 51,279,371
RAC: 0
Level
Thr
Message 5095 - Posted: 30 Dec 2008 | 19:08:01 UTC - in response to Message 5090.
Last modified: 30 Dec 2008 | 19:11:06 UTC

Yea that's really a nice analysis Phoneman!

Would be interesting to see if there's also a difference in the total elapsed time and ms/step with the two different applications.

It's nice if 6.56 needed less CPU time, but it wouldn't be much of an improvement if the overall computation needs longer...

[edit]Also - do you run 4+1 or 3+1 tasks on your Quads?
I haven't had enough time to test it. I only saw that if 6.56 is run with 3+1 it takes the same CPU % as 6.55; therefore I believe that if 6.56 is run with 4+1 it needs less CPU, but the wall clock time a WU needs is longer...
____________

pixelicious.at - my little photoblog

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 5096 - Posted: 30 Dec 2008 | 20:32:24 UTC - in response to Message 5095.

Would be interesting to see if there's also a difference in the total elapsed time and ms/step with the two different applications.

It's nice if 6.56 needed less CPU time, but it wouldn't be much of an improvement if the overall computation needs longer...

[edit]Also - do you run 4+1 or 3+1 tasks on your Quads?

In my case v6.55 took 71,800 - 72,800 seconds per 3232 point WU. v6.56 took 70,224 - 70,255 seconds.
All WUs were run in the 4+1 configuration. v6.56 wins on both points: less GPU time and MUCH less CPU time.
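
A quick sketch of what those figures imply, combining the elapsed times here with the CPU seconds reported earlier in the thread for the same 3232.06 credit WUs on the 9600 GSO. The function names are illustrative, and the pairing of CPU figures with elapsed figures is an assumption (the posts do not say which CPU time belongs to which run), so the code only computes conservative bounds.

```python
# Figures from Beyond's posts for 3232.06 credit WUs on a 9600 GSO (seconds).
elapsed = {"6.55": (71_800, 72_800), "6.56": (70_224, 70_255)}
cpu = {"6.55": (34_341.28, 51_533.36), "6.56": (334.95, 338.27)}


def cpu_share(version):
    """Bounds on CPU time as a fraction of wall-clock time for one version.

    Pairs the lowest CPU time with the longest run and the highest CPU time
    with the shortest run, because the true pairing is not known.
    """
    cpu_lo, cpu_hi = cpu[version]
    wall_lo, wall_hi = elapsed[version]
    return cpu_lo / wall_hi, cpu_hi / wall_lo


def elapsed_saving():
    """Bounds on the fractional wall-clock saving of 6.56 over 6.55."""
    old_lo, old_hi = elapsed["6.55"]
    new_lo, new_hi = elapsed["6.56"]
    return 1 - new_hi / old_lo, 1 - new_lo / old_hi


if __name__ == "__main__":
    for version in ("6.55", "6.56"):
        low, high = cpu_share(version)
        print(f"v{version}: CPU busy {low:.1%} to {high:.1%} of the run")
    low, high = elapsed_saving()
    print(f"v6.56 finishes {low:.1%} to {high:.1%} sooner")
```

On these numbers v6.55 keeps a core busy for roughly 47% to 72% of the run, against about 0.5% for v6.56, while the wall-clock gain is a more modest 2% to 3.5%.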

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 5097 - Posted: 30 Dec 2008 | 20:35:49 UTC - in response to Message 5088.

These are my thoughts ...

If it bothers you too much, I suggest you consider just pausing the project for a week or so ...

That sounds like, don't make suggestions. If you don't like it leave. I don't think that's the attitude of the developers here. They seem very responsive. I think they welcome our suggestions.

I am here to test CUDA and to learn. In GENERAL, this class of project is not of much interest to me. Never did like Biology. But I am madder at SaH so ... if I want to do CUDA, I need to do it here ... :)

That's fine for you, although running a project because you're mad at another one is questionable. I like this project. I'm here because I want to help the project by donating GPU time and by helping the development of the project by providing beta feedback. That's the point of being in beta IMO. That said I'd also like to help other projects by using CPU time efficiently, not wasting it needlessly.



I guess it did not come across as intended. I did not mean to imply shut up or leave. What I was trying to convey, and failed (sorry), was that one of the characteristics of BOINC projects will be a certain slowness in fixing many issues. The irony of holiday periods is that the participants tend to spend more time BOINCing, watching it, and asking questions just when the staff are on vacation ...

Bad things also tend to happen on these occasions.

All I was trying to say was that if the loss of CPU time bothered you significantly it might be worthwhile to consider doing a different project. Not to leave, merely to pause, and I did use the word pause...

Simple fixes are not always so simple. We are having feeder issues... splitting the versions would most likely make these issues worse...

As to the other ...

When there are only two projects that have CUDA apps, well, you have only those to select from ... if you don't want to do one, you have little option but to do the other ...

And, though I am not a fan of Bio projects or math projects for that matter ... my project list is not as long as some, but I have contributed to 57 so far ... and as work is available I think I do work for about 43 right now ... though many seem to be out of work a lot right at this point in time ... call it hmmm by last update I dropped work into 30 projects in the last 24 hours ...

Some more than others ... :)

When other projects start doing CUDA I will be lowering my contribution here ...

Anyway miscommunication all around ... again, I apologize for the misunderstanding ...

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 5098 - Posted: 30 Dec 2008 | 20:54:55 UTC - in response to Message 5097.

Not a problem Paul. Thanks for the reply!

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 5403 - Posted: 9 Jan 2009 | 1:28:50 UTC

Any news on a fixed v6.56 client or at least sending the v6.56 app to 32bit machines? Never have gotten a response concerning this issue.

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 6173 - Posted: 29 Jan 2009 | 15:27:21 UTC

Looks like the problem is fixed now with v6.62. Thanks!
