Message boards : Number crunching : Stealth CUDA?

Gerry Rough
Joined: 28 Sep 09
Posts: 15
Credit: 3,688,434
RAC: 0
Message 13015 - Posted: 5 Oct 2009 | 1:42:25 UTC
Last modified: 5 Oct 2009 | 1:49:59 UTC

I've noticed that my CUDA device is never seen as running on the tasks tab (always "waiting to run"), yet when I look an hour later, the time to completion is one hour less, and the elapsed time is one hour more, as if to say that the 9800 GT is indeed working, but doesn't show up as running. Go figure!

So,

1) Is this normal?

2) My elapsed time shows ~18 hours to run a WU, but my tasks are showing up as 1.5 to 2 hours worth of time to complete when they validate. What gives here?

A couple of my WUs are here and here. I do know one thing for sure: I am not finishing my WUs and getting credit in anywhere near the two hours stated as the validated time. Something does sound a bit fishy. But then again, these are CUDA devices! :-)

Win Vista
CUDA device: GeForce 9800 GT (driver version 19038, compute capability 1.1, 512MB, est. 65GFLOPS)
[edit:] BOINC version: 6.6.36

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 13017 - Posted: 5 Oct 2009 | 6:41:51 UTC
Last modified: 5 Oct 2009 | 6:45:19 UTC

I cannot remember when and which versions started accounting for the GPU time more correctly ... we would have to go back through the notes to find where they made the changes and which version actually started to get it right.

In the early versions the CPU time expended was accounted for and not the GPU time. Since there is almost no CPU time expended...

This is complicated by the fact that the server-side software also does not do the accounting correctly for almost all versions. On GPUGrid the actual run time is found only inside the task output; in the case of task 1331486 the time was documented thus: "Approximate elapsed time for entire WU: 57307.425 s", or about 16 hours, which based on my experience with my 9800GT card is normal to low ... the 7,160 accounts only for the number of CPU seconds consumed in processing the task ...
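
For anyone checking the arithmetic, here is a minimal Python sketch that converts the two figures quoted above into hours. The function and variable names are made up for illustration; only the two numbers come from the task page.

```python
def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


# "Approximate elapsed time for entire WU" reported inside the task output
elapsed_s = 57307.425
# CPU seconds shown as the run time on the task page
cpu_s = 7160.0

print(f"GPU elapsed: {seconds_to_hours(elapsed_s):.1f} h")  # prints 15.9 h
print(f"CPU time:    {seconds_to_hours(cpu_s):.1f} h")      # prints 2.0 h
```

So the roughly two hours shown at validation is the CPU share only, while the card itself was busy for close to 16 hours.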

Hope this helps ...

{edit}
Oh, yes it is about normal ...

{edit2}
And the credit granted is essentially a "flat rate" award, and the 3K looks about right too ... the payment is about 2,500 to 5,800 depending. Somewhere GDF explains all this, but I don't have enough energy to hunt it down as my disability is resurgent. If you cannot find it, maybe some kind soul will remember where it is documented ... FAQ section?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1423
Credit: 3,517,431,451
RAC: 646,139
Message 13019 - Posted: 5 Oct 2009 | 8:17:45 UTC - in response to Message 13015.

I've noticed that my CUDA device is never seen as running on the tasks tab (always "waiting to run"), yet when I look an hour later, the time to completion is one hour less, and the elapsed time is one hour more, as if to say that the 9800 GT is indeed working, but doesn't show up as running. Go figure!

That sounds like you have the "Don't use GPU while the computer is in use" option set, and it's working as intended:

If you look at the task, it will always be 'waiting to run' - waiting for you to go away!

But while you're not looking, it will be running - you could see that with a remote monitoring tool like BoincView, or even a remote copy of BOINC Manager on another machine.

Gerry Rough
Message 13020 - Posted: 5 Oct 2009 | 10:14:04 UTC - in response to Message 13019.

I've noticed that my CUDA device is never seen as running on the tasks tab (always "waiting to run"), yet when I look an hour later, the time to completion is one hour less, and the elapsed time is one hour more, as if to say that the 9800 GT is indeed working, but doesn't show up as running. Go figure!

That sounds like you have the "Don't use GPU while the computer is in use" option set, and it's working as intended:

If you look at the task, it will always be 'waiting to run' - waiting for you to go away!

But while you're not looking, it will be running - you could see that with a remote monitoring tool like BoincView, or even a remote copy of BOINC Manager on another machine.


That's what I thought too when I first encountered it. But here are my preferences from GPUGrid:

Suspend work while computer is in use? no

Suspend GPU work while computer is in use? no

And again in my BAM! preferences:

Do work while computer is in use? Yes

Use GPU while computer is in use? Yes

So that idea didn't work out too well. I think Paul's idea makes more sense: since the CPU time is very small, perhaps it shows up as waiting because the CPU is running under the radar, so to speak. BOINC isn't noticing the work being done because the CPU is being used so little.

I wondered at the beginning as well if any or all of my BOINC version, driver version, or compute capability were the cause of the angst. Here is a copy and paste from my first post:

CUDA device: GeForce 9800 GT (driver version 19038, compute capability 1.1, 512MB, est. 65GFLOPS)
BOINC version: 6.6.36

It does seem though that the WUs are running fine and within normal expected times, so it is not as though something is causing an error or the WU is not apparently, if not visually, running normally.

Richard Haselgrove
Message 13022 - Posted: 5 Oct 2009 | 10:30:18 UTC - in response to Message 13020.

Have you looked in your message log?

You may, or may not, see any 'suspending' messages - that version doesn't show 'preempting' messages for normal CPU task switches, and I've moved on to testing v6.10.10/11, so I can't check.

But it will certainly show 'resuming' messages if my theory is right.

I've never seen 'waiting to run' unless the task is genuinely stopped for some reason. I don't think that theory holds water.

There is a third copy of the switch, in the manager's advanced/preferences.

Gerry Rough
Message 13023 - Posted: 5 Oct 2009 | 12:25:42 UTC - in response to Message 13022.

Have you looked in your message log?

You may, or may not, see any 'suspending' messages - that version doesn't show 'preempting' messages for normal CPU task switches, and I've moved on to testing v6.10.10/11, so I can't check.

But it will certainly show 'resuming' messages if my theory is right.

I've never seen 'waiting to run' unless the task is genuinely stopped for some reason. I don't think that theory holds water.

There is a third copy of the switch, in the manager's advanced/preferences.


I think you must have eaten your Wheaties this morning! Went to the messages tab, and I did find a few 'resuming' messages and no 'preempting' messages. That seems to explain it. Then went to the advanced tab and checked the 'Use GPU while computer is in use' box.

WOW!! This BOINC thingy really works like I tell it to! Go Figure! :-)

Paul D. Buck
Message 13024 - Posted: 5 Oct 2009 | 17:43:01 UTC - in response to Message 13023.

Have you looked in your message log?

You may, or may not, see any 'suspending' messages - that version doesn't show 'preempting' messages for normal CPU task switches, and I've moved on to testing v6.10.10/11, so I can't check.

But it will certainly show 'resuming' messages if my theory is right.

I've never seen 'waiting to run' unless the task is genuinely stopped for some reason. I don't think that theory holds water.

There is a third copy of the switch, in the manager's advanced/preferences.


I think you must have eaten your Wheaties this morning! Went to the messages tab, and I did find a few 'resuming' messages and no 'preempting' messages. That seems to explain it. Then went to the advanced tab and checked the 'Use GPU while computer is in use' box.

WOW!! This BOINC thingy really works like I tell it to! Go Figure! :-)

Just remember, you have now set local preferences on that machine, and the web-based preferences are not going to affect that machine's operation. I don't know why it did not propagate the preferences correctly, though you are using a version that, at least for GPU purposes, is fairly well out of date.

As to the run times, they are the cumulative total of the time actually consumed, not of wall-clock time, so you could have taken a week to run up those numbers. Again, these are UI and web issues that folks like Richard have been nagging about for like forever ... anyway, it is good to know you are going again...

Paul D. Buck
Message 13025 - Posted: 5 Oct 2009 | 17:44:58 UTC - in response to Message 13022.

Have you looked in your message log?

You may, or may not, see any 'suspending' messages - that version doesn't show 'preempting' messages for normal CPU task switches, and I've moved on to testing v6.10.10/11, so I can't check.

Just a question: does 6.10.11 show them? I know we asked for those messages to be made available .... were they added in the later versions? Or are they still among the missing?

Richard Haselgrove
Message 13026 - Posted: 5 Oct 2009 | 19:29:19 UTC - in response to Message 13025.

Have you looked in your message log?

You may, or may not, see any 'suspending' messages - that version doesn't show 'preempting' messages for normal CPU task switches, and I've moved on to testing v6.10.10/11, so I can't check.

Just a question: does 6.10.11 show them? I know we asked for those messages to be made available .... were they added in the later versions? Or are they still among the missing?

Testing with v6.10.10.....

05/10/2009 19:58:47 GPUGRID [cpu_sched] Preempting p845000-IBUCH_005_pYEEI_2909-4-20-RND5912_1 (removed from memory)
05/10/2009 19:59:29 GPUGRID [cpu_sched] Starting p845000-IBUCH_005_pYEEI_2909-4-20-RND5912_1(resume)
05/10/2009 19:59:29 GPUGRID Restarting task p845000-IBUCH_005_pYEEI_2909-4-20-RND5912_1 using acemd version 671

No, looks like it's still debug only for the preempt.
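
To pull these scheduling events out of a saved message log, something like the following Python sketch could be used. It assumes the log lines look exactly like the three pasted above (date, time, project, an optional debug tag such as [cpu_sched], then the event); the regular expression and the function name scheduling_events are illustrative only, not anything BOINC ships.

```python
import re

# The three log lines quoted above, verbatim.
LOG = """\
05/10/2009 19:58:47 GPUGRID [cpu_sched] Preempting p845000-IBUCH_005_pYEEI_2909-4-20-RND5912_1 (removed from memory)
05/10/2009 19:59:29 GPUGRID [cpu_sched] Starting p845000-IBUCH_005_pYEEI_2909-4-20-RND5912_1(resume)
05/10/2009 19:59:29 GPUGRID Restarting task p845000-IBUCH_005_pYEEI_2909-4-20-RND5912_1 using acemd version 671
"""

# Assumed line layout: "<date> <time> <project> [optional tag] <event> <task>..."
EVENT = re.compile(
    r"^(?P<when>\S+ \S+) (?P<project>\S+) (?:\[\w+\] )?"
    r"(?P<event>Preempting|Starting|Restarting task) (?P<task>[^\s(]+)",
    re.MULTILINE,
)


def scheduling_events(log_text):
    """Return (timestamp, event, task name) for each scheduling line found."""
    return [(m["when"], m["event"], m["task"]) for m in EVENT.finditer(log_text)]


for when, event, task in scheduling_events(LOG):
    print(when, event, task)
```

On a client that does not log the preempt line, only the 'Starting' and 'Restarting task' entries would show up, which matches what Gerry found.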

And this particular one doesn't obey 'Run according to preferences' - mine are set to 'Run always', and I still got the behaviour quoted. A couple (batteries, exclusive app) were moved from 'always' to 'preferences' for v6.10.10, but evidently not this.

We ought to get to the bottom of this preferences malarky. Gerry, if you wouldn't mind helping with some research (consider it a favour returned), could you:

Have a look inside global_prefs.xml (*) - do you see a tag for <run_gpu_if_user_active>? (about six lines down in mine). If so, what is its value (0 or 1)?

Then, the same question about global_prefs_override.xml - is the tag there (third line this time), and can you confirm the value is 1? (It should be, as Paul says you've just created the file)

I'm rather expecting, unless the Wheaties have worn off, that the tag line will be missing from the main global_prefs.xml file. If so, please could you try changing the setting on the website, then changing it back again, and finally updating the project? I'm wondering if that would cause a missing setting to be added to the file. In your case, it's a bit complicated by your use of BAM - I don't use that system, so you know best which way to change settings - I'll leave the details to you.

Finally, once the <run_gpu_if_user_active> tag is properly in the main preferences file(s) and set to 1, could you go into BOINC Manager advanced|Preferences again, and click 'Clear' - that should delete the _override file and restore normal running under BAM/web control, but without stealth mode.

[ * all files referenced live in your BOINC Data folder - see second or third line in messages at startup for location ]
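
The check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample XML is made up, the <global_preferences> root element is assumed, read_flag and effective_flag are hypothetical helper names, and the "override wins when present" rule is the behaviour Paul described, not code taken from BOINC.

```python
import xml.etree.ElementTree as ET

TAG = "run_gpu_if_user_active"

# Made-up sample contents standing in for the two files in the BOINC Data folder.
GLOBAL_PREFS = """<global_preferences>
  <run_if_user_active>1</run_if_user_active>
  <run_gpu_if_user_active>0</run_gpu_if_user_active>
</global_preferences>"""

OVERRIDE_PREFS = """<global_preferences>
  <run_if_user_active>1</run_if_user_active>
  <run_gpu_if_user_active>1</run_gpu_if_user_active>
</global_preferences>"""


def read_flag(xml_text, tag=TAG):
    """Return the tag's text ('0' or '1'), '' if it is empty, None if absent."""
    elem = ET.fromstring(xml_text).find(f".//{tag}")
    if elem is None:
        return None
    return (elem.text or "").strip()


def effective_flag(global_xml, override_xml=None):
    """Assumed rule: a value in the local override file beats the web value."""
    if override_xml is not None:
        value = read_flag(override_xml)
        if value is not None:
            return value
    return read_flag(global_xml)


print("web value:      ", read_flag(GLOBAL_PREFS))
print("with override:  ", effective_flag(GLOBAL_PREFS, OVERRIDE_PREFS))
print("override cleared:", effective_flag(GLOBAL_PREFS))
```

In this sketch, clearing the override makes the web value apply again, which is why the tag needs to be correct in the main file before 'Clear' is clicked.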

Gerry Rough
Message 13033 - Posted: 6 Oct 2009 | 2:38:57 UTC - in response to Message 13026.

We ought to get to the bottom of this preferences malarky. Gerry, if you wouldn't mind helping with some research (consider it a favour returned), could you:

Have a look inside global_prefs.xml (*) - do you see a tag for <run_gpu_if_user_active>? (about six lines down in mine). If so, what is its value (0 or 1)?


I think I see that:

<run_if_user_active>1</run_if_user_active>
<run_gpu_if_user_active />


Then, the same question about global_prefs_override.xml - is the tag there (third line this time), and can you confirm the value is 1? (It should be, as Paul says you've just created the file)


Here it is:

run_if_user_active>1</run_if_user_active>
<run_gpu_if_user_active>1</run_gpu_if_user_active>


I'm rather expecting, unless the Wheaties have worn off, that the tag line will be missing from the main global_prefs.xml file. If so, please could you try changing the setting on the website, then changing it back again, and finally updating the project? I'm wondering if that would cause a missing setting to be added to the file. In your case, it's a bit complicated by your use of BAM - I don't use that system, so you know best which way to change settings - I'll leave the details to you.

Finally, once the <run_gpu_if_user_active> tag is properly in the main preferences file(s) and set to 1, could you go into BOINC Manager advanced|Preferences again, and click 'Clear' - that should delete the _override file and restore normal running under BAM/web control, but without stealth mode.


Since the Preferences were set to 1 already as you suggested, I went into the advanced|preferences tab and hit the clear button. Updated/synchronized BOINC with the web based BAM! settings (tools tab|synchronize with BAM!: that is how they synchronize in BAM!) and everything seems to be running fine: CUDA WU is running smoothly in regular mode and visible. I think ya done did it!

Also, what version of BOINC should I be using on that machine for my CUDA card, or should I just wait for the new stable version to come out? 6.6.36 might not be the latest, but it seems to be running stable and without hiccups for now.

Any other update suggestions would be welcome. HTH

Paul D. Buck
Message 13043 - Posted: 6 Oct 2009 | 9:07:18 UTC

*MY* update suggestion to you is that if it is working now ... don't ...

6.10.x is mostly about ATI cards though some minor bug fixes are also being added... the thing is, if they are not bothering you (the bugs that is) ... don't bother to update.

Oh, and I hope the missing angle bracket is just a missed copy paste ... (second set, first line, first character < is missing)

Richard Haselgrove
Message 13044 - Posted: 6 Oct 2009 | 9:38:56 UTC - in response to Message 13033.

The first set is interesting:

<run_if_user_active>1</run_if_user_active>
<run_gpu_if_user_active />

That space before the closing / has been causing all sorts of problems with optimised and 'anonymous platform' xml files. Although it appears to set the flag (and is well-formed in standard XML), I wonder if BOINC failed to parse it correctly and hence created the 'stealth mode'?

Where could it have come from - BOINC's own web code, or is that one you set through BAM?

My samples yesterday were copied from a machine running BOINC preferences only, with no contact with BAM - and they all showed the explicit 0 or 1 format. So I suspect BAM.

FWIW, the parsing bug should finally have been eliminated with the very latest BOINC v6.10.13, released for alpha testing last night. But I agree with Paul: that on its own is not a sufficient reason for upgrading!
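
To make the suspicion concrete, here is a small Python sketch contrasting a standard XML parser with a deliberately naive matcher that only recognises the explicit open/value/close form. The naive matcher is a hypothetical stand-in and is not BOINC's actual parsing code; it merely shows how a self-closing tag can go unseen by that kind of matching.

```python
import re
import xml.etree.ElementTree as ET

# The two lines from Gerry's global_prefs.xml, wrapped in an assumed root element.
SNIPPET = (
    "<global_preferences>"
    "<run_if_user_active>1</run_if_user_active>"
    "<run_gpu_if_user_active />"
    "</global_preferences>"
)


def standard_parser_sees_tag(xml_text, tag):
    """A conforming XML parser accepts <tag /> as a present, empty element."""
    return ET.fromstring(xml_text).find(tag) is not None


def naive_value(xml_text, tag):
    """Hypothetical strict matcher: only understands <tag>value</tag>."""
    name = re.escape(tag)
    match = re.search(rf"<{name}>(.*?)</{name}>", xml_text)
    return match.group(1) if match else None


print(standard_parser_sees_tag(SNIPPET, "run_gpu_if_user_active"))
print(naive_value(SNIPPET, "run_if_user_active"))
print(naive_value(SNIPPET, "run_gpu_if_user_active"))
```

The standard parser finds the element but it carries no value, and the naive matcher does not find it at all, so either way the client would have nothing telling it to set the flag to 1.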

Gerry Rough
Message 13046 - Posted: 6 Oct 2009 | 11:59:28 UTC - in response to Message 13043.

*MY* update suggestion to you is that if it is working now ... don't ...

6.10.x is mostly about ATI cards though some minor bug fixes are also being added... the thing is, if they are not bothering you (the bugs that is) ... don't bother to update.

Oh, and I hope the missing angle bracket is just a missed copy paste ... (second set, first line, first character < is missing)


Correct. Just missed the opening character in the copy and paste. That would have really fouled up the works, much like missing the first bracket in a BBCode tag. Yikes!

Gerry Rough
Message 13047 - Posted: 6 Oct 2009 | 12:11:37 UTC - in response to Message 13044.


FWIW, the parsing bug should finally have been eliminated with the very latest BOINC v6.10.13, released for alpha testing last night. But I agree with Paul: that on its own is not a sufficient reason for upgrading!


Good. But actually I was thinking about the other upgrades when I wrote that: the compute capability has been upgraded, at least if I read that correctly elsewhere here at GPUGrid, and the driver version. But since everything looks good, perhaps it is good to leave well enough alone.

Still getting used to being conscious of the little things while running GPUGrid. I've been with alphas before, but with CUDA code and an entirely new technology to help out with, I suppose you can never be too careful to get it right. :-/

Profile Paul D. Buck
Message 13056 - Posted: 6 Oct 2009 | 20:05:53 UTC - in response to Message 13047.

Still getting used to being conscious of the little things while running GPUGrid. I've been with alphas before, but with CUDA code and an entirely new technology to help out with, I suppose you can never be too careful to get it right. :-/

For what it is worth the GPU addition has all the flavor of the days of BOINC Beta where almost every day was an adventure and we went through version after version trying to get something that would work and be stable.

The only good news is that we have several projects that reliably issue work for GPUs, two for ATI, hope for OpenCL in the works and BOINC versions that almost work correctly ...

Sadly we also seem to be duplicating the mode where a change will get in and it is nearly impossible to get it back out (Strict FIFO on GPU tasks) even though it is deadly to most users of BOINC ... (FIFO is great for single project users, death if you want a queue and resource allocations to your liking)...

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 13150 - Posted: 12 Oct 2009 | 10:18:24 UTC - in response to Message 13024.



WOW!! This BOINC thingy really works like I tell it to! Go Figure! :-)

Just remember, you have now set local preferences on that machine, and the web-based preferences are not going to affect that machine's operation. I don't know why it did not propagate the preferences correctly, though you are using a version that, at least for GPU purposes, is fairly well out of date.

It is a "Default Boinc" Location Thing!

If you configure a Work location but you don't add your computer to the Work group, the configurations you set won't propagate to your system.
