
Message boards : Graphics cards (GPUs) : ... still babysitting ...

Profile Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Level
Pro
Message 4847 - Posted: 25 Dec 2008 | 13:41:50 UTC

The scheduler in 6.5.0 must still have problems with long-running projects. I have CPDN (2 tasks, one with over 200 hours, one with 1600 hours), PrimeGrid, MilkyWay and ABC running as CPU tasks. In this case the BM doesn't ask for work automatically. If I stop one or two projects, I get this request and reply:

25.12.2008 13:33:54|GPUGRID|Sending scheduler request: Requested by user. Requesting 167135 seconds of work, reporting 0 completed tasks
25.12.2008 13:33:59|GPUGRID|Scheduler request completed: got 0 new tasks
25.12.2008 13:33:59|GPUGRID|Message from server: No work sent
25.12.2008 13:33:59|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
25.12.2008 13:33:59|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.

If I stop everything I get:

25.12.2008 14:06:34|GPUGRID|Sending scheduler request: Requested by user. Requesting 218577 seconds of work, reporting 1 completed tasks
25.12.2008 14:08:15|GPUGRID|Scheduler request completed: got 1 new tasks
25.12.2008 14:08:47|GPUGRID|Sending scheduler request: To fetch work. Requesting 216986 seconds of work, reporting 0 completed tasks
25.12.2008 14:08:52|GPUGRID|Scheduler request completed: got 1 new tasks
25.12.2008 14:09:23|GPUGRID|Sending scheduler request: To fetch work. Requesting 216750 seconds of work, reporting 0 completed tasks
25.12.2008 14:09:28|GPUGRID|Scheduler request completed: got 1 new tasks

The scheduler in 6.3.21 behaved much better. If I leave home now for more than 24 hours, I have to downgrade or my boxes will all run dry.
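For anyone comparing their own logs, here is a minimal Python sketch that pairs each work request with the scheduler reply that follows it and converts the requested seconds into days. It assumes only the `date time|project|message` layout and the message wording visible in the log excerpts above; the function name `summarize` is made up for this illustration and is not part of BOINC.

```python
import re

# Matches the "Requesting N seconds of work" part of a client log message.
REQUEST_RE = re.compile(r"Requesting (\d+) seconds of work")
# Matches the "got N new tasks" part of the scheduler reply.
REPLY_RE = re.compile(r"got (\d+) new tasks")

SECONDS_PER_DAY = 86400


def summarize(log_lines):
    """Pair each work request with the reply that follows it.

    Returns a list of (requested_days, tasks_received) tuples.
    """
    results = []
    pending = None  # seconds asked for in the most recent request
    for line in log_lines:
        # Lines look like "date time|project|message"; keep only the message.
        message = line.split("|", 2)[-1]
        request = REQUEST_RE.search(message)
        if request:
            pending = int(request.group(1))
            continue
        reply = REPLY_RE.search(message)
        if reply and pending is not None:
            results.append((pending / SECONDS_PER_DAY, int(reply.group(1))))
            pending = None
    return results


# Four lines copied from the excerpts above.
log = [
    "25.12.2008 13:33:54|GPUGRID|Sending scheduler request: Requested by user. Requesting 167135 seconds of work, reporting 0 completed tasks",
    "25.12.2008 13:33:59|GPUGRID|Scheduler request completed: got 0 new tasks",
    "25.12.2008 14:06:34|GPUGRID|Sending scheduler request: Requested by user. Requesting 218577 seconds of work, reporting 1 completed tasks",
    "25.12.2008 14:08:15|GPUGRID|Scheduler request completed: got 1 new tasks",
]

for days, tasks in summarize(log):
    print(f"requested {days:.2f} days of work, received {tasks} task(s)")
```

The first request works out to roughly 1.93 days of work and still came back empty, so the size of the request alone does not explain the "No work sent" reply.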

____________

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 4848 - Posted: 25 Dec 2008 | 13:57:11 UTC

You can also try to extend the cache size. I know that those of us on HS connections should run with a lean cache, but I had to raise mine to 0.4 days to get work reliably.

Profile Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Level
Pro
Message 4850 - Posted: 25 Dec 2008 | 14:09:25 UTC - in response to Message 4848.

You can also try to extend the cache size. I know that those of us on HS connections should run with a lean cache, but I had to raise mine to 0.4 days to get work reliably.


My work cache is set to 2.00 days. The problem exists on the boxes with the fast cards (GTX280 and GTX260²).

BTW: The GTX280 runs more than 30% faster with 3+1, the GTX260² runs fine with 4+1, all with Vista 64 bit.

____________

Profile [BOINC@Poland]AiDec
Joined: 2 Sep 08
Posts: 53
Credit: 9,213,937
RAC: 0
Level
Ser
Message 4851 - Posted: 25 Dec 2008 | 15:04:49 UTC

I would just like to confirm this strange behavior (not requesting more work if other projects are not suspended).
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 4860 - Posted: 25 Dec 2008 | 23:16:56 UTC - in response to Message 4851.

I would just like to confirm this strange behavior (not requesting more work if other projects are not suspended).


Me too, on 6.4.2.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile DoctorNow
Joined: 18 Aug 07
Posts: 83
Credit: 122,995,082
RAC: 0
Level
Cys
Message 4863 - Posted: 26 Dec 2008 | 7:53:49 UTC - in response to Message 4860.

I would just like to confirm this strange behavior (not requesting more work if other projects are not suspended).


Me too, on 6.4.2.

Yep, me too now, even on 6.3.21. :-\
At first I thought it was only a client problem in 6.5.0; that's why I switched back, because I didn't have this problem before with 6.3.21...
____________
Member of BOINC@Heidelberg and ATA!

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 4865 - Posted: 26 Dec 2008 | 8:30:27 UTC

Just to dip my oar, my experience is different. I am running 6.5.0 and it has been returning and fetching work normally for me. Though I just got one task with a 168 hour run time... forcing the prior task into high priority (it just completed). Now that it is running the time is coming down quite nicely thank you.

Even more interesting is the task *AFTER* that came in at 17:25 which for the long task is about the run time I would expect.

YMMV

Profile Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Level
Pro
Message 4867 - Posted: 26 Dec 2008 | 11:49:27 UTC

I have a mix of long-running and short-running projects. On MilkyWay and PrimeGrid I also get work even if the BM requests only one second of work. One CPDN task has 850 hours of work left. So the work requests to GPUGrid are too short to get work. If I stop CPDN, MW and PG immediately request a lot of WUs, and after I restart CPDN some projects go into high-priority mode. I have to set all projects (except GPUGrid) to NNW and then suspend them. That's the only way for me to get new work on this Vista 64 machine with the GTX280.

But I don't understand why the other box, with the same OS, a GTX260², the same projects and 2 CPDN WUs with an aggregate 1232 hours of work, does not have the problem.

I think the main problem is that the calculated free time for a work request is the same for CPU and GPU tasks. Is it possible to calculate the free time separately for GPU and CPU work requests?
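To make the question concrete, here is a rough Python sketch comparing one shared "free time" figure with one figure per processor type. It is a toy model under stated assumptions, not the BOINC client's actual work-fetch code: the `Task` class, both function names, the 4-core/1-GPU host and the two short tasks are all invented for illustration. Only the 850-hour CPDN task and the 2-day cache come from this thread.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


@dataclass
class Task:
    resource: str             # "cpu" or "gpu"
    remaining_seconds: float  # estimated time left


def combined_shortfall(tasks, cache_days, instances):
    """One buffer for everything: queued work of any kind fills the cache."""
    capacity = cache_days * SECONDS_PER_DAY * sum(instances.values())
    queued = sum(t.remaining_seconds for t in tasks)
    return max(0.0, capacity - queued)


def per_resource_shortfall(tasks, cache_days, instances):
    """One buffer per resource type: CPU work cannot fill the GPU's cache."""
    shortfall = {}
    for resource, count in instances.items():
        capacity = cache_days * SECONDS_PER_DAY * count
        queued = sum(t.remaining_seconds for t in tasks if t.resource == resource)
        shortfall[resource] = max(0.0, capacity - queued)
    return shortfall


# Hypothetical host: 4 CPU cores, 1 GPU, a 2-day cache.
instances = {"cpu": 4, "gpu": 1}
tasks = [
    Task("cpu", 850 * SECONDS_PER_HOUR),  # long CPDN task (figure from the post)
    Task("cpu", 10 * SECONDS_PER_HOUR),   # assumed short CPU task
    Task("gpu", 3 * SECONDS_PER_HOUR),    # assumed nearly finished GPU task
]

print("combined:", combined_shortfall(tasks, 2.0, instances))
print("per resource:", per_resource_shortfall(tasks, 2.0, instances))
```

In this toy model the long CPU task swallows the whole shared buffer, so the combined figure is zero and nothing is requested, while the per-resource figure still shows the GPU buffer almost empty.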
____________

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 4873 - Posted: 26 Dec 2008 | 13:00:36 UTC - in response to Message 4867.

I have a mix of long-running and short-running projects. On MilkyWay and PrimeGrid I also get work even if the BM requests only one second of work. One CPDN task has 850 hours of work left. So the work requests to GPUGrid are too short to get work. If I stop CPDN, MW and PG immediately request a lot of WUs, and after I restart CPDN some projects go into high-priority mode. I have to set all projects (except GPUGrid) to NNW and then suspend them. That's the only way for me to get new work on this Vista 64 machine with the GTX280.

But I don't understand why the other box, with the same OS, a GTX260², the same projects and 2 CPDN WUs with an aggregate 1232 hours of work, does not have the problem.

I think the main problem is that the calculated free time for a work request is the same for CPU and GPU tasks. Is it possible to calculate the free time separately for GPU and CPU work requests?

This is the subject of a post in the SaH NC forum where I discuss how GPU processing breaks the resource share model ... which is the basis for making these calculations.

Splitting the model into two separate calculations only works when a project is pure CPU or pure GPU processing. That model also breaks in a situation like SaH, where you have the capability to run on both processing elements.

More interesting to me is the fact that the long neglected issues with credit calculations *COULD* have been the solution to this conundrum. Even sadder is that we predicted issues such as this back in beta testing of BOINC when discussing the future. Unfortunately the developers kept telling us we did not understand and that a correctly operating credit calculation system was not important.

However, if we had a correct characterization of the CPU and GPU processing capabilities, as defined by the original Cobblestone model, then we would know the processing capability of each processor system. From there you know the total capacity, can look at the current loading, and allocate the resources. From THERE, you can ask for the correct type of work to properly "balance" the resource allocation ... and so on ...

I hate to be right all the time ... :)
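The capacity-based idea described above (rate each processor, add up the total, compare with current loading, then ask for what is missing) could be sketched roughly as follows. This is only an illustration under assumptions: the function name `work_to_request`, the throughput numbers and the two-project setup are hypothetical, the units are arbitrary, and nothing here reflects how the BOINC client actually allocates work.

```python
def work_to_request(capacity, shares, assigned):
    """Split total capacity by resource share, then report the unmet part.

    capacity: {processor: rated throughput}, all in one common unit
    shares:   {project: resource share}, need not sum to 100
    assigned: {project: throughput already committed to that project}
    Returns {project: additional throughput the project is still owed}.
    """
    total_capacity = sum(capacity.values())
    total_share = sum(shares.values())
    owed = {}
    for project, share in shares.items():
        # Each project's target is its fraction of the whole machine.
        target = total_capacity * share / total_share
        owed[project] = max(0.0, target - assigned.get(project, 0.0))
    return owed


# Hypothetical ratings: the GPU is worth four times the CPU.
capacity = {"cpu": 40.0, "gpu": 160.0}
shares = {"GPUGRID": 37.5, "CPDN": 62.5}
# Assumed current loading: the CPU is fully busy with CPDN, the GPU is idle.
assigned = {"GPUGRID": 0.0, "CPDN": 40.0}

for project, amount in work_to_request(capacity, shares, assigned).items():
    print(f"{project}: owed {amount} units")
```

The sketch deliberately ignores which processor each project can actually run on; a real allocator would also have to cap each request by the free capacity of the matching processor type.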

Profile Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Level
Pro
Message 4878 - Posted: 26 Dec 2008 | 15:57:38 UTC

... and again ... it's no fun, can't grab new work ...

12/26/08 16:51:11|GPUGRID|Sending scheduler request: Requested by user. Requesting 115479 seconds of work, reporting 0 completed tasks
12/26/08 16:51:16|GPUGRID|Scheduler request completed: got 0 new tasks
12/26/08 16:51:16|GPUGRID|Message from server: No work sent
12/26/08 16:51:16|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/26/08 16:51:16|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
12/26/08 16:52:01|GPUGRID|Sending scheduler request: Requested by user. Requesting 186492 seconds of work, reporting 0 completed tasks
12/26/08 16:52:07|GPUGRID|Scheduler request completed: got 0 new tasks
12/26/08 16:52:07|GPUGRID|Message from server: No work sent
12/26/08 16:52:07|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
12/26/08 16:52:07|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/26/08 16:53:27|GPUGRID|Sending scheduler request: Requested by user. Requesting 255197 seconds of work, reporting 0 completed tasks
12/26/08 16:53:32|GPUGRID|Scheduler request completed: got 0 new tasks
12/26/08 16:53:32|GPUGRID|Message from server: No work sent
12/26/08 16:53:32|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/26/08 16:53:32|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
12/26/08 16:54:23|GPUGRID|Sending scheduler request: Requested by user. Requesting 255197 seconds of work, reporting 0 completed tasks
12/26/08 16:54:28|GPUGRID|Scheduler request completed: got 0 new tasks
12/26/08 16:54:28|GPUGRID|Message from server: No work sent
12/26/08 16:54:28|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
12/26/08 16:54:28|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.



____________

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 4879 - Posted: 26 Dec 2008 | 16:16:55 UTC - in response to Message 4878.

Which hostid?

Please also report these issues on the boinc-alpha mailing lists.

gdf

Black Beard
Joined: 16 Nov 08
Posts: 7
Credit: 982,855
RAC: 0
Level
Gly
Message 4881 - Posted: 26 Dec 2008 | 16:37:23 UTC

I'm having this problem also. In my case I got the impression that the problem was caused by everything wanting to run in 'high priority' mode.

If I want to download GPUGRID wu's I must suspend the other three projects I run on this machine. If I want to download wu's for any of the other projects I must suspend GPUGRID.

My host ID is 19688.

How can I stop all my projects from running in High priority mode?

I have my cache set to three days plus one day extra in my computing preferences.

Profile Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Level
Pro
Message 4885 - Posted: 26 Dec 2008 | 19:14:08 UTC - in response to Message 4879.
Last modified: 26 Dec 2008 | 19:14:33 UTC

Which hostid?

Please also report these issues on the boinc-alpha mailing lists.

gdf


Host ID 7785
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 4900 - Posted: 26 Dec 2008 | 23:27:26 UTC - in response to Message 4860.

I would just like to confirm this strange behavior (not requesting more work if other projects are not suspended).


Me too, on 6.4.2.


To add more detail:
- I have 2 CPU projects, one with many, many WUs (due to some previous error, never mind) and a normal one
- all WUs (CPU+GPU) are in high priority mode due to this massive amount of cached WUs
- cache size is set to 1.25 days
- GPU-Grid has 37.5% resource share

And even if the current GPU-WU is down to a few hours of runtime BOINC won't request new work, until I suspend the project with many WUs.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 4912 - Posted: 27 Dec 2008 | 0:46:39 UTC

Just a note, Dr. Anderson has acknowledged that there are issues with the work fetch policy which are contributing to our misery. He plans to start working on this soon ...

Others have chimed in with suggestions (including me) and hopefully he will actually look at the real problem which is a little bit bigger ... but that is only Paul's opinion ...

In the meantime we will have to fiddle with it to get work, I think ...
