|
Hi all,
I've been aborting most of the WUs sent to my machine on the advice of the messages appearing in the BOINC Manager, i.e., the WU is 1 or 2 days overdue and I likely won't get credit for it. I am assuming that, even if I let it run to completion, the effort will be wasted, since the WU will have been sent to someone else who MIGHT BE ABLE to finish it in the allotted time.
So, it seems to me that the report deadlines are unrealistically early for my NVIDIA Quadro FX 1700 running with my Intel Xeon E5410 2.33 GHz CPU.
If this continues to be the case, I'll just detach from the project since my CPU cycles are being wasted.
Is anybody else, particularly those running the WU server, aware of this problem?
Scott Howard
|
|
|
Hi all,
So, it seems to me that the report deadlines are unrealistically early for my NVIDIA Quadro FX 1700 running with my Intel Xeon E5410 2.33 GHz CPU.
If this continues to be the case, I'll just detach from the project since my CPU cycles are being wasted.
Is anybody else, particularly those running the WU server, aware of this problem?
Scott Howard
Your FX1700 has 32 shaders, so, like other 32-shader cards (9500GT, etc.), it should be able to finish within the 4-day deadline. However, it will always be very close and will frequently benefit from manual reporting. Your shader clock is also quite low, which is why you are considerably slower than other 32-shader cards. If possible, you should look into significantly overclocking the shaders (as well as the core and memory clocks; the DDR2 memory on the card is quite slow). Even with an OC, you can expect 48+ hour calculation times for most workunits, with the longer workunits (32xx credit and 29xx credit) really pushing the 4-day limit.
As cards go, this is about as borderline as it gets, but similar 8600GT cards have been running somewhat successfully... it's a tough call whether it is really worth it.
|
|
|
|
OK, I just got another GPUGRID WU yesterday. CPU time (11:07:07) + To Completion time (66:25:13) is about 77.5 hours. Now, I'm given 96 hours to complete the WU, which means that there are about 18.5 hours of slack time.
I use the computer about 9 hours a day; over those 4 days, that's about 36 hours that are unavailable for GPU use. I cannot complete the WU before the deadline given my daily usage pattern.
There's the problem.
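The deadline arithmetic in this post can be checked with a minimal Python sketch. It assumes, as the post does, that the GPU only crunches while the machine is not in interactive use, and it adds the CPU time and the to-completion time the way the post does (11:07:07 + 66:25:13 = 77:32:20, about 77.5 hours). `DEADLINE_HOURS` and `BUSY_HOURS_PER_DAY` are illustrative names, not BOINC settings.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'H:MM:SS' duration as displayed in the BOINC Manager."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


DEADLINE_HOURS = 96       # 4-day GPUGRID report deadline
BUSY_HOURS_PER_DAY = 9    # hours/day the machine is in interactive use (from the post)

# The post's figures: CPU time so far plus the estimated time to completion.
needed_hours = (parse_hms("11:07:07") + parse_hms("66:25:13")).total_seconds() / 3600

# Hours left for the GPU if it only crunches while nobody is using the machine.
available_hours = DEADLINE_HOURS - BUSY_HOURS_PER_DAY * (DEADLINE_HOURS / 24)

print(f"needed ~{needed_hours:.1f} h, nominal slack ~{DEADLINE_HOURS - needed_hours:.1f} h")
print(f"available ~{available_hours:.0f} h, on time: {needed_hours <= available_hours}")
```

Even using the ~66-hour elapsed time quoted in the reply below instead of the summed figure, the required time still exceeds the roughly 60 hours the GPU actually has.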
Now, when I look at the other BOINC projects, I see report deadlines ranging from the 21st to the 26th. I always complete the WUs on time for those projects, even though work from several projects is loaded up to keep my 8 CPUs happily chugging away.
It seems to me that the report deadlines for the GPUGRID project are unrealistic. Either the manager of the project doesn't grasp the fact that we are volunteering time on machines that are used regularly, or their picture of the world consists only of top-end processors.
If I've drawn incorrect conclusions from the data at hand, please let me know where my mistake is.
:-)
|
|
|
OK, I just got another GPUGRID WU yesterday. CPU time (11:07:07) + To Completion time (66:25:13) is about 77.5 hours. Now, I'm given 96 hours to complete the WU, which means that there are about 18.5 hours of slack time.
I do not believe you should add the CPU time and the GPU time to get the total run time. The elapsed times listed under your machine's completed tasks show 65-66 hours.
I use the computer about 9 hours a day; over those 4 days, that's about 36 hours that are unavailable for GPU use. I cannot complete the WU before the deadline given my daily usage pattern.
I think your very low shader clock (918000 kHz, i.e. 918 MHz) makes this impossible. If you can overclock the shaders into the 1500000 kHz (1.5 GHz) range, you should be able to get well under the 60-hour threshold (96 hours minus your 36 hours of use).
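A rough sketch of why the overclock helps, assuming (this is a simplification, not a measured rule) that WU time scales inversely with the shader clock. Memory speed also matters, as noted earlier in the thread, so the estimate is optimistic. The 65.5-hour figure is the midpoint of the 65-66 hour elapsed times quoted above; the clocks are in kHz as reported.

```python
def scaled_runtime(hours_at_base: float, base_clock_khz: int, new_clock_khz: int) -> float:
    """Estimate WU runtime after a shader overclock.

    Assumes runtime is inversely proportional to shader clock (shader-bound work).
    Memory clock and bandwidth are ignored, so treat the result as a best case.
    """
    return hours_at_base * base_clock_khz / new_clock_khz


AVAILABLE_HOURS = 60   # 96 h deadline minus ~36 h of interactive use
observed = 65.5        # elapsed hours per WU at the stock 918 MHz shader clock

estimate = scaled_runtime(observed, 918_000, 1_500_000)
print(f"estimated runtime at 1.5 GHz: {estimate:.1f} h, fits: {estimate < AVAILABLE_HOURS}")
```

Under this assumption the WU drops to roughly 40 hours, which is consistent with the "well under 60 hours" claim, though real gains will likely be smaller.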
Now, when I look at the other BOINC projects, I see report deadlines ranging from the 21st to the 26th. I always complete the WUs on time for those projects, even though work from several projects is loaded up to keep my 8 CPUs happily chugging away.
It seems to me that the report deadlines for the GPUGRID project are unrealistic. Either the manager of the project doesn't grasp the fact that we are volunteering time on machines that are used regularly, or their picture of the world consists only of top-end processors.
If I've drawn incorrect conclusions from the data at hand, please let me know where my mistake is.
It is my understanding that the 4-day deadline here is necessary because the work builds upon itself; that is, batches of new work depend on completed previous work (someone please correct me if I am wrong on this). The project's FAQ section is fairly clear about which cards are supported. 32-shader cards will run, but are not recommended due to issues with meeting deadlines. Such cards with higher shader clocks seem to be fine (see my 9500GT, for example), but they may require more 'babysitting' to make sure that completed work is reported quickly after it finishes.
Also, low-range, mid-range, and top-end are difficult terms to apply across product lines. In the Quadro series, the FX1700 is marketed at the bottom of the mid-range, given the advantages it has over other Quadro cards and its particular features for CAD and other business apps. However, it is based on the G84 chip, which (given its modest clocks) is comparable in the GeForce series to the 8600GT, which by today's standards is considered more of a low-end (albeit top-of-the-low-end) card. GPUGRID is designed for mid-range cards and up, and the 32-shader cards (FX1700, 8600GT, 8600GTS, 9500GT) that lie on the low/mid-range border are also borderline for successful workunit completion.
|
|
|
j2satx
Joined: 16 Dec 08 Posts: 20 Credit: 36,912,780 RAC: 0
|
So after umpteen messages about these weaker cards, the project should just remove 32 shader cards from the acceptable list.
|
|
|
|
So after umpteen messages about these weaker cards, the project should just remove 32 shader cards from the acceptable list.
32 shader cards are not recommended on the project FAQ page, but they should all be capable of completing the various workunit types within the 4-day deadline. Even the slowly clocked FX1700 discussed in this thread can do it, but it would need close to 24/7 crunching to do so. This kind of situation is why they are not recommended, I think.
The real problem with the FAQ is that the 16-shader cards are also in the not-recommended grouping (grey text), giving the impression that they are the borderline cards rather than the 32-shader cards. However, the performance of the 32-shader cards shows fairly clearly that the 16-shader cards can never complete any of the current workunits within the 4-day deadline: with half the shaders, they would need roughly twice as long. Thus, these cards really should be listed as CUDA-capable but unable to run the project (red text).
|
|
|
|
I think I agree with j2satx: raise the bar for acceptable cards.
I am content with letting this machine crunch away on the other BOINC projects while I am home at night.
Thanks for the explanation and suggestions.
<disconnect>
|
|
|
Hi, I have an NVIDIA 9800GT card.
I get 4-day deadlines on WUs.
If I use the PC for many hours per day, I can finish them normally; the problem is that some days I don't use the PC at all, or only use it for a few hours.
I suggest a 7-day deadline.
The other projects I run (SETI, Rosetta, Einstein, World Community Grid) don't have this strange limit; they give much more time.
As it is, some WUs may end up wasted...
I hope you find my suggestion useful.
|
|
koschi
Joined: 14 Aug 08 Posts: 124 Credit: 792,979,198 RAC: 11,592
|
Unfortunately I can't update my post in the FAQ section any longer, otherwise I could do so, of course ;-)
Both http://www.gpugrid.net/join.php?sys=gpu and my FAQ entry state that cards with at least 50/64 shader units are recommended. As there are no cards with 50 units on the market, those with 64 shader units should be the minimum.
When I designed the overview back in August, it was more a question of whether a card's compute capability allowed it to participate at all. A lot of people had quite powerful GeForce 8800GTS 320/640, GTX or Ultra cards but couldn't use them because those cards don't support compute capability 1.1. The listing was intended to shed some light on the jungle of chip versions vs. compute capability and number of shader units.
From my point of view, the overview is clear. Red doesn't work at all, green works perfectly and is recommended by the project, while grey cards are able to run CUDA 2.0 applications (they support compute capability 1.1) but are too slow and hence not recommended.
If you decide to run it on cards that are not recommended, well, then you have to live with the consequences. If you don't mind some babysitting, then a 32-shader board might work for you. If babysitting annoys you, better stick with the recommended cards.
|
|
|
For some projects the short deadlines are imposed by the work that they are trying to do ... Milkyway needs the work returned because the next set of tasks builds on the data returned by the current set... LHC needs all of the batch returned before they can do any analysis ... not sure what the reason is behind the GPU Grid deadline but it is likely something similar...
No project is going to impose a deadline that is tighter than the minimum needed ... it is plain foolishness to think otherwise ... not that you are foolish, but, the point is that they would love to have more people doing work, but they cannot change the needs and demands of the research.
That said, I have a 9800GT and it is well able to complete the work within the deadlines.
I can only assume that you only have your computer on for limited times... is that correct?
If that is the case, and you cannot turn it on each day and let it work on its own, then projects like CPDN with their insanely long deadlines are a better fit for you ...
I know that BOINC was intended to put computers to work only during idle moments, but those of us who are silly enough to dedicate resources 24/7 allow other projects with stricter needs to tap into our lunacy ... I mean dedication ...
|
|
|
|
...As there are no cards with 50 units on the market, those with 64 shader units should be the minimum.
The new version of the 9600GSO (with 512MB/1GB at 256-bit on G94 chip) has 48 shaders.
|
|
|
koschi
Joined: 14 Aug 08 Posts: 124 Credit: 792,979,198 RAC: 11,592
|
Woohoo.... NVIDIA brings some more confusion to the field ^^ Thanks for the info...
Clock speeds are the same as the 9600GT's, while it has only 3/4 of the shader units. Assuming 24h per WU on a 9600GT, the new GSO should be able to calculate one unit in ~32h. On a 24/7 dual-core machine no babysitting would be needed, while on a quad-core 2 WUs would reach the deadline. On paper (GFLOPS), the old GSO is 1.7x faster...
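One reading of this estimate, as a minimal Python sketch. It assumes, as the post does, that per-WU time scales linearly with shader count at equal clocks (9600GT: 64 shaders, new 9600GSO: 48). It further assumes, hypothetically, that one WU is queued per CPU core and that the queued WUs, all downloaded together with the same 96-hour deadline, run back to back on the single GPU.

```python
def wu_hours(ref_hours: float, ref_shaders: int, shaders: int) -> float:
    """Scale per-WU time by shader count at equal clocks (linear approximation)."""
    return ref_hours * ref_shaders / shaders


def wus_at_or_past_deadline(queued: int, hours_per_wu: float, deadline_h: float = 96) -> int:
    """Count queued WUs that finish at or after the deadline.

    Hypothetical model: all WUs arrive together and run one after another
    on a single GPU, starting now.
    """
    finish_times = [hours_per_wu * (i + 1) for i in range(queued)]
    return sum(1 for t in finish_times if t >= deadline_h)


gso_hours = wu_hours(24, 64, 48)   # ~32 h per WU on the 48-shader 9600GSO
for cores in (2, 4):
    # Assumption: one WU queued per CPU core.
    print(cores, "cores:", wus_at_or_past_deadline(cores, gso_hours), "WU(s) at/past deadline")
```

With two queued WUs they finish at 32 h and 64 h; with four, the third lands exactly on the 96-hour mark and the fourth overshoots, which matches the "2 WUs would reach the deadline" figure under this reading.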
|
|
|
The deadline must necessarily be increased.
|
|
chimmy
Joined: 24 Feb 09 Posts: 14 Credit: 1,261,660 RAC: 0
|
I'll throw my hat in here too. Even *one* more day would get more WUs completed in less overall time. I couldn't complete my last WU (http://www.gpugrid.net/workunit.php?wuid=263896) within the timeframe.
The only time I stop BOINC is when I need the full CPU/GPU, which isn't very often. I understand the need to get results back quickly, since it is the combined WUs that give an actual result. My last WU, which I didn't complete on time, is now assigned to someone else. If I had had 3 more hours, the task would have completed. Now it's going to take a few more DAYS to complete. I've looked over my BOINC logs and can see that 99.9986% of the time, GPUGrid and SETI are both running. I have two 9600 GTs in a non-SLI configuration. I see frequent scheduler requests to GPUGrid, about every 3 hours.
Any suggestions/assistance would be appreciated as well.
But seriously, give 96hrs.
Thanks,
Jim
|
|
|
@Chimmy,
The problem is that this gets into a never-ending spiral. Extending the deadline means people with ever more marginal cards try to do work ... then they insist that if we only extended the deadline another x hours all would be well ... it would never end ... heck, people in SaH complain about the deadline and the inability to download 10,000 tasks ... :)
If you are running SaH on the GPU that could be the source of the problem. If SaH is "stealing" time from the GPU tasks, well, you are going to overrun the deadline.
As to the DAYS ... well, if they send that task to my i7 computer, worst case is 7.75 hours for the just-started task, 7.75 hours for the task queued ahead of the task just obtained ... or just under a day ... better case is just over 15 hours (tasks on this machine take from 5-7.75 hours and I run 4 at a time with 4 queued) ... and if they fiddle with the deadline, I would actually run it next after the next task completes ... See the thread where ignasi says that the scheduler was updated ... it might even be better on the system where I have a GTX280 card ... hard to say ... At any rate, the i7 box is doing 15 tasks a day ...
Now, if I had a faster card, a rapid return would be even less of an issue ...
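One way to read the turnaround estimate above, sketched in Python: per GPU there is one running task and one queued task ahead of a newly downloaded one, so the new task completes after three task-lengths. This reading is an assumption; the 5 and 7.75 hour figures are the post's own per-task times.

```python
def turnaround_hours(hours_per_task: float, tasks_ahead: int) -> float:
    """Time until a newly downloaded task finishes on one GPU.

    Assumes the new task waits for `tasks_ahead` tasks of the same length
    (the running one plus any queued on that GPU) and then runs itself.
    """
    return hours_per_task * (tasks_ahead + 1)


# One running + one queued task per GPU, as described in the post.
worst = turnaround_hours(7.75, tasks_ahead=2)   # slowest tasks on this machine
best = turnaround_hours(5.0, tasks_ahead=2)     # fastest tasks
print(f"turnaround: {best:.2f} h to {worst:.2f} h")
```

Either way, a reissued task would come back well inside one day on such a host, far shorter than the multi-day delay feared in the earlier post.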
|
|
chimmy
Joined: 24 Feb 09 Posts: 14 Credit: 1,261,660 RAC: 0
|
Good point on the never-ending spiral. The main thing I was seeing was that my cards were listed in the supported category, yet I wasn't able to complete WUs on time.
As for having SaH running at the same time: with 2 cards and equally high resource shares, both run all the time, GPUGrid on card 0 and SaH on card 1. I thought that was causing the WUs to fail, but the BOINC logs show that both projects are always running.
Also, are there 2 types of WUs, a shorter one and a longer one? I am seeing some WUs give me credit around 2500 and others around 3700.
Thanks,
Jim
|
|
|
There are three general types, with, in my case, a spread of 5-8 hours on my fastest card and 14-18 hours on my slowest. Note that there are some that take longer, up to 30 hours if I recall correctly (rare). The three types award different amounts of credit. As with all things, there is some variation and some occasional tasks that are new and different ...
With the slower cards you have to run them 24/7 with no interruptions to make the deadlines. And there are three classes of "supported" in that list. I cannot recall where your card is exactly but I suspect that it is in the "yeah it may work but we don't recommend it ..." category.
The other problem is that you have to look carefully at the model and the internals of the cards too. Some models were released with different numbers of "shaders" and that makes a big difference in the speed. On slower cards the clocks also can be important.
What I would look for in the logs is whether the tasks are being restarted all the time, which is what *I* see in the return files. A successful task is restarted 10 times and an unsuccessful one more than 20 times ... I suspect that you are losing time on the tasks to SaH ... so the GPU Grid task is suspended to run SaH and then the task is restarted. So you are doing more SaH than GPU Grid.
The run time for the short successful task is only 15 hours and for the long task 26 hours ... both well within the 4-day window allowed. So is your average turnaround of 2.29 days ... I think that SaH is stealing a lot more time than you suspect.
You either need to live with the loss, or, my thought would be to run only one project at a time until the scheduler supports GPU projects well ... run GPU Grid one week and SaH the next ...
|
|
chimmy
Joined: 24 Feb 09 Posts: 14 Credit: 1,261,660 RAC: 0
|
According to this post: http://www.gpugrid.net/forum_thread.php?id=316, 9600 GTs are in the "Green cards will run the current applications without problems, if there is any it should be a software bug" category.
That is exactly what I looked for in the logs, the download, start(s), stop(s) and completion/upload time(s) for GPUGrid and SaH WU's.
I have only once observed a time when I had 2 SaH WUs going at the same time. Re-examining the logs, the WU that seemed to take longer than the deadline was running continuously whenever BOINC was running. I do have to shut down BOINC a couple of times a day, but usually for less than 1 hour.
The WUs I've had so far usually complete within 15-22 hours of running. I should have been clearer in my previous post: are there WUs that take longer than that? It may be that the WU I wasn't able to complete was one of the rare longer ones, or that some error was preventing the WU from running/making progress.
I'm doing a bit more babysitting the next week or so to see if this stabilizes out.
Thanks,
Jim
|
|
|
Much clearer ...
Sadly, I am about out of ideas...
HOWEVER, what I can tell you is that your 9600 is roughly the same as my 9800 as far as speed goes, and I run tasks successfully all the time. Of course, I also run 24/7 with no stops.
Is it possible to run for 24-plus hours with no interruptions? The remaining question is whether it is because of the number of restarts or some other issue. If we can accumulate some long runs to see whether the system can run tasks to completion, then we may be able to posit that there is a checkpointing issue...
If the tasks die even when not stopped and restarted, then it may be some other issue ...
As to the rest, I, like many others, will hang in there about as long as you ... if you have patience with us suggesting off-the-wall ideas, we will be patient with you ... :)
|
|
chimmy
Joined: 24 Feb 09 Posts: 14 Credit: 1,261,660 RAC: 0
|
I'm a bit out of ideas too :) But I can say that with the last batch of WUs I'm finishing them on time (about 15.3 hrs each).
I usually only stop BOINC 3-5 times a day. Most of those are for 15 minutes or so when I need to do something really CPU-intensive. For those, I've switched to suspending the CPU projects and keeping the GPU projects running. When I need both CPU & GPU (read: game time), I exit completely.
I'm in software development, so I'm REALLY good with off-the-wall/corner-case scenarios/ideas/suggestions/beta testing and everything in between.
Just don't want BSODs, as I do actually use my computer for other stuff :)
Thanks,
Jim
|
|
|
There are three general types, with, in my case, a spread of 5-8 hours on my fastest card and 14-18 hours on my slowest. Note that there are some that take longer, up to 30 hours if I recall correctly (rare). The three types award different amounts of credit. As with all things, there is some variation and some occasional tasks that are new and different ...
If the 30hr+ WUs are so rare, can someone please explain why they are the only type of WU I seem to get... this has been happening for weeks...
Q6600 @ 2.40GHz
Windows 7 - 64bit
8GB RAM
9800GX2 190.62
BOINC 6.10.9 Win64
|
|
|
|
If the 30hr+ WUs are so rare, can someone please explain why they are the only type of WU I seem to get... this has been happening for weeks...
None of your last several workunits are the 30-hour kind; all are the 24xx-credit variety that should only take about 10-11 hours of total crunch time each on your 9800 GX2. That is indeed the case for some of the work, which shows around 40,000 sec (about 11 hours) of total completion time in your task list. However, other work is showing much longer completion times (frequently twice as long), which would indicate something wrong either with your setup or perhaps with the card itself.
|
|
|
|
I can complete individual work units fast enough. My issue is that 4 units are fetched at a time and they all have the same deadline. Is there any way, other than setting my "Additional work buffer" to 0, to limit the number of tasks assigned to me at a time?
|
|
|
|
I can complete individual work units fast enough. My issue is that 4 units are fetched at a time and they all have the same deadline. Is there any way, other than setting my "Additional work buffer" to 0, to limit the number of tasks assigned to me at a time?
No, sadly enough ...
I and even a couple project types asked for controls such as this to be added and we were told no ...
On the "Alpha" mailing list, Dr. Anderson seems to be on a kick to eliminate configuration settings ... if anything, I am of the mind that there are far too few settings rather than too many.
|
|
|
I can complete individual work units fast enough. My issue is that 4 units are fetched at a time and they all have the same deadline. Is there any way, other than setting my "Additional work buffer" to 0, to limit the number of tasks assigned to me at a time?
Your 9800M GT is a 96-shader card with stock clocks (1250 MHz shader clock). I have tested this with an 8800GS (also 96 shaders), and all current work unit types can be completed in less than 24 hours with a 1500 MHz shader clock (1450 or so might work, but I haven't tried it yet). If you can overclock to that range and run 24/7, you will not have any deadline issues. Given that the 9800M is a mobile GPU, you will need to be especially careful about heat when overclocking.
BTW, that is one heck of a laptop!
|
|
|
|
Your 9800M GT is a 96-shader card with stock clocks (1250 MHz shader clock). I have tested this with an 8800GS (also 96 shaders), and all current work unit types can be completed in less than 24 hours with a 1500 MHz shader clock (1450 or so might work, but I haven't tried it yet). If you can overclock to that range and run 24/7, you will not have any deadline issues. Given that the 9800M is a mobile GPU, you will need to be especially careful about heat when overclocking.
I tried RivaTuner, but I'm not sure it works correctly. I'm running Vista x64 and I get an "unsigned driver ignored" error on install. Are there other overclocking utilities I could try? So far the hottest my video card has gotten is 75C; from what I've seen, that's not too bad. What would be a sane upper limit?
BTW, that is one heck of a laptop!
Thanks! It's my powerhouse machine I bought when I went to Iraq. I also have a 5.5TB storage server I built, but it's all hard drives and little processing power.
Edit:
Is there any way to correct the estimated time to a sane value? I have work units that estimate 16 hours and climb upward for a few hours before the time remaining starts going down.
|
|
|
I tried RivaTuner, but I'm not sure it works correctly. I'm running Vista x64 and I get an "unsigned driver ignored" error on install. Are there other overclocking utilities I could try? So far the hottest my video card has gotten is 75C; from what I've seen, that's not too bad. What would be a sane upper limit?
Hmmm... RivaTuner would have been my suggestion. Maybe some others here can suggest an alternative? On a laptop, 75C seems reasonable. My poor little T8100 has an 8400M GS which runs hotter than that (and it doesn't do any crunching) and has been fine for months. The hottest it has gotten is 92C, but it's mostly in the 79-83C range. I think you need to start worrying the closer you get to the 100-105C range... as I recall, that is where cards start to fry.
Edit:
Is there any way to correct the estimated time to a sane value? I have work units that estimate 16 hours and climb upward for a few hours before the time remaining starts going down.
I am afraid that this is a combination of BOINC and project-level issues that are not adjustable at the client level. Part of the issue is that the project has different types of work which vary in time to crunch, so estimated times are always going to be off a bit.
|
|
|