
Message boards : Number crunching : Monster-WUs need much more time per step

Profile Dieter Matuschek
Joined: 28 Dec 08
Posts: 58
Credit: 231,884,297
RAC: 0
Level
Leu
Message 8325 - Posted: 9 Apr 2009 | 19:29:40 UTC

Now and then I catch some "monster" WUs, such as 508937 or 507386.
Normally my NVIDIA GTX 295 takes about 36 ms per step.
These monsters take about 150 ms per step.
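For scale, here is a minimal Python sketch of the arithmetic behind the two step times above. Dividing the "monster" step time by the normal one gives the slowdown factor. The constant and function names are illustrative only and are not part of BOINC or the GPUGRID application.

```python
# Step times quoted in this post, in milliseconds per simulation step.
NORMAL_MS_PER_STEP = 36.0
MONSTER_MS_PER_STEP = 150.0


def slowdown_factor(normal_ms: float, observed_ms: float) -> float:
    """Return how many times slower the observed step time is than the normal one."""
    if normal_ms <= 0:
        raise ValueError("normal step time must be positive")
    return observed_ms / normal_ms


if __name__ == "__main__":
    factor = slowdown_factor(NORMAL_MS_PER_STEP, MONSTER_MS_PER_STEP)
    print(f"slowdown: {factor:.2f}x")  # prints "slowdown: 4.17x"
```

In other words, the monster WUs run roughly four times slower per step than normal.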

Both ran on the same machine, 29843, at the same time. Now WUs are running normally again.

This Intel quad 6700 (stock-clocked) runs XP SP3 (32-bit) and is used for BOINC (6.6.20) only, with WCG and GPUGRID.
This behavior has already occurred on other quads with a GTX 295 (and with other versions of BOINC 6.6.x).

Is this a problem with the WUs, or is there a hardware problem?



Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 8340 - Posted: 10 Apr 2009 | 12:31:02 UTC - in response to Message 8325.

On a 295, they should not take more than 50 ms/step.
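A hypothetical check built on that figure: treat any task whose step time exceeds roughly 50 ms on a GTX 295 as abnormal. The 50 ms ceiling is GDF's estimate from this post; the function name and the hard cutoff are assumptions for illustration, and real step times vary by workunit type.

```python
# Assumed ceiling, taken from GDF's estimate for a GTX 295 (ms per step).
GTX295_MAX_MS_PER_STEP = 50.0


def is_abnormal(ms_per_step: float, limit_ms: float = GTX295_MAX_MS_PER_STEP) -> bool:
    """Hypothetical helper: True if the step time is above the expected ceiling."""
    return ms_per_step > limit_ms


if __name__ == "__main__":
    # Step times reported earlier in this thread.
    for ms in (36.0, 150.0):
        status = "abnormal" if is_abnormal(ms) else "normal"
        print(f"{ms:.0f} ms/step: {status}")
```

With the numbers from the first post, 36 ms/step is reported as normal and 150 ms/step as abnormal.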

gdf

Profile Dieter Matuschek
Joined: 28 Dec 08
Posts: 58
Credit: 231,884,297
RAC: 0
Level
Leu
Message 8344 - Posted: 10 Apr 2009 | 14:58:20 UTC - in response to Message 8340.

Thanks, GDF.

So there is an issue.

Here is a new one: 513712, on another Intel quad. So this issue is not tied to a particular PC.

Both current WUs of this GTX 295 show the same behavior:
one WU (83% done) has been running for 24 hours so far, the other (15% done) for 4 hours.
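A rough way to read those progress figures is total runtime ≈ elapsed time / fraction done. This is a sketch under two assumptions that may not hold for these tasks: the reported percentage is linear in work, and the rate stays constant. The helper names are hypothetical.

```python
def estimated_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation of total runtime; assumes a constant progress rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def estimated_remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Estimated hours left, using the same linear assumption."""
    return estimated_total_hours(elapsed_hours, fraction_done) - elapsed_hours


if __name__ == "__main__":
    # (label, elapsed hours, fraction done) as reported in this post.
    for label, elapsed, done in (("WU at 83%", 24.0, 0.83), ("WU at 15%", 4.0, 0.15)):
        total = estimated_total_hours(elapsed, done)
        print(f"{label}: ~{total:.1f} h total, ~{total - elapsed:.1f} h left")
    # WU at 83%: ~28.9 h total, ~4.9 h left
    # WU at 15%: ~26.7 h total, ~22.7 h left
```

Under these assumptions, both tasks would take well over a day.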

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 8354 - Posted: 10 Apr 2009 | 18:37:17 UTC - in response to Message 8344.

> Thanks, GDF.
>
> So there is an issue.
>
> Here is a new one: 513712, on another Intel quad. So this issue is not tied to a particular PC.
>
> Both current WUs of this GTX 295 show the same behavior:
> one WU (83% done) has been running for 24 hours so far, the other (15% done) for 4 hours.

Yes, but is it a problem with the tasks or BOINC version 6.6.20?

The only task where I saw the same LOOOONNNNNGGG step interval was when I ran 6.6.20 also ...

Since we only run tasks once, it is hard to say.

Now, if GDF would manually RE-ISSUE that task to, say, me ... we could see if it runs fine on a GTX 295 running 6.5.0 ...

Profile Dieter Matuschek
Joined: 28 Dec 08
Posts: 58
Credit: 231,884,297
RAC: 0
Level
Leu
Message 8358 - Posted: 10 Apr 2009 | 21:10:23 UTC - in response to Message 8354.

@ Paul

From message 8325:

> This behavior has already occurred on other quads with a GTX 295 (and with other versions of BOINC 6.6.x).

It may be due to BOINC 6.6.x. (I truly hope the GTX 295 isn't damaged.)

> The only task where I saw the same LOOOONNNNNGGG step interval was when I ran 6.6.20 also ...

At least, I am not alone ...

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 8370 - Posted: 11 Apr 2009 | 4:29:55 UTC - in response to Message 8358.

> @ Paul
>
> From message 8325:
> > This behavior has already occurred on other quads with a GTX 295 (and with other versions of BOINC 6.6.x).
>
> It may be due to BOINC 6.6.x. (I truly hope the GTX 295 isn't damaged.)
>
> > The only task where I saw the same LOOOONNNNNGGG step interval was when I ran 6.6.20 also ...
>
> At least, I am not alone ...

It did not damage mine ... I've been back on 6.5.0 for some time now and it's been fine. I have been trying to get up the energy to re-install a 6.6.2x version (now up to 6.6.22) to see if the problem comes back, so I can turn on logs and capture some indication of what is causing the error.

Sigh ...

Not only do I not get paid for this, I lose work, and yammer-heads on the mailing lists have been accusing me of denigrating the efforts of the developers. I only do *THAT* on the boards ... :)

(And for issues that they well deserve denigration *FOR* ... like complaining that people won't help them and then ignoring input or not applying fixes for issues which are clearly identified and for which changes have been suggested/developed, or for applying changes to one version and then not doing proper configuration control and losing the change in subsequent versions).

Anyway, we will see when I can try ...

Profile Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 46,573,744
RAC: 115,639
Level
Val
Message 8372 - Posted: 11 Apr 2009 | 4:49:32 UTC - in response to Message 8370.

I've been running with 6.6.20 since it came out, and haven't noticed any WUs with unusually long step times.

But I'm only running a single GPU (GTX280).

Is everyone who is seeing this problem running multiple GPUs?


____________
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 8374 - Posted: 11 Apr 2009 | 8:40:04 UTC - in response to Message 8372.

> I've been running with 6.6.20 since it came out, and haven't noticed any WUs with unusually long step times.
>
> But I'm only running a single GPU (GTX280).
>
> Is everyone who is seeing this problem running multiple GPUs?

I was ...

And I think the other person who posted (maybe in a different thread) also had at least two GPUs ...

I asked for a small change in the next release that may let me see whether one of the things I suspect is happening ... instead of saying "1 CUDA" it would say "CUDA x", with x being the device used.

This is also why I was asking whether there is a good tool that tells you if something is actually running on the card or not ... so far none of the suggestions have panned out, possibly because I'm missing the magic option needed ...

uBronan
Joined: 1 Feb 09
Posts: 139
Credit: 575,023
RAC: 0
Level
Gly
Message 8762 - Posted: 23 Apr 2009 | 6:41:45 UTC
Last modified: 23 Apr 2009 | 6:43:59 UTC

Well, I am running just one device and must say I only have "cheap" hardware compared to most of you here: a 9600 GT, which ran some units with 6.6.20, and in general it took longer to finish them.
After I changed to 6.6.23, the times seem closer to the old values I had with 6.5.0, but time will tell. Also, some units seem to need more time in general (Gianni units).
So I guess it's not hardware-related when units run longer.
I have already read on other forums that people suspect 6.6.20 of slowing down their progress, so it seems to be the BOINC client.
It may sound silly, but ever since the client became able to report the exact time calculated for the units, I have seen a general slowdown in their progress.

Profile Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Level
Ser
Message 8845 - Posted: 24 Apr 2009 | 15:52:45 UTC - in response to Message 8762.
Last modified: 24 Apr 2009 | 16:02:57 UTC

I can positively confirm that BOINC 6.6.20 is four times as slow at GPU crunching as BOINC 6.4.7. I just reverted to 6.4.7, and my stats are flying again *on the same workunit* with the same GPUGRID client. I am running a GTX295.

Profile Dieter Matuschek
Joined: 28 Dec 08
Posts: 58
Credit: 231,884,297
RAC: 0
Level
Leu
Message 8855 - Posted: 24 Apr 2009 | 18:03:14 UTC - in response to Message 8845.

> I can positively confirm that BOINC 6.6.20 is four times as slow at GPU crunching as BOINC 6.4.7

Well, I can't confirm that. On my computers, very long step times have occurred only rarely. Usually there is no noticeable difference in computing time.
(I have been using BOINC 6.6.20 since its release.)

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 8865 - Posted: 24 Apr 2009 | 19:54:39 UTC

There is an unconfirmed bug in 6.6.20 ... I was tracking it down when they made the changes leading to 6.6.23 ... which seem to have fixed that issue ... broke others a little worse ... but addressed that one ...

The reason most people don't notice the issue with 6.6.20 is that, as far as I can tell, it only shows up on systems with multiple GPUs ...
