Message boards : Number crunching : SANTI Errors

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 34297 - Posted: 14 Dec 2013 | 12:41:33 UTC

My last five WUs were SANTIs. Four gave errors. I wasted 25 hours of electricity.

Until someone fixes this, I will now abort immediately any SANTI I get. Sorry...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 34302 - Posted: 14 Dec 2013 | 13:38:09 UTC - in response to Message 34297.

I can "only" see 3 failed WUs in your account, with 1 of them also failing for others. And lots of successful ones, including Santis. From this data I'm not convinced something's fundamentally broken here. It could be as simple as a machine needing a cold boot.

MrS
____________
Scanning for our furry friends since Jan 2002

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 34304 - Posted: 14 Dec 2013 | 14:08:31 UTC - in response to Message 34302.

I can "only" see 3 failed WUs in your account, with 1 of them also failing for others.

Sorry - I added in the first (active) WU I aborted. But I still wasted a day's electric!

Could be as simple as a machine needing a cold-boot.

I'll give that a try and not abort any more.

Thanks for posting.

John
Joined: 15 Oct 11
Posts: 17
Credit: 81,085,378
RAC: 0
Level
Thr
Message 34330 - Posted: 15 Dec 2013 | 16:06:51 UTC - in response to Message 34304.

I have had 5 SANTIs fail in the last couple of days, 8 in total if I go back 4-5 days.
I have shut down the computer completely and restarted twice now.
Yes, 1 or 2 have completed, but the fail rate is unacceptable.
I have changed nothing regarding system setup, so it's starting to look like these SANTIs are faulty... and yes, it is a waste of electricity...

Profile dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 34331 - Posted: 15 Dec 2013 | 17:02:00 UTC
Last modified: 15 Dec 2013 | 17:02:32 UTC

Hm, I didn't want to write down my single SANTI failure, but as I see in this thread... On a 1GB 560TI (384 cores) I had a SANTI long run fail too. I immediately switched it back to short runs, because after POEM GPU stopped I need every credit I can get to "hold" the overall RAC. But I still have the 310.70 drivers on it.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 34334 - Posted: 15 Dec 2013 | 21:20:00 UTC

I looked into the driver versions, but there's great variation and since you guys are able to process some of these WUs I couldn't expect to find anything conclusive there. However, the track record of 331.82 and 327.23 has been quite good - maybe try these?

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Coleslaw
Joined: 24 Jul 08
Posts: 36
Credit: 337,382,679
RAC: 18
Level
Asp
Message 34338 - Posted: 16 Dec 2013 | 3:36:56 UTC

I have had to set my third computer so far to No New Work because of the SANTI work units causing BSODs. So far the affected systems run Windows 7 Premium and Professional x64. The cards were a GT430, a 650Ti, and a 660Ti. From what I could see, the BSODs happen after the driver crashes many times in a very short period of time. These devices have run GPUGrid for a while and still run solid on other GPU projects. The BSODs only happen when running GPUGrid SANTI work.

Driver versions on these machines:
331.82
320.49
331.82

Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 34341 - Posted: 16 Dec 2013 | 8:45:38 UTC - in response to Message 34338.

I had to set my third computer so far to No New Work because of the SANTI work units causing BSODs. So far the systems I have had them on Windows 7 premium and professional x64. The cards were GT430, 650Ti, and 660Ti. From what I could see, it has been caused after the drivers crashing many times in a very short period of time..

Just had the same thing happen with a SANTI_bax2 WU on one machine. It BSODed every time BOINC started the SANTI WU. It even caused disk corruption once. 650ti GPU. If I suspended the WU the machine ran fine. Finally aborted the WU and DLed another SANTI_bax2, which is so far running OK. Before that WU the box in question had run for a VERY long time without a single crash. I've run 18 other SANTI_bax2 WUs without an issue. Wonder if perhaps some WUs got released with bad parameters? Maybe a corrupted DL?

Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Level

Message 34342 - Posted: 16 Dec 2013 | 10:24:40 UTC

Hm, I don't know what could be causing this, as it doesn't seem to be something systematic. Santi_bax2 WUs only have a 6% error rate, which I would say is nearly a historical low for this project.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 34346 - Posted: 16 Dec 2013 | 12:18:11 UTC

Another SANTI just wasted five hours of electric; here.

"The simulation has become unstable. Terminating to avoid lock-up".

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 34347 - Posted: 16 Dec 2013 | 12:42:14 UTC

Different WU types need different amounts of GPU (electrical) power at a given GPU frequency. This kind of error can occur when the processing of the WU tricks the GPU's power scheme into giving the GPU a slightly lower voltage than it needs (or a slightly higher frequency than it can run at). It can be fixed by either lowering the GPU frequency or raising the voltage. Sometimes that's not easy to do on a Kepler (e.g. with MSI Afterburner). I had to use the Kepler BIOS tweaker utility to permanently fix this kind of error on my overclocked ASUS GTX 670DC2OC. This is a very useful tool. If you put nvflash into its working directory, it can directly flash the modified BIOS to the card.

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 34354 - Posted: 17 Dec 2013 | 15:30:53 UTC

Another Santi errored out today. That's five in five days out of a total of 10.

That's 36 hours of wasted electricity.

No comment here from the scientist...

Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Level

Message 34355 - Posted: 17 Dec 2013 | 16:02:37 UTC - in response to Message 34354.

Sorry tomba but there is not really anything Santi can help you with. These WU's are a continuation of previous "bax" WU's which were simulated successfully. Also as I mentioned the error rate is around 6% which is really very low. I asked him if there is anything fancy with the system but it's apparently not very large, doesn't use any weird barely-tested functionality, so there is really nothing we can do about it.

About the system, it's a protein that is responsible for the activation of apoptosis, a process that controls cell death and we are looking for a specific conformation of this protein.

The only one who could help on the technical side would be Matt, but I don't know if he really has that much time right now. If you want, you can message him (username MJH).
I would suggest maybe switching to the short queue for a while. Or do what Zoltan suggested about frequencies and voltages (I have no clue about that though; the forum members can help you).

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 34356 - Posted: 17 Dec 2013 | 16:17:05 UTC - in response to Message 34341.
Last modified: 17 Dec 2013 | 17:09:57 UTC

Just had the same thing happen with a SANTI_bax2 WU on one machine. It BSODed everytime BOINC started the SANTI WU. Even caused disk corruption once. 650ti GPU. If I suspended the WU the machine ran fine. Finally aborted the WU and DLed another SANTI_bax2 which is so far running OK. Before that WU the box in question had run for a VERY long time without a single crash. I've run 18 other SANTI_bax2 WUs without an issue. Wonder if perhaps some WUs got released with bad parameters? Maybe a corrupted DL?


Hummm. Now I am beginning to wonder. I just had a similar situation (I think), where I was copying a 5 GB video file from one drive to another, and it kept BSODing the machine, which I have never seen it do before. Since it would copy fine to another drive, I put it down to a controller/disk drive compatibility problem, since the drive with problems was on a Marvell controller, not the main Intel controller.

But it just so happens I was running a Santi_bax2 at the time, and noticed that it was taking a very long time to complete, and even increasing in estimated time left after 16 hours (only 26% complete), so I aborted it. But that card (a GTX 650 Ti 1 GB) has been very stable otherwise with all the other work units, including a couple of Santi_bax2 types. That work unit may be bad, but it has not finished yet on another machine, so I don't know. I had assumed that the drive problem had corrupted the Santi_bax2, but it could be the other way around.

EDIT: Actually, it started out on the GTX 650 Ti, but had switched over to a GTX 660 by the time I ended it. So it seems not to be a memory limitation, since it was running slowly even with 2 GB, unless they have gotten worse than that.
http://www.gpugrid.net/result.php?resultid=7557083
(The restarts are due to the BSODs.)

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 34357 - Posted: 17 Dec 2013 | 18:12:33 UTC - in response to Message 34355.

Also as I mentioned the error rate is around 6%

For me, right now, it's 50%...

Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Level

Message 34358 - Posted: 17 Dec 2013 | 18:48:46 UTC - in response to Message 34357.
Last modified: 17 Dec 2013 | 18:53:06 UTC

For me, right now, it's 50%...

Statistics are statistics...
It doesn't mean unfortunately that there are no outliers.
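
To put rough numbers on the "outlier" point, here is a minimal sketch. It assumes every WU fails independently with the project-wide 6% rate quoted in this thread, which is a simplification (failures on a single host are probably correlated), and it asks how likely 5 or more failures out of 10 WUs would be by chance alone. The function name is only illustrative.

```python
from math import comb


def prob_at_least(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p): n independent WUs, each failing with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# 0.06 is the project-wide SANTI_bax2 error rate quoted in the thread;
# 5 failures out of 10 is the count reported by tomba.
# ASSUMPTION: failures are independent, which a flaky host would violate.
p_chance = prob_at_least(5, 10, 0.06)
print(f"P(>= 5 failures in 10 WUs at a 6% error rate) = {p_chance:.6f}")
```

Under that independence assumption the probability comes out far below 1%, so a 50% failure rate on one host is hard to explain as bad luck and points to something specific to that host or to the particular WUs it received.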

John
Joined: 15 Oct 11
Posts: 17
Credit: 81,085,378
RAC: 0
Level
Thr
Message 34361 - Posted: 17 Dec 2013 | 20:04:36 UTC - in response to Message 34347.

I noticed that most of my failed WUs occurred on 1 of my 2 cards. Upon investigation I noticed that the card with the WU failures was running at a slightly lower voltage. Rather than mess with the voltage (up till now both cards have worked well) I lowered the GPU clocks by 10 MHz and so far all WUs have completed successfully... fingers crossed...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 34365 - Posted: 17 Dec 2013 | 21:02:28 UTC - in response to Message 34361.

I lowered the GPU clocks by 10mhz and so far all WU's have completed successfully..... finger's crossed...

Please try the same, Tomba. Lower the clock speed of the offending GPU by 13 or 26 MHz and see if it helps too.

MrS
____________
Scanning for our furry friends since Jan 2002

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34367 - Posted: 17 Dec 2013 | 21:15:22 UTC - in response to Message 34357.

Also as I mentioned the error rate is around 6%

For me, right now, it's 50%...


Check my results. Out of 66 results, I have 63 successes, 1 failed SANTI, 1 failed NOELIA, 1 aborted. It seems SANTI and NOELIA are difficult tasks but not impossible.
____________
BOINC <<--- credit whores, pedants, alien hunters

Profile Damaraland
Joined: 7 Nov 09
Posts: 152
Credit: 16,181,924
RAC: 0
Level
Pro
Message 34368 - Posted: 17 Dec 2013 | 22:02:55 UTC - in response to Message 34367.

It's impossible for me to find out if there's a relation, but I had 98% processor usage. I changed it to 100% and got 2 Santi errors...
Maybe just luck? Or maybe the continuous interruptions I had before made these units run better. I think someone should have a look... A simple test I can see would be to force interruptions and send the same units to the same computer.
Maybe it's just chance, or Santiago doesn't get on well with the GPUs. :p

tomba
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Message 34378 - Posted: 18 Dec 2013 | 18:15:24 UTC - in response to Message 34365.

I lowered the GPU clocks by 10mhz and so far all WU's have completed successfully..... finger's crossed...

Please try the same, Tomba. Lower the clock speed of the offending GPU by 13 or 26 MHz and see if it helps too.

MrS


Dear MrS,

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

In the past month my "offending" GPU has processed 28 Nathans without one single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity.

Why do you want me to penalize Nathans, which give 10% more credit than Santis? Why don't you fix the Santi problem??

[I am not ashamed that I'm in the credit-chasing game. I just spent over €1000 on a rig that will dramatically boost my contribution to this most worthy cause]

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 34381 - Posted: 18 Dec 2013 | 20:19:04 UTC

If you check my posts you can see that I have been nagging about the Santi WUs, LR and SR, since summer. All problems were on the GTX660.
I got a 770 in August and it worked error-free for 33 consecutive days. I built a new system to accommodate a GTX780Ti and put two 660s in the other system. That gave errors after 2 days with both running Santis. I replaced (on advice here) the 600 Watt PSU with a 750 Watt PSU, and since then it has been crunching error-free for 4 days in a row.
It could be that Santis and 660s don't work well together on a weak system (older MOBO, older BIOS) or with a PSU with less headroom. I have no proof of this, but it is what I am seeing now.

@Tomba. If you have your new system ready with two 660s in it and they run the Santis smoothly, then that would be a little proof, as you then have the same CPU and MOBO as the system my two 660s are running in. Exciting.
____________
Greetings from TJ

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34384 - Posted: 19 Dec 2013 | 0:27:53 UTC - in response to Message 34378.

I lowered the GPU clocks by 10mhz and so far all WU's have completed successfully..... finger's crossed...

Please try the same, Tomba. Lower the clock speed of the offending GPU by 13 or 26 MHz and see if it helps too.

MrS


Dear MrS,

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

In the past month my "offending" GPU has processed 28 Nathans without one single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity.

Why do you want me to penalize Natans, that give 10% more credit than Santis?


Hmmm. Take a wee hit on NATHANs for a big gain on SANTI? You're right, that is a preposterous proposal. <roll-eyes>

Why don't you fix the Santi problem??


Why don't you fix it yourself? Why don't you install Linux and get an 11 - 12% boost on all your tasks, if you're genuinely in the credit chasing game and don't want to waste electricity. Sorry, I don't wish to offend, but to me it just doesn't make sense to cry about inefficiency when you're running an antiquated POS opsys like Win7/8.

____________
BOINC <<--- credit whores, pedants, alien hunters

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 34385 - Posted: 19 Dec 2013 | 1:28:01 UTC

Some work units are easy, some are hard. You have to live with it; I actually like the harder ones better, since they exercise my card more and may do more challenging science(?). At any rate, I have just "downgraded" a GTX 660 to a base clock of 967 MHz (with corresponding reductions in the boost and maximum clocks), but also boosted the voltage on the core up from 1.162 volts to 1.175 volts to get it stable, and increased the upper power limit to 115% max. If that is what needs to be done, OK with me; I don't expect the scientists to design their experiments for the weakest cards out there.
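
As a rough illustration of what such a voltage/clock trade costs in power, here is a sketch using the common first-order model that dynamic power scales as V² · f. This ignores leakage and the card's own boost and power-limit behaviour, so treat it as an order-of-magnitude estimate only. The two voltages and the 967 MHz clock are the values from the post above; the 980 MHz "before" clock is a hypothetical placeholder, since the original clock is not given in the thread.

```python
def relative_dynamic_power(v_old, v_new, f_old_mhz, f_new_mhz):
    """New/old dynamic power ratio under the first-order model P ~ C * V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new_mhz / f_old_mhz)


# Voltage bump from the post, with the clock held equal to isolate the voltage effect.
voltage_only = relative_dynamic_power(1.162, 1.175, 967.0, 967.0)

# HYPOTHETICAL: 980 MHz stands in for the unknown pre-change base clock.
with_downclock = relative_dynamic_power(1.162, 1.175, 980.0, 967.0)

print(f"voltage bump alone:          {100 * (voltage_only - 1):+.2f}% dynamic power")
print(f"with a 980 -> 967 MHz clock: {100 * (with_downclock - 1):+.2f}% dynamic power")
```

In this model the voltage bump alone adds a bit over 2% dynamic power, and a small downclock claws part of that back, which is consistent with needing only a modest increase in the power limit.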

Profile Damaraland
Joined: 7 Nov 09
Posts: 152
Credit: 16,181,924
RAC: 0
Level
Pro
Message 34388 - Posted: 19 Dec 2013 | 6:34:43 UTC - in response to Message 34385.

I don't expect the scientists to design their experiments for the weakest cards out there.

I think you are misunderstanding me. I don't care if there are errors, and I really don't care about how much credit I get. I don't expect the project to adapt to my old card.
My point is that if there are too many errors, it is worth investigating whether there is a way to correct or avoid them. Nobody (neither users nor scientists) wants to waste electricity.
If there are some kinds of units that don't fit on older cards, we should know it.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 34389 - Posted: 19 Dec 2013 | 9:26:45 UTC - in response to Message 34388.
Last modified: 19 Dec 2013 | 10:06:52 UTC

Yes, it would be convenient if they had a "worst case" work unit we could run, to see if our cards are stable on it. Then we could adjust the cards as necessary, or just accept the error rate for whatever it is. But I doubt that even the scientists know what the worst case really is, or what they will need in the future.

Many of the cards are way overclocked for the gamers, and just inherently have a higher error rate. But the ones they do complete successfully are valuable for the science. Only the scientists know the statistics for how many errors they are getting, and they have to decide whether it is good enough. You are right that they don't get any work done if everyone leaves the project, so they have to set a happy medium that everyone can live with (or at least enough people to get the work done).

I am not a gamer myself, but perhaps they could set some warnings out that would alert new users to the possibility of problems. Then at least it wouldn't come as so much of a surprise when they inevitably happen.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 34391 - Posted: 19 Dec 2013 | 10:15:09 UTC - in response to Message 34378.
Last modified: 19 Dec 2013 | 10:19:10 UTC

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

You can change it back when the problematic Santi WUs are cleared from the queue.

In the past month my "offending" GPU has processed 28 Nathans without one single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity.

It has been said before that this problem is *not* general (the overall error rate is low for these workunits), so these errors are caused by a specific problem in your system, not by the project. Therefore:
- the staff won't do anything about it (it may cause more errors than it fixes)
- it is up to you whether you accept our advice and try to fix *your* problem, or put up with the frustration caused by the wasted electricity.

Why do you want me to penalize Natans, that give 10% more credit than Santis?

Lowering the GPU clock is a safe way to try to fix this error. You can increase the GPU voltage instead (no penalty), but it's risky because it will increase the power used by the GPU i.e. the temperature of the GPU.

Why don't you fix the Santi problem??

Because there isn't a Santi problem from the project's point of view.

[I am not ashamed that I'm in the credit-chasing game. I just spent over €1000 for a rig that will boost dramatically my contribution to this most-worth cause]

Me too. We appreciate that. That's why it is also important for us to fix your problem.
GeForce cards are made for gaming, not for crunching. Their factory settings let the gamer get the maximum performance from the GPU, sacrificing some stability (a glitch in a game during an 8-hour session is no problem, but it will ruin an 8-hour workunit).
If you lower your GPU's clock and that makes your host capable of crunching each and every workunit error-free, your RAC (your daily contribution) will be higher than when it's crunching a little bit faster with some workunits failing in exchange (and your frustration will be lower too).
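
The trade-off described above can be sketched with the numbers tomba quoted (28 Nathans and 31 Santis in a month, 7 of the Santis failed). The model is deliberately crude and rests on stated assumptions: throughput scales linearly with clock, a failed WU costs as much time as a completed one, and the downclock removes all failures. The 1000 MHz base clock is a hypothetical placeholder, and the 26 MHz step is the larger of the two reductions suggested in this thread.

```python
def credit_rate(clock_mhz, base_clock_mhz, tasks_ok, tasks_failed):
    """Valid results per unit time, relative to an error-free card at the base clock.

    ASSUMPTIONS: speed is proportional to clock; a failed task wastes a full task's time.
    """
    speed = clock_mhz / base_clock_mhz
    return speed * tasks_ok / (tasks_ok + tasks_failed)


BASE_MHZ = 1000.0  # HYPOTHETICAL base clock, not taken from the thread

# 28 Nathans + 31 Santis = 59 tasks, 7 of them failed at stock clocks.
stock = credit_rate(BASE_MHZ, BASE_MHZ, tasks_ok=52, tasks_failed=7)

# Downclocked by 26 MHz, assuming (optimistically) that no task fails any more.
downclocked = credit_rate(BASE_MHZ - 26.0, BASE_MHZ, tasks_ok=59, tasks_failed=0)

print(f"stock clocks, 7 of 59 failed: {stock:.3f}")
print(f"-26 MHz, no failures:         {downclocked:.3f}")
```

Under these assumptions the downclocked card delivers more valid work per day even though every task, Nathans included, runs slightly slower. If the downclock only reduces failures rather than eliminating them, the gap shrinks accordingly.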

Profile Coleslaw
Joined: 24 Jul 08
Posts: 36
Credit: 337,382,679
RAC: 18
Level
Asp
Message 34395 - Posted: 19 Dec 2013 | 14:22:54 UTC - in response to Message 34391.

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

You can change it back when the problematic Santi WUs are cleared from the queue.


I don't care about the points but rather about the stability of the systems involved. Until now, I didn't have to worry about running GPUGrid work units. I now have three boxes I have had to pull due to these work units. It isn't just 6xx series cards, as I have pointed out. It also affects the GT430s. I run all of my cards at stock speeds and don't overclock my CPUs either. I have more than enough overhead on my power supplies. I don't believe someone's workaround of modifying each machine is a justifiable argument for everyone else to have to go tweak their systems just to run SANTI. This is something that should be able to be fixed within the application. Or GPUGrid should allow users to decide via preferences whether to run SANTI or NATHAN work units when available. Since this is a known error and it is happening to multiple people with different cards, it certainly should be looked at further.

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34397 - Posted: 19 Dec 2013 | 15:54:24 UTC - in response to Message 34395.

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

You can change it back when the problematic Santi WUs are cleared from the queue.


I'm not caring about the points but rather stability of the systems involved. Until now, I didn't have to worry about running GPUGrid work units. I now have three boxes I have had to pull due to these work units. It isn't just 6xx series cards as I have pointed out. It also effects the GT430's as well. I run all of my cards at stock speeds and don't over clock my CPU's either.


You miss the point. We know you don't OC your cards, but perhaps the manufacturer did. Anyway, all that is irrelevant when a slight downclock or voltage boost will likely fix your problem. I say your problem because most of us aren't experiencing any problem with SANTI.

I have more than enough over head on my Power Supplies. I don't believe someones work around by modifying each machine is a justifiable argument for everyone else to have to go tweak their systems just to run SANTI.


That is a blatant exaggeration. Nobody expects "everyone else" to tweak their systems. They expect only the very few who have problems to tweak their system. Why do you ignore the many hundreds of systems on which SANTIs run with no problem?

Why should the admins tweak SANTI tasks or the app just to spare 1% of systems grief when doing so could mean SANTI starts crashing on the 99% that have no problem with current SANTI?

They could provide separate queues for SANTI but that creates more problems than it fixes because the next time your improperly configured system runs into what you think are bad tasks you'll want them to spend more time creating yet another queue. That makes no sense at all when you could solve the problem easily on your end.

This is something that should be able to be fixed within the application.


Yep! And pigs should have wings so they can fly themselves to the slaughterhouse so I can stay at home and harvest money off my money tree.

Since this is a known error and it is happening to multiple people with different cards, it certainly should be looked at further.


Technically 3 or 4 fits the definition of multiple but that is irrelevant when SANTI isn't a problem for 99% of systems. Go figure.

A better solution might be to run a script that watches your queue and aborts SANTI tasks the minute you receive one and continues aborting SANTI until you receive a NATHAN or whatever tasks work for you. The only potential problem I see with that solution is that if you abort too many tasks the server might make you wait 24 hours before it sends more, maybe but maybe not, I'm not sure how they have that configured.

Another possible option is a script that watches your queue and automatically tweaks your card one way just before it starts a SANTI and then tweaks it a different way when it receives a NATHAN.
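
A minimal sketch of the first idea (watch the queue and abort SANTI tasks) could look like the following. It relies on BOINC's `boinccmd` tool, using `--get_tasks` to list tasks and `--task <project_url> <task_name> abort` to abort one. The text layout of the `--get_tasks` output assumed by the parser (indented `name:` and `project URL:` lines per task) is an assumption and should be checked against your own client; the function names and the default `"SANTI"` marker are only illustrative.

```python
import subprocess


def parse_tasks(get_tasks_output):
    """Return (task_name, project_url) pairs from `boinccmd --get_tasks` text.

    ASSUMPTION: each task block has a line starting with 'name:' followed
    later by a line starting with 'project URL:'.
    """
    tasks = []
    name = None
    for raw in get_tasks_output.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("project URL:") and name is not None:
            tasks.append((name, line.split(":", 1)[1].strip()))
            name = None
    return tasks


def abort_matching(marker="SANTI", dry_run=True):
    """Abort every queued task whose name contains `marker`; return the matched names."""
    listing = subprocess.run(
        ["boinccmd", "--get_tasks"], capture_output=True, text=True, check=True
    ).stdout
    matched = [(n, u) for n, u in parse_tasks(listing) if marker in n]
    for task_name, project_url in matched:
        if not dry_run:
            # boinccmd --task <url> <task_name> <suspend|resume|abort>
            subprocess.run(["boinccmd", "--task", project_url, task_name, "abort"], check=True)
    return [n for n, _ in matched]
```

Calling `abort_matching(dry_run=True)` only reports which tasks would be aborted, which is a safer way to try it out before running it from a scheduled job with `dry_run=False`.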

____________
BOINC <<--- credit whores, pedants, alien hunters

Profile Damaraland
Joined: 7 Nov 09
Posts: 152
Credit: 16,181,924
RAC: 0
Level
Pro
Message 34400 - Posted: 19 Dec 2013 | 19:26:50 UTC - in response to Message 34397.
Last modified: 19 Dec 2013 | 19:27:11 UTC

This is something that should be able to be fixed within the application.


Yep! And pigs should have wings so they can fly themselves to the slaughterhouse so I can stay at home and harvest money off my money tree.

Reading this whole thread, I'm not sure if the problem is with OC cards or with complex units. I will stop my computer and read carefully over the weekend to understand what's going on.
But I would like to point something out.
What's wrong with wanting to be a "passive" cruncher who doesn't want to worry about Linux, scripts or whatever?
I think you can't expect every user on this project to be a hardware geek.
Of course people with the best machines are very familiar with this project, but there's another profile of BOINC cruncher. People with multiple projects don't want to go deep into technical specs.
I think an easy solution should be proposed for these people. Maybe "geeks" should help by providing a script or whatever some people suggested. IMO
____________
HOW TO - Full installation Ubuntu 11.10

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34401 - Posted: 19 Dec 2013 | 21:33:12 UTC - in response to Message 34400.
Last modified: 19 Dec 2013 | 21:35:53 UTC

This is something that should be able to be fixed within the application.


Yep! And pigs should have wings so they can fly themselves to the slaughterhouse so I can stay at home and harvest money off my money tree.

Reading all this thread, I'm not sure if there's a problem with OC cards or complex units. I will stop my computer and read carefully the weekend to understand what's going on.
But I would like to point something.
Whats wrong if you want to be a "passive" cruncher? and you don't want to worry about Linux, scripts or whatever. what's wrong with this?


There is nothing wrong with that. Nobody here has said there is something wrong with that.

I think you can't expect to have every user on this proyect to be a geek on hardware.
Of course people with best machines are very familiar with this Project, but there's another profile of BOINC cruncher. People with multiple Project don't want to get in profound with technical specs. I think an easy solution should be proposed for these people.


I agree. For this problem with SANTI tasks crashing there is an easy solution. That solution is the solution proposed by Retvari. If that solution is too difficult for some people then they can ask for help implementing it. Asking the project devs to fix their problem is not, IMHO, a reasonable solution unless their problem also afflicts many other users. This problem with SANTI is limited to just a few users. How do I know that? I know because if it were a widespread problem a lot more people would be complaining and the admins would be able to see it in the stats they collect.

Maybe "geeks" should help providing an script or whatever somepeople sugested. IMO


If you want a script then ask and I will try to provide one unless the project admins think it's harmful to the project. A script to auto-abort SANTI tasks as soon as they download is easy but maybe not the wisest approach. A script to adjust the clock down or the voltage up when you receive a problem task (SANTI for example) and return clocks/voltage to normal for other tasks would be harder to implement but I am sure there is a way.
____________
BOINC <<--- credit whores, pedants, alien hunters

Profile Coleslaw
Joined: 24 Jul 08
Posts: 36
Credit: 337,382,679
RAC: 18
Level
Asp
Message 34402 - Posted: 20 Dec 2013 | 4:39:39 UTC - in response to Message 34401.

Dagorath, glad you are still trying to make a few suggestions and stick to that very narrow mindset. My reasoning for the app change rather than making users make tweaks locally for one sub project is more focused towards those machines that don't have easy access and can't be tweaked remotely on a day to day basis. As far as giving people the option to choose between SANTI and NATHAN adding more problems, that is yet to be seen. Until then, it is only opinion which really isn't worth arguing. In this case if the option was a choice, GPUGrid would still have 4 more GPU's from me crunching away. I'm sure others who are having difficulty would do the same. I have no idea the true numbers of people experiencing problems because not everyone posts in the forums. I don't even know if the techs here look at the work units I have aborted that were causing the BSOD's because they didn't get to "error out" and therefore would not show up that way. Instead it would show up as a user abort and someone else who didn't have BSOD issues could finish it. I'm not saying my cards might not be overclocked by the manufacturer. So, I can assure you that "point" was not missed. I just didn't address it in my above statement. I have made my choice in regards to tweaking my cards and have expressed my opinions (which is what they are regardless if you like them) on how I feel about the issue at hand. Please choose to ignore them if you don't like my approach.
____________

Dagorath
Send message
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34403 - Posted: 20 Dec 2013 | 7:31:49 UTC - in response to Message 34402.

Coleslaw,

Take heart ol' chap, I love your ribald "ad hominem followed by vicious attack on a straw man" humor. I'm just glad you can still crack a joke even though Bruce kept the computers after he changed the locks. Maybe if you tell him you didn't realize the strap-on chaffs his hips and he doesn't have to wear it anymore he'll let you back in the house so you can tweak your rigs.

____________
BOINC <<--- credit whores, pedants, alien hunters

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 34406 - Posted: 20 Dec 2013 | 13:44:28 UTC - in response to Message 34403.
Last modified: 20 Dec 2013 | 22:41:27 UTC

To quote Statler and Waldorf, "You're not old, but you're ugly!"

Cat claws are retractable. If you really must, paw at each other's non-dangly bits.

Jozef J
Send message
Joined: 7 Jun 12
Posts: 112
Credit: 1,118,845,172
RAC: 0
Level
Met
Message 34407 - Posted: 20 Dec 2013 | 18:42:50 UTC - in response to Message 34378.

As I wrote some time ago - this project is no longer under quality control ..
scientists have siesta ..
Tomba, I would recommend another project; GPUGRID is stuck in time. Tomba, you lost a few days on this project; I lost 6 weeks. My computer went down because of faulty tasks only four days after I went on vacation, and I could not restart it; a physical reset was needed, unfortunately.
Now I am crunching Collatz Conjecture and everything runs like butter, with absolutely no errors, and the RAC increase is the highest in BOINC.

TheSkyNet POGS: the trophies have a unique entertainment factor that the other projects lack. Absolutely the best in the world of BOINC. Still, they could add something interactive, such as a screen saver like some other BOINC projects have, and then it would be the best BOINC project.

The TheSkyNet POGS web site shows how BOINC project pages should develop in the future...

Profile Damaraland
Send message
Joined: 7 Nov 09
Posts: 152
Credit: 16,181,924
RAC: 0
Level
Pro
Message 34408 - Posted: 20 Dec 2013 | 19:04:05 UTC - in response to Message 34406.
Last modified: 20 Dec 2013 | 19:05:41 UTC

@skgiven

To quote Statler and Waldorf, "You're not old, but you're ugly!"
Cat claws are retractable. If you really must, paw at each other's non-dangly bits.

I think the problem is not this. I think you are seeing this from a very narrow point of view.
I think there are many different kinds of:
- user profiles (motivations)
- ways of seeing problems with units (tolerance for errors)
- technical knowledge (hardware and software)
- appetite for problems (wishing to push hardware or find solutions as a hobby, time).

Whatever profile or motivations one might have, everyone contributes, and I wish "the project" could be more understanding of all of them.
I'm sure the TOP 10 users with huge rigs don't see any problem, and they probably contribute 80% of the computing power. But I believe everyone contributes.
I feel very stupid posting an error. I don't expect everything to be smooth, but if I post, it is because I give my time to help.
If you give the impression that problems are not pursued and investigated, many people will quit and you will lose users little by little. Of course others will come back. I left and came back.
To finish and not make this too long: just the feeling of something being done, or an explanation of why nothing can be done, would be fine, at least for me.
To be clear, I'm not complaining, and I understand that the project has limited resources. Maybe organizing all the talented people here could help.
As I wrote some time ago - this project is no longer under quality control ..
scientists have siesta ..

This is a huge xxxxxx, I bite my tongue. Jozef, you have no idea what's going on or what the problem is. In Spain only about 1% of people take a siesta, and it's been proven that it is very good for the body and for mental sharpness.

Dagorath
Send message
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34419 - Posted: 21 Dec 2013 | 16:08:19 UTC - in response to Message 34408.

To finish and not make this too long: just the feeling of something being done, or an explanation of why nothing can be done, would be fine, at least for me.


The admins have said they will look into it and IIUC, they have also indicated that it's not likely anything can be done and I believe the reasons have been covered. Therefore, tomba, I think you got exactly what you say you want.

In addition to the above, other solutions have been offered.

Narrow minded is as narrow minded does. A few of us have tried to broaden the options. That is what we have done.

Others have ignored all alternative options and focused upon the 1 option the admins have politely indicated they're not gonna get. Is that broad thinking or narrow thinking?

I am sorry if some volunteers installed hosts in remote locations and failed to do the smart thing and configure them to allow remote access and administration. Hopefully they can fix that and do better next time they setup a remote host.

____________
BOINC <<--- credit whores, pedants, alien hunters

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 34433 - Posted: 22 Dec 2013 | 13:46:46 UTC - in response to Message 34341.

I had to set my third computer so far to No New Work because of the SANTI work units causing BSODs. So far the systems I have had them on Windows 7 premium and professional x64. The cards were GT430, 650Ti, and 660Ti. From what I could see, it has been caused after the drivers crashing many times in a very short period of time..

Just had the same thing happen with a SANTI_bax2 WU on one machine. It BSODed every time BOINC started the SANTI WU. It even caused disk corruption once. 650ti GPU. If I suspended the WU the machine ran fine. I finally aborted the WU and DLed another SANTI_bax2, which is so far running OK. Before that WU the box in question had run for a VERY long time without a single crash. I've run 18 other SANTI_bax2 WUs without an issue. Wonder if perhaps some WUs got released with bad parameters? Maybe a corrupted DL?

As for my WU detailed above: it finished fine for the next guy to get it, which is interesting since his machine has TONS of errors. I cut the clocks by 25 MHz and 5 SANTI_bax2 WUs have since completed fine on that GPU. I suspect that this WU type stresses the GPU slightly more than most, so that GPUs "on the edge" are more likely to error. Anyway, I've had 167 valid and 1 error lately (on 8 machines). I'd say the project is running pretty smoothly (at least here). Haven't had that strange bluescreening on any machine before or since.

History quiz: Does anyone remember when MS ballyhooed long and loudly that they had solved the "black screen of death"? Remember the solution?

candido
Send message
Joined: 12 Jun 11
Posts: 12
Credit: 150,069,999
RAC: 0
Level
Ile
Message 34469 - Posted: 24 Dec 2013 | 19:19:48 UTC
Last modified: 24 Dec 2013 | 19:23:57 UTC

I have been crunching GPUGrid WUs again for about a week with two machines, three since yesterday, and had no problems with SANTI, NATHAN, NOELIA or SDOERR. I had a BSOD today while crunching a Nathan WU. I thought it was a software problem and installed the new drivers (331.82), which didn't solve the problem. After reading these posts I decided to lower the core clock, and so far so good.
Thanks for the suggestion

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 34472 - Posted: 24 Dec 2013 | 21:51:14 UTC - in response to Message 34469.

I had a BSOD today while crunching a Nathan WU. I thought it was a software problem and installed the new drivers (331.82), which didn't solve the problem. After reading these posts I decided to lower the core clock, and so far so good.

That's the spirit. Unfortunately you are never quite sure that you have done enough until you eventually don't get any more errors. But my GTX 660s are now working fine for me, and I hope they stay that way. With the variability we see in the work units, you never know though.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 34478 - Posted: 25 Dec 2013 | 18:06:37 UTC

Wow they have really harder requirements on that SANTI Batch it seems. I ran one successfully with 50mV overvoltage on the 560ti 384core. Until now, all my cards ran with +25mV and computed successfully with this setting, Santis too. Only this card needs more. But +50mV needs serious cooling: I'm nearly at full fan speed and over 80 degrees with an open case and no extra heating in the flat.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Dagorath
Send message
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34481 - Posted: 25 Dec 2013 | 20:02:54 UTC - in response to Message 34478.

Over 80C is too hot for me. I like mine 70C max.

____________
BOINC <<--- credit whores, pedants, alien hunters

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 34516 - Posted: 30 Dec 2013 | 16:16:36 UTC - in response to Message 34478.

Wow they have really harder requirements on that SANTI Batch it seems.

That's what I'm thinking too. Had another one that caused the machine to reboot continuously until I caught it. This time a SANTI_MARwtcap. The only way to stop the cycle is to abort the WU. Lowered the clocks yet again and the next one is running fine. In fact that machine is currently showing 20 valid and the 1 error WU that caused constant bluescreens. NVIDIA GeForce GTX 650 Ti (1024MB) driver: 331.82.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 34518 - Posted: 30 Dec 2013 | 20:14:05 UTC

Locked up again on a workunit. I stopped running GPUGrid on this card again and changed it back to Einstein :/ Will try again in one or two weeks.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 34566 - Posted: 3 Jan 2014 | 0:44:20 UTC

After 17 days a Santi caused one 660 to downclock. And only 2 errors in 19 days. But today my rig with two 660s had rebooted when I found it. After logging in, it rebooted immediately a few times when BOINC started, so I went into Windows safe mode, where I have some more time to abort the task. As I didn't know which one it was, I aborted both. It's now happily crunching again.
____________
Greetings from TJ

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 34568 - Posted: 3 Jan 2014 | 4:37:58 UTC
Last modified: 3 Jan 2014 | 4:38:39 UTC

Too many errors for me......stopping here.
Task | Work unit | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
7614815 | 5043955 | 2 Jan 2014 20:22:26 UTC | 3 Jan 2014 3:05:04 UTC | Error while computing | 23,648.57 | 20,821.78 | --- | Long runs (8-12 hours on fastest card) v8.14 (cuda42)
7614651 | 5043020 | 2 Jan 2014 21:42:39 UTC | 3 Jan 2014 4:35:29 UTC | Aborted by user | 20,147.56 | 20,074.19 | --- | Long runs (8-12 hours on fastest card) v8.14 (cuda42)
7613690 | 5043565 | 2 Jan 2014 16:43:59 UTC | 2 Jan 2014 23:04:48 UTC | Completed and validated | 22,184.66 | 18,799.57 | 20,550.00 | Short runs (2-3 hours on fastest card) v8.15 (cuda42)
7613447 | 5043367 | 2 Jan 2014 16:43:59 UTC | 2 Jan 2014 20:22:26 UTC | Error while computing | 12,559.07 | 12,065.65 | --- | Short runs (2-3 hours on fastest card) v8.15 (cuda42)
7612704 | 5042731 | 2 Jan 2014 4:29:25 UTC | 2 Jan 2014 13:59:13 UTC | Completed and validated | 33,819.15 | 17,980.88 | 20,550.00 | Short runs (2-3 hours on fastest card) v8.15 (cuda42)
7612537 | 5040031 | 2 Jan 2014 4:20:27 UTC | 2 Jan 2014 4:29:25 UTC | Error while computing | 274.96 | 127.56 | --- | Short runs (2-3 hours on fastest card) v8.15 (cuda42)

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 34571 - Posted: 3 Jan 2014 | 19:02:10 UTC

Just now I found my rig with two 660s frozen. No idea when it happened; even ctrl-alt-del didn't work. After booting I immediately got the message that the graphics driver had crashed and recovered, three times in a row, and then it rebooted itself again. After three attempts I got BOINC to stop. I am now installing the latest beta driver, but that should not be necessary, as it ran for more than a month with the 331.82 driver.

I have not had a lot of joy with the 660s since I bought two of them in summer.
I don't like these Santis.
____________
Greetings from TJ

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 34578 - Posted: 4 Jan 2014 | 20:20:15 UTC
Last modified: 4 Jan 2014 | 20:25:20 UTC

Again I found my rig with two 660s frozen.
After 5 boots I managed to get rid of all the Santis and switch over to LR only.

Either an AMD CPU has problems with Santis, or GPUs lower than 7XX have problems with Santis, or a combination of the two has problems with Santis.
I guess none of the above, as there are many AMD rigs and many more rigs with 660s that do these terrible Santis. But my RAC is rapidly falling this way.

Edit: another Santi crash, so it will become 7 boots eventually. I am starting to hate these Santis!

Edit 2: Yes, I got two Nathans on the 660s, so I can go to sleep tonight and not watch my system continuously.
____________
Greetings from TJ

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 6,169
Level
Trp
Message 34579 - Posted: 4 Jan 2014 | 20:51:17 UTC - in response to Message 34578.

TJ, you should lower the GPU frequency of those 660s, or increase the GPU voltage by 12mV.

Dagorath
Send message
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34580 - Posted: 4 Jan 2014 | 22:34:01 UTC - in response to Message 34579.

Boost the voltage and produce more heat or get crunching on Linux. Take a look at my results. I have errors but 99% of those are tasks I aborted because I played with stuff and ended up with too many tasks in my cache or other reasons. I have two 670 and one 660Ti on Linux and they almost never crash SANTI tasks. I'm running the stock clock speeds and if I keep the temps below 70C the clock boost thing kicks in regularly. They hardly ever crash on any task and if they do the OS doesn't hang, BOINC continues running, another GPUgrid task downloads and starts and life carries on.

____________
BOINC <<--- credit whores, pedants, alien hunters

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 34581 - Posted: 5 Jan 2014 | 1:27:26 UTC - in response to Message 34578.

Again I found my rig with two 660s frozen.
After 5 boots I managed to get rid of all the Santis and switch over to LR only.

Either an AMD CPU has problems with Santis, or GPUs lower than 7XX have problems with Santis, or a combination of the two has problems with Santis.
I guess none of the above, as there are many AMD rigs and many more rigs with 660s that do these terrible Santis. But my RAC is rapidly falling this way.

I have had those problems too. The main reason seems to be that the 660s were bumping up against their power limit, causing them to be starved for current on the tough portions of the hardest work units. Increasing the power limit to 110% by using Nvidia Inspector has largely solved the problem for me on the two cards (a Zotac and a Gigabyte) that I now use for GPUGrid, without the need for any other changes:
http://www.gpugrid.net/results.php?hostid=159002&offset=0&show_names=1&state=0&appid=

But they are often overclocked too much at the factory for the work here, and on another of my Zotac 660s I have also had to reduce the clocks a little (GPU clock from 993 MHz to 950 MHz, and memory clock from 3004 MHz to 2804 MHz) and bump up the core voltage (from 1.162 to 1.175 volts). For some reason on the Zotacs the software control utilities (such as Nvidia Inspector or MSI Afterburner) do not work to change the voltage, and I had to modify the BIOS with Kepler BIOS Tweaker, and then flash it into the video card with nvflash. You can use GPU-Z to first make a copy of your present BIOS that you then modify (keep a copy of your old BIOS as a backup). If you don't want to deal with that, just reduce the clock frequency, first on the GPU clock and then on the memory clock if necessary, until it is stable.
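To put those adjustments in proportion, here is a small sketch that works out each change as a percentage of the starting value. The numbers are the ones quoted above; the function name `percent_change` and the labels are just illustrative.

```python
def percent_change(old, new):
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


# Settings quoted in the post: (before, after)
changes = {
    "GPU clock (MHz)": (993, 950),
    "memory clock (MHz)": (3004, 2804),
    "core voltage (V)": (1.162, 1.175),
}

for label, (old, new) in changes.items():
    print(f"{label}: {old} -> {new} ({percent_change(old, new):+.2f}%)")
```

The point is only that these are small steps, a few percent either way, rather than a drastic underclock.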

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 34582 - Posted: 5 Jan 2014 | 15:35:30 UTC

Hi Guys,

Good advice. I lowered the clocks per Zoltan's advice and will see what happens.
If it's not stable I will increase the voltage a little.
The problem is that the rig was stable for 17 days continuously; that is what puzzles me.

I let them run at factory settings at first, months ago. They are both from EVGA. Their maximum fan speed is 74%.
____________
Greetings from TJ

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Message 34583 - Posted: 5 Jan 2014 | 19:21:26 UTC

You can start a snowball effect with system freezes and BSODs if you don't run checkdisk after getting those errors; too many orphaned files, wrong time stamps and such just cause more and more problems. If you don't do that, the errors you're getting now could be related to it.

You're cracking me up TJ, you think your BSODs and freezes are because you have an AMD system? I think the drought we're having in California is because of the Intel CPU in my laptop (gotta put on my tin foil hat).

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 34613 - Posted: 9 Jan 2014 | 19:25:24 UTC

No BSOD but again a frozen system. This time by a Noelia on my 660s rig. I did manage to get the clocks down with Precision X from EVGA, but after a while they boost automatically again. Trying to do it with MSI Afterburner shows only one card; there is nowhere I can click to see the settings of my second card.
Well, since August these 660s have been troublesome for me, so I will buy a second 780Ti to replace them, as it does the same work as two 660s in less time, even in Windows 7.
____________
Greetings from TJ

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 34614 - Posted: 9 Jan 2014 | 19:48:39 UTC - in response to Message 34613.
Last modified: 9 Jan 2014 | 19:49:30 UTC

In afterburner, click Settings (bottom right corner of the left pane) and then you can change the GPU under the General Tab.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Dagorath
Send message
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34617 - Posted: 10 Jan 2014 | 5:02:01 UTC - in response to Message 34433.

History quiz: Does anyone remember when MS ballyhooed long and loudly that they had solved the "black screen of death"? Remember the solution?


They made the color a user configurable option then told everyone if they get a black screen of death it's their own damn fault.

____________
BOINC <<--- credit whores, pedants, alien hunters

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 34618 - Posted: 10 Jan 2014 | 15:42:28 UTC - in response to Message 34614.

In afterburner, click Settings (bottom right corner of the left pane) and then you can change the GPU under the General Tab.

Thanks skgiven, found it and used it!
____________
Greetings from TJ

David Autumns
Send message
Joined: 10 Jul 10
Posts: 1
Credit: 327,425,754
RAC: 0
Level
Asp
Message 34644 - Posted: 13 Jan 2014 | 20:37:03 UTC

Just had to move my 560ti onto the short runs

Even got the Dyson out expecting a GPU full of fluff, but no, it's the Work Units

Just 2 successful long runs since 24th Dec


There's a problem with the current batch. I'll just have to be patient.

Maybe this time next week


Dave

Profile Stoneageman
Avatar
Send message
Joined: 25 May 09
Posts: 224
Credit: 34,057,224,498
RAC: 190
Level
Trp
Message 34645 - Posted: 14 Jan 2014 | 2:18:28 UTC
Last modified: 14 Jan 2014 | 2:26:36 UTC

Your clocks are too high. Try 1644 MHz for the processor & 2004 MHz for the memory.

'GPUgrid stresses the parts other projects can't reach'

Betting Slip
Send message
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 34646 - Posted: 14 Jan 2014 | 6:55:47 UTC

Probably would be useful if people having a problem with a self overclocked card would return it to stock clocks before reporting problems with WU.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 34649 - Posted: 14 Jan 2014 | 11:57:13 UTC

These clocks aren't OC on a 560ti 384 O.o Try +25mV first, but it can still fail at any time then. Try to underclock after that. But there's no guarantee it works then ;(
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Betting Slip
Send message
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 34650 - Posted: 14 Jan 2014 | 13:06:54 UTC - in response to Message 34649.
Last modified: 14 Jan 2014 | 13:07:14 UTC

Reference GTX560TI works perfectly http://www.gpugrid.net/results.php?hostid=160845

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Message 34653 - Posted: 14 Jan 2014 | 18:00:08 UTC

Every chip has other tolerances ^^
____________
DSKAG Austria Research Team: http://www.research.dskag.at



TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 34654 - Posted: 14 Jan 2014 | 19:15:56 UTC - in response to Message 34653.

Every chip has other tolerances ^^

True, even every card of the same type and brand has its own tolerances, in my experience.
____________
Greetings from TJ

Dagorath
Send message
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34655 - Posted: 14 Jan 2014 | 21:49:08 UTC - in response to Message 34654.

It seems every type of task has its own tolerances too, am I right?

You can run NATHAN tasks at high clocks and lower voltage but SANTI tasks need lower clocks and/or higher voltage?

____________
BOINC <<--- credit whores, pedants, alien hunters

Jozef J
Send message
Joined: 7 Jun 12
Posts: 112
Credit: 1,118,845,172
RAC: 0
Level
Met
Message 35565 - Posted: 8 Mar 2014 | 17:23:40 UTC

Many Santi jobs failed with a BSOD, and the NVIDIA driver also crashed;
about 18 of my last jobs. I currently receive only Santis.
I'm trying to find out if it's my hardware or a software problem, but I did not change anything in the last week, so...
does anyone have a similar problem?

Examples:
(0x50) - exit code 80 (0x50) 313x-SANTI_MARwtcap310-27-32-RND3990_0
605x-SANTI_MAR420cap310-19-32-RND2854_0
(unknown error) - exit code -52 (0xffffffcc) 685x-SANTI_MARwtcap310-27-32-RND0438_3
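
A side note on reading those codes: the hex value in brackets is the same exit code shown as an unsigned 32-bit number, which is why -52 appears as 0xffffffcc. A minimal sketch of the conversion in both directions (the function names are made up for illustration):

```python
def exit_code_hex(code):
    """Render a signed exit code as the unsigned 32-bit hex value shown beside it."""
    return f"0x{code & 0xFFFFFFFF:x}"


def signed_from_hex(text):
    """Turn a 32-bit hex string back into the signed exit code."""
    value = int(text, 16)
    # Values with the top bit set represent negative numbers (two's complement).
    return value - (1 << 32) if value >= (1 << 31) else value


print(exit_code_hex(80), exit_code_hex(-52))
print(signed_from_hex("0xffffffcc"))
```

This only explains the notation; it says nothing about what the error codes mean for the task itself.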

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 35598 - Posted: 10 Mar 2014 | 21:17:08 UTC - in response to Message 35565.

One of my Linux systems fails SANTI_MAR tasks quite regularly. Other tasks run fine, but I'm mostly getting SANTI_MAR tasks. Both the short and long tasks fail (but not all). ATM ~2 fail per day |:(
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
