Message boards : Number crunching : SANTI Errors
My last five WUs were SANTIs. Four gave errors. I wasted 25 hours of electricity.
ID: 34297
I can "only" see 3 failed WUs in your account, with 1 of them also failing for others, and lots of successful ones, including Santis. From this data I'm not convinced something's fundamentally broken here. Could be as simple as a machine needing a cold boot.
ID: 34302
> I can "only" see 3 failed WUs in your account, with 1 of them also failing for others.
Sorry, I counted the first (active) WU I aborted. But I still wasted a day's electricity!
> Could be as simple as a machine needing a cold boot.
I'll give that a try and not abort any more. Thanks for posting.
ID: 34304
I have had 5 SANTIs fail in the last couple of days, 8 in total if I go back 4-5 days.
ID: 34330
Hm, I didn't want to report my single SANTI failure, but as I see in this thread... on a 1 GB 560 Ti (384 cores) I had a SANTI long run fail too. I immediately switched the card back to short runs, because since POEM stopped its GPU work I need every credit I can get to "hold" my overall RAC. I still have the 310.70 drivers on it, though.
ID: 34331
I looked into the driver versions, but there's great variation, and since you guys are able to process some of these WUs I didn't expect to find anything conclusive there. However, the track record of 331.82 and 327.23 has been quite good; maybe try these?
ID: 34334
So far I have had to set three of my computers to No New Work because SANTI work units were causing BSODs. The affected systems run Windows 7 Premium and Professional x64, with GT 430, 650 Ti, and 660 Ti cards. From what I could see, the BSODs came after the driver crashed many times in a very short period of time. These devices have run GPUGrid for a while and still run solidly on other GPU projects. The BSODs only happen when running GPUGrid SANTI work.
ID: 34338
> So far I have had to set three of my computers to No New Work because SANTI work units were causing BSODs. The affected systems run Windows 7 Premium and Professional x64, with GT 430, 650 Ti, and 660 Ti cards. From what I could see, the BSODs came after the driver crashed many times in a very short period of time.
Just had the same thing happen with a SANTI_bax2 WU on one machine. It BSODed every time BOINC started the SANTI WU, and even caused disk corruption once. 650 Ti GPU. If I suspended the WU, the machine ran fine. I finally aborted the WU and downloaded another SANTI_bax2, which is so far running OK. Before that WU, the box in question had run for a VERY long time without a single crash. I've run 18 other SANTI_bax2 WUs without an issue. I wonder if perhaps some WUs got released with bad parameters? Maybe a corrupted download?
ID: 34341
Hm, I don't know what could be causing this, as it doesn't seem to be anything systematic. SANTI_bax2 WUs only have a 6% error rate, which I would say is nearly a historical low for this project.
ID: 34342
Another SANTI just wasted five hours of electricity; here.
ID: 34346
Different WU types draw different amounts of GPU (electrical) power at a given GPU frequency. This kind of error can happen when a WU's workload tricks the GPU's power management into supplying slightly less voltage than the GPU needs (or running it at a slightly higher frequency than it can sustain). It can be fixed either by lowering the GPU frequency or by raising the voltage. Sometimes this is not easy to do on a Kepler card with software tools (e.g. MSI Afterburner). I had to use the Kepler BIOS Tweaker utility to permanently fix this kind of error on my overclocked ASUS GTX 670 DC2OC. This is a very useful tool: if you put nvflash in its working directory, it can flash the modified BIOS directly to the card.
ID: 34347
Another Santi errored out today. That's five in five days out of a total of 10.
ID: 34354
Sorry tomba, but there is not really anything Santi can help you with. These WUs are a continuation of previous "bax" WUs, which were simulated successfully. Also, as I mentioned, the error rate is around 6%, which is really very low. I asked him whether there is anything unusual about the system, but apparently it's not very large and doesn't use any weird, barely tested functionality, so there is really nothing we can do about it.
ID: 34355
> Just had the same thing happen with a SANTI_bax2 WU on one machine. It BSODed every time BOINC started the SANTI WU, and even caused disk corruption once. 650 Ti GPU. If I suspended the WU, the machine ran fine. I finally aborted the WU and downloaded another SANTI_bax2, which is so far running OK. Before that WU, the box in question had run for a VERY long time without a single crash. I've run 18 other SANTI_bax2 WUs without an issue. I wonder if perhaps some WUs got released with bad parameters? Maybe a corrupted download?
Hmmm. Now I am beginning to wonder. I just had a similar situation (I think): I was copying a 5 GB video file from one drive to another, and it kept BSODing the machine, which I have never seen it do before. Since the file copied fine to another drive, I put it down to a controller/disk drive compatibility problem, as the problem drive was on a Marvell controller, not the main Intel controller. But it just so happens I was running a Santi_bax2 at the time, and noticed that it was taking a very long time to complete, with the estimated time left even increasing after 16 hours (only 26% complete), so I aborted it. Otherwise that card (a GTX 650 Ti 1 GB) has been very stable with all the other work units, including a couple of Santi_bax2 types. That work unit may be bad, but it has not finished yet on another machine, so I don't know. I had assumed that the drive problem had corrupted the Santi_bax2, but it could be the other way around.
EDIT: Actually, it started out on the GTX 650 Ti, but had switched over to a GTX 660 by the time I ended it. So it seems not to be a memory limitation, since it was running slowly even with 2 GB, unless the memory requirements have grown beyond that. http://www.gpugrid.net/result.php?resultid=7557083 (The restarts are due to the BSODs.)
ID: 34356
> Also as I mentioned the error rate is around 6%
For me, right now, it's 50%...
ID: 34357
> For me, right now, it's 50%...
Statistics are statistics... Unfortunately, that doesn't mean there are no outliers.
ID: 34358
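To put the "outliers" point in numbers, here is a minimal sketch of how often a host would see a streak like the five failures out of ten SANTIs reported above purely by chance. It assumes every task fails independently at the roughly 6% project-wide rate quoted in this thread; real failures are probably not fully independent (a bad batch or a marginal card breaks that assumption), so treat the result as a rough guide only.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more failures in n tasks."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: each task fails independently at the project-wide rate of about 6%.
P_FAIL = 0.06

# At least one failure in 10 tasks: how often an unlucky error is just noise.
print(f"1+ of 10: {prob_at_least(1, 10, P_FAIL):.3f}")
# Five or more failures in 10 tasks: the "five in five days out of 10" report.
print(f"5+ of 10: {prob_at_least(5, 10, P_FAIL):.1e}")
```

Under these assumptions, seeing at least one failure in 10 tasks is close to a coin flip (about 46%), while 5 or more is roughly 1.5 in 10,000. That fits the view in this thread that such a streak points at something host-specific rather than bad luck, but it cannot say which part of the host is responsible.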
I noticed that most of my failed WUs occurred on one of my two cards. On investigation I found that the card with the failures was running at a slightly lower voltage. Rather than mess with the voltage (up till now both cards have worked well), I lowered the GPU clocks by 10 MHz, and so far all WUs have completed successfully... fingers crossed...
ID: 34361
> I lowered the GPU clocks by 10 MHz, and so far all WUs have completed successfully... fingers crossed...
Please try the same, Tomba. Lower the clock speed of the offending GPU by 13 or 26 MHz and see if it helps too.
MrS
____________
Scanning for our furry friends since Jan 2002
ID: 34365
> Also as I mentioned the error rate is around 6%
Check my results. Out of 66 results, I have 63 successes, 1 failed SANTI, 1 failed NOELIA, and 1 aborted. It seems SANTI and NOELIA are difficult tasks, but not impossible.
____________
BOINC <<--- credit whores, pedants, alien hunters
ID: 34367
I can't tell whether there's a relation, but I had processor usage set to 98%. I changed it to 100% and got 2 Santi errors...
ID: 34368
> I lowered the GPU clocks by 10 MHz, and so far all WUs have completed successfully... fingers crossed...
Dear MrS,
You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs. In the past month my "offending" GPU has processed 28 Nathans without a single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity. Why do you want me to penalize Nathans, which give 10% more credit than Santis? Why don't you fix the Santi problem??
[I am not ashamed that I'm in the credit-chasing game. I just spent over €1000 on a rig that will dramatically boost my contribution to this most worthy cause]
ID: 34378
If you check my posts, you can see that I have been complaining about the Santi long-run (LR) and short-run (SR) WUs since the summer. All the problems are on the GTX 660.
ID: 34381
> I lowered the GPU clocks by 10 MHz, and so far all WUs have completed successfully... fingers crossed...
Hmmm. Take a wee hit on NATHANs for a big gain on SANTI? You're right, that is a preposterous proposal. <roll-eyes>
> Why don't you fix the Santi problem??
Why don't you fix it yourself? Why don't you install Linux and get an 11-12% boost on all your tasks, if you're genuinely in the credit-chasing game and don't want to waste electricity? Sorry, I don't wish to offend, but to me it just doesn't make sense to cry about inefficiency when you're running an antiquated POS opsys like Win7/8.
ID: 34384
Some work units are easy, some are hard. You have to live with it; I actually like the harder ones better, since they exercise my card more and may do more challenging science(?). At any rate, I have just "downgraded" a GTX 660 to a base clock of 967 MHz (with corresponding reductions in the boost and maximum clocks), but also boosted the voltage on the core up from 1.162 volts to 1.175 volts to get it stable, and increased the upper power limit to 115% max. If that is what needs to be done, OK with me; I don't expect the scientists to design their experiments for the weakest cards out there.
ID: 34385
> I don't expect the scientists to design their experiments for the weakest cards out there.
I think you are misunderstanding me. I don't mind that there are errors, and I really don't care about how much credit I get. I don't expect the project to adapt to my old card. My point is that if there are too many errors, it is worth investigating whether there is a way to avoid them. Nobody (neither users nor scientists) wants to waste electricity. If some kinds of units don't suit older cards, we should know about it.
ID: 34388
Yes, it would be convenient if they had a "worst case" work unit we could run, to see if our cards are stable on it. Then we could adjust the cards as necessary, or just accept the error rate for whatever it is. But I doubt that even the scientists know what the worst case really is, or what they will need in the future.
ID: 34389
> You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.
You can change it back once the problematic Santi WUs have been cleared from the queue.
> In the past month my "offending" GPU has processed 28 Nathans without a single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity.
As has been said before, this problem is *not* general (the overall error rate is low for these workunits), so these errors are caused by a specific problem in your system, not by the project. Therefore:
- the staff won't do anything about it (a change might cause more errors than it fixes)
- it is up to you whether you accept our advice and try to fix *your* problem, or put up with the frustration caused by the wasted electricity.
> Why do you want me to penalize Nathans, which give 10% more credit than Santis?
Lowering the GPU clock is a safe way to try to fix this error. You can increase the GPU voltage instead (no performance penalty), but that is riskier because it increases the power used by the GPU, and therefore its temperature.
> Why don't you fix the Santi problem??
Because there isn't a Santi problem from the project's point of view.
> [I am not ashamed that I'm in the credit-chasing game. I just spent over €1000 on a rig that will dramatically boost my contribution to this most worthy cause]
Me too. We appreciate that, and that's why it is also important for us to fix your problem. GeForce cards are made for gaming, not for crunching. Their factory settings let the gamer get the maximum performance from the GPU at the cost of some stability (a glitch during an 8-hour gaming session is no problem, but it will ruin an 8-hour-long workunit).
If lowering your GPU's clock makes your host capable of crunching every workunit error-free, your RAC (your daily contribution) will be higher than when it crunches a little faster but some workunits fail (and your frustration will be lower too).
ID: 34391
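The RAC argument above can be checked with a back-of-the-envelope model. This is a sketch under stated assumptions, not how BOINC actually computes RAC: throughput is taken to scale linearly with GPU clock, a failed task is assumed to burn its full runtime for zero credit, and the ~1000 MHz clock used to turn a 26 MHz cut into a percentage is a hypothetical round figure. The failure rate is tomba's 7 errors out of 31 Santis.

```python
def effective_throughput(relative_speed: float, failure_rate: float,
                         wasted_fraction: float = 1.0) -> float:
    """Completed tasks per unit time, relative to an error-free card at speed 1.0.

    Hypothetical model: a successful task takes 1 / relative_speed time units;
    a failed task burns wasted_fraction of that time and earns nothing.
    """
    success_rate = 1.0 - failure_rate
    # Expected time spent per attempted task (successes plus wasted failures).
    time_per_attempt = (success_rate + failure_rate * wasted_fraction) / relative_speed
    return success_rate / time_per_attempt


# Factory clocks with tomba's SANTI record: 7 of 31 tasks failed.
factory = effective_throughput(1.0, 7 / 31)
# Assumed ~1000 MHz clock cut by 26 MHz, and assumed to fix every failure.
downclocked = effective_throughput(1 - 26 / 1000, 0.0)
print(f"factory clocks: {factory:.3f}, downclocked: {downclocked:.3f}")
```

With these assumptions the factory-clocked card delivers about 0.77 of its nominal output and the slightly slower, error-free card about 0.97, so a small downclock wins whenever it removes more than a few percent of failures. Tasks that crash early waste less than a full runtime; `wasted_fraction` is there to model that.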
> You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.
I don't care about the points, but rather about the stability of the systems involved. Until now, I didn't have to worry about running GPUGrid work units. I now have three boxes I have had to pull because of these work units. It isn't just 6xx series cards, as I have pointed out; it also affects the GT 430s. I run all of my cards at stock speeds and don't overclock my CPUs either. I have more than enough headroom on my power supplies. I don't believe someone's workaround of modifying each machine is a justifiable argument for everyone else to have to go tweak their systems just to run SANTI. This is something that should be able to be fixed within the application. Or GPUGrid should let users choose, via preferences, whether to run SANTI or NATHAN work units when available. Since this is a known error and it is happening to multiple people with different cards, it certainly should be looked at further.
ID: 34395
> You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.
You miss the point. We know you don't OC your cards, but perhaps the manufacturer did. Anyway, all that is irrelevant when a slight downclock or voltage boost will likely fix your problem. I say "your" problem because most of us aren't experiencing any problem with SANTI.
> I have more than enough headroom on my power supplies. I don't believe someone's workaround of modifying each machine is a justifiable argument for everyone else to have to go tweak their systems just to run SANTI.
That is a blatant exaggeration. Nobody expects "everyone else" to tweak their systems. They expect only the very few who have problems to tweak theirs. Why do you ignore the many hundreds of systems on which SANTIs run with no problem? Why should the admins tweak SANTI tasks or the app just to spare 1% of systems grief, when doing so could mean SANTI starts crashing on the 99% that have no problem with the current SANTI? They could provide a separate queue for SANTI, but that creates more problems than it fixes, because the next time your improperly configured system runs into what you think are bad tasks, you'll want them to spend more time creating yet another queue. That makes no sense at all when you could solve the problem easily on your end.
> This is something that should be able to be fixed within the application.
Yep! And pigs should have wings so they can fly themselves to the slaughterhouse, so I can stay at home and harvest money off my money tree.
> Since this is a known error and it is happening to multiple people with different cards, it certainly should be looked at further.
Technically 3 or 4 fits the definition of multiple, but that is irrelevant when SANTI isn't a problem for 99% of systems. Go figure.
A better solution might be to run a script that watches your queue, aborts SANTI tasks the minute you receive one, and keeps aborting SANTIs until you receive a NATHAN or whatever tasks work for you. The only potential problem I see with that solution is that if you abort too many tasks, the server might make you wait 24 hours before it sends more; maybe, maybe not, I'm not sure how they have that configured. Another possible option is a script that watches your queue and automatically tweaks your card one way just before it starts a SANTI, and a different way when it receives a NATHAN.
ID: 34397
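The first script idea (abort SANTI tasks as they arrive) could look roughly like the sketch below. It assumes the `boinccmd` tool that ships with the BOINC client is on the PATH and that `boinccmd --get_tasks` prints one block per task, starting with a line like `1) -----------` followed by `name:` and `project URL:` fields; check that against your client version before relying on it. Depending on your setup you may also need to run it from the BOINC data directory or pass `--passwd`. The `SANTI` substring match and the `gpugrid` URL check are illustrative choices, and since aborting many tasks may get the host throttled by the server, the sketch only prints the commands unless told otherwise.

```python
import subprocess


def parse_tasks(text: str) -> list[dict[str, str]]:
    """Split `boinccmd --get_tasks` output into one dict of fields per task.

    Assumed format: each task starts with a line such as '1) -----------',
    followed by indented 'key: value' lines ('name: ...', 'project URL: ...').
    """
    tasks: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        head, sep, _ = line.partition(")")
        if sep and head.isdigit() and line.endswith("-"):
            tasks.append({})  # start of a new task block
        elif tasks and ": " in line:
            key, value = line.split(": ", 1)
            tasks[-1][key] = value
    return tasks


def abort_commands(tasks: list[dict[str, str]], marker: str = "SANTI",
                   project: str = "gpugrid") -> list[list[str]]:
    """Build a `boinccmd --task <url> <name> abort` command for each matching task."""
    return [
        ["boinccmd", "--task", t["project URL"], t["name"], "abort"]
        for t in tasks
        if marker in t.get("name", "") and project in t.get("project URL", "").lower()
    ]


def abort_santi_tasks(dry_run: bool = True) -> None:
    """Query the local BOINC client, then print (and optionally run) the abort commands."""
    listing = subprocess.run(["boinccmd", "--get_tasks"],
                             capture_output=True, text=True, check=True).stdout
    for cmd in abort_commands(parse_tasks(listing)):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Call `abort_santi_tasks()` first to see what would be aborted, then `abort_santi_tasks(dry_run=False)` from cron or Task Scheduler if you decide to go this way. The second idea, switching clocks per task type, would need a clock-control tool that can be scripted, which this sketch does not attempt.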
> This is something that should be able to be fixed within the application.
Having read this whole thread, I'm not sure whether the problem is with OC cards or with complex units. I will stop my computer and read it carefully over the weekend to understand what's going on. But I would like to point something out. What's wrong with wanting to be a "passive" cruncher who doesn't want to worry about Linux, scripts or whatever? What's wrong with that? I think you can't expect every user on this project to be a hardware geek. Of course people with the best machines are very familiar with this project, but there's another profile of BOINC cruncher: people running multiple projects don't want to go deep into technical specs. I think an easy solution should be offered to these people. Maybe the "geeks" could help by providing a script or whatever some people suggested. IMO
____________
HOW TO - Full installation Ubuntu 11.10
ID: 34400
> This is something that should be able to be fixed within the application.
There is nothing wrong with that. Nobody here has said there is something wrong with that.
> I think you can't expect every user on this project to be a hardware geek.
I agree. For this problem with SANTI tasks crashing there is an easy solution: the one proposed by Retvari. If that solution is too difficult for some people, they can ask for help implementing it. Asking the project devs to fix a user's problem is not, IMHO, a reasonable solution unless the problem also afflicts many other users. This problem with SANTI is limited to just a few users. How do I know that? I know because if it were a widespread problem, a lot more people would be complaining and the admins would be able to see it in the stats they collect.
> Maybe the "geeks" could help by providing a script or whatever some people suggested. IMO
If you want a script, then ask and I will try to provide one, unless the project admins think it's harmful to the project. A script to auto-abort SANTI tasks as soon as they download is easy, but maybe not the wisest approach. A script to lower the clock or raise the voltage when you receive a problem task (SANTI, for example) and return clocks/voltage to normal for other tasks would be harder to implement, but I am sure there is a way.
ID: 34401
Dagorath, glad you are still trying to make a few suggestions and sticking to that very narrow mindset. My reasoning for an app change, rather than making users tweak their machines locally for one sub-project, is focused more on machines that aren't easily accessible and can't be tweaked remotely on a day-to-day basis. As for giving people the option to choose between SANTI and NATHAN adding more problems, that is yet to be seen. Until then, it is only opinion, which really isn't worth arguing about. In this case, if the choice were available, GPUGrid would still have 4 more GPUs from me crunching away. I'm sure others who are having difficulty would do the same. I have no idea of the true number of people experiencing problems, because not everyone posts in the forums. I don't even know whether the techs here look at the work units I aborted that were causing the BSODs, because they didn't get to "error out" and therefore would not show up that way. Instead they would show up as user aborts, and someone else who didn't have BSOD issues could finish them. I'm not saying my cards might not be overclocked by the manufacturer, so I can assure you that "point" was not missed; I just didn't address it in my statement above. I have made my choice with regard to tweaking my cards and have expressed my opinions (which is what they are, whether or not you like them) on how I feel about the issue at hand. Please choose to ignore them if you don't like my approach.
ID: 34402
Coleslaw,
ID: 34403
To quote Statler and Waldorf, "You're not old, but you're ugly!"
ID: 34406
As I wrote some time ago, this project is no longer under quality control...
ID: 34407
@skgiven
> To quote Statler and Waldorf, "You're not old, but you're ugly!"
I don't think that's the problem. I think you are seeing this from a very narrow point of view. There are many different kinds of:
- user profiles (motivations)
- ways of viewing problems with units (tolerance for errors)
- levels of technical knowledge (hardware and software)
- appetite for problems (wishing to push hardware or find solutions as a hobby, available time).
Whatever one's profile or motivations, everyone adds something, and I wish "the project" could be more understanding of all of them. I'm sure the top-10 crunchers with huge rigs don't see any problem, and they probably contribute 80% of the computing power. But I believe everyone counts. I feel very stupid posting about an error. I don't expect everything to be smooth, but if I post, it's because I give my time to help. If you give the impression that problems are not pursued and investigated, many people will quit and you will lose users little by little. Of course others will come back; I left and came back. To finish without making this too long: just the feeling that something is being done, or an explanation of why nothing can be done, would be fine, at least for me. To be clear, I'm not complaining, and I understand that the project has limited resources. Maybe organizing all the talented people here could help.
> As I wrote some time ago, this project is no longer under quality control...
This is a huge xxxxxx, I bite my tongue. Jozef, you have no idea what's going on or what the problem is. In Spain 1% of people take a siesta, and it's been proved that this is very good for the body and for mental sharpness.
ID: 34408
> To finish without making this too long: just the feeling that something is being done, or an explanation of why nothing can be done, would be fine, at least for me.
The admins have said they will look into it and, IIUC, they have also indicated that it's not likely anything can be done, and I believe the reasons have been covered. Therefore, tomba, I think you got exactly what you say you want. In addition, other solutions have been offered.
Narrow minded is as narrow minded does. A few of us have tried to broaden the options. That is what we have done. Others have ignored all the alternative options and focused on the 1 option the admins have politely indicated they're not gonna get. Is that broad thinking or narrow thinking?
I am sorry if some volunteers installed hosts in remote locations and failed to do the smart thing and configure them for remote access and administration. Hopefully they can fix that and do better the next time they set up a remote host.
ID: 34419
> So far I have had to set three of my computers to No New Work because SANTI work units were causing BSODs. The affected systems run Windows 7 Premium and Professional x64, with GT 430, 650 Ti, and 660 Ti cards. From what I could see, the BSODs came after the driver crashed many times in a very short period of time.
As for my WU detailed above: it finished fine for the next person to get it, which is interesting since his machine has TONS of errors. I cut the clocks by 25 MHz, and 5 SANTI_bax2 WUs have since completed fine on that GPU. I suspect that this WU type stresses the GPU slightly more than most, so GPUs "on the edge" are more likely to error. Anyway, I've had 167 valid and 1 error lately (on 8 machines). I'd say the project is running pretty smoothly (at least here). I haven't had that strange bluescreening on any machine before or since.
History quiz: Does anyone remember when MS ballyhooed long and loudly that they had solved the "black screen of death"? Remember the solution?
ID: 34433
I have been crunching GPUGrid WUs again for about a week on two machines (three since yesterday), and had no problems with SANTI, NATHAN, NOELIA or SDOERR. I had a BSOD today while crunching a Nathan WU. I thought it was a software problem and installed the new drivers (331.82), which didn't solve the problem. After reading these posts I decided to lower the core clock, and so far so good.
ID: 34469
> I had a BSOD today while crunching a Nathan WU. I thought it was a software problem and installed the new drivers (331.82), which didn't solve the problem. After reading these posts I decided to lower the core clock, and so far so good.
That's the spirit. Unfortunately, you are never quite sure that you have done enough until you eventually stop getting errors. But my GTX 660s are now working fine for me, and I hope they stay that way. With the variability we see in the work units, you never know, though.
ID: 34472
Wow, it seems that SANTI batch really has tougher requirements. I ran one successfully with a 50 mV overvoltage on the 560 Ti 384-core. Until now all my cards have run at +25 mV and computed successfully with that setting, SANTIs too; only this card needs more. But +50 mV needs serious cooling... I'm at nearly full fan speed and over 80 degrees with an open case and no extra heating in the flat.
ID: 34478
Over 80C is too hot for me. I like mine 70C max.
ID: 34481
> Wow, it seems that SANTI batch really has tougher requirements.
That's what I'm thinking too. I had another one that caused the machine to reboot continuously until I caught it, this time a SANTI_MARwtcap. The only way to stop the cycle is to abort the WU. I lowered the clocks yet again, and the next one is running fine. In fact, that machine is currently showing 20 valid WUs and the 1 error WU that caused constant bluescreens. NVIDIA GeForce GTX 650 Ti (1024MB), driver: 331.82.
ID: 34516
Locked up again on a workunit, so I've stopped running GPUGrid on this card again and switched it back to Einstein :/ I will try again in one or two weeks.
ID: 34518
After 17 days, a Santi caused one 660 to downclock, and only 2 errors in 19 days. But today my rig with two 660s had rebooted when I found it. After logging in, it rebooted immediately whenever BOINC started, several times, so I booted Windows in safe mode, where I had more time to abort the task. As I didn't know which one it was, I aborted both. It's now happily crunching again.
ID: 34566
Too many errors for me... stopping here.
ID: 34568
Just now I found my rig with two 660s frozen. No idea when it happened; even Ctrl-Alt-Del didn't work. After rebooting, I immediately got the message that the graphics driver had crashed and recovered, three times in a row, and then it rebooted itself again. After three attempts I got BOINC stopped. I am now installing the latest beta driver, though that should not be necessary, as it ran for more than a month with the 331.82 driver.
ID: 34571
Again I found my rig with two 660s frozen.
ID: 34578
TJ, you should lower the GPU frequency of those 660s, or increase the GPU voltage by 12 mV.
ID: 34579
Boost the voltage and produce more heat, or get crunching on Linux. Take a look at my results. I have errors, but 99% of those are tasks I aborted, because I was experimenting and ended up with too many tasks in my cache, or for other reasons. I have two 670s and one 660 Ti on Linux, and they almost never crash SANTI tasks. I'm running stock clock speeds, and if I keep the temps below 70C the clock boost kicks in regularly. They hardly ever crash on any task, and if they do, the OS doesn't hang: BOINC keeps running, another GPUGrid task downloads and starts, and life carries on.
ID: 34580
> Again I found my rig with two 660s frozen.
I have had those problems too. The main reason seems to be that the 660s were bumping up against their power limit, which starved them of current on the tough portions of the hardest work units. Increasing the power limit to 110% with Nvidia Inspector has largely solved the problem for me on the two cards (a Zotac and a Gigabyte) that I now use for GPUGrid, without the need for any other changes: http://www.gpugrid.net/results.php?hostid=159002&offset=0&show_names=1&state=0&appid=
But they are often overclocked too much at the factory for the work here, and on another of my Zotac 660s I have also had to reduce the clocks a little (GPU clock from 993 MHz to 950 MHz, and memory clock from 3004 to 2804 MHz) and bump up the core voltage (from 1.162 to 1.175 volts). For some reason, on the Zotacs the software control utilities (such as Nvidia Inspector or MSI Afterburner) do not work to change the voltage, so I had to modify the BIOS with Kepler BIOS Tweaker and then flash it to the video card with nvflash. You can use GPU-Z to first make a copy of your present BIOS, which you then modify (keep a copy of your old BIOS as a backup). If you don't want to deal with that, just reduce the clock frequency, first the GPU clock and then the memory clock if necessary, until it is stable.
ID: 34581
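Why a voltage bump costs heat while a small downclock saves it can be estimated with the textbook CMOS approximation for dynamic (switching) power, P ≈ C·V²·f. This is a rough sketch only: it assumes the switched capacitance and activity stay the same and ignores leakage power, which also rises with voltage and temperature. The numbers are the Zotac GTX 660 settings from the post above.

```python
def relative_dynamic_power(v_old: float, v_new: float,
                           f_old: float, f_new: float) -> float:
    """Ratio P_new / P_old under P ~ C * V**2 * f (C unchanged, leakage ignored)."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Settings from the post: core 1.162 V -> 1.175 V, GPU clock 993 MHz -> 950 MHz.
voltage_only = relative_dynamic_power(1.162, 1.175, 993, 993)
voltage_and_clock = relative_dynamic_power(1.162, 1.175, 993, 950)
print(f"voltage bump alone:       {voltage_only:.3f}x dynamic power")
print(f"voltage bump + downclock: {voltage_and_clock:.3f}x dynamic power")
```

By this estimate the 13 mV bump alone adds a bit over 2% dynamic power, while combining it with the 993 to 950 MHz downclock ends up about 2% below the original. That is consistent with the advice in this thread that lowering clocks is the safer knob, and that any voltage increase should be watched against temperature.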
Hi Guys,
ID: 34582
You can start a snowball effect of system freezes and BSODs if you don't run Check Disk (chkdsk) after getting those errors: orphaned files, wrong timestamps and the like just cause more and more problems. If you haven't done that, the errors you're getting now could be related to it.
ID: 34583
No BSOD, but again a frozen system, this time caused by a Noelia on my 660s rig. I did manage to get the clocks down with Precision X from EVGA, but after a while they boost automatically again. When I try to do it with MSI Afterburner, it shows only one card; nowhere I click lets me see the settings of my second card.
ID: 34613
In Afterburner, click Settings (bottom right corner of the left pane) and then you can change the GPU under the General tab.
ID: 34614
> History quiz: Does anyone remember when MS ballyhooed long and loudly that they had solved the "black screen of death"? Remember the solution?
They made the color a user-configurable option, then told everyone that if they got a black screen of death it was their own damn fault.
ID: 34617
> In Afterburner, click Settings (bottom right corner of the left pane) and then you can change the GPU under the General tab.
Thanks skgiven, found it and used it!
____________
Greetings from TJ
ID: 34618
Just had to move my 560 Ti onto the short runs.
ID: 34644
Your clocks are too high. Try 1644 MHz for the processor and 2004 MHz for the memory.
ID: 34645
It would probably be useful if people having problems with a self-overclocked card returned it to stock clocks before reporting problems with WUs.
ID: 34646
These clocks aren't an overclock on a 560 Ti 384 O.o Try +25 mV first, but it can still fail at any time. Then try underclocking. But even then it's not guaranteed to work ;(
ID: 34649
A reference GTX 560 Ti works perfectly: http://www.gpugrid.net/results.php?hostid=160845
ID: 34650
Every chip has different tolerances ^^
ID: 34653
> Every chip has different tolerances ^^
True; in my experience, even each card of the same type and brand has its own tolerances.
ID: 34654
It seems every type of task has its own tolerances too, am I right?
ID: 34655
Many SANTI jobs failed with a BSOD; the NVIDIA driver also crashed,
ID: 35565
One of my Linux systems fails SANTI_MAR tasks quite regularly. Other tasks run fine, but I'm mostly getting SANTI_MAR tasks. Both the short and long tasks fail (but not all). ATM ~2 fail per day |:(
ID: 35598