Message boards : Graphics cards (GPUs) : unspecified launch failure
Author | Message |
---|---|
I get the following error every so often on this box. It's a BFG 8800GT OC running at the speed it had when I bought it ... | |
ID: 3821 | Rating: 0 | |
"Cuda error: Kernel [frc_sum_kernel_dihed] failed in file 'force.cu' in line 252 : unspecified launch failure." I've had the same error on a single task recently, and I'd never seen it before. I do have both 8800GTs OC'd somewhat, but I haven't changed that in well over a month, so I wouldn't think it is related. I've since completed a couple of WUs fine, so I just chalked it up to something strange happening at one point in time. If it happens again, I'll have more reason to be concerned. http://www.gpugrid.net/result.php?resultid=115911 | |
ID: 3824 | Rating: 0 | |
My first error with this log on 8800GT 1GB: | |
ID: 3870 | Rating: 0 | |
Just found out that my WU which crashed this morning (shortly before it was finished!) had the same error: | |
ID: 3871 | Rating: 0 | |
Another one killed itself with such a message. | |
ID: 3916 | Rating: 0 | |
These are the same wus as before. Have you updated the drivers? Which drivers do you have? | |
ID: 3917 | Rating: 0 | |
It's driver version 177.84, no change since I started crunching here. | |
ID: 3918 | Rating: 0 | |
2 observations: | |
ID: 3924 | Rating: 0 | |
"2 observations:" Well, you could be right. The first WU after the driver change has run fine so far. I will crunch two or three more WUs to see if the error appears again. If it does, I will take the shader clock down a bit. As you may have read in one of the other threads, RivaTuner accidentally lowered my shader clock without my knowledge and the WUs took much longer, but all finished without problems... ____________ Member of BOINC@Heidelberg and ATA! | |
ID: 3929 | Rating: 0 | |
Next one: | |
ID: 3935 | Rating: 0 | |
As I have read and heard many times, RivaTuner is not the best software and can cause problems with Nvidia graphics cards, especially with the newest GPUs. I had similar problems for as long as I used RivaTuner. I would suggest using nTune, which can give you a better chance of a 'correct' OC. This software keeps my GPUs really stable after a hard OC (3x GTX 280, 600@702 MHz). | |
ID: 3936 | Rating: 0 | |
"As I have read and heard many times, RivaTuner is not the best software and can cause problems with Nvidia graphics cards, especially with the newest GPUs. I had similar problems for as long as I used RivaTuner. I would suggest using nTune, which can give you a better chance of a 'correct' OC. This software keeps my GPUs really stable after a hard OC (3x GTX 280, 600@702 MHz)." I haven't OC'd my card. Next time I will uninstall RivaTuner and try nTune to see if the issue is still present, but I have read that some people on Linux got this error too. | |
ID: 3937 | Rating: 0 | |
"- people got the error before 6.3.21" "I cannot explain why?! And after losing many hours, why does this error not come at the start ^^" It is a temporary error on your machine. That means normally your machine is fine and the WUs are (normally) fine for others. That the error occurs after many hours of crunching tells you that probably something goes wrong during the calculations. It's not a permanent error, it's a "transient" one. Such errors may be caused by really weird software combinations, bit-flips in the chip due to cosmic rays, hardware design faults which only occur in rare, exceptional situations (e.g. for CPUs, several interrupts at the same time) or by a chip which is right at the edge of instability in the balance between clock frequency, voltage and operating temperature. Saying "but it was stable for ..." does not really help. It could be that a few transistors are worse than the others (or have degraded more over time) and fail every 10^15 cycles or so, leading to a "mean time between failures" of days. - And I don't think the mere presence of RivaTuner causes these errors. I mean, it's not even running all the time, is it? Also, Rebirther's GPU is *old* enough (G92) to be supported properly. "I've had similar problems as long as I've used Riva." Which problems do you mean exactly? The "unspecified launch failure"? "I would like to suggest you to use nTune which can give you bigger chance for 'correct' OC" Well, RivaTuner and (I think) Everest are the only tools which can show you the real clock of your NV card; all others only show you the clock which you request from the system. The real clock is adjusted in steps. So if you can clock higher using nTune, it may be that you're just below the next step, where it would become unstable. The internal clocks would be the same, but the number shown to you would be higher, hence it seems to be a higher OC. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 3941 | Rating: 0 | |
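To put a rough number on the "fail every 10^15 cycles or so" remark, here is a minimal sketch of the arithmetic. It assumes the GPU is busy all the time and that the failure count per cycle does not itself change with clock speed (in practice it would rise near the stability limit, so this is only the back-of-the-envelope part). The 1.7 and 1.8 GHz shader clocks are the ones mentioned later in this thread; 1.5 GHz is an arbitrary comparison value, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def mtbf_days(cycles_per_failure: float, clock_hz: float) -> float:
    """Mean time between failures in days, for one failure every
    `cycles_per_failure` clock cycles at a constant clock of `clock_hz`."""
    seconds_between_failures = cycles_per_failure / clock_hz
    return seconds_between_failures / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1.5 GHz is a hypothetical value; 1.7 and 1.8 GHz appear in the thread.
    for clock_ghz in (1.5, 1.7, 1.8):
        days = mtbf_days(1e15, clock_ghz * 1e9)
        print(f"{clock_ghz:.1f} GHz: roughly {days:.1f} days between failures")
```

At these clocks the result lands between about six and eight days, which is consistent with "a mean time between failures of days" and with errors that only show up after many hours of crunching.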
"- people got the error before 6.3.21" Factory OC, yes, but this is not a problem; you also got this error, as did many others, on newer cards and old ones. I don't think this is a hardware failure in all models of cards?! I have asked on the alpha mailing list about this issue to narrow down the error, still waiting for an answer, so is it the hardware, BOINC client or the project application? Drivers and other programs can be excluded. The GPU waiting for the CPU could be an issue, so the WU aborts by itself with this error because it cannot crunch any further from the last point. Update: thanks to Nicolas for pointing out it's not the BOINC client; app 6.48 had a 0% error rate, 6.52 a 20% error rate. @GDF: can you check the application code to find out what's wrong? Or can you switch back to the old app? | |
ID: 3947 | Rating: 0 | |
"Factory OC, yes, but this is not a problem" How can you be sure? Hardware errors can pop up quite rarely. These are actually the hardest to detect, because you can never be sure if (i) your test software can reproduce the error at all and (ii) you tested long enough. "you also got this error, as did many others" Yeah, I also noticed this one yesterday.. and guess what, I'm also running OC'ed. "I don't think this is a hardware failure in all models of cards?!" Not every OC'ed card produces these errors, does it? "I have asked on the alpha mailing list about this issue to narrow down the error, still waiting for an answer, so is it the hardware, BOINC client or the project application? Drivers and other programs can be excluded." I agree that we can exclude drivers and other programs. However, I'd also suspect that the BOINC client has absolutely nothing to do with this. It just launches the aecmd_.exe and all further CUDA-related launches are done by the science app. "The GPU waiting for the CPU could be an issue, so the WU aborts by itself with this error because it cannot crunch any further from the last point." Sounds somewhat improbable. The GPU cannot talk to BOINC, so if the CPU app stops working then "no one" would tell BOINC that an error happened. It would likely detect after a short time that the app has quit and restart it. This is the point where some trouble may be caused, when the GPU / driver is in a strange state because the CUDA app was not terminated properly. Is this just a guess on your side or do you have anything hinting at such a scenario? "Update:" - Where do you get that 20% error rate from? - I also had another one of these "unspecified launch failure" errors - with app 6.45. - Switching back to the old app is probably not feasible, since there were changes in the science code. - Oh, and who's Nicolas? MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 3973 | Rating: 0 | |
- 20% is my error rate, estimated from my last calculation - Nicolas Alvarez, also a developer of BOINC/PrimeGrid/IMP/Renderfarm - we must sort out what was changed in the code that causes this error - I cannot find any scenario yet (removed RivaTuner, installed nTune), will see what happens... (2 cores running VMware with Ubuntu Linux 64-bit + ABC, the other 2 cores BOINC in Windows with GPU + Milkyway, RCN, yoyo evo) | |
ID: 3977 | Rating: 0 | |
Yeah, let's get some new hard facts. But by saying "- we must sort out what was changed in the code that causes this error" you imply that you already know it's the science app's fault. We cannot know that yet. I think it's not the app, because these errors happen with different clients and the WUs run fine on other machines. .. gotta go to bed for today ;) MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 3981 | Rating: 0 | |
"Well, you could be right." Well, after having finished three WUs without a problem (see here, here and here), I now have the error again with this WU, fortunately very early in the crunching. Looking at my host list, it seems the error recurs regularly and is not caused by anything special. Okay, I will reduce my shader clock now to see if that breaks the rule. ;-) ____________ Member of BOINC@Heidelberg and ATA! | |
ID: 4022 | Rating: 0 | |
Well, the number of successful WUs between failures is anything between 2 and 6.. I'd rather call that a guideline ;) | |
ID: 4024 | Rating: 0 | |
I had another one, luckily in the beginning of the WU. I scaled back the OC and will see what I get. | |
ID: 4080 | Rating: 0 | |
Well, looks like I just had the first of these... http://www.gpugrid.net/result.php?resultid=140145 | |
ID: 4108 | Rating: 0 | |
I highly doubt it and I can tell you for sure that I'm not running prime grid (QMC & Milkyway). | |
ID: 4119 | Rating: 0 | |
Well, this time it took a little longer until it appeared again for me: to be precise, there were 4 successful WUs in between, but now I have one again. | |
ID: 4128 | Rating: 0 | |
Yes, 100 MHz should do. I'd also take the core down a bit (like I probably wrote somewhere above) .. approximately 50 MHz. | |
ID: 4145 | Rating: 0 | |
After a while and some finished WUs with RivaTuner uninstalled and nTune + BOINC 6.4.2 installed, I have not seen any more errors so far. So RivaTuner could be a possible cause of the error. | |
ID: 4248 | Rating: 0 | |
No, I still have RivaTuner installed and got the error. No other ones since I clocked down, but then they were not frequent to begin with (2 in a couple of months). | |
ID: 4436 | Rating: 0 | |
Does any of you also crunch the SETI@home beta CUDA app? | |
ID: 4443 | Rating: 0 | |
DrNow, | |
ID: 4995 | Rating: 0 | |
"you're still getting many errors and it does not seem to matter much if you run 1.8 or 1.7 GHz shader clock. Did you also downclock core and memory for this test? What's the temperature of your chip while crunching?" Hi ETA. Well, as I last said here, I also think it may be a hardware problem with my graphics card, but I have to stick with it for a while; I can't buy a new, bigger one as my current case isn't suitable for that (the power supply is directly behind the card!). About the WUs: the last "unspecified launch failure" is from 12-22-2008. I don't know why this later one failed; it's not a "ULF", as you can see. And this one from the 23rd has another failure message, obviously a WU error. Besides that, I didn't have much time over the Christmas days to continue the tests. Strangely enough, the shader clock went back to 1.8 GHz some days ago without any action on my side. Maybe my switching from Windows to Linux and back did that without my knowledge. (Under Linux I haven't configured BOINC for CUDA yet, but I guess I will try it out in the next few days, as openSUSE 11 is now better supported.) To your questions: I didn't change GPU or memory clock during my tests. And the temperature while crunching is 70° to 75°C, a good value I think. Previous app versions took my 9600GT up to 90 and 100 degrees. ____________ Member of BOINC@Heidelberg and ATA! | |
ID: 5019 | Rating: 0 | |
Alas :(, the new application doesn't use the full power of the graphics cards on a Windows system. I had to go back to 3+1 to prevent my card from running much longer (up to 30%) than before. GPUGrid needs a lot of computing power, so let's not make the rules on the basis of the small graphics cards. There are too many big cars on this speedway to make the rules on the basis of an infant's tricycle ... ____________ | |
ID: 5021 | Rating: 0 | |
"I didn't change GPU or memory clock during my tests." You'd have to do that as well to get some meaningful results. "And the temperature while crunching is 70° to 75°C, a good value I think. Previous app versions took my 9600GT up to 90 and 100 degrees." Yes, 70 - 75°C has to be fine. I could imagine one getting errors at 90 - 100°C if the chip is not very good, but at 70°C the card should be rather tolerant of higher clock speeds. If I were you I'd lower shader/core/mem clock by 100/50/50 MHz and let it run for at least 10 WUs. If you don't get errors during this time we could be getting somewhere. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 5024 | Rating: 0 | |
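One way to see why "at least 10 WUs" is a sensible minimum for the downclocking test: a short run of clean WUs can easily happen by luck. The sketch below assumes each WU fails independently with a fixed probability, which is a simplification, and plugs in the roughly 20% per-WU error rate estimated earlier in this thread. The function name is made up for illustration.

```python
def p_clean_run(per_wu_failure_rate: float, n_wus: int) -> float:
    """Probability that n_wus consecutive WUs all succeed, assuming each WU
    fails independently with probability per_wu_failure_rate."""
    return (1.0 - per_wu_failure_rate) ** n_wus


if __name__ == "__main__":
    failure_rate = 0.20  # estimate quoted in the thread, not a measured constant
    for n in (3, 5, 10, 20):
        chance = p_clean_run(failure_rate, n)
        print(f"{n:2d} clean WUs in a row by pure luck: {chance:.1%}")
```

With a 20% per-WU failure rate, ten clean WUs in a row would still occur by chance about 11% of the time, so ten error-free WUs after downclocking is a good hint rather than proof. A longer clean run, or a later double check at the original clocks, makes the conclusion firmer.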
"If I were you I'd lower shader/core/mem clock by 100/50/50 MHz and let it run for at least 10 WUs. If you don't get errors during this time we could be getting somewhere." Okay, with the next WU I will start another experiment and try that out. ____________ Member of BOINC@Heidelberg and ATA! | |
ID: 5030 | Rating: 0 | |
Nice sig-pic, btw ;) | |
ID: 5035 | Rating: 0 | |
Thanx, ETA. :-) | |
ID: 5052 | Rating: 0 | |
Well, still getting the ULF, no difference from my first test without adjusting memory and GPU clock. | |
ID: 5194 | Rating: 0 | |
Mhh, it seems like the time between failures got a bit longer.. but it's too uncertain. Well, certainly not the clear-cut situation I had hoped for. | |
ID: 5216 | Rating: 0 | |
Wow, already 14 days ago... :-) | |
ID: 5707 | Rating: 0 | |
So it seems a too-high clock speed really is the cause of (some of) the unspecified launch failures. Let your card run a bit longer with these settings. And eventually you should increase the clocks again to the initial values and see if the errors return -> double check. But don't hurry with that. | |
ID: 5714 | Rating: 0 | |
I got this one. Neither the machine nor the graphics card is overclocked. It popped up the "application error has occurred" dialogue box asking if I wanted to send a report to Microsoft, as if they'd know what to do with it. | |
ID: 5718 | Rating: 0 | |
In your case the WU failed with an unspecified launch failure after 1.3s and at a quite different file / line than in most other cases. I'd tend to say it's a similar symptom for a different cause. | |
ID: 5722 | Rating: 0 | |
I'm running BOINC 6.5.0 on Vista x64 and I have the same issue: if I shut down BOINC Manager from either the icon or File/Exit, it does not stop the apps (GPUGrid, SETI or Prime) that I'm running atm. The window closes but they are still active in TaskMan (running 100% on one core). Must stop it manually in TaskMan. They stop only through "advanced/shut down connected client". It's the same behavior as with one of 6.2.16-18. So a reboot may crash the apps that are not properly shut down. BTW I have not got 6.6.0 to work for me at all. - It starts but can't connect/start the apps. B. Regards Lazy | |
ID: 5724 | Rating: 0 | |
"In your case the WU failed with an unspecified launch failure after 1.3s and at a quite different file / line than in most other cases. I'd tend to say it's a similar symptom for a different cause." I was exiting BOINC and have the 'terminate science applications' check-box ticked, so it's supposed to work, but then it is a development version. I logged this issue in Trac a couple of weeks ago with some pretty screenshots for the BOINC developers. I do not blame it for the launch failure - just the mess when I shut it down. | |
ID: 5725 | Rating: 0 | |
"Must stop it manually in TaskMan." No, because, just as you said yourself: "They stop only through 'advanced/shut down connected client'" This is an intentional change, decoupling BOINC manager and client. It's been in BOINC for some time. Whether it's a good choice or not is a different question. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 5726 | Rating: 0 | |
"I'm running BOINC 6.5.0 on Vista x64 and I have the same issue:" Try Advanced -> Shutdown running science applications. Wait a few seconds and see what happens. In my case it starts them all up again! Best if you also have Task Manager open at the same time so you can see whether it terminates them or not. "BTW I have not got 6.6.0 to work for me at all. - It starts but can't connect/start the apps." I haven't tried 6.6.0 yet. Is it only GPUgrid or does the "can't connect/start" issue also affect other projects? The BOINC developers have said they will be releasing 6.6.2 fairly soon with the changes to the work-fetch logic for (supposedly) better GPU work fetching. ____________ BOINC blog | |
ID: 5727 | Rating: 0 | |
It opens up a window: Yes - this happens when I press "OK". No - it does not happen when I press "Cancel". Yes, it happens with all projects I tried. Sorry - I was not clear on this: when I have successfully shut down the apps (Advanced -> Shutdown running science applications) and I want to shut down BOINC Manager, the window disappears and the icon in the notification area disappears, but it is still active in Task Manager. So the only way I know to stop BOINC Manager is to kill BM in TaskMan. Maybe this is off topic here, but ... Nice Crunching Lazy | |
ID: 5730 | Rating: 0 | |