Author |
Message |
|
http://www.gpugrid.net/workunit.php?wuid=1005808
If you take a look at these errors, it's not just my own PC. It's everyone else's. This is not a PC problem. It's a GPU problem, and I'd appreciate somebody taking a look into this.
____________
|
|
|
|
Another example:
http://www.gpugrid.net/workunit.php?wuid=1009407
____________
|
|
|
|
I have had some problems too, on two PCs: sometimes after 6-10 s of crunching, sometimes after 40 h, which is very disagreeable ...
Please do something. You should ...
Regards
____________
|
|
|
|
Sparkle GTX 250 1 GB, no overclock, fan 90%, 54°C
BOINC 6.1.21 / 195.62 /
Q9550, 4 GB DDR2 @ 333.3 1:1, XP 32-bit SP2
example: http://www.gpugrid.net/result.php?resultid=1611786
A lot of WUs error out after less than 10 seconds:
Cuda error: Kernel [shake_step_1] failed in file 'shake.cu' in line 79 : unspecified launch failure.
Damned GTX 250!
____________
|
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
You should update to the latest drivers.
This will let you receive the cuda23 application, which should solve your problems.
gdf |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
You should get your video card drivers directly from NVidia rather than using a Microsoft update service!
http://www.nvidia.co.uk/Download/index.aspx?lang=en-uk
Or similar, for different regions. |
|
|
jphelan
Joined: 20 Jul 08 Posts: 4 Credit: 4,082,270 RAC: 0
|
Boy, do I have a flash for you! I've used "cuda23". It still doesn't work! When I use it with SETI I have no problems. I've discontinued crunching numbers for GPUGRID until you guys get your act together.
jphelan1242@hotmail.com
____________
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
Your GTX9800+ card uses a G92 core, and there is a CUDA bug that causes issues with G92 cores and CUDA 2.3.
As you are using driver version 19107, this bug will be exposed more than with more recent drivers. If you do decide to try to run that card, you should update the driver and set your BOINC preferences so that the card does not crunch GPUGrid tasks when you are using the computer:
I found that crunching and playing videos don't go together on the G92 cores. This might also include surfing, with all the online media content these days. |
|
|
|
I have two PCs with GT130 GPU cards, updated with the latest driver version from NVIDIA (and not from Microsoft). One of my PCs runs Vista, the other Windows 7 (much better!). They are not overclocked, and GPUGrid computes only when the PC is idle.
Nevertheless, two WUs out of three run into an annoying compute error.
Doctor, is this normal? Which one is sick: GPUGRID or my two computers? |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
Your GT130 cards use a 65nm GPU core (G96M). Relative to other cards of this level, it will be more prone to heat and cooling problems. It may also struggle to complete tasks in time; it only has 32 shaders. Even if the system was on 24/7, some tasks would take 3 days to complete. It is asking a lot of any program to run for 3 days without any glitch, so running one set of calculations for that long is always likely to be error prone.
That said, most problems are being caused by a CUDA bug. When GPUGrid moved to CUDA 2.3 this new bug seemed to raise its head, and was at first difficult to identify. It seems to primarily affect G92 cores, but is known to cause problems with the GT200 (not GT200b) and obviously the G96M cores to some extent too.
After checking dozens of tasks from many people, the TONI_HERG tasks tend to fail more than others, but this is not to say that other tasks will not also fail.
I would suggest that you keep an eye on the tasks arriving at your system and abort any TONI_HERG tasks that come in. It would also be a good idea to make sure you do not receive more than one task at a time. By the time you finish one task, the other's deadline will be rapidly approaching!
I see from your message that you have already implemented the other good suggestions (don't run tasks when the system is in use). I managed to improve my GTS250 performance from 25% lost time to about 11.5% lost time. It is still improving. The techs also looked at reducing some task lengths, and were at least asked to look into an allocation system based on the cards people have.
One last thing to watch out for is updates; these tend to force applications to close and restart your system. Forcing applications to close crashes tasks!
So a bit of PC management might improve things.
Good luck,
|
|
|
|
Thanks for your very detailed answer.
I already keep an eye on the deadline of each WU, and I allow new tasks only when the current WU is near its end.
Following your advice, I'll try to set up the BOINC Manager to "Leave application in memory while suspended", because I suppose that pausing and restarting could have the same effect as closing the application.
____________
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
Leaving the application in memory while suspended is a good idea. If you close Boinc and then open it again, tasks resume from their last saved positions. This could be 1 second ago or more likely, several minutes ago. So if someone kept stopping and starting, and did not keep tasks in memory, they might not get through any tasks before the deadline. |
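To put rough numbers on that reasoning, here is a minimal Python sketch. It is not BOINC code: the function name, the fixed checkpoint interval, and the fixed restart interval are all assumptions made for illustration, and real checkpoint timing depends on the application. It models a task that only saves progress at regular checkpoints, on a machine that is restarted at regular intervals without keeping the task in memory.

```python
def compute_minutes_to_finish(task_minutes, checkpoint_every, restart_every,
                              max_sessions=100_000):
    """Minutes of crunching needed to finish a task, or None if it never finishes.

    Hypothetical model: progress is only saved at multiples of
    `checkpoint_every`, and every `restart_every` minutes of crunching the
    task is closed and later resumes from its last saved checkpoint.
    """
    saved = 0   # progress (minutes of work) stored in the last checkpoint
    spent = 0   # total minutes actually spent crunching
    for _ in range(max_sessions):
        remaining = task_minutes - saved
        if remaining <= restart_every:
            # The task completes before the next restart.
            return spent + remaining
        # Work done this session, rounded down to the last checkpoint reached.
        progress = saved + restart_every
        new_saved = (progress // checkpoint_every) * checkpoint_every
        spent += restart_every
        if new_saved <= saved:
            # No checkpoint is reached between restarts: no progress, ever.
            return None
        saved = new_saved
    return None


# A 600-minute task with a checkpoint every 10 minutes:
print(compute_minutes_to_finish(600, 10, 1000))  # never restarted in practice
print(compute_minutes_to_finish(600, 10, 25))    # restarted every 25 minutes
print(compute_minutes_to_finish(600, 10, 5))     # restarted every 5 minutes
```

With these assumed numbers, restarting every 25 minutes throws away 5 minutes of work per session, and restarting more often than the checkpoint interval means the task never advances at all, which is the case described above.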
|
|
|
My copy of cuda23 fails with this error:
ERROR: mdsim.cu, line 101: Failed to parse input file
called boinc_finish
Is this a known issue?
http://www.gpugrid.net/workunit.php?wuid=1037695
http://www.gpugrid.net/workunit.php?wuid=1037822
http://www.gpugrid.net/workunit.php?wuid=1037814
etc.
____________
Join team Bletchley Park, the innovators. |
|
|
|
You had six in a row on host 34464 - all IBUCH_reverse1_pYEEI.
I just got one from that sequence on host 45218, but the next was a GIANNI_BIND which is happily crunching.
I think it must be a bad batch of WUs - if it wasn't a known issue before, I hope it is now. |
|
|
|
And similarly on host 43404.
IBUCH_reverse1_pYEEI failed, following KASHIF_HIVPR running fine. |
|
|
|
Thank you for that clarification. I did indeed get a new set of files to crunch, they seem to work fine now. |
|
|
ignasi
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
Noticed.
They should be getting cancelled.
thanks,
i |
|
|