Message boards : Number crunching : GTX 275 failing in Linux after 10 million points

criadoperez
Joined: 5 Apr 09
Posts: 6
Credit: 47,381,921
RAC: 0
Message 21836 - Posted: 16 Aug 2011 | 17:44:18 UTC

Hi guys,

I haven't been able to run GPUGRID lately because my system freezes. Sometimes it takes a few minutes, other times a few hours, and with luck a whole day.
The system also freezes without GPUGRID running, though less often.

I have a GTX275 running on Fedora 15 (recently migrated from Ubuntu 10.04). The problem started after only 200,000 points on Fedora 15.

I initially thought the card had died, but I put it in another PC with Windows 7 and it has already processed a few long run units without any issues. My Linux machine is also stable with an old 8500GT, so the problem seems related to the GTX275.

Could this be a problem with the Linux driver? With Fedora? Or maybe some incompatibility?

Before this, I had processed 10 million points with this card in this Linux machine without any issues.

Thanks in advance for your help. I'm looking forward to crunching more GPUGRID as soon as I can!

Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Message 21837 - Posted: 16 Aug 2011 | 21:58:50 UTC - in response to Message 21836.

Hi, one question: from what I can see, you currently have the GTX275 running under Windows 7 and it seems to work, is that right? You finished your latest task on Windows.

If the hardware works under Windows, it is clearly not a hardware problem, so it may be a Fedora problem, all the more so since it worked fine for you before on Ubuntu.

I don't know Fedora well (I use Ubuntu 11.04 and Windows 7 64-bit with a GTX295 without any problems), but I seem to recall that its video card drivers were not as well tuned as Ubuntu's, so they might be causing temperature problems or problems with the GTX275's fan control. Regards.

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 21838 - Posted: 17 Aug 2011 | 1:51:39 UTC - in response to Message 21836.
Last modified: 17 Aug 2011 | 1:52:20 UTC

I have crunched about 20,000,000 points with my GTX 570 on Fedora 14 with no problems. I haven't tried Ubuntu. You may be experiencing a heat problem: my GTX 570 did not automatically increase its fan speed when the GPU became hot. The fan stayed at 40% even when the GPU temperature reached 90C. That was with 2 different drivers (I forget the numbers). I discovered a way to force the fan speed "manually" to 85% and I documented that method in this post. What I am suggesting is that your fan speed increased automatically on Ubuntu but does not do so on Fedora. That would be strange, but it is a possibility. At least check the temperature and fan speed while the card is crunching on Fedora.
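
For anyone who wants to script that check, here is a minimal Python sketch. It assumes a driver recent enough that `nvidia-smi` supports the `--query-gpu` option (older drivers may need something like `nvidia-settings -q GPUCoreTemp` instead). The function names and the 85C / 60% thresholds are my own assumptions for illustration, not anything from NVIDIA or GPUGRID.

```python
import subprocess

# Assumed thresholds for illustration only: flag a GPU that is hot
# while its fan is still running slowly.
HOT_CELSIUS = 85
MIN_FAN_PERCENT = 60


def parse_gpu_status(csv_text):
    """Parse lines such as '90, 40 %' into (temperature_c, fan_percent) tuples.

    fan_percent is None when the card does not report a fan speed.
    """
    readings = []
    for line in csv_text.strip().splitlines():
        temp_field, fan_field = [field.strip() for field in line.split(",")]
        temp_c = int(temp_field)
        fan_digits = fan_field.replace("%", "").strip()
        fan_pct = int(fan_digits) if fan_digits.isdigit() else None
        readings.append((temp_c, fan_pct))
    return readings


def read_gpu_status():
    """Ask nvidia-smi for temperature and fan speed, one entry per GPU."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=temperature.gpu,fan.speed",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_status(result.stdout)


def looks_overheated(temp_c, fan_pct, hot=HOT_CELSIUS, min_fan=MIN_FAN_PERCENT):
    """True when the GPU is hot but the fan has not ramped up."""
    return temp_c >= hot and fan_pct is not None and fan_pct < min_fan
```

Calling `read_gpu_status()` every minute or so while a task is running, and passing each reading to `looks_overheated`, should show whether the fan stays low as the temperature climbs, which is the situation I saw with 90C at 40%.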

Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Message 21839 - Posted: 17 Aug 2011 | 9:17:56 UTC - in response to Message 21838.


Hi, in Ubuntu the NVIDIA proprietary driver controls the temperature and fan perfectly. Greetings.

criadoperez
Joined: 5 Apr 09
Posts: 6
Credit: 47,381,921
RAC: 0
Message 21888 - Posted: 25 Aug 2011 | 22:25:23 UTC - in response to Message 21836.

Thank you for all your replies.

I'm crunching again on Fedora with no problems at all. In case this happens to anyone else: the problem was in the driver.
The new 280.13 driver that NVIDIA released on 1 August for 64-bit Linux includes several bug fixes. The bugs in the older NVIDIA driver are what caused my PC to crash.
During July, until this driver was released, there was nothing I could do to fix the problem.

Quoting NVIDIA's update log:
Fixed a GLX bug that could cause the X server to crash when rendering a display list using GLX indirect rendering.
Fixed a GLX bug that could cause a hang in applications that use X server grabs.
Fixed an X driver bug that caused 16x8 stipple patterns to be rendered incorrectly.
Fixed a GLX_EXT_texture_from_pixmap bug that caused corruption when texturing from sufficiently small pixmaps and, in particular, corruption in the GNOME Shell Message Tray.
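
If you want to check whether a machine already has this driver or a newer one, something like the following Python sketch should do. It assumes the proprietary driver exposes its version in `/proc/driver/nvidia/version` on a line containing "Kernel Module" followed by the version number. The function names are mine, purely for illustration.

```python
import re

# Driver release that fixed the freezes in my case.
FIXED_VERSION = (280, 13)


def parse_driver_version(text):
    """Return the version after 'Kernel Module' as a tuple of ints, or None."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_driver_version(path="/proc/driver/nvidia/version"):
    """Read the version file written by the NVIDIA kernel module (assumed path)."""
    with open(path) as handle:
        return parse_driver_version(handle.read())


def has_fix(version, fixed=FIXED_VERSION):
    """True when the given version tuple is at least the fixed release."""
    return version is not None and version >= fixed
```

For example, `has_fix(installed_driver_version())` would be expected to return `False` on a machine still running one of the older drivers.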
