Message boards : Graphics cards (GPUs) : Memory leak in the 6.54_x86_64 for Linux?

Author Message
Profile Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Level
Pro
Message 4592 - Posted: 20 Dec 2008 | 1:53:27 UTC

My Linux box has a problem with the 6.54_x86_64 application. 4 GB of RAM is not enough, and other WUs are waiting for memory. The 6.54_x86_64 uses all of my RAM; I found only 35 MB free. I never saw this problem with 6.53. After updating BOINC from 6.4.2 to 6.4.5 (I still miss a 6.5.0 build for 64-bit Linux) the failure has not reappeared so far, but I'm still waiting and keeping an eye on it. On Vista 64 the 6.55 application uses only 35 MB.
____________

Profile [AF>Libristes>Jip] Elgran...
Joined: 16 Jul 08
Posts: 45
Credit: 78,618,001
RAC: 0
Level
Thr
Message 4606 - Posted: 20 Dec 2008 | 11:22:05 UTC - in response to Message 4592.

Same thing for me on GTX280 and 8800GTS512 graphics cards.
I tried increasing the RAM settings, and that seems to fix the problem.
This problem occurred on Q6600 (4 GB RAM) and Celeron D420 (2 GB RAM) based computers.

Profile Venturini Dario[VENETO]
Joined: 26 Jul 08
Posts: 44
Credit: 4,832,360
RAC: 0
Level
Ala
Message 4657 - Posted: 21 Dec 2008 | 8:51:09 UTC

My WU with 6.54 has grown from 50 MB to 180 MB in approximately 1 hour, and it keeps growing.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 4659 - Posted: 21 Dec 2008 | 9:02:24 UTC - in response to Message 4657.

Try to use 6.5.0.

It cannot be a memory leak in the application if it disappears when the BOINC version changes. The Linux version is now out as well.

gdf

Profile koschi
Joined: 14 Aug 08
Posts: 124
Credit: 466,579,198
RAC: 362,756
Level
Gln
Message 4661 - Posted: 21 Dec 2008 | 11:30:44 UTC
Last modified: 21 Dec 2008 | 11:33:27 UTC

They released the 6.5.0 for Linux as 32bit, unfortunately there is no 64bit build...

I found the same problem now on one of my systems, BOINC 6.4.2 with acemd 6.54, memory usage is increasing pretty fast:


root@frickelbude:~# while true; do ps aux | grep acemd |grep -v grep; sleep 60; done
boinc 23292 8.2 2.8 81544 58488 ? RNLl 12:04 0:19 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23292 8.2 3.1 87168 64180 ? RNLl 12:04 0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23292 8.2 3.4 93400 70300 ? SNLl 12:04 0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23292 8.2 3.6 98600 75576 ? SNLl 12:04 0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
^C
root@frickelbude:~# invoke-rc.d boinc-client restart
* Stopping BOINC core client: boinc
...done.
* Starting BOINC core client: boinc
...done.
* Setting up scheduling for BOINC core client and children:
...done.
root@frickelbude:~# while true; do ps aux | grep acemd |grep -v grep; sleep 60; done
boinc 23802 42.0 1.8 61272 38172 ? SNLl 12:11 0:00 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.6 2.0 64480 41464 ? SNLl 12:11 0:05 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.4 2.2 70224 47152 ? SNLl 12:11 0:10 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.3 2.5 75828 52840 ? SNLl 12:11 0:15 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.3 2.8 81564 58536 ? SNLl 12:11 0:20 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 3.1 87208 64228 ? SNLl 12:11 0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 3.4 93404 70304 ? SNLl 12:11 0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 3.6 98616 75608 ? RNLl 12:11 0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 3.9 104368 81304 ? RNLl 12:11 0:39 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 4.2 110000 87004 ? RNLl 12:11 0:44 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 4.5 115744 92696 ? RNLl 12:11 0:49 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
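The growth per sample is easier to read as a difference than as raw numbers. Here is a minimal sketch, assuming the samples are `ps aux` lines like the ones above, where the sixth column is RSS in KiB. The function name `rss_deltas` is made up for illustration.

```shell
# Print the change in RSS (6th column of `ps aux`, in KiB)
# between each sample line and the one before it.
rss_deltas() {
    awk 'NR > 1 { print $6 - prev } { prev = $6 }'
}

# Example with the first two samples from the listing above
# (command column shortened to "acemd" for readability).
printf '%s\n' \
    'boinc 23292 8.2 2.8 81544 58488 ? RNLl 12:04 0:19 acemd' \
    'boinc 23292 8.2 3.1 87168 64180 ? RNLl 12:04 0:24 acemd' |
    rss_deltas
```

With one sample per minute (the `sleep 60` in the loop), each printed value is the growth in KiB per minute.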


System is the following:
http://www.sysprofile.de/id84658

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 4666 - Posted: 21 Dec 2008 | 12:11:58 UTC - in response to Message 4661.

Which WU is that for?

gdf

Profile koschi
Joined: 14 Aug 08
Posts: 124
Credit: 466,579,198
RAC: 362,756
Level
Gln
Message 4671 - Posted: 21 Dec 2008 | 12:33:20 UTC

mC16040-SH2_US-5-40-SH2_US1720000_0
http://www.gpugrid.net/result.php?resultid=173191

Profile [AF>Libristes>Jip] Elgran...
Joined: 16 Jul 08
Posts: 45
Credit: 78,618,001
RAC: 0
Level
Thr
Message 4682 - Posted: 21 Dec 2008 | 16:33:56 UTC - in response to Message 4671.

Look at this type of WU, the host concerned, and the other one.
It's a bit frustrating.

Bok
Joined: 31 Oct 08
Posts: 10
Credit: 6,090,581
RAC: 0
Level
Ser
Message 4684 - Posted: 21 Dec 2008 | 18:33:36 UTC

I've got the same problem :(

this host

Just started a day or so ago.

Profile koschi
Joined: 14 Aug 08
Posts: 124
Credit: 466,579,198
RAC: 362,756
Level
Gln
Message 4686 - Posted: 21 Dec 2008 | 19:06:32 UTC
Last modified: 21 Dec 2008 | 19:19:10 UTC

I have two units on my dualcore, which are mC16040-SH2_US-5-40-SH2_US1720000 and ME12403-SH2_US-4-40-SH2_US950000.

Both are waiting for memory now, the first is stuck at 77%, the second at 35%...

I will cancel the 35% one and hope I'll get a WU of another type. We'll see if that makes the difference.

On my other computer a GPUTEST6 unit is running without any issues...

edit:

can't get a new one :-(
"not available for your type of computer, bla..."

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 4687 - Posted: 21 Dec 2008 | 19:30:28 UTC - in response to Message 4686.

Please move to 6.4.5 and see if the problem disappears, as reported in the first post.
6.4.5 is safe now that the server bug has been fixed.
gdf

DeleteNull
Joined: 28 Aug 08
Posts: 10
Credit: 142,385,295
RAC: 0
Level
Cys
Message 4688 - Posted: 21 Dec 2008 | 19:40:20 UTC
Last modified: 21 Dec 2008 | 19:45:33 UTC

I have the memory leak too.

My system: Intel-Quad-6600 Opensuse 11.0 bit, Boinc_6.5.0 64 bit, NVidia 260.

In BOINC/slots/0 (the slot that acemd_6.54 is using) you will find two files of the same size (output.dcd and output.vel.dcd), and the size of the files is exactly the amount of memory the acemd_6.54 process is using.

Both files (and the process) are growing and growing....
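To check this observation on another machine, the file size and the process size can be printed side by side. This is only a sketch: the helper names `file_kb` and `rss_kb` are invented here, and the slot path in the commented usage is an assumption that depends on where BOINC keeps its data.

```shell
# Size of a file in KiB (rounded down), using only POSIX tools.
file_kb() {
    echo $(( $(wc -c < "$1") / 1024 ))
}

# Resident set size of a process in KiB, as reported by ps.
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# Hypothetical usage (path and process lookup are assumptions):
# slot=/var/lib/boinc-client/slots/0
# pid=$(pgrep -f acemd | head -n 1)
# echo "output.dcd: $(file_kb "$slot/output.dcd") KiB, acemd RSS: $(rss_kb "$pid") KiB"
```

If the two numbers keep tracking each other over several minutes, that would support the idea that the application holds the trajectory output in memory.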

I don't have a Windows system (with CUDA), so I cannot compare this with the Windows files......
____________

Profile koschi
Joined: 14 Aug 08
Posts: 124
Credit: 466,579,198
RAC: 362,756
Level
Gln
Message 4689 - Posted: 21 Dec 2008 | 20:04:55 UTC
Last modified: 21 Dec 2008 | 20:18:36 UTC

I have now upgraded to 6.4.5; the remaining task stays in "waiting for memory" mode, and again I got:

So 21 Dez 2008 20:48:57 CET|GPUGRID|Message from server: No work sent
So 21 Dez 2008 20:48:57 CET|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
So 21 Dez 2008 20:48:57 CET|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.


My slot 14 which holds the stalled 77% WU is just 19MB...

edit:

I'm going nuts: without any action on my side, the work unit has now resumed crunching. I wonder how, as there can hardly be more memory available than 30 minutes ago, when I restarted my system and the WU wouldn't start.
But again the app eats up my memory:
root@frickelbude:/var/lib/boinc-client/slots# while true; do ps aux | grep acemd | grep -v grep; sleep 60;done
boinc 12414 9.1 2.9 83496 60488 ? RNLl 21:10 0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 12414 8.8 3.2 89096 66128 ? RNLl 21:10 0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 12414 8.7 3.4 94748 71764 ? RNLl 21:10 0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
root@frickelbude:/var/lib/boinc-client/slots# ps aux | head -1
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
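As a rough worked example of what this growth rate means: the samples are one minute apart, and RSS went from 60488 to 66128 KiB, so about 5640 KiB per minute. The sketch below assumes the rate stays constant and that the full 4 GB mentioned in the first post is available, which is optimistic. The function name is made up for illustration.

```shell
# Whole minutes until a given amount of memory is used up
# at a constant growth rate (integer division, rounds down).
# $1: memory available in KiB; $2: growth in KiB per minute
minutes_until_exhausted() {
    echo $(( $1 / $2 ))
}

# 4 GiB = 4194304 KiB, growing at 5640 KiB per minute
minutes_until_exhausted 4194304 5640
```

That is on the order of twelve hours for an otherwise empty 4 GB machine, and less in practice because other processes need memory too.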

DeleteNull
Joined: 28 Aug 08
Posts: 10
Credit: 142,385,295
RAC: 0
Level
Cys
Message 4690 - Posted: 21 Dec 2008 | 20:36:22 UTC

...after a few minutes the files in the slot grow only very slowly, but the application keeps growing at the same speed as at the beginning.

(more than 1 Gig)

Profile koschi
Joined: 14 Aug 08
Posts: 124
Credit: 466,579,198
RAC: 362,756
Level
Gln
Message 4724 - Posted: 22 Dec 2008 | 10:58:06 UTC

This morning my Quad was crunching a SH2_USPME-5 workunit (pN16075-SH2_USPME-5-40-SH2_USPME470000) and the memory usage was again increasing. I stopped it and started a GPUTEST6 unit (lY10341-GPUTEST6-1-20-acemd_0), the memory usage is stable, not a single Kbyte more after some minutes... Now I stopped the GPUTEST6 and started one SH2_US (to20339-SH2_US_1-5-40-SH2_US_1240000_0), immediately the memory usage starts growing...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 4737 - Posted: 22 Dec 2008 | 17:29:30 UTC
Last modified: 22 Dec 2008 | 17:30:36 UTC

So we can say for certain that the Linux GPU client or driver has a problem with certain WUs, probably independent of the BOINC client?
How many credits do these WUs yield?

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Venturini Dario[VENETO]
Joined: 26 Jul 08
Posts: 44
Credit: 4,832,360
RAC: 0
Level
Ala
Message 4740 - Posted: 22 Dec 2008 | 17:56:34 UTC

I had already upgraded to 6.4.5 when I posted in this thread about the memory leak.

Btw, I am out of work now, and not getting new WUs. Bah.

Profile [AF>Libristes>Jip] Elgran...
Joined: 16 Jul 08
Posts: 45
Credit: 78,618,001
RAC: 0
Level
Thr
Message 4754 - Posted: 22 Dec 2008 | 19:47:11 UTC - in response to Message 4740.

Another WU with a "Maximum memory exceeded" error:
ZO25834-SH2_USPME_1-0-40-SH2_USPME_110000_0
Workunit 130854
Created 22 Dec 2008 10:31:25 UTC
Sent 22 Dec 2008 11:15:21 UTC
Received 22 Dec 2008 16:43:18 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -177 (0xffffffffffffff4f)
Computer ID 15576
Report deadline 26 Dec 2008 11:15:21 UTC
CPU time 7496.045
stderr out

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Maximum memory exceeded
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 8800 GTS 512"
# Clock rate: 1620000 kilohertz
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>

Could it be fixed soon, please?

Profile koschi
Joined: 14 Aug 08
Posts: 124
Credit: 466,579,198
RAC: 362,756
Level
Gln
Message 4777 - Posted: 23 Dec 2008 | 6:20:22 UTC
Last modified: 23 Dec 2008 | 6:23:08 UTC

I have now found an acemd_6.57_x86_64-pc-linux-gnu__cuda process running; unfortunately, memory usage still grows over time... CPU usage of that app is ~40% of one core on a 3.4 GHz C2D... Are Windows and Linux changing roles now? ;-)

Profile koschi
Joined: 14 Aug 08
Posts: 124
Credit: 466,579,198
RAC: 362,756
Level
Gln
Message 4794 - Posted: 23 Dec 2008 | 13:41:23 UTC

I'm running the 6.57 on a GPUTEST6 right now, no problems. Load is at ~8% (normal on my system) and memory is stable.
If I switch back to the SH2_USPME unit, I again get increasing memory usage and 40% load on one CPU core.

There is either something wrong with the WUs or the apps...
My BOINC Version is 6.5.0.

Profile Venturini Dario[VENETO]
Joined: 26 Jul 08
Posts: 44
Credit: 4,832,360
RAC: 0
Level
Ala
Message 4819 - Posted: 24 Dec 2008 | 7:27:54 UTC

I'm crunching a WU that uses up to 15% of my CPU and has already eaten over 1GB RAM...

mer 24 dic 2008 00:45:32 CET|GPUGRID|Starting XYr2246-SH2_USPME_1-1-40-SH2_USPME_12370000_1
mer 24 dic 2008 00:45:32 CET|GPUGRID|Starting task XYr2246-SH2_USPME_1-1-40-SH2_USPME_12370000_1 using acemd version 657

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 4823 - Posted: 24 Dec 2008 | 11:27:16 UTC - in response to Message 4819.

We reproduced the problem and are implementing a fix.

g

Profile mike047
Joined: 21 Dec 08
Posts: 47
Credit: 7,330,049
RAC: 0
Level
Ser
Message 4843 - Posted: 25 Dec 2008 | 11:20:03 UTC - in response to Message 4823.
Last modified: 25 Dec 2008 | 11:20:59 UTC

We reproduced the problem and are implementing a fix.

g



I changed to 6.4.5 this morning and it "leaks" the same as 6.4.2. It takes about 4 hours to devour all the memory and then uses up all the swap. When it starts writing to the hard drive constantly, I re-run the benchmarks, and it releases the memory and most of the swap. The WUs are 6.57. It makes no difference if I run another project at the same time. I am running the current drivers [177.82] for my card, an XFX 9600GSO, with Ubuntu 8.04 on a bone-stock X2 3800.

Is there something else that I can do or is this a problem beyond my control??

mike

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 4844 - Posted: 25 Dec 2008 | 11:38:11 UTC - in response to Message 4843.

We reproduced the problem and are implementing a fix.

g



I changed to 6.4.5 this morning and it "leaks" the same as 6.4.2. It takes about 4 hours to devour all the memory and then uses up all the swap. When it starts writing to the hard drive constantly, I re-run the benchmarks, and it releases the memory and most of the swap. The WUs are 6.57. It makes no difference if I run another project at the same time. I am running the current drivers [177.82] for my card, an XFX 9600GSO, with Ubuntu 8.04 on a bone-stock X2 3800.

Is there something else that I can do or is this a problem beyond my control??

mike

You just have to wait until the fix is issued. So, yes, beyond your control for the moment.

Phoneman1
Joined: 25 Nov 08
Posts: 51
Credit: 980,186
RAC: 0
Level
Gly
Message 4853 - Posted: 25 Dec 2008 | 16:26:33 UTC - in response to Message 4843.

Is there something else that I can do or is this a problem beyond my control??


While waiting for a fix could you schedule a task to run boinccmd, say, every three hours? I don't know Linux but in Windows the .bat file would be:

e:
cd e:\boinc
boinccmd --run_benchmarks

You'd need to change the first two lines to point to where boinccmd lives on your system. Running this every three hours means you'd lose 8 minutes of processing on the GPU and your other project(s) every day, but you could lose a lot more if it crashes overnight.
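On Linux the same idea could be done with cron. Here is a minimal sketch, assuming `boinccmd` is on the PATH and that the BOINC data directory is /var/lib/boinc-client (the path that appears in koschi's shell prompt earlier in the thread; other installs may differ). The helper name `benchmark_cron_line` is made up; it only prints the crontab line so you can review it before installing it with `crontab -e`.

```shell
# Print a crontab entry that runs `boinccmd --run_benchmarks`
# at minute 0 of every Nth hour.
# $1: interval in hours
# $2: BOINC data directory (assumption; adjust to your install)
benchmark_cron_line() {
    hours=$1
    dir=$2
    printf '0 */%s * * * cd %s && boinccmd --run_benchmarks\n' "$hours" "$dir"
}

benchmark_cron_line 3 /var/lib/boinc-client
```

A three-hour interval gives 24 / 3 = 8 runs per day, which matches the estimate of 8 minutes lost per day if each benchmark run takes about a minute.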

Phoneman1

Profile Venturini Dario[VENETO]
Joined: 26 Jul 08
Posts: 44
Credit: 4,832,360
RAC: 0
Level
Ala
Message 4862 - Posted: 25 Dec 2008 | 23:35:42 UTC - in response to Message 4853.

I have a 6.58 WU in the queue, so I guess the new version addresses this specific problem. I'll keep you updated about the results.

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level
Val
Message 4876 - Posted: 26 Dec 2008 | 13:29:53 UTC - in response to Message 4862.

I have a 6.58 WU in the queue, so I guess the new version addresses this specific problem. I'll keep you updated about the results.

Well, now I am jealous ...

I only have the old 6.55 ... sniff, sniff ... :)

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 4880 - Posted: 26 Dec 2008 | 16:17:55 UTC - in response to Message 4876.

6.58 for Linux fixed the memory leak.

gdf

Profile Venturini Dario[VENETO]
Joined: 26 Jul 08
Posts: 44
Credit: 4,832,360
RAC: 0
Level
Ala
Message 4882 - Posted: 26 Dec 2008 | 16:53:33 UTC - in response to Message 4880.

So far it seems to be working. Good work ;)
