Bug 196444 - gam_server seems to leak memory badly
Summary: gam_server seems to leak memory badly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: gamin
Version: 5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Alexander Larsson
QA Contact:
URL:
Whiteboard:
Duplicates: 155779
Depends On:
Blocks: FC5Update
Reported: 2006-06-23 11:36 UTC by Benjamin Thery
Modified: 2007-11-30 22:11 UTC (History)
CC List: 9 users

Fixed In Version: gamin-0.1.7-1.3.fc5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-11-14 13:43:56 UTC
Type: ---
Embargoed:


Attachments
strace output from gam-server (80.50 KB, text/plain)
2006-08-24 03:36 UTC, Ariel T. Glenn
debugging output from gam-server (66.76 KB, text/plain)
2006-08-24 03:38 UTC, Ariel T. Glenn
strace of the gnome-panel end of the connection (33.77 KB, text/plain)
2006-08-24 04:47 UTC, Ariel T. Glenn
strace excerpts of gam_server, gnome-panel going from ok to buggy (43.16 KB, text/plain)
2006-08-26 07:12 UTC, Ariel T. Glenn
patch for libgamin/gam_fork.c to fix server stderr -> client socket problem (2.47 KB, patch)
2006-08-26 21:34 UTC, Ariel T. Glenn
patch to libgamin/gam_data.c to fix "invalid length" never cleared (830 bytes, patch)
2006-08-27 23:13 UTC, Ariel T. Glenn
Smaller version of buffer emptying patch (497 bytes, patch)
2006-08-28 09:32 UTC, Alexander Larsson
why gam_server can't write creds to k3b (source of error message) (7.84 KB, text/plain)
2006-08-30 03:48 UTC, Ariel T. Glenn
patch for memory leak in gam_server (875 bytes, patch)
2006-09-02 04:35 UTC, Ariel T. Glenn

Description Benjamin Thery 2006-06-23 11:36:38 UTC
Description of problem:

My PC seemed awfully slow this morning, whereas there was no CPU consuming
applications running. So, I launched 'top' and discovered all my physical
memory was used (512MB) as well as half my 1GB swap space.
This was something I never noticed before. 

I ordered the processes by "Resident Size", and discovered that 'gam_server' 
was using 222MB of 'Resident Memory' and a lot more 'Virtual Memory'.


Version-Release number of selected component (if applicable):

gamin-0.1.7-1.2.1

How reproducible:

This is the first time I noticed this behaviour.
Not sure, what triggered this.
My uptime is only 4 days.

Steps to Reproduce:
1. No idea
2.
3.
  
Actual results:

gamin eats all the available physical memory.
The PC keeps swapping.

Expected results:

gamin does not eat all the available physical memory.

Additional info:

Comment 1 Trevor Cordes 2006-07-18 18:25:31 UTC
Confirmed.  Top shows:

4488 trevor    15   0  941m 566m  868 S  3.6 28.0 359:08.31 gam_server

I have 2GB RAM and 2.5GB virtual.  Uptime 12 days.  I have very large NFS and
smb shares imported into this box.

1GB mem usage?  Definite mem leak.  Not to mention the constant 4% CPU usage,
what  is it doing?  There's another good bug on that (since FC3), that I can't
recall right now.


Comment 2 Trevor Cordes 2006-07-18 18:39:57 UTC
Also see bug 132354, bug 151507, most likely these are all the same bug. 
Present in FC3, FC4 and FC5.

Benjamin, are you by chance running any KDE apps (under gnome?)?  NFS mounts? 
smb mounts?  There must be some common thread that explains why the posters so
far have seen this but everyone else hasn't.


Comment 3 Benjamin Thery 2006-07-19 06:52:27 UTC
I don't remember exactly what I did this day, but when this happened I thought
the cause was Evolution:

I don't usually use Evolution, but this day I started Evolution to retrieve my
root's email (to get the administration messages).
- My gnome session was opened with my standard user (non-superuser)
- In a terminal, I su-ed to root and I started evolution
- I closed evolution, and a few hours later I noticed the problem with gam_server.

As this is something I usually never do and as this was the first time I saw
this behaviour of gam_server, I supposed evolution triggered the problem.

Also, I may have launched K3B under gnome this day too (this is the only kde app I
use).



Comment 4 Daniel Rowe 2006-07-20 12:23:12 UTC
I am also seeing gam_server use a large amount of memory on my FC 5 AMD64 box.

Over about 4 days it has grown to 35% of 1.5 GB of memory. If I leave the box
alone, gam_server will eventually use all memory, including 2 GB of swap.

This box has a large amount of disk storage: 2TB.

If I can be of any assistance please let me know.

bart 3094  3.8 34.2 1598984 527272 ?  S  Jul16 243:26 /usr/libexec/gam_server

[root@bajor ~]# free
             total       used       free     shared    buffers     cached
Mem:       1541616    1224668     316948          0      17324     232508
-/+ buffers/cache:     974836     566780
Swap:      2031608    1365360     666248


Comment 5 Trevor Cordes 2006-07-20 12:50:35 UTC
I don't use Evolution (ever).  Bug seems to hit even if I don't run konqueror
(my only KDE app that I can see).

I do have in common with Daniel (comment #4) a 2TB fs, but mine is over NFS, not
local.  Do the other people with this bug have very large fs's?

Does everyone else see a constant 3-5% CPU usage showing for gam_server?


Comment 6 Zivago Lee 2006-07-21 14:24:26 UTC
I don't have any NFS mounts but I also noticed it then when I ran K3b (also one
of the very few KDE apps I run).  I also see constant CPU usage for gam_server
and massive memory leaks.  Here is a top:

 4619 zivago    15   0  142m 140m  892 S  7.6 13.9  88:01.20 gam_server

I noticed this in FC4 but when I upgraded to FC5, it seemed to have gone away. 
Now it's back...

Comment 7 Trevor Cordes 2006-07-22 10:52:02 UTC
If you then quit k3b does the bug remain?  I can't reboot often (production
workstation) but even if I quit all konquerors and restart gam_server, the 4%
CPU remains.  I'm not sure if I rebooted if it would still use 4% before I start
X and/or konqueror.  Next reboot I will test and see.


Comment 8 Daniel Rowe 2006-07-23 09:04:56 UTC
I have found if I have this line in a  /home/<username>/.gaminrc file:

notify /mnt/*raid*

The problem goes away. 

My 2TB file systems are mounted under /mnt/hwraid0 and /mnt/hwraid1 and by
default gam doesn't use kernel notify under /mnt and the above line makes it use
kernel notify on the paths specified. 
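
For illustration, the glob in that line can be checked against the two mount points with a short C sketch. It uses POSIX fnmatch() as a stand-in; which matcher gamin itself uses is an assumption not confirmed in this report, and path_matches is a hypothetical helper.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `path` matches the shell-style `pattern`.
 * fnmatch() stands in for gamin's own matcher, which may differ in detail. */
static int path_matches(const char *pattern, const char *path)
{
    assert(pattern != NULL && path != NULL);
    return fnmatch(pattern, path, 0) == 0;
}
```

With the pattern from the .gaminrc line, path_matches("/mnt/*raid*", "/mnt/hwraid0") returns 1, while a path outside /mnt such as "/home/bart" returns 0.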

My system has been running for a few days now with no problems.

Regards
Daniel

Comment 9 Trevor Cordes 2006-07-23 15:30:46 UTC
I had to reboot yesterday to upgrade the kernel and I've been running some
tests.  So far gam is well behaved and I have my 2TB NFS share mounted and being
accessed.  I have firefox running (which seems to kickstart gam_server).  I DO
NOT have any KDE apps running, especially Konqueror as a file browser.  I'm using
Nautilus for now for browsing files.  gam_server is taking up no CPU time and
normal amounts of memory.  The whole system feels way faster than before, when
gam took a constant 4%.

Since the only thing I haven't done yet is start konqueror as a file browser,
I'm betting that it's that that causes the problem.  I don't want to test it
right now because then I'll have to reboot again!  Before I need to reboot next
time I will test that theory and post back.

So it would seem the bug is triggered by using KDE apps on large fs's?

Interesting about the .gaminrc idea.  I couldn't find any gamin man pages but
rpm -ql shows that there are docs for it.  I'll read through them.  Knowing what
gam is for, it would be nice to have it working properly on all mounts and fs's,
but faced with the choice of crap performance or excluding my NFS mount, I'll
choose the latter!


Comment 10 Zivago Lee 2006-07-24 14:29:58 UTC
I experience the same behavior.  If I reboot, then gam_server acts "normally." 
Once I start up k3b again, then gam_server freaks out again.

Comment 11 Normand Robert 2006-07-24 14:47:40 UTC
I have the same behaviour. I have never run konqueror. My system has many NFS
automounts, and it shares its local drive with a few selected systems via NFS and SMB.

 rpm -qa | grep gamin
gamin-devel-0.1.7-1.2.1
gamin-0.1.7-1.2.1

top - 10:45:41 up 25 days, 19:23, 10 users,  load average: 1.05, 0.83, 0.59
Tasks: 165 total,   2 running, 163 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3% us,  0.7% sy,  0.0% ni, 98.5% id,  0.0% wa,  0.0% hi,  0.5% si,  0.0% st
Mem:   1031592k total,   932520k used,    99072k free,     8384k buffers
Swap:  2939884k total,  1739668k used,  1200216k free,   214516k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2757 robert    15   0 1799m 274m  968 S    1 27.2 390:54.95 gam_server
 5186 root      15   0  187m 107m  14m S    0 10.6   0:43.75 python
25400 robert    15   0  257m  47m  11m S    1  4.7  15:42.07 thunderbird-bin
 2534 root      15   0  211m  46m 6968 S    0  4.6 353:49.73 Xorg
 5112 robert    16   0  168m  42m  17m S    0  4.2   0:23.79 firefox-bin


Comment 12 Craig McQueen 2006-07-27 23:44:16 UTC
I saw something like this yesterday on my FC5 desktop machine with the standard
gamin FC5 RPM (0.1.7 etc). System monitor said it was using something like 415
MB and rising. My computer was thrashing the hard disk presumably for swap. But
also CPU usage was something like 90%.

At the time I had been copying my photo collection from that computer to a
laptop across the network using rsync. The command (run from the laptop I think)
was something like:
rsync -avuz desktopmachinename:graphics .

I'd also had Evolution open (but minimised) on the desktop machine overnight.

Comment 13 Zivago Lee 2006-07-28 14:56:55 UTC
Hmm, did something change in the latest updates?  I just updated to the latest
versions hal, hal-gnome, kdelibs in the updates repo and rebooted.  I started up
k3b and no memory leaking gam_server.

How strange is that?

Comment 14 Trevor Cordes 2006-08-01 08:14:43 UTC
The problem has not shown up yet for me since reboot 9 days ago, but as I said
in comment #9, I haven't had the guts (or need) to start konqueror yet.  I
don't use k3b -- what is it?  I don't even seem to have it on my system.


Comment 15 Zivago Lee 2006-08-01 14:03:18 UTC
k3b is a kde based cd burning gui for cdrecord, etc.

Comment 16 Panu Matilainen 2006-08-02 14:32:28 UTC
I started seeing this just recently on fc5-x86_64, but then I haven't been on
the computer during July so maybe some update in the meantime has triggered
this. No KDE apps nor NFS in use, basically just 
- lotsa gnome-terminals with vim and gcc churning away
- evolution (and occasionally pine)
- rhythmbox
- xchat

Hmm. I've recently begun experimenting with replacing trusty old XMMS with
rhythmbox (which is a terrible resource hog on its own), timingwise that would
be a good candidate for being guilty for this here, but that's just a wild wild
guess, need to look into this deeper.

Comment 17 Trevor Cordes 2006-08-03 09:48:40 UTC
From all the posts, it seems the commonality is ( evolution OR k3b OR konqueror
).  Are all you guys running these under GNOME, like me?  Anyone running under
KDE and noticing this bug?  Perhaps it has something to do with running KDE apps
under GNOME as surely if it hit people running KDE apps under KDE then a lot
more people would have hit this bug.  (Although evolution is pretty popular, and
it's not KDE; weird.)


Comment 18 Daniel Rowe 2006-08-09 10:35:58 UTC
Hi

I have been OK since I put a /home/<username>/.gaminrc file in place. But
yesterday I noticed gam_server was again using a steady 3% CPU and around 35% of
memory and growing.

I think running K3b may have triggered it. 

I run Gnome desktop but do run KDE apps. I use krusader lots and it doesn't seem
to cause a problem. About the only thing I can think is I ran K3b the day before
I noticed gam_server doing its thing. I don't often run this app.

Regards
Daniel

Comment 19 Ariel T. Glenn 2006-08-24 03:29:58 UTC
"Me too".

Fc5, 2.6.17-1.2174_FC5, gamin gamin-0.1.7-1.2.1

top shows:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3181 ariel     15   0  947m 472m  804 S  4.0 46.7 322:33.38 gam_server

This is the second time I've noticed this behavior in the last couple of weeks. 
I run: evolution, amarok, k3b under gnome.  Evolution was the only thing running
during this last event.  I'm using gnome, not kde.

Comment 20 Ariel T. Glenn 2006-08-24 03:36:20 UTC
Created attachment 134768
strace output from gam-server

In about 5 seconds strace generated nearly 50000 lines of output.
One repetition of whatever gam-server is doing seems to take nearly 950 lines,
so here they are, in case they shed any light on the subject.

Comment 21 Ariel T. Glenn 2006-08-24 03:38:44 UTC
Created attachment 134769
debugging output from gam-server

Here's the debugging output (sigusr2) from gamin, a few minutes later. Again
this should be roughly one repetition of whatever it's doing.

Comment 22 Ariel T. Glenn 2006-08-24 04:47:25 UTC
Created attachment 134776
strace of the gnome-panel end of the connection

Since the only app talking to gam_server seems to be gnome-panel, I watched
that for a bit.  Here's several loops of the interaction from that side.
Gnome-panel does seem to ask about these files about 10 times a second. But,
even if that's excessive, why the memory hogging at the server end?  Unless all
these connections just add up over time.

Ok, done with attachments now, unless someone has some bright ideas.

Comment 23 Daniel Rowe 2006-08-24 05:07:39 UTC
Hi

I have moved all my mounts away from the /mnt mount points and put them somewhere
different in the file system. Gam_server seems to not be doing its thing now but
time will tell, the box has been up for a few days now.

I also noticed with strace when it is leaking memory a large number of calls are
made, far more than when it is acting normally.

Regards
Daniel 

Comment 24 Trevor Cordes 2006-08-24 20:00:30 UTC
I found the attachment output interesting.  One thing that may be odd about my
configuration is my 2TB NFS mount is off the root fs and not in /mnt (as
standards would dictate) due to personal preference.  The other guys with big
fs's: do you mount it under /mnt or somewhere else?

PS: I haven't had gam go crazy in many weeks, but I completely stopped using
konqueror and still haven't had the guts to fire it up.  I don't use k3b or any
other KDE app, nor evolution.  I work in gnome.


Comment 25 Ariel T. Glenn 2006-08-26 05:58:16 UTC
Today's report: I can now reproduce the bug, where "the bug" in my instance is
4% cpu usage, and a gain of about .7 - .8 % memory in an hour, which adds up
quickly in 24 hours.  (1 Gig memory on my box.)

I do not have any nfs or large filesystems in play.

To reproduce: log in with gnome session.
gnome-panel will have started gam_server via gnome_vfs.

Now run k3b. Just run it, and exit.

After exit, gnome_panel gets bad packets every time from gam_server (or at least
thinks it does).  Usage goes to 4%, and an strace of gnome_panel at that time
will show the characteristic "invalid length" messages. (See also a discussion
at https://launchpad.net/distros/ubuntu/+source/gamin/+bug/36581 which describes
the same symptoms.)

I am attaching a file which contains the transition from good to buggy behavior
on the gam_server side and the gnome_panel side.  I've also got a bit of k3b
output, though not strace, since that seems to interfere with bug reproduction.

The curious thing about this mess is that at the moment things go out to lunch,
gam_server has just accepted a new incoming connection that is (probably?)
garbage, tries to send credentials to the client, fails, and then writes on 2
(presumably stderr) "Failed to write credential bytes to socket 5\n".  I see it
in the friggin' trace.  ON 2. But gnome_panel gets this in on its connection on
whatever socket it was using to talk to the server all this time:

write(2, "FAMPending(fd = 29)\n", 20)   = 20
write(2, "Checking data available on 29\n", 30) = 30
select(30, [29], NULL, NULL, {0, 0})    = 1 (in [29], left {0, 0})
read(29, "Failed to write credential bytes to socket 4\n", 4106) = 45
write(2, "read 45 bytes from server\n", 26) = 26
write(2, "invalid length 24902\n", 21)  = 21

and then that's all we get, forever and ever, gamin_resend_request to the server
and more complaints about the same invalid length.  The invalid length is
because gnome_panel takes the message to be a GAMPacket where the first ten
bytes indicate length and some other crap.  Evidently that buffer never gets
cleared out at the client end. 

Also interesting is that when I did gnome-session-remove gnome-panel, then shot
gam_server so it would stay dead, then restarted gam_server in a terminal
window, and then (finally) restarted gnome-panel, then ran and exited k3b, the
complaint from gam_server "Failed to write credential bytes to socket 4\n"
appeared in the terminal window AND NOT in gnome_panel's connection, and the bug
did not get triggered.

Need food now. Any thoughts?


Comment 26 Ariel T. Glenn 2006-08-26 07:12:31 UTC
Created attachment 134968
strace excerpts of gam_server, gnome-panel going from ok to buggy

Here's the gnome-panel, gam-server, k3b pieces, too bad no strace on k3b but
you take what you can get. A little commentary included too.

Comment 27 Ariel T. Glenn 2006-08-26 21:34:03 UTC
Created attachment 134988
patch for libgamin/gam_fork.c to fix server stderr -> client socket problem 

Any program using libgamin to start up the gam_server will close all the fds
and then... gam_server blithely uses stderr for error messages, even though
these same fds may be assigned (and are!) to other sockets it opens. This patch
addresses that problem.
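
The fd reuse can be shown in isolation. The following is only a sketch of the general technique (pointing fds 0-2 at /dev/null before opening sockets), not the attached patch; reserve_std_fds and socket_lands_on_stderr are hypothetical names. It relies on the POSIX rule that a new descriptor gets the lowest free number.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Point fds 0, 1 and 2 at /dev/null so later sockets cannot reuse them. */
static int reserve_std_fds(void)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;
    for (int target = 0; target <= 2; target++)
        if (fd != target && dup2(fd, target) < 0)
            return -1;
    if (fd > 2)
        close(fd);
    return 0;
}

/* In a forked child: close fds 0-2, optionally apply the fix, open two
 * socket pairs, and report whether any socket landed on fd 2.
 * Returns 1 if a socket reused fd 2, 0 if not, -1 on error. */
static int socket_lands_on_stderr(int apply_fix)
{
    assert(apply_fix == 0 || apply_fix == 1);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int a[2], b[2];
        close(0); close(1); close(2);
        if (apply_fix && reserve_std_fds() < 0)
            _exit(2);
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) < 0 ||
            socketpair(AF_UNIX, SOCK_STREAM, 0, b) < 0)
            _exit(2);
        int hit = a[0] == 2 || a[1] == 2 || b[0] == 2 || b[1] == 2;
        _exit(hit ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

Without the fix, one of the new sockets takes fd 2, so anything written to "stderr" goes down that socket; with the fix, the sockets get descriptors above 2.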

Tested for all of about 5 minutes, but does fix one symptom.  Other eyes should
look to make sure the close/dups are in the right place and make sense. 

After applying, running k3b and exiting, and looking at strace, I see some
polling but mostly quiet on both ends of gam_server and gnome-panel connection.


Still to be looked at: memory leak (where?), libgamin recovery from 'invalid
length' error.

Comment 28 Ariel T. Glenn 2006-08-27 23:13:08 UTC
Created attachment 135017
patch to libgamin/gam_data.c to fix "invalid length" never cleared 

With this patch, gamin_data_reset() now clears out the event buffer and sets
conn->evn_read to 0.  gamin_data_reset is  only called in case we try to
reconnect, and then the event buffer contents are no good to us anyways.

Without this patch, the next read attempt will believe the data already in the
buffer to be valid and will advance the read buffer pointer past that data. 
When the sanity check on the read is done, the initial bad contents are still
there, (meaning that an invalid length stays around forever).

With this patch, if garbage is sent by gam_server for some reason, the client
won't spin trying to do reconnect/resend requests forever.
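
The effect of the stale buffer can be modelled in a few lines. This is a sketch under assumptions, not libgamin code: struct conn_sketch and its helpers are hypothetical, the 10-byte header and 4106-byte buffer are taken from the figures earlier in this report, and the reset here only zeroes the counter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HEADER_LEN 10     /* header size mentioned in comment 25 */
#define BUF_LEN    4106   /* read size seen in the strace; assumed buffer size */

/* Hypothetical stand-in for the client connection state. */
struct conn_sketch {
    size_t evn_read;                 /* bytes currently held in event[] */
    unsigned char event[BUF_LEN];
};

/* Append freshly read bytes after whatever is already buffered. */
static void conn_feed(struct conn_sketch *c, const void *data, size_t n)
{
    assert(c != NULL && c->evn_read <= BUF_LEN);
    size_t room = BUF_LEN - c->evn_read;
    if (n > room)
        n = room;
    memcpy(c->event + c->evn_read, data, n);
    c->evn_read += n;
}

/* Sanity check: 1 if the length field at the head of the buffer is plausible. */
static int conn_length_ok(const struct conn_sketch *c)
{
    uint16_t len;
    if (c->evn_read < sizeof len)
        return 0;
    memcpy(&len, c->event, sizeof len);
    return len >= HEADER_LEN && len <= BUF_LEN;
}

/* Reset on reconnect: forget everything buffered so far. */
static void conn_reset(struct conn_sketch *c)
{
    c->evn_read = 0;
}
```

If garbage is fed first and a well-formed packet afterwards, conn_length_ok() keeps failing because the garbage still sits at offset 0. Calling conn_reset() before feeding the good packet makes the check pass.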

As usual someone who actually knows this code should check this patch.

To do: why does gam_server fail to write the credential bytes in the first
place? And where is the memory leak?

Comment 29 Alexander Larsson 2006-08-28 08:16:59 UTC
The fork stdin/out/err change is already fixed in upstream gamin cvs and a patch
for that is in rawhide. I'll have a look at the other patch.

Comment 30 Alexander Larsson 2006-08-28 09:22:16 UTC
ariel, the way i read it only the part that does:
 conn->evn_read = 0;

Is needed. Was there a specific reason for also clearing the event buffer?



Comment 31 Alexander Larsson 2006-08-28 09:32:18 UTC
Created attachment 135030
Smaller version of buffer emptying patch

I'm applying this smaller version of the buffer clearing patch to rawhide.

Comment 32 Alexander Larsson 2006-08-28 10:21:13 UTC
Also committed to upstream cvs.
Ariel: Are you looking into the leak?

At some point we should collect these patches and do an upgrade for FC5.

Comment 33 Daniel Veillard 2006-08-28 10:32:43 UTC
At some point I should just make a new gamin release !

Daniel

Comment 34 Ariel T. Glenn 2006-08-28 16:22:32 UTC
Next I was going to see why we have "Failed to write credentials" from
gam_server at all.  (I was putting off the memory leak issue for now, since
memory usage seems stable with either of these patches.  So far, that is; I've
only left it running 12 hours.)

I clear out the event buffer entirely because (as far as I can tell) it's no
longer valid data by the time we want to do gamin_data_reset().  If we keep it
around, we can wind up looking at it by mistake when some other change to the
code is made.

Comment 35 Alexander Larsson 2006-08-29 07:23:35 UTC
Sure, it's not valid, but we won't read past evn_read anyway. The same thing
happens in the "normal" case in gamin_data_conn_data where we set evn_read, and
that doesn't clear the buffer.

Comment 36 Ariel T. Glenn 2006-08-30 03:48:21 UTC
Created attachment 135182
why gam_server can't write creds to k3b (source of error message)

summary: at app exit, the KSambaShare destructor calls this,

./kio/kio/ksambashare.cpp:	  KDirWatch::self()->removeFile(d->smbConf);

after the KDirWatch destructor has already been called.  So, new KDirWatch
created.. FAMOpen... then the KDirWatch destructor is called again, FAMClose
right away, so the poor server can't get a word in edgewise.  

KDE problem (if problem), not libgamin.
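
One plausible way such a write fails is that the peer has already closed its end. The sketch below uses a socket pair and a plain write() as a stand-in for the server's credential exchange; the exact errno gam_server sees is not given in this report, and write_after_peer_close is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if the write succeeded, otherwise the errno from write(),
 * or -1 if the socket pair could not be created. */
static int write_after_peer_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Ignore SIGPIPE for the duration so write() reports EPIPE instead
     * of killing the process; the old disposition is restored below. */
    void (*old_handler)(int) = signal(SIGPIPE, SIG_IGN);

    close(sv[1]);               /* the "client" opens and closes right away */

    char byte = 0;
    int result = 0;
    if (write(sv[0], &byte, 1) < 0)
        result = errno;

    close(sv[0]);
    signal(SIGPIPE, old_handler);
    return result;
}
```

On Linux the function returns EPIPE: by the time the "server" side tries to say anything, nobody is listening.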

Next todo: memory leak. (Yes I am now going to look at it.)

As to event buffer, it's just a matter of a different approach: when a
connection is going away, I would want to clear out as much as possible,
instead of preserving as much as possible.  I don't maintain this code or even
understand it well, so it's not my call.  This is a "Works For You(TM)" case.

Comment 37 Zivago Lee 2006-08-31 04:56:02 UTC
Hmm.. so this is a KDE issue.  Did something change in the last kdebase update?
 The gam_server is acting up again whenever I open up a KDE app.. sigh.

Comment 38 Alexander Larsson 2006-09-01 15:07:43 UTC
*** Bug 155779 has been marked as a duplicate of this bug. ***

Comment 39 Ariel T. Glenn 2006-09-02 04:35:55 UTC
Created attachment 135423
patch for memory leak in gam_server

patch for memory leak in gam_server.  Summary: need g_list_free_1() after
g_list_remove_link() in gam_inotify.c

It only adds up to 100 - 200 bytes leaked per connection, but when an app like
gnome-panel gets into a many connections per second loop, as with this bug,
then the leak adds up fast.
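
The difference between unlinking and freeing a list node can be sketched without GLib. The functions below are hypothetical plain-C stand-ins that mirror the documented behaviour of g_list_remove_link() (unlink only), g_list_free_1() (free one node) and g_list_delete_link() (both); they are not the gam_inotify.c code, and live_nodes exists only to make the leak visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical plain-C stand-in for GList. */
struct node {
    void *data;
    struct node *prev, *next;
};

static int live_nodes;   /* allocated and not yet freed */

static struct node *list_prepend(struct node *head, void *data)
{
    struct node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->data = data;
    n->next = head;
    if (head)
        head->prev = n;
    live_nodes++;
    return n;
}

/* Like g_list_remove_link(): unlink `link` but do NOT free it. */
static struct node *list_remove_link(struct node *head, struct node *link)
{
    if (link->prev)
        link->prev->next = link->next;
    if (link->next)
        link->next->prev = link->prev;
    if (link == head)
        head = head->next;
    link->prev = link->next = NULL;
    return head;
}

/* Like g_list_free_1(): free a single, already unlinked node. */
static void list_free_1(struct node *link)
{
    free(link);
    live_nodes--;
}

/* Like g_list_delete_link(): unlink and free in one step. */
static struct node *list_delete_link(struct node *head, struct node *link)
{
    head = list_remove_link(head, link);
    list_free_1(link);
    return head;
}
```

After list_remove_link() alone, live_nodes is unchanged: the node is off the list but still allocated, which is the leak. As a rough cross-check, if one assumes about 10 connections per second (the polling rate seen in comment 22) at 100 to 200 bytes each, that is roughly 3.6 to 7.2 MB per hour, the same order as the 0.7 to 0.8% per hour of 1 GB reported in comment 25.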

mtrace kicks *ss.

Next todo:  Nothing (I hope!)

Comment 40 Alexander Larsson 2006-09-04 12:56:56 UTC
s/g_list_remove_link/g_list_delete_link/ seems better, but that looks right.

Comment 41 Alexander Larsson 2006-09-04 13:02:01 UTC
This seems to be fixed in the gnome-vfs version of the inotify code, and in bug
204906 we're waiting for a backport of that.


Comment 42 Matthias Clasen 2006-10-03 13:56:51 UTC
Alex, did we do an FC5 update for gamin ?

Comment 43 Alexander Larsson 2006-10-04 08:07:48 UTC
mclasen: Ah, no, not yet.

Comment 44 Daniel Rowe 2006-10-16 11:53:25 UTC
Is there any chance of getting an update for the gam package in Fedora Core 5? As
my machine is still doing this every so often.

Thanks. 

Comment 45 msridhar 2006-10-23 18:07:54 UTC
Until the package is fixed, is there a recommended workaround for this bug?  

Comment 46 Fedora Update System 2006-10-26 16:18:23 UTC
gamin-0.1.7-1.3.fc5 has been pushed for fc5, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.

Comment 47 Fedora Update System 2006-11-06 16:02:12 UTC
gamin-0.1.7-1.3.fc5 has been pushed for fc5, which should resolve this issue.  If these problems are still present in this version, then please make note of it in this bug report.

