Bug 140874 - gam_server continues to run after user has logged out of X
Summary: gam_server continues to run after user has logged out of X
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: gamin
Version: 3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Alexander Larsson
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-11-25 23:21 UTC by Sitsofe Wheeler
Modified: 2008-08-02 23:40 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-14 12:14:30 UTC
Type: ---
Embargoed:


Attachments
Nifty -USR2 gamin debug log (7.41 KB, text/plain)
2004-12-04 21:03 UTC, Sitsofe Wheeler
gam_debug log for comment #14 (13.39 KB, text/plain)
2004-12-06 18:37 UTC, Sitsofe Wheeler
Output of ps, gamin_debug log, lsof and a gdb backtrace (2.70 KB, text/plain)
2004-12-06 19:26 UTC, Sitsofe Wheeler

Description Sitsofe Wheeler 2004-11-25 23:21:45 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2)
Gecko/20040804 Galeon/1.3.17

Description of problem:
gam_server sometimes runs indefinitely even though the user has logged
out.

Version-Release number of selected component (if applicable):
gamin-0.0.17-1.FC3

How reproducible:
Sometimes

Steps to Reproduce:
1. Log in.
2. ?
3. Log out.

Actual Results:
It is unknown exactly what causes gam_server to do this since it does
not happen every time. However the only other task running with the
same uid as gam_server is ssh-agent.

Expected Results:
No gam_server processes are left running after the user logs out.

Additional info:

An strace shows:
gettimeofday({1101425047, 921476}, NULL) = 0
poll([{fd=28, events=0}, {fd=28, events=POLLIN}, {fd=3,
events=POLLIN}], 3, 999) = 0
gettimeofday({1101425048, 921837}, NULL) = 0
time(NULL)

over and over again.
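The strace above shows gam_server waking from poll() about once a second (timeout 999 ms) and finding nothing to do. Later comments note that an unneeded gam_server normally exits on its own after a delay (roughly 30 seconds to a minute), so one reading of this loop is an idle check that never fires because the server still believes a client is connected. The Python model below is a hypothetical sketch of that idea, not gamin's code; the IdleExitTimer class and the 30-second timeout are assumptions.

```python
from typing import Optional, Set


class IdleExitTimer:
    """Hypothetical model of a per-user server that exits after being idle.

    The server counts as idle only while it has zero registered clients, so a
    client whose disconnection is never noticed keeps it alive forever.
    """

    def __init__(self, idle_timeout: float, now: float) -> None:
        self.idle_timeout = idle_timeout
        self.clients: Set[int] = set()
        self.idle_since: Optional[float] = now  # no clients at startup

    def connect(self, client_pid: int) -> None:
        self.clients.add(client_pid)
        self.idle_since = None

    def disconnect(self, client_pid: int, now: float) -> None:
        self.clients.discard(client_pid)
        if not self.clients and self.idle_since is None:
            self.idle_since = now

    def should_exit(self, now: float) -> bool:
        # Checked after every poll() timeout, i.e. about once per second.
        return self.idle_since is not None and now - self.idle_since >= self.idle_timeout


if __name__ == "__main__":
    timer = IdleExitTimer(idle_timeout=30.0, now=0.0)
    timer.connect(5995)
    print(timer.should_exit(now=3600.0))  # False: a client is still registered
    timer.disconnect(5995, now=3600.0)
    print(timer.should_exit(now=3615.0))  # False: idle for only 15 s
    print(timer.should_exit(now=3630.0))  # True: idle for the full timeout
```

In this model the only way out of the once-a-second loop is for the client set to become empty, which is why a missed disconnection looks exactly like the strace above.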

Comment 1 Daniel Veillard 2004-11-30 15:15:34 UTC
I don't know how to reproduce this; I never saw this problem
happen. Probably a side effect of the GList corruptions;
the best you can do is to try 0.0.18 and report any improvement
or persistence of the problem.
  http://www.gnome.org/~veillard/gamin/downloads.html

Daniel

Comment 2 Sitsofe Wheeler 2004-11-30 16:22:56 UTC
Ok. I have downloaded the SRPM for 0.0.18 and renamed it to 0.17-100 so that if there
are upgrades later they will happen silently. If it doesn't happen again within a
week then we may as well resolve this as fixed.

Comment 3 Sitsofe Wheeler 2004-12-01 09:57:37 UTC
A user just logged out but /usr/libexec/gam_server is still going.
However this time there is an /usr/bin/artsd still hanging around.
Would the artsd be enough to keep gam_server going?

Comment 4 Daniel Veillard 2004-12-01 10:04:09 UTC
Yes if it uses (directly or indirectly) any monitoring.

I will try to improve the debugging capabilities to also 
save the client(s) registered at a given point and the 
monitored resource(s).

Daniel

Comment 5 Sitsofe Wheeler 2004-12-02 19:46:59 UTC
At some point the aforementioned gam_server lost its marbles and is now chewing
up any spare CPU it can lay its hands on. Is there anything I can do to help you
debug this with a gam_server that has already gone crazy?

Comment 6 Daniel Veillard 2004-12-02 21:02:21 UTC
W.r.t. comment #5, see bug #132354. First make sure you are running gamin-0.0.18;
see the information at
   http://www.gnome.org/~veillard/gamin/

Daniel

Comment 7 Sitsofe Wheeler 2004-12-02 22:25:38 UTC
Thanks! Ok back to the original point of this bug - can I use lsof to see
whether gamin has any real clients (maybe it is fooled into thinking it has
clients when they have really left...)?

Comment 8 Daniel Veillard 2004-12-03 10:56:05 UTC
Yes, in the version from CVS. kill -USR2 gam_server_pid will first dump
the list of connections and monitored resources to the debug file.
Not released yet, but in CVS.

Daniel

Comment 9 Sitsofe Wheeler 2004-12-04 21:02:26 UTC
That kill -USR2 is VERY nifty.

After upgrading to gamin 0.19 and killing off old gamin I managed to
get a gam_server hanging around after log out with only an esd still
running. I shall attach /tmp/gamin_debug_* and write a little about
what I was doing when this happened. I can't give any steps to
reproduce but it might provide some insight.

Comment 10 Sitsofe Wheeler 2004-12-04 21:03:43 UTC
Created attachment 107902 [details]
Nifty -USR2 gamin debug log

Comment 11 Sitsofe Wheeler 2004-12-04 21:11:37 UTC
(My home directory is shared via NFS across all machines). I logged
into machine A and then after some time logged into machine B and then
machine C.

Machine A kept having to be rebooted because I was testing bug #138822
and it is unknown whether a proper logout finished each time.

After finishing testing for some binary nvidia driver bugs I logged
out of Machine C. Machine C did not have any stray gam_server on
logging out.

On Machine B I did some light web surfing using firefox, some sshing,
some resolution switching with xrandr, and ran some tests with
bzflag. Machine B was the last machine to be logged out of, and upon
going to a virtual terminal after logging out I found a gam_server and an esd
still running.

Comment 12 Sitsofe Wheeler 2004-12-06 15:15:08 UTC
The gam_server from the original bug disappeared some time after I turned
logging off. However a few more have popped up and when I look through the debug
logs the first line:
Connection fd 33 to pid 5995: state okay, 0 read
refers to a pid which no longer exists!
The particular gam_server in this case is keeping a USB key open by polling
files on it...

Comment 13 Daniel Veillard 2004-12-06 17:17:04 UTC
Hum, the debug line I see from your log is
  Connection fd 33 to pid 20252: state okay, 0 read
gam_server monitors connections being closed, and should clean up the
associated monitors and state.
At least if you have the PID then it may be possible to find what program
generates that behaviour. Without it I can't reproduce the problem, and
hence can't debug it. From the internal state things "look" normal, and the
traces look fine. I don't understand how you can get dnotify events for your
home directory (especially on NFS) while you have logged out. Something is
still running from your session and modifying "something" in your home directory.
It is still unclear to me at this point whether the problem lies in gamin.

Daniel

Comment 14 Sitsofe Wheeler 2004-12-06 18:19:10 UTC
After reading your comment I went back and double checked ps auxw and found that
there was a /usr/bin/gnome-keyring-daemon still running in the report I filed in
comment #12 basically making the evidence completely null and void. I apologise
for this oversight.

In other circumstances there has almost always been another program about, but
not one you would necessarily associate with the files gamin was watching at
the time
(some of the files were things like /usr/share/control-center-2.0/capplets and
/etc/X11/applnk ). On one machine I had only the following processes running
under the uid 24045:

24045     8336  0.0  0.3  2740 1308 ?        S    14:42   0:00 /usr/libexec/gam_server
24045     8462  0.0  0.2  4468  988 ?        Ss   14:54   0:00 /usr/bin/esd -terminate -nobeeps -as 2 -spawnfd 26

However, when process 8462 was killed the gamin log said "Freeing listener for
8373" and gam_server promptly closed down cleanly. So it seems that something spawned
the esd and then died, but the esd kept those files monitored under the other
process's pid...

This sounds more and more like NOTABUG to me (other than the wrong pid business),
so I'm sorry for wasting your time...
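The "wrong pid" observation is consistent with ordinary Unix socket semantics: gam_server notices a client going away when the client's end of the local socket is closed, and that only happens once every process holding a copy of the descriptor has closed it. If a client spawns a helper that inherits the descriptor, the original pid can exit while the connection stays open. The Python sketch below demonstrates this with socket.socketpair(), using dup() to stand in for an inherited descriptor. peer_closed is a hypothetical helper, and whether esd really inherited gamin's socket here is an assumption, not something established in this bug.

```python
import select
import socket


def peer_closed(conn: socket.socket, timeout_ms: int) -> bool:
    """Return True if the peer of a connected stream socket has closed its end."""
    poller = select.poll()
    poller.register(conn, select.POLLIN | select.POLLHUP)
    if not poller.poll(timeout_ms):
        return False  # no event within the timeout: peer still connected
    # Peek, so any real request bytes stay in the buffer for the caller.
    return conn.recv(1, socket.MSG_PEEK) == b""


if __name__ == "__main__":
    server_end, client_end = socket.socketpair()
    inherited = client_end.dup()  # stands in for a descriptor inherited by a child
    client_end.close()            # the original client "exits"
    print(peer_closed(server_end, 100))   # False: 'inherited' keeps it open
    inherited.close()                     # last reference goes away
    print(peer_closed(server_end, 1000))  # True: EOF is finally delivered
```

A server built this way has no way to tell that the pid it recorded at connect time is gone, because from the socket's point of view the connection is still alive.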

Comment 15 Sitsofe Wheeler 2004-12-06 18:37:02 UTC
Created attachment 107976 [details]
gam_debug log for comment #14

Comment 16 Sitsofe Wheeler 2004-12-06 19:21:23 UTC
And just when I was ready to give up I found two gam_servers which have NO
other processes with the same uid (I killed off any other process which was
running with the same uid). I am enclosing as much information about one of them
as I can.

Comment 17 Sitsofe Wheeler 2004-12-06 19:26:18 UTC
Created attachment 107979 [details]
Output of ps, gamin_debug log, lsof and a gdb backtrace

Comment 18 Sitsofe Wheeler 2004-12-06 19:36:27 UTC
With regard to comment #17, pid 3010 does not appear to exist:
[root@y ~]$ ls -d /proc/3010
ls: /proc/3010: No such file or directory

My guess from the files being polled is that 3010 was either nautilus or
gnome-settings-daemon.

Comment 19 Sitsofe Wheeler 2004-12-12 11:55:36 UTC
Here's another backtrace from a left over gam_server. This one doesn't appear to
be polling anything though. 

#0  0x0069b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x001bc66d in poll () from /lib/tls/libc.so.6
#2  0x0047d163 in profile_print_locked (local_data=0x7fffffff, success=1000)
    at gmem.c:369
#3  0x0047d47f in profiler_try_malloc (n_bytes=4277653328) at gmem.c:442
#4  0x0804ac06 in main (argc=1, argv=0xfef7cfc4) at gam_server.c:353
#5  0x00115e33 in __libc_start_main () from /lib/tls/libc.so.6
#6  0x08049e91 in _start ()

gamin_debug says this:
Connection fd 60 to pid 10328: state okay, 0 read
  Listener has 1 subscriptions registered
    Subscription 1 reqno 1008 events 1008 dir 0: /home/x/.recently-used
Connection fd 63 to pid 28231: state okay, 0 read
  Listener has 1 subscriptions registered
    Subscription 1 reqno 1008 events 1008 dir 0: /home/x/.recently-used
Connection fd 61 to pid 28192: state okay, 0 read
  Listener has 1 subscriptions registered
    Subscription 1 reqno 1008 events 1008 dir 0: /home/x/.recently-used

pids 10328, 28192, 27870 don't exist and gam_server is the only process running
with this uid.

Comment 20 Daniel Veillard 2004-12-21 15:59:36 UTC
Well, clearly I never reproduced this. If you know what application
watches /home/x/.recently-used then I may be able to understand what
is happening. In all the tests/debugging I have done so far, when the client
process exited or was killed, gam_server always received the
disconnection and closed the connection to the client.
I need more information because I don't see how a local
socket can close, or the client side exit, without the
server getting the notification. This makes little sense to me; I don't
understand what is happening. I assume the kernel comes from Fedora?

Daniel 


Comment 21 Sitsofe Wheeler 2005-01-04 13:38:24 UTC
Yes, the kernel came from Fedora. In the previous reports the kernel was
kernel-2.6.9-1.681_FC3 with the nvidia binary drivers. Other than that, nearly
everything is FC3 with updates applied, except for gamin, where the gamin-0.18
source RPM was recompiled and installed.

I finally have a set of steps that can reproduce this problem. I have no idea
whether these steps cause the same bug as reported earlier (previous problems
were being seen from a GNOME desktop) but the symptoms look the same.

Steps to reproduce:
(all settings to do with desktop environments have been removed from the user's
home directory)
1. At gdm click on Session. Choose KDE then click OK.
2. Type your username then password and log in.
3. After KDE finishes logging in go to Fedora Hat at the bottom left -> Control
Centre.
4. In the Control Centre window from the tree at the left go to Sound &
Multimedia -> Sound System.
5. Click on the Hardware tab and change "Select the audio device" from
Autodetect to Enlightened Sound Daemon.
6. Immediately click on the General tab and then click on the "Test Sound" button.
7. In the "Save Sound Server Settings?" window click Yes.

Now one of two things happens. Either:
a) The desktop will freeze entirely, forcing you to zap X or log in at a
terminal and kill off kdeinit to return to the login prompt.
b) A "Knotify - The KDE Crash Handler" dialog will appear informing you that
Knotify SIGSEGVd.

Assuming that b) happens:
8. Click the close button in the dialog window.
9. Close the Control Centre window and when "Unsaved Changes - Control Centre"
appears choose Discard.
10. Go to the Fedora Hat at the bottom left -> Logout... (you may have to do
this twice before KDE notices) and then in the End Session dialog choose Logout.
(You may get another SIGSEGV crash handler window as you log out.)
11. If you log in at a virtual-terminal and do a ps ux | grep username you
should see that the only processes left from that session are a ssh-agent and
gam_server. ssh-agent doesn't have anything to do with gam_server but kill it
off and wait one minute for gam_server to quit.

Reproducibility:
KDE will either lock up or SEGV 100% of the time. Which one is more likely to
happen seems to depend on the machine though.

Expected results:
gam_server to quit.

Actual results:
gam_server never quits.

Additional information:
The sound card is an onboard VIA VT82C686 which does not have hardware mixing.

After doing a kill -USR2 on the remaining gam_server I found this in the debug log:
Connection fd 74 to pid 26550: state okay, 0 read
  Listener has 1 subscriptions registered
    Subscription 1 reqno 1008 events 1008 dir 1: /home/x/.kde/share/apps/kabc

I wrote a small script to record all running processes to a log file every
second (normal process accounting doesn't store pids, unfortunately). It took
quite a few attempts to find the process that is looking at the kabc directory, and it
turned out to be kab2kabc (I don't have all the parameters it was run with,
though). kab2kabc runs for less than a second during KDE startup.
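The logging script itself is not attached to this bug. A minimal Python sketch of the same idea, assuming Linux /proc, samples every process's pid and command line once per interval and appends new or changed entries to a log file; snapshot, record and the log line format are hypothetical. Any process that lives for less than the sampling interval can be missed entirely, which fits the observation that it took several attempts to catch kab2kabc.

```python
import os
import time
from typing import Dict


def snapshot(proc_root: str = "/proc") -> Dict[int, str]:
    """Return {pid: command line} for every process visible under proc_root."""
    procs: Dict[int, str] = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, name, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # Arguments are NUL-separated; kernel threads have an empty cmdline.
        procs[int(name)] = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    return procs


def record(log_path: str, interval: float = 1.0, iterations: int = 3600,
           proc_root: str = "/proc") -> None:
    """Append one line per newly seen (pid, cmdline) pair, sampling every interval."""
    seen: Dict[int, str] = {}
    with open(log_path, "a") as log:
        for _ in range(iterations):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            for pid, cmd in sorted(snapshot(proc_root).items()):
                if seen.get(pid) != cmd:
                    seen[pid] = cmd
                    log.write(f"{stamp} {pid} {cmd}\n")
            log.flush()  # keep the log useful even if the session is killed
            time.sleep(interval)
```

Logging only changes keeps the file small, so it can run through a whole login and logout cycle.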

Comment 22 Sitsofe Wheeler 2005-01-04 13:45:49 UTC
That "ps ux" should have been "ps aux"

I forgot to mention that if a) happens and you kill all the tasks from that
login other than gam_server, gam_server will still never quit. This happens on
both kernel-2.6.9-1.681_FC3 and kernel-2.6.9-1.724_FC3 kernels (the latter one
did not even have the nvidia binary drivers loaded).

Comment 23 Sitsofe Wheeler 2005-01-13 22:53:31 UTC
I'm still seeing users' left-over gam_servers with gamin-0.0.20.

Comment 24 Andrew D. 2005-02-22 04:39:34 UTC
I've noticed this with gamin-0.0.17-4 that comes with Enterprise-4.
This doesn't happen all the time though. ssh-agent also lingers (all
the time). As a temporary fix, is it safe to put the command
su --shell=/bin/bash --command="killall gam_server >/dev/null 2>&1" "$USER"

in /etc/X11/gdm/PostSession/Default, or will this cause problems?

Comment 25 Daniel Veillard 2005-02-22 08:10:09 UTC
Andrew,

the killall should really be run under the user account, not as root,
as gam_servers are started per-user. The client library is restartable,
so other logged-in users should not have big trouble if root killed their
gam_server, but this still introduces a risk.
So if you are sure $USER is correctly set at that point, then
this should be fine.

Daniel
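Daniel's advice amounts to signalling only the gam_server processes owned by the user who is logging out, rather than a system-wide killall as root. A minimal Python sketch of that filter, assuming Linux /proc (the real uid is the first field of the Uid: line in /proc/<pid>/status); find_user_processes and stop_user_gam_servers are hypothetical helpers, and how a PostSession script would call them is left out. From a shell, procps' pkill -u "$USER" gam_server applies a similar per-user filter.

```python
import os
import signal
from typing import List


def find_user_processes(name: str, uid: int, proc_root: str = "/proc") -> List[int]:
    """Return sorted pids whose command name is `name` and whose real uid is `uid`."""
    pids: List[int] = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # process already gone
        real_uid = int(fields.get("Uid", "-1").split()[0])
        if fields.get("Name", "").strip() == name and real_uid == uid:
            pids.append(int(entry))
    return sorted(pids)


def stop_user_gam_servers(uid: int, proc_root: str = "/proc") -> List[int]:
    """SIGTERM only this uid's gam_server processes; other users' are untouched."""
    pids = find_user_processes("gam_server", uid, proc_root)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # it exited on its own in the meantime
    return pids
```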

Comment 26 Andrew D. 2005-03-01 18:12:15 UTC
Hi Daniel,
Thanks for the info. I've added the above line but gam_server starts
up again before the user completely logs out (so I added a -HUP to
make it more legit). I have noticed that this restarting of gam just
before logout cures most of the lingering gams as the re-launched gam
seems to realize it's not needed and stops after about 30 seconds. One
other problem I've come across: Just the other day gam stayed on again
but this time couldn't be killed (even kill -9 by root wouldn't do
it). This prevented the user who was running gam from logging in at
the graphical login (just hangs when gnome starts). This required a
reboot to cure. The only thing that I had changed was that I mounted a
remote computer using cifs. I've now switched back to smbfs and things
seem to be working OK (I also had enabled lm_sensors but I doubt this
would have anything to do with the gam problem).
Thanks,
   Andrew

Comment 27 Daniel Veillard 2005-03-01 22:13:29 UTC
I have already heard of problems with CIFS, but it's really
filesystem-specific, so IMHO this is a kernel issue. You can still
try to deactivate kernel monitoring and fall back to polling
to avoid trouble with CIFS, see
   http://www.gnome.org/~veillard/gamin/config.html

Daniel

Comment 28 Sitsofe Wheeler 2005-03-02 08:25:32 UTC
Comment #24 - ssh-agent continuing after logout is a completely different bug,
see bug #138747 .

Comment 29 Sitsofe Wheeler 2005-03-02 08:27:06 UTC
I would also like to note that I have not seen any stray gamin after an upgrade
to gamin-0.0.24-1.FC3. I will keep an eye on it before resolving this bug FIXED.

Comment 30 Andrew D. 2005-04-26 22:23:48 UTC
Is this fix due out for Enterprise 4, or should Enterprise users switch over to
the FC3 version of gamin in the meantime?
Thanks, --Andrew

Comment 31 Daniel Veillard 2005-04-28 12:01:01 UTC
W.r.t. RHEL 4, I don't feel like pushing a new version of gamin while there
are known unresolved issues. I will likely keep it as-is until I get a version
where the 0.0. prefix on the release number has disappeared :-)

Daniel

Comment 32 Matthew Miller 2006-07-10 21:42:41 UTC
Fedora Core 3 is now maintained by the Fedora Legacy project for security
updates only. If this problem is a security issue, please reopen and
reassign to the Fedora Legacy product. If it is not a security issue and
hasn't been resolved in the current FC5 updates or in the FC6 test
release, reopen and change the version to match.

Thank you!


Comment 33 Alexander Larsson 2006-08-14 12:14:30 UTC
People seem to report this is fixed. Also, an upgrade is in later RHEL4
updates. Closing.

