From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040804 Galeon/1.3.17

Description of problem:
gam_server sometimes runs indefinitely even though the user has logged out.

Version-Release number of selected component (if applicable):
gamin-0.0.17-1.FC3

How reproducible:
Sometimes

Steps to Reproduce:
1. Log in.
2. ?
3. Log out.

Actual Results:
It is unknown exactly what causes gam_server to do this since it does not happen every time. However the only other task running with the same uid as gam_server is ssh-agent.

Expected Results:
After user logs out no gam_servers to be left running.

Additional info:
An strace shows:

gettimeofday({1101425047, 921476}, NULL) = 0
poll([{fd=28, events=0}, {fd=28, events=POLLIN}, {fd=3, events=POLLIN}], 3, 999) = 0
gettimeofday({1101425048, 921837}, NULL) = 0
time(NULL)

over and over again.
I don't know how to reproduce this; I never saw this problem happen. It is probably a side effect of the GList corruptions. The best you can do is to try 0.0.18 and report any improvement or persistence of the problem. http://www.gnome.org/~veillard/gamin/downloads.html Daniel
Ok. I have downloaded the SRPM for 0.0.18 and renamed it to 0.17-100 so that if there are upgrades later they will silently happen. If it doesn't happen again within a week then we may as well resolve this as fixed.
A user just logged out but /usr/libexec/gam_server is still going. However this time there is an /usr/bin/artsd still hanging around. Would the artsd be enough to keep gam_server going?
Yes if it uses (directly or indirectly) any monitoring. I will try to improve the debugging capabilities to also save the client(s) registered at a given point and the monitored resource(s). Daniel
At some point the aforementioned gam_server lost its marbles and is now chewing up any spare CPU it can lay its hands on. Is there anything I can do to help you debug this with a gam_server that has already gone crazy?
W.r.t. comment #5, see bug #132354. First make sure you are running gamin-0.0.18; see the information at http://www.gnome.org/~veillard/gamin/ Daniel
Thanks! Ok back to the original point of this bug - can I use lsof to see whether gamin has any real clients (maybe it is fooled into thinking it has clients when they have really left...)?
Yes, in the version from CVS. Running kill -USR2 gam_server_pid will first dump the list of connections and monitored resources into the debug file. This is not released yet, but it is in CVS. Daniel
That kill -USR2 is VERY nifty. After upgrading to gamin 0.19 and killing off old gamin I managed to get a gam_server hanging around after log out with only an esd still running. I shall attach /tmp/gamin_debug_* and write a little about what I was doing when this happened. I can't give any steps to reproduce but it might provide some insight.
Created attachment 107902 [details] Nifty -USR2 gamin debug log
(My home directory is shared via NFS across all machines.) I logged into machine A and then after some time logged into machine B and then machine C. Machine A kept having to be rebooted because I was testing bug #138822 and it is unknown whether a proper logout finished each time. After finishing testing for some binary nvidia driver bugs I logged out of Machine C. Machine C did not have any stray gam_server on logging out. On Machine B I did some light web surfing using firefox, some sshing, some resolution switching with xrandr, and ran some tests with bzflag. Machine B was the last machine to be logged out of and upon going to a virtual terminal after log out I found a gam_server and esd still running.
The gam_server from the original bug disappeared some time after I turned logging off. However a few more have popped up and when I look through the debug logs the first line:

Connection fd 33 to pid 5995: state okay, 0 read

refers to a pid which no longer exists! The particular gam_server in this case is keeping a usb key open by polling files on it...
Hum, the debug line I see from your log is

Connection fd 33 to pid 20252: state okay, 0 read

gam_server monitors connections being closed, and should clean up the associated monitors and state. At least if you have the PID then it may be possible to find what program generates that behaviour. Without it I can't reproduce the problem, and hence can't debug it. From the internal state things "look" normal, and the traces look fine. I don't understand how you can get dnotify events for your home directory (especially on NFS) while you are logged out. Something is still running from your session and modifying "something" in your home directory. It is still unclear to me at this point that the problem lies in gamin. Daniel
After reading your comment I went back and double checked ps auxw and found that there was a /usr/bin/gnome-keyring-daemon still running in the report I filed in comment #12, basically making the evidence completely null and void. I apologise for this oversight. In other circumstances there has almost always been another program about, but not one you would necessarily associate with the files gamin was watching at the time (some of the files were things like /usr/share/control-center-2.0/capplets and /etc/X11/applnk ). On one machine I had only the following processes running under the uid 24045:

24045 8336 0.0 0.3 2740 1308 ? S 14:42 0:00 /usr/libexec/gam_server
24045 8462 0.0 0.2 4468 988 ? Ss 14:54 0:00 /usr/bin/esd -terminate -nobeeps -as 2 -spawnfd 26

However when process 8462 was killed the gamin log said "Freeing listener for 8373" and gam_server promptly closed down cleanly. So it seems like something spawned off the esd and then died, but the esd kept those files being monitored, though under a different pid... This sounds like NOTABUG to me more and more (other than the wrong pid business) so I'm sorry for wasting your time...
Created attachment 107976 [details] gam_debug log for comment #14
And just when I was ready to give up I have found two gam_server which have NO other processes with the same uid (I killed off any other process which was running with the same uid). I am enclosing as much information about one of them as I can.
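For anyone wanting to repeat the check described here, this is a rough Python sketch of how one might spot uids whose only remaining processes are gam_server. It assumes the input is the text of a three-column listing such as the one produced by ps -eo uid,pid,comm; the function name lone_gam_server_uids is made up for illustration and is not part of gamin.

```python
from collections import defaultdict


def lone_gam_server_uids(ps_text, server_name="gam_server"):
    """Return uids whose only remaining processes are named server_name.

    ps_text is expected in the three-column form "UID PID COMMAND"
    (an assumption of this sketch, e.g. output of: ps -eo uid,pid,comm).
    """
    by_uid = defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.split(None, 2)
        # Skip blank lines and the header row (its first field is not numeric).
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        uid, _pid, command = fields
        by_uid[int(uid)].append(command.strip())
    return sorted(
        uid
        for uid, commands in by_uid.items()
        if all(c == server_name for c in commands)
    )
```

A uid that still owns an esd or gnome-keyring-daemon will not be listed, which matches the distinction drawn in the earlier comments between a gam_server with a surviving client and one that is truly alone.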
Created attachment 107979 [details] Output of ps, gamin_debug log, lsof and a gdb backtrace
With regard to comment #17, pid 3010 does not appear to exist:

[root@y ~]$ ls -d /proc/3010
ls: /proc/3010: No such file or directory

My guess from the files being polled is that 3010 was either nautilus or gnome-settings-daemon.
Here's another backtrace from a left over gam_server. This one doesn't appear to be polling anything though.

#0 0x0069b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x001bc66d in poll () from /lib/tls/libc.so.6
#2 0x0047d163 in profile_print_locked (local_data=0x7fffffff, success=1000) at gmem.c:369
#3 0x0047d47f in profiler_try_malloc (n_bytes=4277653328) at gmem.c:442
#4 0x0804ac06 in main (argc=1, argv=0xfef7cfc4) at gam_server.c:353
#5 0x00115e33 in __libc_start_main () from /lib/tls/libc.so.6
#6 0x08049e91 in _start ()

gamin_debug says this:

Connection fd 60 to pid 10328: state okay, 0 read
Listener has 1 subscriptions registered
Subscription 1 reqno 1008 events 1008 dir 0: /home/x/.recently-used
Connection fd 63 to pid 28231: state okay, 0 read
Listener has 1 subscriptions registered
Subscription 1 reqno 1008 events 1008 dir 0: /home/x/.recently-used
Connection fd 61 to pid 28192: state okay, 0 read
Listener has 1 subscriptions registered
Subscription 1 reqno 1008 events 1008 dir 0: /home/x/.recently-used

pids 10328, 28192, 27870 don't exist and gam_server is the only process running with this uid.
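Checking each pid from the debug dump by hand gets tedious, so here is a minimal Python sketch that pulls the "Connection fd ... to pid ..." lines out of a gamin debug dump and reports which pids no longer have a /proc entry. It assumes the dump format quoted in this bug and a Linux /proc; the names connection_pids and stale_connections and the proc_root parameter are made up for illustration.

```python
import os
import re

# Matches lines such as "Connection fd 60 to pid 10328: state okay, 0 read"
CONNECTION_RE = re.compile(r"Connection fd (\d+) to pid (\d+):")


def connection_pids(debug_text):
    """Return (fd, pid) pairs for every connection line in the debug dump."""
    return [(int(fd), int(pid)) for fd, pid in CONNECTION_RE.findall(debug_text)]


def stale_connections(debug_text, proc_root="/proc"):
    """Return the (fd, pid) pairs whose pid has no directory under proc_root."""
    return [
        (fd, pid)
        for fd, pid in connection_pids(debug_text)
        if not os.path.isdir(os.path.join(proc_root, str(pid)))
    ]
```

Reading a /tmp/gamin_debug_* file and passing its text to stale_connections() would list the connections gam_server still believes are open to processes that are gone. Note that pids can be reused, so a pid that does exist is not proof that the original client is still alive.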
Well, clearly I never reproduced this. If you know what application watches /home/x/.recently-used then I may be able to understand what is happening. So far, in all the tests and debugging I have done, when the client process exited or was killed, gam_server always received the disconnection and closed the connection to the client. I need more information because I don't see how to make a local socket close or have the client side exit without getting the notification on the server. This makes little sense to me; I don't understand what is happening. I assume the kernel comes from Fedora, Daniel
Yes, the kernel came from Fedora. In the previous reports the kernel was kernel-2.6.9-1.681_FC3 with the nvidia binary drivers. Other than that nearly everything is FC3 with updates applied, except for gamin, where the gamin-0.18 source RPM was recompiled and installed.

I finally have a set of steps that can reproduce this problem. I have no idea whether these steps cause the same bug as reported earlier (previous problems were being seen from a GNOME desktop) but the symptoms look the same.

Steps to reproduce:
(all settings to do with desktop environments have been removed from the user's home directory)
1. At gdm click on Session. Choose KDE then click OK.
2. Type your username then password and log in.
3. After KDE finishes logging in go to Fedora Hat at the bottom left -> Control Centre.
4. In the Control Centre window from the tree at the left go to Sound & Multimedia -> Sound System.
5. Click on the Hardware tab and change "Select the audio device" from Autodetect to Enlightened Sound Daemon.
6. Immediately click on the General tab and then click on the "Test Sound" button.
7. In the "Save Sound Server Settings?" window click Yes.

Now one of two things happens. Either:
a) The desktop will freeze entirely, forcing you to zap X or log in at a terminal and kill off kdeinit to return to the log in prompt.
b) A "Knotify - The KDE Crash Handler" dialog will appear informing you that Knotify SIGSEGVd.

Assuming that b) happens:
8. Click the close button in the dialog window.
9. Close the Control Centre window and when "Unsaved Changes - Control Centre" appears choose Discard.
10. Go to the Fedora Hat at the bottom left -> Logout... (you may have to do this twice before KDE notices) and then in the End Session dialog choose Logout. (You may get another SIGSEGV crash handler window as you log out.)
11. If you log in at a virtual terminal and do a ps ux | grep username you should see that the only processes left from that session are an ssh-agent and gam_server. ssh-agent doesn't have anything to do with gam_server but kill it off and wait one minute for gam_server to quit.

Reproducibility:
KDE will either lock up or SEGV 100% of the time. Which one is more likely to happen seems to depend on the machine though.

Expected results:
gam_server to quit.

Actual results:
gam_server never quits.

Additional information:
The sound card is an onboard VIA VT82C686 which does not have hardware mixing. After doing a kill -USR2 on the remaining gam_server I found this in the debug log:

Connection fd 74 to pid 26550: state okay, 0 read
Listener has 1 subscriptions registered
Subscription 1 reqno 1008 events 1008 dir 1: /home/x/.kde/share/apps/kabc

I wrote a small script to record all running processes to a log file every second (normal process accounting doesn't store pids unfortunately). It took quite a few attempts to find the process that is looking at the kabc file and it turned out to be kab2kabc (I don't have all the parameters it was run with though). kab2kabc runs for less than a second during KDE startup.
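The script itself was not attached, so the following is only a rough Python sketch of the same idea, not the one actually used: it walks /proc once per interval and appends a timestamped "pid command line" entry for every process to a log file. It assumes a Linux /proc layout; the names snapshot and log_processes and their parameters are made up for illustration.

```python
import os
import time


def snapshot(proc_root="/proc"):
    """Return a sorted list of (pid, command line) for each numeric entry under proc_root."""
    processes = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        processes.append((int(entry), " ".join(args)))
    return sorted(processes)


def log_processes(log_path, interval=1.0, rounds=None, proc_root="/proc"):
    """Append one timestamped line per process to log_path every `interval` seconds.

    rounds=None means run until interrupted.
    """
    count = 0
    while rounds is None or count < rounds:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            for pid, cmdline in snapshot(proc_root):
                log.write(f"{stamp} {pid} {cmdline}\n")
        count += 1
        if rounds is None or count < rounds:
            time.sleep(interval)
```

Because this samples rather than traces, a process that lives for less than one interval can easily fall between two snapshots, which fits with it taking quite a few attempts to catch kab2kabc. A shorter interval improves the odds at the cost of a larger log.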
That "ps ux" should have been "ps aux". I forgot to mention that if a) happens and you kill all the tasks from that login other than gam_server, gam_server will still never quit. This happens on both the kernel-2.6.9-1.681_FC3 and kernel-2.6.9-1.724_FC3 kernels (the latter one did not even have the nvidia binary drivers loaded).
I'm still seeing users' leftover gam_servers with gamin-0.0.20.
I've noticed this with gamin-0.0.17-4 that comes with Enterprise-4. This doesn't happen all the time though. ssh-agent also lingers (all the time). As a temporary fix, is it safe to put a command:

su --shell=/bin/bash --command="killall gam_server >/dev/null 2>&1" "$USER"

in /etc/X11/gdm/PostSession/Default or will this cause problems?
Andrew, the killall should really be run under the user account, not as root, as gam_servers are started per-user. The client library is restartable, so other logged-in users should not have big troubles if root killed their gam_server, but this still introduces a risk. So if you are sure $USER is correctly set at that point then this should be fine, Daniel
Hi Daniel, Thanks for the info. I've added the above line but gam_server starts up again before the user completely logs out (so I added a -HUP to make it more legit). I have noticed that this restarting of gam just before logout cures most of the lingering gams as the re-launched gam seems to realize it's not needed and stops after about 30 seconds. One other problem I've come across: Just the other day gam stayed on again but this time couldn't be killed (even kill -9 by root wouldn't do it). This prevented the user who was running gam from logging in at the graphical login (just hangs when gnome starts). This required a reboot to cure. The only thing that I had changed was that I mounted a remote computer using cifs. I've now switched back to smbfs and things seem to be working OK (I also had enabled lm_sensors but I doubt this would have anything to do with the gam problem). Thanks, Andrew
I have already heard of problems with CIFS, but it's really filesystem specific, so this IMHO is a kernel issue. You can still try to deactivate kernel monitoring and fall back to polling to avoid troubles with CIFS, see http://www.gnome.org/~veillard/gamin/config.html Daniel
Comment #24 - ssh-agent continuing after logout is a completely different bug, see bug #138747 .
I would also like to note that I have not seen any stray gamin after an upgrade to gamin-0.0.24-1.FC3. I will keep an eye on it before resolving this bug FIXED.
Is this fix due out for Enterprise 4 or should Enterprise users switch over to the FC3 version of gamin in the mean time? Thanks, --Andrew
W.r.t. RHEL 4, I don't feel like pushing a new version of gamin while there are known unresolved issues. I will likely keep it as-is until I get a version where the 0.0. prefix on the release number has disappeared :-) Daniel
Fedora Core 3 is now maintained by the Fedora Legacy project for security updates only. If this problem is a security issue, please reopen and reassign to the Fedora Legacy product. If it is not a security issue and hasn't been resolved in the current FC5 updates or in the FC6 test release, reopen and change the version to match. Thank you!
People seem to report this is fixed. Also, an upgrade is in later RHEL4 updates. Closing.