132354 – gam_server goes into loop and uses lots of CPU

Summary: gam_server goes into loop and uses lots of CPU

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	gamin
Sub Component:
Version:	9
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Tomáš Bžatek
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	137439 140701
Depends On:
Blocks:	FC3Target

Reported:	2004-09-11 08:50 UTC by Ellen Shull
Modified:	2015-03-03 22:27 UTC
CC List:	38 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-07-14 18:28:49 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
trace (793 bytes, text/plain) 2004-10-28 15:43 UTC, Neal Becker
debug (489.36 KB, application/x-gzip) 2004-10-28 15:45 UTC, Neal Becker
another gdb trace (207 bytes, text/plain) 2004-10-29 12:03 UTC, Neal Becker
yet another gdb backtrace (1008 bytes, text/plain) 2004-12-02 22:45 UTC, Sitsofe Wheeler
debug output from /tmp from before the 100% utilization occurred (796.73 KB, application/octet-stream) 2004-12-10 15:59 UTC, Peter Eddy
gamin debug output (9.10 KB, text/plain) 2005-05-27 11:48 UTC, Joe Orton
beginning of gam_server log file (116.27 KB, text/plain) 2006-01-11 01:31 UTC, Dean Kolosiek

Description Ellen Shull 2004-09-11 08:50:25 UTC

Description of problem: 
For as-yet undetermined reasons, sometimes gam_server starts using 
lots of CPU time (~80% on my Athlon XP 1466 MHz) 
 
Version-Release number of selected component (if applicable): 
gamin-0.0.9-1 
everything else current rawhide 
 
How reproducible: 
Not sure what triggers it; I think this is the second time I've seen 
it, but I didn't investigate the first time as I needed to boot into 
a new kernel then anyway (yes, I know, I'm a bad tester!) 
 
Additional info: 
attaching strace to the process yields these same (I verified with 
sort/uniq) two lines over and over in an infinitely repeating loop: 
 
stat64("/home/wes/.kde/share/config/ksmserverrc", {st_dev=makedev(9, 0),
st_ino=99361, st_mode=S_IFREG|0600, st_nlink=1, st_uid=500, st_gid=500,
st_blksize=4096, st_blocks=8, st_size=1492, st_atime=2004/09/09-17:55:06,
st_mtime=2004/09/09-16:20:20, st_ctime=2004/09/09-16:20:20}) = 0
stat64("/home/wes/.kde/share/config/kioslaverc", {st_dev=makedev(9, 0),
st_ino=163182, st_mode=S_IFREG|0600, st_nlink=1, st_uid=500, st_gid=500,
st_blksize=4096, st_blocks=8, st_size=92, st_atime=2004/09/11-01:37:47,
st_mtime=2004/08/25-23:49:21, st_ctime=2004/08/25-23:49:21}) = 0
 
I then sent it a SIGHUP and it went back to normal. Could be
coincidence; does gam_server actually restart on that signal? (I
stupidly didn't think to leave strace attached when I did it.)
 
I will set up gam_server to run with GAM_DEBUG and --notimeout and 
all that; hopefully I can catch the behavior again.

Comment 1 Ellen Shull 2004-09-18 17:08:24 UTC

Ok, I'm seeing it again.  Unfortunately after two days of trying to 
catch it in debug mode, I had rebooted the system for updates and 
forgot to put gam_server in debug mode again :( 
 
This time it's looping against the following two (different than 
before) files: 
 
stat64("/home/wes/.kde/share/config/knotify.eventsrc", 
{st_dev=makedev(9, 0), st_ino=7314, st_mode=S_IFREG|0600, st_nlink=1, 
st_uid=500, st_gid=500, st_blksize=4096, st_blocks=8, st_size=1085, 
st_atime=2004/09/15-19:41:19, st_mtime=2004/08/29-20:53:09, 
st_ctime=2004/08/29-20:53:09}) = 0 
stat64("/home/wes/.kde/share/config/kpilot_vcalconduitsrc", 
{st_dev=makedev(9, 0), st_ino=163235, st_mode=S_IFREG|0600, 
st_nlink=1, st_uid=500, st_gid=500, st_blksize=4096, st_blocks=8, 
st_size=57, st_atime=2004/08/25-23:49:33, 
st_mtime=2004/08/25-23:49:33, st_ctime=2004/08/25-23:49:33}) = 0 
 
I see there is a new glibc in rawhide today, so I'm going to install 
it and hope it helps.

Comment 2 Sammy 2004-09-23 15:59:03 UTC

I am watching gam_server on rawhide 9-23-2004 system and it is using 
up around 50% of the CPU!

Comment 3 Daniel Veillard 2004-09-23 16:06:27 UTC

Excellent: launch gdb /usr/libexec/gam_server, attach to the
PID of the process, look at what's happening and report. Just knowing
that you saw it, or a syscall trace, isn't that useful!

Daniel

Comment 4 Mikael Carneholm 2004-10-04 01:40:13 UTC

I'm seeing it here as well. At most, gam_server is eating as much as
50-70% of the cpu.

Comment 5 Daniel Veillard 2004-10-04 13:41:33 UTC

http://www.gnome.org/~veillard/gamin/debug.html#Debugging1

  debug the problem and provide a trace. Also make sure you
have the latest version installed. What I said in comment #3 
is still valid.

Daniel

Comment 6 Ellen Shull 2004-10-07 21:04:24 UTC

Ok, caught it again, this time on gamin-0.0.14-1 (which is current 
rawhide AFAIK) 
 
Using your fancy new SIGUSR2 debug trick, I get a quickly growing 
file with this line repeated forever: 
 
node_remove_subscription() 
 
It's nice that another SIGUSR2 turns it off again, because it was 
threatening to fill my disk :O 
 
Interestingly, now strace shows nothing, nada, nichts, rien.  (well 
it shows the debug prints if that's enabled).  Different problem 
causing the same high CPU usage, or just difference due to code 
changes you've made? 
 
Latest gdb backtrace, this time with debuginfo installed: 
 
#0  0x00135e42 in __i686.get_pc_thunk.bx () 
from /usr/lib/libglib-2.0.so.0 
#1  0x00155944 in g_node_is_ancestor (node=0x8123018, 
descendant=0x8058498) at gnode.c:413 
#2  0x0804af3a in gam_tree_remove (tree=0x80583c8, node=0x8123018) at 
gam_tree.c:144 
#3  0x0804b7d3 in remove_directory_subscription (node=0x8123018, 
sub=0x811c4e8) at gam_poll.c:507 
#4  0x0804cd56 in gam_poll_consume_subscriptions () at gam_poll.c:918 
#5  0x0804fc64 in gam_dnotify_consume_subscriptions_real (data=0x0) 
at gam_dnotify.c:212 
#6  0x0014e848 in g_idle_dispatch (source=0x8129f00, 
callback=0x8123018, user_data=0x8058498) at gmain.c:3802 
#7  0x0014b4fb in g_main_context_dispatch (context=0x8057ee8) at 
gmain.c:1942 
#8  0x0014cf82 in g_main_context_iterate (context=0x8057ee8, block=1, 
dispatch=1, self=0x8053018) at gmain.c:2573 
#9  0x0014d22f in g_main_loop_run (loop=0x8059908) at gmain.c:2777 
#10 0x0804aa28 in main (argc=1, argv=0xfefffa54) at gam_server.c:330 
#11 0x001b7b03 in __libc_start_main (main=0x804a8f7 <main>, argc=1, 
ubp_av=0xfefffa54, init=0x8050304 <__libc_csu_init>, 
    fini=0xfefff9e0, rtld_fini=0xfefffa54, stack_end=0xfefffa4c) 
at ../sysdeps/generic/libc-start.c:209 
#12 0x08049fa1 in _start () 
 
Stepping through it in ddd/gdb, I notice that in gam_tree_remove, the 
g_node_is_ancestor sanity check seems to be consistently failing.  To 
be specific, in g_node_is_ancestor, descendant->parent seems to 
always be null (only data and next are non-null).  Somewhere the 
trees aren't getting built right, or are being systematically 
corrupted... 
 
I'm not resetting gam_server for the moment; email me if you want to 
telnet in and gdb it or X ddd out to your host to check it out, since 
it seems to be difficult to reproduce...
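The observation above can be made concrete. An ancestor test of the kind g_node_is_ancestor() performs walks the descendant's parent links upwards, so a node whose parent pointer is NULL (detached, or corrupted as suspected above) can never pass the sanity check in gam_tree_remove(). This is a minimal sketch using a hypothetical TreeNode type that only mirrors GNode's parent/child/sibling links; it is not GLib's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node with the same kind of links as GNode. */
typedef struct TreeNode {
    struct TreeNode *parent;
    struct TreeNode *children; /* first child */
    struct TreeNode *next;     /* next sibling */
} TreeNode;

/* Returns true if `node` is a strict ancestor of `descendant`, found by
   following the descendant's parent chain upwards. A descendant whose
   parent is NULL therefore never has any ancestor. */
static bool tree_is_ancestor(const TreeNode *node, const TreeNode *descendant)
{
    for (const TreeNode *p = descendant ? descendant->parent : NULL; p; p = p->parent)
        if (p == node)
            return true;
    return false;
}

/* Links `child` in as the first child of `parent`. */
static void tree_prepend_child(TreeNode *parent, TreeNode *child)
{
    child->parent = parent;
    child->next = parent->children;
    parent->children = child;
}
```

With a detached node (parent left NULL) the check fails for every candidate ancestor, which matches what was seen when stepping through gam_tree_remove().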

Comment 7 Daniel Veillard 2004-10-07 21:55:19 UTC

I'm about to go on the road... this helps, but you should not 
wait for me.

  thanks,

Daniel

Comment 8 Sammy 2004-10-08 14:04:59 UTC

I had the same problem again... not clear what triggers it. I was installing 
the new kernel RPM and it was taking forever to install; when I ran top I 
saw gamin taking all the CPU. This has the potential of causing serious 
hangs.

Comment 9 Robert Scheck 2004-10-13 12:10:25 UTC

I also got this problem using Fedora Core 3 test 3 on a ProLiant 
DL 145 (AMD64, Opteron) using x86_64, but here gam_server constantly 
uses 99.9% CPU. This causes a baseline load of ~1.5, which is very bad.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3440 root      25   0  6204 2268 4836 R 99.9  0.0  18:35.44 gam_server

[root@fc3-test ~]# ps aux | grep gam
root      3440 63.2  0.0  6204 2268 ?        R    13:40  21:12 /usr/libexec/gam_server
root      5081  0.0  0.0 42308  760 pts/3    S+   14:14   0:00 grep gam
[root@fc3-test ~]#

[root@fc3-test ~]# rpm -q gamin --qf '%{name}-%{version}-%{release}.%{arch}\n'
gamin-0.0.14-1.i386
gamin-0.0.14-1.x86_64
[root@fc3-test ~]#

This problem really should be solved before the final release. This 
issue should perhaps also be marked as a possible blocker (just a suggestion).

Comment 10 Chris Wright 2004-10-14 20:44:00 UTC

gamin-0.0.14-1, current rawhide, x86_64 dual opteron.
Output from SIGUSR2 shows it's looping through the following list of
files.  Looks like a poll_file loop is stuck.  Quick source read
points at gam_poll_scan_directory_internal and the for loop:

  for (l = children; l; l = l->next) {

Poll: poll_file for
/usr/share/applications/gnome-accessibility.desktop called
 at 1097770113 delta 0 : 0 Poll: poll_file
/usr/share/applications/gnome-accessibility.desktop unchanged
1097700841 0 : 1097700841 0
Poll: poll_file for
/usr/share/applications/redhat-neat-control.desktop called  at
1097770113 delta 0 : 0
Poll: poll_file /usr/share/applications/redhat-neat-control.desktop
unchanged
1096989180 0 : 1096989180 0
Poll: poll_file for
/usr/share/applications/redhat-rhn-up2date-config.desktop called
 at 1097770113 delta 0 : 0
Poll: poll_file
/usr/share/applications/redhat-rhn-up2date-config.desktop unchanged
1095979273 0 : 1095979273 0

And gdb confirms this:

(gdb) bt
#0  0x0000002a95721945 in ?? ()
#1  0x0000000000404701 in poll_file (node=0x523b40) at stat.h:366
#2  0x0000000000404b46 in gam_poll_scan_directory_internal (dir_node=0x0,
    exist_subs=0x0, scan_for_new=1) at gam_poll.c:446
#3  0x0000000000404f33 in gam_poll_scan_callback (data=0x5303d0)
    at gam_poll.c:550
#4  0x00000036cc52942b in ?? ()
(gdb) list gam_poll.c:446
441         }
442         children = gam_tree_get_children(tree, dir_node);
443         for (l = children; l; l = l->next) {
444             node = (GamNode *) l->data;
445
446             fevent = poll_file(node);
447
448             if (gam_node_is_dir(node) &&
449                 gam_node_has_flag(node, FLAG_NEW_NODE) &&
450                 gam_node_get_subscriptions(node)) {

(gdb) print l
$1 = (GList *) 0x51ea60
(gdb) print l->next
$2 = (GList *) 0x523920
(gdb) print l->next->next
$3 = (GList *) 0x51ea78
(gdb) print l->next->next->next
$4 = (GList *) 0x51ea60
(gdb) p children 
$5 = (GList *) 0x53ddd0
(gdb) p *(GamNode *)l->data
$6 = {path = 0x523c10
"/usr/share/applications/redhat-neat-control.desktop",   subs = 0x0,
data = 0x530210,   data_destroy = 0x404380 <gam_poll_data_destroy>,
flags = 0, node = 0x515f78,   is_dir = 0}

This list is not NULL terminated.
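The gdb session above shows three cells whose next pointers lead back to the first of them, so `for (l = children; l; l = l->next)` never reaches NULL. A cycle like this can be detected in constant space with Floyd's tortoise-and-hare algorithm. The sketch uses a hypothetical singly linked Cell type; GList also has a prev link, which a forward walk like this one ignores.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list cell; only the forward link matters for the walk. */
typedef struct Cell {
    void *data;
    struct Cell *next;
} Cell;

/* Floyd's tortoise-and-hare: `fast` moves two cells per step and `slow`
   one. If the list is NULL-terminated, `fast` reaches NULL; if it
   contains a cycle, the two pointers eventually meet inside it. */
static bool list_has_cycle(const Cell *head)
{
    const Cell *slow = head, *fast = head;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}
```

The check also works when the cycle does not include the head, as in the session above where `children` (0x53ddd0) differs from the three looping cells.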

Comment 11 Daniel Veillard 2004-10-15 07:39:36 UTC

Thanks a lot ! This is what I was afraid of. The gam_server is not
multithreaded anymore, so such corruption should not be the result
of unguarded reentrancy. The children list is obtained by
children = gam_tree_get_children(tree, dir_node); which does

    GList *list = NULL;
    [...]
    for (i = 0; i < g_node_n_children(node); i++) {
        list = g_list_prepend(list, NODE_DATA(g_node_nth_child(node, i)));
    }

gam_tree_get_children() cannot loop, it should return a correct list.

The loop in gam_poll_scan_directory_internal() just emits events and 
should not modify the list, which is built as a temporary structure;
neither l nor the related list data are passed down to the recursive
call to gam_poll_scan_directory_internal().

I'm puzzled that we end up with some corruption there. Reading 
g_list_prepend() code I don't see how this could fail. Short of running
gam_server under valgrind to try to track down a random memory access 
error, I don't see how to chase this in a deterministic way.
Annoying, very annoying!

Daniel
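The argument in the comment above rests on how prepending works: the list starts as NULL and each new cell points at the previous head, so the result is NULL-terminated unless some cell is modified afterwards. A minimal sketch with hypothetical Node and Cell types (standing in for GNode and GList) makes this explicit. It walks the sibling chain once, whereas the quoted loop calls g_node_n_children() and g_node_nth_child() on every iteration; both of those walk the children from the start, so that version does quadratic work in the number of children.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tree node with first-child / next-sibling links. */
typedef struct Node {
    const char *path;
    struct Node *first_child;
    struct Node *next_sibling;
} Node;

/* Hypothetical list cell standing in for GList. */
typedef struct Cell {
    void *data;
    struct Cell *next;
} Cell;

/* The new cell points at the old head, so a list that starts as NULL
   stays NULL-terminated however many cells are added. */
static Cell *cell_prepend(Cell *head, void *data)
{
    Cell *c = malloc(sizeof *c);
    if (!c)
        abort();
    c->data = data;
    c->next = head;
    return c;
}

/* Collects a node's children by walking the sibling chain once. The
   result is in reverse sibling order, as with repeated prepends. */
static Cell *get_children(const Node *dir)
{
    Cell *list = NULL;
    for (Node *child = dir->first_child; child; child = child->next_sibling)
        list = cell_prepend(list, child);
    return list;
}

static void cell_free_all(Cell *head)
{
    while (head) {
        Cell *next = head->next;
        free(head);
        head = next;
    }
}
```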

Comment 12 Lars G 2004-10-15 17:44:47 UTC

gam_server goes nuts here too with rawhide.

Comment 13 Chris Wright 2004-10-15 18:41:00 UTC

Yesterday I looked briefly at all the list manipulation areas,
but didn't see anything glaring either. It doesn't seem like random memory
access, however. The l->next pointers are valid, just creating a loop.
I think the directory was being changed while this happened (during
daily rawhide update). Is there any async event which could change
the tree? AFAICT, gam_tree_get_children() expects the parent node to
remain quiescent. You said it's not multithreaded; how about signal
driven? Any way for gam_tree_add() to happen during a
gam_tree_get_children() so that the GNode sibling list changes while
building the GList?

Comment 14 Daniel Veillard 2004-10-16 11:46:47 UTC

The only async event is the dnotify signal. It is handled
by dnotify_signal_handler(), which pushes the file descriptor number
onto a GQueue and does a write to a local pipe. The pipe is hooked
to the mainloop and pure synchronous processing should be done from
there.
There is a comment that GQueue changes are not signal safe and something
else should be used. That's the only uncertainty I can detect in the
code, assuming there is only the main application thread running. The
fact that the problem seems to occur frequently on your fast SMP box 
makes me wonder if there isn't something which still generates some
kind of reentrancy.
What puzzles me is that even if the node children list was modified
during gam_tree_get_children(), the list might get duplicate or wrong
data pointers, but the l->next pointers should still be correct...

Daniel
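One common way to take the GQueue out of the signal path is the self-pipe trick with the descriptor itself written into the pipe: write(2) is on the POSIX list of async-signal-safe functions, while GQueue operations are not. This sketch assumes a Linux SA_SIGINFO handler where `info->si_fd` carries the directory descriptor, as it does for dnotify signals configured with F_SETSIG. The names wake_pipe, on_dir_signal, setup_wakeup and next_changed_fd are hypothetical, not gamin's.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Read end [0] and write end [1] of the hypothetical wakeup pipe. */
static int wake_pipe[2] = { -1, -1 };

/* Set when the pipe was full and an event had to be dropped. */
static volatile sig_atomic_t events_lost = 0;

/* Only write(2) is called here, which is async-signal-safe. A write of
   sizeof(int) bytes is atomic because it is smaller than PIPE_BUF. */
static void on_dir_signal(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    int saved_errno = errno;
    int fd = info->si_fd;
    if (write(wake_pipe[1], &fd, sizeof fd) != (ssize_t)sizeof fd)
        events_lost = 1;
    errno = saved_errno;
}

/* Creates a non-blocking pipe and installs the handler for `sig`. */
static int setup_wakeup(int sig)
{
    if (pipe(wake_pipe) < 0)
        return -1;
    for (int i = 0; i < 2; i++)
        fcntl(wake_pipe[i], F_SETFL, fcntl(wake_pipe[i], F_GETFL) | O_NONBLOCK);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_dir_signal;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Called from the main loop: returns the next pending descriptor,
   or -1 once the pipe is drained. */
static int next_changed_fd(void)
{
    int fd;
    if (read(wake_pipe[0], &fd, sizeof fd) == (ssize_t)sizeof fd)
        return fd;
    return -1;
}
```

In a GLib program the read end could be watched with g_io_add_watch() on a channel from g_io_channel_unix_new(), which fits the "pipe hooked to the mainloop" design described above; if events_lost is set, the safe reaction is a full rescan.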

Comment 15 Daniel Veillard 2004-10-16 16:19:33 UTC

I rechecked the whole code path for 
  children = gam_tree_get_children(tree, dir_node);
and how it is walked. I still can't understand how that data, held in
local variables of the subroutines, could generate a loop or be
modified to that effect.
But to try to make progress I added some simple trick code detecting a
loop in the children list within gam_poll_scan_directory_internal()
to raise an error and break the loop if this happens.
I released a 0.0.15 version with that workaround
   http://www.gnome.org/~veillard/gamin/sources/
I would be very interested in feedback about this from those who had
trouble with 0.0.14 looping in their environment.
I don't consider the problem fixed though; it is a workaround until
I fully understand the problem.

Daniel
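The workaround described above (detect a loop, report an error, break out) can be approximated by bounding the walk: a list built from n children must reach NULL within n steps, so one more cell means the list is corrupted. This is a hypothetical sketch of that idea, not the code released in 0.0.15; collect_children_guarded() copies the data pointers into a caller-supplied array so the caller processes a snapshot of known length.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical list cell standing in for GList. */
typedef struct Cell {
    void *data;
    struct Cell *next;
} Cell;

/* Copies at most `max` data pointers from the list into `out`.
   Returns the number copied, or -1 if the list is longer than `max`,
   which for a list built from `max` children means it is corrupted
   (for example, cyclic). `dir` is only used in the error message. */
static long collect_children_guarded(const Cell *children, void **out,
                                     size_t max, const char *dir)
{
    size_t n = 0;
    for (const Cell *l = children; l; l = l->next) {
        if (n == max) {
            fprintf(stderr, "%s: loop detected in children list\n", dir);
            return -1;
        }
        out[n++] = l->data;
    }
    return (long)n;
}
```

Unlike the tortoise-and-hare check, this needs the expected length up front, but the child count of the directory node is already known when the list is built.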

Comment 16 Ralf Ertzinger 2004-10-18 15:53:20 UTC

0.0.15 looped for me today. Since I did not have the -debuginfo
package, and yum did not like me, I cannot provide further information.

Comment 17 Daniel Veillard 2004-10-21 15:26:46 UTC

I generated gamin-0.0.16 after fixing a couple of problems including
one in tree handling. I hammered on it seriously and could not reproduce
any kind of problem with it. I would very much appreciate if the
people who managed to get the looping effect could upgrade to 0.0.16
and report if they manage to reproduce the problem again:
   http://www.gnome.org/~veillard/gamin/sources/

  thanks,

Daniel

Comment 18 Nicholas Miell 2004-10-27 07:53:37 UTC

gamin 0.0.16 still loops

Comment 19 Sammy 2004-10-27 14:18:03 UTC

I can confirm this too but for the life of me have no idea what triggers it. 
I am running smp kernel with hyperthreading and using kde/kdm as gui.

Comment 20 Daniel Veillard 2004-10-28 15:07:00 UTC

*** Bug 137439 has been marked as a duplicate of this bug. ***

Comment 21 Neal Becker 2004-10-28 15:43:11 UTC

Created attachment 105898
trace

Comment 22 Neal Becker 2004-10-28 15:45:32 UTC

Created attachment 105899
debug

Comment 23 Neal Becker 2004-10-28 15:47:14 UTC

Comment on attachment 105899
debug

May contain sensitive information; please respect privacy.

Comment 24 Neal Becker 2004-10-29 12:03:59 UTC

Created attachment 105934
another gdb trace

Comment 25 Daniel Veillard 2004-10-31 17:13:07 UTC

Well, you would need the debuginfo for the gdb trace.
g_pattern_match is called indirectly from poll_file(),
node_add_subscription() or node_remove_subscription().
Since your log seems to indicate it is looping in
node_remove_subscription(), this again points to a loop over a
corrupted node list, this time a children list within a
directory...

Daniel

Comment 26 Kim Lux 2004-11-10 07:10:32 UTC

I've got the same problem with FC3 final.  I am ripping CDs with 
grip, running Kdevelop and listening to noatun when it happens. 
 
I'm running kernel 2.6.9-1.667smp.  This is the first time I've seen 
it do this, and I've been watching processes quite closely because I 
had an issue with artsd going nuts. (Sound problem.) 
 
I just killed gam_server in top.  I think there were 2 gam_servers 
running.  I killed the first PID and a second one jumped to the top 
of the list briefly.  It had a different PID. 
 
Let me know if there is anything I can do to help.

Comment 27 Philippe Rigault 2004-11-10 15:58:20 UTC

As with comment #26, I have not seen it previously (FC3-RC5) 
 
FC3 final, x86_64 (Sun W1100z), 2.6.9-1.667 (single CPU) 
 
Usage: KDE desktop (kontact, konqueror, kdevelop etc. K3b --ripping 
four sets of FC3 CDs). 
 
/usr/libexec/gam_server can eat up to 99% CPU 
When switching on debug (kill -s SIGUSR2 pid), I see this: 
 
# tail -f /tmp/gamin_debug_phCf 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
node_remove_subscription(â(*) 
 
I will watch it closely from now on.

Comment 28 Paulo moura Guedes 2004-11-10 22:11:15 UTC

Same thing here. 
 
FC3 final, i686_32, kernel-2.6.9-1.667 
Using KDE desktop.

Comment 29 Philippe Rigault 2004-11-11 13:15:56 UTC

I noticed that since I updated to FC3 final, many of my email 
messages end up being duplicated (I am using kmail with maildir 
format mailboxes). 
 
Can it be related to problems with gamin ?

Comment 30 Neal Becker 2004-11-11 13:24:43 UTC

This happens to me if I run more than 1 kmail (say, on 2 different 
machines using imap).  In this case, nothing to do with gamin.

Comment 31 Sammy 2004-11-11 14:33:44 UTC

I have the same setup. Could it be KDE related?

Comment 32 Ralf Ertzinger 2004-11-11 14:38:20 UTC

I do not run KDE, and I do see this very rarely (not at all during the
last two (three?) weeks).

But if I recall correctly, k3b (which is about the only kde program I
use) liked to trigger it.

Comment 33 David Fraser 2004-11-12 12:17:28 UTC

0.16 fixed it for me, thanks very much

Comment 34 Daniel Veillard 2004-11-12 15:26:37 UTC

I just released 0.0.17, where I have tried to cope with possible loops 
in the second place where gam_tree_get_children() is called, and
also made more changes and checks in that function too.
  http://www.gnome.org/~veillard/gamin/sources/
I would appreciate it if people having trouble could try that version
and report!

  thanks,

Daniel

Comment 35 Philippe Rigault 2004-11-12 16:08:53 UTC

Could you release RPMs in rawhide ? 
 
Thanks, 
 
Philippe

Comment 36 Daniel Veillard 2004-11-12 16:25:12 UTC

They are built and may show up within a day,

Daniel

Comment 37 P Jones 2004-11-15 00:13:08 UTC

I am using GNOME in FC3 Final, and I notice that gam_server is using
99-100% of CPU after I just ran K3B in GNOME.

Comment 38 Daniel Veillard 2004-11-15 11:25:26 UTC

Try 0.0.17; see comment #34.

Daniel

Comment 39 Ken Barber 2004-11-16 07:37:53 UTC

Just a quick "me too."

FC3 release, fully updated as of this post.  Running KDE, KMail, 2
instances of Konqueror as file manager, 2 idle command shells and
Firefox 1.0.

Most recent action:  some file management stuff (moving them around).
 Also gedit.

I'm not sure where to go to get "rpms in rawhide" (comment #35) but
I'll look for it and install it if I find it.

Comment 40 Philippe Rigault 2004-11-21 03:46:07 UTC

> I'm not sure where to go to get "rpms in rawhide" 
http://fedora.redhat.com/download/updates.html 
This page explains the different stages of development and updates 
after a release of Fedora Core has gone out: 
  - Fedora updates 
  - Proposed Fedora (aka testing) 
  - Development (aka rawhide) 
 
The 0.0.17 version of gamin is now in Fedora updates 
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/3

Comment 41 Philippe Rigault 2004-11-21 03:52:50 UTC

After upgrading to 0.0.17, I no longer see big hikes in CPU usage 
like before. 
 
However, I just noticed that there have also been messages like this 
in my syslog for a while: 
# grep gam /var/log/messages  
Nov 16 20:10:45 foo kernel: gam_server[5241]: segfault at 
0000000000000051 rip 00000000004038a7 rsp 0000007fbfffd3a8 error 4 
Nov 17 07:16:11 foo kernel: gam_server[5844]: segfault at 
0000000000000013 rip 00000000004038a7 rsp 0000007fbfffd3a8 error 4 
Nov 17 08:02:32 foo kernel: gam_server[14902]: segfault at 
000000000000000a rip 00000000004038a7 rsp 0000007fbfffd278 error 4 
Nov 17 11:07:06 foo kernel: gam_server[25002]: segfault at 
0000000000000013 rip 00000000004038a7 rsp 0000007fbfffd3a8 error 4 
Nov 17 23:03:08 foo kernel: gam_server[4699]: segfault at 
000000000000005c rip 00000000004038a7 rsp 0000007fbfffd3f8 error 4 
Nov 18 06:58:10 foo kernel: gam_server[3431]: segfault at 
000000000000005c rip 00000000004038a7 rsp 0000007fbfffd3f8 error 4 
Nov 18 07:03:00 foo kernel: gam_server[3722]: segfault at 
0000000000000008 rip 00000000004038a7 rsp 0000007fbfffd2c8 error 4 
Nov 18 07:05:08 foo kernel: gam_server[4694]: segfault at 
0000000000000050 rip 0000002a9557b920 rsp 0000007fbfffe480 error 4 
Nov 19 22:09:06 foo kernel: gam_server[3447]: segfault at 
000000000000005c rip 00000000004038a7 rsp 0000007fbfffd3f8 error 4 
Nov 19 22:09:26 foo kernel: gam_server[3863]: segfault at 
0000000000000013 rip 00000000004038a7 rsp 0000007fbfffd2c8 error 4 
Nov 19 22:38:32 foo kernel: gam_server[3935]: segfault at 
00000015000003f8 rip 0000002a9557b920 rsp 0000007fbffff6c0 error 4 
Nov 19 23:30:51 foo kernel: gam_server[12659]: segfault at 
00000006000003f8 rip 0000002a9557b920 rsp 0000007fbffff700 error 4 
Nov 20 09:13:34 foo kernel: gam_server[3445]: segfault at 
0000000000000061 rip 00000000004038a7 rsp 0000007fbfffd3f8 error 4 
Nov 20 20:47:23 foo kernel: gam_server[3411]: segfault at 
0000000000000047 rip 0000002a9557b920 rsp 0000007fbfffe470 error 4 
Nov 20 22:20:34 foo kernel: gam_server[19964] general protection 
rip:4046bb rsp:7fbfffe640 error:0 
Nov 20 22:20:45 foo kernel: gam_server[7190]: segfault at 
00000060000003f8 rip 0000002a9557b6b1 rsp 0000007fbffff7c0 error 4 
Nov 20 22:22:45 foo kernel: gam_server[10136]: segfault at 
0000000000000066 rip 0000002a9557c3a4 rsp 0000007fbffff6a0 error 4 
 
These have not gone away with 0.0.17.

Comment 42 Peter Eddy 2004-11-21 16:52:04 UTC

I'm using gamin-0.0.17-1.FC3 and found this morning that gam_server
was using 100% of one CPU on my dual-CPU machine. I do not see the
segfaults Philippe posted though.

Comment 43 Daniel Veillard 2004-11-21 23:22:07 UTC

For crashes or 100% CPU usage on 0.0.17, please follow the instructions
at http://www.gnome.org/~veillard/gamin/debug.html to try to 
provide feedback on what is happening. 

Daniel

Comment 44 Daniel Veillard 2004-11-24 11:24:17 UTC

*** Bug 140701 has been marked as a duplicate of this bug. ***

Comment 45 Peter Eddy 2004-11-25 15:33:51 UTC

So this is interesting, I have a huge (5K+ files), unorganized directory of
photographs on /mnt/ata0/www-images, and I don't have any .gamin config, so it's
polling since it's in /mnt/*, and the log file viewed after using the SIGUSR2
signal confirms that.

So I open up Konqueror (I'm using KDE) on that directory and see gam_server
using 20% of one CPU; it's constantly polling. Then I open one of the
photos with Kuickshow, and use the page up/page down keys to move back and forth
between images. Now gam_server's using 40% of one CPU. I open another Kuickshow
and repeat the previous step and gam_server's utilization goes to 79%. Another
Kuickshow, another 20% utilization.

Is this just an optimization issue? I'm assuming Konqueror and Kuickshow are
both gamin clients. Could gam_server be polling the same directory once for each
client?

Comment 46 Peter Eddy 2004-11-25 15:39:36 UTC

Oops, I meant to say that the utilization goes up 20% for each gamin client, I
did not mean to say that it went from 40% to 79%, it always went up 18-20%.

Comment 47 Daniel Veillard 2004-11-25 15:43:16 UTC

w.r.t. comment #45 and #46, this is totally unrelated to the current bug,
so please open a new bug report if you want feedback on this !

Daniel

Comment 48 Philippe Rigault 2004-11-26 16:00:57 UTC

OK, after a few days without crash or 100% CPU usage, it happened 
again. 
 
gamin-0.0.17-1.FC3 
kernel-2.6.9-1.681_FC3 x86_64 
 
KDE-3.3.1 (compiled from sources) 
 
CPU usage goes through the roof and the machine freezes solid (cannot get 
a console or ssh into the box, though ping responds), then after 5 minutes 
it goes back to normal (at this time, 'top' still shows a load of 26.00). 
 
Post-mortem (post-freezem actually) diagnosis: 
 
1. /var/log/messages 
Nov 26 10:43:26 mybox kernel: oom-killer: gfp_mask=0x1d2 
Nov 26 10:43:30 mybox kernel: DMA per-cpu: 
Nov 26 10:43:30 mybox kernel: cpu 0 hot: low 2, high 6, batch 1 
Nov 26 10:43:30 mybox kernel: cpu 0 cold: low 0, high 2, batch 1 
Nov 26 10:43:30 mybox kernel: Normal per-cpu: 
Nov 26 10:43:30 mybox kernel: cpu 0 hot: low 32, high 96, batch 16 
Nov 26 10:43:30 mybox kernel: cpu 0 cold: low 0, high 32, batch 16 
Nov 26 10:43:30 mybox kernel: HighMem per-cpu: empty 
Nov 26 10:43:30 mybox kernel: 
Nov 26 10:43:30 mybox kernel: Free pages:        1516kB (0kB HighMem) 
Nov 26 10:43:30 mybox kernel: Active:181 inactive:236594 dirty:0 
writeback:235967 unstable:0 free:379 slab:13567 mapped:2424 
pagetables:2311 
Nov 26 10:43:30 mybox kernel: DMA free:4kB min:12kB low:24kB 
high:36kB active:0kB inactive:9788kB present:16384kB 
Nov 26 10:43:30 mybox kernel: protections[]: 0 0 0 
Nov 26 10:44:06 mybox kernel: Normal free:1512kB min:1004kB 
low:2008kB high:3012kB active:724kB inactive:936588kB 
present:1031552kB 
Nov 26 10:44:52 mybox kernel: protections[]: 0 0 0 
Nov 26 10:45:13 mybox gpm[2410]: *** info [mice.c(1766)]: 
Nov 26 10:47:03 mybox kernel: HighMem free:0kB min:128kB low:256kB 
high:384kB active:0kB inactive:0kB present:0kB 
Nov 26 10:47:07 mybox gpm[2410]: imps2: Auto-detected intellimouse 
PS/2 
Nov 26 10:47:07 mybox kernel: protections[]: 0 0 0 
Nov 26 10:47:08 mybox kernel: DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4kB 
Nov 26 10:47:08 mybox kernel: Normal: 108*4kB 3*8kB 4*16kB 1*32kB 
1*64kB 7*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1512kB 
Nov 26 10:47:09 mybox kernel: HighMem: empty 
Nov 26 10:47:09 mybox kernel: Swap cache: add 276677, delete 40720, 
find 4408/4496, race 0+0 
Nov 26 10:47:09 mybox kernel: Out of Memory: Killed process 23735 
(gam_server). 
 
 
FWIW, I customized /etc/sysctl.conf with these lines added: 
# Control shared memory size 
# (added for PostgreSQL) 
kernel.shmall = 134217728 
kernel.shmmax = 134217728 
 
# Do not overcommit memory 
vm.overcommit_memory = 2 
 
2. in $HOME/.xsession-errors: 
gam_poll_scan_directory_internal(/home/user) loop detected 
gam_poll_scan_directory_internal(/home/user) loop detected 
gam_poll_scan_directory_internal(/home/user) loop detected 
... 
465 lines like this 
 
Hope this helps 
 
Philippe

Comment 49 Philippe Rigault 2004-11-26 17:01:17 UTC

Daniel, the SIGUSR2 trick is very nice, but as comment #6 points out, 
it is *very* verbose (it can fill 100MB in minutes), so it will exhaust 
any reasonable partition pretty quickly, and the fact that it writes 
to /tmp means bad consequences for the system when this fills up. 
 
So it is not currently usable as a way to track gamin permanently. I 
have to turn it on only for short periods of time, and sure enough, 
these are not the times when things go bad. 
 
I suggest two improvements: 
  1- Have the log directory configurable (defaults to /tmp) 
  2- Configure a MAX_SIZE for a log file, after which logs are 
rotated, possibly with compressing old ones automatically. 
 
Thanks, 
 
Philippe
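The two suggestions above can be sketched together: a log path built from a configurable directory, and a size cap after which the file is rotated. The CappedLog type and its functions are hypothetical; the sketch keeps a single rotated copy named "<name>.1" and omits compression.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical size-capped log file. */
typedef struct {
    char path[512];
    char old_path[520];
    long max_bytes;
    long size;          /* bytes currently in the active file */
    FILE *fp;
} CappedLog;

/* Opens (or appends to) "<dir>/<name>"; `dir` replaces a hard-coded /tmp. */
static int capped_log_open(CappedLog *cl, const char *dir,
                           const char *name, long max_bytes)
{
    snprintf(cl->path, sizeof cl->path, "%s/%s", dir, name);
    snprintf(cl->old_path, sizeof cl->old_path, "%s.1", cl->path);
    cl->max_bytes = max_bytes;
    cl->fp = fopen(cl->path, "a");
    if (!cl->fp)
        return -1;
    fseek(cl->fp, 0, SEEK_END);     /* pick up the size of an existing file */
    cl->size = ftell(cl->fp);
    return cl->size < 0 ? -1 : 0;
}

/* Appends `line`, rotating first if it would push the file past the cap.
   rename() replaces any older ".1" copy. If rename() fails, the "w"
   reopen truncates the active file instead, which loses its contents
   but still keeps disk usage bounded. */
static int capped_log_write(CappedLog *cl, const char *line)
{
    long len = (long)strlen(line);
    if (cl->size > 0 && cl->size + len > cl->max_bytes) {
        fclose(cl->fp);
        rename(cl->path, cl->old_path);
        cl->fp = fopen(cl->path, "w");
        if (!cl->fp)
            return -1;
        cl->size = 0;
    }
    if (fputs(line, cl->fp) == EOF)
        return -1;
    cl->size += len;
    return 0;
}

static void capped_log_close(CappedLog *cl)
{
    if (cl->fp)
        fclose(cl->fp);
    cl->fp = NULL;
}
```

With this shape, total disk usage stays under roughly twice max_bytes, so debug output could be left on for long periods.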

Comment 50 Daniel Veillard 2004-11-26 18:29:03 UTC

Okay, I have tried to track and change all usage of GList which may
potentially result in the loop we are seeing. Basically the analysis
is that list elements are freed, put back in the free pool, reused, and
then the pointer from the location where it was freed is modified.
That's the only explanation I can find to get a loop in the lists.
As a result I generated a new version with a lot of new cleanups; 
maybe this time I got it for good. Version 0.0.18 is available
as usual from the download page
   http://www.gnome.org/~veillard/gamin/downloads.html

w.r.t. comment #49, the goal really is to find the bug; I don't
think gathering days of logs is a good idea :-\ and since it is
apparently a race condition (though how, when it is single-threaded?),
adding the debugging code is likely to just avoid the problem.

Daniel
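The failure mode described above can be reproduced deterministically with a toy allocator. In this hypothetical sketch, freed cells go onto a LIFO free list (standing in for the pool that freed list elements return to), a stale copy of a pointer survives the free, and writing that stale pointer back into the list after the cell was reused as the new head closes a loop. None of these names are gamin's or GLib's.

```c
#include <stddef.h>

/* Hypothetical list cell and fixed-size pool. */
typedef struct Cell {
    int value;
    struct Cell *next;
} Cell;

#define POOL_SIZE 8
static Cell pool[POOL_SIZE];
static Cell *free_list = NULL;
static int pool_used = 0;

/* Hands out the most recently freed cell first, like a LIFO free pool. */
static Cell *cell_alloc(int value, Cell *next)
{
    Cell *c;
    if (free_list) {
        c = free_list;
        free_list = free_list->next;
    } else if (pool_used < POOL_SIZE) {
        c = &pool[pool_used++];
    } else {
        return NULL;
    }
    c->value = value;
    c->next = next;
    return c;
}

static void cell_free(Cell *c)
{
    c->next = free_list;
    free_list = c;
}

/* Builds a -> b -> c, frees c while a stale pointer to it is kept,
   reuses c's storage as a new head, then writes the stale pointer back. */
static Cell *build_cycle_via_stale_pointer(void)
{
    Cell *c = cell_alloc(3, NULL);
    Cell *b = cell_alloc(2, c);
    Cell *a = cell_alloc(1, b);

    Cell *saved_next = b->next;      /* stale copy of the pointer to c */
    b->next = NULL;
    cell_free(c);                    /* c goes back to the pool */

    Cell *head = cell_alloc(0, a);   /* the pool hands c's storage out again */
    b->next = saved_next;            /* bug: relinks the recycled cell */
    return head;                     /* head -> a -> b -> head -> ... */
}
```

Every pointer in the result is valid, which is consistent with comment #13's remark that the l->next pointers looked sane but formed a loop.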

Comment 51 Philippe Rigault 2004-11-26 18:50:57 UTC

Thanks for the quick response. 
 
> Version 0.0.18 is available 
Downloaded, built x86_64 RPM and installed. 
 
Side note: in the changelog of src.rpm, there is no entry for 
0.0-17.1 
 
> w.r.t. comment #49, the goal really is to find the bug, I don't 
> think gathering days of logs is a good idea :-\ and since it is 
> a race condition apparently (but how it is single-threaded) adding 
> the debugging code is likely to just avoid the problem 
 
Well, currently gathering *any* data is pretty much impossible, given 
how fast it writes in /tmp. User has to turn debug off in a hurry, so 
the SIGUSR2 feature becomes sort of useless for users to help you. 
 
Besides, the goal of debug is to find any bug, not only this one I 
think. 
 
Cheers, 
 
Philippe

Comment 52 Philippe Rigault 2004-12-01 23:03:40 UTC

OK, it happens again as I speak 
 
gamin-0.0.18-1 consumes all CPU 
FC3 x86_64 
kernel-2.6.9-1.681_FC3 x86_64 
Using KDE 
 
1. Top 
top - 17:55:19 up 5 days,  6:29,  6 users,  load average: 1.21, 0.62, 0.24
Tasks:  94 total,   2 running,  92 sleeping,   0 stopped,   0 zombie
Cpu(s): 98.7% us,  1.3% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   1024700k total,   980336k used,    44364k free,   159164k buffers
Swap:  1534168k total,      808k used,  1533360k free,   437340k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
23049 foobar    25   0  6824 2672 5060 R 97.2  0.3   2:52.77 gam_server
 5144 root      15   0  213m 122m  93m S  1.7 12.2  36:01.05 X
19834 foobar    16   0  154m  23m 150m S  0.7  2.3   2:06.04 kdeinit
    1 root      16   0  4736  616 4524 S  0.0  0.1   0:02.16 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.39 ksoftirqd/0
    3 root       5 -10     0    0    0 S  0.0  0.0   0:03.10 events/0
    4 root      10 -10     0    0    0 S  0.0  0.0   0:00.00 khelper
    5 root      15 -10     0    0    0 S  0.0  0.0   0:00.00 kacpid
   42 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 kblockd/0
 
2. kill -SIGUSR2 23049: the debug file is spitting this: 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
Queue Full 
769 lines like this, it seems to print them by little groups every 
few seconds. 
 
3. Syslog: 
$ sudo grep gam /var/log/messages 
Nov 29 14:14:29 lw1 kernel: gam_server[3353]: segfault at 
00000001000003f8 rip 0000002a95690920 rsp 0000007fbffff6e0 error 4 
Dec  1 16:56:22 lw1 kernel: gam_server[3903]: segfault at 
00000006000003f8 rip 0000002a956906b1 rsp 0000007fbffff5b0 error 4 
 
Cheers, 
 
Philippe

Comment 53 Daniel Veillard 2004-12-01 23:48:17 UTC

Queue Full is a report from the signal handler. There is more
than 500 kernel events stacked for processing.
Run gam_server under gdb, possibly started from a vt console
to try to find why there is a segfault or where it is looping.
  http://www.gnome.org/~veillard/gamin/debug.html
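
Daniel's explanation describes a bounded backlog of kernel events. A minimal sketch of such a queue in C, assuming a fixed-size ring buffer with the 500-event limit he mentions; `event_queue`, `queue_push`, and `queue_pop` are hypothetical names, not gamin's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity, taken from the "more than 500" figure above. */
#define EVENT_QUEUE_CAPACITY 500

typedef struct {
    int events[EVENT_QUEUE_CAPACITY]; /* e.g. descriptors that reported a change */
    size_t head;                      /* index of the oldest pending event */
    size_t count;                     /* number of pending events */
} event_queue;

void queue_init(event_queue *q) {
    q->head = 0;
    q->count = 0;
}

/* Returns false (the "Queue Full" case) rather than overwriting
 * events that have not been processed yet. */
bool queue_push(event_queue *q, int ev) {
    assert(q->count <= EVENT_QUEUE_CAPACITY);
    if (q->count == EVENT_QUEUE_CAPACITY)
        return false;
    q->events[(q->head + q->count) % EVENT_QUEUE_CAPACITY] = ev;
    q->count++;
    return true;
}

/* Removes the oldest event; returns false if nothing is pending. */
bool queue_pop(event_queue *q, int *ev) {
    if (q->count == 0)
        return false;
    *ev = q->events[q->head];
    q->head = (q->head + 1) % EVENT_QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

In this sketch the caller decides what to do when `queue_push` fails (log "Queue Full", drop the event, or schedule a rescan); which of these gam_server actually does is not stated in this thread.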

I will need a stack trace; it should be possible to get one in
your case.

Daniel

Comment 54 Philippe Rigault 2004-12-02 00:48:57 UTC

Stack trace. 
 
note: when gamin-0.0.18 was built (from the src.rpm), it was linked 
with the copy of glib-2.0 that I compiled from sources together with 
my KDE.  
 
$ gdb gam_server 23049 
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh) 
Copyright 2004 Free Software Foundation, Inc. 
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...gam_server: No such file or directory.

Attaching to process 23049
Reading symbols from /usr/libexec/gam_server...Reading symbols from /usr/lib/debug/usr/libexec/gam_server.debug...done.
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
done.
Reading symbols from /opt/kde3.3.1/lib64/libglib-2.0.so.0...done.
Loaded symbols for /opt/kde3.3.1/lib64/libglib-2.0.so.0
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
0x0000002a956905b7 in g_list_last () from /opt/kde3.3.1/lib64/libglib-2.0.so.0
(gdb) where
#0  0x0000002a956905b7 in g_list_last () from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#1  0x0000002a9569069d in g_list_append () from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#2  0x0000000000403fc1 in gam_tree_get_children (tree=0x546000, root=0x524a40) at gam_tree.c:265
#3  0x00000000004043ba in remove_directory_subscription (node=0x51b340, sub=0x51d600) at gam_poll.c:559
#4  0x00000000004056b3 in gam_poll_consume_subscriptions () at gam_poll.c:998
#5  0x0000000000408a73 in gam_dnotify_consume_subscriptions_real (data=0x546000) at gam_dnotify.c:212
#6  0x0000002a9569423a in g_main_context_dispatch () from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#7  0x0000002a95696617 in g_main_context_iterate () from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#8  0x0000002a956969aa in g_main_loop_run () from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#9  0x00000000004037c2 in main (argc=0, argv=0x0) at gam_server.c:340
#10 0x0000002a958314ca in __libc_start_main () from /lib64/tls/libc.so.6
#11 0x0000000000402b3a in _start ()
#12 0x0000007fbffff888 in ?? ()
#13 0x000000000000001c in ?? ()
#14 0x0000000000000001 in ?? ()
#15 0x0000007fbffffad1 in ?? ()
#16 0x0000000000000000 in ?? ()

Comment 55 Philippe Rigault 2004-12-02 01:14:14 UTC

>note: when gamin-0.0.18 was built (from the src.rpm), it was linked  
>with the copy of glib-2.0 that I compiled from sources together with  
>my KDE.  
 
Just to clarify, this is glib-2.4.7.

Comment 56 Daniel Veillard 2004-12-02 10:49:15 UTC

Hummm ... If you recompile stuff by yourself, this may raise problems
that others using the pristine distro will not get. On the other hand,
you then have the opportunity to rebuild glib with the following
configure flags, which would help find the exact location of the
corruption:

   --disable-mem-pools --enable-gc-friendly

then run it again under gdb or valgrind. The problem comes from a
corrupted memory pool.

Daniel

Comment 57 Sitsofe Wheeler 2004-12-02 22:41:50 UTC

I'm getting the same backtrace as comment #54, on an i386 FC3 system that is
up to date as of 1 Dec except for having gamin 0.0.18 installed.

Comment 58 Sitsofe Wheeler 2004-12-02 22:45:43 UTC

Created attachment 107793 [details]
yet another gdb backtrace

I tried the SIGUSR2 trick but absolutely nothing happens. gamin seems to be
sitting stuck yet using CPU...

Comment 59 John Summerfield 2004-12-03 07:16:49 UTC

This also happens in Nahant-1 which has 0.0.9-1. I'm running KDE, have
done little more than read email (evolution then later kmail), report
bugs (epiphany) and do konsole stuff.

The system is an Evectra Pentium III 600EB (so no SMP, no
hyperthreads) with 128 MB of RAM (so lots of swapping).

There doesn't seem to be a lot for me to add other than the datapoint
RHEL4 is impacted. I was about to trash the system and was scouting
round for valuables when I noticed it was somewhat sluggish.

Note that I have a similar report (looping) on Evolution. When I saw
Evolution hogging the CPU, gam_server was pretty active too, so it may
be that the real problem was gamin, but I blamed Evolution because it
was the most active at the times I checked.

Comment 60 Daniel Veillard 2004-12-03 10:08:55 UTC

w.r.t. comment #57 and #58

gam_tree_get_children basically does
    list = NULL;

     for all children
         list = g_list_append(list, children_data);

and the stack trace shows that the generated list gets corrupted!
The error is somewhere else: this can only be rationally explained
if the list memory pool gets corrupted, and, as I said in #56,
running a specially compiled glib version is the best way to
reproduce the problem and catch the corruption when it happens
rather than its side effect.
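
The backtrace in comment #54 shows g_list_append() calling g_list_last(), which walks ->next until it reaches NULL. If a corrupted pool ever links a list back onto itself, that walk never ends, which would look exactly like a gam_server stuck at 100% CPU. A minimal sketch of detecting such a cycle with Floyd's tortoise-and-hare algorithm, using a simplified node type that is a hypothetical stand-in, not glib's GList:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified doubly linked node, shaped like a GList node but NOT glib's type. */
typedef struct node {
    void *data;
    struct node *next;
    struct node *prev;
} node;

/* Floyd's tortoise-and-hare: returns true if following ->next from head
 * ever revisits a node, i.e. the list is circular somewhere. */
bool list_has_cycle(const node *head) {
    const node *slow = head, *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}

/* "Find the last node" that returns NULL on a corrupted (circular) list
 * instead of spinning forever the way a plain walk to ->next == NULL would. */
const node *list_last_checked(const node *head) {
    if (head == NULL || list_has_cycle(head))
        return NULL;
    while (head->next != NULL)
        head = head->next;
    return head;
}
```

A check like this only reports the symptom; as Daniel says, finding where the list was corrupted still needs a debug build of the allocator.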

Comment 61 Sitsofe Wheeler 2004-12-03 10:58:53 UTC

Ouch. I can rebuild glib, but I can't constantly run gam_server under
gdb/valgrind. I can attach valgrind/gdb to gam_server after I notice
it's gone loco, though.

The setup is that I am administering multi-user machines, so I
only see the aftermath: I don't know what steps are actually causing
this, nor can I force users to run things in a debug mode all the time.

Comment 62 Daniel Veillard 2004-12-03 11:45:17 UTC

Sitsofe, I assume you are running a normal Fedora Core kernel and
the glib2 also coming from Fedora Core, right ? 
Did you reboot the machine after upgrading to 0.0.18, to be sure that
no process used an old gam_server?
I'm just trying to be 100% sure I'm not chasing something related to
inotify or a different release.

Daniel

Comment 63 Sitsofe Wheeler 2004-12-03 12:49:48 UTC

Daniel, yes, I am running a normal FC3 kernel (kernel-2.6.9-1.681_FC3)
and normal glib2 (glib2-2.4.7-1). No, I didn't reboot after upgrading
to gamin 0.0.18, but I know it probably isn't an old gam_server
because its start time is Dec 02, which is after the RPM install time
of Tue 30 Nov 2004 15:52:50.

However, if you are unconvinced, I suppose I can reboot all the machines
and wait to see if this happens again...

Comment 64 Daniel Veillard 2004-12-03 15:33:20 UTC

Can people try 0.0.19 that I uploaded at
  http://www.gnome.org/~veillard/gamin/downloads.html
I did yet another pass over all GList usage that could lead to
any kind of list pool corruption: I reduced the GList API used by
gam_server to a minimal set, and I added a copy of the GList
implementation directly in gam_server, with the memory pool disabled
and freed list items poisoned.
I have been hammering it for a couple of hours, and I'm still
unable to reproduce any crash or loop.
Please try 0.0.19 and report back, as I'm running out of ideas about
what is happening or how to solve it.
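
Poisoning freed list items, as described above, can be sketched as follows. This is an illustration under assumptions: the `list_item` type, the 0xA5 pattern, and the function names are hypothetical and not the code shipped in gamin 0.0.19.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified list item; a stand-in for GList, not the real type. */
typedef struct list_item {
    void *data;
    struct list_item *next;
    struct list_item *prev;
} list_item;

/* Arbitrary pattern; pointers built from it are almost certainly invalid. */
#define POISON_BYTE 0xA5

/* Fill the item with POISON_BYTE. Writing through a volatile pointer keeps
 * the compiler from deleting these stores as "dead" when free() follows. */
void list_item_poison(list_item *item) {
    volatile unsigned char *p = (volatile unsigned char *)item;
    for (size_t i = 0; i < sizeof *item; i++)
        p[i] = POISON_BYTE;
}

/* Allocate directly with malloc(): there is no free-list pool that could
 * hand a still-referenced item to another list. */
list_item *list_item_new(void *data) {
    list_item *item = malloc(sizeof *item);
    if (item != NULL) {
        item->data = data;
        item->next = NULL;
        item->prev = NULL;
    }
    return item;
}

/* Poison before freeing, so code that still holds a pointer to this item
 * reads garbage links and is more likely to fail visibly near the bug. */
void list_item_free(list_item *item) {
    if (item == NULL)
        return;
    list_item_poison(item);
    free(item);
}
```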

  thanks,

Daniel

Comment 65 Sitsofe Wheeler 2004-12-03 16:08:44 UTC

Sure thing. I'm going away for a few days, so it will be mid next week
before I get back to you on this. Do you want the machines to be rebooted
before reports are submitted?

My one and only thought on this: are people using any binary
drivers? I don't think I've seen this 100% CPU usage happen (yet) on
a machine without the nvidia binary drivers on it...

Comment 66 Paulo moura Guedes 2004-12-03 16:14:17 UTC

I have nvidia drivers, but the kernel module was built on my machine.

Comment 67 Ralf Ertzinger 2004-12-03 16:19:30 UTC

I have seen this on a machine without nvidia drivers. I do have vmware
drivers (occasionally, not always), but I cannot remember any
correlation between vmware being loaded and gam_server eating CPU time.

Comment 68 Daniel Veillard 2004-12-03 16:23:12 UTC

Sitsofe, running killall gam_server as root after the upgrade would do.
And it's unrelated to kernel drivers.

Daniel

Comment 69 Philippe Rigault 2004-12-03 16:37:32 UTC

> Can people try 0.0.19  
Updated. Now testing.  
  
I noticed that the /usr/libexec/gam_server process is *not* killed
upon exit from my (KDE) graphical session. In other words, logging
in/out repeatedly ends up using the very same process through
different graphical sessions. Shouldn't _all_ user processes started
with a graphical session be killed upon exit?
  
I will compile glib with   
 --disable-mem-pools --enable-gc-friendly  
in the next few days with KDE-3.3.2.  
  
> Hummm ... If you recompile stuff by yourself, this may raise problems
> that others using the pristine distro will not get.
Besides the fact that "pristine" distro users _are_ getting it, this
is _good_, as you noticed, because non-standard builds may help
find/troubleshoot more bugs.
   
And since you mention the word "pristine", let me tell you that if  
Redhat/Fedora's implementation of KDE were indeed pristine and not so  
crippled (i.e downgraded menus/apps/configs resulting in a poor man's  
common denominator with Gnome, inheriting in the process some of its  
_bad_ user-interface-guidelines like the infamous  
double-click-by-default), you would have more users running your  
distro. Many KDE users have gone away from Redhat at the time  
Bluecurve and the bright "unified-desktop" idea came out. Since I  
always compile KDE from sources, it does not affect me that much but  
I am a somewhat rare case of KDE enthusiast on Fedora.  
  
Last but not least, did you consider using the C++ STL to replace
glib in gamin?
  
Cheers,  
  
Philippe

Comment 70 Daniel Veillard 2004-12-03 16:45:23 UTC

> I noticed that the /usr/libexec/gam_server process is *not* killed  
> upon exit from my (KDE) graphical session.

  the server exits after 30 seconds without client connection.
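
One plausible way to implement the idle exit described above, as a sketch: the server counts its connected clients and exits once it has had none for 30 seconds. `server_state` and the helper functions are hypothetical names, not gamin's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Timeout from the comment above; the rest of this sketch is hypothetical. */
#define IDLE_EXIT_SECONDS 30

typedef struct {
    int    client_count; /* currently connected clients */
    time_t idle_since;   /* when client_count last dropped to zero */
} server_state;

void on_client_connect(server_state *s) {
    s->client_count++;
}

void on_client_disconnect(server_state *s, time_t now) {
    if (s->client_count > 0 && --s->client_count == 0)
        s->idle_since = now;
}

/* Checked periodically from the main loop. Any single client that stays
 * connected keeps this false, so the server never exits. */
bool should_exit(const server_state *s, time_t now) {
    return s->client_count == 0 &&
           difftime(now, s->idle_since) >= IDLE_EXIT_SECONDS;
}
```

This also fits what Philippe found in comment #77: one stale process that contacted gam_server after logout was enough to keep it alive.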

> I will compile glib with   
> --disable-mem-pools --enable-gc-friendly  
> in the next few days with KDE-3.3.2.  

  Not needed, 0.0.19 has its own copy of the GLib list code

> pristine and KDE 

  Not my business, I don't use it; my point is the reproducibility
of reports. I learned, for example, that depending on the automake
version, something as simple as gamin 0.0.18 gets compiled completely
differently. Pristine means that the bug report is valid for others
using the distro.

> did you consider using the C++ STL to replace  glib in gamin

  the client side does *NOT* use glib. The client side of FAM
was using the C++ STL, forcing all clients to load the library :-(
That's one of the reasons we rewrote the package altogether.

The server side is a standalone program, based on glib because 
    - we know glib well
    - I don't want to code in C++

Daniel

Comment 72 Ken Barber 2004-12-03 18:15:47 UTC

I have not seen a problem since installing 0.0.17 (see comment #39).  
Since I have nothing further to report, I am removing myself from the 
CC: list.

Comment 74 Peter Eddy 2004-12-06 22:52:06 UTC

I've just been running 0.0.19 for about a half-hour, trying to
reproduce this as well as bug 140920. So far it seems to be
dramatically better than 0.0.18. I see at most 8% CPU utilization,
even with many clients.

Comment 75 Peter Eddy 2004-12-08 15:14:53 UTC

Hmm, I was able to trigger the 100% (well, 98-88%) CPU usage case
again this morning. Not sure what I did, and I can't get it to happen
again, but I did have to do a killall gam_server to recover. This is
with 0.0.19.

Comment 76 Philippe Rigault 2004-12-09 15:18:24 UTC

>> I noticed that the /usr/libexec/gam_server process is *not* killed
>> upon exit from my (KDE) graphical session.
 
>  the server exits after 30 seconds without client connection. 
 
Definitely not in my case. When I logout from 
KDE, /usr/libexec/gam_server does not exit (I watched it for 10 
minutes before killing it). I have verified from a root shell that 
the user has no more processes on the machine 
(except /usr/libexec/gam_server).  
 
PS: Haven't reproduced the high-CPU usage yet with 0.0.19

Comment 77 Philippe Rigault 2004-12-09 15:26:21 UTC

Oops, after double checking, I *had* one stale process somewhere 
which apparently established contact with gam_server after I logged 
out. 
Killing it allowed gam_server to exit by itself now.

Comment 78 Sammy 2004-12-09 15:29:36 UTC

How about kdm? Are you running it?

Comment 79 Philippe Rigault 2004-12-09 15:36:33 UTC

> How about kdm? Are you running it? 
If the question is to me, the answer is yes (as root of course).

Comment 80 Peter Eddy 2004-12-09 15:43:19 UTC

In versions up to and including 0.0.19, I've been able to trigger the
100% CPU usage problem by following these steps: I open a single
Konqueror window on a directory that contains a lot of photographs.
Then I open a photo with Kuickshow (right click image icon-> open with
-> Kuickshow) and use the mouse scrollwheel over the Kuickshow image
to move to the next or previous image in the directory. I usually open
about 10 individual Kuickshow applications and use the mouse wheel to
move between images in each one (not sure this moving between images
is important, but I think it may be.) Finally I close all the images
by selecting "Close All" from the KDE panel (having enabled the "Group
Similar tasks" panel option)

I, and other people at the organization I contribute my time to, have
 performed these same steps with RH9 and FC1 without seeing this problem.

I'm using gdm

Comment 81 Daniel Veillard 2004-12-09 15:54:26 UTC

Please provide the information about the process state, a gdb
stack trace, and a fragment of the log generated using SIGUSR2, as
pointed out previously, if you reproduce this with 0.0.19.

  http://www.gnome.org/~veillard/gamin/debug.html#Debugging1

If you have a reproducible way to trigger this, then switch
gam_server to debugging mode with SIGUSR2 before the problem occurs,
make it hang following your recipe, kill it, and attach the debug
output found in /tmp to this bug.
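
A minimal sketch of how a daemon can let users flip debug logging with `kill -USR2 <pid>`, assuming a simple on/off toggle; `debug_enabled` and `install_debug_toggle` are hypothetical names, and this is not gamin's actual handler:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag; the real gam_server internals may differ. */
static volatile sig_atomic_t debug_enabled = 0;

/* Only flip the flag here. Opening and writing the debug file in /tmp
 * belongs in the main loop, since stdio calls are not async-signal-safe. */
static void on_sigusr2(int signo) {
    (void)signo;
    debug_enabled = !debug_enabled;
}

/* Install the handler; returns 0 on success, -1 on error (see errno). */
int install_debug_toggle(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR2, &sa, NULL);
}
```

Keeping the handler down to a single `sig_atomic_t` store is the conventional safe pattern; the main loop can check the flag on each iteration and open or close the log file there.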

  thanks,

Daniel

Comment 82 Philippe Rigault 2004-12-09 16:50:25 UTC

About comment#80  
I cannot trigger the 100% usage by your recipe. 
Using KDE-3.3.2 compiled from sources (gcc-3.4.2) on FC3. 
 
I tried on two architectures (i386: P4/768 MB RAM and x86_64:
Opteron/1 GB RAM), with a
directory containing 1010 images (all of them ~100 kB 600x400 JPEGs):
open >10 kuickshow instances by right-clicking, scroll a bit in
each, close all -> exits fine.
Is this "a lot of photographs" by your standards ?

Comment 83 Peter Eddy 2004-12-09 17:04:49 UTC

This has happened only twice after moving to 0.0.19; previously it
would happen consistently. In my case, "a lot" == about 10000 files.
(Don't ask me why they like to 'organize' their photos this way)

I don't think the architecture should matter, but the machine in
question is a dual PIII 800 MHz with 1 GB RAM running KDE on a vanilla,
up-to-date FC3 install.

Comment 84 Peter Eddy 2004-12-10 15:59:47 UTC

Created attachment 108323 [details]
debug output from /tmp from before the 100% utilization occurred

Comment 85 Daniel Veillard 2004-12-10 16:33:22 UTC

w.r.t. comment #84

I looked at the logs, you are doing 2 bad things:
  1/ you ask FAM to watch a directory with 10,000+ files in it
  2/ that directory is under /mnt

 1/ means that when gam_server needs to check for modifications, it
has to stat() every file in the directory, which amounts to 10,000
stat() calls and checks
 2/ gamin does not use the kernel notification API for directories
that may be temporary mount points, like /mnt/..., so it uses a
1 second timeout and rechecks for changes every time.
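
The cost of 1/ combined with 2/ can be sketched in C. Everything here is an assumption for illustration: `use_polling_for` is a made-up stand-in for gamin's real dnotify-versus-poll decision, and `poll_scan_once` shows only the stat() sweep a poller has to do. At one pass per second over 10,000 entries, that is 10,000 stat() calls per second, or 600,000 per minute, before any change is even found.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical rule, for illustration only: treat /mnt and anything below
 * it as a possibly temporary mount that is polled instead of using dnotify. */
bool use_polling_for(const char *path) {
    return strcmp(path, "/mnt") == 0 || strncmp(path, "/mnt/", 5) == 0;
}

/* One polling pass over `dir`: stat() every entry, as a poller must when
 * the kernel cannot say what changed. Returns the number of stat() calls,
 * or -1 if the directory cannot be opened. */
long poll_scan_once(const char *dir) {
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long stat_calls = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        struct stat st;
        if (stat(path, &st) == 0) {
            /* A real poller would compare st.st_mtime and st.st_size with
             * the previous pass here and report any differences. */
        }
        stat_calls++;
    }
    closedir(d);
    return stat_calls;
}
```

The work per pass grows linearly with the number of entries, which is why a 10,000-file directory under a polled mount keeps the CPU busy even when nothing changes.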

  The conjunction of 1/ and 2/ means gam_server spends its time
checking your files. It's not really a software loop, not a bug,
but how it is designed at the moment.
  It is not the same problem as the one this bugzilla entry was opened for.
You can probably avoid the problem by removing either 1/ or 2/,
but I can't find a fix for your problem, given that
kernel dnotify must not be used on /mnt/... files and that maintaining
the FAM semantics on a 10,000-entry directory requires a stat() of every
entry in that directory if the kernel can't tell they were not modified.

   You're pushing the FAM API to the limit of what your computer can
handle, so this doesn't work well...

Daniel

Comment 86 Peter Eddy 2004-12-10 18:02:37 UTC

Yes, I believe I mentioned the conditions before. However, this
problem didn't exist in FC1; I'm only seeing it after upgrading to FC3
(all drives mounted under /mnt are exactly the same as what I was
using in FC1; I didn't modify them when I installed FC3).
Also, the 100% utilization persists after I close the client, which I
think must be a bug.

For some reason I'm now able to reproduce this consistently using just
one Konqueror session (no Kuickviews). I haven't changed any
configuration or updated since upgrading to 0.0.19. I did reboot to
see if that had any effect, but it didn't.

For my part, I can easily disable gamin on /mnt/*; however, this
problem seems like a regression from FC1.

Comment 87 Kim Lux 2004-12-14 06:21:53 UTC

I've got gam_server taking 100% of the cpu as well.  

It looks like it is triggered by a combination of a process that I
wrote that collects data from a machine and puts it in the directory
and accessing the same directory from Konqueror.  This is just a guess
on my part as I haven't thoroughly tested it.  

The directory now has 17,500 files, each about 250KB in size.

I don't have time to run tests on this, I just thought I'd share that
I've got a similar problem.

Comment 88 Dr J Austin 2004-12-15 20:47:28 UTC

I've just seen this problem
maui ~ 1001# uname -a
Linux maui.ee.port.ac.uk 2.6.9-1.667 #1 Tue Nov 2 14:41:25 EST 2004 i686 i686 i386 GNU/Linux
maui ~ 1002# uptime
 20:43:49 up 17 days, 10:26,  3 users,  load average: 1.60, 1.51, 1.09
Not seen until today !!!!
Full FC3 install
No help tying it down I'm afraid

Comment 89 Andrew Athan 2004-12-17 15:42:35 UTC

I just experienced this on Fedora Core 3 Test 3, following a full "yum
update" (gamin-0.0.17-1.FC3).

Earlier in this bug, dnotify was mentioned. About 2-3 weeks ago I was
experiencing some problems with Courier's IMAP server when running on
very large Maildirs, and my searches led me to some posts that
implied there were some basic deficiencies in dnotify on Linux.
Perhaps there is some issue with these system calls?

Comment 90 James Ryley 2005-01-02 00:58:54 UTC

Doesn't look like there is any resolution of this issue from reading
the comments above.  I have the same problem when dealing with large
amounts of files (up to 100,000).  It is pretty reproducible.  Note
that I am not trying to view the files themselves, but just look at
directories which contain many files.  I understand that, according to
comment 85, using FAM/gamin on a directory with this many files is not
advisable.  But, I have seen no comments on how to turn it off.  Is
that possible?  If so, what are the repercussions?

Steps to reproduce:

1. Unzip archive with 10,000 - 100,000 files in it, into a folder.
2. View the folder (the machine thinks for a while, then reports the
number of files in the folder.  Shortly after this gam_server starts
to take 100% of one CPU on a dual Xeon machine).

Comment 91 Johnny Hughes 2005-01-12 10:44:31 UTC

I am using gamin-0.0.15-1.x86_64 on a RHEL4-B2 machine, and I just had
gamin max out. This is with the x86_64 install on an Athlon64 3000+
processor.

I don't currently have time to troubleshoot, but as I didn't see
Athlon64 mentioned before, I thought I would add the comment.

Comment 92 Daniel Veillard 2005-01-12 10:57:31 UTC

I suggest people try 0.0.20 as it has a potential fix for most 
corruptions raised so far. 
   http://www.gnome.org/~veillard/gamin/downloads.html

Daniel

Comment 93 Peter Eddy 2005-01-14 01:01:38 UTC

I don't see any difference between .19 and .20 related to this bug. I
did 'killall gam_server' before testing .20 and verified that I had a
new PID before testing. When I closed the gam client, gam_server
continued to run using 100% of one CPU until I killed it after about
two minutes.

Comment 94 Ellen Shull 2005-01-15 20:36:21 UTC

Ok, I (original reporter of this bug) haven't seen this bug in quite a while
now.  But now that I think about it...  around the time FC3 went final, I
rearranged my drive setup.  

1.  Everything (including / and /home) had been on a slow RAID 5 array of 5400
RPM IDE drives.  When I installed FC3, I did a fresh install on a 10k SCSI drive.

2.  I didn't move any of my junk over, just mounted the old array on /slow.
$ ls -R /slow/home/wes | wc -l
31647
$ ls -R /home/wes | wc -l
2516

3.  The old install had been continuously hand-upgraded (using rpm, not
anaconda) since rhl8 or so.

Now, looking at comment #85 from Daniel... I wonder if maybe the original bug I
reported is indeed fixed. Now that I think about it, in the later occurrences I
saw (and didn't bother adding here, because I saw nothing new in them at the
time), it was still looping, but over a large number of files, not the small
number I saw at first. My slow drive array, coupled with gamin for some reason
using the timer-based rescan instead of dnotify, might explain it. (But in my
comment #6 above I note dnotify in the backtrace; I don't remember whether that
was a small or large # of files loop.)

So questions for Daniel:
A)  Exactly what logic does gamin use to decide if it can use dnotify or not? 
Has that changed at all?  (I wonder if something about my setup due to item 3
above caused it to misidentify paths on my home directory and not use dnotify)
B)  When gamin starts watching a directory, does it always do so recursively? 
(/slow/home/wes only has a few hundred entries itself, it's all the subdirs of
stuff that makes it big)

Happy New Year, everyone.  Let's see if we can't get this bug closed before we
hit 100 comments ;-)

Comment 95 Ilya Boyandin 2005-03-11 14:33:03 UTC

I experienced the problem with gamin-0.0.24-1.FC3 in Fedora 3 on an HP Compaq
workstation with an Intel Celeron processor. For no obvious reason, gam_server
starts to take 100% of the CPU time until it is killed along with nautilus.

Comment 96 Chris Underhill 2005-03-15 21:03:13 UTC

I've just been subject to this bug. I'm using FC3+all updates. rpm installed is
gamin-0.0.25-1.FC3 and I'm on an 64-bit AMD platform. I fixed the problem by
restarting my X server. Killing the process on its own just resulted in it
respawning and, after a short period, using nearly 100% cpu again. 

Unfortunately all I thought to do was strace the offending gam_server process.
It was spinning trying to read the file

/usr/local/share/applications/mimeinfo.cache

and a couple of other files from that directory, which don't exist on my system.
Their correct path does not have 'local' therein.

Comment 97 Robin Green 2005-03-24 20:13:36 UTC

gamin sucks, how the **** do I turn it off??

Comment 98 Richard Körber 2005-03-26 23:48:34 UTC

I have this issue on my system too: gamin 0.0.25 on Fedora Core 3/AMD64.

My workaround is sending a SIGSTOP (killall -19 gam_server), which freezes
the gam_server process. As soon as the copy process has finished, I send a
SIGCONT (killall -18 gam_server) again. gam_server then works just as expected,
consuming very little CPU time.
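
The same stop/continue workaround from C, as a sketch: `freeze_process`, `thaw_process`, and `demo_stop_continue` are hypothetical helper names around the standard kill(2) and waitpid(2) calls. Using SIGSTOP and SIGCONT by name is safer than 19 and 18, because signal numbers vary between architectures. While stopped, gam_server does not run at all, so clients get no change notifications until it is continued.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Like `killall -STOP gam_server`, but for one known PID.
 * SIGSTOP cannot be caught or ignored, so the target always stops. */
int freeze_process(pid_t pid) {
    return kill(pid, SIGSTOP);
}

/* Like `killall -CONT gam_server`, but for one known PID. */
int thaw_process(pid_t pid) {
    return kill(pid, SIGCONT);
}

/* Demo on a throwaway child process: returns 0 if the child was observed
 * stopped and then continued, -1 otherwise. */
int demo_stop_continue(void) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause(); /* child: idle until killed */
    }
    int status, rc = -1;
    if (freeze_process(child) == 0 &&
        waitpid(child, &status, WUNTRACED) == child && WIFSTOPPED(status) &&
        thaw_process(child) == 0 &&
        waitpid(child, &status, WCONTINUED) == child && WIFCONTINUED(status))
        rc = 0;
    kill(child, SIGKILL);
    waitpid(child, &status, 0); /* reap the child */
    return rc;
}
```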

I hope this bug will be found soon. It's rather annoying. :)

Comment 99 Chris Sherman 2005-04-12 18:54:43 UTC

I'm running 32-bit fedora Core 3, with all the latest updates according to
'yum', and was getting the 99% CPU usage by gam_server after I added some
pictures to one directory, and added some symlinks to some scripts in
.gnome2/nautilus-scripts/ which I then used to play with those images.  All the
tricks above didn't work (including restarting it several times), but then I
upgraded to 0.0.26-1, restarted, and everything went back to normal.

I got the upgrade from:
http://download.fedora.redhat.com/pub/fedora/linux/core/development/i386/Fedora/RPMS/

Comment 100 Joe Orton 2005-05-27 11:45:10 UTC

$ rpm -qf /usr/libexec/gam_server
gamin-0.0.25-1.FC3.i386

I'm seeing the issue described in comment 98.  I'll attach the debug output
generated by sending the daemon SIGUSR2.

Comment 101 Joe Orton 2005-05-27 11:48:59 UTC

Created attachment 114906 [details]
gamin debug output

The strace output loops like this:

stat64("/usr/local/share/applications/defaults.list", 0xbfffe70c) = -1 ENOENT (No such file or directory)
open("/usr/local/share/applications/defaults.list", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
stat64("/home/jorton/.local/share/applications/mimeinfo.cache", 0xbfffe70c) = -1 ENOENT (No such file or directory)
open("/home/jorton/.local/share/applications/mimeinfo.cache", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
stat64("/home/jorton/.local/share/applications/defaults.list", 0xbfffe70c) = -1 ENOENT (No such file or directory)
open("/home/jorton/.local/share/applications/defaults.list", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)

Comment 102 Daniel Veillard 2005-05-29 07:41:00 UTC

try to update to 0.0.26 which fixes crashes and CPU consumption, or even
better 0.1.0 which also fixes a bunch of other problems.

Daniel

Comment 103 Joe Orton 2005-05-31 15:40:58 UTC

Do you have a copy of this built for FC3 somewhere?

Comment 104 Philippe Rigault 2005-05-31 16:37:28 UTC

> Do you have a copy of this built for FC3 somewhere?  
There should be an FC3 update if gamin-0.1.0 fixes problems. However, there is
none for the moment. You can just do an rpmbuild from the src.rpm.
 
For Daniel: I noticed that the following files disappeared from gamin-devel in 
version 0.1.0: 
 
/usr/lib64/libfam.la 
/usr/lib64/libgamin-1.la 
 
and this breaks things with libtool. 
Therefore, upgrading FC3 to 0.1.0 for me is a no-no.

Comment 107 Neil Bird 2005-09-26 18:38:07 UTC

At the risk of being just another "me too", I'm getting this for the first time
after upgrading to FC4 (from 3).  gamin-0.1.3-1.FC4.i386.rpm

Not sure what the USR2 trick everyone's on about is; it didn't do anything for me.

In a week, I've caught it twice: once I strace'd it cycling through an infinite
loop over a directory's contents; the second time, strace gave zero output.

Should this bug still be tagged "devel"?

Comment 109 Niklas Nylund 2005-10-27 14:47:05 UTC

I am often seeing excessive CPU usage with the Ubuntu packages of 0.1.5
(0.1.5-0ubuntu1) as well. gam_server regularly goes fubar and consumes 99% CPU
time. It's highly irritating; it has come to the point that I have a cron job
that kills it automatically every 15 minutes.

Anyway, I tried the SIGUSR2 trick and got the output. Is there any other useful
info I should attach here? I can't attach gdb to the running process, since
everything is compiled without the debug option.

Here's the output when debugging is turned on for gam_server,
http://albin.abo.fi/~ninylund/dump/gamin-debug/gamin_debug_Op1icj
http://albin.abo.fi/~ninylund/dump/gamin-debug/gamin_debug_Oszxi0
http://albin.abo.fi/~ninylund/dump/gamin-debug/gamin_debug_dFcUS5

Comment 110 Sitsofe Wheeler 2005-10-27 18:18:54 UTC

Interesting... Are non-Fedora/Red Hat gamin bugs OK here? Niklas, did you
really reproduce this problem with Red Hat's rawhide, or did you just choose that
because there was nothing else?

Comment 111 Niklas Nylund 2005-10-27 20:28:25 UTC

Oh bummer, I didn't think about the fact that this is Red Hat's bugzilla. There
was just a link on gamin's homepage that took me here.

I guess I didn't use rawhide, since I've never heard of such a thing.

Comment 112 Jesse Barnes 2005-11-10 17:41:15 UTC

I see this as well, on my one processor Thinkpad laptop.  I haven't noticed a 
pattern that triggers the excessive CPU consumption, I just occasionally 
notice the CPU usage meter in my system tray get pegged at 100% (either all 
user time or all system time, the latter is probably gam_server calling poll 
in a tight loop).

Comment 113 Alexander Larsson 2005-12-15 13:47:38 UTC

This bug is "interesting".

Originally there was a bug that caused gamin to really go into a tight 100% cpu
loop, looping over a circular list. However, we now believe that the circular
list bug is fixed, and other reports of this is mainly about gamin using lots of
CPU when polling a large directory or when getting lots of change events from
dnotify (e.g. when downloading something fast).

It would be nice if people seeing this could try to determine what sort of
problem they are seeing. I.E. When this happens, attach to gamin (with debuginfo
installed) and see if its just spinning over one particular list forever.

Comment 114 Neil Bird 2005-12-15 14:18:29 UTC

What version do you think it was fixed in?  Last one I investigated was comment#107

Comment 115 Dean Kolosiek 2006-01-11 01:24:07 UTC

This happened to me in FC4 with gamin-0.1.0-1.1. CPU usage would peg for a
while, then be normal for a while, on the order of 30 to 90 seconds.

I was running an application that reads and writes a couple large files at a
time for several minutes. I was watching the directory in File Browser and
clicked Reload in the browser several times, and deleted a few files at a time,
a few minutes before I noticed the CPU usage. The directory has about 220 files,
and is in the /data partition which is mounted under / It is a dual processor
system.

When I did kill -SIGUSR2 the CPU usage immediately went back to normal and
remained normal. I'm attaching the beginning of the debug file.

Comment 116 Dean Kolosiek 2006-01-11 01:31:34 UTC

Created attachment 123024 [details]
beginning of gam_server log file

from FC4 gamin-0.1.0-1.1

Comment 117 Miles Lane 2006-02-02 06:43:32 UTC

I am seeing this problem with gamin-0.1.7-1.1 (FC5T2 + current rawhide).
I didn't have the debug package installed at the time.  I will try to reproduce
and send gdb info.

Comment 118 Trevor Cordes 2006-02-17 16:33:58 UTC

I am seeing this bug a lot when I use Konqueror under GNOME to browse for files
(i.e. as a file manager) on my big NFS server. On a 2.4 GHz P4 with 2 GB of RAM
it'll go up to 70-90% CPU and stay there for dozens of seconds. It'll do this even
when I'm not doing anything with the folders that Konqueror is browsing.

It definitely is Konqueror triggering it, because I don't normally use KDE apps,
didn't use to, and never had this problem before. I'm not sure whether the fact
that I'm browsing a 2 TB NFS server is exacerbating the problem.

FC3 gamin-0.1.1-3.FC3

This isn't just a FC bug:

http://www.gnusolaris.org/cgi-bin/trac.cgi/ticket/60
http://www.irclogs.ws/freenode/kde/30Oct2005/13.html

I just renamed and killed the gam_server binary and now all my gnome apps are
going nuts using 100% CPU:

27852 trevor    25   0 17664 3616 3120 R 16.7  0.2   0:44.11 gnome-settings-
30113 trevor    22   0  232m  65m  32m R 16.4  3.2  30:49.73 soffice.bin
10130 trevor    25   0 26320 8692 5360 R 14.4  0.4   1:34.74 gnome-panel
27932 trevor    25   0 19260 2484 2152 R 13.8  0.1   0:14.60 gnome-vfs-daemo
10674 trevor    25   0  142m  59m  15m R 13.1  2.9  13:41.91 galeon
27918 trevor    25   0 44808 5988 4832 R 12.8  0.3   1:00.80 nautilus

Seems to be stuck this way.  As I kill those apps one by one the others take up
the slack to use up 100% CPU.  Guess I have to restore the file.

The other thing is, I swear that over a week or two of an uninterrupted X
session, gam_server goes nuts more frequently. It may just be my perception,
though.

Comment 119 Ken 2006-02-18 07:22:52 UTC

Why is this 'file modification detection server' polling directories? This seems
to be suicide if there are large numbers of child nodes... why doesn't it just
listen to events fired by the kernel?

Comment 120 Nicholas Miell 2006-02-18 07:31:21 UTC

Some filesystems (i.e. NFS) don't fire events.

Comment 121 David Acker 2006-03-22 19:31:18 UTC

I saw this problem using 2.6.15-1.1833_FC4 and gamin-0.1.1-3.FC4.  I was using
firefox-1.0.7-1.2.fc4 to download a 132 MB file from a server on our local 100
Mb LAN.  I was in GNOME at the time.  Here is a snippet from top:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11222 dacker    16   0  123m  52m  20m R 30.2  5.2   2:27.62 firefox-bin
11067 dacker    16   0  2464 1208  872 R 19.6  0.1   0:01.93 gam_server
11083 dacker    15   0 35116  17m  11m S  6.0  1.7   0:03.45 nautilus

This happens if I save the file to the desktop.  If I save it to my home directory,
things get better:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11222 dacker    15   0  123m  52m  20m S 30.3  5.2   3:19.01 firefox-bin
11067 dacker    15   0  2464 1208  872 S 10.6  0.1   0:26.35 gam_server

Comment 122 Hin-Tak Leung 2006-05-05 18:09:36 UTC

I am seeing this on FC5, and I don't use NFS at all. It is just
the default(?) LVM2 root on ext3, /boot on ext3, and a few sshfs/fuse
mounts.

Comment 123 Trevor Cordes 2006-07-18 18:33:47 UTC

See also bug 196444 with regards to constant 4% CPU usage for no reason and
*huge* mem leak.  Probably related.  Bugs confirmed in FC3, 4 and 5.  gam_server
blows.

Comment 124 Ken 2006-07-18 18:44:45 UTC

(In reply to comment #120)
> Some filesystems (i.e. NFS) don't fire events.

Perhaps, but even NFS must go through the kernel, I believe.

Unless there is really some exception to this (even on newer kernels), it seems
like gam_server needs to switch into some kind of callback mode and notify its
client via those messages instead of polling directories. Retain the polling
mode for really old kernels; that's fine.
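Comments 119, 120 and 124 turn on polling versus kernel notification. As a rough illustration of why polling gets expensive on large directories, here is a minimal Python sketch of the general snapshot-and-diff polling technique. It is not gamin's actual poll backend: each cycle stats every entry in the directory and compares the result with the previous cycle, so the work per cycle grows with the number of entries whether or not anything changed. The names `take_snapshot`, `diff_snapshots` and `poll` are hypothetical.

```python
import os
import time
from typing import Dict, Iterator, List, Tuple

# name -> (mtime, size): a simplified, hypothetical notion of "file state"
Snapshot = Dict[str, Tuple[float, int]]
Diff = Tuple[List[str], List[str], List[str]]


def take_snapshot(path: str) -> Snapshot:
    """Stat every entry in `path`: one stat() per entry on every call."""
    snap: Snapshot = {}
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except FileNotFoundError:
                continue  # entry vanished between listing and stat()
            snap[entry.name] = (st.st_mtime, st.st_size)
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> Diff:
    """Return (created, deleted, changed) entry names, each sorted."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(n for n in new.keys() & old.keys() if new[n] != old[n])
    return created, deleted, changed


def poll(path: str, interval: float, cycles: int) -> Iterator[Diff]:
    """Rescan `path` every `interval` seconds, yielding one diff per cycle."""
    prev = take_snapshot(path)
    for _ in range(cycles):
        time.sleep(interval)
        cur = take_snapshot(path)
        yield diff_snapshots(prev, cur)
        prev = cur
```

Kernel notification avoids the per-entry rescan, but changes made by another host on an NFS export never pass through the local kernel, which is why polling remains the fallback there.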

Comment 125 Trevor Cordes 2006-07-18 19:08:20 UTC

I ran a few tests.  Killing all konqueror and any other KDE apps I could find
(there were none) didn't help.  gam_server still eats a constant 3.5-4.5% CPU on
my 2.4 P4.  I tried killing/restarting gam_server after that and it immediately
starts up again and still eats 3.5-4.5%.  I can't easily umount my NFS as it's
used for important server processes.

Comment 126 Trevor Cordes 2006-07-20 14:13:14 UTC

Another possible common thread: another reporter says they have a 2TB fs
(local).  I have a 2TB fs (non-local) mounted over NFS.  Do other reporters have
large fs's?  Also, I run SMP kernel.

Comment 127 Jan "Yenya" Kasprzak 2006-08-14 08:09:05 UTC

I want to add a "me too":
RHEL4 on quad-cpu x86_64 server, I have a 2TB+ volume (over lvm2) as well, 
gamin-0.1.1-3.EL4
I run NFS server, but there are also some directories NFS-mounted to this server.

Comment 128 Hin-Tak Leung 2006-08-14 11:17:25 UTC

I see I put myself in the cc-list in May, but honestly I haven't seen the problem 
recently (on FC5, still only local LVM volumes < 80GB + sshfs/fuse mounts).
I assume that the bug has either been fixed since, or that I have somehow managed
to work around it - I have switched off nautilus in the session manager, for
example, since I don't use nautilus at all anyway.

Comment 129 John Cant 2006-08-25 12:16:10 UTC

Another "me too" here.
Two dual-core CPUs, x86_64 workstation running the 2.6.9-34 kernel (Red Hat ES 4
Workstation) with 8 GB RAM.
I have a couple of 2+ TB LVM arrays.
gam_server using 85-95% of one CPU.

Not sure when this issue started or what set it off.

Comment 130 Matt Kenigson 2007-02-20 18:52:07 UTC

Another "me too".
Dual Xeon fileserver running 2.6.9-34.0.1.ELsmp with 4 GB RAM.  Attached to a Fibre
Channel SAN (multi-TB).  Many other servers NFS-mount and CIFS-mount to this box.
gam_server using 80-95% of one CPU constantly.

Comment 131 gregrwm 2007-04-27 20:17:17 UTC

me too.
gamin-0.1.1-4.EL4
thunderbird-1.5.0.10-0.1.el4.centos
kernel-2.6.9-42.0.8.EL

i used to see this a lot on a prior system running courier-imap.  running dovecot
on this one.  wasn't seeing the gam_server hang much recently until a recent yum
update, when among others thunderbird was upgraded (to above) from
thunderbird-1.5.0.9.  now i frequently find gam_server eating all available CPU.
kill it, and it immediately gets respawned.  kill thunderbird, and gam_server
quiets down.  relaunch thunderbird and things are fine again for a couple days.
if you want more info let me know what would be helpful.

Comment 132 Andy Holland 2007-06-13 11:58:41 UTC

This is not a "Medium" problem - it is a very severe problem. In years of 
running Linux, I have never seen a package perform like this - it's practically 
a virus.

gam_server constantly causes problems, and I must renice it. It seems to have a 
terrible interaction with KDE. This has been going on for too long; a package 
needs to be created to remove this software, or it needs to be fixed!

This is not a medium bug - just do a google search on gam_server!!!!!

Comment 133 gregrwm 2007-06-13 20:04:42 UTC

i have reduced the effects of this bug by (1) launching a cron job every 15 minutes
to renice all gam_server processes to bottom priority, and (2) backing out
thunderbird from 1.5.0.10 to 1.5.0.9, which for whatever reason seems to
encounter the bug far less frequently.  but of course, the bug remains.
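The renice workaround from comment 133 could be scripted roughly as follows. This is a minimal Python sketch, assuming a Linux `/proc` layout: it matches processes by the command name recorded in `/proc/<pid>/stat` and lowers their priority with `os.setpriority`. `find_pids` and `renice_all` are hypothetical names, and renicing other users' gam_server processes requires running it as root.

```python
import os
from typing import List


def find_pids(name: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose command name (the field in parentheses in
    /proc/<pid>/stat) equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo and similar
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # process exited between listdir() and open()
        # The name itself may contain ')', so cut at the last ')'.
        comm = stat[stat.find("(") + 1 : stat.rfind(")")]
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


def renice_all(name: str, niceness: int = 19, proc_root: str = "/proc") -> List[int]:
    """Set the nice value of every matching process; return the PIDs changed."""
    changed = []
    for pid in find_pids(name, proc_root):
        try:
            os.setpriority(os.PRIO_PROCESS, pid, niceness)
            changed.append(pid)
        except OSError:
            pass  # exited meanwhile, or not permitted (another user's process)
    return changed


if __name__ == "__main__":
    print("reniced:", renice_all("gam_server"))
```

Scheduled from root's crontab with an entry such as `*/15 * * * * /usr/bin/python3 /usr/local/sbin/renice_gam.py` (the script path is hypothetical), this mirrors the 15-minute interval described above. Like the original workaround, it only limits the damage; gam_server still spins.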

Comment 134 Ellen Shull 2008-01-05 13:47:59 UTC

Gamin is up to version 0.1.9 now in F8 and rawhide; F7 has 0.1.8, and even RHEL4
has been updated as far as 0.1.7.  Older distro releases are EOL/in maintenance
support at best (i.e. go grab an SRPM and update it yourself).  There have been
no new reports added to this bug in half a year.  I personally haven't seen it
in *years*.  Is anyone experiencing this with a semi-current version of gamin? 
Or can we finally close this one?

Comment 135 Kleven Bingham 2008-01-06 22:33:00 UTC

We have just begun upgrading our compute farm to RHEL4 and we are seeing this
problem.

We have a fairly complex NFS setup with several netapps volumes.

I'm not the sysadmin, so I don't have root access.

To give a little insight, we have set up a single machine with freenx and are
using it for our local site session server.  I am only paying attention to this
machine right now, but I know that other RHEL4 machines have had issues prior to
this when users were starting VNC sessions before the freenx transition.

We did not have this problem with RHEL3.

lngl0116:/home/kbingham-> rpm -qf /usr/libexec/gam_server 
gamin-0.1.7-1.2.EL4

Here are the contents of the gaminrc file:
lngl0116:/home/kbingham-> more /etc/gamin/gaminrc 
# configuration for gamin
# Can be used to override the default behaviour.
# notify filepath(s) : indicate to use kernel notification
# poll filepath(s)   : indicate to use polling instead
# fsset fsname method poll_limit : indicate what method of notification for the filesystem
#                                  kernel - use the kernel for notification
#                                  poll - use polling for notification
#                                  none - don't use any notification
#
#                                  the poll_limit is the number of seconds
#                                  that must pass before a resource is polled again.
#                                  It is optional, and if it is not present the previous
#                                  value will be used or the default.

fsset nfs poll 10                 # use polling on nfs mounts and poll once every 10 seconds


Not all users are seeing this run out of control:

lngl0116:/home/kbingham-> ps -eaf | grep gam_server
ssirun    1096     1  0  2007 ?        00:05:31 /usr/libexec/gam_server
hsales    2055     1  0 Jan02 ?        00:00:56 /usr/libexec/gam_server
szanatta  2882     1  0 Jan03 ?        00:00:11 /usr/libexec/gam_server
dreed     3710     1  0 Jan03 ?        00:00:22 /usr/libexec/gam_server
jkoller   3801     1  0  2007 ?        00:00:12 /usr/libexec/gam_server
jlawson   6022     1  0 Jan04 ?        00:01:40 /usr/libexec/gam_server
wstrickl  8332     1 36 Jan05 ?        11:33:38 /usr/libexec/gam_server
nphillip  9248     1  0  2007 ?        00:00:41 /usr/libexec/gam_server
nmysore  13140     1 55 Jan04 ?        1-04:11:33 /usr/libexec/gam_server
rkhan    13352     1  0  2007 ?        00:01:54 /usr/libexec/gam_server
bonfanti 23065     1  0 Jan03 ?        00:00:21 /usr/libexec/gam_server
bcruiksh 24066     1  0 Jan03 ?        00:00:08 /usr/libexec/gam_server
mbarnes  24673     1  0 Jan02 ?        00:00:20 /usr/libexec/gam_server
bgreiner 24736     1 54 Jan04 ?        1-01:59:05 /usr/libexec/gam_server
kbingham 25283     1  0 Jan05 ?        00:00:01 /usr/libexec/gam_server
kbingham 25419 21283  0 15:28 pts/4    00:00:00 grep gam_server
mfalkinb 26903     1  0 Jan05 ?        00:00:01 /usr/libexec/gam_server
lphillip 27510     1  0 Jan04 ?        00:00:38 /usr/libexec/gam_server
jkeefer  29207     1 48 Jan05 ?        18:10:39 /usr/libexec/gam_server


Any suggestions?
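To pick out the runaway instances in a `ps -ef` listing like the one in comment 135, a small parser can convert the TIME column (`[DD-]hh:mm:ss`, e.g. `1-04:11:33`) to seconds and filter on the C column, which ps reports as processor utilization. A minimal Python sketch, assuming the standard `ps -ef` column layout shown above (single-token STIME and TTY fields); `cputime_seconds` and `runaways` are hypothetical helper names.

```python
import os
from typing import List, Tuple


def cputime_seconds(field: str) -> int:
    """Convert a ps TIME value ([DD-]hh:mm:ss, e.g. '1-04:11:33') to seconds."""
    days = 0
    if "-" in field:
        day_part, field = field.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(x) for x in field.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def runaways(ps_output: str, name: str = "gam_server",
             min_c: int = 10) -> List[Tuple[str, int, int]]:
    """Return (user, pid, cpu_seconds) for `name` processes whose C column is
    at least `min_c`, busiest first. Assumes `ps -ef` columns:
    UID PID PPID C STIME TTY TIME CMD..."""
    hits = []
    for line in ps_output.splitlines():
        cols = line.split()
        # Skip the header, short lines and other commands (e.g. "grep gam_server").
        if len(cols) < 8 or not cols[3].isdigit() or os.path.basename(cols[7]) != name:
            continue
        if int(cols[3]) >= min_c:
            hits.append((cols[0], int(cols[1]), cputime_seconds(cols[6])))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Run against the full listing in comment 135, it flags four instances, nmysore, bgreiner, jkeefer and wstrickl, busiest first.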

Comment 136 Kleven Bingham 2008-01-07 18:00:46 UTC

UPDATE to my comment #135:

We did not see this previously:

http://kbase.redhat.com/faq/FAQ_85_11914.shtm

Our sysadmin is doing the upgrade to gamin-0.1.7-1.4.EL and we will see if we 
have any additional issues.

Comment 137 Mark Richards 2008-02-19 12:19:23 UTC

2.6.9-42.0.10.ELsmp #1 SMP Tue Feb 27 09:40:21 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

Sorry, Wes - I'm running 0.1.9 on a production server and experience the runaway
problem.

gam_server will behave for a few hours - sometimes a few days.

I wrote a custom daemon that uses the fam-2.7.0 library.  gam_server is, of
course, required by fam.

As a temporary solution a cron job now stops my daemon.  Doing this is not
enough, however.  I still have to kill gam_server, and then restart my daemon. 
(I'm a little worried about the effect of killing gam_server in the midst of
some operation).

Is there a better alternative?  My daemon monitors a few directories and
triggers actions when files appear.  gam runs in polling mode because I don't
want to re-build the kernel on the production box.  Maybe this isn't a problem
if it runs from the kernel?

I've tried the config file trick:

/etc/gamin/gaminrc:

fsset ext3 poll 5

Still, no joy.
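The `fsset` lines used in comments 135 and 137 follow the grammar described in the comment header of the shipped gaminrc: a filesystem name, a method (`kernel`, `poll` or `none`) and an optional poll limit in seconds. The Python sketch below parses just those rules to make the format concrete. It is not gamin's own parser; `parse_fsset_rules` and `FsSet` are hypothetical names, and treating everything after `#` as a comment is an assumption based on the trailing comment in the shipped file.

```python
from dataclasses import dataclass
from typing import List, Optional

# Methods listed in the comment header of the shipped gaminrc.
VALID_METHODS = {"kernel", "poll", "none"}


@dataclass
class FsSet:
    fsname: str                # filesystem type, e.g. "nfs" or "ext3"
    method: str                # "kernel", "poll" or "none"
    poll_limit: Optional[int]  # seconds between polls; None = previous/default value


def parse_fsset_rules(text: str) -> List[FsSet]:
    """Collect the `fsset fsname method [poll_limit]` rules from gaminrc text.

    `notify` and `poll` path rules are skipped in this sketch."""
    rules = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment anywhere
        if not line:
            continue
        fields = line.split()
        if fields[0] != "fsset":
            continue
        if len(fields) not in (3, 4) or fields[2] not in VALID_METHODS:
            raise ValueError(f"line {lineno}: malformed fsset rule: {raw!r}")
        limit = int(fields[3]) if len(fields) == 4 else None
        rules.append(FsSet(fields[1], fields[2], limit))
    return rules
```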

Comment 138 Bug Zapper 2008-05-14 01:56:55 UTC

Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 139 Bug Zapper 2009-06-09 21:58:34 UTC

This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 140 Bug Zapper 2009-07-14 18:28:49 UTC

Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.


aacanj
andyh
bcs
bkleven
chkr
chrisw
dennis.coonich
djh
dm
gczarcinski
george.soley
gmasci
greenrd
htl10
iib
jasons
jgoldin
jorton
kas
ken2006
lux
mark.richards
mkenigson
mmacleod
moura
ninylund
nmiell
petere
prigault
redhat-bugzilla
redhat-bugzilla
rhbugzi
richard.cunningham
ron
sitsofe
stefan.hoelldampf
trevor
tsmetana