Bug 244633

Summary: gam_server using 100% of CPU (strace shows polling)
Product: Red Hat Enterprise Linux 4 Reporter: Alexander Larsson <alexl>
Component: gamin Assignee: Tomáš Bžatek <tbzatek>
Status: CLOSED WONTFIX QA Contact:
Severity: urgent Docs Contact:
Priority: medium    
Version: 4.5 CC: ahabig, andrew.simmonds, andyh, brett.morrow, Colin.Simpson, dimitrios.gerasimatos, gunther.mayer, hyclak, jan.iven, kajtzu, linux_support, nilsson, olle, pasteur, paul.boin, richard.cunningham, rkhadgar, sa, srigler, stefan.hoelldampf, tsmetana, ttsig, voetelink
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-20 13:30:20 UTC Type: ---
Bug Depends On: 240154    
Bug Blocks:    
Attachments:
Description Flags
The output of strace -tt -p <pid> none

Description Alexander Larsson 2007-06-18 08:26:55 UTC
+++ This bug was initially created as a clone of Bug #240154 +++
-- Additional comment from alexl on 2007-06-15 04:39 EST --
There are two problems here. One is tom getting a real 100% CPU loop, i.e.
running strace on it shows nothing.

The other problem is gamin doing lots of polling.  This has been observed
before, and seems to be related to something KDE does (it only happens with KDE). I
believe this is due to gamin polling too often on NFS, and it can be fixed by
configuring gamin to poll less often. In fact, the reason for this update was to
add a way to set this option globally so that such problems could be fixed.

-------------------------------

This bug is for the split out case where gamin does a lot of polling.

Comment 1 Alexander Larsson 2007-06-19 10:29:01 UTC
If you get this loop (with strace showing continuous polling of the same files)
can you please attach with gdb (with gamin-debuginfo installed) and get a few (5
or so) backtraces at different times?
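
A minimal Python sketch of how such a series of backtraces could be collected, assuming gdb is on the PATH and the caller is allowed to attach to the process. The helper names gdb_backtrace_command and sample_backtraces, the default sample count, and the delay are illustrative assumptions, not anything gamin or gdb ships.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, prints one backtrace, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def sample_backtraces(pid, samples=5, delay_seconds=2.0, run=subprocess.run):
    """Collect several backtraces from the same process at different times.

    `run` is injectable so the loop can be exercised without gdb installed.
    """
    traces = []
    for i in range(samples):
        result = run(gdb_backtrace_command(pid), capture_output=True, text=True)
        traces.append(result.stdout)
        if i < samples - 1:
            time.sleep(delay_seconds)  # space the samples out in time
    return traces
```

Calling sample_backtraces(pid) with the pid of the busy gam_server would return a list of five backtrace texts. With gamin-debuginfo installed the frames should carry function names and line numbers.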

Comment 2 Alexander Larsson 2007-06-19 10:36:21 UTC
This could possibly be gamin getting stuck in the for loop in
gam_poll_generic_scan_directory_internal(), similarly to bug 240154. I don't
know how that would happen, but it would be consistent with the behaviour that was
reported.

Comment 3 Need Real Name 2007-06-25 14:04:13 UTC
I see similar problems with gam_server polling the hell out of local files (not
nfs) on an RHEL4 x86_64 system. 

stat("/root/.local", 0x7fbfffdf80)      = -1 ENOENT (No such file or directory)
stat("/root/.gnome2", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat("/root/.local", 0x7fbfffdf80)      = -1 ENOENT (No such file or directory)
stat("/root/.gnome2", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat("/root/.local", 0x7fbfffdf80)      = -1 ENOENT (No such file or directory)
stat("/root/.gnome2", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat("/root/.local", 0x7fbfffdf80)      = -1 ENOENT (No such file or directory)

etc.
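
One way to see which paths dominate a capture like the one above is to tally the stat-family calls per path. This is a minimal Python sketch, not part of gamin. The function name count_polled_paths and the regular expression are illustrative assumptions, and the pattern only covers the stat, stat64, lstat and lstat64 forms.

```python
import re
from collections import Counter

# Matches calls such as stat("/path", ...) or stat64("/path", ...).
# fstat() is deliberately not matched: it takes a file descriptor, not a path.
STAT_CALL = re.compile(r'\b(?:l?stat(?:64)?)\("([^"]*)"')


def count_polled_paths(strace_lines):
    """Return a Counter mapping each path to the number of stat calls seen for it."""
    counts = Counter()
    for line in strace_lines:
        match = STAT_CALL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the saved strace output line by line and calling most_common() on the result lists the most heavily polled paths first.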

Comment 4 Dennixx 2007-07-05 08:34:58 UTC
Previously, strace showed nothing on the high-CPU gam_server processes; this time,
however, it shows excessive polling. Two users each have a high-CPU gam_server,
and strace shows the following output for both:

----
stat64("/etc/fstab", {st_mode=S_IFREG|0644, st_size=1330, ...}) = 0
stat64("/etc/mtab", {st_mode=S_IFREG|0644, st_size=1463, ...}) = 0
stat64("/etc/fstab", {st_mode=S_IFREG|0644, st_size=1330, ...}) = 0
stat64("/etc/mtab", {st_mode=S_IFREG|0644, st_size=1463, ...}) = 0
----

Comment 5 Alexander Larsson 2007-07-06 08:00:12 UTC
Can you pass -tt to strace to get the times too?

And, can you test the packages attached to bug 240154 to see if they help for
this too.
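
With -tt each strace line starts with a wall-clock timestamp of the form HH:MM:SS.microseconds, which makes it possible to estimate how fast the polling is. The following is a minimal Python sketch under that assumption. The function name calls_per_second is hypothetical, and a capture that crosses midnight is not handled.

```python
from datetime import datetime


def calls_per_second(strace_tt_lines):
    """Average syscall rate from `strace -tt` lines, or None if it cannot be computed."""
    stamps = []
    for line in strace_tt_lines:
        fields = line.split(None, 1)
        if not fields:
            continue
        try:
            stamps.append(datetime.strptime(fields[0], "%H:%M:%S.%f"))
        except ValueError:
            continue  # line does not start with a -tt timestamp
    if len(stamps) < 2:
        return None
    elapsed = (stamps[-1] - stamps[0]).total_seconds()
    if elapsed <= 0:
        return None  # e.g. the capture wrapped past midnight
    # N timestamps span N - 1 intervals.
    return (len(stamps) - 1) / elapsed
```

A rate of a few calls per second points at a short poll interval, while thousands per second suggest a loop that never waits.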

Comment 6 Dennixx 2007-07-06 08:10:54 UTC
Created attachment 158645
The output of strace -tt -p <pid>


Comment 7 Dennixx 2007-07-16 14:00:08 UTC
I've been running the test package for 10 days now, and I haven't seen the
problem anymore.

Comment 8 sa@tmt.ca.boeing.com 2007-07-16 14:39:32 UTC
We are also running with the gamin-0.1.7-1.2.EL4.0signalsafe.2
version and have not experienced a reoccurrence of this problem.

Comment 9 sa@tmt.ca.boeing.com 2007-07-16 15:11:40 UTC
I need to amend my previous comment.  On one system we do see occasional spikes
from gam_server: it will still consume 100% of the CPU, but that lasts for <10
seconds and then it settles back down.  Over the course of a 7 day period,
however, the current gam_server process has accumulated just under 7hrs of CPU time,
which is still a sizable amount of time.

An strace taken while it is in its 100% mode, before it settles back down, shows it
fixated on the user's .kde/share/config/korgacrc.lockxxxxxx.tmp files.

Comment 10 Alexander Larsson 2007-07-23 08:22:52 UTC
sa: Maybe that is when the .tmp files are being written to constantly?

Comment 11 Andy Holland 2007-08-03 14:03:38 UTC
In my experience, when gam_server begins using 100% of one of my processors, 
it DOES NOT BACK DOWN. It must be manually killed, and when the thing 
restarts, it is reniced to lower the probability and frequency of using 100% 
of one of our CPUs. 
 
THIS IS AN EXCEEDINGLY URGENT BUG. It acts like a virus - it spawns, it won't 
die, I can't remove it from my system, it uses 100% of a vital resource. It's 
the worst bug I have ever seen in my entire life.   
 
There ought to be a patch to get rid of this mo This is a lousy response to 
this urgent bug, the silly excuses must end. This is a terrible performance 
hit and it obviously affects some systems differently from others. 

Comment 12 Wade Mealing 2007-08-29 00:40:05 UTC
Andy, I think that you might be confusing bugs. The bugzilla for the bug where
it does chew 100% CPU is 240154 (which has been fixed and has a workaround of
going back to the U4 gamin rpms).   This doesn't spread between processes or
machines, so it doesn't fit the classic virus definition.

This is a split-out of that bug, where gamin does a lot of polling on
nonexistent files.

Comment 13 Olle Liljenzin 2007-08-31 15:31:34 UTC
I'm on a machine with two gam_server processes running at 100%. strace on these
processes shows output similar to comment 4, i.e. stat looping over fstab
and mtab.

pstack shows the following (not so interesting):
#0  0x000000374b2b94a5 in _xstat () from /lib64/tls/libc.so.6
#1  0x000000000040c81a in ?? ()
#2  0x0000000000404496 in ?? ()
#3  0x000000000040719b in ?? ()
#4  0x00000000004072e3 in ?? ()
#5  0x000000000040c021 in ?? ()
#6  0x000000374c9266bd in g_main_context_dispatch ()
#7  0x000000374c928397 in g_main_context_acquire ()
#8  0x000000374c928735 in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#9  0x0000000000404836 in ?? ()
#10 0x000000374b21c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#11 0x000000000040307a in ?? ()
#12 0x0000007fbfffef88 in ?? ()
#13 0x000000000000001c in ?? ()
#14 0x0000000000000001 in ?? ()
#15 0x0000007fbffff274 in ?? ()
#16 0x0000000000000000 in ?? ()


More interesting might be that running pstack has the side effect of making the
gam_server processes calm down and stay at about 0% afterwards. Running strace
on the same processes triggers them back up to 100% again. (???)

I can repeat this with pstack and strace on these two processes and it will
toggle them between 0% and 100% CPU. Other gam_server processes on the same
machine that were not running at 100% are not affected in the same way by strace.


Comment 14 Olle Liljenzin 2007-08-31 19:34:55 UTC
Also the process status is toggled between running and stopped by strace and
pstack. Maybe some side effect of interrupted system calls?

Comment 15 Olle Liljenzin 2007-09-05 07:22:03 UTC
Steps to reproduce:

1. put home directory on nfs and use gnome or kde
2. touch ~/Desktop/apa
3. wait for the file icon to pop up on the desktop
4. rm ~/Desktop/apa

strace shows gam_server keeps polling for ~/Desktop/apa indefinitely, but only
if NFS is used.

One file will not be enough to put a high load on gam_server, but this "leak"
could at least explain a few of the polls people are seeing. And of course the
load will go up as the number of watched files grows.
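
The touch, wait and rm steps above can be scripted so the test is easy to repeat. This is a minimal Python sketch. The function name create_then_remove and the default wait are illustrative assumptions, and the fixed sleep only stands in for watching the icon appear on the desktop.

```python
import os
import time


def create_then_remove(path, wait_seconds=10.0):
    """Create an empty file, leave it long enough to be noticed, then delete it."""
    with open(path, "a"):
        os.utime(path, None)  # update timestamps, like `touch`
    time.sleep(wait_seconds)  # stand-in for "wait for the icon to pop up"
    os.remove(path)
```

Running create_then_remove(os.path.expanduser("~/Desktop/apa")) in a session whose home directory is on NFS, and then attaching strace to gam_server, should show whether the removed path is still being polled.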


Regarding comment 14: this is an unrelated kernel bug, now reported as bug 276091.

Comment 16 Andy Holland 2007-09-10 14:37:27 UTC
(In reply to comment #12)
> Andy, I think that you might be confusing bugs, The bugzilla for the bug where
> it does chew 100% cpu is 240154 (which has been fixed and has a work around of
> going back to the U4 gamin rpms).   This doesn't spread between processes or
> machines, so its doesn't fit the classic virus definition.
> This is a split out of that bug, where gamin does a lot of polling on non
> existent files.

You continue to distribute this buggy software - I "caught it" from an rpm. 

Also, it has a bad interaction with KDE in that it not only consumes 100% CPU, 
but prevents KDE from starting up when logging in, so I have to ssh into the 
machine and kill gam_server to login.

This bug is atrocious, it's been going on for months, and it needs to be fixed 
ASAP. Creating dozens of places where the bug is reported does not solve the 
problem. 


Comment 17 Wade Mealing 2007-09-19 04:51:00 UTC
Hey Andy,

This particular bug pertains to a "spin off" situation, where it may poll for
files for a few seconds on files that do not exist.  This would not cause your
situation where you are unable to log into kde. 

This bug has been fixed and a hotfix package has been created.  The fix will
be released in an updated package in the future. 

If your system is pegged at 100% cpu, you are experiencing bugzilla 240154, the
fix is known and you can lodge a request for a hotfix via phone or on the web at
our support portal at https://www.redhat.com/apps/support/

If a situation is very urgent or distressing please lodge a support call, it is
the best way to get a human to solve your situation.

Thank you.



Comment 18 Andy Holland 2007-09-20 13:15:43 UTC
(In reply to comment #17)
> Hey Andy,
> This particular bug pertains to a "spin off" situation, where it may poll for
> files for a few seconds on files that do not exist.  This would not cause your
> situation where you are unable to log into kde. 
> This bug has fixed and a hotfix package has been created.  The new package 
> will be released in a fixed package in the future.
> If your system is pegged at 100% cpu, you are experiencing bugzilla 240154,
> the fix is known and you can lodge a request for a hotfix via phone or on the web
> at our support portal at https://www.redhat.com/apps/support/
> If a situation is very urgent or distressing please lodge a support call, it
> is the best way to get a human to solve your situation.
> Thank you.

Turns out there is a side effect problem between gam_server and XDHCP as well. 
I guess the fix I want is something to get rid of this code - I don't want it 
on my machine any longer (I never did). Is there any packaged way of getting rid 
of gam_server once and for all?



Comment 19 Filipe Brandenburger 2009-08-31 20:45:25 UTC
(In reply to comment #1)
> If you get this loop (with strace showing continuous polling of the same files)
> can you please attach with gdb (with gamin-debuginfo installed) and get a few (5
> or so) backtraces at different times?  

Same problem here, running 4.7 with updates.

Kernel 2.6.9-78.0.22.ELsmp, gamin-0.1.7-1.4.EL4.

From strace, it looks like it's polling the files in /home/user/dir/bin continually, then it gets some SIGRT_2 and sometimes I see this:

--- SIGRT_2 (Real-time signal 0) @ 0 (0) ---
write(4, "bogusbogusbogusbogusbogusbogusbo"..., 1025) = -1 EAGAIN (Resource temporarily unavailable)
write(4, "bogusbogusbogusbogusbogusbogusbo"..., 1025) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigreturn(0x71b1f8)                  = 0

Using gdb, I caught it five or six times in _xstat:

(gdb) bt
#0  0x00000035bbabbe85 in _xstat () from /lib64/tls/libc.so.6
#1  0x000000000040c80a in gam_poll_dnotify_poll_file (node=0x7b4540) at gam_poll_dnotify.c:206
#2  0x00000000004043e6 in gam_poll_file (node=0x7b4540) at gam_server.c:516
#3  0x00000000004070eb in gam_poll_generic_scan_directory_internal (dir_node=0x78a440) at gam_poll_generic.c:349
#4  0x0000000000407233 in gam_poll_generic_scan_directory (path=0x77e330 "/home/user/dir/bin") at gam_poll_generic.c:399
#5  0x000000000040c01b in gam_dnotify_pipe_handler (user_data=0x61c680) at gam_dnotify.c:334
#6  0x00000035bdf26606 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#7  0x00000035bdf2821e in g_main_context_acquire () from /usr/lib64/libglib-2.0.so.0
#8  0x00000035bdf2858a in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#9  0x0000000000404786 in main (argc=1, argv=0x7fbfffeaa8) at gam_server.c:647
#10 0x00000035bba1c40b in __libc_start_main () from /lib64/tls/libc.so.6
#11 0x0000000000402fca in _start ()
#12 0x0000007fbfffea98 in ?? ()
#13 0x000000000000001c in ?? ()
#14 0x0000000000000001 in ?? ()
#15 0x0000007fbfffedd7 in ?? ()
#16 0x0000000000000000 in ?? ()

One time in g_string_insert_len:

(gdb) bt
#0  0x00000035bdf3dff0 in g_string_insert_len () from /usr/lib64/libglib-2.0.so.0
#1  0x00000035bdf1c3a5 in g_file_open_tmp () from /usr/lib64/libglib-2.0.so.0
#2  0x00000035bdf1c677 in g_build_filename () from /usr/lib64/libglib-2.0.so.0
#3  0x0000000000406fb6 in gam_poll_generic_scan_directory_internal (dir_node=0x78a440) at gam_poll_generic.c:321
#4  0x0000000000407233 in gam_poll_generic_scan_directory (path=0x77e330 "/home/user/dir/bin") at gam_poll_generic.c:399
#5  0x000000000040c01b in gam_dnotify_pipe_handler (user_data=0x61c680) at gam_dnotify.c:334
#6  0x00000035bdf26606 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#7  0x00000035bdf2821e in g_main_context_acquire () from /usr/lib64/libglib-2.0.so.0
#8  0x00000035bdf2858a in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#9  0x0000000000404786 in main (argc=1, argv=0x7fbfffeaa8) at gam_server.c:647
#10 0x00000035bba1c40b in __libc_start_main () from /lib64/tls/libc.so.6
#11 0x0000000000402fca in _start ()
#12 0x0000007fbfffea98 in ?? ()
#13 0x000000000000001c in ?? ()
#14 0x0000000000000001 in ?? ()
#15 0x0000007fbfffedd7 in ?? ()
#16 0x0000000000000000 in ?? ()

One time in strcmp:

(gdb) bt
#0  0x00000035bba708a2 in strcmp () from /lib64/tls/libc.so.6
#1  0x00000035bdf3d6c9 in g_str_equal () from /usr/lib64/libglib-2.0.so.0
#2  0x00000035bdf1c89d in g_hash_table_lookup () from /usr/lib64/libglib-2.0.so.0
#3  0x00000000004055e0 in gam_tree_get_at_path (tree=0x61afa0, path=0x90cb20 "/home/user/dir/bin/execfile") at gam_tree.c:168
#4  0x0000000000406fca in gam_poll_generic_scan_directory_internal (dir_node=0x78a440) at gam_poll_generic.c:322
#5  0x0000000000407233 in gam_poll_generic_scan_directory (path=0x77e330 "/home/user/dir/bin") at gam_poll_generic.c:399
#6  0x000000000040c01b in gam_dnotify_pipe_handler (user_data=0x61c680) at gam_dnotify.c:334
#7  0x00000035bdf26606 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#8  0x00000035bdf2821e in g_main_context_acquire () from /usr/lib64/libglib-2.0.so.0
#9  0x00000035bdf2858a in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#10 0x0000000000404786 in main (argc=1, argv=0x7fbfffeaa8) at gam_server.c:647
#11 0x00000035bba1c40b in __libc_start_main () from /lib64/tls/libc.so.6
#12 0x0000000000402fca in _start ()
#13 0x0000007fbfffea98 in ?? ()
#14 0x000000000000001c in ?? ()
#15 0x0000000000000001 in ?? ()
#16 0x0000007fbfffedd7 in ?? ()
#17 0x0000000000000000 in ?? ()

And twice in __getdents64:

(gdb) bt
#0  0x00000035bba8caee in __getdents64 () from /lib64/tls/libc.so.6
#1  0x00000035bba8c437 in readdir64 () from /lib64/tls/libc.so.6
#2  0x00000035bdf1b298 in g_dir_read_name () from /usr/lib64/libglib-2.0.so.0
#3  0x0000000000406f86 in gam_poll_generic_scan_directory_internal (dir_node=0x78a440) at gam_poll_generic.c:320
#4  0x0000000000407233 in gam_poll_generic_scan_directory (path=0x77e330 "/home/user/dir/bin") at gam_poll_generic.c:399
#5  0x000000000040c01b in gam_dnotify_pipe_handler (user_data=0x61c680) at gam_dnotify.c:334
#6  0x00000035bdf26606 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#7  0x00000035bdf2821e in g_main_context_acquire () from /usr/lib64/libglib-2.0.so.0
#8  0x00000035bdf2858a in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#9  0x0000000000404786 in main (argc=1, argv=0x7fbfffeaa8) at gam_server.c:647
#10 0x00000035bba1c40b in __libc_start_main () from /lib64/tls/libc.so.6
#11 0x0000000000402fca in _start ()
#12 0x0000007fbfffea98 in ?? ()
#13 0x000000000000001c in ?? ()
#14 0x0000000000000001 in ?? ()
#15 0x0000007fbfffedd7 in ?? ()
#16 0x0000000000000000 in ?? ()

The user is using KDE, but has a "nautilus" process and a "gnome-vfs-daemon" process running.
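
All of the backtraces above pass through gam_poll_generic_scan_directory_internal() and differ only in the innermost frame. To summarise a larger set of samples, the innermost frames can be tallied. This is a minimal Python sketch, assuming the gdb output format shown above. The function name innermost_frames and the regular expression are illustrative assumptions.

```python
import re
from collections import Counter

# Frame #0 looks like: "#0  0x00000035bbabbe85 in _xstat () from /lib64/tls/libc.so.6"
# The address part is optional because gdb omits it for some frames.
FRAME_ZERO = re.compile(r"^#0\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def innermost_frames(gdb_output):
    """Count how often each function appears as frame #0 in concatenated gdb output."""
    counts = Counter()
    for line in gdb_output.splitlines():
        match = FRAME_ZERO.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

A distribution dominated by _xstat and __getdents64 would suggest the time goes into rescanning the directory rather than into a tight loop inside one function.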

Comment 20 Habig, Alec 2009-11-25 17:43:22 UTC
Still the same thing in RHEL5.4, gamin-0.1.7-8.el5.

gam_server sits at 100% CPU time.  Sometimes it clobbers my nfs server with traffic too.  The frequency of the network freakout can be controlled by the gaminrc line

  fsset nfs poll 200

but that does not change the client-side CPU usage (which is constant).

If I tell gam_server to please not touch nfs mounts with:

  fsset nfs none

then gam_server goes away (yay!), but then nautilus picks up the slack and sits on the network and CPU constantly.  It is as if nautilus has decided that, since it's not hearing from gam_server, it is obligated to fstat the heck out of things all on its own.

ANYWAY - this remains a giant bug in the current release, for the case of gnome desktop users with NFS-mounted files to look at.  How can we stop it?  Non-autorefreshing GUI file managers are a small price to pay to stop the CPU and network abuse.

Comment 21 Jiri Pallich 2012-06-20 13:30:20 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. 
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.