Bug 204906

Summary: gam_server causes an incredibly high number of context switches/sec
Product: [Fedora] Fedora Reporter: Arjan van de Ven <arjan>
Component: gaminAssignee: Alexander Larsson <alexl>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: rawhideCC: dwmw2, john, mingo, veillard
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-03-07 06:12:14 EST Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Bug Depends On:    
Bug Blocks: 204948    
Attachments:
Description Flags
a patch
none
Patch to make the 100msec timer an on-demand timer as well
none
patch to kill another useless timer
none
Fix for cvs version
none
gam_poll_basic_conditional_callback.diff
none
Nicer version none

Description Arjan van de Ven 2006-09-01 05:01:24 EDT
Description of problem:

We're working with RH engineers on a tickless idle kernel, which has the goal of
reducing power and hypervisor loads when the system is idle. This is done by
removing the regular ticking clock when the system is idle, so that in theory
long sleep periods are possible for the hardware (or hypervisor). The kernel
portion of this works great, however when using a gnome desktop there are many
timers going off all the time for userspace, so many that the actual savings are
not so great (about 250 events per second).

By far the biggest offenders of this is gam_server. This daemon seems to have a
30 Hz timer and several 1 Hz timers. While the 30Hz timer is the most serious
problem obviously, even the 1Hz timers form a system issue (since many processes
do 1Hz timers, the aggregate of these is still significant).

On first sight the 30Hz timer has no need to fire continuously and could be made
to fire only when actual file modifications are happening, which would solve the
biggest issue that we currently have with gam_server.

You can check the effectiveness via two methods:
1)

vmstat 10

and check the "cs" column. On an idle runlevel 3 this should be between 10 and
20, on an idle (fixed) gnome system this is about 35 to 40 for me. (I hacked the
30Hz timer to a 1Hz timer to get there, as well as about a dozen other fixes to
gnome that are in various bugzillas)

2)

strace -p <pid of fam_server>

the strace output should
  1) not have more than poll line per second on idle
  2) show only poll delays (last parameter to the call) greater than 1000
Comment 1 Alexander Larsson 2006-09-01 09:03:39 EDT
I don't really know the gamin design all that well. It was mainly written by
daniel veillard. Daniel, do you know why it needs a 30Hz timer?
Comment 2 Daniel Veillard 2006-09-01 09:38:10 EDT
Honnestly I don't know ! The only timer I really need and I think I register
is one every second to check for file being watched and which didn't exist.
Well I wrote the code for dnotify, I just scanned the sources and found:

gam_inotify.c:#define SCAN_MISSING_TIME 1000 /* 1 Hz */
gam_inotify.c:#define SCAN_LINKS_TIME 1000 /* 1 Hz */
gam_inotify.c:#define PROCESS_EVENTS_TIME 33 /* 30 Hz */

Novell contributed the inotify back-end, no idea why the need a 30Hz timer
I suggest asking Robert Love or John McCutchan

paphio:~/gamin/server -> grep g_timeout_add *.c
gam_connection.c:    ret->eq_source = g_timeout_add (100 /* ms */,
gam_connection_eq_flush, ret);
gam_connection.c: * This function can be called periodically by e.g.
g_timeout_add and
gam_inotify.c:    g_timeout_add (SCAN_MISSING_TIME, gam_inotify_scan_missing, NULL);
gam_inotify.c:    g_timeout_add (SCAN_LINKS_TIME, gam_inotify_scan_links,
NULL);gam_inotify.c:    g_timeout_add (PROCESS_EVENTS_TIME,
gam_inotify_process_event_queue, NULL);
gam_kqueue.c:    poller->timeout_id = g_timeout_add(poller->interval,
(GSourceFunc) gam_kqueue_poller_timeout_cb, poller);
gam_poll_basic.c:       g_timeout_add(1000, gam_poll_basic_scan_callback,
NULL);gam_poll_dnotify.c:     g_timeout_add(1000,
gam_poll_dnotify_scan_callback, NULL);
gam_server.c:   g_timeout_add(1000, (GSourceFunc) gam_connections_check, NULL);
gam_server.c:    g_timeout_add(1000, (GSourceFunc) gam_error_check, NULL);
paphio:~/gamin/server ->

  my dnotify back-end only use 1 second timers on normal processing,
the kqueue.c one should affect only BSD's, I guess you hit the inotify 
back-end code, and that I have no clue about, sorry ! 

Daniel 
Comment 3 Alexander Larsson 2006-09-01 10:04:04 EDT
Hmmm. That stuff was fixed by john in the inotify support in gnome-vfs.
Comment 4 Matthias Clasen 2006-09-01 12:39:23 EDT
Alex, are you looking at doing similar fixes for gam_server, or have you
asked McClutchan to look into it ?
Comment 5 Alexander Larsson 2006-09-02 03:03:31 EDT
I pinged John about it.
Comment 6 Matthias Clasen 2006-09-03 00:24:58 EDT
Created attachment 135450 [details]
a patch

Here is a patch that does basically the same thing that was done in gnome-vfs.
Can you try this one, Arjan ?
I also note that gnome-vfs has the missing timeout with 1/4 Hz instead of 1 Hz
Comment 7 Arjan van de Ven 2006-09-03 03:20:11 EDT
Ok this part works for me; however there is a second timer, a 10 Hz one which
also spoils the fun somewhat; it's the gam_connection_eq_flush timer.

(I'd like to leave the missing timeout to be 1 second so that we can switch it
to the g_timeout_add_seconds() api later)

If the gam_connection_eq_flush timer could be 1 Hz instead I'd be a lot happier,
that way it could also use g_timeout_add_seconds() 
Comment 8 Arjan van de Ven 2006-09-03 03:23:53 EDT
with the patch, and the gam_connection_eq_flush timer set to 1Hz, all timers are
in seconds, and when I switched gam_server to the g_timeout_add_seconds() API,
there is now exactly one wakeup per second. That'll have to do :)
Comment 9 Arjan van de Ven 2006-09-03 13:49:52 EDT
Created attachment 135463 [details]
Patch to make the 100msec timer an on-demand timer as well

The 10Hz timer can undergo the same treatment as your other patch, just arm it
on demand when there's actual work and stop firing when the queues are empty
Comment 10 Arjan van de Ven 2006-09-03 13:51:07 EDT
Question for Daniel: why is gam_connections_check a 1Hz timer? Can it be a
1/10th Hz timer instead without any bad behavior?
Comment 11 Arjan van de Ven 2006-09-03 14:04:37 EDT
Created attachment 135464 [details]
patch to kill another useless timer

The inotify backend called the initialization of the basic poll backend (which
sets up a 1Hz timer) but then overwrites all methods that could call this
backend, so this 1Hz timer is utterly pointless.
It looks like a bug in the inotify initialization, it appears they ment to call
gam_poll_generic_init not gam_poll_basic_init....
Comment 12 John McCutchan 2006-09-03 23:56:11 EDT
I have ported the gnome-vfs inotify backend over to gamin and will be committing
it to CVS early this week. This should help immensely. 
Comment 13 Matthias Clasen 2006-09-04 00:09:27 EDT
Also compare older patches in 

http://bugzilla.gnome.org/show_bug.cgi?id=348974
Comment 14 John McCutchan 2006-09-04 11:33:26 EDT
Patches attached to Comment #1 and #4 of
http://bugzilla.gnome.org/show_bug.cgi?id=348974 are not needed after I merge
the new backend. But the rest are applicable to gamin core and still usefull. 
Comment 15 John McCutchan 2006-09-04 20:55:06 EDT
I just committed the ported backend. That should solve the inotify side of the
problem.
Comment 16 Daniel Veillard 2006-09-04 22:38:07 EDT
Hi Arjan,

basically the delay for the 1s timer is the time it will take to discover that
a missing watched file was created. On very specific use case, a 10s timer might 
just fine. Potentially that could be a setting in the config file, and then set
up in the global config file.
For patch 1 from #9, I also had to change some include files which were missing
from the patch. 

Unfortunately I'm completely unable to test this version, all regression tests
seems to be broken if I run "make tests", still I commited it on top of John's
patch.

Daniel
Comment 17 Daniel Veillard 2006-09-04 22:44:06 EDT
Hi John,

thanks for the patch in #14 and #15. However it seems to have affected 
the regression tests, even the basic non-python ones seems to fail completely
now. Did you tried "make tests" before commiting ? Because here, even the
most basic ones fail immedistely, maybe I would need to restart with the new
version on the desktop to really test it, but I'm on the road with just
the laptop and testing is a bit hard for me right now...

Daniel
Comment 18 John McCutchan 2006-09-04 23:02:27 EDT
Daniel I ran most of the test suite on my system before committing. All the
non-dnotify tests work fine on my system. Just to be explicit the following passed:

basic.py
basic2.py
basic3.py
basic4.py
basic5.py
basic6.py
bigfile.py
level.py
multiple.py
multiple2.py
multiple3.py
noexists.py
1-11.tst in the non-python test suite

dnotify*.py and flood*.py rely on the dnotify code so I can't test them.
Comment 19 Alexander Larsson 2006-09-05 06:16:46 EDT
Created attachment 135533 [details]
Fix for cvs version

The current version in cvs doesn't give me any events at all without this
patch. With this patch some of the tests pass, but some don't.
Comment 20 Alexander Larsson 2006-09-05 06:34:50 EDT
+	gam_server_install_kernel_hooks (GAMIN_K_INOTIFY, 
That should probably be GAMIN_K_INOTIFY2.
Comment 21 Alexander Larsson 2006-09-05 06:57:20 EDT
Ok, i just built gamin-0.1.7-5.fc6 in rawhide. It has the new inotify backend
from cvs (with the fix in comment #19 and #20) and the 100msec timer an
on-demand timer from comment #9 (with fixed header).

It seems to work fine, and only wakes up once a second. (Although it still
doesn't pass all tests in the testcase, haven't looked into why.)

I consider this fixed for now.
Comment 22 Daniel Veillard 2006-09-05 09:47:48 EDT
Alex, if you could just commit the extra patches to CVS, that would be great,
I will try to make a new release when I'm back in a couple of weeks,

  thanks !

Daniel
Comment 23 Alexander Larsson 2006-09-05 10:14:22 EDT
0.1.7-6 has even less timers, now only the 4 sec one.
Comment 24 Alexander Larsson 2006-09-05 10:29:21 EDT
I commited the patches upstream too.
Comment 25 Arjan van de Ven 2007-02-06 08:37:53 EST
in rawhide it's back to once per second... can we go back to once per 4 seconds?
Comment 26 Matthias Clasen 2007-02-06 09:55:21 EST
Looking at the current gam_server code, 

the inotify backend initializes the basic poll backend, and
that installs a 1Hz timout that is always running, even if there is
nothing to do (gam_poll_basic_scan_callback). That should be triggered
by gam_poll_basic_add_subscription().
Comment 27 John McCutchan 2007-02-06 10:12:58 EST
Matthias is correct, related to this commit:

2006-11-20  Alexander Larsson  <alexl@redhat.com>

* server/gam_inotify.c: (gam_inotify_init):
  Enable the basic polling code for the inotify
  backend so e.g. polling on NFS works.

Comment 28 John McCutchan 2007-02-06 10:14:57 EST
Created attachment 147470 [details]
gam_poll_basic_conditional_callback.diff

Patch is untested
Comment 29 Alexander Larsson 2007-03-07 05:54:51 EST
Created attachment 149440 [details]
Nicer version

I think this approach is nicer.
Comment 30 Alexander Larsson 2007-03-07 06:12:14 EST
* Wed Mar  7 2007 Alexander Larsson <alexl@redhat.com> - 0.1.8-4
- Add patch to fix #204906