Bug 251580 - fail2ban wakes up 28 times a second
Summary: fail2ban wakes up 28 times a second
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: fail2ban
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Axel Thimm
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 437442
Blocks: wakeup
 
Reported: 2007-08-09 21:10 UTC by David Rees
Modified: 2008-03-26 17:14 UTC (History)

Fixed In Version: 0.8.2-13.fc7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-03-26 17:14:07 UTC
Type: ---
Embargoed:


Attachments
Remove join workaround to test real rate of wakeups (598 bytes, patch)
2007-08-16 03:07 UTC, Axel Thimm

Description David Rees 2007-08-09 21:10:16 UTC
Reporting as requested by
https://www.redhat.com/archives/fedora-devel-list/2007-August/msg00544.html

On an otherwise idle system, fail2ban is generating nearly 30 wakeups a second:

 64.9% ( 28.0)   fail2ban-server : schedule_timeout (process_timeout)

Stracing the process reveals a lot of lines like this:

select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
futex(0x9320600, FUTEX_WAKE, 1)         = 0
gettimeofday({1185303825, 615120}, NULL) = 0
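
For anyone who wants to reproduce the pattern in the trace: a select() call with no file descriptors and a 50000 microsecond timeout is simply a 50 ms sleep. The sketch below only illustrates such a loop and is not fail2ban's code; `poll_loop` and its parameters are made-up names, and it assumes a POSIX system where select() accepts empty descriptor lists.

```python
import select
import time


def poll_loop(duration, interval=0.05):
    """Sleep in `interval`-second steps for about `duration` seconds.

    Returns the number of times the loop woke up.
    """
    wakeups = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        # No file descriptors, only a timeout: the same shape as
        # select(0, NULL, NULL, NULL, {0, 50000}) in the strace output.
        select.select([], [], [], interval)
        wakeups += 1
    return wakeups
```

A loop that sleeps 50 ms at a time wakes at most about 20 times a second, so the 28/s reported by powertop suggests that some sleeps are shorter than 50 ms or that more than one such loop is running.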

Comment 1 Axel Thimm 2007-08-09 22:25:26 UTC
There are plans to add an inotify backend (now it's gamin or polling), this may
change the wakeup frequency.

(although I would have expected gamin to also spare the wakeups)


Comment 2 Jonathan Underwood 2007-08-15 17:14:46 UTC
It may be that David's machine isn't using gamin. David, can you check what
"backend" is set to in your /etc/fail2ban/jail.conf? By default it is set to
backend = auto, which should use gamin *IF* gamin is installed. Things to try:

a) yum install gamin if it isn't installed already
b) if you still see wakeups at 30 / sec, try changing to backend = gamin
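
To check the effective setting without opening an editor, something like the following could be used. It is a rough sketch that assumes jail.conf is INI-style with a [DEFAULT] section that individual jails inherit from; the sample text and the jail name `ssh-iptables` are placeholders, and fail2ban's own config reader may resolve values differently.

```python
import configparser

# Placeholder content standing in for /etc/fail2ban/jail.conf.
SAMPLE_JAIL_CONF = """
[DEFAULT]
backend = auto

[ssh-iptables]
enabled = true
"""


def backend_for(conf_text, jail):
    """Return the 'backend' value a jail would see, falling back to [DEFAULT]."""
    # Interpolation is disabled so that '%' sequences in a real file
    # are returned as-is instead of being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.get(jail, "backend")


print(backend_for(SAMPLE_JAIL_CONF, "ssh-iptables"))
```

To run it against a real file, read the file's text and pass it in place of `SAMPLE_JAIL_CONF`.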

Comment 3 Jonathan Underwood 2007-08-15 17:17:13 UTC
Actually, I tried those suggestions on my machine, and powertop is still showing
30 wakeups a second from fail2ban-server.

Comment 4 David Rees 2007-08-15 18:48:02 UTC
My servers didn't have gamin-python installed (gamin alone isn't enough), so
they were indeed using the poller instead of gamin.

But even after installing gamin-python, with /var/log/fail2ban.log reporting
that it is using gamin with backend = auto, fail2ban-server is still waking up
28 times a second according to powertop. The behavior doesn't change when
setting backend = gamin in jail.conf, either.
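
The "gamin alone isn't enough" point can be pictured with a small sketch of how an "auto" setting might fall back to polling when the Python bindings are missing. This is not fail2ban's actual selection code; `pick_backend` is a hypothetical helper, and it assumes the gamin-python bindings are importable as a module named `gamin`.

```python
import importlib.util


def pick_backend(requested="auto"):
    """Resolve 'auto' to 'gamin' if the Python bindings exist, else 'polling'."""
    if requested != "auto":
        # An explicit setting is returned unchanged.
        return requested
    # find_spec() returns None when no top-level module of that name is installed.
    has_bindings = importlib.util.find_spec("gamin") is not None
    return "gamin" if has_bindings else "polling"


print(pick_backend())
```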

Comment 5 Jonathan Underwood 2007-08-15 19:01:18 UTC
Yes, I see the same.

As an aside, Axel, it's probably worth adding a Requires: gamin-python to the
fail2ban package, though some might consider this "package bloat" when it isn't
strictly required. Oh for soft requires.


Comment 6 Axel Thimm 2007-08-15 19:30:47 UTC
(In reply to comment #5)
> As an aside, Axel, it's probably worth adding a Requires: gamin-python to the
> fail2ban package, though some might consider this "package bloat" when it isn't
> strictly required.

I think this bloat is justified. I'll add it on the next update. But it looks
like it isn't really being used though.

Comment 7 David Rees 2007-08-15 20:58:54 UTC
Looking at the gamin changelogs, I wonder if this is fixed in 0.1.9 (if it is
gamin causing the wakeups that is, since the wakeup behavior of fail2ban didn't
change with or without gamin).

From the gamin 0.1.9 changelog:

2007-03-07  Alexander Larsson  <alexl>
        * server/gam_poll_basic.c (gam_poll_basic_poll_file):
        Don't run polling idle handler if not needed.


Comment 8 David Rees 2007-08-15 21:53:32 UTC
gamin 0.1.9 doesn't change the number of wakeups (installed on F7 from the development repo).

Comment 9 Axel Thimm 2007-08-16 03:07:10 UTC
Created attachment 161427
Remove join workaround to test real rate of wakeups

I contacted the upstream author, Cyril Jaquier, and he provided the following
information and patch:

> There is the same bug report in the fail2ban bug tracker:
> 
> http://sourceforge.net/tracker/index.php?func=detail&aid=1769616&group_id=121032&atid=689044
> 
> I looked at this issue quickly and found the problem (which has nothing
> to do with gamin or polling).
> 
> I had a problem with join() not getting interrupted by SIGINT and
> SIGTERM in server.py. There is a bug report for this here:
> 
> https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1167930&group_id=5470
> 
> So I implemented the workaround suggested in the bug report. However,
> this results in frequent wake-ups.
> 
> The attached patch fixes these wake-ups. However, SIGINT and SIGTERM
> do NOT work anymore and fail2ban-server does NOT terminate if you send
> it those signals. "fail2ban-client stop" still works as expected.
> 
> So we probably need a better fix for this one. Eventually, I will
> replace the whole socket communication code with asynchat and asyncore.
> I hope this will provide a better fix.
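
To illustrate the trade-off described in the quoted mail, here is a rough sketch. `wait_interruptibly` shows the general shape of a join-with-timeout workaround; the helper name and the 1-second step are assumptions and are not taken from server.py. `simulated_sleeps` models, as far as I understand it, how Python 2's threading module implemented timed waits: by polling with sleeps that double from about 1 ms up to a 50 ms cap, which would match the select(..., {0, 50000}) calls in the strace output.

```python
import threading


def wait_interruptibly(thread, step=1.0):
    """Join `thread` in short timed steps so the main thread can see signals."""
    while thread.is_alive():
        thread.join(step)


def simulated_sleeps(timeout, first_delay=0.0005, cap=0.05):
    """Count the sleeps one timed wait would perform under the backoff model.

    Idealized: assumes each sleep lasts exactly as long as requested and
    that the awaited thread never finishes during the wait.
    """
    elapsed = 0.0
    delay = first_delay
    sleeps = 0
    while True:
        remaining = timeout - elapsed
        if remaining <= 0:
            break
        # Double the delay each round, but never exceed the cap or the time left.
        delay = min(delay * 2, remaining, cap)
        elapsed += delay
        sleeps += 1
    return sleeps


print(simulated_sleeps(1.0))
```

Under this model a 1-second timed join costs roughly 25 sleeps, which is in the same range as the 28 wakeups/s reported. Python 3 waits on the lock directly instead of polling, so the same workaround there should wake only about once per step.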

I'm not sure what is better: fewer wakeups or a proper signal handling setup.
But if you like you can test this patch and see how the wakeup rate changes.

Comment 10 David Rees 2007-08-16 08:07:10 UTC
Yes, that patch reduced the number of wakeups from 28/s to 3/s. We still have 3
wakeups/s to go. :-) I think there are a couple places where there are 1 second
polling loops.

The signal bug may be fixed in Python 2.5.1 according to the comment by gildea
in the sf.net bug Axel linked to above (#1167930). Unfortunately, it seems that
trying to pull in Python 2.5.1 from development also requires glibc 2.6.90-8,
which is not something I'm willing to upgrade at this point.

But even without that bug fix, I'm not sure that fail2ban-server not responding
to TERM/INT signals is that bad of an issue.

Comment 11 Jonathan Underwood 2007-11-16 15:53:57 UTC
Just a small update: on Fedora 8, applying the patch reduces wakeups/second to
about 5, and the server still does not respond to SIGINT and SIGTERM (kill -2
and kill -15). On Fedora 8, we have python-2.5.1-15.fc8.
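
For reference, kill -2 and kill -15 deliver SIGINT and SIGTERM. Below is a minimal sketch of a shutdown path that handles both without a polling loop. It assumes Python 3 on POSIX, where a blocking wait in the main thread can be interrupted by a signal, so it would not have helped on the Python 2.5 discussed here, and it is not how fail2ban handles signals. All names are illustrative.

```python
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only record that a shutdown was requested."""
    stop_requested.set()


def install_handlers():
    """Route SIGINT (2) and SIGTERM (15) to request_stop.

    Must be called from the main thread.
    """
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, request_stop)


def run_until_signalled():
    """Block until one of the handled signals arrives."""
    install_handlers()
    # No timeout is passed, so the process does not wake periodically while idle.
    stop_requested.wait()
```

The design choice is to keep the handler trivial and let the main thread do the actual cleanup after `wait()` returns.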

Comment 12 Axel Thimm 2007-11-29 19:29:08 UTC
I've been waiting for 0.9.x to pop up, as it was said that this fixes
everything, but it still hasn't surfaced. Does anyone know anything about an
ETA? If it's in sight I would wait for it, otherwise we need to choose between
more sleeping and proper signal handling.

Comment 13 Jonathan Underwood 2007-11-29 19:44:32 UTC
Doesn't seem to have been much activity in the svn repository, so I'd guess that
0.9 isn't imminent.

Oddly, the python bug that is at the root of this problem seems to have been
made private:

https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1167930&group_id=5470

I don't really know which is preferable out of more sleeping and proper signal
handling...

Comment 14 Tim Niemueller 2007-12-19 10:23:18 UTC
I still see this on F8. Is there any progress on this?

Comment 15 Axel Thimm 2007-12-22 12:30:57 UTC
Not really. The new code from fail2ban fixes this by replacing this part of the
code completely, but it is as yet unreleased. The most we can do is trade wakeups
for bad signal handling, and I still feel that killing off the signal handling
may be worse.

There will be some other patching necessary for selinux compatibility (see bug
#425241). Depending on that I would revisit this issue. Also Jonathan what would
your current vote on wakeups vs signal handling be?

Comment 16 Jonathan Underwood 2007-12-22 14:06:49 UTC
(In reply to comment #15)
> There will be some other patching necessary for selinux compatibility (see bug
> #425241). Depending on that I would revisit this issue. Also Jonathan what would
> your current vote on wakeups vs signal handling be?

To be honest, I'm on the fence on this. I'd lean towards leaving things as they
are, as I imagine an upstream release is fairly imminent as it seems to fix a
number of issues being reported on the upstream mailing list.

I'll have a crack at working up a patch to resolve the leaking file descriptors
over the holidays though, that seems like a trivial fix.

Comment 17 Axel Thimm 2008-03-15 10:22:08 UTC
OK, we'll wait for 0.9.x to be released which has this part redesigned and
fixed. According to

https://bugzilla.redhat.com/page.cgi?id=bug_status.html#resolution

this means we must resolve as "DEFERRED"

Comment 18 Jonathan Underwood 2008-03-15 13:58:35 UTC
Actually, the asyncore/asynchat rewrite is in version 0.8.2, which was released a
few days ago. Perhaps it is worth pushing packages for that, as it fixes a
number of other bugs too.

Comment 19 Axel Thimm 2008-03-15 15:08:10 UTC
That's good to know - 0.8.2 is already built and ready for F-7 upwards (see bug
#437442).

There isn't any note about the waking up in the changelog, but maybe it is
implied by "Rewrote the communication server" ;)

Closing as a duplicate of the update to 0.8.2 request.

*** This bug has been marked as a duplicate of 437442 ***

Comment 20 David Rees 2008-03-16 06:14:08 UTC
fail2ban 0.8.2 only wakes up between 2 and 6 times/second on the two servers
I've tested, which is significantly better and now falls within the noise, so I'd
consider this bug resolved.

But instead of resolving it as a duplicate of 0.8.2, I think it should be marked
as RELEASE_PENDING since it will be fixed in the next release, so I will be
changing it as such and making it depend on bug 437442. Once 437442 is closed we
can close this one, too.

Comment 21 Fedora Update System 2008-03-16 19:30:04 UTC
fail2ban-0.8.2-13.fc7 has been pushed to the Fedora 7 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update fail2ban'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F7/FEDORA-2008-2527

Comment 22 Fedora Update System 2008-03-26 17:14:03 UTC
fail2ban-0.8.2-13.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 23 Fedora Update System 2008-03-26 17:14:31 UTC
fail2ban-0.8.2-13.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.

