Bug 252127

Summary: mailman wakes up 7 times a second when idle
Product: Fedora
Component: mailman
Version: 7
Reporter: David Rees <drees76>
Assignee: Tomas Smetana <tsmetana>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Status: CLOSED UPSTREAM
Severity: low
Priority: low
Hardware: All
OS: All
Doc Type: Bug Fix
Last Closed: 2007-09-20 08:53:34 UTC
Bug Blocks: 204948
Attachments:
  Working (probably not final) version of the patch
  New patch -- tested only briefly
  Patch now processes inotify events, avoiding the busy loop

Description David Rees 2007-08-14 02:00:48 UTC
Reporting as requested by
https://www.redhat.com/archives/fedora-devel-list/2007-August/msg00544.html

On an idle system, mailman-2.1.9-5 wakes up 7 times a second as reported by
powertop. Some stracing shows that 7 of the 8 qrunners wake up once a second to
scan the mailman spool folders.

Below are the qrunners which appear to sleep for 1 second, then wake up and open
a directory:

/usr/lib/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s

The qrunner processes should probably be converted to use inotify, or at least
tweaked so that their wakeup times are synchronized, allowing the CPU to idle
longer.
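
For illustration, a minimal sketch of the inotify idea (not Mailman's code; the
path and event mask are placeholders, and the module-level constants assume a
reasonably recent pyinotify -- in 0.7.x they live under EventsCodes):

import pyinotify

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm)
# Wake only when something lands in the spool directory, instead of
# sleeping one second and rescanning it.
wm.add_watch('/var/spool/mailman/in',
             pyinotify.IN_CREATE | pyinotify.IN_MOVED_TO)

notifier.check_events()     # sleeps until the kernel reports an event
notifier.read_events()      # pull the events off the queue
notifier.process_events()   # dispatch them
# ... scan the directory and process the new queue entries ...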

Comment 1 Tomas Smetana 2007-08-17 12:05:57 UTC
I have prepared a patch for Runner.py that uses inotify instead of polling and
sent it upstream.

Comment 2 David Rees 2007-08-17 20:51:52 UTC
Thanks! Can you attach the patch here so that I can apply it to my system and
test it?

Comment 3 David Rees 2007-08-18 02:10:37 UTC
I found where you submitted the patch.

http://sourceforge.net/tracker/index.php?func=detail&aid=1776178&group_id=103&atid=300103

It applied with a bit of fuzz to mailman-2.1.9-5, but it doesn't appear to work:
the qrunners get stuck consuming a lot of CPU. Trying to shut down mailman with
the service script doesn't work either.

Comment 4 Tomas Smetana 2007-08-20 07:13:34 UTC
Created attachment 161856 [details]
Working (probably not final) version of the patch

Thank you for testing.  I really don't see any of the problems you describe.
Note that I'm using rawhide with the following packages:

python-2.5.1-8.fc8
python-inotify-0.7.0-3.fc8
mailman-2.1.9-6

You may try the attached patch.  If you do, please leave a comment on whether it
works or not, and possibly also the contents of /var/log/mailman/error.

Comment 5 David Rees 2007-08-23 21:29:35 UTC
I'm running the following versions:

python-2.5-12.fc7
python-inotify-0.7.0-3.fc7
mailman-2.1.9-5

The patch applies cleanly to Runner.py, but after starting up mailman, the
OutgoingRunner goes into a busy loop. Stracing it shows a loop that repeats
like this:

gettimeofday({1187904251, 287366}, NULL) = 0
poll([{fd=5, events=POLLIN, revents=POLLIN}], 1, -1) = 1
open("/var/spool/mailman/out", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 11
fstat64(11, {st_mode=S_IFDIR|S_ISGID|0770, st_size=4096, ...}) = 0
fcntl64(11, F_SETFD, FD_CLOEXEC)        = 0
getdents64(11, /* 2 entries */, 4096)   = 48
getdents64(11, /* 0 entries */, 4096)   = 0
close(11)                               = 0

Stopping mailman leaves all the qrunners running except for the OutgoingRunner.
Killing the mailmanctl parent process gets mailman to stop.

I understand that Python 2.5.1 fixes a bug in thread join handling that affected
signal delivery to processes, which may be the cause of at least the problem
stopping mailman on my system.

See this comment for more information on that bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=251580#c9

Comment 6 Tomas Smetana 2007-08-24 08:08:38 UTC
Hm...  I have run some more tests and it seems that Python 2.5.1 won't help.
I was able to reproduce all the runners still being left running after "service
mailman stop".  I probably didn't notice this before because Mailman otherwise
seems to work OK.  I'll see whether I can do something about that.

Comment 7 David Rees 2007-08-24 19:32:07 UTC
Thanks for testing. This seems like a fairly severe bug in Python, especially
with it being used more and more for development these days.

Were you able to also reproduce the busy loop I experienced?

Comment 8 Tomas Smetana 2007-08-27 13:03:43 UTC
I haven't seen the bug you described.  As I wrote, apart from the stopping
problem, Mailman seems to behave correctly.  Also, I no longer think it's
Python's fault; most probably I made some stupid mistake in the code.

Comment 9 Tomas Smetana 2007-08-28 09:04:55 UTC
It seems that the problem with service stop comes from
self._notifier.check_events(None) -- the purpose was to block in this call
until "something happens".  But when the termination signal arrives, the call
does not react (which looks to be a known Python "feature").  The new patch
does not block in that call, so we can break out of the loop when required.
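
For illustration only (not the attached patch; the path and mask are
placeholders), the difference between the blocking and the timed wait on
pyinotify's event queue looks roughly like this:

import pyinotify

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm)
wm.add_watch('/tmp', pyinotify.IN_CREATE)

# check_events(None) blocks until an inotify event arrives, so a termination
# signal cannot interrupt the wait.  A finite timeout (in milliseconds)
# returns control periodically and lets the runner re-check its stop flag:
if notifier.check_events(timeout=1000):
    notifier.read_events()
    notifier.process_events()
notifier.stop()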

Comment 10 Tomas Smetana 2007-08-28 09:07:51 UTC
Created attachment 175381 [details]
New patch -- tested only briefly

Comment 11 David Rees 2007-08-28 20:47:14 UTC
OK, mailman stops now with your latest patch, but it still gets stuck in the
infinite loop.

I finally got around to reading some Python and pyinotify docs and realized that
the patch wasn't actually processing the inotify event queue, so it always
thought there were more events to process.

I'll attach an updated patch which seems to work.
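
For the record, a rough sketch of the corrected loop shape (illustrative only,
not the attachment itself; the path, mask and stop flag are placeholders):
without read_events()/process_events(), the pending events are never consumed,
so check_events() keeps reporting the same data and the runner spins.

import pyinotify

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm)
wm.add_watch('/var/spool/mailman/out',
             pyinotify.IN_CREATE | pyinotify.IN_MOVED_TO)

stop_requested = False   # would be set from the runner's signal handler

while not stop_requested:
    if notifier.check_events(timeout=1000):   # finite timeout, see comment 9
        notifier.read_events()      # drain the kernel event queue
        notifier.process_events()   # dispatch/clear the pending events
        # ... process any new files in the spool directory here ...
notifier.stop()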

Comment 12 David Rees 2007-08-28 20:51:33 UTC
Created attachment 177381 [details]
Patch now processes inotify events avoiding busy loop

I do wonder whether we should also watch for the IN_MODIFY event, to avoid a
race condition where a queue file has been created but not fully written yet.
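
For reference, extending the watch mask would be a small change (illustrative
only; whether IN_MODIFY, or perhaps IN_CLOSE_WRITE, which fires once the writer
closes the file, is the right event for Mailman's queue files is exactly the
open question here):

import pyinotify

wm = pyinotify.WatchManager()
mask = pyinotify.IN_CREATE | pyinotify.IN_MOVED_TO | pyinotify.IN_MODIFY
wm.add_watch('/var/spool/mailman/out', mask)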

Comment 13 Tomas Smetana 2007-08-29 06:31:56 UTC
It seems my configuration didn't allow me to spot that mistake.  Thanks for your help.

Comment 14 Tomas Smetana 2007-09-20 08:53:34 UTC
I have reopened the upstream bug with the new patches.  I think we should
continue there.

Comment 15 David Rees 2007-09-20 18:19:32 UTC
Do you think this bug should remain open until the patch is officially accepted
upstream?