Reporting as requested by https://www.redhat.com/archives/fedora-devel-list/2007-August/msg00544.html

On an idle system, mailman-2.1.9-5 wakes up 7 times a second as reported by powertop. Some stracing shows that 7 of the 8 qrunners wake up once a second to scan the mailman spool folders. The following qrunners each appear to sleep one second, then wake up and open a directory:

/usr/lib/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
/usr/lib/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s

The qrunner process should probably be converted to inotify, or at least tweaked so that the wakeup times are synchronized, allowing the CPU to idle longer.
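For context, the pre-patch behavior each qrunner exhibits can be sketched roughly like this (a simplified illustration, not Mailman's actual Runner.py code; the function name, callback, and parameters are stand-ins):

```python
import os
import time

def poll_queue(qdir, handle, sleep_secs=1, max_iterations=None):
    """Simplified sketch of a qrunner-style polling loop: scan the queue
    directory, process any entries, then sleep and repeat.  Each pass
    wakes the process even when the queue is empty, which is exactly
    the once-per-second wakeup powertop reports."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        for name in sorted(os.listdir(qdir)):
            handle(os.path.join(qdir, name))   # process one queue entry
        iterations += 1
        if max_iterations is None or iterations < max_iterations:
            time.sleep(sleep_secs)             # unconditional wakeup interval
```

With N runners each sleeping independently, the wakeups drift apart, so the CPU is interrupted up to N times per second instead of once.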
I have prepared a patch of Runner.py that uses inotify instead of polling and sent it upstream.
Thanks! Can you attach the patch here so that I can apply it to my system and test it?
I found where you submitted the patch:
http://sourceforge.net/tracker/index.php?func=detail&aid=1776178&group_id=103&atid=300103

It applied with a bit of fuzz to mailman-2.1.9-5, but it doesn't appear to work: the qrunners get stuck consuming a lot of CPU. Trying to shut down mailman with the service command doesn't work either.
Created attachment 161856 [details]
Working (probably not final) version of the patch

Thank you for testing. I really don't have any of the problems you describe. Note that I'm using rawhide with the following packages:

python-2.5.1-8.fc8
python-inotify-0.7.0-3.fc8
mailman-2.1.9-6

You may try the attached patch. If you do, please leave me a comment on whether it works, and if possible also the contents of /var/log/mailman/error.
I'm running the following versions:

python-2.5-12.fc7
python-inotify-0.7.0-3.fc7
mailman-2.1.9-5

The patch applies cleanly to Runner.py, but after starting mailman, the OutgoingRunner goes into a busy loop. Stracing it shows it repeating the following sequence:

gettimeofday({1187904251, 287366}, NULL) = 0
poll([{fd=5, events=POLLIN, revents=POLLIN}], 1, -1) = 1
open("/var/spool/mailman/out", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 11
fstat64(11, {st_mode=S_IFDIR|S_ISGID|0770, st_size=4096, ...}) = 0
fcntl64(11, F_SETFD, FD_CLOEXEC) = 0
getdents64(11, /* 2 entries */, 4096) = 48
getdents64(11, /* 0 entries */, 4096) = 0
close(11) = 0

Stopping mailman leaves all the qrunners running except for the OutgoingRunner; killing the mailmanctl parent process gets mailman to stop. I understand that a join bug fixed in Python 2.5.1 addresses a problem with sending signals to processes, which may be the cause of at least the stopping issue on my system. See this comment for more information on that bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=251580#c9
Hm... I have tried some more tests and it seems that Python 2.5.1 won't help: all the runners were still running after "service mailman stop". I probably didn't notice this before because Mailman otherwise seems to work OK. I'll see whether I can do something about that.
Thanks for testing. This seems like a fairly severe bug in Python, especially with it being used more and more for development these days. Were you also able to reproduce the busy loop I experienced?
I haven't seen the bug you described. As I wrote, apart from the stopping problem, Mailman seems to behave correctly. Also, I no longer think it's Python's fault; most probably I made some stupid mistake in the code.
It seems that the problem with "service mailman stop" comes from self._notifier.check_events(None) -- the purpose was to block in this call until "something happens". But when the termination signal arrives, this call doesn't react (which looks to be a known Python "feature"). The new patch doesn't block in the call, so we can break out of the loop when required.
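The shape of that fix can be illustrated with a stub notifier (the class and function names below are hypothetical; only the check_events(timeout) idea mirrors pyinotify): waiting with a bounded timeout per pass, instead of blocking forever, lets the loop re-check a stop flag that the signal handler sets.

```python
class StubNotifier:
    """Hypothetical stand-in for a pyinotify Notifier: check_events(timeout)
    returns True when events are pending, False when the wait times out."""
    def __init__(self, pending=0):
        self.pending = pending
    def check_events(self, timeout_ms):
        if self.pending:
            self.pending -= 1
            return True
        return False                      # timed out with nothing to do

def run_until_stopped(notifier, handle_event, stop):
    """Sketch of the runner loop: rather than check_events(None), which
    blocks indefinitely and is not interrupted by the termination signal,
    wait at most a short interval per pass so the stop flag set by the
    signal handler is observed promptly."""
    passes = 0
    while not stop():
        if notifier.check_events(1000):   # bounded wait (milliseconds)
            handle_event()
        passes += 1
    return passes
```

The trade-off is a periodic wakeup bounded by the timeout, but only while waiting to shut down, not one per second per runner forever.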
Created attachment 175381 [details] New patch -- tested only briefly
OK, mailman stops now with your latest patch, but it still gets stuck in the infinite loop. I finally got around to reading some Python and pyinotify docs and realized that the patch wasn't actually processing the inotify event queue, so it always thought there were more events to process. I'll attach an updated patch which seems to work.
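The busy loop can be reproduced with a stub: pyinotify's usual sequence is check_events(), then read_events(), then process_events(), and skipping read_events() leaves the inotify file descriptor readable, so check_events() keeps returning true. The stub class below is illustrative (its internals are invented); only the three-call sequence mirrors the real API.

```python
class StubInotifyQueue:
    """Hypothetical stand-in for a pyinotify Notifier: the descriptor
    stays readable until read_events() drains the kernel queue."""
    def __init__(self, events):
        self.kernel_queue = list(events)  # events the kernel has buffered
        self.read = []
        self.processed = []
    def check_events(self, timeout_ms=0):
        # Mirrors poll() on the inotify fd: readable while events remain.
        return bool(self.kernel_queue)
    def read_events(self):
        # Drain the fd; without this call, check_events() stays true
        # forever -- the busy loop seen in the strace above.
        self.read.extend(self.kernel_queue)
        del self.kernel_queue[:]
    def process_events(self):
        self.processed.extend(self.read)
        del self.read[:]

def drain(notifier):
    """Correct sequence: only after read_events() does the fd go quiet."""
    passes = 0
    while notifier.check_events(0):
        notifier.read_events()
        notifier.process_events()
        passes += 1
    return passes
```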
Created attachment 177381 [details]
Patch now processes inotify events, avoiding the busy loop

I do wonder whether we should also watch for the IN_MODIFY event, to avoid a race condition where a queue file has been created but not yet fully written?
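One common way to sidestep that race, independent of which inotify mask is watched: the writer creates the queue file under a temporary name in the same directory and rename()s it into place, so a watcher never observes a half-written file under its final name. A minimal sketch (the function name is illustrative, not Mailman's):

```python
import os
import tempfile

def enqueue_atomically(qdir, name, data):
    """Write the payload to a temporary file in the queue directory, then
    rename it into place.  rename() is atomic on POSIX when source and
    destination are on the same filesystem, so a directory watcher only
    ever sees complete files under their final names."""
    fd, tmp_path = tempfile.mkstemp(dir=qdir, prefix=".tmp-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)                               # may take a while
    os.rename(tmp_path, os.path.join(qdir, name))   # atomic publish
```

With this convention the reader can watch for IN_MOVED_TO (or simply ignore dot-files on IN_CREATE) rather than trying to guess from IN_MODIFY when a file is finished.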
It seems my configuration didn't allow me to spot that mistake. Thanks for your help.
I have reopened the upstream bug with the new patches. I think we should continue there.
Do you think this bug should remain open until the patch is officially accepted upstream?