Red Hat Bugzilla – Bug 119128
httpd stops responding for several minutes after logrotate
Last modified: 2007-11-30 17:10:38 EST
Description of problem:
After logrotate has run, httpd does not serve requests for up to 30
minutes. The reason is that one or more "workers" are still hanging
around, waiting to exit, so the -HUP does not succeed until apache
itself has killed them.
Version-Release number of selected component (if applicable):
How reproducible:
Not 100%, but on some servers this happens on almost every run of
logrotate; on others it happens only rarely.
Steps to Reproduce:
1. install httpd and logrotate
2. let logrotate run
3. watch the number of processes
Alternatively:
1. install httpd
2. run "killall -HUP httpd; killall -HUP httpd"
Actual results:
Some httpd processes keep hanging around for up to 30 minutes.
Expected results:
httpd should immediately kill off its idle children, and the rest
should die as soon as they finish serving their current request.
Then the -HUP should be completed and a new logfile opened.
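The reproduction steps above can be scripted roughly as follows. This is a minimal sketch, assuming the server processes are named httpd and that pgrep and killall are installed; count_procs and watch_restart are names made up for illustration.

```shell
#!/bin/sh
# Sketch only: count_procs and watch_restart are illustrative names,
# not part of httpd or logrotate.

# Print how many processes have exactly the given name.
count_procs() {
    pgrep -x "$1" | wc -l
}

# Send one SIGHUP to every process with the given name, then print a
# timestamp and the process count once per second for $2 seconds.
watch_restart() {
    name=$1
    seconds=$2
    killall -HUP "$name"
    i=0
    while [ "$i" -lt "$seconds" ]; do
        echo "$(date +%T) $(count_procs "$name")"
        sleep 1
        i=$((i + 1))
    done
}

# Example (run as root on a test machine): watch_restart httpd 60
```

A process count that stays above the usual level for minutes after the HUP would match the behaviour described here.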
I am 99% sure the reason for this problem is the "postrotate" section
It seems like logrotate itself is sending a killall -HUP, and then it
is run from postrotate as well.
When httpd receives two HUPs in a short timespan, some of the processes
will refuse to die.
Which processes and why is still a mystery to me.
Whether this should be fixed in logrotate (so that it does not send a
-HUP to processes that have a postrotate section) or in httpd (so that
it does not add a postrotate section) is for you to decide.
The problem is not that the -HUP is sent twice.
I can reproduce the same issue with just one HUP.
But as said in the first post, not every time.
There is no difference between the processes that do not die and
those that exit as they should, as far as I can tell.
Created attachment 98878
Merged list of processes, error_log and a couple of straces
This is a log consisting of a ps axuwn every second during such a hang (snipped
a bit when nothing happens) and entries in the error_log and strace of two
There appears to be a rather serious mathematical error in the
algorithm which waits for children to terminate / terminates them
prematurely; the parent will indeed wait for the children to terminate
for up to ~24 minutes!
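One way a backoff loop can add up to roughly this figure is sketched below. The constants are assumptions for illustration and are not taken from this report: an initial delay of 16384 microseconds that is multiplied by 4 before each sleep, with eight sleeps before the parent stops waiting. Under those assumptions the delays sum to about 1431 seconds, just under 24 minutes.

```shell
#!/bin/sh
# Sketch only: the initial delay (16384 usec), the multiplier (4) and the
# number of tries (8) are assumed values chosen for illustration.

# Print the total time slept, in microseconds, after $1 tries when the
# delay is multiplied by 4 before every sleep.
total_wait_usec() {
    tries=$1
    delay=16384
    total=0
    i=1
    while [ "$i" -le "$tries" ]; do
        delay=$((delay * 4))
        total=$((total + delay))
        i=$((i + 1))
    done
    echo "$total"
}

total=$(total_wait_usec 8)
# Integer division; the minute figure is rounded to the nearest minute.
echo "$((total / 1000000)) s (about $(((total + 30000000) / 60000000)) min)"
# prints: 1431 s (about 24 min)
```

Almost all of the total comes from the last two sleeps, which is why a delay that compounds on every pass grows far beyond a few seconds.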
BTW, what modules are you using, that the children are getting stuck
in futex calls? Subversion?
Process 18696 attached - interrupt to quit
futex(0x427ba0, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
+++ killed by SIGKILL +++
Nothing special as far as I know.
This particular server is only running a webserver that serves a few
php-scripts which reads status information from other processes.
# uname -prv
2.4.22-1.2174.nptl #1 Wed Feb 18 16:38:32 EST 2004 i686
# rpm -qa |grep httpd
# rpm -qa |grep mod_
# rpm -qa |grep php
We could probably remove mod_ssl, mod_perl and mod_python, and
most of the php packages. This is just a generic install we run, with
no customized modules or other packages related to apache (according
to our packager).
Anything else you need?
There are really two problems here:
1) the httpd parent process does not restart in a timely fashion when
a child process has hung and ignores SIGTERM
2) some of your httpd children are blocked in futex() calls; this
could possibly be a problem in a PHP script (but unlikely), or some
(1) is a simple bug and is easy enough to solve. (2) is not.
If 1 is fixed, then 2 should not cause such a problem.
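For problem (1), the general idea of bounding how long a parent waits for a child that ignores SIGTERM can be sketched as below. This is not the httpd fix itself; stop_with_deadline is a hypothetical helper, and the one-second polling interval and the deadline are arbitrary choices.

```shell
#!/bin/sh
# Sketch only: stop_with_deadline is an illustrative helper, not httpd code.

# Ask a process to exit with SIGTERM, poll once per second for up to
# $2 seconds, and fall back to SIGKILL if it is still there.
stop_with_deadline() {
    pid=$1
    deadline=$2
    kill -TERM "$pid" 2>/dev/null
    waited=0
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$deadline" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null
    fi
}

# Example: a child that ignores SIGTERM is killed after about 2 seconds.
# sh -c 'trap "" TERM; exec sleep 30' &
# stop_with_deadline $! 2
```

With a fixed deadline like this, a child stuck in a blocking call delays the restart by a known, small amount instead of an open-ended one.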
However, I would like to get 2 fixed as well. Is there any way we can
try to locate what causes this?
The server only serves an automated script that is run once a
minute (from cron), and the script simply returns "OK" or "Error", so
there are no complicated pages that should cause a timeout from the client
I will try to remove all the modules we do not need and see if that helps.
I believe the problem is in mod_python somewhere.
I removed all unneeded packages, and then reinstalled one by one.
After installing mod_python httpd did not respond well to kill -HUP
I will experiment a bit more.
Almost forgot about this bug, but I realized it still seems to be a
problem from time to time.
Have you been able to fix 1) in comment #6?
Not yet, sorry Ola. It's in the TODO list.
I have built experimental packages in Raw Hide which should fix the
timing algorithm now. If you'd like to test these out, please see bug
132360 comment 4.
Hello, just curious if there is any update on this?
It is also affecting RHEL 3U3
PHP varies from :
to compiled 5.0.4 & 4.4.1
The problem with slow restarts is fixed in all current Fedora Core httpd
packages, and all current RHEL httpd updates; Sean, please file a new bug or open
a support case describing the problem you're having.