119128 – httpd stops responding for several minutes after logrotate

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	httpd
Sub Component:
Version:	1
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Joe Orton
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-03-25 11:30 UTC by Ola Thoresen
Modified:	2007-11-30 22:10 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-11-23 09:04:59 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Merged list of processes, error_log and a couple of straces (14.10 KB, text/plain) 2004-03-26 11:16 UTC, Ola Thoresen

Description Ola Thoresen 2004-03-25 11:30:41 UTC

Description of problem:
After logrotate has run, httpd does not serve requests for up to 30
minutes. The reason is that one or more "workers" are still hanging
around, waiting to exit, so the -HUP does not succeed until apache
itself has killed them.

Version-Release number of selected component (if applicable):
httpd-2.0.48-1.2
logrotate-3.6.10-1

How reproducible:
Not 100%, but on some servers this happens on almost every run of
logrotate; on others it happens only rarely.

Steps to Reproduce:
1. install httpd and logrotate
2. let logrotate run
3. watch the number of processes
  
or:

1. install httpd
2. run "killall -HUP httpd; killall -HUP httpd"
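
The second recipe is easier to observe if the number of httpd processes is logged once per second while the restart is in progress. The following is only a sketch: the helper names count_pids, count_processes and watch are hypothetical, and it assumes a Linux host where the pgrep utility is installed and the server processes are named httpd.

```python
import subprocess
import time


def count_pids(pgrep_output: str) -> int:
    """Count PIDs in pgrep output, which lists one PID per line."""
    return sum(1 for line in pgrep_output.splitlines() if line.strip())


def count_processes(name: str = "httpd") -> int:
    """Return how many processes are named exactly `name`.

    Assumption: pgrep is available. It prints nothing when there is no
    match, so an empty result simply counts as zero.
    """
    result = subprocess.run(
        ["pgrep", "-x", name], capture_output=True, text=True
    )
    return count_pids(result.stdout)


def watch(name: str = "httpd", seconds: int = 60, interval: float = 1.0) -> None:
    """Print a timestamp and the current process count every `interval` seconds."""
    for _ in range(int(seconds / interval)):
        print(time.strftime("%H:%M:%S"), count_processes(name))
        time.sleep(interval)


# Example (run in one terminal, send the HUP from another):
# watch("httpd", seconds=120)
```

If the count stays high long after the HUP was sent, some children have not exited yet.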


Actual results:
Some httpd-processes keep hanging around for up to 30 minutes.

Expected results:
httpd should immediately kill off its idle children, and the rest
should die as soon as they finish serving their current request.
Then the -HUP should be completed and a new logfile opened.

Additional info:
I am 99% sure the reason for this problem is the "postrotate" section
in /etc/logrotate.d/httpd
It seems like logrotate itself is sending a killall -HUP, and then it
is run from postrotate as well.  
When httpd receives two HUPs in a short timespan, some of the processes
will refuse to die.
Which processes and why is still a mystery to me.
=;-)

Whether this should be fixed in logrotate - so it does not send a -HUP to
processes that have a postrotate section - or in httpd - so it does not
add a postrotate section - is for you to decide.

Comment 1 Ola Thoresen 2004-03-26 09:38:57 UTC

Humm...

The problem is not that the -HUP is sent twice.
I can reproduce the same issue with just one HUP.

But as said in the first post, not every time.
There is no difference between the processes that do not die and
those that exit as they should, as far as I can tell.

Comment 2 Ola Thoresen 2004-03-26 11:16:50 UTC

Created attachment 98878
Merged list of processes, error_log and a couple of straces

This is a log consisting of a ps axuwn every second during such a hang (snipped
a bit when nothing happens) and entries in the error_log and strace of two
hanging processes.

Comment 3 Joe Orton 2004-03-26 14:16:39 UTC

There appears to be a rather serious mathematical error in the
algorithm which waits for children to terminate / terminates them
prematurely; the parent will indeed wait for the children to terminate
for up to ~24 minutes!
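
To see how a wait loop can add up to a figure of this size, here is a small arithmetic sketch. It assumes an exponential backoff that starts at 16384 microseconds, multiplies the delay by 4 before each sleep, and only gives up on the eighth round. Those constants and the name total_wait_seconds are assumptions chosen for illustration; this report does not state the actual values used by httpd.

```python
def total_wait_seconds(initial_us: int = 16 * 1024, factor: int = 4, rounds: int = 8) -> float:
    """Sum the sleeps of a backoff loop that multiplies its delay each round.

    All defaults are illustrative assumptions, not values taken from httpd.
    """
    wait_us = initial_us
    total_us = 0
    for _ in range(rounds):
        wait_us *= factor   # delay grows before every sleep
        total_us += wait_us  # time the parent would spend sleeping
    return total_us / 1_000_000


if __name__ == "__main__":
    seconds = total_wait_seconds()
    print(f"{seconds:.1f} s = {seconds / 60:.1f} minutes")  # 1431.6 s = 23.9 minutes
```

Under these assumptions the last sleep alone is about 18 minutes, which is why a geometric delay that looks harmless in the first few rounds dominates the total.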

Comment 4 Joe Orton 2004-03-26 15:00:36 UTC

BTW, what modules are you using, that the children are getting stuck
in futex calls? Subversion?

Process 18696 attached - interrupt to quit
futex(0x427ba0, FUTEX_WAIT, 2, NULL)    = -1 EINTR (Interrupted system call)
+++ killed by SIGKILL +++

Comment 5 Ola Thoresen 2004-03-26 15:32:24 UTC

Nothing special as far as I know.
This particular server is only running a webserver that serves a few
php-scripts which read status information from other processes.


# uname -prv 
2.4.22-1.2174.nptl #1 Wed Feb 18 16:38:32 EST 2004 i686

# rpm -qa |grep httpd
httpd-2.0.48-1.2
httpd-manual-2.0.48-1.2

# rpm -qa |grep mod_
mod_ssl-2.0.48-1.2
mod_python-3.0.4-0.1
mod_perl-1.99_12-2

# rpm -qa |grep php
php-4.3.4-1.1
php-snmp-4.3.4-1.1
php-ldap-4.3.4-1.1
php-devel-4.3.4-1.1
php-pgsql-4.3.4-1.1
php-odbc-4.3.4-1.1
php-domxml-4.3.4-1.1
php-xmlrpc-4.3.4-1.1
php-mysql-4.3.4-1.1
php-imap-4.3.4-1.1


We could probably remove mod_ssl, mod_perl and mod_python, and
most of the php packages.  This is just a generic install we run, but
no customized modules or other packages related to apache (according
to our packager).

Anything else you need?

Comment 6 Joe Orton 2004-03-26 15:37:10 UTC

There are really two problems here:

1) the httpd parent process does not restart in a timely fashion when
a child process has hung and ignores SIGTERM

2) some of your httpd children are blocked in futex() calls; this
could possibly be a problem in a PHP script (but unlikely), or some
other module.

(1) is a simple bug and is easy enough to solve.  (2) is not.
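
Problem (1) can be illustrated in isolation: a parent that asks a child to terminate, waits for a bounded grace period, and then escalates to SIGKILL cannot be held up by a child that ignores SIGTERM. The sketch below is not httpd's code or the eventual fix. The names stop_with_deadline and spawn_stubborn_child are hypothetical, the grace period is arbitrary, and a POSIX system is assumed.

```python
import signal
import subprocess
import sys


def spawn_stubborn_child() -> subprocess.Popen:
    """Start a child that ignores SIGTERM, standing in for a hung worker."""
    code = (
        "import signal, time;"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN);"
        "print('ready', flush=True);"
        "time.sleep(600)"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", code], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # block until the child has installed its handler
    return proc


def stop_with_deadline(proc: subprocess.Popen, grace_seconds: float = 2.0) -> str:
    """Send SIGTERM, wait at most `grace_seconds`, then fall back to SIGKILL."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be ignored
        proc.wait()
        return "killed"


# Example:
# child = spawn_stubborn_child()
# print(stop_with_deadline(child, grace_seconds=0.5))
```

The point of the design is that the total wait is bounded by the grace period, whatever the child does with SIGTERM.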

Comment 7 Ola Thoresen 2004-03-26 16:03:24 UTC

Great.
If 1 is fixed, then 2 should not cause such a problem.

However, I would like to get 2 fixed as well.  Is there any way we can
try to locate what causes this?

The server is only serving an automated script that is run once a
minute (from cron), and the script simply returns "OK" or "Error", so
there are no complicated pages that should cause a timeout from the
client or anything.

I will try to remove all the modules we do not need and see if that helps.

Comment 8 Ola Thoresen 2004-03-29 09:00:46 UTC

I believe the problem is in mod_python somewhere.
I removed all unneeded packages, and then reinstalled one by one.
After installing mod_python httpd did not respond well to kill -HUP
anymore.
I will experiment a bit more.

Comment 9 Ola Thoresen 2004-08-11 21:12:11 UTC

Almost forgot about this bug, but I realized it still seems to be a
problem from time to time.
Have you been able to fix 1) in comment #6?

Comment 10 Joe Orton 2004-08-12 10:01:45 UTC

Not yet, sorry Ola.  It's in the TODO list.

Comment 11 Joe Orton 2004-09-14 15:00:50 UTC

I have built experimental packages in Raw Hide which should fix the
timing algorithm now.  If you'd like to test these out, please see bug
132360 comment 4.

Comment 12 Seán O Sullivan 2005-11-18 10:17:19 UTC

Hello, just curious if there is any update on this?
It is also affecting RHEL 3U3
httpd-2.0.46-54.ent
mod_authz_ldap-0.22-5
mod_perl-1.99_09-10.ent
mod_python-3.0.3-5.ent
mod_ssl-2.0.46-54.ent

PHP varies from :
php-4.3.2-26.ent

to compiled 5.0.4 & 4.4.1

Comment 13 Joe Orton 2005-11-23 09:04:59 UTC

The problem with slow restarts is fixed in all current Fedora Core httpd
packages, and all current RHEL httpd updates; Sean, please file a new bug or open
a support case describing the problem you're having.
