Bug 119128
Summary: | httpd stops responding for several minutes after logrotate
---|---
Product: | [Fedora] Fedora
Component: | httpd
Version: | 1
Hardware: | All
OS: | Linux
Status: | CLOSED NEXTRELEASE
Severity: | medium
Priority: | medium
Reporter: | Ola Thoresen <redhat>
Assignee: | Joe Orton <jorton>
CC: | marius.andreiana, seanos
Doc Type: | Bug Fix
Last Closed: | 2005-11-23 09:04:59 UTC

Description
Ola Thoresen
2004-03-25 11:30:41 UTC
Hmm... The problem is not that the -HUP is sent twice. I can reproduce the same issue with just one HUP, but, as said in the first post, not every time. As far as I can tell, there is no difference between the processes that do not die and those that exit as they should.

Created attachment 98878
Merged list of processes, error_log and a couple of straces

This is a log consisting of the output of ps axuwn taken every second during such a hang (snipped a bit where nothing happens), together with entries from the error_log and straces of two hanging processes.
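
For anyone trying to capture a similar log, the once-per-second process listing described above could be collected with something like the following. This is only a rough sketch, not the script used for the attachment: the function name `sample_command`, its parameters and the timestamp header format are all made up for illustration.

```python
import subprocess
import time
from datetime import datetime, timezone


def sample_command(cmd, samples, interval=1.0):
    """Run `cmd` `samples` times, `interval` seconds apart.

    Returns a list of text blocks, each made of a timestamp header line
    followed by whatever the command wrote to stdout.
    (Hypothetical helper; not part of httpd or any existing tool.)
    """
    blocks = []
    for i in range(samples):
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
        # check=False: a failing command is still recorded, with empty output.
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        blocks.append(f"=== {stamp} ===\n{result.stdout}")
        if i + 1 < samples:
            time.sleep(interval)
    return blocks
```

Calling `sample_command(["ps", "axuwn"], samples=60)` around the time of the HUP and writing the returned blocks to a file would give roughly one minute of listings that can be lined up against the error_log timestamps.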
There appears to be a rather serious mathematical error in the algorithm which waits for children to terminate / terminates them prematurely; the parent will indeed wait for the children to terminate for up to ~24 minutes!

BTW, what modules are you using, that the children are getting stuck in futex calls? Subversion?

    Process 18696 attached - interrupt to quit
    futex(0x427ba0, FUTEX_WAIT, 2, NULL) = -1 EINTR (Interrupted system call)
    +++ killed by SIGKILL +++

Nothing special as far as I know. This particular server is only running a webserver that serves a few PHP scripts which read status information from other processes.

    # uname -prv
    2.4.22-1.2174.nptl #1 Wed Feb 18 16:38:32 EST 2004 i686
    # rpm -qa |grep httpd
    httpd-2.0.48-1.2
    httpd-manual-2.0.48-1.2
    # rpm -qa |grep mod_
    mod_ssl-2.0.48-1.2
    mod_python-3.0.4-0.1
    mod_perl-1.99_12-2
    # rpm -qa |grep php
    php-4.3.4-1.1
    php-snmp-4.3.4-1.1
    php-ldap-4.3.4-1.1
    php-devel-4.3.4-1.1
    php-pgsql-4.3.4-1.1
    php-odbc-4.3.4-1.1
    php-domxml-4.3.4-1.1
    php-xmlrpc-4.3.4-1.1
    php-mysql-4.3.4-1.1
    php-imap-4.3.4-1.1

We could probably remove mod_ssl, mod_perl and mod_python, and most of the php packages. This is just a generic install we run, with no customized modules or other packages related to apache (according to our packager). Anything else you need?

There are really two problems here:

1) the httpd parent process does not restart in a timely fashion when a child process has hung and ignores SIGTERM
2) some of your httpd children are blocked in futex() calls; this could possibly be a problem in a PHP script (but unlikely), or some other module.

(1) is a simple bug and is easy enough to solve. (2) is not.

Great. If 1 is fixed, then 2 should not cause such a problem. However, I would like to get 2 fixed as well. Is there any way we can try to locate what causes this?
The server is only serving an automated script that is run once a minute (from cron), and the script simply returns "OK" or "Error", so there are no complicated pages that should cause a timeout from the client or anything. I will try to remove all the modules we do not need and see if that helps.

I believe the problem is in mod_python somewhere. I removed all unneeded packages, and then reinstalled them one by one. After installing mod_python, httpd did not respond well to kill -HUP anymore. I will experiment a bit more.

Almost forgot about this bug, but I realized it still seems to be a problem from time to time. Have you been able to fix 1) in comment #6?

Not yet, sorry Ola. It's in the TODO list.

I have built experimental packages in Raw Hide which should fix the timing algorithm now. If you'd like to test these out, please see bug 132360 comment 4.

Hello, just curious if there is any update on this? It is also affecting RHEL 3U3:

    httpd-2.0.46-54.ent
    mod_authz_ldap-0.22-5
    mod_perl-1.99_09-10.ent
    mod_python-3.0.3-5.ent
    mod_ssl-2.0.46-54.ent

PHP varies from php-4.3.2-26.ent to compiled 5.0.4 and 4.4.1.

The problem with slow restarts is fixed in all current Fedora Core httpd packages, and all current RHEL httpd updates; Sean, please file a new bug or open a support case describing the problem you're having.
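
To illustrate problem (1) from the discussion above, here is a rough sketch of how a parent process can keep its total wait bounded when a child ignores SIGTERM. It is not the httpd code and not the fix that went into the packages; `stop_children`, `grace` and `poll` are names and values invented for this example. The point is that the deadline is computed once, so the waits cannot add up to minutes.

```python
import signal
import subprocess
import time


def stop_children(children, grace=3.0, poll=0.1):
    """Ask every child to exit, then force-kill whatever is left.

    `children` is a list of subprocess.Popen objects. All children get
    SIGTERM, the parent waits at most `grace` seconds in total, and any
    survivor is sent SIGKILL. Returns the pids that needed SIGKILL.
    (Hypothetical helper for illustration only.)
    """
    for child in children:
        if child.poll() is None:
            child.send_signal(signal.SIGTERM)

    # One fixed deadline for the whole group, measured on a monotonic clock.
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if all(child.poll() is not None for child in children):
            return []
        time.sleep(poll)

    killed = []
    for child in children:
        if child.poll() is None:
            child.kill()  # SIGKILL cannot be caught or ignored
            killed.append(child.pid)
    for child in children:
        child.wait()  # reap, so no zombies are left behind
    return killed
```

With a scheme like this, a child stuck in a futex() call as in the strace above would delay the restart by at most `grace` seconds instead of blocking it. It does nothing about problem (2), the reason the child is stuck in the first place.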