Description of problem:
Piping apache logs through rotatelogs can hang some apache children on long-lived connections.

Version-Release number of selected component (if applicable):
<= 2.0.46-32.ent.3

How reproducible:
intermittent

Steps to Reproduce:
1. pipe apache logs through rotatelogs
2. create long-lived client connection
3. kill -SIGUSR1 httpd

Actual results:
hung child (no log entry)

Expected results:
child exits with log entry

Additional info:
From previous email:

> I've run into this bug :
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26467
>
> Any chance of getting it fixed? From the list thread mentioned in the
> bug, this problem is easily identifiable, but seems hard to fix.

Yep, as normal please do file a bug in the Red Hat bugzilla database for this, and we can prioritize development work on the issue appropriately.

> From my point of view, sysadmin not developer, I'm wondering why
> rotatelogs gets killed on a graceful restart. It seems that only the
> httpd children need to be restarted. yes? no? maybe?

Well, possibly that could be done, but that's not actually necessary to fix the bug. It might be confusing behaviour too: if you restart after making a *change* to a piped logger script, you'd expect the new script to be used after the restart.

The simplest fix for this bug is as outlined on the upstream report: simply to allow the rotatelogs process for each server "generation" to terminate only when all the httpd children for that generation have terminated. This is simple to implement since it can be done by using normal pipe semantics: you get an EOF when all the processes at the other end of the pipe have exited.

Regards,

joe
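The pipe semantics mentioned above can be seen in a small standalone sketch. This is only an illustration in Python, not httpd or rotatelogs code: the function name `read_until_all_writers_exit` is made up, and the roles are simplified so that the reading process forks the writers itself (in httpd the parent server creates both the logger and the children). The point it shows is that the reader sees EOF only after every process holding the write end has exited.

```python
import os
import time


def read_until_all_writers_exit(num_children=3):
    """Fork writers that share one pipe; return the lines read before EOF."""
    read_fd, write_fd = os.pipe()
    pids = []
    for i in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: stands in for an httpd child holding the log pipe open.
            os.close(read_fd)
            time.sleep(0.05 * (i + 1))  # children finish at different times
            os.write(write_fd, f"child {i} done\n".encode())
            os._exit(0)  # exiting closes this child's copy of write_fd
        pids.append(pid)

    # Reader: stands in for the piped logger. It must close its own copy of
    # the write end, otherwise EOF can never arrive.
    os.close(write_fd)
    chunks = []
    while True:
        data = os.read(read_fd, 4096)
        if not data:  # empty read means EOF: every writer has exited
            break
        chunks.append(data)
    os.close(read_fd)

    for pid in pids:
        os.waitpid(pid, 0)  # reap the children
    return b"".join(chunks).decode().splitlines()


if __name__ == "__main__":
    print(read_until_all_writers_exit())
```

The read loop does not return while even one slow child is still alive, which is the behaviour the proposed fix relies on: a logger that simply reads until EOF stays around for the whole lifetime of its server generation.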
target for RHEL3 U3 ? :)
Thanks for filing the bug. Even "the simplest fix" is not really a dead-simple fix, so pretty unlikely to fix this for U3 at this stage.
We have 3 licenses for RH ES3 and this is a recurring problem. Both DEV and Production servers are intermittently failing to restart at 4:02 on Sunday morning after logrotate.

Contents of /etc/logrotate.d/httpd are:

/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    postrotate
        /bin/kill -HUP `cat /var/run/httpd.pid 2>/dev/null` 2> /dev/null || true
    endscript
}

I have yet to find a 'workaround' for this anywhere on the net. Is this an Apache problem? Could a viable script workaround for this be posted here to ENSURE apache will be restarted? We would prefer a Red Hat solution to this instead of crafting our own hack.
Jolyon, the issue described here should only be triggered by a graceful restart, and logrotate in RHEL3 uses a non-graceful restart. Please open an issue with support or file a separate bug describing the issue you are having.
Any news on a fix for the graceful restart hang? The Apache bugzilla doesn't have any updates either.
Not yet sorry, progress on the patch got stalled.
/me willing to test patch(s)
Any progress on this? If not, is there an apache developer willing to be "sponsored" to work on this?
In fact, fortuitous timing, Jeff Trawick upstream has produced a patch which avoids the hangs. I'll build some test packages including this. There are really two aspects to this problem: (1) the fact that log entries can be lost for requests made during a graceful restart if using piped loggers, and (2) the fact that children can hang during a graceful restart. Jeff's patch fixes (2) but not (1); is that acceptable to you?
works for me. I already have to deal with both (1) and (2), so any fix is better :) If you attach the patch or link I can build test packages locally.
Test packages with the patch applied are available from here: http://people.redhat.com/jorton/Taroon-httpd/
FYI, I just updated a newer set of packages there (*-2.0.46-48.ent) which fix several other issues in the piped logger code (potential parent process segfault in configurations using large numbers of piped loggers).
Hi,

This is a little OT, but before filing another silly bug (cf. 164367) I'll ask here. Is there a reason for the /etc/logrotate.d/httpd script from the httpd package to use -HUP rather than -USR1? This thread seems to suggest that a graceful restart may fail with piped logs. I use cronolog with this feature.
Vincent: SIGHUP is just a "reload", SIGUSR1 is a "graceful restart" - only the latter suffers from this problem when using piped loggers.
Joe: Sure, I understand the differences. If the only difference between these two signals is that -USR1 would cause connections not to be dropped, then why is -HUP used in the log rotation script? Is the problem described in this thread the reason for this choice?
It's really just historical that the logrotate script uses -HUP rather than -USR1, but yes, the fact that the graceful restart can (without the fix for this bug) cause child processes to hang if piped loggers are used is indeed a good reason why -HUP should be used by default rather than -USR1. In the future, when the graceful restart process is made more robust even when piped loggers are used, it would be good to change the logrotate script to use -USR1.
More info: I've been running the patched binaries in devel for several months without problems. I've been running the RHEL3U6beta binaries for a couple of weeks in production without any major problems.

The only issue I have seen is when a (usually bad) spider comes in using keep-alive. The 'soon to be dead' child can hang around for a while processing many spider requests. Missing a log entry or two is acceptable; missing many is less desirable.

Possible resolution: if a child gets SIGUSR1, then stop using keep-alive. This should allow at most one lost log message. yes? no? maybe? Should I post this elsewhere?

thanks
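The keep-alive idea above can be sketched as a toy model. This is a hypothetical Python illustration of the proposal only; it is not httpd code and does not describe how httpd behaves. The names `Worker`, `on_sigusr1`, `serve_connection` and `spider` are invented for the example. The handler only sets a flag, and the serving loop answers the next request with "Connection: close" and stops.

```python
import signal


class Worker:
    """Toy stand-in for one server child serving a keep-alive connection."""

    def __init__(self):
        self.graceful = False

    def on_sigusr1(self, signum, frame):
        # Only record that a graceful restart was requested; the serving
        # loop checks the flag between requests.
        self.graceful = True

    def serve_connection(self, requests):
        """Serve requests from one connection; return (request, header) pairs."""
        replies = []
        for request in requests:
            if self.graceful:
                # Restart pending: answer this request, then drop keep-alive.
                replies.append((request, "Connection: close"))
                break
            replies.append((request, "Connection: keep-alive"))
        return replies


def spider(worker):
    """Hypothetical client that keeps sending requests on one connection."""
    yield "GET /a"
    yield "GET /b"
    # Simulate the restart signal arriving between two requests.
    worker.on_sigusr1(signal.SIGUSR1, None)
    yield "GET /c"
    yield "GET /d"


if __name__ == "__main__":
    worker = Worker()
    # In a real process the handler would be installed like this.
    signal.signal(signal.SIGUSR1, worker.on_sigusr1)
    for request, header in worker.serve_connection(spider(worker)):
        print(request, "->", header)
```

In this model only one request ("GET /c") is served after the signal, which matches the "at most one lost log message" estimate; whether that bound would hold in a real server depends on details not covered here.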
(In reply to comment #16)
> Joe: Sure, I understand the differences. If it's the case that the only
> difference between using these two signals is that -USR1 would cause connections
> to be not dropped then why is -HUP used in the log rotation script?
> Is the problem described in this thread the reason for this choice?

A -HUP causes processes to be killed instantly. This means a half-completed transaction would fail. An example could be a 20-minute Subversion commit: SIGUSR1 would let it complete, a SIGHUP would lose it.
For RHEL5 we can hopefully implement the complete solution so that *no* log messages are lost after a graceful restart even if piped loggers are used. This is technically quite simple but risky, and requires a kind of interface change, so it can't really be done for RHEL3/4.

It would be something of a hack to turn off keepalive globally after receiving a signal and might not be acceptable upstream. But yes, if you could post an RFE upstream we could see whether there would be resistance to that -- http://issues.apache.org/

(BTW, even if you have *minor* problems with the U6Beta packages please let us know, especially if they are regressions!)
The "hung children" issue was fixed in the U6/U2 errata along with the other graceful restart/piped logger fixes (though the bug reference was missed from the advisory).