127981 – rotatelogs and graceful failure

Summary: rotatelogs and graceful failure

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	httpd
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Joe Orton
QA Contact:
Docs Contact:
URL:	http://nagoya.apache.org/bugzilla/sho...
Whiteboard:
Depends On:
Blocks:

Reported:	2004-07-15 21:57 UTC by Christopher McCrory
Modified:	2018-11-07 16:28 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-10-17 08:37:03 UTC
Target Upstream Version:
Embargoed:

Description Christopher McCrory 2004-07-15 21:57:05 UTC

Description of problem:
piping apache logs through rotatelogs can hang some apache children on
long lived connections

Version-Release number of selected component (if applicable):

 <= 2.0.46-32.ent.3

How reproducible:
intermittent

Steps to Reproduce:
1.  pipe apache logs through rotatelogs
2.  create long lived client connection
3.  kill -SIGUSR1 httpd
  
Actual results:

hung child ( no log entry )

Expected results:
child exits with log entry

Additional info:

Comment 1 Christopher McCrory 2004-07-15 21:59:54 UTC

From previous email:

>       I've run into this bug :
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26467
> 
>       Any chance of getting it fixed?   From the list thread mentioned in the
> bug, this problem is easily identifiable, but seems hard to fix.

Yep, as normal please do file a bug in the Red Hat bugzilla database for
this, and we can prioritize development work on the issue appropriately.

>       From my point of view, sysadmin not developer, I'm wondering why
> rotatelogs gets killed on a graceful restart.  It seems that only the
> httpd children need to be restarted.  yes? no? maybe? 

Well, possibly that could be done, but that's not actually necessary to
fix the bug.  It might be confusing behaviour too: if you restart after
making a *change* to a piped logger script, you'd expect the new script
to be used after the restart.

The simplest fix for this bug is as outlined on the upstream report:
simply to allow the rotatelogs process for each server "generation" to
terminate only when all the httpd children for that generation have
terminated.  This is simple to implement since it can be done by using
normal pipe semantics, that you get an EOF when all the processes at the
other end of the pipe have exited.

Regards,

joe
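
The pipe behaviour relied on above can be illustrated with a minimal Python sketch. This is not httpd or rotatelogs code: the function name `collect_until_eof` and the "child N log entry" messages are invented for illustration, and the forked processes only stand in for the httpd children of one server generation. It shows that a reader on a pipe keeps receiving data, and only sees EOF, once every process holding the write end has exited.

```python
import os
import time


def collect_until_eof(num_children=3):
    """Read from a pipe shared by several forked writers until EOF.

    The reader plays the role of the piped logger; each forked child
    plays the role of an httpd child from the same generation.
    """
    read_fd, write_fd = os.pipe()
    pids = []
    for i in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child process: write one entry, then exit.
            try:
                os.close(read_fd)
                time.sleep(0.05 * i)  # children finish at different times
                os.write(write_fd, f"child {i} log entry\n".encode())
            finally:
                os._exit(0)  # exiting closes this child's write end
        pids.append(pid)

    # The reader must close its own copy of the write end,
    # otherwise the pipe never reports EOF.
    os.close(write_fd)

    entries = []
    with os.fdopen(read_fd, "rb") as pipe_in:
        for line in pipe_in:  # the loop ends only at EOF
            entries.append(line.decode().rstrip("\n"))

    for pid in pids:
        os.waitpid(pid, 0)  # reap the children
    return entries


if __name__ == "__main__":
    for entry in collect_until_eof():
        print(entry)
```

The read loop returns only after the last writer has exited, so a logger built this way would not need to be killed at restart time: it can simply run until it sees EOF.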

Comment 2 Christopher McCrory 2004-07-15 22:00:52 UTC

target for RHEL3 U3 ?  :)

Comment 3 Joe Orton 2004-07-16 07:59:00 UTC

Thanks for filing the bug.  Even "the simplest fix" is not really a
dead-simple fix, so pretty unlikely to fix this for U3 at this stage.

Comment 4 Jolyon Terwilliger 2004-10-05 02:26:58 UTC

We have 3 licenses for RH ES3 and this is a recurring problem.  Both
DEV and Production servers are intermittently failing to restart at
4:02 on Sunday morning after logrotate.  Contents of
/etc/logrotate.d/httpd are:

/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    postrotate
        /bin/kill -HUP `cat /var/run/httpd.pid 2>/dev/null` 2> /dev/null || true
    endscript
}

I have yet to find a 'workaround' for this anywhere on the net.

Is this an Apache problem?  Could a viable script workaround for this
be posted here to ENSURE apache will be restarted ?  We would prefer a
Redhat solution to this instead of crafting our own hack.

Comment 5 Joe Orton 2004-10-06 15:31:05 UTC

Jolyon, the issue described here should only be triggered by a
graceful restart, and logrotate in RHEL3 uses a non-graceful
restart.  Please open an issue with support or file a separate bug
describing the issue you are having.

Comment 6 Christopher McCrory 2005-02-10 16:41:18 UTC

Any news on a fix for the graceful restart hang?

the apache bugzilla doesn't have any updates either.

Comment 7 Joe Orton 2005-02-16 13:35:41 UTC

Not yet sorry, progress on the patch got stalled.

Comment 8 Christopher McCrory 2005-02-16 15:45:59 UTC

/me willing to test patch(s)

Comment 9 Christopher McCrory 2005-05-11 18:54:35 UTC

Any progress on this?

If not, is there an apache developer willing to be "sponsored" to work on this?

Comment 10 Joe Orton 2005-05-16 08:52:19 UTC

In fact, fortuitous timing, Jeff Trawick upstream has produced a patch which
avoids the hangs.  I'll build some test packages including this.

There are really two aspects to this problem: (1) the fact that log entries can
be lost for requests made during a graceful restart if using piped loggers, and
(2) the fact that  children can hang during a graceful restart.   Jeff's patch
fixes (2) but not (1);  is that acceptable to you?

Comment 11 Christopher McCrory 2005-05-16 12:32:50 UTC

works for me.

I already have to deal with both (1) and (2), so any fix is better :)

If you attach the patch or link I can build test packages locally.

Comment 12 Joe Orton 2005-05-16 13:01:57 UTC

Test packages with the patch applied are available from here:

http://people.redhat.com/jorton/Taroon-httpd/

Comment 13 Joe Orton 2005-05-17 15:55:45 UTC

FYI, I just uploaded a newer set of packages there (*-2.0.46-48.ent) which fix
several other issues in the piped logger code (potential parent process segfault
in configurations using large numbers of piped loggers).

Comment 14 Vincent Bray 2005-08-08 10:34:10 UTC

Hi,
This is a little OT, but before filing another silly bug (cf. 164367) I'll ask
here. Is there a reason for the /etc/logrotate.d/httpd script from the httpd
package to use -HUP rather than -USR1? This thread seems to suggest that a
graceful restart may fail with piped logs. I use cronolog with this feature.

Comment 15 Joe Orton 2005-08-08 10:41:56 UTC

Vincent: SIGHUP is just a "reload", SIGUSR1 is a "graceful restart" - only the
latter suffers from this problem when using piped loggers.

Comment 16 Vincent Bray 2005-08-08 11:11:25 UTC

Joe: Sure, I understand the differences. If it's the case that the only
difference between using these two signals is that -USR1 would cause connections
to be not dropped then why is -HUP used in the log rotation script?
Is the problem described in this thread the reason for this choice?

Comment 17 Joe Orton 2005-08-08 11:18:56 UTC

It's really just historical that the logrotate script uses -HUP rather than
-USR1, but yes, the fact that the graceful restart can (without the fix for this
bug) cause child processes to hang if piped loggers are used is indeed a good
reason why -HUP should be used by default rather than -USR1.

In the future, when the graceful restart process is made more robust even when
piped loggers are used, then it would be good to change the logrotate script to
use -USR1.

Comment 18 Christopher McCrory 2005-09-13 20:28:29 UTC

More info:

I've been running the patched binaries in devel for several months without problems.

I've been running the RHEL3U6beta binaries for a couple weeks in production
without any major problems.

The only issue I have seen is when we get a (usually bad) spider come in using
keep alive.  the 'soon to be dead' child can hang around for a while processing
many spider requests.

missing a log entry or two is acceptable.  missing many is less desirable.

possible resolution:

if child gets SIGUSR1 ; then stop using keep alive.

this should allow at most one lost log message.

 yes? no? maybe?


 should I post this elsewhere?

thanks
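
The suggestion above (stop honouring keep-alive once a graceful restart has been requested) can be sketched in Python as follows. This is a hypothetical model, not httpd code: `on_graceful`, `serve_connection`, and the `stop_requested` callback are invented names, and the sketch collapses the real arrangement, where the parent receives SIGUSR1 and the children learn of it indirectly, into a single flag.

```python
import signal

# Set once the (hypothetical) worker has been asked to wind down.
graceful_requested = False


def on_graceful(signum, frame):
    """Signal handler: only record the request; the serving loop acts on it."""
    global graceful_requested
    graceful_requested = True


def serve_connection(requests, stop_requested=lambda: graceful_requested):
    """Serve the requests arriving on one keep-alive connection.

    Returns a list of (request, connection_header) pairs. Once
    stop_requested() is true, the request being handled is answered with
    "close" and no further requests are served on this connection.
    """
    served = []
    for req in requests:
        closing = stop_requested()
        served.append((req, "close" if closing else "keep-alive"))
        if closing:
            break  # the client must reconnect and reach a new child
    return served


if __name__ == "__main__":
    # Assumption for the demo: the worker itself receives SIGUSR1.
    signal.signal(signal.SIGUSR1, on_graceful)
    print(serve_connection(["GET /a", "GET /b"]))
```

Under this model a child that has been told to exit serves at most one more request per connection, which bounds the number of log entries that could be lost to the one mentioned above.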

Comment 19 Christopher McCrory 2005-09-13 20:32:20 UTC

(In reply to comment #16)
> Joe: Sure, I understand the differences. If it's the case that the only
> difference between using these two signals is that -USR1 would cause connections
> to be not dropped then why is -HUP used in the log rotation script?
> Is the problem described in this thread the reason for this choice?


a -HUP causes processes to be instantly killed.  This means a half way completed
transaction would fail.  an example could be a 20 minute subversion commit.  
SIGUSR1 would let it complete.  a SIGHUP would lose it.

Comment 20 Joe Orton 2005-09-14 09:55:14 UTC

For RHEL5 we can hopefully implement the complete solution so that *no* log
messages are lost after a graceful restart even if piped loggers are used.  This
is technically quite simple but risky and requires a kind of an interface change so
can't really be done for RHEL3/4.

It would be something of a hack to turn off keepalive globally after receiving a
signal and might not be acceptable upstream.  But yes, if you could post an RFE
upstream we could see whether there would be resistance to that --
http://issues.apache.org/

(BTW, even if you have *minor* problems with the U6Beta packages please let us
know especially if they are regressions!)

Comment 21 Joe Orton 2005-10-17 08:37:03 UTC

The "hung children" issue was fixed in the U6/U2 errata along with the other
graceful restart/piped logger fixes (though the bug reference was missed from
the advisory).
