Bug 1288395 - httpd segfault when logrotate invoked
httpd segfault when logrotate invoked
Status: ON_QA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: httpd (Show other bugs)
7.1
All Linux
medium Severity unspecified
: rc
: 7.4
Assigned To: Luboš Uhliarik
BaseOS QE - Apps
:
Depends On:
Blocks: 1298243 1420851 1465904 1466370 1473612 1289025
  Show dependency treegraph
 
Reported: 2015-12-04 01:05 EST by Hisanobu Okuda
Modified: 2017-10-03 11:55 EDT (History)
8 users (show)

See Also:
Fixed In Version: httpd-2.4.6-72.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Apache Bugzilla 57177 None None None Never

  None (edit)
Description Hisanobu Okuda 2015-12-04 01:05:49 EST
Description of problem:
The httpd process dies with segfault when logrotate is invoked.

Version-Release number of selected component (if applicable):
httpd-2.4.6-31.el7_1.1.x86_64
logrotate-3.8.6-4.el7.x86_64

How reproducible:


Steps to Reproduce:
1. run logrotate

Actual results:


Expected results:


Additional info:
Comment 1 Jan Kaluža 2015-12-04 01:44:48 EST
What other httpd packages do you have installed? It does not happen to me with only "httpd" and "mod_ssl" installed on my system with clean configuration. What's the httpd configuration you use?

Can you get a coredump from crash? The description how to do that is here: https://access.redhat.com/solutions/1467
Comment 2 Hisanobu Okuda 2015-12-04 02:32:40 EST
Reviewing my costomer's core file, I found the function handler pfn_ap_logio_get_last_bytes has a wrong value. The backtrace is as follows:-


(gdb) bt
#0  0x00007f633cd0ab60 in ?? ()
#1  0x00007f6343b377d2 in ap_increment_counts (sb=0x7f6344a004d0, r=r@entry=0x7f6344a06310) at scoreboard.c:369
...


The function ap_increment_counts is :-


359: AP_DECLARE(void) ap_increment_counts(ap_sb_handle_t *sb, request_rec *r)
360: {
361:     worker_score *ws;
362:     apr_off_t bytes;
363: 
364:     if (!sb)
365:         return;
366: 
367:     ws = &ap_scoreboard_image->servers[sb->child_num][sb->thread_num];
368:     if (pfn_ap_logio_get_last_bytes != NULL) {
369:        bytes = pfn_ap_logio_get_last_bytes(r->connection);


And the handler pfn_ap_logio_get_last_bytes is :-


(gdb) print pfn_ap_logio_get_last_bytes
$9 = (apr_OFN_ap_logio_get_last_bytes_t *) 0x7f633cd0ab60


It looks pfn_ap_logio_get_last_bytes points a head of a valid function, but in fact, it is a wrong value.


(gdb) x 0x7f633cd0ab60
0x7f633cd0ab60: Cannot access memory at address 0x7f633cd0ab60


Calling non-existent memory address killed the process with segfault. This is likely caused by unloading a library which should have a valid function where pfn_ap_logio_get_last_bytes points. The problem is "what is unloaded?".

I attached a debugger to my httpd process using the configs and noticed that pfn_ap_logio_get_last_bytes points the function <ap_logio_get_last_bytes> in mod_logio.so. 
Then I reviewed again the core file, and found pfn_ap_logio_get_last_bytes in the core also points somewhere in mod_logio.so as well:-


(gdb) print pfn_ap_logio_get_last_bytes
$1 = (apr_OFN_ap_logio_get_last_bytes_t *) 0x7f633cd0ab60

(gdb) info sharedlibrary 
From                To                  Syms Read   Shared Object Library
...
0x00007f633cb14a30  0x00007f633cb14de4  Yes         ./etc/httpd/modules/mod_logio.so <===where pfn_ap_logio_get_last_bytes points


The customer is using logrotate command to rotate logs. It invokes `systemctl reload httpd.service`, and `systemctl reload httpd.service` invokes `/usr/sbin/httpd $OPTIONS -k graceful`. Searching for "httpd logrotate segfault", you can find some reports that say the "graceful" command is problematic. In fact, changing `systemctl reload httpd.service` to `systemctl restart httpd.service`, the issue is gone on the customer's setup.
Comment 4 Hisanobu Okuda 2015-12-04 02:46:26 EST
Jan, honestly to say, I could not reproduce the issue, but theoretically it can happen in the race condition that mod_logio is unloaded during an access is processed. It is caused by unloading mod_logio, you can (theoretically) reproduce the issue without any additional packages.
Comment 5 Jan Kaluža 2015-12-04 03:46:24 EST
Thanks for the additional information. I've found out this can be fixed by following upstream patch:

https://svn.apache.org/viewvc?view=revision&revision=1668532
Comment 6 Jan Kaluža 2015-12-04 03:52:53 EST
There's also follow-up commit:

https://svn.apache.org/viewvc?view=revision&revision=1668553

Note You need to log in before you can comment on or make changes to this bug.