Bug 1599075 - httpd coredump
Summary: httpd coredump
Status: CLOSED DUPLICATE of bug 1680481
Alias: None
Product: Fedora
Classification: Fedora
Component: mod_wsgi
Version: 29
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Matthias Runge
QA Contact: Fedora Extras Quality Assurance
Duplicates: 1612994
Reported: 2018-07-08 14:25 UTC by redhat
Modified: 2019-04-23 15:25 UTC (History)

Last Closed: 2019-04-23 15:25:50 UTC
Type: Bug


Attachments
One of the coredumps (1.89 MB, application/octet-stream)
2018-07-08 14:25 UTC, redhat

Description redhat 2018-07-08 14:25:02 UTC
Created attachment 1457283
One of the coredumps

Description of problem:
Apache daemon dumps core.

Version-Release number of selected component (if applicable):
httpd-2.4.33-5.fc28

How reproducible:
Happens from time to time.

Steps to Reproduce:
1. Don't know

Actual results:
Coredump

Expected results:
No coredump

Additional info:

Comment 1 Joe Orton 2018-07-20 17:55:03 UTC
Please try 2.4.34 from updates-testing.  If you can still reproduce, attach a backtrace (not a core dump).

Comment 2 redhat 2018-07-30 15:44:00 UTC
           PID: 13750 (httpd)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Sun 2018-07-29 03:39:05 CEST (1 day 14h ago)
  Command Line: /usr/sbin/httpd -DFOREGROUND
    Executable: /usr/sbin/httpd
 Control Group: /system.slice/httpd.service
          Unit: httpd.service
         Slice: system.slice
       Boot ID: afa847893dff4ef2bb75ab23404bd2f9
    Machine ID: e11622fca54a439a9b3954265d001569
      Hostname: manicminer.lan.zx-spectrum
       Storage: /var/lib/systemd/coredump/core.httpd.0.afa847893dff4ef2bb75ab23404bd2f9.13750.1532828345000000.lz4
       Message: Process 13750 (httpd) of user 0 dumped core.
                
                Stack trace of thread 13750:
                #0  0x00007f5c992ce6c0 n/a (n/a)
                #1  0x00007f5cb14f7a68 apr_proc_fork (libapr-1.so.0)
                #2  0x00007f5ca2c48f29 wsgi_start_process (mod_wsgi.so)
                #3  0x00007f5ca2c4b598 wsgi_start_daemons (mod_wsgi.so)
                #4  0x00007f5ca2c4bf0b wsgi_hook_init (mod_wsgi.so)
                #5  0x0000564d58557083 ap_run_post_config (httpd)
                #6  0x0000564d585327bf main (httpd)
                #7  0x00007f5cb0d0f24b __libc_start_main (libc.so.6)
                #8  0x0000564d5853291a _start (httpd)

Comment 3 Joe Orton 2018-08-06 17:14:29 UTC
*** Bug 1612994 has been marked as a duplicate of this bug. ***

Comment 4 Joe Orton 2018-08-06 17:21:15 UTC
I don't know what's going on here. It looks like a crash in mod_wsgi, though mod_wsgi hasn't been updated in a while, so it may not be at fault.

The references to dbus in bug 1612994 might be incidental; it looks like a function in unmapped memory is being called.  Is something registering an atfork handler?

Can we get some mod_wsgi package versions from reporters?

Comment 5 redhat 2018-08-07 14:11:05 UTC
python2-mod_wsgi-4.5.20-4.fc28.x86_64

Comment 6 Patrick Dung 2018-08-07 14:54:39 UTC
For my case in bug 1612994, it's python3-mod_wsgi-4.5.20-4.fc28.x86_64.

Comment 7 Graham Dumpleton 2018-08-08 01:12:36 UTC
Of note, if you look at https://bugzilla.redhat.com/show_bug.cgi?id=1612994 it also shows:

[Tue Aug 07 00:01:03.744087 2018] [core:notice] [pid 25705:tid 140235695196416] AH00051: child pid 16914 exit signal Segmentation fault (11), possible coredump in /etc/httpd
[Tue Aug 07 00:01:03.744109 2018] [cgid:error] [pid 25705:tid 140235695196416] AH01239: cgid daemon process died, restarting

So it isn't just mod_wsgi daemon processes that are crashing, but also the cgid daemon process.

This suggests it is a broader issue and not mod_wsgi specific.

The errors in the other report of:

Aug  7 00:01:01 server1 kernel: [35331.845063] httpd[16900]: segfault at 7f8b1798d6c0 ip 00007f8b1798d6c0 sp 00007ffe71b65db8 error 14 in libdbus-1.so.3.19.7[7f8b183d4000+50000]

are therefore worth looking at, since libdbus is yet another separate package, and it may be triggering issues in any Apache daemon subprocess that is managed through the APR routines for other child processes. Both the mod_wsgi daemons and mod_cgid use these same APR routines.

https://apr.apache.org/docs/apr/2.0/group__apr__thread__proc.html#ga5a9d123afe81eaa97955fbe45704b662

So possibly there is some conflict.
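
The kernel line quoted above can be unpacked field by field. The sketch below is only an illustration: it assumes the usual x86 page-fault error-code bits (bit 0 page present, bit 1 write, bit 2 user mode, bit 4 instruction fetch) and that the kernel prints the code in hexadecimal. `decode_segfault` and the dictionary keys are made-up names, not part of any tool.

```python
import re

# Layout of the kernel message as quoted in this comment:
#   segfault at <addr> ip <ip> sp <sp> error <code> in <name>[<start>+<size>]
SEGV = re.compile(
    r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) "
    r"error ([0-9a-f]+)(?: in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Decode one kernel segfault line; return None if it does not match."""
    m = SEGV.search(line)
    if m is None:
        return None
    addr, ip, _sp, err = (int(m.group(i), 16) for i in range(1, 5))
    info = {
        "fault_is_ip": addr == ip,               # fault happened fetching code
        "page_present": bool(err & 0x01),        # assumed bit 0
        "write": bool(err & 0x02),               # assumed bit 1
        "user_mode": bool(err & 0x04),           # assumed bit 2
        "instruction_fetch": bool(err & 0x10),   # assumed bit 4
        "mapping": m.group(5),
        "ip_inside_mapping": None,
    }
    if m.group(5) is not None:
        start, size = int(m.group(6), 16), int(m.group(7), 16)
        info["ip_inside_mapping"] = start <= ip < start + size
    return info


if __name__ == "__main__":
    line = ("httpd[16900]: segfault at 7f8b1798d6c0 ip 00007f8b1798d6c0 "
            "sp 00007ffe71b65db8 error 14 in "
            "libdbus-1.so.3.19.7[7f8b183d4000+50000]")
    for key, value in decode_segfault(line).items():
        print(key, value)
```

Under those assumptions, `error 14` on the quoted line reads as a user-mode instruction fetch from a page that is not present. The fault address equals the instruction pointer, and that address falls outside the libdbus range printed at the end of the line. That would fit a jump into unmapped code rather than a fault inside libdbus itself.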

Comment 8 Joe Orton 2018-08-08 07:16:36 UTC
Good spot, thanks Graham.  Let's keep this on httpd until we get better data.

Can both reporters here give some context?  Are these running servers which have recently started seeing crashes?  If so, can you identify what packages were updated?

I would bet the house there is something calling pthread_atfork() to register an atfork handler and then getting unloaded, but I have no idea what.

grep atfork /etc/httpd/modules/*.so

Also, can someone try:

# systemctl stop httpd
# LD_DEBUG=all LD_DEBUG_OUTPUT=/tmp/httpd.ld.debug httpd -DFOREGROUND &
... wait a bit...
# httpd -k stop

then gzip /tmp/httpd.ld.debug.* and mail it to me, or attach here (privately if you prefer).
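
A rough Python equivalent of the grep above, in case a per-file listing is easier to read. This is a sketch, not a diagnostic tool: `modules_mentioning` is a hypothetical helper name, and the directory is simply the one used in the grep command. It only inspects the module files themselves, so an atfork registration made by a library that a module links against would not show up here.

```python
from pathlib import Path
from typing import List


def modules_mentioning(needle: bytes, module_dir: str) -> List[str]:
    """Return names of *.so files in module_dir whose raw bytes contain needle."""
    hits = []
    for path in sorted(Path(module_dir).glob("*.so")):
        try:
            data = path.read_bytes()
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if needle in data:
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    # Same search as the grep: any module whose binary mentions "atfork".
    for name in modules_mentioning(b"atfork", "/etc/httpd/modules"):
        print(name)
```

Matching the bare string `atfork` is deliberately loose, as in the grep: it should also catch related symbol names, at the cost of possible false positives.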

Comment 9 Patrick Dung 2018-08-08 16:06:53 UTC
In my case, I installed IPA a few days ago. I had a problem with httpd (mpm_event) two days ago, and with httpd (mpm_worker) yesterday. In both cases, the problem occurred when logrotate ran. I switched to mpm_prefork and it seems fine so far (logrotate ran without problems).

Comment 10 redhat 2018-08-10 06:06:45 UTC
Well, it started crashing right after upgrading from FC27 to FC28. I've been using eGroupware, a PHP-based web application, for ages.

Comment 11 redhat 2018-08-10 06:15:27 UTC
(In reply to redhat from comment #10)
> I've been using eGroupware, a PHP-based web application, for ages.

I was wrong - mod_wsgi is used here with a private Firefox sync server.

Comment 12 Lonni J Friedman 2018-09-29 17:30:50 UTC
Also seeing this problem ever since upgrading from F27 to F28:

Process 28690 (/usr/sbin/httpd) of user 0 dumped core.

Stack trace of thread 28690:
#0  0x00007f5b9deed700 n/a (n/a)
#1  0x00007f5bbf54ba68 apr_proc_fork (libapr-1.so.0)
#2  0x00007f5bafdebf4a procmgr_post_config (mod_fcgid.so)
#3  0x00007f5bafde5900 n/a (mod_fcgid.so)
#4  0x000055580f4b2573 ap_run_post_config (httpd)
#5  0x000055580f48d9cf main (httpd)
#6  0x00007f5bbed6311b __libc_start_main (libc.so.6)
#7  0x000055580f48db2a _start (httpd)

Process 28694 (/usr/sbin/httpd) of user 0 dumped core.

Stack trace of thread 28694:
#0  0x00007f5b9deed700 n/a (n/a)
#1  0x00007f5bbf54ba68 apr_proc_fork (libapr-1.so.0)
#2  0x00007f5bada86f29 wsgi_start_process (mod_wsgi.so)
#3  0x00007f5bada89598 wsgi_start_daemons (mod_wsgi.so)
#4  0x00007f5bada89f0b wsgi_hook_init (mod_wsgi.so)
#5  0x000055580f4b2573 ap_run_post_config (httpd)
#6  0x000055580f48d9cf main (httpd)
#7  0x00007f5bbed6311b __libc_start_main (libc.so.6)
#8  0x000055580f48db2a _start (httpd)

Process 28697 (/usr/sbin/httpd) of user 0 dumped core.

Stack trace of thread 28697:
#0  0x00007f5b9deed700 n/a (libcap-ng.so.0)
#1  0x00007f5bbf54ba68 apr_proc_fork (libapr-1.so.0)
#2  0x00007f5baf7769dc post_config (mod_dnssd.so)
#3  0x000055580f4b2573 ap_run_post_config (httpd)
#4  0x000055580f48d9cf main (httpd)
#5  0x00007f5bbed6311b __libc_start_main (libc.so.6)
#6  0x000055580f48db2a _start (httpd)

Process 28703 (/usr/sbin/httpd) of user 0 dumped core.

Stack trace of thread 28703:
#0  0x00007f5b9deed700 n/a (libcap-ng.so.0)
#1  0x00007f5bb2ee70af make_child (mod_mpm_event.so)
#2  0x00007f5bb2ee8024 event_run (mod_mpm_event.so)
#3  0x000055580f49530e ap_run_mpm (httpd)
#4  0x000055580f48da03 main (httpd)
#5  0x00007f5bbed6311b __libc_start_main (libc.so.6)
#6  0x000055580f48db2a _start (httpd)

Process 28701 (/usr/sbin/httpd) of user 0 dumped core.

Stack trace of thread 28701:
#0  0x00007f5b9deed700 n/a (libcap-ng.so.0)
#1  0x00007f5bb2ee70af make_child (mod_mpm_event.so)
#2  0x00007f5bb2ee8024 event_run (mod_mpm_event.so)
#3  0x000055580f49530e ap_run_mpm (httpd)
#4  0x000055580f48da03 main (httpd)
#5  0x00007f5bbed6311b __libc_start_main (libc.so.6)
#6  0x000055580f48db2a _start (httpd)

It's happening every time logrotate performs a reload of httpd.

python2-mod_wsgi-4.5.20-4.fc28.x86_64
httpd-2.4.34-3.fc28.x86_64
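
All five traces in this comment fault at the same frame #0 address, 0x00007f5b9deed700, reached from different callers. A small sketch for checking that kind of pattern across many dumps follows. It assumes the systemd-coredump frame layout shown in this bug (`#N  0xADDR symbol (library)`); `parse_frames` and `callers_by_crash_address` are hypothetical helper names. The pattern skips over syslog-style `#012` newline escapes, so traces pasted straight from a log parse the same way.

```python
import re
from collections import defaultdict

# One frame in the style quoted in this bug, for example:
#   "#1  0x00007f5bbf54ba68 apr_proc_fork (libapr-1.so.0)"
FRAME = re.compile(r"#(\d+)\s+0x([0-9a-f]+)\s+(\S+)\s+\(([^)]+)\)")


def parse_frames(text):
    """Return (index, address, symbol, library) tuples found in text."""
    return [(int(i), int(addr, 16), sym, lib)
            for i, addr, sym, lib in FRAME.findall(text)]


def callers_by_crash_address(traces):
    """Map each frame-0 address to the set of frame-1 symbols that reached it."""
    callers = defaultdict(set)
    for trace in traces:
        frames = parse_frames(trace)
        # Only use traces that really start with frame 0 followed by frame 1.
        if len(frames) >= 2 and frames[0][0] == 0 and frames[1][0] == 1:
            callers[frames[0][1]].add(frames[1][2])
    return dict(callers)


if __name__ == "__main__":
    traces = [
        "#0  0x00007f5b9deed700 n/a (n/a)\n"
        "#1  0x00007f5bbf54ba68 apr_proc_fork (libapr-1.so.0)",
        "#0  0x00007f5b9deed700 n/a (libcap-ng.so.0)\n"
        "#1  0x00007f5bb2ee70af make_child (mod_mpm_event.so)",
    ]
    for addr, symbols in callers_by_crash_address(traces).items():
        print(hex(addr), sorted(symbols))
```

One crash address shared by unrelated callers (`apr_proc_fork` and `make_child` here) would be consistent with something that runs on every fork, rather than with a bug in any single module.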

Comment 13 redhat 2018-11-25 09:18:16 UTC
Just updated to FC29, httpd-2.4.37-3.fc29 and python2-mod_wsgi-4.6.4-2.fc29. It got even worse. Now I'm faced with an endless coredump loop:

Sun 2018-11-25 10:14:49 CET   21511     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:49 CET   21534     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:49 CET   21524     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:49 CET   21536     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:50 CET   21515     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:50 CET   21514     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:50 CET   21529     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:51 CET   21522     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:51 CET   21513     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:51 CET   21517     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:51 CET   21584     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:51 CET   21585     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:51 CET   21595     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:52 CET   21615     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:52 CET   21617     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:53 CET   21646     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:55 CET   21648     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:55 CET   21656     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:57 CET   21649     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:57 CET   21647     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:57 CET   21653     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:58 CET   21692     0     0  11 present   /usr/sbin/httpd
Sun 2018-11-25 10:14:58 CET   21736     0     0  11 present   /usr/sbin/httpd
-- Notice: 13 systemd-coredump@.service units are running, output may be incomplete.

       Message: Process 22815 (httpd) of user 0 dumped core.
                
                Stack trace of thread 22815:
                #0  0x00007f017269c6b0 gdImageScale (libgd.so.3)
                #1  0x00007ffe3bf71a73 n/a (n/a)

Comment 14 Joe Orton 2018-11-26 09:47:03 UTC
If you are seeing crashes with traces like:

                Stack trace of thread 13750:
                #0  0x00007f5c992ce6c0 n/a (n/a)
                #1  0x00007f5cb14f7a68 apr_proc_fork (libapr-1.so.0)

on F29, please try https://bodhi.fedoraproject.org/updates/FEDORA-2018-4b635e1df4 since there was an off-by-two memory corruption bug in a patch to mod_ssl which could break things in ways similar to this.

Comment 15 Joe Orton 2019-04-16 16:32:25 UTC
There was a completely unrelated bug with extremely similar symptoms fixed in libcap-ng for Fedora 29 (bug 1680481). If you saw regressions with crashes in fork after moving to Fedora 29, please ensure you have both the httpd update mentioned in comment 14 and also:

https://bodhi.fedoraproject.org/updates/FEDORA-2019-2cc0b7524

And let me know if you can reproduce with current stable updates.

Comment 16 redhat 2019-04-23 15:20:56 UTC
Yes, that fix also resolves the problem in this issue. You may close this.

Comment 17 Joe Orton 2019-04-23 15:25:50 UTC
Thanks for confirming the fix.

*** This bug has been marked as a duplicate of bug 1680481 ***

