1680481 – libcap-ng segfault after dlclose (httpd crash on reload with php-imap)

Summary: libcap-ng segfault after dlclose (httpd crash on reload with php-imap)

Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	libcap-ng
Version:	29
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Steve Grubb
QA Contact:	Fedora Extras Quality Assurance
Duplicates (5):	1599075 1657297 1659768 1676842 1694321

Reported:	2019-02-25 07:29 UTC by Trevor Cordes
Modified:	2019-10-21 19:33 UTC
CC List:	9 users
Fixed In Version:	libcap-ng-0.7.9-7.fc31 libcap-ng-0.7.9-5.fc29 libcap-ng-0.7.9-7.fc30
Last Closed:	2019-03-12 22:18:47 UTC
Type:	Bug

Attachments
sampling of gdb traces on coredumps apache produces (6.00 KB, text/plain) 2019-02-25 07:29 UTC, Trevor Cordes
httpd -M output (2.60 KB, text/plain) 2019-02-25 09:15 UTC, Trevor Cordes
fix pthread_atfork usage (751 bytes, patch) 2019-02-28 14:03 UTC, Joe Orton

Description Trevor Cordes 2019-02-25 07:29:54 UTC

Created attachment 1538325
sampling of gdb traces on coredumps apache produces

Description of problem:
Ever since upgrading from Fedora 27 to 29, one box has Apache going into a segfault child-restart loop weekly when logrotate runs.  If I remove the php-imap rpm, the problem goes away.  From the coredumps, it appears something is causing random memory corruption in apache/mod_php.

The bug only happens on reload, never on restart of Apache via systemd.  Apache seems 100% stable otherwise, as long as reload is never issued.

The segfaults start immediately after the reload; no hit on the website is required (I blocked all access to 80/443 and the bug still occurs).  It segfaults whether Apache has been running a while (hours/days) or I just restarted Apache cleanly and no hits have occurred yet.

I think php-imap might be a red herring.  It seems to be required to trigger the bug, but we're not using it at all as far as I can tell.  And it seems fine on another box.

I have run lots of verifies, reinstalls, etc. of the relevant rpms; it doesn't help, and things look normal otherwise.  The box has ECC memory and is Ryzen-based, but the bug also occurred on an identical (now decommissioned) box when it was i686-based.  The bug started with F29.

rpm -V `rpm -qa | grep -iP 'php|httpd|mod_'`
looks normal.


Version-Release number of selected component (if applicable):
php-7.2.15-1.fc29.x86_64
php-imap-7.2.15-1.fc29.x86_64
httpd-2.4.38-2.fc29.x86_64
* we are not using any php/httpd related rpms or modules from any other source than fedora repos


How reproducible:
Reproducible on demand on this one box when php-imap is installed; I can generate coredumps at will.  I tried creating a similar (not identical) config on another box, but I cannot reproduce it there yet.  I'm trying to figure out what is different.


Steps to Reproduce:
1. Install a typical complement of php rpms, with Apache in mod_php mode
2. systemctl reload httpd.service

Actual results:
[Mon Feb 25 00:27:15.934157 2019] [mpm_prefork:notice] [pid 21721] AH00163: Apache/2.4.38 (Fedora) OpenSSL/1.1.1a PHP/7.2.15 mod_perl/2.0.10 Perl/v5.28.1 configured -- resuming normal operations
[Mon Feb 25 00:27:15.934173 2019] [core:notice] [pid 21721] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Mon Feb 25 00:27:16.938676 2019] [core:notice] [pid 21721] AH00051: child pid 21771 exit signal Segmentation fault (11), possible coredump in /etc/httpd
[Mon Feb 25 00:27:16.938760 2019] [core:notice] [pid 21721] AH00051: child pid 21773 exit signal Segmentation fault (11), possible coredump in /etc/httpd
[Mon Feb 25 00:27:16.938803 2019] [core:notice] [pid 21721] AH00051: child pid 21775 exit signal Segmentation fault (11), possible coredump in /etc/httpd
[Mon Feb 25 00:27:16.938845 2019] [core:notice] [pid 21721] AH00051: child pid 21777 exit signal Segmentation fault (11), possible coredump in /etc/httpd
[Mon Feb 25 00:27:16.938887 2019] [core:notice] [pid 21721] AH00051: child pid 21779 exit signal Segmentation fault (11), possible coredump in /etc/httpd
... many coredumps a second until parent is killed
... web server does not respond to requests during this time

Expected results:
normal operation, no coredumps, respond to requests


Additional info:
I removed rpms one by one until the problem disappeared, and it was php-imap that I stumbled upon.  With it removed, things work bug-free.  With it installed, the bug happens every time.

I did extensive strace (as much as I can on child processes, which isn't much except on shutdown, not startup) and ran gdb on the cores.

Each coredump in a set (i.e. during the same crash test before I kill the parent) seems similar, but often not identical.  Interestingly, each new crash test (usually) produces wildly varying coredump results (different segfault points and arguments to functions).  This suggests to me that major memory corruption is going on here.

Most often the segfault occurs in some function of gd (php-gd), but not always.  The arguments to functions are usually nonsensical (like images with 0 width and a height of 2 billion).

See attached sampling of various gdb results from coredumps.

Comment 1 Remi Collet 2019-02-25 07:58:23 UTC

According to the provided backtrace, it looks like you are using httpd with mod_php.

Please confirm with "httpd -M" output.


Notice: the default provided configuration is httpd in event mode (mpm_event_module) without mod_php (PHP scripts are executed via proxy_fcgi_module and php-fpm).

Comment 2 Trevor Cordes 2019-02-25 09:13:54 UTC

Yes, we are using mod_php on this box and php-fpm has been removed.  We will convert to FPM in the future, but not at this moment.

Comment 3 Remi Collet 2019-02-25 09:15:39 UTC

Sorry, but mpm_event_module + mod_php is NOT supported, and cannot be.
See the warning on http://php.net/manual/en/install.unix.apache2.php

Comment 4 Trevor Cordes 2019-02-25 09:15:50 UTC

Created attachment 1538361
httpd -M output

Comment 5 Remi Collet 2019-02-25 09:16:29 UTC

Also in module configuration file:  "# ZTS module is not supported, so FPM is preferred"

Comment 6 Trevor Cordes 2019-02-25 09:17:30 UTC

We are not event.  We are mpm_prefork_module

Comment 7 Trevor Cordes 2019-02-25 09:21:55 UTC

Not using ZTS.  We will when we switch to FPM, but not right now.

Prefork + mod_php may be slightly dated but AFAIK it's a fully supported config still.  Segfault and possible stack clobbering seems rather nasty in any supported configuration.

Comment 8 Remi Collet 2019-02-25 09:35:03 UTC

So this seems more like an httpd issue, probably similar to bug #1659768.

Comment 9 Joe Orton 2019-02-25 11:39:07 UTC

Are there any unused modules here which you can remove?  I see mod_perl is listed.

One theory is that this is some OpenSSL (re-)initialization bug, php-imap links to libcrypto so it may not be a red herring.

Comment 10 Joe Orton 2019-02-25 11:40:36 UTC

Could you attach output from "rpm -qf /etc/httpd/modules/*.so".

Comment 11 Trevor Cordes 2019-02-26 05:29:45 UTC

#rpm -qf /etc/httpd/modules/*.so |sort -u
httpd-2.4.38-2.fc29.x86_64
mod_http2-1.11.1-1.fc29.x86_64
mod_perl-2.0.10-13.fc29.x86_64
mod_ssl-2.4.38-2.fc29.x86_64
php-7.2.15-1.fc29.x86_64

#rpm -qf /usr/lib64/php/modules/*.so | sort -u
php-bcmath-7.2.15-1.fc29.x86_64
php-common-7.2.15-1.fc29.x86_64
php-gd-7.2.15-1.fc29.x86_64
php-imap-7.2.15-1.fc29.x86_64
php-intl-7.2.15-1.fc29.x86_64
php-json-7.2.15-1.fc29.x86_64
php-mbstring-7.2.15-1.fc29.x86_64
php-mysqlnd-7.2.15-1.fc29.x86_64
php-pdo-7.2.15-1.fc29.x86_64
php-pecl-libsodium-1.0.7-4.fc29.x86_64
php-pecl-mcrypt-1.0.1-4.fc29.x86_64
php-pecl-zip-1.15.4-1.fc29.x86_64
php-process-7.2.15-1.fc29.x86_64
php-soap-7.2.15-1.fc29.x86_64
php-tidy-7.2.15-1.fc29.x86_64
php-xml-7.2.15-1.fc29.x86_64

I'll try removing a few at random to see what happens.  I'll start with mod_perl, not sure if the devs are using it (I'll have to ask).  I'm guessing half the modules are not really needed.

Comment 12 Trevor Cordes 2019-02-26 06:27:15 UTC

I reduced the number of rpms installed to:

#rpm -qa | grep php- | sort
php-7.2.15-1.fc29.x86_64
php-cli-7.2.15-1.fc29.x86_64
php-common-7.2.15-1.fc29.x86_64
php-debuginfo-7.2.15-1.fc29.x86_64
php-pecl-mcrypt-debuginfo-1.0.1-4.fc29.x86_64
php-pecl-zip-debuginfo-1.15.4-1.fc29.x86_64
#rpm -qf /usr/lib64/php/modules/*.so | sort -u
php-common-7.2.15-1.fc29.x86_64

And I took out mod_perl.

That's as low as it'd let me go without uninstalling php/httpd itself (well, ignoring the harmless debuginfos).

With just those installed, if I have php-imap installed, the bug (still) occurs.  If I have php-imap out, bug doesn't occur.  So maybe not a red herring after all, as you said.

Now the cores are crashing at random core module places like:
/usr/lib64/php/modules/phar.so

So it has nothing to do with gd per se; it must be php-imap just hosing random apache/php memory.

I got it to reproduce on another 2 boxes!  A Xeon one with ECC also and an older Core2 Quad box.  So that is good news, it's not just that one box anymore.  However, I can't get it to reproduce on my main workstation (also C2Q).  The boxes that show the bug are all configured somewhat similarly but not identical.  Thinking about SSL, one of the boxes has a real cert (the original box) and 2 have self-signed certs (rarely used).

The C2Q box had slightly different error.log output:
[Tue Feb 26 00:23:18.668887 2019] [core:notice] [pid 13556] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
*** stack smashing detected ***: <unknown> terminated
*** stack smashing detected ***: <unknown> terminated
*** stack smashing detected ***: <unknown> terminated
*** stack smashing detected ***: <unknown> terminated
*** stack smashing detected ***: <unknown> terminated
[Tue Feb 26 00:23:19.675292 2019] [core:notice] [pid 13556] AH00051: child pid 13609 exit signal Aborted (6), possible coredump in /etc/httpd
[Tue Feb 26 00:23:19.906095 2019] [core:notice] [pid 13556] AH00051: child pid 13611 exit signal Aborted (6), possible coredump in /etc/httpd

So it seems it is a stack corruption issue.  Not sure why only that box reports it.

Comment 13 Joe Orton 2019-02-28 07:55:29 UTC

OK that's great, thanks for the info. I can also reproduce with php-imap installed, investigating further.

Comment 14 Joe Orton 2019-02-28 09:17:58 UTC

*** Bug 1659768 has been marked as a duplicate of this bug. ***

Comment 15 Joe Orton 2019-02-28 14:02:48 UTC

Many thanks to Florian for helping track this down.  It is a libcap-ng bug.

Please try this libcap-ng build:

https://koji.fedoraproject.org/koji/taskinfo?taskID=33103683

yum install https://kojipkgs.fedoraproject.org//work/tasks/3683/33103683/libcap-ng-0.7.9-4.2.fc29.x86_64.rpm

Comment 16 Joe Orton 2019-02-28 14:03:36 UTC

Created attachment 1539507
fix pthread_atfork usage

Comment 17 Trevor Cordes 2019-02-28 14:15:45 UTC

Yup, that kojipkg fixes it!  Tried it a few times, couldn't get it to bug out again.  Thanks for the quick fix, super awesome.  No idea how Florian solved that so fast.  Would have never guessed libcap-ng.

Comment 18 Florian Weimer 2019-02-28 15:06:04 UTC

(In reply to Trevor Cordes from comment #17)
> Yup, that kojipkg fixes it!  Tried it a few times, couldn't get it to bug
> out again.  Thanks for the quick fix, super awesome.  No idea how Florian
> solved that so fast.  Would have never guessed libcap-ng.

Me neither, Joe Orton figured it out.  I just looked at the DSO to figure out what was wrong with it.

Comment 19 redhat 2019-02-28 15:50:33 UTC

I had a look at the attached diff and I don't understand it.  Could someone please explain why this, oddly enough, should be the correct fix?  Did it come from the upstream project?

Comment 20 Joe Orton 2019-02-28 16:11:05 UTC

There was some offline discussion.  tl;dr the way libcap-ng registers atfork handlers was not safe, which caused crashes on fork after PHP was unloaded if php-imap was installed, because php-imap links to something which pulls in libcap-ng.

This Debian bug tracks the same case, though upstream libcap-ng used a workaround (preventing libcap-ng from being unloaded) rather than finding the root cause which was improper use of pthread_atfork. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904808
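Comment 20 says upstream worked around the crash by keeping libcap-ng from being unloaded. One generic way to pin a shared object is glibc's `RTLD_NODELETE` flag for `dlopen`; the sketch below shows its effect. It is an illustration only, not necessarily how upstream implemented its workaround, and it assumes glibc (2.34 or later, or linking with `-ldl` on older releases). `stays_loaded_after_dlclose` is a hypothetical helper name.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: opens `soname` with RTLD_NODELETE, closes it again,
   then asks (via RTLD_NOLOAD) whether the object is still mapped.
   Returns 1 if still loaded, 0 if it was unloaded, -1 if dlopen failed. */
int stays_loaded_after_dlclose(const char *soname)
{
    void *handle = dlopen(soname, RTLD_NOW | RTLD_NODELETE);
    if (handle == NULL)
        return -1;

    /* With RTLD_NODELETE, dropping the last reference does not unmap it. */
    dlclose(handle);

    /* RTLD_NOLOAD returns a handle only if the object is already resident. */
    void *again = dlopen(soname, RTLD_NOW | RTLD_NOLOAD);
    if (again == NULL)
        return 0;
    dlclose(again);
    return 1;
}
```

On a glibc system, `stays_loaded_after_dlclose("libm.so.6")` should return 1. Pinning sidesteps the crash because the handler's code stays mapped, but the atfork registration itself is still not tied to the library's lifetime, which is why Comment 20 calls it a workaround rather than a root-cause fix.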

Comment 21 Florian Weimer 2019-02-28 16:13:49 UTC

This is what I'm trying to submit to the Debian bug:

From: Florian Weimer <fweimer>
Subject: libcap-ng0: libcap-ng's use of pthread_atfork causes segfaults
To: 904808.org, 904808-submitter.org
Date: Thu, 28 Feb 2019 15:01:28 +0100
Message-ID: <87tvgoovuv.fsf.redhat.com>

The problem here is the weak declaration:

$ eu-readelf --symbols=.dynsym /lib64/libcap-ng.so.0.0.0 | grep pthread_atfork
   28: 0000000000000000      0 NOTYPE  WEAK   DEFAULT    UNDEF pthread_atfork

In the Fedora 29 build, the constructor looks like this:

Dump of assembler code for function init_lib:
   0x00000000000025d0 <+0>:	endbr64 
   0x00000000000025d4 <+4>:	cmpq   $0x0,0x4a0c(%rip)        # 0x6fe8
   0x00000000000025dc <+12>:	je     0x25ee <init_lib+30>
   0x00000000000025de <+14>:	lea    0xcb(%rip),%rdx        # 0x26b0 <deinit>
   0x00000000000025e5 <+21>:	xor    %esi,%esi
   0x00000000000025e7 <+23>:	xor    %edi,%edi
   0x00000000000025e9 <+25>:	jmpq   0x24f0 <pthread_atfork@plt>
   0x00000000000025ee <+30>:	retq

src/cap-ng.c has this:

/*
 * The pthread_atfork function is being made weak so that we can use it
 * if the program is linked with pthreads and not requiring it for
 * everything that uses libcap-ng.
 */
extern int __attribute__((weak)) pthread_atfork(void (*prepare)(void),
        void (*parent)(void), void (*child)(void));
…
static void init_lib(void) __attribute__ ((constructor));
static void init_lib(void)
{
        if (pthread_atfork)
                pthread_atfork(NULL, NULL, deinit);
}
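The `if (pthread_atfork)` test works because an undefined weak reference resolves to a null address instead of failing the link. A minimal sketch of the same pattern, assuming GCC or Clang on an ELF platform; `hypothetical_missing_fn` is an illustrative name that nothing in the link defines.

```c
#include <stddef.h>

/* Weak declaration in the style of cap-ng.c.  `hypothetical_missing_fn`
   is an illustrative name that no object in the link provides. */
extern int __attribute__((weak)) hypothetical_missing_fn(void);

/* Returns 1 if the weak reference was bound to a real definition,
   0 if it stayed undefined (its address is then NULL). */
int weak_reference_resolved(void)
{
    return hypothetical_missing_fn != NULL;
}

/* Calls the function only when it exists, mirroring init_lib(). */
int call_if_present(void)
{
    if (hypothetical_missing_fn)
        return hypothetical_missing_fn();
    return -1; /* not linked in: skip the call instead of jumping to 0 */
}
```

A related linker rule matters here: a weak undefined reference does not cause the static linker to extract a member from an archive such as libc_nonshared.a, which is consistent with the point that the libc_nonshared.a implementation is never used by the weak declaration.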

This is wrong.  pthread_atfork needs to be a *strong* reference, otherwise
the implementation in libc_nonshared.a is not used.  This implementation
provides the correct __dso_handle argument, allowing unregistration at
dlclose.
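A minimal sketch of the constructor with a strong reference, assuming glibc 2.28 or later as in the note. `deinit` and the `state_reset_in_child` flag are hypothetical stand-ins for libcap-ng's real per-process state, and the fork-based helper only checks that the child handler runs.

```c
#include <pthread.h>   /* declares pthread_atfork: an ordinary strong reference */
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for libcap-ng's per-process state. */
static volatile int state_reset_in_child = 0;

/* Child-side atfork handler, analogous to deinit() in cap-ng.c. */
static void deinit(void)
{
    state_reset_in_child = 1;
}

/* No weak redeclaration and no `if` test: the strong reference lets the
   linker use the libc_nonshared.a pthread_atfork, which passes this
   object's __dso_handle along to glibc. */
static void init_lib(void) __attribute__((constructor));
static void init_lib(void)
{
    pthread_atfork(NULL, NULL, deinit);
}

/* Forks once; returns 1 if the child saw the handler run, 0 if not,
   -1 if fork or waitpid failed. */
int child_sees_handler(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(state_reset_in_child ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling `child_sees_handler()` from `main` should return 1. The sketch does not exercise the dlclose path itself, which would need a separately built shared object.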

For glibc 2.28 and later, the fix should be simple: Just delete the weak
declaration.  For older glibc versions, you need to call
__register_atfork directly, with an explicit __dso_handle argument.  (I
believe systemd has an example of this which looks correct.)  This is a
stable glibc ABI, despite all those glibc internals.
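For glibc older than 2.28, the suggestion is to call `__register_atfork` directly with an explicit `__dso_handle`. A glibc-specific sketch, modeled on the libc_nonshared.a wrapper described earlier in the note: the prototype is declared by hand because no public header provides it, and `deinit` plus its flag are hypothetical stand-ins for libcap-ng's state.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exported by glibc's libc.so.6 but absent from public headers, so it is
   declared by hand here (glibc-specific). */
extern int __register_atfork(void (*prepare)(void), void (*parent)(void),
                             void (*child)(void), void *dso_handle);

/* Defined by the compiler's crtbegin object, one copy per DSO; hidden so
   each object sees its own.  Weak in case a toolchain does not provide it. */
extern void *__dso_handle __attribute__((weak, visibility("hidden")));

static volatile int state_reset_in_child = 0;

static void deinit(void)
{
    state_reset_in_child = 1;
}

static void init_lib(void) __attribute__((constructor));
static void init_lib(void)
{
    /* Tagging the handler with this object's handle is what allows glibc
       to unregister it when the object is dlclose()d. */
    __register_atfork(NULL, NULL, deinit,
                      &__dso_handle == NULL ? NULL : __dso_handle);
}

/* Forks once; returns 1 if the child saw the handler run, 0 if not,
   -1 if fork or waitpid failed. */
int child_sees_handler(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(state_reset_in_child ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```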

We cannot fix this in libpthread because of the tail call in init_lib.
It destroys the caller's stack frame, so the identity of the calling DSO
is not available to pthread_atfork.  (Without the tail call, we could
use __builtin_return_address (0) and the internal variant of dladdr to
figure out the caller.)
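The tail-call point can be made concrete. Without a tail call, a callee can identify its caller's object from `__builtin_return_address(0)`; the sketch below uses the public `dladdr` as a stand-in for the glibc-internal variant mentioned above. It assumes GCC or Clang with glibc (2.34 or later, or `-ldl` on older releases), and `calling_object` is a hypothetical helper.

```c
#define _GNU_SOURCE            /* dladdr() and Dl_info are GNU extensions */
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns the file name of the object containing
   the caller's return address, or NULL if it cannot be determined.
   noinline keeps this a real call with its own return address. */
__attribute__((noinline)) const char *calling_object(void)
{
    Dl_info info;
    void *ret = __builtin_return_address(0);

    if (dladdr(ret, &info) == 0 || info.dli_fname == NULL)
        return NULL;
    return info.dli_fname;
}
```

If the caller reaches this function through a tail call, as `init_lib` does with `jmpq`, the return address belongs to whoever called `init_lib` (typically the dynamic loader running constructors), so the lookup would name the wrong object.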

Comment 22 Joe Orton 2019-02-28 16:33:54 UTC

*** Bug 1676842 has been marked as a duplicate of this bug. ***

Comment 23 Steven Haigh 2019-03-05 11:00:30 UTC

*** Bug 1657297 has been marked as a duplicate of this bug. ***

Comment 24 Joe Orton 2019-03-08 10:56:05 UTC

Commit: http://pkgs.fedoraproject.org/rpms/libcap-ng/c/13d7626fbca5931ee6266c013fdbb2cba47a50f1

Comment 25 Joe Orton 2019-03-08 11:19:19 UTC

Commit: http://pkgs.fedoraproject.org/rpms/libcap-ng/c/2589c4ef52c27119ff4d070dd672a42b2da24d2b

Comment 26 Joe Orton 2019-03-08 11:25:44 UTC

Package: libcap-ng-0.7.9-7.fc31

Comment 27 Fedora Update System 2019-03-08 12:13:05 UTC

libcap-ng-0.7.9-5.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-2cc0b7524f

Comment 28 Fedora Update System 2019-03-08 12:13:11 UTC

libcap-ng-0.7.9-7.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-3805415f13

Comment 29 Joe Orton 2019-03-08 12:13:48 UTC

Since it seems like a number of httpd/php users are hitting this I've pushed updates for f29/30, hope this is OK Steve.

Comment 30 Steve Grubb 2019-03-08 13:47:35 UTC

Thanks Joe.

Comment 31 Fedora Update System 2019-03-08 19:45:28 UTC

libcap-ng-0.7.9-7.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-3805415f13

Comment 32 Fedora Update System 2019-03-08 22:40:49 UTC

libcap-ng-0.7.9-5.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-2cc0b7524f

Comment 33 Fedora Update System 2019-03-12 22:18:47 UTC

libcap-ng-0.7.9-5.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 34 Fedora Update System 2019-03-29 19:11:09 UTC

libcap-ng-0.7.9-7.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 35 Joe Orton 2019-04-23 15:25:50 UTC

*** Bug 1599075 has been marked as a duplicate of this bug. ***

Comment 36 Zbigniew Jędrzejewski-Szmek 2019-10-21 19:33:57 UTC

*** Bug 1694321 has been marked as a duplicate of this bug. ***
