Created attachment 1538325
sampling of gdb traces on coredumps apache produces

Description of problem:
Ever since upgrading from Fedora 27 to 29, one box has apache go into a segfault/child-restart loop weekly when logrotate runs. If I remove the php-imap rpm, the problem goes away. From the coredumps, it appears something is causing random memory corruption in apache/mod_php.

The bug only happens on reload, never on a restart of apache using systemd. Apache seems 100% stable otherwise, as long as a reload is never issued. The segfaults start immediately after the reload, and no hit on the website is required (I blocked all access to 80/443 and the bug still occurs). It segfaults whether apache has been running a while (hours/days) or I have just restarted apache cleanly and no hits have occurred yet.

I think php-imap might be a red herring. It seems to be required to trigger the bug, but we're not using it at all from what I can tell, and it seems fine on another box. I have run tons of verifies, reinstalls, etc. of the relevant rpms. That doesn't help, and things look normal otherwise. The box has ECC and is Ryzen-based, but the bug also occurred on an identical (decommissioned) box when it was i686-based. It was F29 that started the bug. rpm -V `rpm -qa | grep -iP 'php|httpd|mod_'` looks normal.

Version-Release number of selected component (if applicable):
php-7.2.15-1.fc29.x86_64
php-imap-7.2.15-1.fc29.x86_64
httpd-2.4.38-2.fc29.x86_64
* we are not using any php/httpd related rpms or modules from any source other than the Fedora repos

How reproducible:
On demand on this one box when php-imap is installed. I can generate coredumps at will. I tried creating a similar (not identical) config on another box, and I cannot reproduce it there yet. I'm trying to figure out what is different.

Steps to Reproduce:
1. Install a typical complement of php rpms, with apache in mod_php mode
2. systemctl reload httpd.service

Actual results:
[Mon Feb 25 00:27:15.934157 2019] [mpm_prefork:notice] [pid 21721] AH00163: Apache/2.4.38 (Fedora) OpenSSL/1.1.1a PHP/7.2.15 mod_perl/2.0.10 Perl/v5.28.1 configured -- resuming normal operations
[Mon Feb 25 00:27:15.934173 2019] [core:notice] [pid 21721] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Mon Feb 25 00:27:16.938676 2019] [core:notice] [pid 21721] AH00051: child pid 21771 exit signal Segmentation fault (11), possible coredump in /etc/httpd
[Mon Feb 25 00:27:16.938760 2019] [core:notice] [pid 21721] AH00051: child pid 21773 exit signal Segmentation fault (11), possible coredump in /etc/httpd
[Mon Feb 25 00:27:16.938803 2019] [core:notice] [pid 21721] AH00051: child pid 21775 exit signal Segmentation fault (11), possible coredump in /etc/httpd
[Mon Feb 25 00:27:16.938845 2019] [core:notice] [pid 21721] AH00051: child pid 21777 exit signal Segmentation fault (11), possible coredump in /etc/httpd
[Mon Feb 25 00:27:16.938887 2019] [core:notice] [pid 21721] AH00051: child pid 21779 exit signal Segmentation fault (11), possible coredump in /etc/httpd
... many coredumps a second until the parent is killed ...
... the web server does not respond to requests during this time

Expected results:
Normal operation: no coredumps, and requests are answered.

Additional info:
I removed rpms one by one until the problem disappeared, and php-imap is the one I stumbled upon. With it removed, things work bug-free. With it installed, the bug happens every time.

I did extensive strace (as far as I can on child processes, which isn't much except on shutdown, not startup) and gdb on the cores. Each coredump in a set (i.e. during the same crash test, before I kill the parent) seems similar, but often not identical. Interestingly, each new crash test (usually) produces wildly varying coredump results (different segfault points and arguments to functions). This suggests to me that major memory corruption is going on here.
Most often the segfault occurs in some function of gd (php-gd), but not always. The arguments passed to these functions are usually not sane (for example, images with 0 width and 2 billion height). See the attached sampling of various gdb results from the coredumps.
According to the provided backtrace, it looks like you are using httpd with mod_php. Please confirm with "httpd -M" output.

Notice: the default provided configuration is httpd in event mode (mpm_event_module) and without mod_php (PHP scripts are executed via proxy_fcgi_module and php-fpm).
Yes, we are using mod_php on this box, and php-fpm has been removed. We will convert to fpm in the future, but not at this moment.
Sorry, but mpm_event_module + mod_php is NOT supported, and cannot be. See the warning on http://php.net/manual/en/install.unix.apache2.php
Created attachment 1538361
httpd -M output
Also, from the module configuration file: "# ZTS module is not supported, so FPM is preferred"
We are not using event. We are using mpm_prefork_module.
We are not using ZTS. We will when we switch to FPM, but not right now. Prefork + mod_php may be slightly dated, but AFAIK it's still a fully supported config. Segfaults and possible stack clobbering seem rather nasty in any supported configuration.
So this seems more like an httpd issue, probably similar to bug #1659768.
Are there any unused modules here which you can remove? I see mod_perl is listed. One theory is that this is some OpenSSL (re-)initialization bug. php-imap links to libcrypto, so it may not be a red herring.
Could you attach output from "rpm -qf /etc/httpd/modules/*.so".
#rpm -qf /etc/httpd/modules/*.so | sort -u
httpd-2.4.38-2.fc29.x86_64
mod_http2-1.11.1-1.fc29.x86_64
mod_perl-2.0.10-13.fc29.x86_64
mod_ssl-2.4.38-2.fc29.x86_64
php-7.2.15-1.fc29.x86_64

#rpm -qf /usr/lib64/php/modules/*.so | sort -u
php-bcmath-7.2.15-1.fc29.x86_64
php-common-7.2.15-1.fc29.x86_64
php-gd-7.2.15-1.fc29.x86_64
php-imap-7.2.15-1.fc29.x86_64
php-intl-7.2.15-1.fc29.x86_64
php-json-7.2.15-1.fc29.x86_64
php-mbstring-7.2.15-1.fc29.x86_64
php-mysqlnd-7.2.15-1.fc29.x86_64
php-pdo-7.2.15-1.fc29.x86_64
php-pecl-libsodium-1.0.7-4.fc29.x86_64
php-pecl-mcrypt-1.0.1-4.fc29.x86_64
php-pecl-zip-1.15.4-1.fc29.x86_64
php-process-7.2.15-1.fc29.x86_64
php-soap-7.2.15-1.fc29.x86_64
php-tidy-7.2.15-1.fc29.x86_64
php-xml-7.2.15-1.fc29.x86_64

I'll try removing a few at random to see what happens. I'll start with mod_perl. I'm not sure if the devs are using it (I'll have to ask). I'm guessing half the modules are not really needed.
I reduced the number of installed rpms to:

#rpm -qa | grep php- | sort
php-7.2.15-1.fc29.x86_64
php-cli-7.2.15-1.fc29.x86_64
php-common-7.2.15-1.fc29.x86_64
php-debuginfo-7.2.15-1.fc29.x86_64
php-pecl-mcrypt-debuginfo-1.0.1-4.fc29.x86_64
php-pecl-zip-debuginfo-1.15.4-1.fc29.x86_64

#rpm -qf /usr/lib64/php/modules/*.so | sort -u
php-common-7.2.15-1.fc29.x86_64

I also took out mod_perl. That's as low as it would let me go without uninstalling php/httpd itself (ignoring the harmless debuginfos). With just those installed, the bug still occurs if php-imap is installed. With php-imap removed, the bug doesn't occur. So maybe it's not a red herring after all, as you said.

Now the cores are crashing at random places in core modules, such as:
/usr/lib64/php/modules/phar.so
So it has nothing to do with gd per se. It must be php-imap hosing random apache/php memory.

I got it to reproduce on 2 other boxes! One is a Xeon, also with ECC, and the other is an older Core2 Quad box. That is good news, because it's not just that one box anymore. However, I can't get it to reproduce on my main workstation (also C2Q). The boxes that show the bug are all configured somewhat similarly, but not identically. On the SSL theory: one of the boxes has a real cert (the original box) and 2 have self-signed certs (rarely used).
The C2Q box had slightly different error.log output:

[Tue Feb 26 00:23:18.668887 2019] [core:notice] [pid 13556] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
*** stack smashing detected ***: <unknown> terminated
*** stack smashing detected ***: <unknown> terminated
*** stack smashing detected ***: <unknown> terminated
*** stack smashing detected ***: <unknown> terminated
*** stack smashing detected ***: <unknown> terminated
[Tue Feb 26 00:23:19.675292 2019] [core:notice] [pid 13556] AH00051: child pid 13609 exit signal Aborted (6), possible coredump in /etc/httpd
[Tue Feb 26 00:23:19.906095 2019] [core:notice] [pid 13556] AH00051: child pid 13611 exit signal Aborted (6), possible coredump in /etc/httpd

So it seems to be a stack corruption issue. I'm not sure why only that box reports it.
OK, that's great, thanks for the info. I can also reproduce this with php-imap installed and am investigating further.
*** Bug 1659768 has been marked as a duplicate of this bug. ***
Many thanks to Florian for helping track this down. It is a libcap-ng bug. Please try this libcap-ng build:
https://koji.fedoraproject.org/koji/taskinfo?taskID=33103683

yum install https://kojipkgs.fedoraproject.org//work/tasks/3683/33103683/libcap-ng-0.7.9-4.2.fc29.x86_64.rpm
Created attachment 1539507
fix pthread_atfork usage
Yup, that kojipkg fixes it! Tried it a few times, couldn't get it to bug out again. Thanks for the quick fix, super awesome. No idea how Florian solved that so fast. Would have never guessed libcap-ng.
(In reply to Trevor Cordes from comment #17)
> Yup, that kojipkg fixes it! Tried it a few times, couldn't get it to bug
> out again. Thanks for the quick fix, super awesome. No idea how Florian
> solved that so fast. Would have never guessed libcap-ng.

Me neither. Joe Orton figured it out. I just looked at the DSO to figure out what was wrong with it.
I had a look at the attached diff and I don't understand it. Could someone please explain why this, oddly enough, is the correct fix? Did it come from the upstream project?
There was some offline discussion. tl;dr: the way libcap-ng registers atfork handlers was not safe. If php-imap was installed, this caused crashes on fork after PHP was unloaded, because php-imap links to something which pulls in libcap-ng. This Debian bug tracks the same case, though upstream libcap-ng used a workaround (preventing libcap-ng from being unloaded) rather than fixing the root cause, which was improper use of pthread_atfork.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904808
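To make the failure mode more concrete, here is a toy model in C of an atfork registry in which each handler remembers the DSO that registered it. In this model, "dlclose" removes only the entries tagged with the DSO being unloaded. This is a hypothetical simulation, not glibc's real data structures. The names model_register_atfork, model_dlclose, handlers_left_after_unload and the fake handle libcap_ng_handle are all invented for illustration. A NULL handle stands in for registration without a usable __dso_handle. When the handle is passed, unloading leaves no handler behind. Without it, one stale entry survives. In a real process, such an entry would point into unmapped libcap-ng code, which is consistent with the crashes on fork described here.

```c
#include <stddef.h>

/* Toy model (hypothetical; NOT glibc's real implementation) of an atfork
 * registry in which every entry remembers which DSO registered it. */
#define MAX_HANDLERS 8

typedef void (*handler_fn)(void);

struct atfork_entry {
    handler_fn child;  /* handler that would run in the child after fork() */
    const void *dso;   /* identity of the registering DSO; NULL = unknown */
    int in_use;
};

static struct atfork_entry registry[MAX_HANDLERS];

/* Models registration that carries an explicit DSO handle. */
static int model_register_atfork(handler_fn child, const void *dso)
{
    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        if (!registry[i].in_use) {
            registry[i].child = child;
            registry[i].dso = dso;
            registry[i].in_use = 1;
            return 0;
        }
    }
    return -1; /* registry full */
}

/* Models the unregistration step of dlclose(): drop every handler that was
 * registered by the DSO being unloaded. Returns the number removed. */
static int model_dlclose(const void *dso)
{
    int removed = 0;
    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        if (registry[i].in_use && registry[i].dso == dso) {
            registry[i].in_use = 0;
            removed++;
        }
    }
    return removed;
}

/* Number of handlers that would still be called at the next fork(). */
static int model_live_handlers(void)
{
    int live = 0;
    for (size_t i = 0; i < MAX_HANDLERS; i++)
        live += registry[i].in_use;
    return live;
}

/* Stand-in for libcap-ng's child handler (deinit). */
static void deinit(void) {}

/* Hypothetical fake DSO identity: just the address of this object. */
static const char libcap_ng_handle = 0;

/* Registers deinit, "unloads" the library, and reports how many handlers
 * remain. with_handle = 1 mimics a registration that carries the DSO
 * handle; with_handle = 0 mimics registration without it. */
static int handlers_left_after_unload(int with_handle)
{
    for (size_t i = 0; i < MAX_HANDLERS; i++)
        registry[i].in_use = 0;
    model_register_atfork(deinit, with_handle ? &libcap_ng_handle : NULL);
    model_dlclose(&libcap_ng_handle);
    return model_live_handlers();
}
```

In this model, handlers_left_after_unload(1) yields 0 and handlers_left_after_unload(0) yields 1. The single leftover entry plays the role of the dangling deinit pointer.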
This is what I'm trying to submit to the Debian bug:

From: Florian Weimer <fweimer>
Subject: libcap-ng0: libcap-ng's use of pthread_atfork causes segfaults
To: 904808.org, 904808-submitter.org
Date: Thu, 28 Feb 2019 15:01:28 +0100
Message-ID: <87tvgoovuv.fsf.redhat.com>

The problem here is the weak declaration:

$ eu-readelf --symbols=.dynsym /lib64/libcap-ng.so.0.0.0 | grep pthread_atfork
   28: 0000000000000000      0 NOTYPE  WEAK   DEFAULT    UNDEF pthread_atfork

In the Fedora 29 build, the constructor looks like this:

Dump of assembler code for function init_lib:
   0x00000000000025d0 <+0>:   endbr64
   0x00000000000025d4 <+4>:   cmpq   $0x0,0x4a0c(%rip)        # 0x6fe8
   0x00000000000025dc <+12>:  je     0x25ee <init_lib+30>
   0x00000000000025de <+14>:  lea    0xcb(%rip),%rdx        # 0x26b0 <deinit>
   0x00000000000025e5 <+21>:  xor    %esi,%esi
   0x00000000000025e7 <+23>:  xor    %edi,%edi
   0x00000000000025e9 <+25>:  jmpq   0x24f0 <pthread_atfork@plt>
   0x00000000000025ee <+30>:  retq

src/cap-ng.c has this:

/*
 * The pthread_atfork function is being made weak so that we can use it
 * if the program is linked with pthreads and not requiring it for
 * everything that uses libcap-ng.
 */
extern int __attribute__((weak)) pthread_atfork(void (*prepare)(void),
        void (*parent)(void), void (*child)(void));
…
static void init_lib(void) __attribute__ ((constructor));
static void init_lib(void)
{
        if (pthread_atfork)
                pthread_atfork(NULL, NULL, deinit);
}

This is wrong. pthread_atfork needs to be a *strong* reference. Otherwise, the implementation in libc_nonshared.a is not used. That implementation provides the correct __dso_handle argument, which allows the handler to be unregistered at dlclose.

For glibc 2.28 and later, the fix should be simple: just delete the weak declaration.

For older glibc versions, you need to call __register_atfork directly, with an explicit __dso_handle argument. (I believe systemd has an example of this which looks correct.) This is a stable glibc ABI, despite all those glibc internals.
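A minimal sketch of the glibc 2.28+ fix described above: the weak declaration is dropped, and pthread_atfork is called through the ordinary prototype from <pthread.h>. The `if (pthread_atfork)` test goes away with it. The flag child_handler_ran and the helper child_saw_handler are hypothetical additions for demonstration only. They show that the child handler actually runs after fork(). The build line assumes a glibc toolchain, and -pthread may be required on glibc releases older than 2.34.

```c
/* Build (assumption): cc -pthread fix.c */
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag standing in for the state libcap-ng's deinit() resets. */
static int child_handler_ran = 0;

static void deinit(void)
{
    child_handler_ran = 1; /* runs in the child, before fork() returns there */
}

/* Ordinary (strong) prototype from <pthread.h>. Per the analysis above, the
 * libc_nonshared.a wrapper then supplies this object's __dso_handle, so
 * dlclose() can unregister the handler. */
static void init_lib(void) __attribute__((constructor));
static void init_lib(void)
{
    pthread_atfork(NULL, NULL, deinit);
}

/* Hypothetical helper: fork once and let the child report, via its exit
 * status, whether deinit() ran. Returns 1 or 0, or -1 on error. */
static int child_saw_handler(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_handler_ran ? 1 : 0);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the parent, child_handler_ran stays 0 because no parent handler is registered. Only the forked child sees it set.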
We cannot fix this in libpthread because of the tail call in init_lib. It destroys the caller's stack frame, so the identity of the calling DSO is not available to pthread_atfork. (Without the tail call, we could use __builtin_return_address (0) and the internal variant of dladdr to figure out the caller.)
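For the older-glibc route mentioned in the message, here is a hedged sketch that calls __register_atfork directly with this object's __dso_handle. The two declarations mirror the pattern glibc's own pthread_atfork wrapper uses. They are glibc/toolchain internals, not a portable API, so this assumes a glibc-based Linux system. Because the DSO identity is passed as an explicit argument, the tail-call problem described above does not matter. Nothing has to be recovered from the caller's stack frame. The names child_resets and child_reset_count are hypothetical and exist only for demonstration.

```c
/* glibc-specific sketch (assumption: glibc-based Linux toolchain). */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Declarations mirroring glibc's own pthread_atfork wrapper. */
extern int __register_atfork(void (*prepare)(void), void (*parent)(void),
                             void (*child)(void), void *dso_handle);
extern void *__dso_handle __attribute__((weak, visibility("hidden")));

/* Hypothetical counter standing in for libcap-ng's state. */
static int child_resets = 0;

static void deinit(void)
{
    child_resets++; /* runs in the child after fork() */
}

static void init_lib(void) __attribute__((constructor));
static void init_lib(void)
{
    /* The DSO identity is an explicit argument, so even if the compiler
     * emits this as a tail call, the registration still knows which
     * object it belongs to and can be undone at dlclose(). */
    __register_atfork(NULL, NULL, deinit, __dso_handle);
}

/* Hypothetical helper: returns how many times deinit() ran in the child,
 * or -1 on error. */
static int child_reset_count(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_resets);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On glibc 2.28 and later, the strong pthread_atfork reference reaches the same __register_atfork call through libc_nonshared.a. The direct call is only needed where that wrapper does not pass the handle.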
*** Bug 1676842 has been marked as a duplicate of this bug. ***
*** Bug 1657297 has been marked as a duplicate of this bug. ***
Commit: http://pkgs.fedoraproject.org/rpms/libcap-ng/c/13d7626fbca5931ee6266c013fdbb2cba47a50f1
Commit: http://pkgs.fedoraproject.org/rpms/libcap-ng/c/2589c4ef52c27119ff4d070dd672a42b2da24d2b
Package: libcap-ng-0.7.9-7.fc31
libcap-ng-0.7.9-5.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-2cc0b7524f
libcap-ng-0.7.9-7.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-3805415f13
Since it seems like a number of httpd/php users are hitting this, I've pushed updates for f29/30. I hope this is OK, Steve.
Thanks Joe.
libcap-ng-0.7.9-7.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-3805415f13
libcap-ng-0.7.9-5.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-2cc0b7524f
libcap-ng-0.7.9-5.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.
libcap-ng-0.7.9-7.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
*** Bug 1599075 has been marked as a duplicate of this bug. ***
*** Bug 1694321 has been marked as a duplicate of this bug. ***