RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1445540 - mod_wsgi segfault if loaded during a restart
Summary: mod_wsgi segfault if loaded during a restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: mod_wsgi
Version: 7.3
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Luboš Uhliarik
QA Contact: Jan Houska
URL:
Whiteboard:
: 1163516 1548540 (view as bug list)
Depends On:
Blocks: 1549616 1562205
TreeView+ depends on / blocked
 
Reported: 2017-04-25 22:18 UTC by Ian Wienand
Modified: 2021-12-10 15:01 UTC (History)
10 users (show)

Fixed In Version: mod_wsgi-3.4-14.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-30 11:19:02 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
core file generated from gdb (4.14 MB, application/x-core)
2017-04-25 22:18 UTC, Ian Wienand
no flags Details
full bt (2.97 KB, text/plain)
2017-04-25 22:19 UTC, Ian Wienand
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1493372 0 None None None 2017-04-26 00:02:08 UTC
Red Hat Product Errata RHBA-2018:3210 0 None None None 2018-10-30 11:19:38 UTC

Description Ian Wienand 2017-04-25 22:18:26 UTC
Created attachment 1274035 [details]
core file generated from gdb

We are seeing mod_wsgi get into a bad state during devstack runs in OpenStack CI.  Apache will just sit in a tight loop of segfaulting mod_wsgi children until it is restarted.  I am not currently sure how it gets into this situation, but have some info from a debugging session

---
Program received signal SIGSEGV, Segmentation fault.
0x00007fb6886cf15b in wsgi_python_init (p=p@entry=0x7fb69909e0f8) at mod_wsgi.c:5952
5952	        if (wsgi_server_config->py3k_warning_flag == 1)

(gdb) bt
#0  0x00007fb6886cf15b in wsgi_python_init (p=p@entry=0x7fb69909e0f8) at mod_wsgi.c:5952
#1  0x00007fb6886d8b40 in wsgi_hook_child_init (p=0x7fb69909e0f8, s=<optimized out>) at mod_wsgi.c:13778
#2  0x00007fb697c6855c in ap_run_child_init (pchild=0x7fb69909e0f8, s=0x7fb698fbb480) at config.c:165
#3  0x00007fb68d86644b in child_main (child_num_arg=child_num_arg@entry=0) at prefork.c:546
#4  0x00007fb68d866a55 in make_child (s=0x7fb698fbb480, slot=0) at prefork.c:810
#5  0x00007fb68d8676ee in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:912
#6  prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1100
#7  0x00007fb697c4cf6e in ap_run_mpm (pconf=pconf@entry=0x7fb698f86158, plog=0x7fb698fb3378, s=0x7fb698fbb480) at mpm_common.c:96
#8  0x00007fb697c45d76 in main (argc=2, argv=0x7ffd4f772aa8) at main.c:777


(gdb) print wsgi_server_config
$5 = (WSGIServerConfig *) 0x0

(gdb) frame
#1  0x00007fb6886d8b40 in wsgi_hook_child_init (p=0x7fb69909e0f8, s=<optimized out>) at mod_wsgi.c:13778
13778	            wsgi_python_init(p);

(gdb) print wsgi_server
$8 = (server_rec *) 0x0
---

I have attached a core and a full backtrace.

When we restart apache, this starts working.  The issue is this is happening in a CI situation where clearly some of our setup is causing problems, so finding the root cause so we can ensure we don't go down that path will be helpful

Comment 2 Ian Wienand 2017-04-25 22:19:41 UTC
Created attachment 1274036 [details]
full bt

Comment 3 Ian Wienand 2017-04-25 23:39:05 UTC
As found by @tbarron, [1] seems like a very similar issue.

In short, 

---
wsgi_hook_child_init is supposed to be called after wsgi_hook_init and uses the information in wsgi_server_config. Problem is that when graceful is triggered, the mod_wsgi.so is unloaded and reloaded BUT wsgi_hook_init is called just once! As documented in the apache wiki [1] this smart hack fails miserably for our case
---

in summary maybe, graceful restarts aren't so graceful?  we are not calling "apachectl" directly, I don't think ... only via systemd.  but it might be entering the same path

[1] https://bugs.launchpad.net/fuel/+bug/1493372/comments/35

Comment 4 Ian Wienand 2017-04-26 00:01:08 UTC
ok, reload timeline is making sense;  our CI bits are installing mod_wsgi at 19:56:18

---
2017-04-25 19:56:18.244 | + functions-common:sudo_with_proxies:2446  :   sudo http_proxy= https_proxy= no_proxy= yum install -y httpd mod_wsgi
---

We then do some configuration of the uwsgi proxy, and issue a "reload" at 19:56:21

---
2017-04-25 19:56:21.862 | + lib/apache:write_uwsgi_config:270        :   reload_apache_server
2017-04-25 19:56:21.864 | + lib/apache:reload_apache_server:223      :   reload_service httpd
2017-04-25 19:56:21.866 | + functions-common:reload_service:2416     :   '[' -x /bin/systemctl ']'
2017-04-25 19:56:21.868 | + functions-common:reload_service:2417     :   sudo /bin/systemctl reload-or-restart httpd
---

this is where the graceful restart comes in

---
-bash-4.2# cat /usr/lib/systemd/system/httpd.service | grep Reload
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
---

it's right after this that the segfaults start

---
[Tue Apr 25 19:56:21.942157 2017] [mpm_prefork:notice] [pid 17792] AH00171: Graceful restart requested, doing restart
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1. Set the 'ServerName' directive globally to suppress this message
[Tue Apr 25 19:56:21.968518 2017] [auth_digest:notice] [pid 17792] AH01757: generating secret for digest authentication ...
[Tue Apr 25 19:56:21.969794 2017] [lbmethod_heartbeat:notice] [pid 17792] AH02282: No slotmem from mod_heartmonitor
[Tue Apr 25 19:56:21.970704 2017] [mpm_prefork:notice] [pid 17792] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips configured -- resuming normal operations
[Tue Apr 25 19:56:21.970714 2017] [core:notice] [pid 17792] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Tue Apr 25 19:56:21.974122 2017] [core:notice] [pid 17792] AH00052: child pid 26554 exit signal Segmentation fault (11)
---

Depending on various things after that, apache may get "restarted" (actually stopped and started) or not; those that do restart before trying to do anything with apache would work.  That would explain why some jobs work and others fail.

(http://logs.openstack.org/06/352606/13/check/gate-manila-tempest-minimal-dsvm-lvm-centos-7-nv/0ca39b2/logs)

Comment 5 Joe Orton 2017-04-26 07:21:30 UTC
Can we have some config details, package versions etc?

Comment 6 Ian Wienand 2017-04-27 00:13:10 UTC
Sorry about that, the details are all in the run logs but I forget not everyone reads them all day long :)

The packages in question:
 - httpd 2.4.6-45
 - mod_wsgi 3.4-12

With a bit of playing around just now, I believe the replication is to install httpd, start it, install mod_wsgi and then *reload* httpd.

---
 [centos@test ~]$ sudo yum install httpd
 ---> Package httpd.x86_64 0:2.4.6-45.el7.centos.4 will be installed
 [ ... ]

 [centos@test ~]$ sudo service httpd start
 Redirecting to /bin/systemctl start  httpd.service

 [centos@test ~]$ sudo yum install mod_wsgi
 ---> Package mod_wsgi.x86_64 0:3.4-12.el7_0 will be installed
 [ ... ]

 [centos@test ~]$ sudo service httpd reload
 Redirecting to /bin/systemctl reload  httpd.service

 [root@test centos]# tail /var/log/httpd/error_log 
 [Thu Apr 27 00:02:43.045172 2017] [core:notice] [pid 1107] AH00052: child pid 1206 exit signal Segmentation fault (11)
 [Thu Apr 27 00:02:43.045174 2017] [core:error] [pid 1107] AH00546: no record of generation 0 of exiting child 1206
 ...
---

Although this seems obscure, I guess that other heat bug shows we're not the first people to hit it.

Comment 7 Graham Dumpleton 2017-04-27 00:40:12 UTC
From memory this problem arises due to differences between how Apache 2.2 and 2.4 handles initialisation of modules. It was addressed in mod_wsgi version 4.4.15. See (2) in:

* http://modwsgi.readthedocs.io/en/develop/release-notes/version-4.4.15.html

Details of patch available from:

* https://github.com/GrahamDumpleton/mod_wsgi/commit/3aef8bf4fb713fc94403bfa45b13d8a7df7dfca7

Comment 8 Joe Orton 2017-04-27 06:55:39 UTC
Ah, great, thanks a lot Graham.

Comment 12 Joe Orton 2017-07-13 16:34:28 UTC
*** Bug 1163516 has been marked as a duplicate of this bug. ***

Comment 14 Joe Orton 2018-02-26 16:17:42 UTC
*** Bug 1548540 has been marked as a duplicate of this bug. ***

Comment 24 errata-xmlrpc 2018-10-30 11:19:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3210


Note You need to log in before you can comment on or make changes to this bug.