Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1445540

Summary: mod_wsgi segfault if loaded during a restart
Product: Red Hat Enterprise Linux 7 Reporter: Ian Wienand <iwienand>
Component: mod_wsgiAssignee: Luboš Uhliarik <luhliari>
Status: CLOSED ERRATA QA Contact: Jan Houska <jhouska>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.3CC: bnater, gandavar, gdumplet, hpham, iwienand, jkejda, jorton, luhliari, spoore, tbarron
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: mod_wsgi-3.4-14.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-30 11:19:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1549616, 1562205    
Attachments:
Description Flags
core file generated from gdb
none
full bt none

Description Ian Wienand 2017-04-25 22:18:26 UTC
Created attachment 1274035 [details]
core file generated from gdb

We are seeing mod_wsgi get into a bad state during devstack runs in OpenStack CI.  Apache will just sit in a tight loop of segfaulting mod_wsgi children until it is restarted.  I am not currently sure how it gets into this situation, but have some info from a debugging session

---
Program received signal SIGSEGV, Segmentation fault.
0x00007fb6886cf15b in wsgi_python_init (p=p@entry=0x7fb69909e0f8) at mod_wsgi.c:5952
5952	        if (wsgi_server_config->py3k_warning_flag == 1)

(gdb) bt
#0  0x00007fb6886cf15b in wsgi_python_init (p=p@entry=0x7fb69909e0f8) at mod_wsgi.c:5952
#1  0x00007fb6886d8b40 in wsgi_hook_child_init (p=0x7fb69909e0f8, s=<optimized out>) at mod_wsgi.c:13778
#2  0x00007fb697c6855c in ap_run_child_init (pchild=0x7fb69909e0f8, s=0x7fb698fbb480) at config.c:165
#3  0x00007fb68d86644b in child_main (child_num_arg=child_num_arg@entry=0) at prefork.c:546
#4  0x00007fb68d866a55 in make_child (s=0x7fb698fbb480, slot=0) at prefork.c:810
#5  0x00007fb68d8676ee in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:912
#6  prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1100
#7  0x00007fb697c4cf6e in ap_run_mpm (pconf=pconf@entry=0x7fb698f86158, plog=0x7fb698fb3378, s=0x7fb698fbb480) at mpm_common.c:96
#8  0x00007fb697c45d76 in main (argc=2, argv=0x7ffd4f772aa8) at main.c:777


(gdb) print wsgi_server_config
$5 = (WSGIServerConfig *) 0x0

(gdb) frame
#1  0x00007fb6886d8b40 in wsgi_hook_child_init (p=0x7fb69909e0f8, s=<optimized out>) at mod_wsgi.c:13778
13778	            wsgi_python_init(p);

(gdb) print wsgi_server
$8 = (server_rec *) 0x0
---

I have attached a core and a full backtrace.

When we restart apache, this starts working.  The issue is this is happening in a CI situation where clearly some of our setup is causing problems, so finding the root cause so we can ensure we don't go down that path will be helpful

Comment 2 Ian Wienand 2017-04-25 22:19:41 UTC
Created attachment 1274036 [details]
full bt

Comment 3 Ian Wienand 2017-04-25 23:39:05 UTC
As found by @tbarron, [1] seems like a very similar issue.

In short, 

---
wsgi_hook_child_init is supposed to be called after wsgi_hook_init and uses the information in wsgi_server_config. Problem is that when graceful is triggered, the mod_wsgi.so is unloaded and reloaded BUT wsgi_hook_init is called just once! As documented in the apache wiki [1] this smart hack fails miserably for our case
---

in summary maybe, graceful restarts aren't so graceful?  we are not calling "apachectl" directly, I don't think ... only via systemd.  but it might be entering the same path

[1] https://bugs.launchpad.net/fuel/+bug/1493372/comments/35

Comment 4 Ian Wienand 2017-04-26 00:01:08 UTC
ok, reload timeline is making sense;  our CI bits are installing mod_wsgi at 19:56:18

---
2017-04-25 19:56:18.244 | + functions-common:sudo_with_proxies:2446  :   sudo http_proxy= https_proxy= no_proxy= yum install -y httpd mod_wsgi
---

We then do some configuration of the uwsgi proxy, and issue a "reload" at 19:56:21

---
2017-04-25 19:56:21.862 | + lib/apache:write_uwsgi_config:270        :   reload_apache_server
2017-04-25 19:56:21.864 | + lib/apache:reload_apache_server:223      :   reload_service httpd
2017-04-25 19:56:21.866 | + functions-common:reload_service:2416     :   '[' -x /bin/systemctl ']'
2017-04-25 19:56:21.868 | + functions-common:reload_service:2417     :   sudo /bin/systemctl reload-or-restart httpd
---

this is where the graceful restart comes in

---
-bash-4.2# cat /usr/lib/systemd/system/httpd.service | grep Reload
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
---

it's right after this that the segfaults start

---
[Tue Apr 25 19:56:21.942157 2017] [mpm_prefork:notice] [pid 17792] AH00171: Graceful restart requested, doing restart
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1. Set the 'ServerName' directive globally to suppress this message
[Tue Apr 25 19:56:21.968518 2017] [auth_digest:notice] [pid 17792] AH01757: generating secret for digest authentication ...
[Tue Apr 25 19:56:21.969794 2017] [lbmethod_heartbeat:notice] [pid 17792] AH02282: No slotmem from mod_heartmonitor
[Tue Apr 25 19:56:21.970704 2017] [mpm_prefork:notice] [pid 17792] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips configured -- resuming normal operations
[Tue Apr 25 19:56:21.970714 2017] [core:notice] [pid 17792] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Tue Apr 25 19:56:21.974122 2017] [core:notice] [pid 17792] AH00052: child pid 26554 exit signal Segmentation fault (11)
---

Depending on various things after that, apache may get "restarted" (actually stopped and started) or not; those that do restart before trying to do anything with apache would work.  That would explain why some jobs work and others fail.

(http://logs.openstack.org/06/352606/13/check/gate-manila-tempest-minimal-dsvm-lvm-centos-7-nv/0ca39b2/logs)

Comment 5 Joe Orton 2017-04-26 07:21:30 UTC
Can we have some config details, package versions etc?

Comment 6 Ian Wienand 2017-04-27 00:13:10 UTC
Sorry about that, the details are all in the run logs but I forget not everyone reads them all day long :)

The packages in question:
 - httpd 2.4.6-45
 - mod_wsgi 3.4-12

With a bit of playing around just now, I believe the replication is to install httpd, start it, install mod_wsgi and then *reload* httpd.

---
 [centos@test ~]$ sudo yum install httpd
 ---> Package httpd.x86_64 0:2.4.6-45.el7.centos.4 will be installed
 [ ... ]

 [centos@test ~]$ sudo service httpd start
 Redirecting to /bin/systemctl start  httpd.service

 [centos@test ~]$ sudo yum install mod_wsgi
 ---> Package mod_wsgi.x86_64 0:3.4-12.el7_0 will be installed
 [ ... ]

 [centos@test ~]$ sudo service httpd reload
 Redirecting to /bin/systemctl reload  httpd.service

 [root@test centos]# tail /var/log/httpd/error_log 
 [Thu Apr 27 00:02:43.045172 2017] [core:notice] [pid 1107] AH00052: child pid 1206 exit signal Segmentation fault (11)
 [Thu Apr 27 00:02:43.045174 2017] [core:error] [pid 1107] AH00546: no record of generation 0 of exiting child 1206
 ...
---

Although this seems obscure, I guess that other heat bug shows we're not the first people to hit it.

Comment 7 Graham Dumpleton 2017-04-27 00:40:12 UTC
From memory this problem arises due to differences between how Apache 2.2 and 2.4 handles initialisation of modules. It was addressed in mod_wsgi version 4.4.15. See (2) in:

* http://modwsgi.readthedocs.io/en/develop/release-notes/version-4.4.15.html

Details of patch available from:

* https://github.com/GrahamDumpleton/mod_wsgi/commit/3aef8bf4fb713fc94403bfa45b13d8a7df7dfca7

Comment 8 Joe Orton 2017-04-27 06:55:39 UTC
Ah, great, thanks a lot Graham.

Comment 12 Joe Orton 2017-07-13 16:34:28 UTC
*** Bug 1163516 has been marked as a duplicate of this bug. ***

Comment 14 Joe Orton 2018-02-26 16:17:42 UTC
*** Bug 1548540 has been marked as a duplicate of this bug. ***

Comment 24 errata-xmlrpc 2018-10-30 11:19:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3210