Bug 1445540
| Summary: | mod_wsgi segfault if loaded during a restart | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Ian Wienand <iwienand> | ||||||
| Component: | mod_wsgi | Assignee: | Luboš Uhliarik <luhliari> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Jan Houska <jhouska> | ||||||
| Severity: | unspecified | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 7.3 | CC: | bnater, gandavar, gdumplet, hpham, iwienand, jkejda, jorton, luhliari, spoore, tbarron | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | mod_wsgi-3.4-14.el7 | Doc Type: | If docs needed, set a value | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2018-10-30 11:19:02 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1549616, 1562205 | ||||||||
| Attachments: |
|
||||||||
Created attachment 1274036 [details]
full bt
As found by @tbarron, [1] seems like a very similar issue. In short, --- wsgi_hook_child_init is supposed to be called after wsgi_hook_init and uses the information in wsgi_server_config. Problem is that when graceful is triggered, the mod_wsgi.so is unloaded and reloaded BUT wsgi_hook_init is called just once! As documented in the apache wiki [1] this smart hack fails miserably for our case --- in summary maybe, graceful restarts aren't so graceful? we are not calling "apachectl" directly, I don't think ... only via systemd. but it might be entering the same path [1] https://bugs.launchpad.net/fuel/+bug/1493372/comments/35 ok, reload timeline is making sense; our CI bits are installing mod_wsgi at 19:56:18 --- 2017-04-25 19:56:18.244 | + functions-common:sudo_with_proxies:2446 : sudo http_proxy= https_proxy= no_proxy= yum install -y httpd mod_wsgi --- We then do some configuration of the uwsgi proxy, and issue a "reload" at 19:56:21 --- 2017-04-25 19:56:21.862 | + lib/apache:write_uwsgi_config:270 : reload_apache_server 2017-04-25 19:56:21.864 | + lib/apache:reload_apache_server:223 : reload_service httpd 2017-04-25 19:56:21.866 | + functions-common:reload_service:2416 : '[' -x /bin/systemctl ']' 2017-04-25 19:56:21.868 | + functions-common:reload_service:2417 : sudo /bin/systemctl reload-or-restart httpd --- this is where the graceful restart comes in --- -bash-4.2# cat /usr/lib/systemd/system/httpd.service | grep Reload ExecReload=/usr/sbin/httpd $OPTIONS -k graceful --- it's right after this that the segfaults start --- [Tue Apr 25 19:56:21.942157 2017] [mpm_prefork:notice] [pid 17792] AH00171: Graceful restart requested, doing restart AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1. Set the 'ServerName' directive globally to suppress this message [Tue Apr 25 19:56:21.968518 2017] [auth_digest:notice] [pid 17792] AH01757: generating secret for digest authentication ... [Tue Apr 25 19:56:21.969794 2017] [lbmethod_heartbeat:notice] [pid 17792] AH02282: No slotmem from mod_heartmonitor [Tue Apr 25 19:56:21.970704 2017] [mpm_prefork:notice] [pid 17792] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips configured -- resuming normal operations [Tue Apr 25 19:56:21.970714 2017] [core:notice] [pid 17792] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND' [Tue Apr 25 19:56:21.974122 2017] [core:notice] [pid 17792] AH00052: child pid 26554 exit signal Segmentation fault (11) --- Depending on various things after that, apache may get "restarted" (actually stopped and started) or not; those that do restart before trying to do anything with apache would work. That would explain why some jobs work and others fail. (http://logs.openstack.org/06/352606/13/check/gate-manila-tempest-minimal-dsvm-lvm-centos-7-nv/0ca39b2/logs) Can we have some config details, package versions etc? Sorry about that, the details are all in the run logs but I forget not everyone reads them all day long :) The packages in question: - httpd 2.4.6-45 - mod_wsgi 3.4-12 With a bit of playing around just now, I believe the replication is to install httpd, start it, install mod_wsgi and then *reload* httpd. --- [centos@test ~]$ sudo yum install httpd ---> Package httpd.x86_64 0:2.4.6-45.el7.centos.4 will be installed [ ... ] [centos@test ~]$ sudo service httpd start Redirecting to /bin/systemctl start httpd.service [centos@test ~]$ sudo yum install mod_wsgi ---> Package mod_wsgi.x86_64 0:3.4-12.el7_0 will be installed [ ... ] [centos@test ~]$ sudo service httpd reload Redirecting to /bin/systemctl reload httpd.service [root@test centos]# tail /var/log/httpd/error_log [Thu Apr 27 00:02:43.045172 2017] [core:notice] [pid 1107] AH00052: child pid 1206 exit signal Segmentation fault (11) [Thu Apr 27 00:02:43.045174 2017] [core:error] [pid 1107] AH00546: no record of generation 0 of exiting child 1206 ... --- Although this seems obscure, I guess that other heat bug shows we're not the first people to hit it. From memory this problem arises due to differences between how Apache 2.2 and 2.4 handles initialisation of modules. It was addressed in mod_wsgi version 4.4.15. See (2) in: * http://modwsgi.readthedocs.io/en/develop/release-notes/version-4.4.15.html Details of patch available from: * https://github.com/GrahamDumpleton/mod_wsgi/commit/3aef8bf4fb713fc94403bfa45b13d8a7df7dfca7 Ah, great, thanks a lot Graham. *** Bug 1163516 has been marked as a duplicate of this bug. *** *** Bug 1548540 has been marked as a duplicate of this bug. *** VERIFIED New PASS: mod_wsgi-3.4-18.el7 Test finally PASSED: https://beaker.engineering.redhat.com/tasks/executed?recipe_task_id=78344022&recipe_task_id=78344015&recipe_task_id=78344029&new_pkg_tasks=78344022,78344015,78344029 https://beaker.engineering.redhat.com/tasks/executed?recipe_task_id=77580303&recipe_task_id=77580287&recipe_task_id=77580351&recipe_task_id=77580335&recipe_task_id=77580319&recipe_task_id=77580399&recipe_task_id=77580367&recipe_task_id=77580383&old_pkg_tasks=77580351,77580383,77580399,77580367&new_pkg_tasks=77580303,77580335,77580287,77580319 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3210 |
Created attachment 1274035 [details] core file generated from gdb We are seeing mod_wsgi get into a bad state during devstack runs in OpenStack CI. Apache will just sit in a tight loop of segfaulting mod_wsgi children until it is restarted. I am not currently sure how it gets into this situation, but have some info from a debugging session --- Program received signal SIGSEGV, Segmentation fault. 0x00007fb6886cf15b in wsgi_python_init (p=p@entry=0x7fb69909e0f8) at mod_wsgi.c:5952 5952 if (wsgi_server_config->py3k_warning_flag == 1) (gdb) bt #0 0x00007fb6886cf15b in wsgi_python_init (p=p@entry=0x7fb69909e0f8) at mod_wsgi.c:5952 #1 0x00007fb6886d8b40 in wsgi_hook_child_init (p=0x7fb69909e0f8, s=<optimized out>) at mod_wsgi.c:13778 #2 0x00007fb697c6855c in ap_run_child_init (pchild=0x7fb69909e0f8, s=0x7fb698fbb480) at config.c:165 #3 0x00007fb68d86644b in child_main (child_num_arg=child_num_arg@entry=0) at prefork.c:546 #4 0x00007fb68d866a55 in make_child (s=0x7fb698fbb480, slot=0) at prefork.c:810 #5 0x00007fb68d8676ee in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:912 #6 prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1100 #7 0x00007fb697c4cf6e in ap_run_mpm (pconf=pconf@entry=0x7fb698f86158, plog=0x7fb698fb3378, s=0x7fb698fbb480) at mpm_common.c:96 #8 0x00007fb697c45d76 in main (argc=2, argv=0x7ffd4f772aa8) at main.c:777 (gdb) print wsgi_server_config $5 = (WSGIServerConfig *) 0x0 (gdb) frame #1 0x00007fb6886d8b40 in wsgi_hook_child_init (p=0x7fb69909e0f8, s=<optimized out>) at mod_wsgi.c:13778 13778 wsgi_python_init(p); (gdb) print wsgi_server $8 = (server_rec *) 0x0 --- I have attached a core and a full backtrace. When we restart apache, this starts working. The issue is this is happening in a CI situation where clearly some of our setup is causing problems, so finding the root cause so we can ensure we don't go down that path will be helpful