Bug 1478042

Summary: Horizon won't load with OSP11 - Script timed out before returning headers: django.wsgi
Product: Red Hat OpenStack Reporter: Rhys Oxenham <roxenham>
Component: python-django-horizonAssignee: Radomir Dopieralski <rdopiera>
Status: CLOSED ERRATA QA Contact: Ido Ovadia <iovadia>
Severity: medium Docs Contact:
Priority: medium    
Version: 11.0 (Ocata)CC: aortega, apannu, asimonel, athomas, audra_cooper, beth.white, brendan.shephard, broskos, cchen, dbellant, dh3, ealcaniz, gcharot, gkadam, ksavich, kurt_hey, lbezdick, mburns, mrunge, mzamot, mzheng, nalmond, nchandek, neoanantha, nsatsia, nshetty, plancast, rcernin, rdopiera, roxenham, samccann, smerrow, sreichar, srevivo, surya_prabhakar, udayendu.kar, vcojot, vincent.misson
Target Milestone: z2Keywords: Triaged, ZStream
Target Release: 11.0 (Ocata)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: python-django-horizon-11.0.3-2.el7ost Doc Type: Bug Fix
Doc Text:
Some binary python modules used by Horizon were compiled with a flag that makes them use the simplified GIL state API. Since mod_wsgi uses sub-interpreters, and the simplified GIL state API is not re-entrant, this results in the mod_wsgi locking up and hanging, making the dashboard unavailable. Added "WSGIApplicationGroup %{GLOBAL}" to mod_wsgi configuration, which forces it to always use the main interpreter instead of sub-interpreters. With that option, mod_wsgi works correctly no matter what options were used to compile the binary python modules. The slight performance penalty is not noticeable.
Story Points: ---
Clone Of:
: 1478870 1479051 1496173 (view as bug list) Environment:
Last Closed: 2017-09-13 21:37:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1335596, 1356451, 1478870, 1479051, 1479055, 1479056, 1479057, 1479058, 1479732, 1481774, 1485889, 1485890, 1485894, 1496173    
Attachments:
Description Flags
Horizon Admin menu post Upgrade to OSP10 none

Description Rhys Oxenham 2017-08-03 13:25:51 UTC
Description of problem:

Using the latest CDN OSP11, and I cannot access the Horizon dashboard on my external network (or via any network for that matter). My browser get successfully redirected to http://<ip>/dashboard, but the application never loads up, and eventually I get a "504 Gateway Time-out error", which I suspect is from HAproxy.

Version-Release number of selected component (if applicable):

Linux infra-controller 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

openstack-dashboard-11.0.3-1.el7ost.noarch
httpd-2.4.6-67.el7.x86_64

How reproducible:

Every time, with my environment. Not sure if it's my specific configuration. I'm using 1x controller, 2x compute, and 1x networker. Vanilla templates, although I only have two network types - external and internal (with internal being a VLAN on the same physical network that the external uses).

Steps to Reproduce:
1. Provision OSP11 with OSP director / TripleO using network-isolation.yaml
2. Attempt to open the dashboard (I'm using via squid, but also fails with direct)
3. Observe the application failing to load, and eventually it timing out

Actual results:

Application times out after a few minutes and the browser session is closed.

Expected results:

Horizon dashboard loads as expected.

Additional info:

In the log files (/var/log/httpd/horizon_error.log) I get the following:

[Thu Aug 03 12:55:22.006732 2017] [:error] [pid 203037] WARNING:root:"dashboards" and "default_dashboard" in (local_)settings is DEPRECATED now and may be unsupported in some future release. The preferred way to specify the order of dashboards and the default dashboard is the pluggable dashboard mechanism (in /usr/share/openstack-dashboard/openstack_dashboard/enabled, /usr/share/openstack-dashboard/openstack_dashboard/local/enabled).
[Thu Aug 03 12:56:40.544916 2017] [core:error] [pid 227117] [client 192.168.122.1:44828] End of script output before headers: django.wsgi

Followed by the resulting, showing the 500 error:

==> /var/log/httpd/horizon_access.log <==
192.168.122.1 - - [03/Aug/2017:12:51:34 +0000] "GET /dashboard HTTP/1.1" 500 531 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Firefox/54.0"

I've managed to work around this by following a similar bug reported upstream: https://bugs.launchpad.net/horizon/+bug/1573488/comments/6

This then allows the application to load successfully, and Horizon run properly. See output in log (with debug enabled):

[Thu Aug 03 13:12:21.224826 2017] [authz_core:debug] [pid 334138] mod_authz_core.c(835): [client 192.168.122.1:44708] AH01628: authorization result: granted (no directives)
[Thu Aug 03 13:12:21.225334 2017] [authz_core:debug] [pid 334138] mod_authz_core.c(809): [client 192.168.122.1:44708] AH01626: authorization result of Require all granted: granted
[Thu Aug 03 13:12:21.225344 2017] [authz_core:debug] [pid 334138] mod_authz_core.c(809): [client 192.168.122.1:44708] AH01626: authorization result of <RequireAny>: granted
[Thu Aug 03 13:12:21.225704 2017] [:info] [pid 334126] [remote 192.168.122.1:8] mod_wsgi (pid=334126, process='apache', application=''): Loading WSGI script '/usr/share/openstack-dashboard/openstack_dashboard/wsgi/django.wsgi'.
[Thu Aug 03 13:12:21.695350 2017] [:error] [pid 334126] WARNING:root:"dashboards" and "default_dashboard" in (local_)settings is DEPRECATED now and may be unsupported in some future release. The preferred way to specify the order of dashboards and the default dashboard is the pluggable dashboard mechanism (in /usr/share/openstack-dashboard/openstack_dashboard/enabled, /usr/share/openstack-dashboard/openstack_dashboard/local/enabled).
[Thu Aug 03 13:12:22.905956 2017] [deflate:debug] [pid 334138] mod_deflate.c(849): [client 192.168.122.1:44708] AH01384: Zlib: Compressed 8250 to 2854 : URL /dashboard/auth/login/
[Thu Aug 03 13:12:23.042856 2017] [authz_core:debug] [pid 334137] mod_authz_core.c(835): [client 192.168.122.1:44754] AH01628: authorization result: granted (no directives), referer: http://192.168.122.100/dashboard/auth/login/?next=/dashboard/
[...]

Comment 1 Radomir Dopieralski 2017-08-03 13:50:28 UTC
Thank you for your report.

How did you install OpenStack? Was this problem there from the beginning, or did it appear later?

Comment 2 Rhys Oxenham 2017-08-03 13:53:51 UTC
(In reply to Radomir Dopieralski from comment #1)
> Thank you for your report.
> 
> How did you install OpenStack? Was this problem there from the beginning, or
> did it appear later?

Via the latest TripleO / OSP director.

The problem exists immediately after deployment. In fact, I just ran a stack-update and it reverted the "WSGIApplicationGroup %{GLOBAL}" that I put into /etc/httpd/conf.d/10-horizon_vhost.conf, and the application stopped working again.

Comment 3 Radomir Dopieralski 2017-08-03 15:49:38 UTC
I will look into it, in the mean time can you try enabling debug in Horizon's configuration and see if it shows anything?

Comment 4 Rhys Oxenham 2017-08-03 15:51:51 UTC
(In reply to Radomir Dopieralski from comment #3)
> I will look into it, in the mean time can you try enabling debug in
> Horizon's configuration and see if it shows anything?

I actually tried everywhere to set debug in Horizon, and /var/log/horizon/* was empty every time. I only managed to find the workaround to this via Google searching. Are there other locations other than /etc/openstack-dashboard/local_settings that I should be setting up debug? I managed to get some more info through httpd by using LogLevel, but it didn't result in any problems showing.

Comment 5 Radomir Dopieralski 2017-08-04 07:46:16 UTC
No, local_settings is the only location, and /var/log/horizon/* is the location for the logs. It looks like it hangs/crashes before any python code even gets to run. I wonder if it's a bug in mod_wsgi plugin for apache, especially since changing its configuration seems to fix it.

Comment 6 Rhys Oxenham 2017-08-04 08:07:48 UTC
Similar report shown here: https://bugzilla.redhat.com/show_bug.cgi?id=1477816

Comment 7 Radomir Dopieralski 2017-08-04 08:38:55 UTC
*** Bug 1478290 has been marked as a duplicate of this bug. ***

Comment 8 Radomir Dopieralski 2017-08-04 08:41:02 UTC
I didn't reproduce it yet, but can you try something? I wonder if running "sudo /usr/share/openstack-dashboard/manage.py compress" fixes the issue?

Comment 10 Radomir Dopieralski 2017-08-04 12:08:52 UTC
I have this issue reproduced on my local environment, and it seems that it's caused by this:

http://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#python-simplified-gil-state-api

I'm still not sure whether it's the change of mod_wsgi version or of versions of one of the libraries that horizon uses that triggered this, but I think that the solution is to add that WSGIApplicationGroup %{GLOBAL} line to mod_wsgi configuration for Horizon. This will need to be changed both in horizon's example files, and in the puppet configuration.

Comment 11 Radomir Dopieralski 2017-08-04 12:37:34 UTC
python-keyring-5.7.1-1.el7ost.noarch is the culprit, it imports python-gobject-base-3.22.0-1.el7.x86_64 as part of its gnome-keyring support, which has a C python extension that is apparently using simplified GIL state API, which in turn makes mod_wsgi hang.

Comment 12 Gregory Charot 2017-08-04 13:45:25 UTC
I have the same behavior on OSP10. Same fix solves the issue.

10.36.117.166 - - [04/Aug/2017:13:25:19 +0000] "GET /dashboard HTTP/1.1" 500 531 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:51.0) Gecko/20100101 Firefox/51.0"

[Fri Aug 04 13:25:20.120219 2017] [:error] [pid 92028] WARNING:root:"dashboards" and "default_dashboard" in (local_)settings is DEPRECATED now and may be unsupported in some future release. The preferred way to specify the order of dashboards and the default dashboard is the pluggable dashboard mechanism (in /usr/share/openstack-dashboard/openstack_dashboard/enabled, /usr/share/openstack-dashboard/openstack_dashboard/local/enabled).
[Fri Aug 04 13:26:46.244634 2017] [core:error] [pid 92036] [client 10.36.117.166:59376] End of script output before headers: django.wsgi


[root@overcloud-controller-0 ~]# cat /etc/rhosp-release
Red Hat OpenStack Platform release 10.0 (Newton)
[root@overcloud-controller-0 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)

Fresh vanilla deployment of OSP10:
time openstack overcloud deploy     --templates /usr/share/openstack-tripleo-heat-templates/     --ntp-server 10.16.255.1 --control-scale 1 --compute-scale 1     --neutron-tunnel-types vxlan --neutron-network-type vxlan --control-flavor control     --compute-flavor compute

Comment 13 Radomir Dopieralski 2017-08-04 13:51:39 UTC
Yes, this will require patches to all versions (6-12) of python-django-horizon and puppet-horizon.

Comment 14 Mike Burns 2017-08-04 17:14:28 UTC
Radomir -- is there a workaround here?  maybe uninstall libgnome-keyring?

Comment 15 Radomir Dopieralski 2017-08-04 17:35:05 UTC
The workaround is to add "WSGIApplicationGroup %{GLOBAL}" to the mod_wsgi configuration for horizon in apache. I will prepare the patches soon.

Comment 16 Lukas Bezdicka 2017-08-07 12:44:06 UTC
*** Bug 1477816 has been marked as a duplicate of this bug. ***

Comment 19 Brendan Shephard 2017-08-08 09:53:47 UTC
Hey all. Also experiencing this issue when manually install OSP11, it doesn't happen with Deployments from director for me.

The work around suggested by Radomir worked to bring up the dashboard UI. But I'm now getting this issue:

[Tue Aug 08 09:37:52.472366 2017] [:error] [pid 7082] The OPENSTACK_KEYSTONE_URL setting points to a v2.0 Keystone endpoint, but v3 is specified as the API version to use by Horizon. Using v3 endpoint for authentication.
[Tue Aug 08 09:37:55.487015 2017] [:error] [pid 7082] Attempted scope to domain Default failed, will attemptto scope to another domain.
[Tue Aug 08 09:37:58.400556 2017] [:error] [pid 7082] Login successful for user "admin", remote address 172.24.42.42.
[Tue Aug 08 09:38:01.727209 2017] [:error] [pid 7082] Internal Server Error: /dashboard/auth/login/
[Tue Aug 08 09:38:01.727252 2017] [:error] [pid 7082] Traceback (most recent call last):
[Tue Aug 08 09:38:01.727256 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 132, in get_response
[Tue Aug 08 09:38:01.727259 2017] [:error] [pid 7082]     response = wrapped_callback(request, *callback_args, **callback_kwargs)
[Tue Aug 08 09:38:01.727262 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/views/decorators/debug.py", line 76, in sensitive_post_parameters_wrapper
[Tue Aug 08 09:38:01.727264 2017] [:error] [pid 7082]     return view(request, *args, **kwargs)
[Tue Aug 08 09:38:01.727268 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/utils/decorators.py", line 110, in _wrapped_view
[Tue Aug 08 09:38:01.727270 2017] [:error] [pid 7082]     response = view_func(request, *args, **kwargs)
[Tue Aug 08 09:38:01.727273 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/views/decorators/cache.py", line 57, in _wrapped_view_func
[Tue Aug 08 09:38:01.727301 2017] [:error] [pid 7082]     response = view_func(request, *args, **kwargs)
[Tue Aug 08 09:38:01.727306 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/openstack_auth/views.py", line 104, in login
[Tue Aug 08 09:38:01.727310 2017] [:error] [pid 7082]     **kwargs)
[Tue Aug 08 09:38:01.727313 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/views/decorators/debug.py", line 76, in sensitive_post_parameters_wrapper
[Tue Aug 08 09:38:01.727331 2017] [:error] [pid 7082]     return view(request, *args, **kwargs)
[Tue Aug 08 09:38:01.727336 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/utils/decorators.py", line 110, in _wrapped_view
[Tue Aug 08 09:38:01.727341 2017] [:error] [pid 7082]     response = view_func(request, *args, **kwargs)
[Tue Aug 08 09:38:01.727345 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/views/decorators/cache.py", line 57, in _wrapped_view_func
[Tue Aug 08 09:38:01.727349 2017] [:error] [pid 7082]     response = view_func(request, *args, **kwargs)
[Tue Aug 08 09:38:01.727354 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/contrib/auth/views.py", line 51, in login
[Tue Aug 08 09:38:01.727358 2017] [:error] [pid 7082]     auth_login(request, form.get_user())
[Tue Aug 08 09:38:01.727363 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/contrib/auth/__init__.py", line 110, in login
[Tue Aug 08 09:38:01.727377 2017] [:error] [pid 7082]     request.session.cycle_key()
[Tue Aug 08 09:38:01.727398 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/contrib/sessions/backends/base.py", line 285, in cycle_key
[Tue Aug 08 09:38:01.727408 2017] [:error] [pid 7082]     self.create()
[Tue Aug 08 09:38:01.727422 2017] [:error] [pid 7082]   File "/usr/lib/python2.7/site-packages/django/contrib/sessions/backends/cache.py", line 48, in create
[Tue Aug 08 09:38:01.727428 2017] [:error] [pid 7082]     "Unable to create a new session key. "
[Tue Aug 08 09:38:01.727433 2017] [:error] [pid 7082] RuntimeError: Unable to create a new session key. It is likely that the cache is unavailable.


Cache config:

SESSION_ENGINE = 'django.contrib.sessions.backends.cache'

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION' : '172.24.38.2:11211',
    },
}

[root@controller httpd(keystone_admin)]# iptables -L | grep mem
ACCEPT     tcp  --  anywhere             anywhere             multiport dports memcache
[root@controller httpd(keystone_admin)]# ss -tpnl | grep 11211
LISTEN     0      128    127.0.0.1:11211                    *:*                   users:(("memcached",pid=9630,fd=26))
LISTEN     0      128        ::1:11211                   :::*                   users:(("memcached",pid=9630,fd=27))
[root@controller httpd(keystone_admin)]# 


Is this likely to be related? Or something more closely related to my own idiocy?

Comment 20 Radomir Dopieralski 2017-08-08 11:50:27 UTC
That doesn't look related to this particular issue. Can you please report this in a separate bug?

Comment 21 Steve Reichard 2017-08-09 13:18:30 UTC
Radomir, 

Could you explicit say which file should be modified, some partners have asked.

Comment 22 Surya Prabhakar 2017-08-09 13:21:12 UTC
*** Bug 1479737 has been marked as a duplicate of this bug. ***

Comment 23 Radomir Dopieralski 2017-08-09 13:53:24 UTC
The file is different depending on the deployment, but it's always in "/etc/httpd/conf.d/" and has "horizon" in the name. With director, I believe it's "15-horizon.conf".

Note however, that if you modify that file, Puppet will most likely overwrite your changes on the next occasion.

Comment 24 Radomir Dopieralski 2017-08-09 13:54:21 UTC
It can also be "openstack-dashboard" instead of "horizon".

Comment 25 Michael Zamot 2017-08-09 22:13:09 UTC
Can confirm the issue is happening in RHOSP 10. 
Adding the line "WSGIApplicationGroup %{GLOBAL}" to /etc/httpd/conf.d/10-horizon_vhost.conf fixes the issue.

Comment 26 Vincent Misson 2017-08-09 23:02:38 UTC
Hi all,

I just got the same issue (504 Gateway Time-out error) with a fresh RHOP 11 installation.
I just add "WSGIApplicationGroup %{GLOBAL}" to /etc/httpd/conf.d/10-horizon_vhost.conf and restart http : systemctl restart httpd

I have now access to my Horizon dashboard.

Comment 27 Kurt Hey 2017-08-10 20:54:51 UTC
What needs to be patched in puppet for horizon on the undercloud to fix the overcloud controller nodes?  Will this also stick if another puppet run is performed? 

Is there a fix to be back ported for OSP8, 9, 10?

Comment 28 Radomir Dopieralski 2017-08-10 21:53:54 UTC
The pupppet-horizon bug is here: https://bugzilla.redhat.com/show_bug.cgi?id=1478870 and the patch is https://review.openstack.org/#/c/490857/

The downstream patches are https://code.engineering.redhat.com/gerrit/#/c/114799/ and https://code.engineering.redhat.com/gerrit/#/c/114801/ for OSP11 and OSP10, I'm still working on backports further back.

Comment 29 ananth 2017-08-14 18:11:00 UTC
Installed RHEL Server 7.4 and deployed Packstack AIO by subscribing to OSP 11 repos. Hit this issue.

Workaround as suggested, added "WSGIApplicationGroup %{GLOBAL}" to 
/var/log/httpd/15-horizon_vhost.conf resolved the issue for me

Comment 30 Rhys Oxenham 2017-08-14 19:23:10 UTC
(In reply to Kurt Hey from comment #27)
> What needs to be patched in puppet for horizon on the undercloud to fix the
> overcloud controller nodes?  Will this also stick if another puppet run is
> performed? 
> 
> Is there a fix to be back ported for OSP8, 9, 10?

If you make the modified changes as noted in the workaround (see #2), then they won't persist if you run an overcloud update or puppet run. The patches will fix this longer term so it always pulls these changes in by default.

Comment 32 Kurt Hey 2017-08-16 21:06:27 UTC
Can we get a copy of the "downstream patches are https://code.engineering.redhat.com/gerrit/#/c/114799/ and https://code.engineering.redhat.com/gerrit/#/c/114801/ for OSP11 and OSP10"?  We do not have access to this url.

Comment 36 Audra Cooper 2017-08-24 16:40:44 UTC
I applied this patch post Upgrade to OSP10 and I can access Horizon, but there is an error accessing anything under the Admin menu.  Most catagories in the top menu under Admin are missing.  See attachment.

Comment 37 Audra Cooper 2017-08-24 16:45:57 UTC
Created attachment 1317841 [details]
Horizon Admin menu post Upgrade to OSP10

Comment 38 Radomir Dopieralski 2017-08-24 17:01:30 UTC
That looks like a separate issue. Can you please create a new bugzilla ticket for it?

Comment 39 Audra Cooper 2017-08-24 17:31:18 UTC
(In reply to Radomir Dopieralski from comment #38)
> That looks like a separate issue. Can you please create a new bugzilla
> ticket for it?

Will do..

Comment 43 Ido Ovadia 2017-09-05 15:19:22 UTC
Verified
========
python-django-horizon-11.0.3-3.el7ost.noarch

Comment 45 errata-xmlrpc 2017-09-13 21:37:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2717

Comment 46 Edu Alcaniz 2017-09-26 15:22:09 UTC
*** Bug 1496173 has been marked as a duplicate of this bug. ***