Bug 1445886

Summary: OSP9 -> OSP10 -> OSP11 upgrade fails during major-upgrade-composable-steps because keystone admin is not reachable
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED ERRATA QA Contact: Amit Ugol <augol>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 11.0 (Ocata) CC: aschultz, dbecker, dmacpher, emacchi, jcoufal, jschluet, lbopf, mandreou, mburns, morazi, rhel-osp-director-maint, sasha, sathlang, sclewis
Target Milestone: async Keywords: Triaged, ZStream
Target Release: 11.0 (Ocata)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-6.0.0-11.el7ost Doc Type: Bug Fix
Doc Text:
An issue that left the OpenStack Identity (keystone) admin API unreachable blocked the upgrade from OpenStack Platform 10 to 11. This fix resolves the issue so that the upgrade can complete. The issue affected customers who had previously upgraded from OpenStack Platform 9 to 10 and then attempted to upgrade to OpenStack Platform 11.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-15 16:56:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marius Cornea 2017-04-26 17:31:20 UTC
Description of problem:
OSP9 -> OSP10 -> OSP11 upgrade fails during major-upgrade-composable-steps because keystone admin is not reachable

Version-Release number of selected component (if applicable):


How reproducible:
4/4

Steps to Reproduce:
1. Deploy OSP9
2. Upgrade to latest OSP10
3. Upgrade to OSP11

Actual results:
major-upgrade-composable-steps fails:

    Error: Failed to apply catalog: Execution of '/usr/bin/openstack domain list --quiet --format csv' returned 1: Unable to establish connection to http://controller-0.ctlplane.localdomain:35357/v3/domains?: HTTPConnectionPool(host='controller-0.ctlplane.localdomain', port=35357): Max retries exceeded with url: /v3/domains (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x3d96990>: Failed to establish a new connection: [Errno 111] Connection refused',)) (tried 46, for a total of 170 seconds)
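
The error above amounts to a refused TCP connection on the keystone admin port. A minimal sketch of the same reachability check in Python follows; the helper name port_is_open is hypothetical and not part of any OpenStack tooling, and the check only tests the TCP connection, not the keystone API itself.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111), timeouts, and
        # name-resolution failures, all of which are OSError subclasses.
        return False
```

Calling port_is_open("controller-0.ctlplane.localdomain", 35357) with the host and port from the error message should return False for as long as httpd is down on the controller.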

Expected results:
The upgrade completes successfully.

Additional info:
httpd cannot start because mod_ssl is missing, so Keystone is unreachable.

This seems to be a regression introduced in the 2017-04-24 build.

Comment 1 Alexander Chuzhoy 2017-04-26 18:04:42 UTC
Reproduced.

Comment 3 Marius Cornea 2017-04-26 19:10:56 UTC
[root@controller-0 ~]# httpd
httpd: Syntax error on line 38 of /etc/httpd/conf/httpd.conf: Syntax error on line 1 of /etc/httpd/conf.modules.d/ssl.load: Cannot load modules/mod_ssl.so into server: /etc/httpd/modules/mod_ssl.so: cannot open shared object file: No such file or directory
[root@controller-0 ~]# rpm -qa | grep mod_ssl
[root@controller-0 ~]# cat /etc/httpd/conf.modules.d/ssl.load
LoadModule ssl_module modules/mod_ssl.so
[root@controller-0 ~]# 
[root@controller-0 ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/httpd.service.d
           └─openstack-dashboard.conf
   Active: failed (Result: exit-code) since Wed 2017-04-26 18:53:53 UTC; 16min ago
     Docs: man:httpd(8)
           man:apachectl(8)
  Process: 407849 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=1/FAILURE)
  Process: 407846 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
  Process: 407791 ExecStartPre=/usr/bin/python /usr/share/openstack-dashboard/manage.py compress --force (code=exited, status=0/SUCCESS)
  Process: 407761 ExecStartPre=/usr/bin/python /usr/share/openstack-dashboard/manage.py collectstatic --noinput --clear (code=exited, status=0/SUCCESS)
 Main PID: 407846 (code=exited, status=1/FAILURE)

Apr 26 18:53:45 controller-0.localdomain python[407761]: Copying '/usr/share/javascript/jquery_ui/themes/ui-lightness/images/animated-overlay.gif'
Apr 26 18:53:45 controller-0.localdomain python[407761]: Copying '/usr/share/javascript/jquery_ui/themes/ui-lightness/images/ui-bg_gloss-wave_35_f6a828_500x100.png'
Apr 26 18:53:45 controller-0.localdomain python[407761]: Copying '/usr/share/javascript/jquery_ui/themes/ui-lightness/images/ui-icons_ffd27a_256x240.png'
Apr 26 18:53:45 controller-0.localdomain python[407761]: Copying '/usr/share/javascript/jquery_ui/themes/ui-lightness/images/ui-bg_glass_100_fdf5ce_1x400.png'
Apr 26 18:53:53 controller-0.localdomain systemd[1]: httpd.service: main process exited, code=exited, status=1/FAILURE
Apr 26 18:53:53 controller-0.localdomain kill[407849]: kill: cannot find process ""
Apr 26 18:53:53 controller-0.localdomain systemd[1]: httpd.service: control process exited, code=exited status=1
Apr 26 18:53:53 controller-0.localdomain systemd[1]: Failed to start The Apache HTTP Server.
Apr 26 18:53:53 controller-0.localdomain systemd[1]: Unit httpd.service entered failed state.
Apr 26 18:53:53 controller-0.localdomain systemd[1]: httpd.service failed.
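
The transcript above shows a LoadModule directive pointing at a shared object that is not on disk. As a rough sketch, the same mismatch could be found for every module at once with the script below. It assumes a server root of /etc/httpd and a conf.modules.d directory, as in the transcript; the function name missing_modules is hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as: LoadModule ssl_module modules/mod_ssl.so
LOAD_MODULE = re.compile(r"^\s*LoadModule\s+(\S+)\s+(\S+)", re.IGNORECASE)


def missing_modules(server_root, conf_dir="conf.modules.d"):
    """Return (conf_file, module_name, module_path) for each LoadModule
    directive whose shared object does not exist on disk."""
    root = Path(server_root)
    missing = []
    for conf in sorted((root / conf_dir).glob("*")):
        if not conf.is_file():
            continue
        for line in conf.read_text(errors="replace").splitlines():
            match = LOAD_MODULE.match(line)
            if not match:
                continue
            name, rel_path = match.groups()
            # Relative module paths are resolved against the server root;
            # an absolute path is kept as is by the / operator.
            so_path = root / rel_path
            if not so_path.exists():
                missing.append((conf.name, name, str(so_path)))
    return missing
```

On the failing controller, missing_modules("/etc/httpd") would be expected to list the ssl.load entry, since /etc/httpd/modules/mod_ssl.so does not exist there.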

Comment 4 Alex Schultz 2017-04-26 19:20:43 UTC
Apr 26 18:56:56 controller-0 os-collect-config: Notice: /Stage[main]/Apache::Mod::Ssl/Apache::Mod[ssl]/Package[mod_ssl]/ensure: created

The mod_ssl package is missing. If you use tripleo::packages, the rpm provider is a no-op, so the package never gets installed even though Puppet reports it as created.
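
One way to spot this mismatch is to compare the packages Puppet reports as created with what rpm actually knows about. The sketch below assumes the log text has already been collected (for example from the os-collect-config journal) and that the rpm binary is on the PATH; all function names are hypothetical.

```python
import re
import subprocess

# Matches the package name in Puppet notices such as
#   .../Package[mod_ssl]/ensure: created
CREATED = re.compile(r"/Package\[([^\]]+)\]/ensure: created")


def packages_reported_created(log_text):
    """Return package names Puppet reported as created, in first-seen order."""
    seen = []
    for name in CREATED.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def is_installed(package):
    """Return True if `rpm -q <package>` exits with status 0."""
    result = subprocess.run(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def reported_but_missing(log_text):
    """Packages the log says were created but that rpm cannot find."""
    return [p for p in packages_reported_created(log_text) if not is_installed(p)]
```

Any name returned by reported_but_missing is a package that Puppet claimed to install while the no-op provider skipped the actual installation.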

Comment 5 Alex Schultz 2017-04-26 19:28:55 UTC
python-cradox and python-aodhclient are also missing. Puppet attempts to install them, but the installation is a no-op because OS::TripleO::Services::TripleoPackages is in use.

Comment 6 Alex Schultz 2017-04-26 19:37:49 UTC
FYI, this was caused by https://review.openstack.org/#/c/458033/1, which added apache::mod::ssl to all the API classes in puppet-tripleo. As a side effect, this pulls in and activates mod_ssl. Combined with OS::TripleO::Services::TripleoPackages, Apache then cannot start because the mod_ssl package was never actually installed.

Comment 7 Sofer Athlan-Guyot 2017-04-26 21:21:21 UTC
Adding a missing backport from master to stable/ocata.

Comment 13 errata-xmlrpc 2017-06-15 16:56:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1475