Bug 1524422

Summary: OSP10 -> OSP11 upgrade: upgrade fails during 'Setup gnocchi db during upgrade' task because httpd is stopped and Keystone is unreachable
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: puppet-tripleo
Assignee: Mehdi ABAAKOUK <mabaakou>
Status: CLOSED ERRATA
QA Contact: Sasha Smolyak <ssmolyak>
Severity: urgent
Priority: high
Version: 11.0 (Ocata)
CC: abregman, augol, dbecker, jdanjou, jjoyce, jschluet, mabaakou, mandreou, mburns, morazi, pkilambi, rhel-osp-director-maint, slinaber, tvignaud
Target Milestone: z5
Keywords: TestOnly, Triaged, ZStream
Target Release: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: puppet-tripleo-6.5.4-2.el7ost
Doc Type: No Doc Update
Last Closed: 2018-05-18 17:02:55 UTC
Type: Bug
Bug Blocks: 1517977
Attachments:
Description Flags
sosreport controller-0 none

Description Marius Cornea 2017-12-11 13:26:14 UTC
Description of problem:
OSP10 -> OSP11 upgrade: the upgrade fails during the 'Setup gnocchi db during upgrade' task because httpd is stopped and Keystone is unreachable:

[stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerUpgrade_Step5.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 6f01d508-664a-4a44-9970-490c9ab2ea34
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [set is_bootstrap_node fact] **********************************************
    ok: [localhost]
    
    TASK [Setup gnocchi db during upgrade] *****************************************
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["gnocchi-upgrade"], "delta": "0:00:01.794639", "end": "2017-12-11 13:08:55.898215", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2017-12-11 13:08:54.103576", "stderr": "Option \"metric_processing_delay\" from group \"storage\" is deprecated. Use option \"metric_processing_delay\" from group \"metricd\".", "stderr_lines": ["Option \"metric_processing_delay\" from group \"storage\" is deprecated. Use option \"metric_processing_delay\" from group \"metricd\"."], "stdout": "", "stdout_lines": []}
    	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/a0bdd5f0-6d02-4a17-878b-c869d968427a_playbook.retry
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=30   changed=27   unreachable=0    failed=1   
    
    (truncated, view all with --long)
  deploy_stderr: |


The gnocchi-upgrade log on the first controller shows that authentication against Keystone fails with HTTP 503:

2017-12-11 13:08:55.788 459449 CRITICAL gnocchi [-] ClientException: Authorization Failure. Authorization Failed: Service Unavailable (HTTP 503)
2017-12-11 13:08:55.788 459449 ERROR gnocchi Traceback (most recent call last):
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/bin/gnocchi-upgrade", line 10, in <module>
2017-12-11 13:08:55.788 459449 ERROR gnocchi     sys.exit(upgrade())
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/cli.py", line 70, in upgrade
2017-12-11 13:08:55.788 459449 ERROR gnocchi     s = storage.get_driver(conf)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 144, in get_driver
2017-12-11 13:08:55.788 459449 ERROR gnocchi     conf.incoming)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/storage/incoming/swift.py", line 36, in __init__
2017-12-11 13:08:55.788 459449 ERROR gnocchi     self.swift.put_container(self.MEASURE_PREFIX)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1755, in put_container
2017-12-11 13:08:55.788 459449 ERROR gnocchi     query_string=query_string)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1661, in _retry
2017-12-11 13:08:55.788 459449 ERROR gnocchi     self.url, self.token = self.get_auth()
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1613, in get_auth
2017-12-11 13:08:55.788 459449 ERROR gnocchi     timeout=self.timeout)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 669, in get_auth
2017-12-11 13:08:55.788 459449 ERROR gnocchi     auth_version=auth_version)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 581, in get_auth_keystone
2017-12-11 13:08:55.788 459449 ERROR gnocchi     raise ClientException('Authorization Failure. %s' % err)
2017-12-11 13:08:55.788 459449 ERROR gnocchi ClientException: Authorization Failure. Authorization Failed: Service Unavailable (HTTP 503)

At this point httpd is stopped on the first controller, so Keystone is unreachable:

[root@controller-0 heat-admin]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/httpd.service.d
           └─openstack-dashboard.conf
   Active: inactive (dead) since Mon 2017-12-11 12:53:14 UTC; 30min ago
     Docs: man:httpd(8)
           man:apachectl(8)
 Main PID: 102032 (code=exited, status=0/SUCCESS)
   Status: "Total requests: 1733; Current requests/sec: 0; Current traffic:   0 B/sec"

Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/dot-luv/images/ui-icons_98d2fb_256x240.png'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/dot-luv/images/ui-bg_diagonals-thick_15_0b3e6f_40x40.png'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/dot-luv/images/ui-icons_9ccdfc_256x240.png'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/dot-luv/images/ui-bg_flat_40_292929_40x100.png'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/cupertino/theme.css'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/cupertino/jquery-ui.css'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/cupertino/jquery-ui.min.css'
Dec 11 11:40:01 controller-0 systemd[1]: Started The Apache HTTP Server.
Dec 11 12:53:12 controller-0 systemd[1]: Stopping The Apache HTTP Server...
Dec 11 12:53:14 controller-0 systemd[1]: Stopped The Apache HTTP Server.
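
To make the timing explicit, here is a minimal sketch that computes how long httpd had been down when gnocchi-upgrade started. Both timestamps are copied from the output above (the systemd stop time and the "start" field of the failed Ansible task); the helper names `parse_utc` and `gap_seconds` are made up for this illustration.

```python
from datetime import datetime

# Timestamps copied from the report above (both are UTC).
HTTPD_STOPPED = "2017-12-11 12:53:14"            # systemd: "Stopped The Apache HTTP Server"
UPGRADE_STARTED = "2017-12-11 13:08:54.103576"   # "start" field of the failed Ansible task


def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS[.ffffff]' timestamp."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in stamp else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(stamp, fmt)


def gap_seconds(stopped: str, started: str) -> float:
    """Seconds between httpd stopping and gnocchi-upgrade starting."""
    return (parse_utc(started) - parse_utc(stopped)).total_seconds()


gap = gap_seconds(HTTPD_STOPPED, UPGRADE_STARTED)
print(f"httpd had been down for {gap:.0f} s when gnocchi-upgrade started")
```

The gap is roughly 15 minutes 40 seconds, so httpd was already stopped well before the task ran and the 503 is not a transient restart race.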


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.2.4-3.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 computes + 2 networker nodes
2. Upgrade to OSP11

Actual results:
major-upgrade-composable-steps.yaml fails when gnocchi-upgrade runs because Keystone is unreachable (httpd is stopped on the controllers).

Expected results:
Upgrade doesn't fail.

Additional info:

This issue cannot be reproduced when the deployment contains Ceph nodes, which leads me to believe that it is specific to environments where Gnocchi uses Swift as its storage backend.
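
One way to check whether a given environment falls into the affected case is to look at the storage driver in the Gnocchi configuration. The sketch below is only an illustration: it assumes the driver is set through a `driver` option in the `[storage]` section of gnocchi.conf, and it runs against a made-up sample string rather than a real file. The function name `uses_swift_backend` is hypothetical.

```python
import configparser

# Hypothetical excerpt of a gnocchi.conf; the value is illustrative only.
SAMPLE_CONF = """
[storage]
driver = swift
"""


def uses_swift_backend(conf_text: str) -> bool:
    """Return True if the [storage] driver in the given config text is 'swift'."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # fallback="" covers both a missing section and a missing option.
    driver = parser.get("storage", "driver", fallback="")
    return driver.strip().lower() == "swift"


print(uses_swift_backend(SAMPLE_CONF))
```

With a Swift driver, gnocchi-upgrade has to obtain a Keystone token before it can create its container, which matches the traceback above; a Ceph-backed deployment would not take that code path.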

Comment 1 Marius Cornea 2017-12-11 13:31:32 UTC
I think the issue here is that we stop httpd in step1:

https://github.com/openstack/tripleo-heat-templates/blob/stable/ocata/puppet/services/gnocchi-api.yaml#L137-L139

but when gnocchi-upgrade runs in step5, it cannot authenticate against Keystone (which runs under httpd) because httpd is still stopped:

https://github.com/openstack/tripleo-heat-templates/blob/stable/ocata/puppet/services/gnocchi-api.yaml#L147-L150
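
The ordering problem described in this comment can be modelled with a small sketch. This is not how tripleo-heat-templates evaluates upgrade tasks; the `Task` structure, the `find_violations` function, and the step1 task name are hypothetical and exist only to show why a step5 task that needs httpd fails after a step1 task stops it.

```python
from typing import NamedTuple


class Task(NamedTuple):
    step: int
    name: str
    stops: frozenset = frozenset()   # services this task stops
    starts: frozenset = frozenset()  # services this task starts
    needs: frozenset = frozenset()   # services that must be running


def find_violations(tasks):
    """Return (step, task name, missing services) for tasks whose needs are not running."""
    running = {"httpd"}  # assumption: httpd is up before the upgrade begins
    violations = []
    for task in sorted(tasks, key=lambda t: t.step):
        missing = task.needs - running
        if missing:
            violations.append((task.step, task.name, sorted(missing)))
        running = (running - task.stops) | task.starts
    return violations


UPGRADE_TASKS = [
    # Hypothetical name for the step1 task that stops httpd.
    Task(step=1, name="stop httpd", stops=frozenset({"httpd"})),
    # Task name taken from the failure output in the description.
    Task(step=5, name="Setup gnocchi db during upgrade", needs=frozenset({"httpd"})),
]

for step, name, missing in find_violations(UPGRADE_TASKS):
    print(f"step{step}: '{name}' needs {missing}, which is stopped")
```

In this model, any task that starts httpd between step1 and step5 makes the violation disappear. That is only an observation about the model, not a description of the fix that was shipped.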

Comment 2 Marius Cornea 2017-12-11 13:39:08 UTC
Created attachment 1366017
sosreport controller-0

Comment 10 Mehdi ABAAKOUK 2017-12-18 14:46:29 UTC
I have already done all the backports and built the fixed package. That's why I set the status to MODIFIED.

Comment 17 errata-xmlrpc 2018-05-18 17:02:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1627