Bug 1524422 - OSP10 -> OSP11 upgrade: upgrade fails during 'Setup gnocchi db during upgrade' task because httpd is stopped and Keystone is unreachable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: z5
Target Release: 11.0 (Ocata)
Assignee: Mehdi ABAAKOUK
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks: 1517977
Reported: 2017-12-11 13:26 UTC by Marius Cornea
Modified: 2022-07-09 09:47 UTC

Fixed In Version: puppet-tripleo-6.5.4-2.el7ost
Doc Type: No Doc Update
Doc Text:
-
Clone Of:
Environment:
Last Closed: 2018-05-18 17:02:55 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport controller-0 (14.73 MB, application/x-xz)
2017-12-11 13:39 UTC, Marius Cornea


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-16936 0 None None None 2022-07-09 09:47:28 UTC
Red Hat Product Errata RHSA-2018:1627 0 None None None 2018-05-18 17:04:45 UTC

Description Marius Cornea 2017-12-11 13:26:14 UTC
Description of problem:
OSP10 -> OSP11 upgrade: upgrade fails during 'Setup gnocchi db during upgrade' task because httpd is stopped and Keystone is unreachable:

[stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerUpgrade_Step5.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 6f01d508-664a-4a44-9970-490c9ab2ea34
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [set is_bootstrap_node fact] **********************************************
    ok: [localhost]
    
    TASK [Setup gnocchi db during upgrade] *****************************************
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["gnocchi-upgrade"], "delta": "0:00:01.794639", "end": "2017-12-11 13:08:55.898215", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2017-12-11 13:08:54.103576", "stderr": "Option \"metric_processing_delay\" from group \"storage\" is deprecated. Use option \"metric_processing_delay\" from group \"metricd\".", "stderr_lines": ["Option \"metric_processing_delay\" from group \"storage\" is deprecated. Use option \"metric_processing_delay\" from group \"metricd\"."], "stdout": "", "stdout_lines": []}
    	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/a0bdd5f0-6d02-4a17-878b-c869d968427a_playbook.retry
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=30   changed=27   unreachable=0    failed=1   
    
    (truncated, view all with --long)
  deploy_stderr: |


The gnocchi-upgrade log on the first controller shows that authentication against Keystone failed with HTTP 503:

2017-12-11 13:08:55.788 459449 CRITICAL gnocchi [-] ClientException: Authorization Failure. Authorization Failed: Service Unavailable (HTTP 503)
2017-12-11 13:08:55.788 459449 ERROR gnocchi Traceback (most recent call last):
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/bin/gnocchi-upgrade", line 10, in <module>
2017-12-11 13:08:55.788 459449 ERROR gnocchi     sys.exit(upgrade())
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/cli.py", line 70, in upgrade
2017-12-11 13:08:55.788 459449 ERROR gnocchi     s = storage.get_driver(conf)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 144, in get_driver
2017-12-11 13:08:55.788 459449 ERROR gnocchi     conf.incoming)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/storage/incoming/swift.py", line 36, in __init__
2017-12-11 13:08:55.788 459449 ERROR gnocchi     self.swift.put_container(self.MEASURE_PREFIX)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1755, in put_container
2017-12-11 13:08:55.788 459449 ERROR gnocchi     query_string=query_string)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1661, in _retry
2017-12-11 13:08:55.788 459449 ERROR gnocchi     self.url, self.token = self.get_auth()
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1613, in get_auth
2017-12-11 13:08:55.788 459449 ERROR gnocchi     timeout=self.timeout)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 669, in get_auth
2017-12-11 13:08:55.788 459449 ERROR gnocchi     auth_version=auth_version)
2017-12-11 13:08:55.788 459449 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 581, in get_auth_keystone
2017-12-11 13:08:55.788 459449 ERROR gnocchi     raise ClientException('Authorization Failure. %s' % err)
2017-12-11 13:08:55.788 459449 ERROR gnocchi ClientException: Authorization Failure. Authorization Failed: Service Unavailable (HTTP 503)

At this point httpd is stopped on the first controller, so Keystone is unreachable:

[root@controller-0 heat-admin]# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/httpd.service.d
           └─openstack-dashboard.conf
   Active: inactive (dead) since Mon 2017-12-11 12:53:14 UTC; 30min ago
     Docs: man:httpd(8)
           man:apachectl(8)
 Main PID: 102032 (code=exited, status=0/SUCCESS)
   Status: "Total requests: 1733; Current requests/sec: 0; Current traffic:   0 B/sec"

Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/dot-luv/images/ui-icons_98d2fb_256x240.png'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/dot-luv/images/ui-bg_diagonals-thick_15_0b3e6f_40x40.png'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/dot-luv/images/ui-icons_9ccdfc_256x240.png'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/dot-luv/images/ui-bg_flat_40_292929_40x100.png'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/cupertino/theme.css'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/cupertino/jquery-ui.css'
Dec 11 11:39:52 controller-0 python[101287]: Copying '/usr/share/javascript/jquery_ui/themes/cupertino/jquery-ui.min.css'
Dec 11 11:40:01 controller-0 systemd[1]: Started The Apache HTTP Server.
Dec 11 12:53:12 controller-0 systemd[1]: Stopping The Apache HTTP Server...
Dec 11 12:53:14 controller-0 systemd[1]: Stopped The Apache HTTP Server.
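
The check above can also be scripted. The following is a minimal sketch that extracts the state from the "Active:" line of `systemctl status` output such as the one pasted above. The `parse_active_state` helper is hypothetical and only handles the line format shown in this report.

```python
import re
from typing import Optional, Tuple


def parse_active_state(status_output: str) -> Optional[Tuple[str, str]]:
    """Return (state, sub_state) from `systemctl status` text, or None.

    Hypothetical helper: it only understands lines shaped like
    "Active: inactive (dead) since ...", as seen in this report.
    """
    match = re.search(
        r"^\s*Active:\s+(\S+)\s+\(([^)]*)\)",
        status_output,
        re.MULTILINE,
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    sample = (
        "httpd.service - The Apache HTTP Server\n"
        "   Active: inactive (dead) since Mon 2017-12-11 12:53:14 UTC; 30min ago\n"
    )
    # Print the parsed state of the sample output.
    print(parse_active_state(sample))
```

A state other than "active" for httpd at the time gnocchi-upgrade runs would be consistent with the HTTP 503 seen in the log.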


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.2.4-3.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 computes + 2 networker nodes
2. Upgrade to OSP11

Actual results:
major-upgrade-composable-steps.yaml fails when gnocchi-upgrade runs because Keystone is unreachable (httpd is stopped on the controllers).

Expected results:
Upgrade doesn't fail.

Additional info:

This issue cannot be reproduced when the deployment contains Ceph nodes, which leads me to believe that it is specific to environments where Gnocchi uses Swift as its storage backend.
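
As a rough illustration of that hypothesis, the sketch below reads the `driver` option from the `[storage]` group of a gnocchi.conf-style file and reports whether gnocchi-upgrade would be expected to contact Keystone. The helper names, the `file` fallback, and the assumption that only the Swift driver authenticates through Keystone are illustrative and are not confirmed by this report.

```python
import configparser

# Assumption for illustration: only the Swift driver goes through Keystone,
# as suggested by the swiftclient get_auth_keystone frame in the traceback.
KEYSTONE_BACKED_DRIVERS = {"swift"}


def storage_driver(conf_text: str) -> str:
    """Return the [storage] driver from gnocchi.conf-style text.

    Falls back to "file" when the option is absent (assumed default).
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.get("storage", "driver", fallback="file")


def upgrade_needs_keystone(conf_text: str) -> bool:
    """True if the configured driver is assumed to authenticate via Keystone."""
    return storage_driver(conf_text) in KEYSTONE_BACKED_DRIVERS


if __name__ == "__main__":
    print(upgrade_needs_keystone("[storage]\ndriver = swift\n"))
    print(upgrade_needs_keystone("[storage]\ndriver = ceph\n"))
```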

Comment 1 Marius Cornea 2017-12-11 13:31:32 UTC
I think the issue here is that we stop httpd in step1:

https://github.com/openstack/tripleo-heat-templates/blob/stable/ocata/puppet/services/gnocchi-api.yaml#L137-L139

but when running gnocchi-upgrade in step5 it cannot authenticate against Keystone (running under httpd) as httpd is stopped:

https://github.com/openstack/tripleo-heat-templates/blob/stable/ocata/puppet/services/gnocchi-api.yaml#L147-L150
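
The ordering problem can be modeled with a small sketch. The step numbers mirror this comment (httpd stopped in step 1, gnocchi-upgrade run in step 5). The `UpgradeTask` structure, the "stop httpd" task name, and the `find_ordering_conflicts` helper are hypothetical and are not part of tripleo-heat-templates.

```python
from typing import List, NamedTuple, Optional, Tuple


class UpgradeTask(NamedTuple):
    name: str
    step: int
    stops: Optional[str] = None     # service this task stops, if any
    starts: Optional[str] = None    # service this task starts, if any
    requires: Optional[str] = None  # service that must be running


def find_ordering_conflicts(
    tasks: List[UpgradeTask],
) -> List[Tuple[str, int, str]]:
    """Return (task name, step, service) for tasks that need a stopped service.

    Services never mentioned by a stop/start task are assumed to be running.
    """
    running = {}
    conflicts = []
    # sorted() is stable, so tasks within the same step keep their order.
    for task in sorted(tasks, key=lambda t: t.step):
        if task.requires is not None and running.get(task.requires) is False:
            conflicts.append((task.name, task.step, task.requires))
        if task.stops is not None:
            running[task.stops] = False
        if task.starts is not None:
            running[task.starts] = True
    return conflicts


if __name__ == "__main__":
    tasks = [
        UpgradeTask("stop httpd", step=1, stops="httpd"),
        UpgradeTask("Setup gnocchi db during upgrade", step=5, requires="httpd"),
    ]
    print(find_ordering_conflicts(tasks))
```

In this model the conflict disappears if httpd is started again before step 5, or if the database setup runs before httpd is stopped. Which of these the actual fix uses is not stated in this report.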

Comment 2 Marius Cornea 2017-12-11 13:39:08 UTC
Created attachment 1366017
sosreport controller-0

Comment 10 Mehdi ABAAKOUK 2017-12-18 14:46:29 UTC
I have already done all backports and built the fixed package. That's why I put it to MODIFIED.

Comment 17 errata-xmlrpc 2018-05-18 17:02:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1627

