Bug 1745857 - Minor update fails at step5 on controllers during gnocchi_db_sync
Summary: Minor update fails at step5 on controllers during gnocchi_db_sync
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z9
Target Release: 13.0 (Queens)
Assignee: Michele Baldessari
QA Contact: pkomarov
URL:
Whiteboard:
Duplicates: 1737911 1748364 1760882 1766144
Depends On:
Blocks: 1761367 1761369
Reported: 2019-08-27 06:03 UTC by Luca Miccini
Modified: 2019-11-14 07:30 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-8.4.1-5.el7ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1761367 1761369
Environment:
Last Closed: 2019-11-07 14:01:46 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1841629 | 0 | None | None | None | 2019-08-27 17:34:15 UTC
OpenStack gerrit | 682315 | 0 | 'None' | MERGED | HA: fix <service>_restart_bundle with minor update workflow | 2021-01-03 23:44:27 UTC
Red Hat Knowledge Base (Solution) | 4425921 | 0 | None | None | None | 2019-11-08 13:49:14 UTC
Red Hat Knowledge Base (Solution) | 4562441 | 0 | None | None | None | 2019-11-08 13:47:25 UTC
Red Hat Product Errata | RHBA-2019:3794 | 0 | None | None | None | 2019-11-07 14:02:01 UTC

Description Luca Miccini 2019-08-27 06:03:57 UTC
Description of problem:

Reproduced both here:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-7.6-latest-HA-ipv4/1/

as well as in our labs.


Minor update of controllers from 2019-01-10.1 to 2019-08-19.1 fails during step5:

2019-08-24 03:34:13 |         "Error running ['docker', 'run', '--name', 'gnocchi_db_sync', '--label', 'config_id=tripleo_step5', '--label', 'container_name=gnocchi_db_sync', '--label', 'managed_by=paunch', '--label', 'config_data={\"start_order\": 0, \"image\": \"192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1\", \"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=325849970e594f11baa8740a2e41134b\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/config_files/src:ro\", \"/var/lib/gnocchi:/var/lib/gnocchi\", \"/var/log/containers/gnocchi:/var/log/gnocchi\", \"/var/log/containers/httpd/gnocchi-api:/var/log/httpd\", \"/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro\"], \"net\": \"host\", \"detach\": false, \"privileged\": false}', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=325849970e594f11baa8740a2e41134b', '--net=host', '--privileged=false', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', 
'--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/config_files/src:ro', '--volume=/var/lib/gnocchi:/var/lib/gnocchi', '--volume=/var/log/containers/gnocchi:/var/log/gnocchi', '--volume=/var/log/containers/httpd/gnocchi-api:/var/log/httpd', '--volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro', '192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1']. [1]", 



[root@controller-0 ceilometer]# docker ps -a |grep "Exited (1)"
672ec2bfb03b        192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2019-08-19.1          "/usr/bin/bootstra..."   14 hours ago        Exited (1) 14 hours ago                       ceilometer_gnocchi_upgrade
09e7b80bf056        192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1                 "kolla_start"            14 hours ago        Exited (1) 14 hours ago                       gnocchi_db_sync

[root@controller-0 ceilometer]# docker ps -a |grep -i "unhealthy"
284aab369317        192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1                 "kolla_start"            14 hours ago        Up 14 hours (unhealthy)                       gnocchi_api
ed16af2bddaa        192.168.24.1:8787/rhosp13/openstack-gnocchi-metricd:2019-08-19.1             "kolla_start"            14 hours ago        Up 14 hours (unhealthy)                       gnocchi_metricd
e2813709c2aa        192.168.24.1:8787/rhosp13/openstack-aodh-evaluator:2019-08-19.1              "kolla_start"            14 hours ago        Up 14 hours (unhealthy)                       aodh_evaluator
462aab694b38        192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2019-08-19.1          "kolla_start"            14 hours ago        Up 14 hours (unhealthy)                       ceilometer_agent_central


2019-08-26 12:57:51.065 12 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py:42
2019-08-26 15:24:47.000 12 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py:45
2019-08-26 15:26:47.614 12 CRITICAL ceilometer [-] Unhandled error: ClientException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
 (HTTP 504)
2019-08-26 15:26:47.614 12 ERROR ceilometer Traceback (most recent call last):
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/bin/ceilometer-upgrade", line 10, in <module>
2019-08-26 15:26:47.614 12 ERROR ceilometer     sys.exit(upgrade())
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py", line 60, in upgrade
2019-08-26 15:26:47.614 12 ERROR ceilometer     )(gnocchi_client.upgrade_resource_types, conf)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 295, in call
2019-08-26 15:26:47.614 12 ERROR ceilometer     start_time=start_time)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 252, in iter
2019-08-26 15:26:47.614 12 ERROR ceilometer     return fut.result()
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 422, in result
2019-08-26 15:26:47.614 12 ERROR ceilometer     return self.__get_result()
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
2019-08-26 15:26:47.614 12 ERROR ceilometer     result = fn(*args, **kwargs)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/gnocchi_client.py", line 228, in upgrade_resource_types
2019-08-26 15:26:47.614 12 ERROR ceilometer     gnocchi.resource_type.get(name=name)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/resource_type.py", line 44, in get
2019-08-26 15:26:47.614 12 ERROR ceilometer     headers={'Content-Type': "application/json"}).json()
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/base.py", line 37, in _get
2019-08-26 15:26:47.614 12 ERROR ceilometer     return self.client.api.get(*args, **kwargs)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 304, in get
2019-08-26 15:26:47.614 12 ERROR ceilometer     return self.request(url, 'GET', **kwargs)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/client.py", line 52, in request
2019-08-26 15:26:47.614 12 ERROR ceilometer     raise exceptions.from_response(resp, method)
2019-08-26 15:26:47.614 12 ERROR ceilometer ClientException: <html><body><h1>504 Gateway Time-out</h1>
2019-08-26 15:26:47.614 12 ERROR ceilometer The server didn't respond in time.
2019-08-26 15:26:47.614 12 ERROR ceilometer </body></html>
2019-08-26 15:26:47.614 12 ERROR ceilometer  (HTTP 504)
2019-08-26 15:26:47.614 12 ERROR ceilometer 
2019-08-26 15:27:18.572 77 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py:45
2019-08-26 15:29:19.228 77 CRITICAL ceilometer [-] Unhandled error: ClientException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
 (HTTP 504)


 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0       (ocf::heartbeat:redis): Slave controller-0
   redis-bundle-1       (ocf::heartbeat:redis): Master controller-1
   redis-bundle-2       (ocf::heartbeat:redis): Slave controller-2

Comment 2 Luca Miccini 2019-08-27 06:32:35 UTC
Redis appears to be down according to haproxy.

This looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1730723, but the puppet-tripleo version installed here is newer than the one involved in that bug.


We noticed that the timestamp of haproxy.cfg is more recent than the last start of the haproxy-bundle resource:

-rw-r-----. 1 root root 13386 Aug 26 15:10 haproxy.cfg

 "StartedAt": "2019-08-26T15:06:19.346712501Z"

Restarting the haproxy-bundle resource allowed haproxy to pick up the new config file, and the db sync worked afterwards.
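
For anyone who wants to run the same check on other nodes, here is a minimal sketch of the comparison described above: is the config file's modification time later than the container's start time? The function names are made up for illustration. It assumes the StartedAt string comes from something like `docker inspect --format '{{.State.StartedAt}}' <container>`, and that the mtime shown by `ls` is in UTC, which this report does not confirm.

```python
from datetime import datetime, timezone
import os


def parse_docker_timestamp(value):
    """Parse a Docker StartedAt string such as 2019-08-26T15:06:19.346712501Z.

    Docker reports nanoseconds, but strptime's %f accepts at most six
    fractional digits, so the fraction is truncated to microseconds.
    """
    value = value.rstrip("Z")
    if "." in value:
        head, frac = value.split(".", 1)
        value = head + "." + frac[:6]
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def file_mtime_utc(path):
    """Return the modification time of path as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)


def config_is_stale(config_mtime, started_at):
    """True when the config was modified after the container was started,
    i.e. the running process may still be using the old configuration."""
    return config_mtime > parse_docker_timestamp(started_at)


# Values taken from this comment; the mtime is ASSUMED to be UTC.
haproxy_cfg_mtime = datetime(2019, 8, 26, 15, 10, tzinfo=timezone.utc)
stale = config_is_stale(haproxy_cfg_mtime, "2019-08-26T15:06:19.346712501Z")
print("haproxy.cfg newer than container start:", stale)
```

On a live controller, `file_mtime_utc()` can replace the hard-coded mtime. The path to haproxy.cfg and the container name depend on the deployment and are not given in this report.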

Comment 3 Luca Miccini 2019-08-28 06:07:03 UTC
We have a test patch that seems to solve this issue.

Comment 4 Luca Miccini 2019-09-03 14:20:19 UTC
*** Bug 1737911 has been marked as a duplicate of this bug. ***

Comment 5 Luca Miccini 2019-09-03 14:20:46 UTC
*** Bug 1748364 has been marked as a duplicate of this bug. ***

Comment 7 Michele Baldessari 2019-10-12 08:14:34 UTC
*** Bug 1760882 has been marked as a duplicate of this bug. ***

Comment 12 pkomarov 2019-10-24 11:10:56 UTC
Verified.

(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version
2019-10-23.1

(undercloud) [stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| dd6d9c11-c980-47d0-b519-ddb9c83fe68b | overcloud  | 09d59df841064155ac358df28a99a4e4 | UPDATE_COMPLETE | 2019-10-23T22:29:23Z | 2019-10-24T09:14:04Z |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+




PLAY RECAP *********************************************************************
controller-0               : ok=285  changed=125  unreachable=0    failed=0   
controller-1               : ok=276  changed=122  unreachable=0    failed=0   
controller-2               : ok=276  changed=122  unreachable=0    failed=0   

Thursday 24 October 2019  06:45:20 -0400 (0:00:00.068)       1:18:39.737 ****** 
=============================================================================== 

Updated nodes - Controller
Success


(undercloud) [stack@undercloud-0 ~]$ ansible controller -mshell -b -a'docker logs haproxy_restart_bundle|grep locally'
 [WARNING]: Found both group and host with same name: undercloud

controller-0 | SUCCESS | rc=0 >>
Thu Oct 24 10:32:24 UTC 2019: Restarting haproxy-bundle locally on 'controller-0'

controller-2 | SUCCESS | rc=0 >>
Thu Oct 24 09:43:39 UTC 2019: Restarting haproxy-bundle locally on 'controller-2'

controller-1 | SUCCESS | rc=0 >>
Thu Oct 24 10:06:36 UTC 2019: Restarting haproxy-bundle locally on 'controller-1'

Comment 13 Michele Baldessari 2019-10-28 13:40:47 UTC
*** Bug 1766144 has been marked as a duplicate of this bug. ***

Comment 14 Alex McLeod 2019-10-31 11:32:40 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 16 errata-xmlrpc 2019-11-07 14:01:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3794

