Bug 1745857

Summary: Minor update fails at step5 on controllers during gnocchi_db_sync
Product: Red Hat OpenStack Reporter: Luca Miccini <lmiccini>
Component: openstack-tripleo-heat-templates Assignee: Michele Baldessari <michele>
Status: CLOSED ERRATA QA Contact: pkomarov
Severity: urgent Docs Contact:
Priority: urgent    
Version: 13.0 (Queens) CC: dciabrin, dhill, mbollo, mburns, michele, nchandek, pkomarov, rbartal, shdunne, slinaber
Target Milestone: z9 Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.4.1-5.el7ost Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1761367 1761369 Environment:
Last Closed: 2019-11-07 14:01:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1761367, 1761369    

Description Luca Miccini 2019-08-27 06:03:57 UTC
Description of problem:

Reproduced both here:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-7.6-latest-HA-ipv4/1/

and in our labs.


Minor update of controllers from 2019-01-10.1 to 2019-08-19.1 fails during step 5:

2019-08-24 03:34:13 |         "Error running ['docker', 'run', '--name', 'gnocchi_db_sync', '--label', 'config_id=tripleo_step5', '--label', 'container_name=gnocchi_db_sync', '--label', 'managed_by=paunch', '--label', 'config_data={\"start_order\": 0, \"image\": \"192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1\", \"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=325849970e594f11baa8740a2e41134b\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/config_files/src:ro\", \"/var/lib/gnocchi:/var/lib/gnocchi\", \"/var/log/containers/gnocchi:/var/log/gnocchi\", \"/var/log/containers/httpd/gnocchi-api:/var/log/httpd\", \"/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro\"], \"net\": \"host\", \"detach\": false, \"privileged\": false}', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=325849970e594f11baa8740a2e41134b', '--net=host', '--privileged=false', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', 
'--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/config_files/src:ro', '--volume=/var/lib/gnocchi:/var/lib/gnocchi', '--volume=/var/log/containers/gnocchi:/var/log/gnocchi', '--volume=/var/log/containers/httpd/gnocchi-api:/var/log/httpd', '--volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro', '192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1']. [1]", 



[root@controller-0 ceilometer]# docker ps -a |grep "Exited (1)"
672ec2bfb03b        192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2019-08-19.1          "/usr/bin/bootstra..."   14 hours ago        Exited (1) 14 hours ago                       ceilometer_gnocchi_upgrade
09e7b80bf056        192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1                 "kolla_start"            14 hours ago        Exited (1) 14 hours ago                       gnocchi_db_sync

[root@controller-0 ceilometer]# docker ps -a |grep -i "unhealthy"
284aab369317        192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1                 "kolla_start"            14 hours ago        Up 14 hours (unhealthy)                       gnocchi_api
ed16af2bddaa        192.168.24.1:8787/rhosp13/openstack-gnocchi-metricd:2019-08-19.1             "kolla_start"            14 hours ago        Up 14 hours (unhealthy)                       gnocchi_metricd
e2813709c2aa        192.168.24.1:8787/rhosp13/openstack-aodh-evaluator:2019-08-19.1              "kolla_start"            14 hours ago        Up 14 hours (unhealthy)                       aodh_evaluator
462aab694b38        192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2019-08-19.1          "kolla_start"            14 hours ago        Up 14 hours (unhealthy)                       ceilometer_agent_central
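A minimal Python 3 sketch of the same triage, assuming the docker CLI is available on the controller: instead of grepping separately for "Exited (1)" and "unhealthy", it asks `docker ps -a` for only the name and status of each container and sorts them into exited-with-error and unhealthy lists. The helper names are hypothetical, and treating any non-zero exit code (not only 1) as a failure is a choice made here, not something the report does.

```python
import subprocess


def problem_containers(ps_lines):
    """Split "<name>\t<status>" lines into (exited_nonzero, unhealthy) name lists."""
    exited_nonzero, unhealthy = [], []
    for line in ps_lines:
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("\t")
        # docker reports stopped containers as "Exited (<code>) <age>"
        if status.startswith("Exited (") and not status.startswith("Exited (0)"):
            exited_nonzero.append(name)
        # running containers with a failing healthcheck show "(unhealthy)"
        elif "(unhealthy)" in status:
            unhealthy.append(name)
    return exited_nonzero, unhealthy


def docker_status_lines():
    """Run `docker ps -a` with a name/status format (needs Python 3.7+)."""
    out = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


if __name__ == "__main__":
    failed, sick = problem_containers(docker_status_lines())
    print("exited with error:", failed)
    print("unhealthy:", sick)
```

Fed the statuses from the listing above, it would report ceilometer_gnocchi_upgrade and gnocchi_db_sync as exited with an error, and gnocchi_api, gnocchi_metricd, aodh_evaluator and ceilometer_agent_central as unhealthy.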


2019-08-26 12:57:51.065 12 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py:42
2019-08-26 15:24:47.000 12 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py:45
2019-08-26 15:26:47.614 12 CRITICAL ceilometer [-] Unhandled error: ClientException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
 (HTTP 504)
2019-08-26 15:26:47.614 12 ERROR ceilometer Traceback (most recent call last):
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/bin/ceilometer-upgrade", line 10, in <module>
2019-08-26 15:26:47.614 12 ERROR ceilometer     sys.exit(upgrade())
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py", line 60, in upgrade
2019-08-26 15:26:47.614 12 ERROR ceilometer     )(gnocchi_client.upgrade_resource_types, conf)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 295, in call
2019-08-26 15:26:47.614 12 ERROR ceilometer     start_time=start_time)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 252, in iter
2019-08-26 15:26:47.614 12 ERROR ceilometer     return fut.result()
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 422, in result
2019-08-26 15:26:47.614 12 ERROR ceilometer     return self.__get_result()
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
2019-08-26 15:26:47.614 12 ERROR ceilometer     result = fn(*args, **kwargs)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/gnocchi_client.py", line 228, in upgrade_resource_types
2019-08-26 15:26:47.614 12 ERROR ceilometer     gnocchi.resource_type.get(name=name)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/resource_type.py", line 44, in get
2019-08-26 15:26:47.614 12 ERROR ceilometer     headers={'Content-Type': "application/json"}).json()
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/base.py", line 37, in _get
2019-08-26 15:26:47.614 12 ERROR ceilometer     return self.client.api.get(*args, **kwargs)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 304, in get
2019-08-26 15:26:47.614 12 ERROR ceilometer     return self.request(url, 'GET', **kwargs)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/client.py", line 52, in request
2019-08-26 15:26:47.614 12 ERROR ceilometer     raise exceptions.from_response(resp, method)
2019-08-26 15:26:47.614 12 ERROR ceilometer ClientException: <html><body><h1>504 Gateway Time-out</h1>
2019-08-26 15:26:47.614 12 ERROR ceilometer The server didn't respond in time.
2019-08-26 15:26:47.614 12 ERROR ceilometer </body></html>
2019-08-26 15:26:47.614 12 ERROR ceilometer  (HTTP 504)
2019-08-26 15:26:47.614 12 ERROR ceilometer 
2019-08-26 15:27:18.572 77 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py:45
2019-08-26 15:29:19.228 77 CRITICAL ceilometer [-] Unhandled error: ClientException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
 (HTTP 504)
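The timestamps above show each ceilometer-upgrade attempt failing about two minutes after it starts. A small Python sketch of that arithmetic, using the start (storage.py:45) and HTTP 504 timestamps copied from the log; the helper name is illustrative. A steady ~120 s gap suggests a proxy-side timeout rather than an immediate connection refusal, but that is an inference, not something the log states.

```python
from datetime import datetime

# Timestamp format used by the ceilometer log lines above
LOG_FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps."""
    return (datetime.strptime(end, LOG_FMT) - datetime.strptime(start, LOG_FMT)).total_seconds()


# (attempt start, HTTP 504 failure) pairs taken from the log excerpt
attempts = [
    ("2019-08-26 15:24:47.000", "2019-08-26 15:26:47.614"),
    ("2019-08-26 15:27:18.572", "2019-08-26 15:29:19.228"),
]

for start, end in attempts:
    print(round(elapsed_seconds(start, end), 3))  # prints 120.614 then 120.656
```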


 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0       (ocf::heartbeat:redis): Slave controller-0
   redis-bundle-1       (ocf::heartbeat:redis): Master controller-1
   redis-bundle-2       (ocf::heartbeat:redis): Slave controller-2

Comment 2 Luca Miccini 2019-08-27 06:32:35 UTC
So Redis appears to be down according to HAProxy.

This looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1730723, but the installed puppet-tripleo version is already newer than the one that fixed that bug.


We noticed that the timestamp of haproxy.cfg is more recent than the last start of the haproxy-bundle resource:

-rw-r-----. 1 root root 13386 Aug 26 15:10 haproxy.cfg

 "StartedAt": "2019-08-26T15:06:19.346712501Z"

Restarting the haproxy-bundle allowed HAProxy to pick up the new config file, and the db sync worked afterwards.
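The check described in this comment (was haproxy.cfg modified after the haproxy-bundle container last started?) can be sketched in Python 3. Assumptions: StartedAt comes from `docker inspect --format '{{.State.StartedAt}}'` run against the haproxy bundle container on that node (its name is not given in the report), the config path is supplied by the caller, and both times are compared in UTC. All helper names are hypothetical.

```python
import os
from datetime import datetime, timezone


def parse_docker_started_at(value):
    """Parse docker's StartedAt, e.g. "2019-08-26T15:06:19.346712501Z", as UTC."""
    date_part, _, frac = value.rstrip("Z").partition(".")
    base = datetime.strptime(date_part, "%Y-%m-%dT%H:%M:%S")
    # datetime holds microseconds only, so keep the first 6 fractional digits
    micro = int((frac + "000000")[:6]) if frac else 0
    return base.replace(microsecond=micro, tzinfo=timezone.utc)


def config_is_stale(config_mtime, started_at):
    """True if the config file changed after the container process started."""
    return config_mtime > started_at


def haproxy_config_is_stale(cfg_path, started_at_str):
    """Compare the file's mtime (as UTC) with the container start time."""
    mtime = datetime.fromtimestamp(os.path.getmtime(cfg_path), tz=timezone.utc)
    return config_is_stale(mtime, parse_docker_started_at(started_at_str))
```

With the values from this comment, and assuming the `ls` time (Aug 26 15:10) is UTC, `config_is_stale` returns True: the file is roughly four minutes newer than the running HAProxy process, which is why only a restart made it take effect.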

Comment 3 Luca Miccini 2019-08-28 06:07:03 UTC
We have a test patch that seems to solve this issue.

Comment 4 Luca Miccini 2019-09-03 14:20:19 UTC
*** Bug 1737911 has been marked as a duplicate of this bug. ***

Comment 5 Luca Miccini 2019-09-03 14:20:46 UTC
*** Bug 1748364 has been marked as a duplicate of this bug. ***

Comment 7 Michele Baldessari 2019-10-12 08:14:34 UTC
*** Bug 1760882 has been marked as a duplicate of this bug. ***

Comment 12 pkomarov 2019-10-24 11:10:56 UTC
Verified:

(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version 
2019-10-23.1

(undercloud) [stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| dd6d9c11-c980-47d0-b519-ddb9c83fe68b | overcloud  | 09d59df841064155ac358df28a99a4e4 | UPDATE_COMPLETE | 2019-10-23T22:29:23Z | 2019-10-24T09:14:04Z |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+




PLAY RECAP *********************************************************************
controller-0               : ok=285  changed=125  unreachable=0    failed=0   
controller-1               : ok=276  changed=122  unreachable=0    failed=0   
controller-2               : ok=276  changed=122  unreachable=0    failed=0   

Thursday 24 October 2019  06:45:20 -0400 (0:00:00.068)       1:18:39.737 ****** 
=============================================================================== 

Updated nodes - Controller
Success


(undercloud) [stack@undercloud-0 ~]$ ansible controller -mshell -b -a'docker logs haproxy_restart_bundle|grep locally'
 [WARNING]: Found both group and host with same name: undercloud

controller-0 | SUCCESS | rc=0 >>
Thu Oct 24 10:32:24 UTC 2019: Restarting haproxy-bundle locally on 'controller-0'

controller-2 | SUCCESS | rc=0 >>
Thu Oct 24 09:43:39 UTC 2019: Restarting haproxy-bundle locally on 'controller-2'

controller-1 | SUCCESS | rc=0 >>
Thu Oct 24 10:06:36 UTC 2019: Restarting haproxy-bundle locally on 'controller-1'
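To confirm the restart on every node, the ad-hoc Ansible output above can be checked mechanically. A Python sketch, assuming the output is captured as text in the `host | STATUS | rc=N >>` layout shown; a host counts as restarted only if its restart line names that same host. The function and pattern names are hypothetical.

```python
import re

# Header line Ansible prints per host, e.g. "controller-0 | SUCCESS | rc=0 >>"
HOST_RE = re.compile(r"^(?P<host>\S+) \| (?P<status>\w+) \| rc=(?P<rc>\d+) >>$")
# Line written by the haproxy restart container
RESTART_RE = re.compile(r"Restarting haproxy-bundle locally on '(?P<host>[^']+)'")


def restarted_hosts(ansible_output):
    """Map each host in the output to whether it logged a local haproxy restart."""
    current = None
    result = {}
    for line in ansible_output.splitlines():
        header = HOST_RE.match(line.strip())
        if header:
            current = header.group("host")
            result.setdefault(current, False)
            continue
        restart = RESTART_RE.search(line)
        if current and restart and restart.group("host") == current:
            result[current] = True
    return result
```

Applied to the output above, every controller maps to True; a controller whose section lacks the restart line would map to False.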

Comment 13 Michele Baldessari 2019-10-28 13:40:47 UTC
*** Bug 1766144 has been marked as a duplicate of this bug. ***

Comment 14 Alex McLeod 2019-10-31 11:32:40 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 16 errata-xmlrpc 2019-11-07 14:01:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3794