Description of problem:
Reproduced both here: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-7.6-latest-HA-ipv4/1/ as well as in our labs.

Minor update of controllers from 2019-01-10.1 to 2019-08-19.1 fails during step5:

2019-08-24 03:34:13 | "Error running ['docker', 'run', '--name', 'gnocchi_db_sync', '--label', 'config_id=tripleo_step5', '--label', 'container_name=gnocchi_db_sync', '--label', 'managed_by=paunch', '--label', 'config_data={\"start_order\": 0, \"image\": \"192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1\", \"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=325849970e594f11baa8740a2e41134b\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/config_files/src:ro\", \"/var/lib/gnocchi:/var/lib/gnocchi\", \"/var/log/containers/gnocchi:/var/log/gnocchi\", \"/var/log/containers/httpd/gnocchi-api:/var/log/httpd\", \"/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro\"], \"net\": \"host\", \"detach\": false, \"privileged\": false}', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=325849970e594f11baa8740a2e41134b', '--net=host', '--privileged=false', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/kolla/config_files/gnocchi_db_sync.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/gnocchi/:/var/lib/kolla/config_files/src:ro', '--volume=/var/lib/gnocchi:/var/lib/gnocchi', '--volume=/var/log/containers/gnocchi:/var/log/gnocchi', '--volume=/var/log/containers/httpd/gnocchi-api:/var/log/httpd', '--volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro', '192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1']. [1]",

[root@controller-0 ceilometer]# docker ps -a |grep "Exited (1)"
672ec2bfb03b 192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2019-08-19.1 "/usr/bin/bootstra..." 14 hours ago Exited (1) 14 hours ago ceilometer_gnocchi_upgrade
09e7b80bf056 192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1 "kolla_start" 14 hours ago Exited (1) 14 hours ago gnocchi_db_sync

[root@controller-0 ceilometer]# docker ps -a |grep -i "unhealthy"
284aab369317 192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1 "kolla_start" 14 hours ago Up 14 hours (unhealthy) gnocchi_api
ed16af2bddaa 192.168.24.1:8787/rhosp13/openstack-gnocchi-metricd:2019-08-19.1 "kolla_start" 14 hours ago Up 14 hours (unhealthy) gnocchi_metricd
e2813709c2aa 192.168.24.1:8787/rhosp13/openstack-aodh-evaluator:2019-08-19.1 "kolla_start" 14 hours ago Up 14 hours (unhealthy) aodh_evaluator
462aab694b38 192.168.24.1:8787/rhosp13/openstack-ceilometer-central:2019-08-19.1 "kolla_start" 14 hours ago Up 14 hours (unhealthy) ceilometer_agent_central

2019-08-26 12:57:51.065 12 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py:42
2019-08-26 15:24:47.000 12 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py:45
2019-08-26 15:26:47.614 12 CRITICAL ceilometer [-] Unhandled error: ClientException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
 (HTTP 504)
2019-08-26 15:26:47.614 12 ERROR ceilometer Traceback (most recent call last):
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/bin/ceilometer-upgrade", line 10, in <module>
2019-08-26 15:26:47.614 12 ERROR ceilometer     sys.exit(upgrade())
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py", line 60, in upgrade
2019-08-26 15:26:47.614 12 ERROR ceilometer     )(gnocchi_client.upgrade_resource_types, conf)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 295, in call
2019-08-26 15:26:47.614 12 ERROR ceilometer     start_time=start_time)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 252, in iter
2019-08-26 15:26:47.614 12 ERROR ceilometer     return fut.result()
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 422, in result
2019-08-26 15:26:47.614 12 ERROR ceilometer     return self.__get_result()
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
2019-08-26 15:26:47.614 12 ERROR ceilometer     result = fn(*args, **kwargs)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/gnocchi_client.py", line 228, in upgrade_resource_types
2019-08-26 15:26:47.614 12 ERROR ceilometer     gnocchi.resource_type.get(name=name)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/resource_type.py", line 44, in get
2019-08-26 15:26:47.614 12 ERROR ceilometer     headers={'Content-Type': "application/json"}).json()
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/base.py", line 37, in _get
2019-08-26 15:26:47.614 12 ERROR ceilometer     return self.client.api.get(*args, **kwargs)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 304, in get
2019-08-26 15:26:47.614 12 ERROR ceilometer     return self.request(url, 'GET', **kwargs)
2019-08-26 15:26:47.614 12 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/client.py", line 52, in request
2019-08-26 15:26:47.614 12 ERROR ceilometer     raise exceptions.from_response(resp, method)
2019-08-26 15:26:47.614 12 ERROR ceilometer ClientException: <html><body><h1>504 Gateway Time-out</h1>
2019-08-26 15:26:47.614 12 ERROR ceilometer The server didn't respond in time.
2019-08-26 15:26:47.614 12 ERROR ceilometer </body></html>
2019-08-26 15:26:47.614 12 ERROR ceilometer  (HTTP 504)
2019-08-26 15:26:47.614 12 ERROR ceilometer
2019-08-26 15:27:18.572 77 DEBUG ceilometer.cmd.storage [-] Upgrading Gnocchi resource types upgrade /usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py:45
2019-08-26 15:29:19.228 77 CRITICAL ceilometer [-] Unhandled error: ClientException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
 (HTTP 504)

 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Slave controller-0
   redis-bundle-1 (ocf::heartbeat:redis): Master controller-1
   redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2
So redis seems to be down according to haproxy. This looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1730723, but the version of puppet-tripleo here is newer than in that bug.

We noticed that the timestamp of haproxy.cfg is more recent than the last start of the haproxy-bundle resource:

-rw-r-----. 1 root root 13386 Aug 26 15:10 haproxy.cfg
#StartedAt": "2019-08-26T15:06:19.346712501Z"

Restarting the haproxy-bundle allowed haproxy to pick up the new config file, and the db sync worked afterwards.
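For anyone who wants to check for the same condition, the timestamp comparison can be sketched roughly as follows. This is only an illustration, not part of the fix: the helper names are hypothetical, and it assumes the haproxy.cfg mtime shown by ls is in UTC, like the StartedAt value.

```python
import os
from datetime import datetime, timezone


def parse_started_at(value: str) -> datetime:
    """Parse a Docker 'StartedAt' value such as 2019-08-26T15:06:19.346712501Z."""
    # Docker reports nanoseconds; datetime only keeps microseconds, so truncate.
    base, _, frac = value.rstrip("Z").partition(".")
    micros = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def file_mtime_utc(path: str) -> datetime:
    """Modification time of a file on disk, as an aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)


def config_is_stale(config_mtime: datetime, started_at: datetime) -> bool:
    """True when the config was written after the container last started."""
    return config_mtime > started_at


# Values taken from this report (mtime assumed to be UTC).
started = parse_started_at("2019-08-26T15:06:19.346712501Z")
mtime = datetime(2019, 8, 26, 15, 10, tzinfo=timezone.utc)
print(config_is_stale(mtime, started))  # True
```

On a live controller, file_mtime_utc() would be pointed at the real haproxy.cfg path and the StartedAt value taken from the container's inspect output.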
We have a test patch that seems to solve this issue.
*** Bug 1737911 has been marked as a duplicate of this bug. ***
*** Bug 1748364 has been marked as a duplicate of this bug. ***
*** Bug 1760882 has been marked as a duplicate of this bug. ***
Verified.

(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version
2019-10-23.1
(undercloud) [stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| dd6d9c11-c980-47d0-b519-ddb9c83fe68b | overcloud  | 09d59df841064155ac358df28a99a4e4 | UPDATE_COMPLETE | 2019-10-23T22:29:23Z | 2019-10-24T09:14:04Z |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+

PLAY RECAP *********************************************************************
controller-0 : ok=285 changed=125 unreachable=0 failed=0
controller-1 : ok=276 changed=122 unreachable=0 failed=0
controller-2 : ok=276 changed=122 unreachable=0 failed=0

Thursday 24 October 2019 06:45:20 -0400 (0:00:00.068) 1:18:39.737 ******
===============================================================================
Updated nodes - Controller Success

(undercloud) [stack@undercloud-0 ~]$ ansible controller -mshell -b -a'docker logs haproxy_restart_bundle|grep locally'
[WARNING]: Found both group and host with same name: undercloud
controller-0 | SUCCESS | rc=0 >>
Thu Oct 24 10:32:24 UTC 2019: Restarting haproxy-bundle locally on 'controller-0'
controller-2 | SUCCESS | rc=0 >>
Thu Oct 24 09:43:39 UTC 2019: Restarting haproxy-bundle locally on 'controller-2'
controller-1 | SUCCESS | rc=0 >>
Thu Oct 24 10:06:36 UTC 2019: Restarting haproxy-bundle locally on 'controller-1'
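If the same check needs to be repeated on other deployments, the ansible output can be scanned for the restart message per controller. A rough sketch only: the function name and the expected host set are assumptions for illustration, and the sample text is the restart lines from the run above.

```python
import re

# Matches the message logged by the haproxy_restart_bundle container.
RESTART_RE = re.compile(r"Restarting haproxy-bundle locally on '([^']+)'")


def restarted_hosts(output: str) -> set:
    """Return the hosts that logged a local haproxy-bundle restart."""
    return set(RESTART_RE.findall(output))


sample = """\
controller-0 | SUCCESS | rc=0 >>
Thu Oct 24 10:32:24 UTC 2019: Restarting haproxy-bundle locally on 'controller-0'
controller-2 | SUCCESS | rc=0 >>
Thu Oct 24 09:43:39 UTC 2019: Restarting haproxy-bundle locally on 'controller-2'
controller-1 | SUCCESS | rc=0 >>
Thu Oct 24 10:06:36 UTC 2019: Restarting haproxy-bundle locally on 'controller-1'
"""

# Hypothetical expected set: the three controllers of this deployment.
expected = {"controller-0", "controller-1", "controller-2"}
missing = expected - restarted_hosts(sample)
print(sorted(missing))  # [] means every controller restarted haproxy-bundle
```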
*** Bug 1766144 has been marked as a duplicate of this bug. ***
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3794