Description of problem:
Deploying the overcloud fails during TASK [Debug output for task Start Containers for step 2] with:

No such key: \"tripleo::oslo_messaging_rpc::mysql_user

Several of these errors stack up, and then:

Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 30, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
    verbosity=self.app_args.verbose_level)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 327, in config_download
    raise exceptions.DeploymentError("Overcloud configuration failed.")

Version-Release number of selected component (if applicable):
RHEL8 -> 1830 image
puddle: RHOS_TRUNK-15.0-RHEL-8-20190328.n.1

ansible-role-tripleo-modify-image.noarch 1.0.1-0.20190322190304.c0bcc3c.el8ost @rhelosp-15.0-trunk
ansible-tripleo-ipsec.noarch 9.0.1-0.20190220162047.f60ad6c.el8ost @rhelosp-15.0-trunk
openstack-tripleo-common.noarch 10.6.1-0.20190327210341.25250f0.el8ost @rhelosp-15.0-trunk
openstack-tripleo-common-containers.noarch 10.6.1-0.20190327210341.25250f0.el8ost @rhelosp-15.0-trunk
openstack-tripleo-heat-templates.noarch 10.4.1-0.20190328020342.b79f438.el8ost @rhelosp-15.0-trunk
openstack-tripleo-image-elements.noarch 10.3.1-0.20190325204940.253fe88.el8ost @rhelosp-15.0-trunk
openstack-tripleo-puppet-elements.noarch 10.2.1-0.20190327211339.0f6cacb.el8ost @rhelosp-15.0-trunk
openstack-tripleo-validations.noarch 10.3.1-0.20190326150349.de9812b.el8ost @rhelosp-15.0-trunk
puppet-tripleo.noarch 10.3.1-0.20190327210329.5b176cb.el8ost @rhelosp-15.0-trunk
python3-tripleo-common.noarch 10.6.1-0.20190327210341.25250f0.el8ost @rhelosp-15.0-trunk
python3-tripleoclient.noarch 11.3.1-0.20190328080340.0132e7d.el8ost @rhelosp-15.0-trunk
python3-tripleoclient-heat-installer.noarch 11.3.1-0.20190328080340.0132e7d.el8ost @rhelosp-15.0-trunk

How reproducible:
Consistent across 4 runs with the attached deploy script and the referenced patch/workarounds.

Steps to Reproduce:
Run the attached deploy script.

Actual results:
2019-03-28 17:55:19.385 198105 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud [ admin] Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 30, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
    verbosity=self.app_args.verbose_level)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 327, in config_download
    raise exceptions.DeploymentError("Overcloud configuration failed.")
tripleoclient.exceptions.DeploymentError: Overcloud configuration failed.
2019-03-28 17:55:19.386 198105 ERROR openstack [ admin] Overcloud configuration failed.

Expected results:
The overcloud deploys successfully.

Additional info:
Created attachment 1549187: deployment shell script to install OpenStack in a virtual environment
The actual bug is https://bugs.launchpad.net/tripleo/+bug/1821611; I'll rename this bug.
At this stage the bug should be fixed, but I saw it again with John Fulton when he tried to deploy Ceph. I'll keep it open until we are sure it's fixed.
*** Bug 1693426 has been marked as a duplicate of this bug. ***
I ran a stack update with increased timeouts but hit the same issue, just later:

TASK [Start containers for step 5] *********************************************
Thursday 04 April 2019 22:09:29 +0000 (0:00:00.181) 0:39:15.130 ********
ok: [overcloud-controller-2] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
Timed out waiting for messages from Execution (ID: 3cbcbde8-6e6e-4cbc-b517-1e638b989067, State: RUNNING). The WebSocket timed out before the Workflow completed.
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
    verbosity=self.app_args.verbose_level)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 321, in config_download
    for payload in base.wait_for_messages(workflow_client, ws, execution):
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/base.py", line 61, in wait_for_messages
    for payload in websocket.wait_for_messages(timeout=timeout):
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 153, in wait_for_messages
    message = self.recv()
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 131, in recv
    return json.loads(self._ws.recv())
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/websocket/_socket.py", line 81, in recv
    bytes_ = sock.recv(bufsize)
  File "/usr/lib64/python3.6/ssl.py", line 953, in recv
    return self.read(buflen)
  File "/usr/lib64/python3.6/ssl.py", line 830, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 589, in read
    v = self._sslobj.read(len)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 153, in wait_for_messages
    message = self.recv()
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 131, in recv
    return json.loads(self._ws.recv())
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 310, in recv
    opcode, data = self.recv_data()
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 327, in recv_data
    opcode, frame = self.recv_data_frame(control_frame)
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 340, in recv_data_frame
    frame = self.recv_frame()
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 374, in recv_frame
    return self.frame_buffer.recv_frame()
  File "/usr/lib/python3.6/site-packages/websocket/_abnf.py", line 361, in recv_frame
    self.recv_header()
  File "/usr/lib/python3.6/site-packages/websocket/_abnf.py", line 309, in recv_header
    header = self.recv_strict(2)
  File "/usr/lib/python3.6/site-packages/websocket/_abnf.py", line 396, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
  File "/usr/lib/python3.6/site-packages/websocket/_core.py", line 449, in _recv
    return recv(self.sock, bufsize)
  File "/usr/lib/python3.6/site-packages/websocket/_socket.py", line 84, in recv
    raise WebSocketTimeoutException(message)
websocket._exceptions.WebSocketTimeoutException: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 30, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 953, in take_action
    plan=stack.stack_name)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 419, in set_deployment_status
    _WORKFLOW_TIMEOUT):
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/base.py", line 61, in wait_for_messages
    for payload in websocket.wait_for_messages(timeout=timeout):
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 158, in wait_for_messages
    raise exceptions.WebSocketTimeout()
tripleoclient.exceptions.WebSocketTimeout

real    246m4.751s
user    0m24.449s
sys     0m1.520s
(undercloud) [stack@undercloud-0 ~]$
(undercloud) [stack@undercloud-0 ~]$ sudo grep -A 4 action_heartbeat /var/lib/config-data/puppet-generated/mistral/etc/mistral/mistral.conf
[action_heartbeat]
max_missed_heartbeats=30
check_interval=40
first_heartbeat_timeout=8200
(undercloud) [stack@undercloud-0 ~]$
(undercloud) [stack@undercloud-0 ~]$ cat deploy_ceph.sh
#!/bin/bash
export THT=/usr/share/openstack-tripleo-heat-templates
time openstack overcloud deploy \
--timeout 600 \
--templates $THT \
--libvirt-type kvm \
--stack overcloud \
-r /home/stack/composable_roles_ceph/roles/roles_data.yaml \
-e /home/stack/composable_roles_ceph/roles/nodes.yaml \
-e /home/stack/composable_roles_ceph/config_lvm.yaml \
-e $THT/environments/network-isolation.yaml \
-e /home/stack/composable_roles_ceph/network/network-environment.yaml \
-e ~/fencing.yaml \
-e /home/stack/composable_roles_ceph/inject-trust-anchor.yaml \
-e $THT/environments/services/neutron-ovn-ha.yaml \
-e /home/stack/composable_roles_ceph/debug.yaml \
-e /home/stack/composable_roles_ceph/config_heat.yaml \
-e ~/extraconfigpre_env.yaml \
-e ~/containers-prepare-parameter.yaml \
-e /home/stack/composable_roles_ceph/docker-images.yaml \
-e $THT/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/ceph/ceph.yaml \
--log-file overcloud_deployment_ceph.log
(undercloud) [stack@undercloud-0 ~]$
(undercloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+------------------------+--------+------------------------+----------------+------------+
| ID                                   | Name                   | Status | Networks               | Image          | Flavor     |
+--------------------------------------+------------------------+--------+------------------------+----------------+------------+
| 033a9b16-d17d-43aa-82a0-ef1a4f5d02ce | overcloud-controller-2 | ACTIVE | ctlplane=192.168.24.9  | overcloud-full | controller |
| 23d7d520-8062-4725-a559-7de4352df26f | overcloud-computehci-1 | ACTIVE | ctlplane=192.168.24.11 | overcloud-full | compute    |
| 37993bf7-748e-442f-9b8a-9059f410ab24 | overcloud-computehci-2 | ACTIVE | ctlplane=192.168.24.21 | overcloud-full | compute    |
| c995e5de-d0f3-44e1-95ba-fc33ccd10957 | overcloud-controller-1 | ACTIVE | ctlplane=192.168.24.14 | overcloud-full | controller |
| f8abee7b-2d27-4c99-83c6-eec04126852d | overcloud-controller-0 | ACTIVE | ctlplane=192.168.24.8  | overcloud-full | controller |
| e2389313-143f-4b5b-bea3-5e20cd343037 | overcloud-computehci-0 | ACTIVE | ctlplane=192.168.24.6  | overcloud-full | compute    |
+--------------------------------------+------------------------+--------+------------------------+----------------+------------+
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.6 "sudo podman ps"
Warning: Permanently added '192.168.24.6' (ECDSA) to the list of known hosts.
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
8168b97740be  quay.io/rhceph-dev/rhceph-4.0-rhel-8:latest  /opt/ceph-contain...  14 hours ago  Up 14 hours ago  ceph-osd-0
(undercloud) [stack@undercloud-0 ~]$
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.8 "sudo podman ps"
Warning: Permanently added '192.168.24.8' (ECDSA) to the list of known hosts.
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
4f4a72e483d0  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-gnocchi-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  gnocchi_db_sync
243a720fec7a  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-proxy-server:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_proxy
4546645d1e2a  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-panko-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  panko_api
0bee35dd76fb  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  nova_metadata
ade0ec1133a9  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  nova_api
a98fcffa8668  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-glance-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  glance_api
8ddbe1c06e72  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-ovn-controller:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  ovn_controller
f3c3e426c476  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-placement-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  nova_placement
1e279c8eae4e  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-object:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_rsync
2c0589025754  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-object:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_object_updater
c9b01fa9ac63  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-object:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_object_server
94b5bf42eaa1  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-object:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_object_replicator
93f33d1e8a7e  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-proxy-server:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_object_expirer
6495d2fbe116  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-object:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_object_auditor
eb0d90b16a4f  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-container:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_container_updater
2fffce3e1eca  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-container:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_container_server
2216afd29758  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-container:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_container_replicator
e762cf381d0b  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-container:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_container_auditor
578b6071d794  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-account:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_account_server
632004d3be1c  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-account:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_account_replicator
f2ac7585b93e  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-account:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_account_reaper
d774e640bd0d  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-swift-account:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  swift_account_auditor
09ae4fd047a2  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-novncproxy:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  nova_vnc_proxy
1e84cc85635b  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-scheduler:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  nova_scheduler
38d7a2685cc0  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-consoleauth:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  nova_consolauth
6372e24f5b21  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-conductor:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  nova_conductor
261200e7a44d  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  nova_api_cron
b6ff6b93c012  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-neutron-server-ovn:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  neutron_api
3dced2021732  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-cron:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  logrotate_crond
9a99c602733c  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-heat-engine:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  heat_engine
b21c04625a28  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-heat-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  heat_api_cron
22fbec4a8bb4  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-heat-api-cfn:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  heat_api_cfn
78e9c4bdf227  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-heat-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  heat_api
4f2187802c9c  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-cinder-scheduler:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  cinder_scheduler
be8dc6bd665b  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-cinder-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  cinder_api_cron
0e89e10b0fe5  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-cinder-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  cinder_api
02a45d30adc3  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-ceilometer-notification:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  ceilometer_agent_notification
39389ff72279  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-ceilometer-central:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  ceilometer_agent_central
530f18e666c8  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-aodh-notifier:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  aodh_notifier
ab33c2c4f363  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-aodh-listener:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  aodh_listener
fc55389aa63f  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-aodh-evaluator:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  aodh_evaluator
bc6e8512fb81  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-aodh-api:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  aodh_api
6938ada702d9  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-keystone:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  keystone_cron
da18c78e4050  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-keystone:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  keystone
fc955ce80499  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-iscsid:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  iscsid
50f9e95b0d69  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-ovn-northd:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  ovn-dbs-bundle-podman-0
8b1d922b5230  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-horizon:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  horizon
47d590edc293  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-haproxy:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  haproxy-bundle-podman-0
25300f34d18f  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-redis:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  redis-bundle-podman-0
fa613ee02243  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-rabbitmq:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  rabbitmq-bundle-podman-0
0433270497c0  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-mariadb:latest  dumb-init /bin/ba...  14 hours ago  Up 14 hours ago  galera-bundle-podman-0
395387fd2ff7  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-mariadb:latest  dumb-init kolla_s...  14 hours ago  Up 14 hours ago  clustercheck
f8f178004a4d  quay.io/rhceph-dev/rhceph-4.0-rhel-8:latest  /opt/ceph-contain...  14 hours ago  Up 14 hours ago  ceph-mon-overcloud-controller-0
4c4d2dd71e8f  quay.io/rhceph-dev/rhceph-4.0-rhel-8:latest  /opt/ceph-contain...  14 hours ago  Up 14 hours ago  ceph-mgr-overcloud-controller-0
d2bde7014e64  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-memcached:latest  dumb-init --singl...  14 hours ago  Up 14 hours ago  memcached
(undercloud) [stack@undercloud-0 ~]$
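For context on the [action_heartbeat] values above: as we read the Mistral configuration reference (not verified against this exact build), the checker runs every check_interval seconds and fails a running action whose last heartbeat is older than check_interval * max_missed_heartbeats seconds, while first_heartbeat_timeout is a separate grace period for an action that has never sent a heartbeat. A minimal sketch of that arithmetic, assuming those semantics; heartbeat_windows is a hypothetical helper, not part of Mistral:

```python
def heartbeat_windows(max_missed_heartbeats: int, check_interval: int,
                      first_heartbeat_timeout: int) -> dict:
    """Return the timeouts (seconds) implied by Mistral's [action_heartbeat] options.

    ASSUMPTION: semantics as described in the Mistral config reference;
    this helper is illustrative only and not part of Mistral itself.
    """
    if max_missed_heartbeats == 0 or check_interval == 0:
        # Setting either option to 0 disables the heartbeat check.
        return {"enabled": False}
    return {
        "enabled": True,
        # A running action is failed once its last heartbeat is older than this.
        "missed_heartbeat_window": check_interval * max_missed_heartbeats,
        # Grace period for an action that has not sent its first heartbeat yet.
        "first_heartbeat_grace": first_heartbeat_timeout,
    }


if __name__ == "__main__":
    # Values from the mistral.conf grep above.
    print(heartbeat_windows(max_missed_heartbeats=30, check_interval=40,
                            first_heartbeat_timeout=8200))
    # {'enabled': True, 'missed_heartbeat_window': 1200, 'first_heartbeat_grace': 8200}
```

Under that reading, a stalled action on this undercloud would be failed after roughly 1200 s (20 minutes), give or take one check interval, regardless of the much longer first_heartbeat_timeout.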
Could the reason the containers on the compute nodes aren't coming back up be a network issue, and a different bug? I see that the OVN controller container (container-puppet-ovn_controller) died on each compute node:

(undercloud) [stack@undercloud-0 ~]$ for x in $(cat computes); do echo $x; ssh heat-admin@$x "sudo podman ps -a"; done
192.168.24.11
Warning: Permanently added '192.168.24.11' (ECDSA) to the list of known hosts.
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
df4b718121f9  quay.io/rhceph-dev/rhceph-4.0-rhel-8:latest  /opt/ceph-contain...  14 hours ago  Up 14 hours ago  ceph-osd-2
29006f4ced33  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-ovn-controller:latest  /var/lib/containe...  14 hours ago  Exited (4) 14 hours ago  container-puppet-ovn_controller
192.168.24.21
Warning: Permanently added '192.168.24.21' (ECDSA) to the list of known hosts.
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
b69da4bdfdf7  quay.io/rhceph-dev/rhceph-4.0-rhel-8:latest  /opt/ceph-contain...  14 hours ago  Up 14 hours ago  ceph-osd-1
a04f03905042  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-ovn-controller:latest  /var/lib/containe...  14 hours ago  Exited (4) 14 hours ago  container-puppet-ovn_controller
192.168.24.6
Warning: Permanently added '192.168.24.6' (ECDSA) to the list of known hosts.
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
8168b97740be  quay.io/rhceph-dev/rhceph-4.0-rhel-8:latest  /opt/ceph-contain...  14 hours ago  Up 14 hours ago  ceph-osd-0
d2cdd1540597  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-ovn-controller:latest  /var/lib/containe...  14 hours ago  Exited (4) 14 hours ago  container-puppet-ovn_controller
(undercloud) [stack@undercloud-0 ~]$
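To spot exited containers across the compute nodes without reading full tables, one option is to collect `podman ps -a --format '{{.Names}} {{.Status}}'` from each node (the loop above prints the default table instead) and scan for "Exited (N)" statuses. A minimal sketch, assuming that --format output shape (one name and status per line); find_exited is a hypothetical helper:

```python
import re
from typing import Dict

# Matches lines such as "container-puppet-ovn_controller Exited (4) 14 hours ago".
_EXITED = re.compile(r"^(?P<name>\S+)\s+Exited \((?P<code>-?\d+)\)")


def find_exited(ps_output: str) -> Dict[str, int]:
    """Map container name -> exit code for every exited container.

    ASSUMPTION: `ps_output` comes from
    `podman ps -a --format '{{.Names}} {{.Status}}'` (name first, then status).
    """
    exited = {}
    for line in ps_output.splitlines():
        match = _EXITED.match(line.strip())
        if match:
            exited[match.group("name")] = int(match.group("code"))
    return exited


if __name__ == "__main__":
    # Sample shaped like the 192.168.24.11 output above.
    sample = (
        "ceph-osd-2 Up 14 hours ago\n"
        "container-puppet-ovn_controller Exited (4) 14 hours ago\n"
    )
    print(find_exited(sample))  # {'container-puppet-ovn_controller': 4}
```

As a possible next step, `podman logs container-puppet-ovn_controller` on an affected node should show why that container exited.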
I think I'm *still* seeing this timeout issue.

Note the time zones:
# date
Thu Apr 11 16:35:32 IDT 2019
[root@titan56 tmp]#
[stack@undercloud-0 ~]$ date
Thu Apr 11 09:35:24 EDT 2019

Action heartbeat settings:
[action_heartbeat]
max_missed_heartbeats=40
check_interval=50
first_heartbeat_timeout=9000

==============================================================================================================
From the ir deploy command script:

PLAY [Verify overcloud deployment] ______________________________________
TASK [fail]
Thursday 11 April 2019 02:29:28 +0300 (0:00:00.374) 0:50:59.702 ********
fatal: [undercloud-0]: FAILED! => {"changed": false, "msg": "Overcloud deployment failed... :("}
_____________________________________
NO MORE HOSTS LEFT
to retry, use: --limit @/tmp/RHEL8_test.jDqnJY7O0m/plugins/tripleo-overcloud/main.retry
PLAY RECAP
hypervisor : ok=2 changed=0 unreachable=0 failed=0
localhost : ok=6 changed=2 unreachable=0 failed=0
undercloud-0 : ok=174 changed=80 unreachable=0 failed=1
_______________________________________________________________
(-7 hours: 19:29:30)

From undercloud-0:

openstack task execution list |grep ERROR
| 7e0d02b4-2630-4d6f-a1cd-86866769e8a1 | run_ansible | tripleo.deployment.v1.config_download_deploy | | 25884e19-cfbb-4eed-9bf2-103bae94c6c2 | ERROR | Heartbeat wasn't received... | 2019-04-10 22:57:34 | 2019-04-10 23:22:28 |
| 095e3224-8785-4cf3-a2bb-ad30b9c8a1fc | get_messages | tripleo.plan_management.v1.publish_ui_logs_to_swift | | a9ead942-c46d-48e8-9d60-27d4852ba164 | ERROR | Heartbeat wasn't received... | 2019-04-10 23:01:12 | 2019-04-10 23:22:28 |

From /var/lib/mistral/overcloud/ansible.log:
2019-04-10 19:52:35,247 p=601 u=mistral | PLAY [Server Post Deployments] *************************************************
2019-04-10 19:52:35,312 p=601 u=mistral | TASK [include_tasks] ***********************************************************
2019-04-10 19:52:35,312 p=601 u=mistral | Wednesday 10 April 2019 19:52:35 -0400 (0:00:00.409) 0:54:58.728 *******
2019-04-10 19:52:35,647 p=601 u=mistral | PLAY [External deployment Post Deploy tasks] ***********************************
2019-04-10 19:52:35,651 p=601 u=mistral | PLAY RECAP *********************************************************************
2019-04-10 19:52:35,651 p=601 u=mistral | compute-0 : ok=182 changed=79 unreachable=0 failed=0
2019-04-10 19:52:35,651 p=601 u=mistral | compute-1 : ok=182 changed=79 unreachable=0 failed=0
2019-04-10 19:52:35,651 p=601 u=mistral | controller-0 : ok=260 changed=143 unreachable=0 failed=0
2019-04-10 19:52:35,651 p=601 u=mistral | controller-1 : ok=249 changed=140 unreachable=0 failed=0
2019-04-10 19:52:35,651 p=601 u=mistral | controller-2 : ok=249 changed=140 unreachable=0 failed=0
2019-04-10 19:52:35,652 p=601 u=mistral | undercloud : ok=11 changed=7 unreachable=0 failed=0
2019-04-10 19:52:35,652 p=601 u=mistral | Wednesday 10 April 2019 19:52:35 -0400 (0:00:00.340) 0:54:59.068 *******
2019-04-10 19:52:35,652 p=601 u=mistral | ===============================================================================

openstack task execution show 7e0d02b4-2630-4d6f-a1cd-86866769e8a1
+-----------------------+----------------------------------------------+
| Field                 | Value                                        |
+-----------------------+----------------------------------------------+
| ID                    | 7e0d02b4-2630-4d6f-a1cd-86866769e8a1         |
| Name                  | run_ansible                                  |
| Workflow name         | tripleo.deployment.v1.config_download_deploy |
| Workflow namespace    |                                              |
| Workflow Execution ID | 25884e19-cfbb-4eed-9bf2-103bae94c6c2         |
| State                 | ERROR                                        |
| State info            | Heartbeat wasn't received.                   |
| Created at            | 2019-04-10 22:57:34                          |
| Updated at            | 2019-04-10 23:22:28                          |
+-----------------------+----------------------------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack task execution show 095e3224-8785-4cf3-a2bb-ad30b9c8a1fc
+-----------------------+-----------------------------------------------------+
| Field                 | Value                                               |
+-----------------------+-----------------------------------------------------+
| ID                    | 095e3224-8785-4cf3-a2bb-ad30b9c8a1fc                |
| Name                  | get_messages                                        |
| Workflow name         | tripleo.plan_management.v1.publish_ui_logs_to_swift |
| Workflow namespace    |                                                     |
| Workflow Execution ID | a9ead942-c46d-48e8-9d60-27d4852ba164                |
| State                 | ERROR                                               |
| State info            | Heartbeat wasn't received.                          |
| Created at            | 2019-04-10 23:01:12                                 |
| Updated at            | 2019-04-10 23:22:28                                 |
+-----------------------+-----------------------------------------------------+

openstack workflow execution show 25884e19-cfbb-4eed-9bf2-103bae94c6c2
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field              | Value |
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID                 | 25884e19-cfbb-4eed-9bf2-103bae94c6c2 |
| Workflow ID        | 5e54a2f6-245e-4de4-a31d-941682e63655 |
| Workflow name      | tripleo.deployment.v1.config_download_deploy |
| Workflow namespace |  |
| Description        |  |
| Task Execution ID  | <none> |
| Root Execution ID  | <none> |
| State              | ERROR |
| State info         | Failed to run task [error=Failed to find workflow [name=tripleo.messaging.v1.send] [namespace=], wf=tripleo.deployment.v1.config_download_deploy, task=send_message]: |
|                    | Traceback (most recent call last): |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/task_handler.py", line 63, in run_task |
|                    |     task.run() |
|                    |   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper |
|                    |     result = f(*args, **kwargs) |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 453, in run |
|                    |     self._run_new() |
|                    |   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper |
|                    |     result = f(*args, **kwargs) |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 485, in _run_new |
|                    |     self._schedule_actions() |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 569, in _schedule_actions |
|                    |     timeout=self._get_timeout() |
|                    |   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper |
|                    |     result = f(*args, **kwargs) |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/actions.py", line 586, in schedule |
|                    |     wf_spec_name=self.wf_name |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/utils.py", line 91, in resolve_workflow_definition |
|                    |     (wf_spec_name, namespace) |
|                    | mistral.exceptions.WorkflowException: Failed to find workflow [name=tripleo.messaging.v1.send] [namespace=] |
|                    |  |
| Created at         | 2019-04-10 22:56:44 |
| Updated at         | 2019-04-10 23:22:28 |
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

openstack workflow execution show a9ead942-c46d-48e8-9d60-27d4852ba164
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field              | Value |
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID                 | a9ead942-c46d-48e8-9d60-27d4852ba164 |
| Workflow ID        | 0142824d-29a1-4506-8262-50ae4c961637 |
| Workflow name      | tripleo.plan_management.v1.publish_ui_logs_to_swift |
| Workflow namespace |  |
| Description        | {"description": "Workflow execution created by cron trigger '(edcd71bf-a4f6-441b-acda-ea53ff66b91d)'.", "triggered_by": {"type": "cron_trigger", "id": "edcd71bf-a4f6-441b-acda-ea53ff66b91d", "name": "publish-ui-logs-hourly"}} |
| Task Execution ID  | <none> |
| Root Execution ID  | <none> |
| State              | RUNNING |
| State info         | None |
| Created at         | 2019-04-10 23:00:57 |
| Updated at         | 2019-04-10 23:00:57 |
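The timestamps in the previous comment come from three clocks: the infrared output uses +0300 (IDT), ansible.log on the undercloud uses -0400 (EDT), and the Mistral "Created at"/"Updated at" fields are assumed here to be UTC (inferred from their 4-hour offset against ansible.log; the output does not state it). A minimal sketch that normalizes them onto one timeline with the standard library; to_utc is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the `date` output and log headers in the comment above.
IDT = timezone(timedelta(hours=3))    # infrared output ("+0300")
EDT = timezone(timedelta(hours=-4))   # undercloud ansible.log ("-0400")
UTC = timezone.utc                    # ASSUMED zone of Mistral "Created at"/"Updated at"


def to_utc(stamp: str, tz: timezone) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp recorded in `tz` and return it in UTC."""
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=tz).astimezone(UTC)


if __name__ == "__main__":
    ansible_end = to_utc("2019-04-10 19:52:35", EDT)
    events = {
        # PLAY RECAP time minus the 0:54:59 elapsed counter (sub-second part ignored).
        "ansible started": ansible_end - timedelta(minutes=54, seconds=59),
        "mistral ERROR": to_utc("2019-04-10 23:22:28", UTC),
        "infrared fail": to_utc("2019-04-11 02:29:28", IDT),
        "ansible finished": ansible_end,
    }
    for label, when in sorted(events.items(), key=lambda item: item[1]):
        print(f"{label:<18} {when.isoformat()}")
    # ansible started    2019-04-10T22:57:36+00:00
    # mistral ERROR      2019-04-10T23:22:28+00:00
    # infrared fail      2019-04-10T23:29:28+00:00
    # ansible finished   2019-04-10T23:52:35+00:00
```

Under the UTC assumption, Mistral marked run_ansible as ERROR about 25 minutes into the playbook and infrared reported failure at 23:29 UTC, while ansible.log shows the playbook still running until 23:52 UTC and ending with failed=0 on every overcloud node.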
We're seeing the very same error on an upgraded OSP15 undercloud when running Mistral to upgrade the operating system of the overcloud nodes. The "overcloud upgrade run" command fails without giving any reason (the last Ansible task has an rc of 0), and checking the failed Mistral actions shows:

(undercloud) [stack@undercloud-0 ~]$ openstack task execution list |grep ERROR
| 8128e053-74ec-42e3-bd4d-31989f096bc9 | node_update | tripleo.package_update.v1.update_nodes | | 6c289cea-22d7-459e-8e4f-ab38aeed584c | ERROR | Heartbeat wasn't received... | 2019-05-09 11:25:06 | 2019-05-09 11:30:16 |
| ee4aafde-5fa6-40d4-8820-ea96a861d2ce | node_update | tripleo.package_update.v1.update_nodes | | 0389abcf-5cf1-400c-957d-61350285594d | ERROR | Heartbeat wasn't received... | 2019-05-09 13:07:21 | 2019-05-09 13:12:39 |
(undercloud) [stack@undercloud-0 ~]$ openstack task execution show ee4aafde-5fa6-40d4-8820-ea96a861d2ce
+-----------------------+----------------------------------------+
| Field                 | Value                                  |
+-----------------------+----------------------------------------+
| ID                    | ee4aafde-5fa6-40d4-8820-ea96a861d2ce   |
| Name                  | node_update                            |
| Workflow name         | tripleo.package_update.v1.update_nodes |
| Workflow namespace    |                                        |
| Workflow Execution ID | 0389abcf-5cf1-400c-957d-61350285594d   |
| State                 | ERROR                                  |
| State info            | Heartbeat wasn't received.             |
| Created at            | 2019-05-09 13:07:21                    |
| Updated at            | 2019-05-09 13:12:39                    |
+-----------------------+----------------------------------------+
(undercloud) [stack@undercloud-0 ~]$ openstack workflow execution show 0389abcf-5cf1-400c-957d-61350285594d
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field              | Value |
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID                 | 0389abcf-5cf1-400c-957d-61350285594d |
| Workflow ID        | c9b0b04a-c4a3-4ebf-9e98-03e8d8e00cde |
| Workflow name      | tripleo.package_update.v1.update_nodes |
| Workflow namespace |  |
| Description        |  |
| Task Execution ID  | <none> |
| Root Execution ID  | <none> |
| State              | ERROR |
| State info         | Failed to run task [error=Failed to find workflow [name=tripleo.messaging.v1.send] [namespace=], wf=tripleo.package_update.v1.update_nodes, task=send_message]: |
|                    | Traceback (most recent call last): |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/task_handler.py", line 63, in run_task |
|                    |     task.run() |
|                    |   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper |
|                    |     result = f(*args, **kwargs) |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 453, in run |
|                    |     self._run_new() |
|                    |   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper |
|                    |     result = f(*args, **kwargs) |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 485, in _run_new |
|                    |     self._schedule_actions() |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/tasks.py", line 569, in _schedule_actions |
|                    |     timeout=self._get_timeout() |
|                    |   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper |
|                    |     result = f(*args, **kwargs) |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/actions.py", line 561, in schedule |
|                    |     wf_spec_name=self.wf_name |
|                    |   File "/usr/lib/python3.6/site-packages/mistral/engine/utils.py", line 91, in resolve_workflow_definition |
|                    |     (wf_spec_name, namespace) |
|                    | mistral.exceptions.WorkflowException: Failed to find workflow [name=tripleo.messaging.v1.send] [namespace=] |
|                    |  |
| Created at         | 2019-05-09 13:07:12 |
| Updated at         | 2019-05-09 13:25:48 |
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
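Both Mistral failures in this thread were traced the same way: find the ERROR rows in `openstack task execution list`, then pass each row's workflow execution ID to `openstack workflow execution show` to reach the underlying "Failed to find workflow [name=tripleo.messaging.v1.send]" traceback. A minimal sketch of the first step, assuming the list is exported with the generic OSC `-f json` formatter and that its keys match the column names seen in the `show` output above ("ID", "State", "Workflow Execution ID"); error_executions is a hypothetical helper:

```python
import json
from typing import List, Tuple


def error_executions(task_list_json: str) -> List[Tuple[str, str]]:
    """Return (task execution ID, workflow execution ID) pairs for tasks in ERROR.

    ASSUMPTION: `task_list_json` is the output of
    `openstack task execution list -f json`, with keys named like the table columns.
    """
    rows = json.loads(task_list_json)
    return [
        (row["ID"], row["Workflow Execution ID"])
        for row in rows
        if row.get("State") == "ERROR"
    ]


if __name__ == "__main__":
    # Abbreviated sample built from the two ERROR rows in the upgrade comment above.
    sample = json.dumps([
        {"ID": "8128e053-74ec-42e3-bd4d-31989f096bc9", "Name": "node_update",
         "State": "ERROR", "Workflow Execution ID": "6c289cea-22d7-459e-8e4f-ab38aeed584c"},
        {"ID": "ee4aafde-5fa6-40d4-8820-ea96a861d2ce", "Name": "node_update",
         "State": "ERROR", "Workflow Execution ID": "0389abcf-5cf1-400c-957d-61350285594d"},
    ])
    for task_id, wf_exec_id in error_executions(sample):
        # Each workflow execution can then be inspected with the CLI command printed here.
        print(f"openstack workflow execution show {wf_exec_id}  # task {task_id}")
```

Filtering on the parsed State field avoids the false matches that `grep ERROR` can produce when "ERROR" appears in another column.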
Okay, if I've followed the full list here correctly, two issues have been conflated in this bug:
- Originally, the case was Mistral itself timing out and failing the deploy.
- The second case was the Ceph node deployment hitting the issue of grub-install taking up to 2 minutes per disk attached to the node, due to the missing bind mount in the IPA image.

Mistral timeouts are covered in https://bugzilla.redhat.com/show_bug.cgi?id=1700044
Ceph node grub issues are covered in https://bugzilla.redhat.com/show_bug.cgi?id=1691551

*** This bug has been marked as a duplicate of bug 1700044 ***