Description of problem:
Updating from OSP 13 GA to the latest puddle on RHEL 7.8 fails with the following error:

2020-03-02 16:30:47 | TASK [Debug output for task: Start containers for step 2] **********************
2020-03-02 16:30:47 | Monday 02 March 2020 16:29:58 -0500 (0:08:32.890) 1:45:39.032 **********
2020-03-02 16:30:47 | fatal: [controller-0]: FAILED! => {
....
2020-03-02 16:30:47 | "Warning: Undefined variable 'deploy_config_name'; ",
2020-03-02 16:30:47 | " (file & line not available)",
2020-03-02 16:30:47 | "Warning: ModuleLoader: module 'pacemaker' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules",
2020-03-02 16:30:47 | "error: Could not connect to cluster (is it running?)",
2020-03-02 16:30:47 | "Warning: ModuleLoader: module 'rabbitmq' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules",
2020-03-02 16:30:47 | "Error: /Stage[main]/Pacemaker::Stonith/Pacemaker::Property[Disable STONITH]/Pcmk_property[property--stonith-enabled]: Could not evaluate: backup_cib: Running: pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20200302-12-32oa59 failed with code: 1 -> Error: unable to get cib",
2020-03-02 16:30:47 | "Error: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Pacemaker::Property[rabbitmq-role-controller-0]/Pcmk_property[property-controller-0-rabbitmq-role]: Could not evaluate: backup_cib: Running: pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20200302-12-1w3xng4 failed with code: 1 -> Error: unable to get cib",
2020-03-02 16:30:47 | "Error: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Pacemaker::Property[rabbitmq-role-controller-1]/Pcmk_property[property-controller-1-rabbitmq-role]: Could not evaluate: backup_cib: Running: pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20200302-12-1mcggh8 failed with code: 1 -> Error: unable to get cib",
2020-03-02 16:30:47 | "Error: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Pacemaker::Property[rabbitmq-role-controller-2]/Pcmk_property[property-controller-2-rabbitmq-role]: Could not evaluate: backup_cib: Running: pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20200302-12-11hlqc9 failed with code: 1 -> Error: unable to get cib",
2020-03-02 16:30:47 | "Warning: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Pacemaker::Resource::Bundle[rabbitmq-bundle]/Pcmk_bundle[rabbitmq-bundle]: Skipping because of failed dependencies",
2020-03-02 16:30:47 | "Warning: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Pacemaker::Resource::Ocf[rabbitmq]/Pcmk_resource[rabbitmq]: Skipping because of failed dependencies",
2020-03-02 16:30:47 | "Warning: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]: Skipping because of failed dependencies",
2020-03-02 16:30:47 | "Error: Failed to apply catalog: Command is still failing after 180 seconds expired!",
2020-03-02 16:30:47 | "+ rc=1",
2020-03-02 16:30:47 | "+ set -e",
2020-03-02 16:30:47 | "+ set +ux",
2020-03-02 16:30:47 | "Error running ['docker', 'run', '--name', 'rabbitmq_init_bundle', '--label', 'config_id=tripleo_step2', '--label', 'container_name=rabbitmq_init_bundle', '--label', 'managed_by=paunch', '--label', 'config_data={\"start_order\": 1, \"image\": \"192.168.24.1:8787/rh-osbs/rhosp13-openstack-rabbitmq:20200220.1\", \"environment\": [\"TRIPLEO_DEPLOY_IDENTIFIER=1583169594\"], \"command\": [\"/docker_puppet_apply.sh\", \"2\", \"file,file_line,concat,augeas,pacemaker::resource::bundle,pacemaker::property,pacemaker::resource::ocf,pacemaker::constraint::order,pacemaker::constraint::colocation,rabbitmq_policy,rabbitmq_user,rabbitmq_ready\", \"include ::tripleo::profile::base::pacemaker;include ::tripleo::profile::pacemaker::rabbitmq_bundle\", \"--debug\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/var/lib/docker-config-scripts/docker_puppet_apply.sh:/docker_puppet_apply.sh:ro\", \"/etc/puppet:/tmp/puppet-etc:ro\", \"/usr/share/openstack-puppet/modules:/usr/share/openstack-puppet/modules:ro\", \"/etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro\", \"/dev/shm:/dev/shm:rw\", \"/bin/true:/bin/epmd\"], \"net\": \"host\", \"detach\": false}', '--env=TRIPLEO_DEPLOY_IDENTIFIER=1583169594', '--net=host', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/var/lib/docker-config-scripts/docker_puppet_apply.sh:/docker_puppet_apply.sh:ro', '--volume=/etc/puppet:/tmp/puppet-etc:ro', '--volume=/usr/share/openstack-puppet/modules:/usr/share/openstack-puppet/modules:ro', '--volume=/etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro', '--volume=/dev/shm:/dev/shm:rw', '--volume=/bin/true:/bin/epmd', '--cpuset-cpus=0,1,2,3,4,5,6,7', '192.168.24.1:8787/rh-osbs/rhosp13-openstack-rabbitmq:20200220.1', '/docker_puppet_apply.sh', '2', 'file,file_line,concat,augeas,pacemaker::resource::bundle,pacemaker::property,pacemaker::resource::ocf,pacemaker::constraint::order,pacemaker::constraint::colocation,rabbitmq_policy,rabbitmq_user,rabbitmq_ready', 'include ::tripleo::profile::base::pacemaker;include ::tripleo::profile::pacemaker::rabbitmq_bundle', '--debug'].
[1]", On the cluster the status is bad: ast updated: Wed Mar 4 14:53:15 2020 Last change: Mon Mar 2 20:59:00 2020 by hacluster via crmd on controller-2 12 nodes configured 38 resources configured Online: [ controller-0 controller-1 controller-2 ] Full list of resources: Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest] rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Stopped rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped Docker container set: galera-bundle [192.168.24.1:8787/rhosp13/openstack-mariadb:pcmklatest] galera-bundle-0 (ocf::heartbeat:galera): Stopped galera-bundle-1 (ocf::heartbeat:galera): Stopped galera-bundle-2 (ocf::heartbeat:galera): Stopped Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest] redis-bundle-0 (ocf::heartbeat:redis): Stopped redis-bundle-1 (ocf::heartbeat:redis): Stopped redis-bundle-2 (ocf::heartbeat:redis): Stopped ip-192.168.24.11 (ocf::heartbeat:IPaddr2): Stopped ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Stopped ip-172.17.1.18 (ocf::heartbeat:IPaddr2): Stopped ip-172.17.1.10 (ocf::heartbeat:IPaddr2): Stopped ip-172.17.3.19 (ocf::heartbeat:IPaddr2): Stopped ip-172.17.4.19 (ocf::heartbeat:IPaddr2): Stopped Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest] haproxy-bundle-docker-0 (ocf::heartbeat:docker): Stopped haproxy-bundle-docker-1 (ocf::heartbeat:docker): Stopped haproxy-bundle-docker-2 (ocf::heartbeat:docker): Stopped Docker container: openstack-cinder-volume [192.168.24.1:8787/rhosp13/openstack-cinder-volume:pcmklatest] openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Stopped Docker container: openstack-cinder-backup [192.168.24.1:8787/rhosp13/openstack-cinder-backup:pcmklatest] openstack-cinder-backup-docker-0 (ocf::heartbeat:docker): Stopped Failed Resource Actions: * rabbitmq-bundle-docker-0_start_0 on controller-0 'unknown error' (1): call=46, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest', last-rc-change='Mon Mar 2 21:14:18 2020', queued=0ms, exec=348ms * rabbitmq-bundle-docker-1_start_0 on controller-0 'unknown error' (1): call=100, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest', last-rc-change='Mon Mar 2 21:14:19 2020', queued=0ms, exec=388ms .... The issue is that there is a change in the repository name. There are multiple re-tag actions: 020-03-02 16:02:18 | TASK [Pull latest Redis images] ************************************************ 2020-03-02 16:02:18 | Monday 02 March 2020 16:01:32 -0500 (0:00:00.785) 1:17:12.592 ********** 2020-03-02 16:02:18 | changed: [controller-0] => {"changed": true, "cmd": ["docker", "pull", "192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis:20200220.1"], "delta": "0:00:04.504547", "end": "2020-03-02 21:01:36.861551", "rc": 0, "start ": "2020-03-02 21:01:32.357004", "stderr": "", "stderr_lines": [], "stdout": "Trying to pull repository 192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis ... 
\n20200220.1: Pulling from 192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis\nc 9ff3e9281bc: Already exists\nf897b9608c98: Already exists\n70081c7899d3: Already exists\n1c1b3adaaec4: Pulling fs layer\nc83e69f55d16: Pulling fs layer\n1c1b3adaaec4: Verifying Checksum\n1c1b3adaaec4: Download complete\n1c1b3adaaec4: Pul l complete\nc83e69f55d16: Verifying Checksum\nc83e69f55d16: Download complete\nc83e69f55d16: Pull complete\nDigest: sha256:df9148da34f58bcddbc8ab4dc582653fe333306c0eb12b837836d67295c12888\nStatus: Downloaded newer image for 192.168.24.1: 8787/rh-osbs/rhosp13-openstack-redis:20200220.1", "stdout_lines": ["Trying to pull repository 192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis ... ", "20200220.1: Pulling from 192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis", "c9ff3e9 281bc: Already exists", "f897b9608c98: Already exists", "70081c7899d3: Already exists", "1c1b3adaaec4: Pulling fs layer", "c83e69f55d16: Pulling fs layer", "1c1b3adaaec4: Verifying Checksum", "1c1b3adaaec4: Download complete", "1c1b3adaa ec4: Pull complete", "c83e69f55d16: Verifying Checksum", "c83e69f55d16: Download complete", "c83e69f55d16: Pull complete", "Digest: sha256:df9148da34f58bcddbc8ab4dc582653fe333306c0eb12b837836d67295c12888", "Status: Downloaded newer image for 192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis:20200220.1"]} 2020-03-02 16:02:18 | 2020-03-02 16:02:18 | TASK [Retag pcmklatest to latest Redis image] ********************************** 2020-03-02 16:02:18 | Monday 02 March 2020 16:01:36 -0500 (0:00:04.879) 1:17:17.471 ********** 2020-03-02 16:02:18 | changed: [controller-0] => {"changed": true, "cmd": "docker tag 192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis:20200220.1 192.168.24.1:8787/rh-osbs/rhosp13-openstack-redis:pcmklatest", "delta": "0:00:00.031311", "end": "2020-03-02 21:01:37.268249", "rc": 0, "start": "2020-03-02 21:01:37.236938", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} 2020-03-02 16:21:21 | Monday 02 March 2020 16:20:53 -0500 (0:00:15.026) 1:36:34.339 ********** 2020-03-02 16:21:21 | ok: [controller-0] => { .... 
020-03-02 16:21:21 | "$ docker run --name mysql_image_tag --label config_id=tripleo_step1 --label container_name=mysql_image_tag --label managed_by=paunch --label config_data={\"start_order\": 2, \"command\": [\"/bin/bash\", \"- c\", \"/usr/bin/docker tag '192.168.24.1:8787/rh-osbs/rhosp13-openstack-mariadb:20200220.1' '192.168.24.1:8787/rh-osbs/rhosp13-openstack-mariadb:pcmklatest'\"], \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/local time:/etc/localtime:ro\", \"/dev/shm:/dev/shm:rw\", \"/etc/sysconfig/docker:/etc/sysconfig/docker:ro\", \"/usr/bin/docker:/usr/bin/docker:ro\", \"/usr/bin/docker-current:/usr/bin/docker-current:ro\", \"/var/run/docker.sock:/var/run/docke r.sock:rw\"], \"image\": \"192.168.24.1:8787/rh-osbs/rhosp13-openstack-mariadb:20200220.1\", \"detach\": false, \"net\": \"host\"} --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volum e=/dev/shm:/dev/shm:rw --volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro --volume=/usr/bin/docker:/usr/bin/docker:ro --volume=/usr/bin/docker-current:/usr/bin/docker-current:ro --volume=/var/run/docker.sock:/var/run/docker.sock:rw --cpuset-cpus=0,1,2,3,4,5,6,7 192.168.24.1:8787/rh-osbs/rhosp13-openstack-mariadb:20200220.1 /bin/bash -c /usr/bin/docker tag '192.168.24.1:8787/rh-osbs/rhosp13-openstack-mariadb:20200220.1' '192.168.24.1:8787/rh-osbs/rhosp13-openstack-m ariadb:pcmklatest'" but they point pcmklatest to rh-osbs/rhosp13-openstack instead of 192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest which is expected by the resource. This doesn't happen when updating to rhel-7.7, certainly the repo path doesn't change there. Version-Release number of selected component (if applicable): openstack-tripleo-heat-templates-8.4.1-42.el7ost.noarch Red Hat Enterprise Linux Server release 7.8 (Maipo) Update from GA to 2020-02-24.2 How reproducible: all the time.
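For illustration only, a minimal sketch of a manual workaround on an affected controller, assuming the 20200220.1 images seen in the logs above are already pulled locally and that the other HA bundles follow the same naming pattern (not verified here):

# Re-point the pcmklatest tags the pacemaker bundles still expect at the
# freshly pulled rh-osbs images (names/tags taken from the logs above):
for svc in rabbitmq mariadb redis haproxy cinder-volume cinder-backup; do
  docker tag "192.168.24.1:8787/rh-osbs/rhosp13-openstack-${svc}:20200220.1" \
             "192.168.24.1:8787/rhosp13/openstack-${svc}:pcmklatest"
done
# Then clear the failed actions so pacemaker retries the start:
pcs resource cleanup rabbitmq-bundle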
This was solved upstream for OSP 16; it needs to be backported somehow.
Hi,

so the last puddle we can update to is 2020-01-15.3 [1]. Starting with 2020-02-10.8 [2], there is a new path in the registry that breaks the update of the HA containers. Note that CI can still be green, as pacemaker can recover during the update, so the breakage (which still needs to be formally analysed) can go unseen in CI. The following sequence happens:

1. Pacemaker is stopped on ctl-0; ctl-1 and ctl-2 are still up and running.
2. The resource is updated with the new pcmklatest image on ctl-0.
3. The change is picked up right away by ctl-1 and ctl-2; they try to pull the new image and fail.
4. At that point all HA services are down everywhere except on ctl-0.

So at step 3 we should not see a cut in the API, as ctl-0 takes the load, but ctl-1 and ctl-2 will be down. They recover when we get to update those nodes, but we lose high availability during the update. There may be other consequences that need further analysis. A quick way to confirm the mismatch on a controller is sketched below.

Thanks,

[1] http://rhos-qe-mirror-tlv.usersys.redhat.com/rcm-guest/puddles/OpenStack/13.0-RHEL-7/2020-01-15.3/overcloud_container_image_prepare.yaml
[2] http://rhos-qe-mirror-tlv.usersys.redhat.com/rcm-guest/puddles/OpenStack/13.0-RHEL-7/2020-02-10.8/overcloud_container_image_prepare.yaml
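As a reference for that analysis, a minimal check (assumed invocation; pcs 0.9 on RHEL 7, resource name taken from the status output above) to compare the image a bundle references with the tags actually present on a controller:

# Image the pacemaker bundle is configured to run:
pcs resource show rabbitmq-bundle | grep image
# Tags available locally; on an affected node only the rh-osbs path carries pcmklatest:
docker images --format '{{.Repository}}:{{.Tag}}' | grep pcmklatest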
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2718