Description of problem:

The FFU job from 13 to 16.1 is failing during the last controller overcloud upgrade run step, in the following task:

2020-06-19 06:05:34 | TASK [restart pacemaker resource for haproxy] **********************************
2020-06-19 06:05:34 | Friday 19 June 2020 06:05:33 -0400 (0:00:00.241) 0:01:48.907 ***********
2020-06-19 06:05:34 | skipping: [controller-0] => {"changed": false, "skip_reason": "Conditional result was False"}
2020-06-19 06:05:34 | skipping: [controller-1] => {"changed": false, "skip_reason": "Conditional result was False"}
2020-06-19 06:05:34 | skipping: [controller-2] => {"changed": false, "skip_reason": "Conditional result was False"}
2020-06-19 06:05:34 |
2020-06-19 06:05:34 | TASK [copy certificate from host to container] *********************************
2020-06-19 06:05:34 | Friday 19 June 2020 06:05:33 -0400 (0:00:00.236) 0:01:49.144 ***********
2020-06-19 06:05:34 | skipping: [controller-2] => {"changed": false, "skip_reason": "Conditional result was False"}
2020-06-19 06:05:34 | fatal: [controller-1]: FAILED! => {"changed": true, "cmd": "podman cp /etc/pki/tls/private/overcloud_endpoint.pem ad9612da1a4c\n0d7665445110:/etc/pki/tls/private/overcloud_endpoint.pem", "delta": "0:00:00.098841", "end": "2020-06-19 10:05:34.085772", "msg": "non-zero return code", "rc": 127, "start": "2020-06-19 10:05:33.986931", "stderr": "Error: invalid arguments /etc/pki/tls/private/overcloud_endpoint.pem, ad9612da1a4c you must use just one container\n/bin/sh: line 1: 0d7665445110:/etc/pki/tls/private/overcloud_endpoint.pem: No such file or directory", "stderr_lines": ["Error: invalid arguments /etc/pki/tls/private/overcloud_endpoint.pem, ad9612da1a4c you must use just one container", "/bin/sh: line 1: 0d7665445110:/etc/pki/tls/private/overcloud_endpoint.pem: No such file or directory"], "stdout": "", "stdout_lines": []}
2020-06-19 06:05:34 | changed: [controller-0] => {"changed": true, "cmd": "podman cp /etc/pki/tls/private/overcloud_endpoint.pem 3b4bafa53b98:/etc/pki/tls/private/overcloud_endpoint.pem", "delta": "0:00:00.473766", "end": "2020-06-19 10:05:34.391347", "rc": 0, "start": "2020-06-19 10:05:33.917581", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2020-06-19 06:05:34 |
2020-06-19 06:05:34 | NO MORE HOSTS LEFT *************************************************************
2020-06-19 06:05:34 |
2020-06-19 06:05:34 | PLAY RECAP *********************************************************************
2020-06-19 06:05:34 | controller-0 : ok=103 changed=24 unreachable=0 failed=0 skipped=51 rescued=0 ignored=0
2020-06-19 06:05:34 | controller-1 : ok=95 changed=23 unreachable=0 failed=1 skipped=51 rescued=0 ignored=0
2020-06-19 06:05:34 | controller-2 : ok=95 changed=23 unreachable=0 failed=0 skipped=52 rescued=0 ignored=0
2020-06-19 06:05:34 |
2020-06-19 06:05:34 | Friday 19 June 2020 06:05:34 -0400 (0:00:00.791) 0:01:49.935 ***********

Log:
http://cougar11.scl.lab.tlv.redhat.com/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-vxlan-HA-no-ceph/12/undercloud-0.tar.gz?undercloud-0/home/stack/overcloud_upgrade_run_controller-2,controller-1,controller-0.log

Curiously, when upgrading the other two controllers this task was skipped, so the failure was not hit before.

Job logs: http://cougar11.scl.lab.tlv.redhat.com/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-vxlan-HA-no-ceph/12/

There is a recently merged patch which could be the reason we did not observe this failure earlier: https://review.opendev.org/#/c/724863/

Version-Release number of selected component (if applicable):

How reproducible:
Run the CI job: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/ffu/job/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-vxlan-HA-no-ceph/

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Looks like the "get container_id" task returns two container IDs instead of the single one the "copy certificate from host to container" task assumes; a fix should probably iterate over the result as a list.
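A minimal sketch of the failure mode and the suggested fix, runnable outside Ansible. The two IDs are taken from the log above; the `echo` stands in for the real `podman cp` call, and the variable name `container_ids` is illustrative, not the name used in the actual task:

```shell
#!/bin/sh
# Simulate a "get container_id" step that matched TWO containers.
# The original task interpolated this whole newline-separated value into one
# "podman cp SRC <id>:DEST" command, so the shell saw two commands:
#   podman cp ... ad9612da1a4c            -> "you must use just one container"
#   0d7665445110:/etc/pki/...pem          -> "No such file or directory", rc=127
container_ids="ad9612da1a4c
0d7665445110"

# Suggested shape of the fix: treat the value as a list and copy the
# certificate into each matching container in turn. In the real task the
# loop body would be:
#   podman cp /etc/pki/tls/private/overcloud_endpoint.pem "${id}:/etc/pki/tls/private/overcloud_endpoint.pem"
for id in $container_ids; do
  echo "copy overcloud_endpoint.pem into container ${id}"
done
```

In the Ansible task itself the equivalent would be looping over the `stdout_lines` of the registered "get container_id" result rather than interpolating its `stdout` directly (an assumption about how the task is written, to be confirmed against the template).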
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148