Description of problem:

Running an OSP16 undercloud update from phase1 to RHOS_TRUNK-16.0-RHEL-8-20191120.n.1 fails. The complete log is available at:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-16-from-passed_phase1-HA-ipv4/5/

The relevant error seems to be for swift_ringbuilder:

[stack@undercloud-0 ~]$ grep ERROR undercloud_update.log | grep 'Failed running contain'
2019-11-22 15:12:57 | "2019-11-22 15:12:56,066 ERROR: 446740 -- Failed running container for swift_ringbuilder",

However, all podman puppet containers are in error.
While troubleshooting this, I found that it seems to fail when trying to clean up files that were being removed via rsync. This is related to https://opendev.org/openstack/tripleo-heat-templates/commit/34107c3b1c548552f5c2c5823a57be82937f9cbd, which was trying to ensure the files are properly cleaned from the puppet-generated folder.

The issue in this case is that the swift ring builder creates some files in etc/swift/backups/, so this code gets a list like:

deleting etc/swift/backups/1574440742.container.ring.gz
deleting etc/swift/backups/1574440742.container.builder
deleting etc/swift/backups/1574440741.object.ring.gz
deleting etc/swift/backups/1574440741.object.builder
deleting etc/swift/backups/1574440741.account.ring.gz
deleting etc/swift/backups/1574440741.account.builder
deleting etc/swift/backups/1574440736.container.builder
deleting etc/swift/backups/1574440736.account.builder
deleting etc/swift/backups/1574440735.object.builder

These lines are written to $TMPFILE:

rsync -av -R --dry-run --delete-after $exclude_files $rsync_srcs ${conf_data_path} |\
    awk '/^deleting/ {print $2}' > $TMPFILE

The code then takes these files and tries to make sure they are removed:

cat $TMPFILE | xargs -n1 -r -I{} \
    bash -c "test -f ${puppet_generated_path}/{} && rm -f ${puppet_generated_path}/{}"

However, if a file doesn't exist, "test -f" returns non-zero, so xargs exits with status 123 and the task fails. Since this line is only trying to remove these files, it likely shouldn't fail when a file is already missing:
[root@undercloud-0 container-puppet]# cat foo
/does/not/exist
[root@undercloud-0 container-puppet]# cat foo | xargs -n1 -r -I{} bash -c "test -f {} && echo 'hi'"
[root@undercloud-0 container-puppet]# echo $?
123
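As a rough illustration of the point above, the following sketch reproduces the 123 exit status in a scratch directory and compares it with one possible tolerant variant that relies on "rm -f" ignoring missing files. It is not necessarily the change that was merged upstream. The scratch directory, the file names present.builder and missing.builder, and the variables original_status and tolerant_status are made up for the example; only the xargs pipeline shape comes from the report.

```shell
#!/usr/bin/env bash
# Hypothetical scratch setup; these paths are illustrative, not the real
# puppet-generated tree.
workdir=$(mktemp -d)
puppet_generated_path="$workdir/puppet-generated"
TMPFILE="$workdir/deleted.txt"
mkdir -p "$puppet_generated_path/etc/swift/backups"
touch "$puppet_generated_path/etc/swift/backups/present.builder"

# One file that exists and one that does not, as relative paths like the
# ones awk extracts from the rsync "deleting" lines.
printf '%s\n' \
    etc/swift/backups/present.builder \
    etc/swift/backups/missing.builder > "$TMPFILE"

# Original form: "test -f" returns 1 for the missing file, so xargs
# reports 123 overall even though the existing file was removed.
original_status=0
cat "$TMPFILE" | xargs -n1 -r -I{} \
    bash -c "test -f ${puppet_generated_path}/{} && rm -f ${puppet_generated_path}/{}" \
    || original_status=$?

# Recreate the file so the tolerant form has something to delete.
touch "$puppet_generated_path/etc/swift/backups/present.builder"

# Tolerant form (an assumption, not the confirmed upstream fix):
# "rm -f" exits 0 for files that are already gone, so no test is needed.
tolerant_status=0
xargs -r -I{} rm -f "${puppet_generated_path}/{}" < "$TMPFILE" \
    || tolerant_status=$?

echo "original=$original_status tolerant=$tolerant_status"
```

Dropping "-n1" in the tolerant form is deliberate: "-I{}" already runs one command per input line, and some xargs versions warn when both options are given.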
The undercloud update failed again, this time with errors creating containers whose names are already in use:

2019-12-03 20:09:11 | "<13>Dec 3 20:08:59 puppet-user: Notice: /Stage[main]/Swift::Proxy/Swift_proxy_config[pipeline:main/pipeline]/value: value changed catch_errors gatekeeper healthcheck proxy-logging cache container_sync bulk tempurl ratelimit copy container-quotas account-quotas slo dlo versioned_writes proxy-logging proxy-server to catch_errors healthcheck proxy-logging cache ratelimit bulk tempurl formpost authtoken s3api s3token keystone staticweb copy container_quotas account_quotas slo dlo versioned_writes proxy-logging proxy-server",
2019-12-03 20:09:22 | "Error: error creating container storage: the container name \"keepalived\" is already in use by \"892313d1465566693e604b374b75a953f0bf4e6049a1310f0a4a7d5bde3fafe2\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 | "Error: error creating container storage: the container name \"memcached\" is already in use by \"faa5ed43b0b65e485dbddeb6949aea6547791c3e5edaf7d13f23dd68265a56db\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 | "Error: error creating container storage: the container name \"mysql_init_logs\" is already in use by \"9a8ec7c87c646d50a4c28b034d4e4f74e3d77837fd827a09b4b2b2b67be9a30d\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 | "Error: error creating container storage: the container name \"rabbitmq_init_logs\" is already in use by \"bfaf6fb9d51a08f0738440950a94dc5d342a504eaaa4a35c030cd489fd153e5b\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 | "Error: error creating container storage: the container name \"haproxy\" is already in use by \"3171adcf80051d24c783bf7d2fbe2580fc14c3580ecbe4b663c5539077de976f\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 | "Error: error creating container storage: the container name \"rabbitmq_bootstrap\" is already in use by \"3d77f3265f22ece2c0b0b91b473c38beaf44b96b67fa17eb45eadf4df1ff58b3\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 | "Error: error creating container storage: the container name \"rabbitmq\" is already in use by \"d59e5669e7ac4b6660fd9f56a6d755f2c5f263b7a7f390cdb3fb3358861366a6\". You have to remove that container to be able to reuse that name.: that name is already in use"
2019-12-03 20:09:22 | raise exceptions.DeploymentError('Deployment failed')

2019-12-03 20:03:22 3114 [Warning] Aborted connection 3114 to db: 'ironic' user: 'ironic' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 2763 [Warning] Aborted connection 2763 to db: 'nova_api' user: 'nova_api' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 2784 [Warning] Aborted connection 2784 to db: 'nova' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 3021 [Warning] Aborted connection 3021 to db: 'heat' user: 'heat' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 2966 [Warning] Aborted connection 2966 to db: 'heat' user: 'heat' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 3139 [Warning] Aborted connection 3139 to db: 'ironic' user: 'ironic' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 3116 [Warning] Aborted connection 3116 to db: 'ironic' user: 'ironic' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2755 [Warning] Aborted connection 2755 to db: 'nova' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2756 [Warning] Aborted connection 2756 to db: 'nova' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2752 [Warning] Aborted connection 2752 to db: 'nova_cell0' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3078 [Warning] Aborted connection 3078 to db: 'ovs_neutron' user: 'neutron' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3076 [Warning] Aborted connection 3076 to db: 'ovs_neutron' user: 'neutron' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3077 [Warning] Aborted connection 3077 to db: 'ovs_neutron' user: 'neutron' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2749 [Warning] Aborted connection 2749 to db: 'nova_api' user: 'nova_api' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3012 [Warning] Aborted connection 3012 to db: 'heat' user: 'heat' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
This no longer happens with the latest puddle; moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:0283