Bug 1775731

Summary: Undercloud update fails during configuration generation (step1)
Product: Red Hat OpenStack
Reporter: Sofer Athlan-Guyot <sathlang>
Component: openstack-tripleo-heat-templates
Assignee: Alex Schultz <aschultz>
Status: CLOSED ERRATA
QA Contact: Ronnie Rasouli <rrasouli>
Severity: urgent
Priority: urgent
Version: 16.0 (Train)
CC: aschultz, jfrancoa, mburns
Target Milestone: rc
Keywords: Triaged
Target Release: 16.0 (Train on RHEL 8.1)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-11.3.1-0.20191129201420.8343952.el8ost.noarch.rpm
Last Closed: 2020-02-06 14:42:58 UTC
Type: Bug

Description Sofer Athlan-Guyot 2019-11-22 16:52:23 UTC
Description of problem: Running an OSP 16 undercloud update from phase1 to RHOS_TRUNK-16.0-RHEL-8-20191120.n.1 fails.

See the complete log here: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-16-from-passed_phase1-HA-ipv4/5/

The relevant error seems to be for swift_ringbuilder:

[stack@undercloud-0 ~]$ grep ERROR undercloud_update.log | grep 'Failed running contain'
2019-11-22 15:12:57 |         "2019-11-22 15:12:56,066 ERROR: 446740 -- Failed running container for swift_ringbuilder",

However, all podman puppet containers are in error.
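When more than one container fails, the grep above can be narrowed to print just the container names. A minimal sketch, assuming the "Failed running container for <name>" message format shown above and the same undercloud_update.log file; the failed_containers function is hypothetical and not part of TripleO:

```shell
# Hypothetical helper (not part of TripleO): print the unique container
# names found in "Failed running container for <name>" log lines.
failed_containers() {
    local log="$1"
    # grep -o prints only the matching part; awk keeps the last field (the name).
    grep -o 'Failed running container for [A-Za-z0-9_.-][A-Za-z0-9_.-]*' "$log" \
        | awk '{print $NF}' \
        | sort -u
}

# Usage, from the same directory as the grep above:
#   failed_containers undercloud_update.log
```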

Comment 1 Alex Schultz 2019-11-22 17:16:07 UTC
While troubleshooting this, it appears to be failing when trying to clean up files that rsync reports for removal. This is related to the following commit, which was trying to ensure the files are properly cleaned from the puppet-generated folder: https://opendev.org/openstack/tripleo-heat-templates/commit/34107c3b1c548552f5c2c5823a57be82937f9cbd

The issue in this case is that the swift ring builder creates some files in etc/swift/backups/, so this code gets a list like:

deleting etc/swift/backups/1574440742.container.ring.gz
deleting etc/swift/backups/1574440742.container.builder
deleting etc/swift/backups/1574440741.object.ring.gz
deleting etc/swift/backups/1574440741.object.builder
deleting etc/swift/backups/1574440741.account.ring.gz
deleting etc/swift/backups/1574440741.account.builder
deleting etc/swift/backups/1574440736.container.builder
deleting etc/swift/backups/1574440736.account.builder
deleting etc/swift/backups/1574440735.object.builder

The paths from these lines are written to $TMPFILE:

            rsync -av -R --dry-run --delete-after $exclude_files $rsync_srcs ${conf_data_path} |\
                awk '/^deleting/ {print $2}' > $TMPFILE
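The awk filter can be tried on its own by feeding it a captured sample of rsync output. A minimal sketch: the two "deleting" entries are taken from the list above, the other sample lines are illustrative, and both function names are hypothetical:

```shell
# Illustrative rsync -av --dry-run --delete-after output; only the
# "deleting" entries come from the real list above.
sample_rsync_output() {
    cat <<'EOF'
sending incremental file list
deleting etc/swift/backups/1574440742.container.ring.gz
deleting etc/swift/backups/1574440742.container.builder
etc/swift/
EOF
}

# Same filter as the script: keep only the paths rsync would delete.
deleted_paths() {
    awk '/^deleting/ {print $2}'
}

sample_rsync_output | deleted_paths
# etc/swift/backups/1574440742.container.ring.gz
# etc/swift/backups/1574440742.container.builder
```

Because $2 is only the second whitespace-separated field, a path containing spaces would be truncated; the Swift backup names here have none.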

The code then takes these files and tries to make sure they are removed:

            cat $TMPFILE | xargs -n1 -r -I{} \
                bash -c "test -f ${puppet_generated_path}/{} && rm -f ${puppet_generated_path}/{}"

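However, if a file doesn't exist, this command fails with exit status 123, causing the task to fail: test -f returns 1 for the missing file, so bash -c exits 1, and xargs exits with 123 whenever an invocation of its command exits with a status from 1 to 125. Since this line is only trying to remove these files, it likely shouldn't fail if a file is already missing.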
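A minimal sketch that reproduces the failing cleanup in a scratch directory and compares it with one possible tolerant variant. Since rm -f already exits 0 for a missing file, the test -f guard can be dropped; this is an assumption about a reasonable fix, not necessarily the change that shipped in the fixed package. demo_dir and demo_list are hypothetical stand-ins for ${puppet_generated_path} and $TMPFILE, and all function names are hypothetical:

```shell
# Scratch stand-ins for ${puppet_generated_path} and $TMPFILE (hypothetical).
setup() {
    demo_dir=$(mktemp -d)
    demo_list=$(mktemp)
    mkdir -p "${demo_dir}/etc/swift/backups"
    # One file still present, one already gone.
    touch "${demo_dir}/etc/swift/backups/1574440742.container.builder"
    printf '%s\n' \
        etc/swift/backups/1574440742.container.builder \
        etc/swift/backups/1574440741.object.builder > "$demo_list"
}

teardown() {
    rm -rf "$demo_dir" "$demo_list"
}

# Original pattern: "test -f" exits 1 for the missing file, which makes
# GNU xargs exit 123.
cleanup_original() {
    xargs -r -I{} bash -c "test -f ${demo_dir}/{} && rm -f ${demo_dir}/{}" < "$demo_list"
}

# Tolerant variant: rm -f ignores missing files, so no guard is needed.
# -n1 is omitted because -I already runs one command per input line.
cleanup_tolerant() {
    xargs -r -I{} rm -f -- "${demo_dir}/{}" < "$demo_list"
}

setup; cleanup_original; echo "original: $?"; teardown   # original: 123
setup; cleanup_tolerant; echo "tolerant: $?"; teardown   # tolerant: 0
```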

Comment 2 Alex Schultz 2019-11-22 17:19:45 UTC
[root@undercloud-0 container-puppet]# cat foo 
/does/not/exist
[root@undercloud-0 container-puppet]# cat foo | xargs -n1 -r -I{} bash -c "test -f {} && echo 'hi'"
[root@undercloud-0 container-puppet]# echo $?
123
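The 123 above comes from two rules combining: an "A && B" list returns A's status when A fails, and GNU xargs exits 123 when any invocation of its command exits with a status from 1 to 125. A minimal sketch recording each step's exit status, reusing the non-existent path from the session (the variable names are hypothetical):

```shell
missing=/does/not/exist

# test -f on a missing path exits 1.
test -f "$missing"
s_test=$?

# In "A && B", B is skipped and the list returns A's status (1).
test -f "$missing" && echo hi
s_and=$?

# GNU xargs turns a child status of 1-125 into its own status 123.
printf '%s\n' "$missing" | xargs -r -I{} bash -c 'test -f "$1" && echo hi' _ {}
s_xargs=$?

# An if statement whose condition is false exits 0, so xargs exits 0 too.
printf '%s\n' "$missing" | xargs -r -I{} bash -c 'if test -f "$1"; then echo hi; fi' _ {}
s_if=$?

printf 'test -f: %s\n&& list: %s\nxargs: %s\nxargs + if: %s\n' \
    "$s_test" "$s_and" "$s_xargs" "$s_if"
# test -f: 1
# && list: 1
# xargs: 123
# xargs + if: 0
```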

Comment 6 Ronnie Rasouli 2019-12-04 05:32:34 UTC
The undercloud update failed again, this time with errors starting containers whose names are already in use:


2019-12-03 20:09:11 |         "<13>Dec  3 20:08:59 puppet-user: Notice: /Stage[main]/Swift::Proxy/Swift_proxy_config[pipeline:main/pipeline]/value: value changed catch_errors gatekeeper healthcheck proxy-logging cache container_sync bulk tempurl ratelimit copy container-quotas account-quotas slo dlo versioned_writes proxy-logging proxy-server to catch_errors healthcheck proxy-logging cache ratelimit bulk tempurl formpost authtoken s3api s3token keystone staticweb copy container_quotas account_quotas slo dlo versioned_writes proxy-logging proxy-server",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"keepalived\" is already in use by \"892313d1465566693e604b374b75a953f0bf4e6049a1310f0a4a7d5bde3fafe2\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"memcached\" is already in use by \"faa5ed43b0b65e485dbddeb6949aea6547791c3e5edaf7d13f23dd68265a56db\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"mysql_init_logs\" is already in use by \"9a8ec7c87c646d50a4c28b034d4e4f74e3d77837fd827a09b4b2b2b67be9a30d\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"rabbitmq_init_logs\" is already in use by \"bfaf6fb9d51a08f0738440950a94dc5d342a504eaaa4a35c030cd489fd153e5b\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"haproxy\" is already in use by \"3171adcf80051d24c783bf7d2fbe2580fc14c3580ecbe4b663c5539077de976f\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"rabbitmq_bootstrap\" is already in use by \"3d77f3265f22ece2c0b0b91b473c38beaf44b96b67fa17eb45eadf4df1ff58b3\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"rabbitmq\" is already in use by \"d59e5669e7ac4b6660fd9f56a6d755f2c5f263b7a7f390cdb3fb3358861366a6\". You have to remove that container to be able to reuse that name.: that name is already in use"
2019-12-03 20:09:22 |     raise exceptions.DeploymentError('Deployment failed')



2019-12-03 20:03:22 3114 [Warning] Aborted connection 3114 to db: 'ironic' user: 'ironic' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 2763 [Warning] Aborted connection 2763 to db: 'nova_api' user: 'nova_api' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 2784 [Warning] Aborted connection 2784 to db: 'nova' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 3021 [Warning] Aborted connection 3021 to db: 'heat' user: 'heat' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 2966 [Warning] Aborted connection 2966 to db: 'heat' user: 'heat' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 3139 [Warning] Aborted connection 3139 to db: 'ironic' user: 'ironic' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 3116 [Warning] Aborted connection 3116 to db: 'ironic' user: 'ironic' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2755 [Warning] Aborted connection 2755 to db: 'nova' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2756 [Warning] Aborted connection 2756 to db: 'nova' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2752 [Warning] Aborted connection 2752 to db: 'nova_cell0' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3078 [Warning] Aborted connection 3078 to db: 'ovs_neutron' user: 'neutron' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3076 [Warning] Aborted connection 3076 to db: 'ovs_neutron' user: 'neutron' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3077 [Warning] Aborted connection 3077 to db: 'ovs_neutron' user: 'neutron' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2749 [Warning] Aborted connection 2749 to db: 'nova_api' user: 'nova_api' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3012 [Warning] Aborted connection 3012 to db: 'heat' user: 'heat' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
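To see at a glance which container names collided, the "already in use" errors can be filtered out of the update log. A minimal sketch, assuming the escaped-quote message format shown in the first excerpt of this comment and the same undercloud_update.log as in the description; in_use_containers is a hypothetical name:

```shell
# Hypothetical helper: print the unique container names that podman reported
# as "already in use". The log escapes quotes as \" so the pattern matches a
# literal backslash followed by a double quote around the name.
in_use_containers() {
    sed -n 's/.*the container name \\"\([^\\"]*\)\\" is already in use.*/\1/p' "$1" \
        | sort -u
}

# Usage:
#   in_use_containers undercloud_update.log
```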

Comment 10 Sofer Athlan-Guyot 2020-01-15 12:15:22 UTC
This doesn't happen anymore with the latest puddle; moving to VERIFIED.

Comment 12 errata-xmlrpc 2020-02-06 14:42:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283