Bug 1775731 - Undercloud update fails during configuration generation (step1)
Summary: Undercloud update fails during configuration generation (step1)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Alex Schultz
QA Contact: Ronnie Rasouli
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-22 16:52 UTC by Sofer Athlan-Guyot
Modified: 2020-02-06 14:43 UTC (History)
3 users

Fixed In Version: openstack-tripleo-heat-templates-11.3.1-0.20191129201420.8343952.el8ost.noarch.rpm
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-06 14:42:58 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1853183 0 None None None 2019-11-22 17:21:12 UTC
OpenStack gerrit 695803 0 'None' MERGED Drop file test before removal 2021-01-27 23:55:29 UTC
Red Hat Product Errata RHEA-2020:0283 0 None None None 2020-02-06 14:43:58 UTC

Description Sofer Athlan-Guyot 2019-11-22 16:52:23 UTC
Description of problem:  Running an osp16 undercloud update from phase1 to RHOS_TRUNK-16.0-RHEL-8-20191120.n.1 fails.

See the complete log at https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-16-from-passed_phase1-HA-ipv4/5/

The relevant error seems to be for swift_ringbuilder:

[stack@undercloud-0 ~]$ grep ERROR undercloud_update.log | grep 'Failed running contain'
2019-11-22 15:12:57 |         "2019-11-22 15:12:56,066 ERROR: 446740 -- Failed running container for swift_ringbuilder",

but all podman puppet containers are in error.

Comment 1 Alex Schultz 2019-11-22 17:16:07 UTC
So in troubleshooting this, it seems to be failing when trying to clean up files that were being removed via rsync.  This is related to https://opendev.org/openstack/tripleo-heat-templates/commit/34107c3b1c548552f5c2c5823a57be82937f9cbd which was trying to ensure the files are properly cleaned from the puppet-generated folder.

The issue in this case is that the swift ring builder has some files that get created in etc/swift/backups/. So this code gets a list like:

deleting etc/swift/backups/1574440742.container.ring.gz
deleting etc/swift/backups/1574440742.container.builder
deleting etc/swift/backups/1574440741.object.ring.gz
deleting etc/swift/backups/1574440741.object.builder
deleting etc/swift/backups/1574440741.account.ring.gz
deleting etc/swift/backups/1574440741.account.builder
deleting etc/swift/backups/1574440736.container.builder
deleting etc/swift/backups/1574440736.account.builder
deleting etc/swift/backups/1574440735.object.builder

These lines are written to $TMPFILE:

            rsync -av -R --dry-run --delete-after $exclude_files $rsync_srcs ${conf_data_path} |\
                awk '/^deleting/ {print $2}' > $TMPFILE

The code then takes these files and tries to make sure they are removed:

            cat $TMPFILE | xargs -n1 -r -I{} \
                bash -c "test -f ${puppet_generated_path}/{} && rm -f ${puppet_generated_path}/{}"

However, if the files don't exist, this command exits with status 123 (xargs exits 123 when any invoked command exits with a status of 1-125), causing the task to fail.  Since this line is trying to remove these files, it shouldn't fail when a file is already missing.

Comment 2 Alex Schultz 2019-11-22 17:19:45 UTC
[root@undercloud-0 container-puppet]# cat foo 
/does/not/exist
[root@undercloud-0 container-puppet]# cat foo | xargs -n1 -r -I{} bash -c "test -f {} && echo 'hi'"
[root@undercloud-0 container-puppet]# echo $?
123
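A minimal shell sketch of the failure and of the eventual fix (the gerrit change above is titled "Drop file test before removal"); the path used here is illustrative:

```shell
# Build a file list containing a path that does not exist, as happens
# when the swift ring builder backups have already been cleaned up.
TMPFILE=$(mktemp)
printf '%s\n' /does/not/exist > "$TMPFILE"

# Old form: test -f fails for the missing path, so bash -c exits
# non-zero and xargs reports that as exit status 123.
cat "$TMPFILE" | xargs -n1 -r -I{} bash -c "test -f {} && rm -f {}"
echo "old form exit: $?"    # 123

# Fixed form: drop the test entirely; rm -f is a no-op (exit 0)
# for files that are already missing.
cat "$TMPFILE" | xargs -n1 -r -I{} rm -f {}
echo "fixed form exit: $?"  # 0

rm -f "$TMPFILE"
```

With the test dropped, the cleanup step no longer fails when a file listed by the rsync dry run has already disappeared.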

Comment 6 Ronnie Rasouli 2019-12-04 05:32:34 UTC
The undercloud update failed again: errors starting containers whose names are already in use.


2019-12-03 20:09:11 |         "<13>Dec  3 20:08:59 puppet-user: Notice: /Stage[main]/Swift::Proxy/Swift_proxy_config[pipeline:main/pipeline]/value: value changed catch_errors gatekeeper healthcheck proxy-logging cache container_sync bulk tempurl ratelimit copy container-quotas account-quotas slo dlo versioned_writes proxy-logging proxy-server to catch_errors healthcheck proxy-logging cache ratelimit bulk tempurl formpost authtoken s3api s3token keystone staticweb copy container_quotas account_quotas slo dlo versioned_writes proxy-logging proxy-server",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"keepalived\" is already in use by \"892313d1465566693e604b374b75a953f0bf4e6049a1310f0a4a7d5bde3fafe2\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"memcached\" is already in use by \"faa5ed43b0b65e485dbddeb6949aea6547791c3e5edaf7d13f23dd68265a56db\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"mysql_init_logs\" is already in use by \"9a8ec7c87c646d50a4c28b034d4e4f74e3d77837fd827a09b4b2b2b67be9a30d\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"rabbitmq_init_logs\" is already in use by \"bfaf6fb9d51a08f0738440950a94dc5d342a504eaaa4a35c030cd489fd153e5b\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"haproxy\" is already in use by \"3171adcf80051d24c783bf7d2fbe2580fc14c3580ecbe4b663c5539077de976f\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"rabbitmq_bootstrap\" is already in use by \"3d77f3265f22ece2c0b0b91b473c38beaf44b96b67fa17eb45eadf4df1ff58b3\". You have to remove that container to be able to reuse that name.: that name is already in use",
2019-12-03 20:09:22 |         "Error: error creating container storage: the container name \"rabbitmq\" is already in use by \"d59e5669e7ac4b6660fd9f56a6d755f2c5f263b7a7f390cdb3fb3358861366a6\". You have to remove that container to be able to reuse that name.: that name is already in use"
2019-12-03 20:09:22 |     raise exceptions.DeploymentError('Deployment failed')



2019-12-03 20:03:22 3114 [Warning] Aborted connection 3114 to db: 'ironic' user: 'ironic' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 2763 [Warning] Aborted connection 2763 to db: 'nova_api' user: 'nova_api' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 2784 [Warning] Aborted connection 2784 to db: 'nova' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 3021 [Warning] Aborted connection 3021 to db: 'heat' user: 'heat' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 2966 [Warning] Aborted connection 2966 to db: 'heat' user: 'heat' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 3139 [Warning] Aborted connection 3139 to db: 'ironic' user: 'ironic' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:22 3116 [Warning] Aborted connection 3116 to db: 'ironic' user: 'ironic' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2755 [Warning] Aborted connection 2755 to db: 'nova' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2756 [Warning] Aborted connection 2756 to db: 'nova' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2752 [Warning] Aborted connection 2752 to db: 'nova_cell0' user: 'nova' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3078 [Warning] Aborted connection 3078 to db: 'ovs_neutron' user: 'neutron' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3076 [Warning] Aborted connection 3076 to db: 'ovs_neutron' user: 'neutron' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3077 [Warning] Aborted connection 3077 to db: 'ovs_neutron' user: 'neutron' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 2749 [Warning] Aborted connection 2749 to db: 'nova_api' user: 'nova_api' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)
2019-12-03 20:03:23 3012 [Warning] Aborted connection 3012 to db: 'heat' user: 'heat' host: 'undercloud-0.redhat.local' (Got an error reading communication packets)

Comment 10 Sofer Athlan-Guyot 2020-01-15 12:15:22 UTC
This doesn't happen anymore with latest puddle, moving to verified.

Comment 12 errata-xmlrpc 2020-02-06 14:42:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283

