Bug 1802597
| Summary: | Swift related containers are restarted during converge in the minor update process. | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Sofer Athlan-Guyot <sathlang> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Bogdan Dobrelya <bdobreli> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.1 (Train) | CC: | aschultz, augol, bdobreli, cschwede, emacchi, jpretori, kecarter, lbezdick, mburns, michele, pgrist |
| Target Milestone: | --- | Keywords: | Triaged, ZStream, ZStreamTracked |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-11.3.2-1.20210104205658.el8ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-03-17 15:30:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Sofer Athlan-Guyot
2020-02-13 13:57:22 UTC
note: upstream, the standalone-upgrade job does not run converge steps, and the minor updates testing job does not deploy swift. So there is no coverage for the proposed fix upstream.

I think in order to test it properly downstream, one should perform a minor update run while adding a new disk to swift, as described in https://bugs.launchpad.net/tripleo/+bug/1802066. Then check that the config changes have been applied to the swift containers, which *shall* be restarted during the minor update. Then execute the minor update once again and check that there were *no* additional restarts of the swift containers.

Hi,

(In reply to Bogdan Dobrelya from comment #6)
> note: upstream the standalone-upgrade job does not run converge steps, and
> the minor updates testing job does not deploy swift. So there is no coverage
> for the proposed fix upstream.
>
> I think in order to test it properly downstream, one should perform a minor
> update run while adding a new disk to swift, like it is described in
> https://bugs.launchpad.net/tripleo/+bug/1802066.

So we can run things during update testing at certain critical times:
- {pre/post}_overcloud_update
- {pre/post}_overcloud_run
- {pre/post}_overcloud_converge

Is there one stage that would make more sense than another to check 1802066? Could you give a precise sequence of commands (run from the undercloud) that would trigger the necessary change in swift?

> And to check, if config
> changes have been applied to swift containers and those *shall* be restarted
> during minor update.

OK, but wouldn't the fact that they are restarted in any case hide issues here?

> Then execute the minor update once again and check there was *no* additional
> restarts for swift containers.

But then we don't have any tag change; would that really validate it? I.e., if I currently run an update without changing anything, is that enough to reproduce the problem described here?

Thanks for looking into this and let's devise the proper test sequence for this.
(don't hesitate to put back the need_info flag when you reply to this)

I'm not sure about the exact commands; I suggest doing the whole process end-to-end and seeing how that impacts the swift containers. Also note that https://review.opendev.org/#/c/719671/ may help improve the idempotency of the swift containers.

Hi,

so I've tested with https://review.opendev.org/#/c/722786/1/common/container-puppet.sh and I still get swift related containers restarted during converge.

controller-0 | CHANGED | rc=0 >>
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
8c573d26ba30  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_proxy
5ba3c7487386  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_rsync
145798df4196  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_object_updater
7c62a97c3811  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_object_server
f777ccfd4ee2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_object_replicator
e0bcfbed9d93  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_object_expirer
4bc07acab009  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_object_auditor
2f97fe2caaf9  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_container_updater
6fd381ded446  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_container_server
5a9ae4b90aa2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_container_replicator
fde921048ff3  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_container_auditor
32d599b597b4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_account_server
b22a0442cd96  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_account_replicator
84839df4e4d6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_account_reaper
5509510135cc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.1_20200722.2  kolla_start  2 hours ago  Up 2 hours ago  swift_account_auditor
a8c0cdaf263f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:16.1_20200722.2  /bin/bash /usr/lo...  4 days ago  Up 4 days ago  openstack-cinder-backup-podman-0
91c65eb21de6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.1_20200722.2  kolla_start  4 days ago  Up 4 days ago  nova_api_cron
0e493cd112ec  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.1_20200722.2  kolla_start  4 days ago  Up 4 days ago  nova_metadata
e503659d5dee  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.1_20200722.2  kolla_start  4 days ago  Up 4 days ago  nova_api

on all controllers. This happens during converge, so maybe the converge for swift isn't idempotent.
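
One way to make the "which containers were restarted" check repeatable, rather than reading the CREATED column by eye, is to snapshot container names and IDs before and after a converge run and compare them. The following is only a sketch, not something from this report: `take_snapshot` and `restarted_containers` are hypothetical helper names, and it assumes that a restart recreates the container and therefore changes its ID, as the fresh CREATED times in the listing above suggest. A plain in-place restart that keeps the ID would not be detected.

```shell
# Hypothetical helpers (names are illustrative, not part of TripleO).

# Print "<name> <container id>" for every running container whose name
# matches "swift". Run this on a controller before and after converge and
# redirect the output to a file each time.
take_snapshot() {
    sudo podman ps --filter name=swift --format '{{.Names}} {{.ID}}' | sort
}

# Compare two snapshot files and print the names of containers that exist
# in both but have a different ID in the second one.
# Assumption: a changed ID means the container was recreated.
restarted_containers() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && before[$1] != $2 { print $1 }' "$1" "$2"
}
```

For example, `take_snapshot > before.txt` prior to converge, `take_snapshot > after.txt` afterwards, then `restarted_containers before.txt after.txt`. On an idempotent second converge run the last command should print nothing.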
I'm moving this to DF, because I feel that if converge were re-run on osp16.1 after converge, they should be able to reproduce it. I don't think it's update/upgrade related.

Tested on: Red Hat OpenStack Platform release 16.1.0 GA (Train)

Adjusting the flags accordingly. Thanks.

Please, could you share an example of how to change the rings config for the UC, to check whether that ends up with the proxy et al. containers restarted? This is needed for the final testing of https://review.opendev.org/#/c/713415

Copy-pasting my comment from the review https://review.opendev.org/#/c/713415/3:

Unfortunately containers are not restarted after a ring change. Still investigating, but there is something broken here.

Here's what I did: used the existing rings, changed the IP of the node to 127.0.0.1 and ran an overcloud update. The modified ring was deployed to the node, including a second entry with the correct IP address (so far so good). However, none of the Swift pods were restarted.

Commands to reproduce this on the undercloud:

sudo yum install -y python-swift
swift download overcloud-swift-rings
tar xzvf swift-rings.tar.gz
swift-ring-builder etc/swift/object.builder set_info d0 127.0.0.1:6000
swift-ring-builder etc/swift/object.builder write_ring
tar cvzf swift-rings.tar.gz etc/
swift upload overcloud-swift-rings swift-rings.tar.gz
./overcloud-deploy.sh

I also noted that the updated ring was not uploaded again to the undercloud Swift container, which likely breaks the deployment on the next run.

The patch itself might be fine actually, but failed due to a regression that was found during testing: https://bugs.launchpad.net/tripleo/+bug/1892674

Will re-test the patch with the fix applied.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:0817 |