Bug 1802597 - Swift related containers are restarted during converge in the minor update process.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bogdan Dobrelya
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-02-13 13:57 UTC by Sofer Athlan-Guyot
Modified: 2021-03-17 15:37 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210104205658.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-17 15:30:39 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 770639 0 None MERGED Refresh Swift ring files without restarting containers 2021-02-02 13:24:19 UTC
OpenStack gerrit 770640 0 None MERGED Fix swift containers idempotency 2021-02-02 13:24:19 UTC
Red Hat Product Errata RHBA-2021:0817 0 None None None 2021-03-17 15:37:48 UTC

Description Sofer Athlan-Guyot 2020-02-13 13:57:22 UTC
Description of problem:

After the update controller step we have:

5d5c24f85a8  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:20200130.1-hotfix  kolla_start           About an hour ago  Up About an hour ago                 swift_proxy
cd3975f4fa29  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           About an hour ago  Up About an hour ago                 swift_rsync
7db65fc7fe14  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           About an hour ago  Up About an hour ago                 swift_object_updater
33dacf287b6f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           About an hour ago  Up About an hour ago                 swift_object_server
a39897621dfe  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           About an hour ago  Up About an hour ago                 swift_object_replicator
650ba1ae8722  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:20200130.1-hotfix  kolla_start           About an hour ago  Up About an hour ago                 swift_object_expirer
278dbbc2eaa7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           About an hour ago  Up About an hour ago                 swift_object_auditor
f70fc21fcc50  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:20200130.1-hotfix     kolla_start           About an hour ago  Up About an hour ago                 swift_container_updater
8a8f63c85496  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:20200130.1-hotfix     kolla_start           About an hour ago  Up About an hour ago                 swift_container_server
015dcd90bf9c  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:20200130.1-hotfix     kolla_start           About an hour ago  Up About an hour ago                 swift_container_replicator
c1363b81b6de  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:20200130.1-hotfix     kolla_start           About an hour ago  Up About an hour ago                 swift_container_auditor
fda9a944841a  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:20200130.1-hotfix       kolla_start           About an hour ago  Up About an hour ago                 swift_account_server
e0078d176f9f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:20200130.1-hotfix       kolla_start           About an hour ago  Up About an hour ago                 swift_account_replicator
189d004e9005  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:20200130.1-hotfix       kolla_start           About an hour ago  Up About an hour ago                 swift_account_reaper
1cd148e20483  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:20200130.1-hotfix       kolla_start           About an hour ago  Up About an hour ago                 swift_account_auditor


After the converge we have:

d54c575bbfa3  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:20200130.1-hotfix  kolla_start           4 minutes ago       Up 4 minutes ago                      swift_proxy
1067b7173944  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           4 minutes ago       Up 4 minutes ago                      swift_rsync
0ab5b9cc6899  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           4 minutes ago       Up 4 minutes ago                      swift_object_updater
f228fde33381  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           4 minutes ago       Up 4 minutes ago                      swift_object_server
7ccaa8ece759  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           4 minutes ago       Up 4 minutes ago                      swift_object_replicator
604843bdcf1f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:20200130.1-hotfix  kolla_start           4 minutes ago       Up 4 minutes ago                      swift_object_expirer
1c94a3c146f2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:20200130.1-hotfix        kolla_start           4 minutes ago       Up 4 minutes ago                      swift_object_auditor
4afe18c0fcc0  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:20200130.1-hotfix     kolla_start           4 minutes ago       Up 4 minutes ago                      swift_container_updater
425a53836ede  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:20200130.1-hotfix     kolla_start           4 minutes ago       Up 4 minutes ago                      swift_container_server
7c54b87e5002  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:20200130.1-hotfix     kolla_start           4 minutes ago       Up 4 minutes ago                      swift_container_replicator
05e7969bbf9d  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:20200130.1-hotfix     kolla_start           5 minutes ago       Up 5 minutes ago                      swift_container_auditor
54454b4049f7  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:20200130.1-hotfix       kolla_start           5 minutes ago       Up 5 minutes ago                      swift_account_server
278e71915e84  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:20200130.1-hotfix       kolla_start           5 minutes ago       Up 5 minutes ago                      swift_account_replicator
996b9f6d737b  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:20200130.1-hotfix       kolla_start           5 minutes ago       Up 5 minutes ago                      swift_account_reaper
6783e24caf4e  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:20200130.1-hotfix       kolla_start           5 minutes ago       Up 5 minutes ago                      swift_account_auditor


It does not seem necessary to restart them again during the converge.

This needs investigation.
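
The comparison between the two listings above can be automated. The following is a minimal sketch, not part of the original report: it assumes each snapshot is a file with one "<container id> <container name>" pair per line, and the function name changed_containers is hypothetical. A container whose ID differs between the snapshots was recreated in between.

```shell
# Print the names of containers whose ID differs between two snapshots.
# Each snapshot file holds "<container id> <container name>" per line.
# (Hypothetical helper; the file format is an assumption of this sketch.)
changed_containers() {
    awk 'FILENAME == ARGV[1] { before[$2] = $1; next }
         ($2 in before) && before[$2] != $1 { print $2 }' "$1" "$2"
}

# Possible usage on a controller, assuming podman is the container runtime:
#   sudo podman ps --filter name=swift --format '{{.ID}} {{.Names}}' > before.txt
#   ... run the converge step ...
#   sudo podman ps --filter name=swift --format '{{.ID}} {{.Names}}' > after.txt
#   changed_containers before.txt after.txt
```

An empty result would mean that no container present in both snapshots changed its ID; containers that exist in only one snapshot are not reported by this sketch.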

Comment 6 Bogdan Dobrelya 2020-03-18 09:15:35 UTC
Note: upstream, the standalone-upgrade job does not run converge steps, and the minor update testing job does not deploy Swift, so there is no upstream coverage for the proposed fix.

I think that, to test it properly downstream, one should perform a minor update run while adding a new disk to Swift, as described in https://bugs.launchpad.net/tripleo/+bug/1802066, and check that the config changes have been applied to the Swift containers, which *shall* be restarted during that minor update.
Then execute the minor update once again and check that there were *no* additional restarts of the Swift containers.

Comment 7 Sofer Athlan-Guyot 2020-04-15 08:34:59 UTC
Hi,

(In reply to Bogdan Dobrelya from comment #6)
> note: upstream the standalone-upgrade job does not run converge steps, and
> the minor updates testing job does not deploy swift. So there is no coverage
> for the proposed fix upstream.
> 
> I think in order to test it properly downstream, one should perform a minor
> update run while adding a new disk to swift, like it is described in
> https://bugs.launchpad.net/tripleo/+bug/1802066.

So we can run things during update testing at certain critical times:

 - {pre/post}_overcloud_update
 - {pre/post}_overcloud_run
 - {pre/post}_overcloud_converge

Is there one stage that would make more sense than another for checking 1802066?

Could you give a precise sequence of commands (run from the undercloud) that 
would trigger the necessary change in swift ?

> And to check, if config
> changes have been applied to swift containers and those *shall* be restarted
> during minor update.

OK, but wouldn't the fact that they are restarted in any case hide issues here?

> Then execute the minor update once again and check there was *no* additional
> restarts for swift containers.

But then we don't have any tag change; would it really validate the fix? That is,
if I currently run an update without changing anything, is that enough to
reproduce the problem described here?

Thanks for looking into this and let's devise the proper test sequence for this.

(don't hesitate to put back the need_info flag when you reply to this)

Comment 8 Bogdan Dobrelya 2020-04-15 08:51:51 UTC
I'm not sure about the exact commands. I suggest running the whole process end-to-end and seeing how that impacts the Swift containers.

Also note that https://review.opendev.org/#/c/719671/ may help improve the idempotency of the Swift containers as well.

Comment 9 Sofer Athlan-Guyot 2020-07-31 12:36:27 UTC
Hi,

So I've tested with https://review.opendev.org/#/c/722786/1/common/container-puppet.sh and the Swift-related containers are still restarted during converge.

controller-0 | CHANGED | rc=0 >>
CONTAINER ID  IMAGE                                                                                                 COMMAND               CREATED      STATUS          PORTS  NAMES
8c573d26ba30  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.1_20200722.2  kolla_start           2 hours ago  Up 2 hours ago         swift_proxy
5ba3c7487386  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2        kolla_start           2 hours ago  Up 2 hours ago         swift_rsync
145798df4196  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2        kolla_start           2 hours ago  Up 2 hours ago         swift_object_updater
7c62a97c3811  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2        kolla_start           2 hours ago  Up 2 hours ago         swift_object_server
f777ccfd4ee2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2        kolla_start           2 hours ago  Up 2 hours ago         swift_object_replicator
e0bcfbed9d93  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-proxy-server:16.1_20200722.2  kolla_start           2 hours ago  Up 2 hours ago         swift_object_expirer
4bc07acab009  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-object:16.1_20200722.2        kolla_start           2 hours ago  Up 2 hours ago         swift_object_auditor
2f97fe2caaf9  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.1_20200722.2     kolla_start           2 hours ago  Up 2 hours ago         swift_container_updater
6fd381ded446  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.1_20200722.2     kolla_start           2 hours ago  Up 2 hours ago         swift_container_server
5a9ae4b90aa2  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.1_20200722.2     kolla_start           2 hours ago  Up 2 hours ago         swift_container_replicator
fde921048ff3  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-container:16.1_20200722.2     kolla_start           2 hours ago  Up 2 hours ago         swift_container_auditor
32d599b597b4  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.1_20200722.2       kolla_start           2 hours ago  Up 2 hours ago         swift_account_server
b22a0442cd96  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.1_20200722.2       kolla_start           2 hours ago  Up 2 hours ago         swift_account_replicator
84839df4e4d6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.1_20200722.2       kolla_start           2 hours ago  Up 2 hours ago         swift_account_reaper
5509510135cc  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-swift-account:16.1_20200722.2       kolla_start           2 hours ago  Up 2 hours ago         swift_account_auditor
a8c0cdaf263f  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-backup:16.1_20200722.2       /bin/bash /usr/lo...  4 days ago   Up 4 days ago          openstack-cinder-backup-podman-0
91c65eb21de6  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.1_20200722.2            kolla_start           4 days ago   Up 4 days ago          nova_api_cron
0e493cd112ec  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.1_20200722.2            kolla_start           4 days ago   Up 4 days ago          nova_metadata
e503659d5dee  undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-api:16.1_20200722.2            kolla_start           4 days ago   Up 4 days ago          nova_api


The same is seen on all controllers.

This happens during converge, so maybe the converge for Swift isn't idempotent.

I'm moving this to DF because I feel that if converge were re-run on OSP 16.1 right after a converge, they should be able to reproduce it. I don't think it's update/upgrade related.

Tested on:
Red Hat OpenStack Platform release 16.1.0 GA (Train)

Adjusting the flags accordingly.

Thanks.

Comment 12 Bogdan Dobrelya 2020-08-13 10:20:45 UTC
Could you please share an example of how to change the rings config for the UC, to check whether that ends up with the proxy et al. containers restarted? This is needed for final testing of https://review.opendev.org/#/c/713415

Comment 13 Christian Schwede (cschwede) 2020-08-19 12:41:44 UTC
Copy-pasting my comment from the review https://review.opendev.org/#/c/713415/3:

Unfortunately containers are not restarted after a ring change. Still investigating, but there is something broken here.

Here's what I did: used the existing rings, changed the IP of the node to 127.0.0.1, and ran an overcloud update. The modified ring was deployed to the node, including a second entry with the correct IP address (so far so good). However, none of the Swift pods were restarted.

Commands to reproduce this on the undercloud:

sudo yum install -y python-swift
swift download overcloud-swift-rings
tar xzvf swift-rings.tar.gz 
swift-ring-builder etc/swift/object.builder set_info d0 127.0.0.1:6000
swift-ring-builder etc/swift/object.builder write_ring
tar cvzf swift-rings.tar.gz etc/
swift upload overcloud-swift-rings swift-rings.tar.gz 
./overcloud-deploy.sh

I also noted that the updated ring was not uploaded again to the undercloud Swift container, which likely breaks the deployment on the next run.
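
One way to check whether the ring tarball in the undercloud Swift container was replaced by a deployment run is to compare a copy downloaded before the run with one downloaded after it. This is only a sketch under assumptions: the function name ring_tarball_status and the file names in the usage comment are hypothetical, and byte-identical files are taken to mean that nothing new was uploaded.

```shell
# Report whether two copies of the ring tarball are byte-identical.
# (Hypothetical helper; prints "unchanged", "changed", or "missing".)
ring_tarball_status() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "missing"
        return 2
    fi
    if cmp -s "$1" "$2"; then
        echo "unchanged"
    else
        echo "changed"
    fi
}

# Possible usage on the undercloud (file names are illustrative):
#   swift download overcloud-swift-rings
#   cp swift-rings.tar.gz rings-before.tar.gz
#   ./overcloud-deploy.sh
#   swift download overcloud-swift-rings
#   ring_tarball_status rings-before.tar.gz swift-rings.tar.gz
```

A result of "unchanged" after a run that modified the rings would be consistent with the observation above that the updated ring was not uploaded again.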

Comment 14 Christian Schwede (cschwede) 2020-08-24 06:17:17 UTC
The patch itself might actually be fine; it failed due to a regression that was found during testing: https://bugs.launchpad.net/tripleo/+bug/1892674

Will re-test the patch with the fix applied.

Comment 25 errata-xmlrpc 2021-03-17 15:30:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817

