Overcloud update fails when adding additional disks to Swift on OSP10. This has already been fixed upstream, but requires a backport for Newton. It can be easily reproduced: deploy using 3 controllers, then update the deployment to include an additional disk for Swift using an environment file like this:

~~~
parameter_defaults:
  SwiftRawDisks: {"vdb": {}}
~~~

The update then fails during the ring rebalance.
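For concreteness, a hedged sketch of the update step (the file name swift-raw-disks.yaml and the template arguments are illustrative; re-use whatever arguments the original deploy command used):

~~~
# Save the parameter_defaults above as e.g. swift-raw-disks.yaml, then
# re-run the original deploy command with the extra environment file
# to trigger the stack update:
openstack overcloud deploy --templates \
  -e swift-raw-disks.yaml
~~~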
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
This also applies if the rebalance is sub-optimal, i.e. the rebalance was executed but the partition distribution could be improved. In that case another rebalance should be done later on. However, if the next rebalance is executed within the minimum rebalance time (min_part_hours), it will again return an exit code of 1. All of this should be fixed by the proposed patch.
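For illustration, a minimal sketch (my addition, assuming a controller with an existing object builder file): swift-ring-builder uses exit code 1 for warnings, not hard errors, which is exactly what the unpatched Puppet exec treats as failure.

~~~
swift-ring-builder /etc/swift/object.builder rebalance
echo $?   # 0 = rebalanced cleanly; 1 = warning, e.g. a second rebalance
          # attempted within min_part_hours, or a sub-optimal balance
          # that should be rebalanced again later
~~~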
Hi Christian,

The customer has a huge Ceph cluster, and according to them a rebalance could take some time there. Note that this is happening on a stack CREATE, not during an UPDATE. And frankly, I have no idea why Ceph would have an influence on the Swift side of things:

++++++++++++++++++++++++++++++++++++++++++++
RH: Could you apply the following change to test this?
https://review.openstack.org/#/c/472253/1/manifests/ringbuilder/rebalance.pp

Apply this change on the controller nodes, in file
/etc/puppet/modules/swift/manifests/ringbuilder/rebalance.pp

So after a failed deploy with this error message, ssh to the 3 controllers, apply the change, and kick off a stack update. Let's see if that fixes the issue; if it does, I'll get you a hotfix.

Customer: The update completed successfully after the change.
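For reference, the essence of the change at that review (a hedged reading, not the verbatim patch) is to make the Puppet exec accept swift-ring-builder's warning exit code 1 as success:

~~~
exec { "rebalance_${name}":
  command     => strip("swift-ring-builder /etc/swift/${name}.builder rebalance ${seed}"),
  path        => ['/usr/bin'],
  refreshonly => true,
  before      => Anchor['swift::config::end'],
  returns     => [0, 1],   # the added line: treat exit code 1 (warning) as success
}
~~~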
Christian,

Can we get a hotfix for https://review.openstack.org/#/c/472253/1/manifests/ringbuilder/rebalance.pp ?

The customer can then virt-customize their images and upload the RPM. I could also tell them to apply the code change directly via virt-customize ... I'm not a fan of that, though.

Thanks!!!
- Andreas
*** Bug 1470789 has been marked as a duplicate of this bug. ***
Workaround until we get the fixed RPMs:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Ugly hack:
======================
https://review.openstack.org/#/c/472253/1/manifests/ringbuilder/rebalance.pp

Apply this change on the controller nodes, in file
/etc/puppet/modules/swift/manifests/ringbuilder/rebalance.pp

So after a failed deploy with this error message, ssh to the 3 controllers, apply the change, and kick off a stack update.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

A bit less ugly hack:
========================
On the undercloud, go to the same directory where overcloud-full.qcow2 resides. Create the file rebalance.pp:

~~~
# Swift::Ringbuilder::Rebalance
# Rebalances the specified ring. Assumes that the ring already exists
# and is stored at /etc/swift/${name}.builder
#
# == Parameters
#
# [*name*] Type of ring to rebalance. The ring file is assumed to be at the path
#   /etc/swift/${name}.builder
#
# [*seed*] Optional. Seed value used to seed Python's pseudo-random for ring building.
define swift::ringbuilder::rebalance(
  $seed = undef
) {

  include ::swift::deps

  validate_re($name, '^object|container|account$')

  if $seed {
    validate_re($seed, '^\d+$')
  }

  exec { "rebalance_${name}":
    command     => strip("swift-ring-builder /etc/swift/${name}.builder rebalance ${seed}"),
    path        => ['/usr/bin'],
    refreshonly => true,
    before      => Anchor['swift::config::end'],
    returns     => [0, 1],
  }
}
~~~

Then, execute:

~~~
virt-customize -a overcloud-full.qcow2 --upload rebalance.pp:/etc/puppet/modules/swift/manifests/ringbuilder/rebalance.pp
~~~

Then, execute:

~~~
source stackrc
openstack overcloud image upload --update-existing --image-path .
~~~
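An optional check (my addition, not part of the original workaround): confirm the file actually landed in the image before uploading it.

~~~
virt-cat -a overcloud-full.qcow2 \
  /etc/puppet/modules/swift/manifests/ringbuilder/rebalance.pp
~~~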
There may be another workaround: on the next iteration, after I deleted the stack and the overcloud plan, I ran swift list and found:

~~~
[stack@devbclu001 ~]$ swift list
overcloud-swift-rings
~~~

Somehow these are being left behind. I deleted them and attempted a fresh deploy:

~~~
[stack@devbclu001 ~]$ swift delete overcloud-swift-rings swift-rings.tar.gz
overcloud-swift-rings
~~~

And the deployment succeeded. I have tested one live migration and it worked fine. I plan to redeploy once more and then test several migrations. Will keep you posted.
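Consolidated as a hedged sketch (container and object names taken from the comment above; note that swift delete with only a container name removes the container together with all its objects):

~~~
source stackrc
swift list                           # look for a leftover overcloud-swift-rings container
swift delete overcloud-swift-rings   # deletes swift-rings.tar.gz and the container itself
# then run the usual `openstack overcloud deploy` command for the fresh deployment
~~~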
The above workaround only applies to new deployments after an old deployment was deleted.
Hi,

If there are old rings, they will be updated, and therefore the rebalance will be executed. There is no rebalance if the rings didn't change, but the issue is still there even if you don't hit it then - it might happen later on. So the fix which I sent to you is still needed.

The above workaround (deleting the overcloud-swift-rings container) does work for new deployments. According to engineering, the left-over rings are being handled in another bugzilla; with the provided fix the old rings will be updated, but of course it would be better if they were cleaned up beforehand.

- Andreas
*** Bug 1468030 has been marked as a duplicate of this bug. ***
*** Bug 1488290 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2654