As an operator, I want to trigger a Swift ring rebalance without executing a full overcloud update.

Today, rebalances are executed only when the devices change (IP address changed, device added, etc.), and only as part of an overcloud update. This is a huge overhead if one only wants to rebalance the rings. To make things worse, a rebalance might need to be executed multiple times; there should be at least an option to automate this.

Some recent changes in TripleO/Mistral make it possible to execute Ansible playbooks within a workflow. Implementing the rebalance as a workflow has multiple benefits:

1. It can be executed by an operator as a standalone operation on the undercloud
2. It can be executed automatically by Mistral multiple times (e.g. when a bigger rebalance is required)

The workflow itself could be nearly the same as today:

1. Download most recent rings from the undercloud
2. Rebalance
3. Upload updated rings to the overcloud

It could also be done in a more lightweight manner: for example, the above workflow is executed only on a single node, and once the updated rings are available they are fetched by all nodes.

There should be an additional sanity check that uses the swift-dispersion-report tool to check whether the dispersion has already reached a given level (for example 95%), and that also ensures at least one replication pass has finished since the last rebalance (using swift-recon data).

Expected outcome
----------------

Operator can trigger a Mistral workflow that rebalances Swift rings in a safe manner and distributes them.

Work items
----------

1. Add option to swift-ring-builder to limit rebalance operations to a maximum of X percent
2. Make swift-dispersion-report importable
3. Write Ansible playbook & library to execute the described workflow
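The sanity check proposed in the description (dispersion above a threshold, plus at least one replication pass completed since the last rebalance) could be sketched roughly as below. This is only an illustration of the gating logic, not the merged implementation: `ClusterState`, `safe_to_rebalance`, and all field names are hypothetical, and the values are assumed to have been collected beforehand from swift-dispersion-report and swift-recon output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    """Hypothetical snapshot of the data the sanity check needs."""
    # Dispersion percentages, assumed to be parsed from
    # swift-dispersion-report output.
    dispersion_pcts: List[float]
    # Per-node epoch timestamps of the last completed replication pass,
    # assumed to be gathered from swift-recon data.
    replication_last: List[float]
    # Epoch timestamp of the previous rebalance.
    last_rebalance: float


def safe_to_rebalance(state: ClusterState,
                      min_dispersion_pct: float = 95.0) -> bool:
    """Return True only if both proposed conditions hold."""
    # Every reported dispersion value must have reached the threshold.
    dispersion_ok = (
        bool(state.dispersion_pcts)
        and min(state.dispersion_pcts) >= min_dispersion_pct
    )
    # Every node must have finished a replication pass after the
    # previous rebalance; missing data is treated as "not safe".
    replication_ok = (
        bool(state.replication_last)
        and all(t > state.last_rebalance for t in state.replication_last)
    )
    return dispersion_ok and replication_ok
```

A workflow that rebalances multiple times could call such a check before each pass and stop, or wait, whenever it returns False.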
Upstream patch merged, moving to POST.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086