Bug 1469435

Summary: [RFE][Swift] - Implement Swift ring rebalance as an Ansible-based Mistral workflow
Product: Red Hat OpenStack
Reporter: Christian Schwede (cschwede) <cschwede>
Component: openstack-tripleo-common
Assignee: Christian Schwede (cschwede) <cschwede>
Status: CLOSED ERRATA
QA Contact: Mike Abrams <mabrams>
Severity: unspecified
Docs Contact: Kim Nylander <knylande>
Priority: unspecified
Version: 13.0 (Queens)
CC: cschwede, jschluet, mburns, pgrist, scohen, slinaber, tshefi
Target Milestone: Upstream M1
Keywords: FutureFeature, Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-8.6.1-3.el7ost
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 13:31:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1469441, 1469452

Description Christian Schwede (cschwede) 2017-07-11 09:26:51 UTC
As an operator, I want to trigger a Swift ring rebalance without executing a full overcloud update. 

Today, a rebalance is executed only when the devices change (for example, an IP address changes or a device is added), and only as part of an overcloud update. This is a huge overhead if one only wants to rebalance the rings. To make things worse, a rebalance might need to be executed multiple times, so there should at least be an option to automate this.

Some recent changes in TripleO/Mistral make it possible to execute Ansible playbooks within a workflow. Implementing the rebalance as a workflow has multiple benefits:

1. It can be executed by an operator as a standalone operation on the undercloud
2. It can be executed automatically by Mistral multiple times (e.g. when a bigger rebalance is required)
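
To illustrate the second benefit, here is a minimal Python sketch of repeating a rebalance until the rings stop changing. It is not the TripleO or Mistral implementation: `rebalance_until_done`, `run_round`, `wait_until_safe` and `max_rounds` are hypothetical names, and the two callables stand in for whatever the workflow would actually run.

```python
from typing import Callable


def rebalance_until_done(
    run_round: Callable[[], bool],
    wait_until_safe: Callable[[], None],
    max_rounds: int = 10,
) -> int:
    """Repeat rebalance rounds until a round changes nothing.

    run_round() is assumed to perform one rebalance and return True if
    the rings changed. wait_until_safe() is assumed to block until the
    cluster is ready for another round. Returns the number of rounds
    that changed the rings (at most max_rounds).
    """
    rounds = 0
    while rounds < max_rounds:
        if not run_round():
            break  # nothing left to move, the rings are settled
        rounds += 1
        wait_until_safe()  # let replication catch up before the next round
    return rounds
```

The `max_rounds` cap is an assumption added so that a misbehaving round cannot loop forever.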

The workflow itself could be nearly the same as today:

1. Download most recent rings from the undercloud
2. Rebalance
3. Upload updated rings to the overcloud

It could also be done in a more lightweight manner: for example, the above workflow is executed on a single node only, and once the updated rings are available, all nodes fetch them.
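
The three workflow steps listed above (download, rebalance, upload) could be sketched roughly as follows. This is only an illustration under assumptions: `rebalance_rings` and the three callables are hypothetical, and in the proposed design each step would be an Ansible task rather than a Python function.

```python
from typing import Callable, Iterable, List

# Swift's standard rings; storage policies may add further object rings.
RING_NAMES = ("account", "container", "object")


def rebalance_rings(
    download: Callable[[str], str],
    rebalance: Callable[[str], bool],
    upload: Callable[[str, str], None],
    ring_names: Iterable[str] = RING_NAMES,
) -> List[str]:
    """Run download, rebalance and upload for each ring.

    download(name) is assumed to fetch the latest builder file and
    return its local path; rebalance(path) is assumed to return True
    if the ring changed; upload(name, path) distributes the result.
    Returns the names of the rings that were changed and uploaded.
    """
    changed = []
    for name in ring_names:
        builder_path = download(name)        # step 1: download
        if rebalance(builder_path):          # step 2: rebalance
            upload(name, builder_path)       # step 3: upload only if changed
            changed.append(name)
    return changed
```

Skipping the upload for unchanged rings is a design choice of this sketch, not something the report specifies.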

There should be an additional sanity check that uses the swift-dispersion-report tool to check whether the dispersion has already reached a given level (for example 95%), and that also ensures that at least one replication pass has finished since the last rebalance (using swift-recon data).
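
The decision logic of such a sanity check might look like the Python sketch below. It assumes the dispersion percentage and the per-node completion times of the last replication pass have already been collected (for instance from swift-dispersion-report and swift-recon output); the function name, its parameters and the rule that every node must have finished a pass are illustrative assumptions, not an existing API.

```python
from typing import Sequence


def safe_to_rebalance(
    dispersion_pct: float,
    last_rebalance_ts: float,
    replication_last_ts: Sequence[float],
    min_dispersion_pct: float = 95.0,
) -> bool:
    """Return True only if both sanity conditions hold.

    dispersion_pct: percentage of copies found in the right place.
    last_rebalance_ts: Unix time of the previous rebalance.
    replication_last_ts: Unix time at which each node finished its
        most recent replication pass (one entry per node).
    """
    if dispersion_pct < min_dispersion_pct:
        return False  # data is still too scattered
    if not replication_last_ts:
        return False  # no replication data, so do not assume it is safe
    # Assumption: every node must have completed a pass after the rebalance.
    return all(ts > last_rebalance_ts for ts in replication_last_ts)
```

For example, `safe_to_rebalance(97.5, 1000.0, [1200.0, 1500.0])` passes both checks, while a dispersion of 90.0 or a node whose last pass finished at 900.0 would block the rebalance.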

Expected outcome
----------------
Operator can trigger a Mistral workflow that rebalances Swift rings in a safe manner and distributes them. 

Work items
----------
1. Add option to swift-ring-builder to limit rebalance operations to a maximum of X percent
2. Make swift-dispersion-report importable
3. Write Ansible playbook & library to execute the described workflow

Comment 2 Christian Schwede (cschwede) 2017-12-21 17:32:34 UTC
Upstream patch merged, moving to POST.

Comment 18 errata-xmlrpc 2018-06-27 13:31:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086