Bug 1538751

Summary: accidental scale down from 96 to 3 computes, failed with error
Product: Red Hat OpenStack Reporter: Andrew Ludwar <aludwar>
Component: rhosp-director Assignee: Emilien Macchi <emacchi>
Status: CLOSED CURRENTRELEASE QA Contact: Amit Ugol <augol>
Severity: high Docs Contact:
Priority: high    
Version: 10.0 (Newton) CC: dbecker, emacchi, jslagle, mburns, morazi, ohochman, rhel-osp-director-maint
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-02 20:48:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andrew Ludwar 2018-01-25 18:03:18 UTC
Description of problem:

During a minor update to OSP10z6, the operator accidentally ran the overcloud deploy command with 3 compute nodes specified instead of 96, effectively scaling the overcloud down to 3 computes. The stack operation errored out, but it had already deleted 92 of the 93 nodes selected for deletion. Any help with possible recovery options would be appreciated. We have 4 live VMs in this environment.

Version-Release number of selected component (if applicable):

OSP10 

How reproducible:

Every time

Steps to Reproduce:
1. Deploy an existing stack of 3 controllers and 96 computes.
2. Accidentally run the overcloud deploy command with 3 computes specified instead of 96, rather than running the intended stack update.
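
The steps above come down to a deploy request whose compute count is far below the current one. A minimal sketch of a pre-flight guard that would catch this before the deploy command is run is shown below. It assumes the operator can supply the current and requested compute counts; the function name and the max_removed threshold are hypothetical and are not part of Director.

```python
def check_scale_change(current, requested, max_removed=0):
    """Return how many nodes a deploy would remove.

    Hypothetical helper, not a Director API. Raises ValueError when the
    requested count would remove more nodes than max_removed allows.
    """
    if current < 0 or requested < 0:
        raise ValueError("node counts must be non-negative")

    # A scale-up or unchanged count removes nothing.
    removed = max(current - requested, 0)

    if removed > max_removed:
        raise ValueError(
            "refusing to deploy: %d -> %d computes would remove %d nodes "
            "(limit %d)" % (current, requested, removed, max_removed)
        )
    return removed


if __name__ == "__main__":
    # The case from this report: 96 computes deployed, 3 requested.
    try:
        check_scale_change(96, 3)
    except ValueError as exc:
        print(exc)
```

With the default threshold of 0, any reduction in compute count is rejected unless the operator explicitly raises max_removed, which forces an intentional scale down to be stated as such.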


Actual results:

Accidental scale down failed.

Expected results:

Scale down succeeds, although in this case the scale down from 96 to 3 computes was accidental.


Additional info:

We are looking for assistance in recovering or rolling back what we can from a Director perspective, and in investigating what data can be recovered on the computes and in the overcloud environment after a mostly successful accidental scale down from 96 to 3 computes.