Description of problem:
In all of our environments, the control plane fails during the Director deploy. The failures range from failed controller services to 'Failed Actions'. When this happens we get a service disruption, and to date we have simply stopped all OpenStack activity until the update is complete. In some cases this has left the control plane out of service for more than a week.
Version-Release number of selected component (if applicable):
RHOSP 8
How reproducible:
Repeatedly
Steps to Reproduce:
1. Add new compute nodes to an existing environment using Director.
Actual results:
Control plane services and OSP APIs go down.
Expected results:
Director should not bring down the control plane services while compute nodes are being added, so that the OSP APIs remain available at all times.
Additional info:
Walkthrough of services going down when adding nodes: cluster-during-deploy
sosreports from the controllers and compute nodes.