Red Hat Bugzilla – Bug 1284669
[Docs] [Director] Deleting instances during overcloud scale down (reduction of computes) causes the instances to get stuck in deletion.
Last modified: 2016-03-16 19:59:32 EDT
rhel-osp-director: 8.0 - Deleting instances during overcloud scale down (reduction of computes) causes the instances to get stuck in deletion.
Steps to reproduce:
1. Deploy overcloud with several computes.
2. Launch several instances on all computes.
3. Attempt to delete instances on computes being removed.
The deleted instances get stuck:
| 934e05a7-f25e-42df-93f2-75039a84d600 | new_instance-2 | ACTIVE | deleting | Running | Internal=192.168.50.10 |
| cfd03305-a349-4167-83a8-458ba841c673 | new_instance-2 | ACTIVE | deleting | Running | Internal=192.168.50.14 |
| 1a8fa414-0ca7-4531-b3b3-bfac8b13483d | new_instance-3 | ACTIVE | deleting | Running | Internal=192.168.50.11 |
| 705e33ad-f50b-458b-af31-5359e4e4b6f6 | new_instance-3 | ACTIVE | deleting | Running | Internal=192.168.50.13 |
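For anyone triaging a similar listing, here is a minimal Python sketch that picks out the instances stuck in the "deleting" task state from rows like the ones above. The column order (ID, Name, Status, Task State, Power State, Networks) and the helper names parse_rows and stuck_in_deletion are assumptions made for illustration; they are not part of director or nova.

```python
# Sketch only: the column order below is assumed from the rows pasted in this bug.
COLUMNS = ["id", "name", "status", "task_state", "power_state", "networks"]

ROWS = """\
| 934e05a7-f25e-42df-93f2-75039a84d600 | new_instance-2 | ACTIVE | deleting | Running | Internal=192.168.50.10 |
| cfd03305-a349-4167-83a8-458ba841c673 | new_instance-2 | ACTIVE | deleting | Running | Internal=192.168.50.14 |
| 1a8fa414-0ca7-4531-b3b3-bfac8b13483d | new_instance-3 | ACTIVE | deleting | Running | Internal=192.168.50.11 |
| 705e33ad-f50b-458b-af31-5359e4e4b6f6 | new_instance-3 | ACTIVE | deleting | Running | Internal=192.168.50.13 |
"""


def parse_rows(text):
    """Turn pipe-delimited table rows into a list of dicts keyed by COLUMNS."""
    instances = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != len(COLUMNS):
            continue  # skip borders, headers with a different shape, blank lines
        instances.append(dict(zip(COLUMNS, cells)))
    return instances


def stuck_in_deletion(instances):
    """Return instances still ACTIVE while their task state is 'deleting'."""
    return [
        inst for inst in instances
        if inst["status"] == "ACTIVE" and inst["task_state"] == "deleting"
    ]


if __name__ == "__main__":
    for inst in stuck_in_deletion(parse_rows(ROWS)):
        print(inst["id"], inst["name"], inst["networks"])
```

An instance that stays ACTIVE with a "deleting" task state is the symptom reported here; the sketch only detects it and does not attempt any cleanup.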
We should either handle this case or not allow deletion of instances during scale down.
Based on IRC discussion:
related docs: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scaling_the_Overcloud.html
<sasha> jcoufal: the link you pasted - scroll to the previous section (7.6)
<sasha> scaling the overcloud
<sasha> jcoufal: so I scaled down with --compute-scale X
<sasha> where X is n-1
I think scaling down should specifically follow Section 7.7:
Sasha, could you please verify whether you still hit this issue when you use the documented process mentioned above?
Assigning to Dan for review.
It seems like I should retitle Section 7.6 so that it specifically deals with Compute and Ceph nodes. Maybe also merge Section 7.7 into Section 7.6.
Jarda, what do you think about this plan?
Sasha, Jarda -- just following up on this BZ. Any further changes required to these sections?
So I see this under Important:
"Before removing a Compute node from the Overcloud, migrate the workload from the node to other Compute nodes."
Following this guideline, the issue doesn't reproduce.
Cool. In that case, I'll close this BZ. If further changes are required, please reopen and let me know.