Bug 1255910
| Summary: | overcloud node delete of one compute node removed all of them | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Ben Nemec <bnemec> |
| Component: | rhosp-director | Assignee: | Jan Provaznik <jprovazn> |
| Status: | CLOSED ERRATA | QA Contact: | Omri Hochman <ohochman> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 (Kilo) | CC: | dmacpher, mburns, rhel-osp-director-maint, rhosp-bugs-internal, sbaker, shardy, vcojot, yeylon, zbitter |
| Target Milestone: | y1 | Keywords: | TestOnly, Triaged, ZStream |
| Target Release: | 7.0 (Kilo) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-heat-2015.1.1-1.el7ost | Doc Type: | Bug Fix |
| Doc Text: | When deleting a node in the Overcloud, the director used the Heat stack's ComputeCount parameter to calculate the number of nodes. However, Heat did not update this parameter if a scale-up operation failed, so the node count that Heat returned in the stack parameters did not reflect the real number of nodes, and extra nodes could be deleted from a failed stack. This fix ensures Heat updates the stack parameters even if a scale operation failed previously. The director now deletes only the requested nodes when running "overcloud node delete" on a stack where a scale-up operation failed before. (A command-line sketch of the parameter mismatch follows this table.) | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-10-08 12:17:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1258967 | | |
| Bug Blocks: | | | |
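The Doc Text above attributes the over-deletion to a stale ComputeCount stack parameter. A minimal sketch for spotting that mismatch on an affected undercloud, assuming the default stack name "overcloud" and the Kilo-era heat/nova CLIs used elsewhere in this report:

```
# ComputeCount as Heat last recorded it in the stack parameters; on an
# unpatched openstack-heat this value can be stale after a failed scale-up.
heat stack-show overcloud | grep -i computecount

# Number of compute instances that actually exist (including any in ERROR state).
nova list | grep -c compute
```

If the two numbers disagree, the stack is in the inconsistent state described in the comments below.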
Description
Ben Nemec
2015-08-21 20:44:45 UTC
The problem was most probably caused by having the OC stack in an inconsistent state when deleting a node. After a failed scale-up, the number of nodes in the Heat stack (ComputeCount) doesn't reflect the real number of nodes (ComputeCount wasn't updated because the scale-up failed). Then, when doing node deletion, the inconsistent ComputeCount value is used. A solution is to make sure that the OC is in a consistent state before deleting a node (e.g. re-run "openstack overcloud deploy"). However, in some situations it may not be possible to get the stack into a consistent state, so an alternative solution might be to allow the user to specify the desired number of nodes when deleting a node.

Zane pointed out (thanks!) that this recent upstream patch should solve the inconsistency of stack.parameters if an update operation fails: https://review.openstack.org/#/c/215618/ IOW, backporting it should be a sufficient solution - I'm testing this locally now.

Based on comment 5, switching this bug to the heat component.

This issue is solved by Zane's backport patch for BZ 1258967 (https://code.engineering.redhat.com/gerrit/#/c/56834/). Thanks to this patch, heat returns the stack params from the last update operation.

How to test (a command-level sketch follows these comments):
1) deploy the overcloud
2) scale up compute nodes beyond the available nodes
3) when the scale-up operation fails, try to delete the instances in ERROR state
4) without this patch, some additional instances would be deleted

Setting the component back to director, making this TestOnly. It already depends on the Heat bug 1258967.

*** Bug 1261129 has been marked as a duplicate of this bug. ***

It turns out that under certain circumstances BZ 1258967 is not a sufficient fix for this issue: if a user tried to delete a node and that operation failed before, then using the ComputeCount parameter to compute the new node count is insufficient. An upstream patch which derives the new node count from the actual nodes in the ResourceGroup is here: https://review.openstack.org/226682

I raised a separate BZ, bug 1266102, for the issue in comment #11.
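A minimal command-line sketch of the reproduce/verify steps above, assuming a Kilo (OSP 7) undercloud, the default stack name "overcloud", the Kilo-era `--compute-scale` flag, and a hypothetical scale count (3) that exceeds the number of registered compute nodes:

```
# 1) Deploy the overcloud with a single compute node.
openstack overcloud deploy --templates --compute-scale 1

# 2) Scale up past the available bare-metal capacity; the stack update is
#    expected to fail and leave one or more compute instances in ERROR state.
openstack overcloud deploy --templates --compute-scale 3

# 3) Find the compute instance(s) left in ERROR state.
nova list | grep ERROR

# 4) Delete only the failed instance; with the fix, ACTIVE nodes must survive.
openstack overcloud node delete --templates --stack overcloud <ERROR-instance-ID>

# Confirm that only the requested node is gone and the stack converges.
nova list
heat stack-list
```

The verification below follows the same flow against openstack-heat-2015.1.1-4.el7ost.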
Verified with: openstack-heat-2015.1.1-4.el7ost.noarch

Thanks jprovazn for reproduction help:

```
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks              |
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+
| d05f98fc-585b-4c6c-9221-7faf0ed66af1 | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.14 |
| 15b0aa8a-c858-4317-932a-eaab124f871f | overcloud-compute-1    | ERROR  | -          | NOSTATE     |                       |
| 5c4b6e52-f3ab-475e-946d-db44ef16d896 | overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.0.15 |
| db68c6d5-6ac6-49e4-8b37-59c36800446c | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.13 |
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+

openstack overcloud node delete --templates --stack overcloud 15b0aa8a-c858-4317-932a-eaab124f871f

[stack@undercloud ~]$ nova list
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks              |
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+
| d05f98fc-585b-4c6c-9221-7faf0ed66af1 | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.14 |
| 5c4b6e52-f3ab-475e-946d-db44ef16d896 | overcloud-compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.0.15 |
| db68c6d5-6ac6-49e4-8b37-59c36800446c | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.13 |
+--------------------------------------+------------------------+--------+------------+-------------+-----------------------+

[stack@undercloud ~]$ heat stack-list
+--------------------------------------+------------+-----------------+----------------------+
| id                                   | stack_name | stack_status    | creation_time        |
+--------------------------------------+------------+-----------------+----------------------+
| 8dbb7631-3b07-4fd8-874a-7a2502b7b018 | overcloud  | UPDATE_COMPLETE | 2015-09-22T04:46:27Z |
+--------------------------------------+------------+-----------------+----------------------+
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1862