Bug 1313885
| Summary: | Ability to recover from RHELUnregistrationDeployment if the nodes are gone | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Juran <djuran> |
| Component: | openstack-heat | Assignee: | Dan Macpherson <dmacpher> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Amit Ugol <augol> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.0 (Kilo) | CC: | dmacpher, gchenuet, ipilcher, jason.dobies, jliberma, mburns, mschuppe, nchandek, rhel-osp-director-maint, rlondhe, sbaker, shardy, srevivo, zbitter |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | 8.0 (Liberty) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Known Issue | |
| Doc Text: |
Cause: When deleting the overcloud, the RHN unregistration step can hang if there is a problem with the node being deleted.
Consequence: The stack delete will wait until the unregister step times out, which makes the delete appear to be hung.
Fix: The unregistration step is, in Heat terminology, a "software deployment". Deployments wait for a signal from the node before moving out of the "in progress" state.
This signal can be manually sent to the stack. The first step is to determine the name of the nested stack where the deployment exists:
heat resource-list -n5 overcloud | grep RHELUnregistrationDeployment
The "stack_name" column of that output holds the value to pass as <nested-stack-name> in the following command:
heat resource-signal <nested-stack-name> RHELUnregistrationDeployment
Result: The resource-signal command will allow Heat to move past the unregistration step and finish the delete.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-02-03 13:52:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
David Juran
2016-03-02 14:31:33 UTC
Another option is to perform the scale-down steps for any node which was removed outside of heat. Setting signal_transport:NO_SIGNAL is risky as the server may be halted before the unregister has a chance to run.

One possible fix which could happen in heat (and be backported) is for the deployment resource to check the nova server exists on DELETE, and if it doesn't then behave like NO_SIGNAL. Will assign to heat just so we can discuss.

RFE filed for deployment timeouts: https://bugs.launchpad.net/heat/+bug/1557764

Otherwise, I'm going to flag this as a doctext bug and write up the resource-signal approach.

This is fixed in Newton by the linked OpenStack Gerrit review. For RHOS 8 (Liberty) we should first document the workaround in the release notes before considering a backport.

Is the workaround the same for OSP 7?

Yes.

*** Bug 1312989 has been marked as a duplicate of this bug. ***

Fixed in OSP 10. Use the linked workaround https://access.redhat.com/site/solutions/2260561 for OSP 7/8/9.

*** Bug 1652784 has been marked as a duplicate of this bug. ***
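
The two commands from the Doc Text workaround (list the nested resources, then signal each stuck RHELUnregistrationDeployment) could be combined roughly as follows. This is only a sketch: the function names `nested_stacks` and `signal_unregistration` are hypothetical, and it assumes that `heat resource-list` prints a pipe-delimited table whose header row contains a "stack_name" column, as the Doc Text describes. Check the output of the listing step by hand before signalling anything on a real deployment.

```shell
# Print the stack_name value of every row that mentions
# RHELUnregistrationDeployment. Reads `heat resource-list` output on stdin.
# Assumption: the table is pipe-delimited with a "stack_name" header column.
nested_stacks() {
    awk -F'|' '
        # Until the header row is found, look for the stack_name column.
        col == 0 {
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/^[ \t]+|[ \t]+$/, "", name)
                if (name == "stack_name") col = i
            }
            next
        }
        # After the header: emit the stack name of each matching row.
        /RHELUnregistrationDeployment/ {
            val = $col
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (val != "") print val
        }
    '
}

# Send the signal to each nested stack found under the given stack
# (default: overcloud), using the same two heat commands as the Doc Text.
signal_unregistration() {
    heat resource-list -n5 "${1:-overcloud}" | nested_stacks | sort -u |
    while read -r stack; do
        heat resource-signal "$stack" RHELUnregistrationDeployment < /dev/null
    done
}
```

Running `signal_unregistration overcloud` after sourcing these functions would issue one `heat resource-signal` per nested stack. Looking the column up by its header name, rather than by position, is a deliberate choice here because the exact column order of the listing is not stated in this bug.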