Bug 1360421
Summary: | rhel-osp-director: Attempted to scale +1 compute after upgrade 8.0->9.0, without "openstack baremetal configure boot" - the setup is in a bad state, can't fix. | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> |
Component: | rhosp-director | Assignee: | Brad P. Crochet <brad> |
Status: | CLOSED WONTFIX | QA Contact: | Omri Hochman <ohochman> |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 9.0 (Mitaka) | CC: | bnemec, dbecker, jason.dobies, jcoufal, mburns, morazi, rhel-osp-director-maint, tvignaud |
Target Milestone: | ga | Keywords: | Triaged |
Target Release: | 9.0 (Mitaka) | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-08-02 13:21:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Alexander Chuzhoy
2016-07-26 17:03:26 UTC
On further review, this looks like we are putting the cloud in a bad state and then trying to scale out. Should this get similar treatment to https://bugzilla.redhat.com/show_bug.cgi?id=1356777, i.e. can we document "make sure the cloud is in a reasonable state before trying scale, update, or upgrade type operations"?

The concerning thing here is that forgetting to run configure boot can leave your cloud in an unrecoverable state (this is also an example of why validation errors should be fatal by default). It _looks_ to me like this may have triggered a rebuild of all the existing nodes, based on the fact that the previously deployed instances have all gone to error state too (unless the initial deploy failed, in which case we are back to "make sure your cloud is in a consistent state", but it's not clear to me whether that is the case here). So I'm not sure we can call this a doc text-only bug; it may very well be related to the node rebuild bug Brad is looking into, and may be fixed when that one is.

Closing this out for 9, as it represents an unlikely case. This can be addressed via a new bug for 10 to handle such CLI interactions.

Note that I went ahead and pushed a patch upstream to make this sort of error fatal, so we won't mistakenly try to deploy when the nodes are in a bad state: https://review.openstack.org/349609. Hopefully that will at least help with similar situations in the future.
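For reference, a rough sketch (not taken from this report) of the pre-scale sequence being discussed, using the OSP 9 / Mitaka-era CLI on the undercloud; the environment file path and the compute count are placeholders and must match whatever was used for the original deployment:

```shell
# With the undercloud's stackrc sourced, sanity-check the overcloud before
# any scale/update/upgrade operation: Ironic nodes should be active and the
# previously deployed overcloud instances should not be in error state.
ironic node-list
nova list

# Re-run the boot configuration step that was skipped in this report, so
# every registered node has the correct deploy kernel/ramdisk assigned.
openstack baremetal configure boot

# Only then re-run the deploy command to scale out. The environment file and
# --compute-scale value below are placeholders for this sketch.
openstack overcloud deploy --templates \
  -e /home/stack/my-environment.yaml \
  --compute-scale 2
```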