Bug 1394145
| Summary: | Unable to add node to deployment after one node was deleted | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
| Component: | openstack-tripleo-common | Assignee: | Brad P. Crochet <brad> |
| Status: | CLOSED ERRATA | QA Contact: | Omri Hochman <ohochman> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 10.0 (Newton) | CC: | agurenko, dbecker, dprince, jcoufal, jschluet, jslagle, jstransk, mburns, mcornea, morazi, rhel-osp-director-maint, slinaber |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 10.0 (Newton) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-common-5.4.0-2.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-12-14 16:31:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Marius Cornea
2016-11-11 08:02:59 UTC
It appears that this is not strictly related to the Ceph node replacement scenario. I tried a different flow and was able to reproduce the issue.

Deploy with 2 compute nodes:

    [stack@undercloud-0 ~]$ nova list
    +--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
    | ID                                   | Name                      | Status | Task State | Power State | Networks              |
    +--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
    | b847cf75-ddf6-4e8c-aed0-bd5c7eb5e14f | overcloud-compute-0       | ACTIVE | -          | Running     | ctlplane=192.168.0.17 |
    | cbb4ced7-86df-4b0a-97df-3032490ba994 | overcloud-compute-1       | ACTIVE | -          | Running     | ctlplane=192.168.0.26 |
    | 070dc48a-8b77-475b-a7b3-39652a242327 | overcloud-controller-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.25 |
    | 648182c7-cee3-4727-92e7-3c9dd68ca57d | overcloud-controller-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.16 |
    | 7e51fda0-d673-4a6b-a987-5ced1be4e286 | overcloud-controller-2    | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
    | 8043dc79-fe94-4240-8434-acac44f25be2 | overcloud-objectstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
    | ba1519e9-b3f2-4c21-a5d7-bb06f2883952 | overcloud-serviceapi-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.24 |
    | a3ce24bb-7592-4a87-ab10-51f35bfcba2a | overcloud-serviceapi-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.20 |
    +--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+

Delete one compute node:

    [stack@undercloud-0 ~]$ openstack overcloud node delete --stack overcloud cbb4ced7-86df-4b0a-97df-3032490ba994

Rerun the initial deploy command, which requests 2 compute nodes:

    source ~/stackrc
    export THT=/usr/share/openstack-tripleo-heat-templates/
    openstack overcloud deploy --templates $THT \
      -r ~/openstack_deployment/roles/roles_data_extceph.yaml \
      -e $THT/environments/network-isolation.yaml \
      -e $THT/environments/network-management.yaml \
      -e $THT/environments/storage-environment.yaml \
      -e $THT/environments/puppet-ceph-external.yaml \
      -e $THT/environments/tls-endpoints-public-ip.yaml \
      -e ~/openstack_deployment/environments/nodes.yaml \
      -e ~/openstack_deployment/environments/network-environment.yaml \
      -e ~/openstack_deployment/environments/disk-layout.yaml \
      -e ~/openstack_deployment/environments/neutron-settings.yaml \
      -e ~/openstack_deployment/environments/external-ceph.yaml \
      -e ~/openstack_deployment/environments/enable-tls.yaml \
      -e ~/openstack_deployment/environments/inject-trust-anchor.yaml

openstack_deployment/environments/nodes.yaml:

    parameter_defaults:
      ControllerCount: 3
      ComputeCount: 2
      ServiceApiCount: 2
      ObjectStorageCount: 1
      OvercloudControlFlavor: controller-d75f3dec-c770-5f88-9d4c-3fea1bf9c484
      OvercloudComputeFlavor: compute-b634c10a-570f-59ba-bdbf-0c313d745a10
      OvercloudServiceApiFlavor: serviceapi-84179870-b628-5ad5-b79e-da38a9f5e8d6
      OvercloudSwiftStorageFlavor: swift-708a7c03-e751-529d-b4eb-2f2c3378713b

The stack gets updated, but only one compute node is deployed:

    [stack@undercloud-0 ~]$ openstack stack list
    +--------------------------------------+------------+-----------------+----------------------+----------------------+
    | ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
    +--------------------------------------+------------+-----------------+----------------------+----------------------+
    | 4d2cd13b-80b1-4ffc-bacf-1025502a2074 | overcloud  | UPDATE_COMPLETE | 2016-11-10T14:33:59Z | 2016-11-11T08:31:41Z |
    +--------------------------------------+------------+-----------------+----------------------+----------------------+
    [stack@undercloud-0 ~]$ nova list
    +--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
    | ID                                   | Name                      | Status | Task State | Power State | Networks              |
    +--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
    | b847cf75-ddf6-4e8c-aed0-bd5c7eb5e14f | overcloud-compute-0       | ACTIVE | -          | Running     | ctlplane=192.168.0.17 |
    | 070dc48a-8b77-475b-a7b3-39652a242327 | overcloud-controller-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.25 |
    | 648182c7-cee3-4727-92e7-3c9dd68ca57d | overcloud-controller-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.16 |
    | 7e51fda0-d673-4a6b-a987-5ced1be4e286 | overcloud-controller-2    | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
    | 8043dc79-fe94-4240-8434-acac44f25be2 | overcloud-objectstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
    | ba1519e9-b3f2-4c21-a5d7-bb06f2883952 | overcloud-serviceapi-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.24 |
    | a3ce24bb-7592-4a87-ab10-51f35bfcba2a | overcloud-serviceapi-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.20 |
    +--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+

I wonder if this could be a parameters vs. parameter_defaults issue? Can you provide the output of:

    openstack stack show overcloud
    openstack stack environment show overcloud

Created attachment 1219793
output
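The symptom in the description (nodes.yaml asks for `ComputeCount: 2`, yet `nova list` shows only `overcloud-compute-0` after an `UPDATE_COMPLETE`) can be checked mechanically by comparing per-role server counts with the requested `*Count` values. The following is a minimal sketch, assuming server names follow the `overcloud-<role>-<index>` pattern seen in this report; `ROLE_TO_COUNT_PARAM`, `count_by_role` and `count_mismatches` are hypothetical helpers, not part of TripleO or python-novaclient.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical, deployment-specific mapping from the role segment of a server
# name to the matching *Count parameter, taken from this report's nodes.yaml.
ROLE_TO_COUNT_PARAM = {
    "controller": "ControllerCount",
    "compute": "ComputeCount",
    "serviceapi": "ServiceApiCount",
    "objectstorage": "ObjectStorageCount",
}

# Assumed naming pattern: matches "overcloud-compute-0" and captures "compute".
NAME_RE = re.compile(r"^overcloud-(?P<role>[a-z]+)-\d+$")


def count_by_role(server_names: Iterable[str]) -> Counter:
    """Count servers per role segment; names that do not match are skipped."""
    counts: Counter = Counter()
    for name in server_names:
        match = NAME_RE.match(name)
        if match:
            counts[match.group("role")] += 1
    return counts


def count_mismatches(
    server_names: Iterable[str], requested: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {CountParam: (requested, deployed)} for each role that differs."""
    deployed = count_by_role(server_names)
    mismatches: Dict[str, Tuple[int, int]] = {}
    for role, param in ROLE_TO_COUNT_PARAM.items():
        want = requested.get(param, 0)
        have = deployed.get(role, 0)
        if want != have:
            mismatches[param] = (want, have)
    return mismatches


# Server names from the second `nova list` in the description, plus the
# counts from nodes.yaml.
after_update = [
    "overcloud-compute-0",
    "overcloud-controller-0",
    "overcloud-controller-1",
    "overcloud-controller-2",
    "overcloud-objectstorage-0",
    "overcloud-serviceapi-0",
    "overcloud-serviceapi-1",
]
nodes_yaml = {
    "ControllerCount": 3,
    "ComputeCount": 2,
    "ServiceApiCount": 2,
    "ObjectStorageCount": 1,
}
print(count_mismatches(after_update, nodes_yaml))  # {'ComputeCount': (2, 1)}
```

Only the compute role is reported, which matches the observation that the update completed without bringing the compute count back to 2.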
The scale down code in tripleo-common seems to use 'parameters' instead of 'parameter_defaults'. I think it does this by generating a set of Count parameters on the fly here: http://git.openstack.org/cgit/openstack/tripleo-common/tree/tripleo_common/actions/scale.py#n99

I'm wondering if a workaround might be to simply override the Count manually via an environment like this:

    parameters:
      CephStorageCount: 3

And then include that environment on the CLI with -e.

Using the environment file with a parameters section the next time you scale up is a workaround. But it requires the user to carry that environment file around indefinitely, or at least until we come up with a proper fix. And when we do have the fix, we'd have to document that you no longer need the environment file, and take steps to clear out the parameters. So my impression is that this is something we need to go ahead and fix properly for OSP 10. jarda, any input here?

I agree, James.

Patch proposed: https://review.openstack.org/#/c/396712/
Tested by scaling down from 2 computes to 1 and then back to 2.

Created attachment 1219833
fixed - stack env after deploy
Created attachment 1219834
fixed - stack env after scale down
Created attachment 1219835
fixed - stack env after scale up again
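The scale-down/scale-up round trip discussed above can be modelled in a few lines. This is a minimal sketch, assuming Heat's precedence rule (a value under `parameters` overrides one under `parameter_defaults`; within a section, a later environment overrides an earlier one) and assuming the environment written by scale down stays in effect on the next deploy. `scale_down_env`, `effective_count` and `round_trip` are hypothetical helpers; this is not the tripleo-common code or the patch under review.

```python
from typing import Dict, List

# An environment is modelled as {section: {parameter: value}}.
Env = Dict[str, Dict[str, int]]


def scale_down_env(role: str, new_count: int, section: str) -> Env:
    """Hypothetical: the environment a scale-down step writes for one role.

    section is "parameters" (the behaviour described in this bug) or
    "parameter_defaults" (the direction discussed as the proper fix).
    """
    return {section: {f"{role}Count": new_count}}


def effective_count(role: str, envs: List[Env]) -> int:
    """Resolve <role>Count across environments applied in order.

    Later environments override earlier ones within each section, and any
    value under "parameters" beats one under "parameter_defaults".
    """
    key = f"{role}Count"
    params: Dict[str, int] = {}
    defaults: Dict[str, int] = {}
    for env in envs:
        params.update(env.get("parameters", {}))
        defaults.update(env.get("parameter_defaults", {}))
    if key in params:
        return params[key]
    return defaults[key]


def round_trip(section: str) -> int:
    """Deploy 2 computes, scale down to 1, then rerun the deploy asking for 2."""
    nodes_yaml: Env = {"parameter_defaults": {"ComputeCount": 2}}
    envs = [nodes_yaml, scale_down_env("Compute", 1, section), nodes_yaml]
    return effective_count("Compute", envs)


print(round_trip("parameters"))          # 1: the stale override wins, as reported
print(round_trip("parameter_defaults"))  # 2: the rerun deploy's value wins
```

The model also shows why the `parameters` workaround has to be carried around: once a value lands in `parameters`, no `parameter_defaults` entry can override it until that `parameters` entry is itself replaced or cleared.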
Added the output of `openstack stack environment show overcloud` at a few points, with the fix included.

    [stack@instack ~]$ nova list
    +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
    | ID                                   | Name                    | Status | Task State | Power State | Networks            |
    +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
    | 1c78dd81-5936-42bc-acce-bd94cd8df40a | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
    | 0d525d8f-f58b-4e05-9ba6-bb95758c5ee9 | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.13 |
    | 1d509463-5679-4ae0-ad24-77ae9c45219c | overcloud-novacompute-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.14 |
    +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
    [stack@instack ~]$ heat stack-list
    WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
    +--------------------------------------+------------+-----------------+----------------------+----------------------+
    | id                                   | stack_name | stack_status    | creation_time        | updated_time         |
    +--------------------------------------+------------+-----------------+----------------------+----------------------+
    | 2438647d-95c6-496d-b0cf-c58d119a026b | overcloud  | UPDATE_COMPLETE | 2016-11-11T16:45:02Z | 2016-11-11T17:37:55Z |
    +--------------------------------------+------------+-----------------+----------------------+----------------------+

The upstream master patch has merged; it was also proposed to and merged into the stable/newton branch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html