Description of problem:
I'm testing the Ceph node replacement procedure and after deleting the

Version-Release number of selected component (if applicable):
Note: the testing includes the patch for BZ#1392995

How reproducible:
100%

Steps to Reproduce:
1. Start with a 3 x ctrl, 1 x compute, 3 x Ceph nodes deployment:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates \
-e $THT/environments/network-isolation.yaml \
-e ~/templates/network-environment.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/disk-layout.yaml \
--control-scale 3 \
--control-flavor controller \
--compute-scale 1 \
--compute-flavor compute \
--ceph-storage-scale 3 \
--ceph-storage-flavor ceph \
--ntp-server clock.ntp.com \
--log-file overcloud_deployment.log &> overcloud_install.log

2. Stop one of the Ceph storage nodes.

3. Disable the OSDs running on the stopped node and remove them from the CRUSH map, following the procedure in https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Replacing_Ceph_Storage_Nodes

4. Delete the Ceph node:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud node delete --stack overcloud --templates $THT \
-e $THT/environments/network-isolation.yaml \
-e ~/templates/network-environment.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/disk-layout.yaml \
03915d83-6026-4a4f-9e93-a3807c9e0d8e

5. Add a Ceph node back to the deployment by rerunning the initial deploy command, which requests 3 Ceph nodes:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates \
-e $THT/environments/network-isolation.yaml \
-e ~/templates/network-environment.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/disk-layout.yaml \
--control-scale 3 \
--control-flavor controller \
--compute-scale 1 \
--compute-flavor compute \
--ceph-storage-scale 3 \
--ceph-storage-flavor ceph \
--ntp-server clock.ntp.com \
--log-file overcloud_deployment.log &> overcloud_install.log

Actual results:
The stack gets updated, but the deployment contains only 2 Ceph nodes:

[stack@undercloud ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| 15e73f98-d1bb-4ff7-b63a-1f64d18508bd | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.14 |
| 5a85a948-8a41-42cb-b825-ef62e3629c04 | overcloud-cephstorage-2 | ACTIVE | -          | Running     | ctlplane=192.168.0.20 |
| 9a62cea5-c724-4a4a-8323-5c8575b802c8 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=192.168.0.21 |
| 907e831e-7fac-4f51-ae06-9e162c0e95a7 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.0.22 |
| 01174281-925a-47e7-b353-733634906c71 | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| 014fb640-f449-4411-8dd3-b412265df39d | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.0.23 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
[stack@undercloud ~]$ mistral environment-get overcloud | grep Count
| | "ControllerCount": 3, |
| | "ComputeCount": 1, |
| | "CephStorageCount": 3, |

Expected results:
3 Ceph nodes are deployed by the stack instead of 2.

Additional info:
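Step 3 above defers to the linked replacement procedure for the OSD removal. A minimal sketch of that removal, assuming the stock Ceph CLI is run with admin credentials on a node that hosts a Ceph monitor (the controllers, in a default TripleO layout) and that the OSD IDs of the stopped node are already known. `run` and `remove_osds` are hypothetical helper names; with DRY_RUN=1 the sketch only prints the commands it would execute.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: for each OSD ID, mark it out, drop it from the CRUSH
# map, delete its auth key and remove it from the OSD map.
remove_osds() {
  local id
  for id in "$@"; do
    run sudo ceph osd out "$id"
    run sudo ceph osd crush remove "osd.$id"
    run sudo ceph auth del "osd.$id"
    run sudo ceph osd rm "$id"
  done
}
```

For example, `DRY_RUN=1 remove_osds 0 3` prints the eight commands for OSDs 0 and 3; dropping DRY_RUN executes them.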
It appears that this is not strictly related to the Ceph node replacement scenario. I tried a different flow and was able to reproduce the issue.

Deploy with 2 compute nodes:

[stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                      | Status | Task State | Power State | Networks              |
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
| b847cf75-ddf6-4e8c-aed0-bd5c7eb5e14f | overcloud-compute-0       | ACTIVE | -          | Running     | ctlplane=192.168.0.17 |
| cbb4ced7-86df-4b0a-97df-3032490ba994 | overcloud-compute-1       | ACTIVE | -          | Running     | ctlplane=192.168.0.26 |
| 070dc48a-8b77-475b-a7b3-39652a242327 | overcloud-controller-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.25 |
| 648182c7-cee3-4727-92e7-3c9dd68ca57d | overcloud-controller-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.16 |
| 7e51fda0-d673-4a6b-a987-5ced1be4e286 | overcloud-controller-2    | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
| 8043dc79-fe94-4240-8434-acac44f25be2 | overcloud-objectstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| ba1519e9-b3f2-4c21-a5d7-bb06f2883952 | overcloud-serviceapi-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.24 |
| a3ce24bb-7592-4a87-ab10-51f35bfcba2a | overcloud-serviceapi-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.20 |
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+

Delete one compute node:

[stack@undercloud-0 ~]$ openstack overcloud node delete --stack overcloud cbb4ced7-86df-4b0a-97df-3032490ba994

Rerun the initial deploy command, which requests 2 compute nodes:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/
openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data_extceph.yaml \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e $THT/environments/storage-environment.yaml \
-e $THT/environments/puppet-ceph-external.yaml \
-e $THT/environments/tls-endpoints-public-ip.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
-e ~/openstack_deployment/environments/external-ceph.yaml \
-e ~/openstack_deployment/environments/enable-tls.yaml \
-e ~/openstack_deployment/environments/inject-trust-anchor.yaml

openstack_deployment/environments/nodes.yaml:

parameter_defaults:
  ControllerCount: 3
  ComputeCount: 2
  ServiceApiCount: 2
  ObjectStorageCount: 1
  OvercloudControlFlavor: controller-d75f3dec-c770-5f88-9d4c-3fea1bf9c484
  OvercloudComputeFlavor: compute-b634c10a-570f-59ba-bdbf-0c313d745a10
  OvercloudServiceApiFlavor: serviceapi-84179870-b628-5ad5-b79e-da38a9f5e8d6
  OvercloudSwiftStorageFlavor: swift-708a7c03-e751-529d-b4eb-2f2c3378713b

The stack gets updated, but only one compute node is deployed:

[stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 4d2cd13b-80b1-4ffc-bacf-1025502a2074 | overcloud  | UPDATE_COMPLETE | 2016-11-10T14:33:59Z | 2016-11-11T08:31:41Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                      | Status | Task State | Power State | Networks              |
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
| b847cf75-ddf6-4e8c-aed0-bd5c7eb5e14f | overcloud-compute-0       | ACTIVE | -          | Running     | ctlplane=192.168.0.17 |
| 070dc48a-8b77-475b-a7b3-39652a242327 | overcloud-controller-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.25 |
| 648182c7-cee3-4727-92e7-3c9dd68ca57d | overcloud-controller-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.16 |
| 7e51fda0-d673-4a6b-a987-5ced1be4e286 | overcloud-controller-2    | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
| 8043dc79-fe94-4240-8434-acac44f25be2 | overcloud-objectstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| ba1519e9-b3f2-4c21-a5d7-bb06f2883952 | overcloud-serviceapi-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.24 |
| a3ce24bb-7592-4a87-ab10-51f35bfcba2a | overcloud-serviceapi-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.20 |
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
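Both reproducers come down to the same comparison: the count requested for a role (ComputeCount: 2 in nodes.yaml here, CephStorageCount: 3 in the first report) versus the servers that actually exist. A minimal sketch of that check, assuming server names follow the overcloud-<role>-<index> pattern seen in the nova list output above; `count_role` and `check_role_count` are hypothetical helpers that read one server name per line on stdin.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count names of the form overcloud-<role>-<N> on stdin.
count_role() {
  local role="$1"
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
  grep -cE -- "^overcloud-${role}-[0-9]+$" || true
}

# Hypothetical helper: compare the requested count with the deployed count.
check_role_count() {
  local role="$1" expected="$2" actual
  actual=$(count_role "$role")
  if [ "$actual" -ne "$expected" ]; then
    echo "MISMATCH: ${role} expected=${expected} actual=${actual}"
    return 1
  fi
  echo "OK: ${role}=${actual}"
}
```

On the undercloud this could be fed with `openstack server list -f value -c Name | check_role_count compute 2`, which would report a mismatch for the state shown above.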
I wonder if this could be a `parameters` vs `parameter_defaults` issue? Can you provide the output of:

openstack stack show overcloud
openstack stack environment show overcloud
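The question above hinges on Heat's precedence rule: a value given under `parameters` in an environment overrides the same key under `parameter_defaults`. A minimal sketch that applies that rule to the *Count keys of an environment read on stdin (for example, the output of `openstack stack environment show overcloud`), assuming plain two-level YAML indented with spaces. It ignores template defaults and other merge details, so it is a simplified model, not Heat's actual merge code; `effective_counts` is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<Key> <value> (<section that wins>)" for every
# *Count key, preferring "parameters" over "parameter_defaults".
effective_counts() {
  awk '
    /^[A-Za-z_]+:/ { section = $1; sub(/:$/, "", section); next }
    /^ +[A-Za-z]*Count:/ {
      key = $1; sub(/:$/, "", key)
      if (section == "parameters") { explicit[key] = $2; seen[key] = 1 }
      else if (section == "parameter_defaults") { dflt[key] = $2; seen[key] = 1 }
    }
    END {
      for (k in seen) {
        if (k in explicit) print k, explicit[k], "(parameters)"
        else print k, dflt[k], "(parameter_defaults)"
      }
    }
  ' | LC_ALL=C sort
}
```

If the stored environment held `ComputeCount: 1` under `parameters` while nodes.yaml passes `ComputeCount: 2` under `parameter_defaults`, this model reports `ComputeCount 1 (parameters)`, which would match the behaviour in the second reproducer.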
Created attachment 1219793: output
The scale down code in tripleo-common seems to use 'parameters' instead of 'parameter_defaults'. I think it does this by generating a set of Count parameters on the fly here:
http://git.openstack.org/cgit/openstack/tripleo-common/tree/tripleo_common/actions/scale.py#n99

I'm wondering if a workaround might be to simply override the Count manually via an environment like this:

parameters:
  CephStorageCount: 3

and then include that environment on the CLI with -e.
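The workaround suggested above can be scripted. A minimal sketch that writes such an environment file from Key=Value arguments; `write_count_override` and the file path in the usage line are hypothetical, not part of TripleO.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a Heat environment file that pins role counts
# under "parameters". Each argument after the path has the form Key=Value.
write_count_override() {
  local path="$1" pair
  shift
  {
    echo "parameters:"
    for pair in "$@"; do
      echo "  ${pair%%=*}: ${pair#*=}"
    done
  } > "$path"
}
```

For example, `write_count_override ~/templates/count-override.yaml CephStorageCount=3`, then append `-e ~/templates/count-override.yaml` as the last environment on the deploy command line. As noted in the next comment, the file then has to be carried along on every later deploy.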
Using the environment file with a parameters section the next time you scale up is a workaround. But it requires the user to carry that environment file around indefinitely, or at least until we come up with a proper fix. And once we do have the fix, we'd have to document that the environment file is no longer needed, and take steps to clear out parameters. So my impression is that this is something we need to go ahead and fix properly for OSP 10. Jarda, any input here?
I agree, James.
Patch proposed: https://review.openstack.org/#/c/396712/
Tested by scaling down from 2 compute nodes to 1 and then back to 2.
Created attachment 1219833: fixed - stack env after deploy
Created attachment 1219834: fixed - stack env after scale down
Created attachment 1219835: fixed - stack env after scale up again
Added `openstack stack environment show overcloud` output captured at several points with the fix included. After scaling down and back up:

[stack@instack ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| 1c78dd81-5936-42bc-acce-bd94cd8df40a | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
| 0d525d8f-f58b-4e05-9ba6-bb95758c5ee9 | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.13 |
| 1d509463-5679-4ae0-ad24-77ae9c45219c | overcloud-novacompute-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.14 |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
[stack@instack ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 2438647d-95c6-496d-b0cf-c58d119a026b | overcloud  | UPDATE_COMPLETE | 2016-11-11T16:45:02Z | 2016-11-11T17:37:55Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
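The environment dumps above can also be checked mechanically. A minimal sketch, assuming that with the fix the scale-down counts no longer land under `parameters` (as the diagnosis earlier in this bug suggests) and the same simplified two-level YAML as the stored environment output; `pinned_counts` and `assert_counts_not_pinned` are hypothetical helpers reading the environment on stdin.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every *Count key found under the top-level
# "parameters" section of a Heat environment read on stdin.
pinned_counts() {
  awk '
    /^[A-Za-z_]+:/ { section = $1; sub(/:$/, "", section); next }
    section == "parameters" && /^ +[A-Za-z]*Count:/ {
      key = $1; sub(/:$/, "", key); print key
    }
  '
}

# Hypothetical helper: fail, listing the offending keys, if any Count is
# still pinned under "parameters".
assert_counts_not_pinned() {
  local pinned
  pinned=$(pinned_counts)
  if [ -n "$pinned" ]; then
    printf '%s\n' "$pinned" | sed 's/^/pinned under parameters: /'
    return 1
  fi
  echo "OK: no Count keys under parameters"
}
```

A possible use after each scale step: `openstack stack environment show overcloud | assert_counts_not_pinned`.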
The patch merged on upstream master and was also proposed to and merged in the stable/newton branch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html