Bug 1394145 - Unable to add node to deployment after one node was deleted
Summary: Unable to add node to deployment after one node was deleted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Brad P. Crochet
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-11 08:02 UTC by Marius Cornea
Modified: 2016-12-14 16:31 UTC

Fixed In Version: openstack-tripleo-common-5.4.0-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 16:31:38 UTC


Attachments
output (245.70 KB, text/plain), 2016-11-11 13:23 UTC, Marius Cornea
fixed - stack env after deploy (37.48 KB, text/plain), 2016-11-11 18:14 UTC, Jiri Stransky
fixed - stack env after scale down (37.59 KB, text/plain), 2016-11-11 18:14 UTC, Jiri Stransky
fixed - stack env after scale up again (37.59 KB, text/plain), 2016-11-11 18:15 UTC, Jiri Stransky


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2016:2948 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 10 enhancement update | 2016-12-14 19:55:27 UTC
OpenStack gerrit | 396804 | None | None | None | 2016-11-12 05:58:31 UTC
Launchpad | 1641142 | None | None | None | 2016-11-11 15:43:46 UTC

Description Marius Cornea 2016-11-11 08:02:59 UTC
Description of problem:
I'm testing the Ceph node replacement procedure and after deleting the 

Version-Release number of selected component (if applicable):

Note: the testing includes the patch for BZ#1392995

How reproducible:
100%

Steps to Reproduce:
1. Start with a deployment of 3 controller, 1 compute, and 3 Ceph nodes:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates \
-e $THT/environments/network-isolation.yaml \
-e ~/templates/network-environment.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/disk-layout.yaml \
--control-scale 3 \
--control-flavor controller \
--compute-scale 1 \
--compute-flavor compute \
--ceph-storage-scale 3 \
--ceph-storage-flavor ceph \
--ntp-server clock.ntp.com  \
--log-file overcloud_deployment.log &> overcloud_install.log

2. Stop one of the Ceph storage nodes

3. Disable the OSDs running on the stopped node and remove them from the CRUSH map, following the procedure in
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Replacing_Ceph_Storage_Nodes

4. Delete the Ceph node:
source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud node delete --stack overcloud --templates $THT \
-e $THT/environments/network-isolation.yaml \
-e ~/templates/network-environment.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/disk-layout.yaml \
03915d83-6026-4a4f-9e93-a3807c9e0d8e

5. Add a Ceph node back to the deployment by rerunning the initial deploy command, which specifies 3 Ceph nodes:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates \
-e $THT/environments/network-isolation.yaml \
-e ~/templates/network-environment.yaml \
-e $THT/environments/storage-environment.yaml \
-e ~/templates/disk-layout.yaml \
--control-scale 3 \
--control-flavor controller \
--compute-scale 1 \
--compute-flavor compute \
--ceph-storage-scale 3 \
--ceph-storage-flavor ceph \
--ntp-server clock.ntp.com  \
--log-file overcloud_deployment.log &> overcloud_install.log

Actual results:
Stack gets updated but the deployment contains only 2 Ceph nodes:

[stack@undercloud ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| 15e73f98-d1bb-4ff7-b63a-1f64d18508bd | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.14 |
| 5a85a948-8a41-42cb-b825-ef62e3629c04 | overcloud-cephstorage-2 | ACTIVE | -          | Running     | ctlplane=192.168.0.20 |
| 9a62cea5-c724-4a4a-8323-5c8575b802c8 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=192.168.0.21 |
| 907e831e-7fac-4f51-ae06-9e162c0e95a7 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.0.22 |
| 01174281-925a-47e7-b353-733634906c71 | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| 014fb640-f449-4411-8dd3-b412265df39d | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.0.23 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+

[stack@undercloud ~]$ mistral environment-get overcloud | grep Count
|             |         "ControllerCount": 3,      |
|             |         "ComputeCount": 1,         |
|             |         "CephStorageCount": 3,     |

Expected results:
3 Ceph nodes are deployed by the stack instead of 2. 
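
The mismatch between the requested count and the deployed nodes can be checked mechanically. The following is a minimal Python sketch, not part of the original report, that counts servers per role from the text of a `nova list` table. The helper name `count_nodes_by_role` is hypothetical, and the regular expression assumes hostnames of the form <stack>-<role>-<index>, as seen in the output above.

```python
import re
from collections import Counter


def count_nodes_by_role(nova_list_output):
    """Count servers per role from the text of a `nova list` table.

    Assumption: hostnames follow the <stack>-<role>-<index> pattern.
    """
    counts = Counter()
    for line in nova_list_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Border lines contain no "|" and the header row starts with "ID".
        if len(cells) < 3 or cells[1] == "ID":
            continue
        match = re.match(r"^[^-]+-(.+)-\d+$", cells[2])
        if match:
            counts[match.group(1)] += 1
    return counts


# Abbreviated copy of the table from the report (first three columns only).
sample = """\
+--------------------------------------+-------------------------+--------+
| ID                                   | Name                    | Status |
+--------------------------------------+-------------------------+--------+
| 15e73f98-d1bb-4ff7-b63a-1f64d18508bd | overcloud-cephstorage-1 | ACTIVE |
| 5a85a948-8a41-42cb-b825-ef62e3629c04 | overcloud-cephstorage-2 | ACTIVE |
| 9a62cea5-c724-4a4a-8323-5c8575b802c8 | overcloud-compute-0     | ACTIVE |
+--------------------------------------+-------------------------+--------+
"""

counts = count_nodes_by_role(sample)
expected_ceph = 3  # value passed as --ceph-storage-scale in the deploy command
print("cephstorage deployed:", counts["cephstorage"], "expected:", expected_ceph)
```

Comparing the result against the count parameters reported by the plan environment makes the discrepancy explicit: the environment says 3, the table shows 2.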

Additional info:

Comment 2 Marius Cornea 2016-11-11 08:50:30 UTC
It appears that this is not strictly related to the Ceph node replacement scenario.

I tried a different flow and I was able to reproduce the issue:

Deploy with 2 compute nodes:
[stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                      | Status | Task State | Power State | Networks              |
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
| b847cf75-ddf6-4e8c-aed0-bd5c7eb5e14f | overcloud-compute-0       | ACTIVE | -          | Running     | ctlplane=192.168.0.17 |
| cbb4ced7-86df-4b0a-97df-3032490ba994 | overcloud-compute-1       | ACTIVE | -          | Running     | ctlplane=192.168.0.26 |
| 070dc48a-8b77-475b-a7b3-39652a242327 | overcloud-controller-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.25 |
| 648182c7-cee3-4727-92e7-3c9dd68ca57d | overcloud-controller-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.16 |
| 7e51fda0-d673-4a6b-a987-5ced1be4e286 | overcloud-controller-2    | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
| 8043dc79-fe94-4240-8434-acac44f25be2 | overcloud-objectstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| ba1519e9-b3f2-4c21-a5d7-bb06f2883952 | overcloud-serviceapi-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.24 |
| a3ce24bb-7592-4a87-ab10-51f35bfcba2a | overcloud-serviceapi-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.20 |
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+

Delete one compute node:
[stack@undercloud-0 ~]$ openstack overcloud node delete --stack overcloud cbb4ced7-86df-4b0a-97df-3032490ba994

Rerun the initial deploy command, which specifies 2 compute nodes:

source ~/stackrc
export THT=/usr/share/openstack-tripleo-heat-templates/

openstack overcloud deploy --templates $THT \
-r ~/openstack_deployment/roles/roles_data_extceph.yaml \
-e $THT/environments/network-isolation.yaml \
-e $THT/environments/network-management.yaml \
-e $THT/environments/storage-environment.yaml \
-e $THT/environments/puppet-ceph-external.yaml \
-e $THT/environments/tls-endpoints-public-ip.yaml \
-e ~/openstack_deployment/environments/nodes.yaml \
-e ~/openstack_deployment/environments/network-environment.yaml \
-e ~/openstack_deployment/environments/disk-layout.yaml \
-e ~/openstack_deployment/environments/neutron-settings.yaml \
-e ~/openstack_deployment/environments/external-ceph.yaml \
-e ~/openstack_deployment/environments/enable-tls.yaml \
-e ~/openstack_deployment/environments/inject-trust-anchor.yaml

Contents of ~/openstack_deployment/environments/nodes.yaml:
parameter_defaults:
  ControllerCount: 3
  ComputeCount: 2
  ServiceApiCount: 2
  ObjectStorageCount: 1

  OvercloudControlFlavor: controller-d75f3dec-c770-5f88-9d4c-3fea1bf9c484
  OvercloudComputeFlavor: compute-b634c10a-570f-59ba-bdbf-0c313d745a10
  OvercloudServiceApiFlavor: serviceapi-84179870-b628-5ad5-b79e-da38a9f5e8d6
  OvercloudSwiftStorageFlavor: swift-708a7c03-e751-529d-b4eb-2f2c3378713b


The stack gets updated but only one compute node is deployed:

[stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 4d2cd13b-80b1-4ffc-bacf-1025502a2074 | overcloud  | UPDATE_COMPLETE | 2016-11-10T14:33:59Z | 2016-11-11T08:31:41Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
[stack@undercloud-0 ~]$ nova list
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                      | Status | Task State | Power State | Networks              |
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+
| b847cf75-ddf6-4e8c-aed0-bd5c7eb5e14f | overcloud-compute-0       | ACTIVE | -          | Running     | ctlplane=192.168.0.17 |
| 070dc48a-8b77-475b-a7b3-39652a242327 | overcloud-controller-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.25 |
| 648182c7-cee3-4727-92e7-3c9dd68ca57d | overcloud-controller-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.16 |
| 7e51fda0-d673-4a6b-a987-5ced1be4e286 | overcloud-controller-2    | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
| 8043dc79-fe94-4240-8434-acac44f25be2 | overcloud-objectstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| ba1519e9-b3f2-4c21-a5d7-bb06f2883952 | overcloud-serviceapi-0    | ACTIVE | -          | Running     | ctlplane=192.168.0.24 |
| a3ce24bb-7592-4a87-ab10-51f35bfcba2a | overcloud-serviceapi-1    | ACTIVE | -          | Running     | ctlplane=192.168.0.20 |
+--------------------------------------+---------------------------+--------+------------+-------------+-----------------------+

Comment 3 James Slagle 2016-11-11 13:14:01 UTC
I wonder if this could be a parameters vs. parameter_defaults issue?

Can you provide the output of:
openstack stack show overcloud
openstack stack environment show overcloud

Comment 4 Marius Cornea 2016-11-11 13:23:45 UTC
Created attachment 1219793 [details]
output

Comment 6 Dan Prince 2016-11-11 13:56:12 UTC
The scale-down code in tripleo-common seems to use 'parameters' instead of 'parameter_defaults'. I think it does this by generating a set of Count parameters on the fly here:

http://git.openstack.org/cgit/openstack/tripleo-common/tree/tripleo_common/actions/scale.py#n99

I'm wondering if a workaround might be to simply override the Count manually via an environment like this:

parameters:
  CephStorageCount: 3


And then include that environment on the CLI with -e.
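
A simplified model may help explain this diagnosis. In a Heat environment, a value under `parameters` takes precedence over one under `parameter_defaults`, so a count left under `parameters` by the scale-down would mask the count passed on a later deploy. The sketch below is illustrative Python, not Heat or tripleo-common code; `effective_value` and the dictionaries are hypothetical, and it assumes the deploy command stores its counts under `parameter_defaults`.

```python
def effective_value(environment, name):
    """Resolve a top-level stack parameter in a simplified environment model.

    An entry under `parameters` wins over one under `parameter_defaults`.
    """
    params = environment.get("parameters", {})
    defaults = environment.get("parameter_defaults", {})
    if name in params:
        return params[name]
    return defaults.get(name)


# Assumed state after the node delete: the reduced count sits under
# `parameters`, while the redeploy only updates `parameter_defaults`.
stale_env = {
    "parameter_defaults": {"CephStorageCount": 3},
    "parameters": {"CephStorageCount": 2},
}

# Assumed state with the workaround environment applied: the explicit
# `parameters` entry is overridden with the desired count.
workaround_env = {
    "parameter_defaults": {"CephStorageCount": 3},
    "parameters": {"CephStorageCount": 3},
}

print(effective_value(stale_env, "CephStorageCount"))
print(effective_value(workaround_env, "CephStorageCount"))
```

Under these assumptions the stale environment resolves to 2 even though 3 was requested, which matches the symptom in the description.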

Comment 8 James Slagle 2016-11-11 15:01:47 UTC
Using an environment file with a parameters section the next time you scale up is a workaround. But it requires the user to carry that environment file around indefinitely, or at least until we come up with a proper fix.

And then when we do have the fix, we'd have to document that you don't have to use the environment file anymore, and take steps to clear out parameters.
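
As an illustration of the cleanup step mentioned here, the sketch below drops `*Count` entries from the `parameters` section of an environment so that values under `parameter_defaults` apply again. It is a hypothetical helper operating on a plain dictionary, not a tripleo-common API, and the thread does not say how such cleanup would actually be performed.

```python
def drop_count_parameters(environment):
    """Return a copy of the environment without *Count keys under `parameters`.

    Hypothetical helper: other sections and non-count parameters are kept.
    """
    cleaned = dict(environment)
    cleaned["parameters"] = {
        key: value
        for key, value in environment.get("parameters", {}).items()
        if not key.endswith("Count")
    }
    return cleaned


env_with_workaround = {
    "parameter_defaults": {"CephStorageCount": 3},
    "parameters": {"CephStorageCount": 3, "NtpServer": "clock.ntp.com"},
}

cleaned_env = drop_count_parameters(env_with_workaround)
print(cleaned_env["parameters"])
```

The input dictionary is left unmodified, so the before and after states can be compared when verifying the cleanup.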

So my impression is that this is something we need to go ahead and fix properly for OSP 10.

Jarda, any input here?

Comment 9 Jaromir Coufal 2016-11-11 15:08:46 UTC
I agree, James.

Comment 10 Jiri Stransky 2016-11-11 18:03:00 UTC
Patch proposed:

https://review.openstack.org/#/c/396712/

Tested by scaling down from 2 computes to 1 and then back to 2.

Comment 11 Jiri Stransky 2016-11-11 18:14:30 UTC
Created attachment 1219833 [details]
fixed - stack env after deploy

Comment 12 Jiri Stransky 2016-11-11 18:14:58 UTC
Created attachment 1219834 [details]
fixed - stack env after scale down

Comment 13 Jiri Stransky 2016-11-11 18:15:38 UTC
Created attachment 1219835 [details]
fixed - stack env after scale up again

Comment 14 Jiri Stransky 2016-11-11 18:16:52 UTC
Attached the output of `openstack stack environment show overcloud`, captured at various points with the fix included.

[stack@instack ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| 1c78dd81-5936-42bc-acce-bd94cd8df40a | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
| 0d525d8f-f58b-4e05-9ba6-bb95758c5ee9 | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.13 |
| 1d509463-5679-4ae0-ad24-77ae9c45219c | overcloud-novacompute-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.14 |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+


[stack@instack ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time         |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 2438647d-95c6-496d-b0cf-c58d119a026b | overcloud  | UPDATE_COMPLETE | 2016-11-11T16:45:02Z | 2016-11-11T17:37:55Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+

Comment 16 Jon Schlueter 2016-11-12 05:59:41 UTC
The upstream master patch has merged; it was also proposed and merged to the stable/newton branch.

Comment 22 errata-xmlrpc 2016-12-14 16:31:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

