Bug 1383636

Summary: [ocp-on-osp] Should use official way to scale up nodes
Product: OpenShift Container Platform
Component: Installer
Version: 3.3.0
Reporter: Wenkai Shi <weshi>
Assignee: Jan Provaznik <jprovazn>
QA Contact: Wenkai Shi <weshi>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
CC: aos-bugs, ghuang, jokerman, jprovazn, mmccomas, sbaubeau
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-03-20 08:39:06 UTC
Type: Bug

Description Wenkai Shi 2016-10-11 10:07:02 UTC
Description of problem:
Currently the heat stack runs the scale-up playbook directly, without adding a "new_nodes" group to the inventory hosts file. This is not the recommended approach in openshift-ansible, because it effectively *reinstalls* the whole cluster. It would be better to use the official scale-up procedure so that only the new nodes are changed by openshift-ansible.
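For illustration, the official procedure (linked under "Expected results" below) adds the new host to a new_nodes group in the inventory and then runs the node scale-up playbook against that group only. A minimal sketch of such an inventory, with a hypothetical hostname:

[OSv3:children]
masters
nodes
new_nodes

[new_nodes]
new-node-0.example.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"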

Version-Release number of selected component (if applicable):
openshift-on-openstack v0.9.1

How reproducible:
100%

Steps to Reproduce:
1.Create a heat stack
2.Scale up a node
3.

Actual results:
No new_nodes group exists in the inventory hosts file when scaling up.

Expected results:
The official scale-up procedure supported by openshift-ansible should be used:
https://docs.openshift.com/container-platform/3.3/install_config/adding_hosts_to_existing_cluster.html#adding-nodes-advanced
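For reference, once the new_nodes group is defined in the inventory, the documented step (per the link above) is to run only the node scale-up playbook, roughly as follows; the exact playbook path depends on the installed openshift-ansible version, and the inventory path shown is the one used by this heat stack:

# ansible-playbook -i /var/lib/ansible/inventory \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-node/scaleup.yml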

Additional info:

Comment 1 Gan Huang 2016-10-11 10:13:01 UTC
This is the summary (PLAY RECAP) of the scale-up playbook run. It makes too many changes across the existing cluster, which is not safe.

PLAY RECAP *********************************************************************
ghuang7-external-lb-openshift-infra-0.example.com : ok=121  changed=8    unreachable=0    failed=0
ghuang7-external-lb-openshift-infra-1.example.com : ok=121  changed=8    unreachable=0    failed=0
ghuang7-external-lb-openshift-master-0.example.com : ok=126  changed=9    unreachable=0    failed=0
ghuang7-external-lb-openshift-master-1.example.com : ok=121  changed=8    unreachable=0    failed=0
ghuang7-external-lb-openshift-node-qpoog553.example.com : ok=133  changed=44   unreachable=0    failed=0
ghuang7-external-lb-openshift-node-w60bg826.example.com : ok=121  changed=8    unreachable=0    failed=0
localhost                  : ok=48   changed=10   unreachable=0    failed=0

Comment 2 Jan Provaznik 2016-10-17 07:35:07 UTC
Fixed in this PR:
https://github.com/redhat-openstack/openshift-on-openstack/pull/282

Comment 3 Gan Huang 2016-10-19 01:41:04 UTC
Waiting for v0.9.4, which should include this fix.

Comment 4 Jan Provaznik 2016-10-19 14:50:27 UTC
Fixed in 0.9.4 (sorry for the confusion; the status was switched from MODIFIED to ON_QA automatically).

Comment 5 Gan Huang 2016-10-20 07:52:38 UTC
Verified with 0.9.4

1. The new_nodes group is added to the inventory hosts file when scaling up:

# cat /var/lib/ansible/inventory
[OSv3:children]
bastion
masters
nodes
etcd
new_nodes
lb
<--snip-->
[new_nodes]
ghuang-auto1-ocp-node-bd543w26.test.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
<--snip-->

2. Only the new node is substantially changed during scale-up:
PLAY RECAP *********************************************************************
ghuang-auto1-ocp-infra-0.test.com : ok=13   changed=2    unreachable=0    failed=0
ghuang-auto1-ocp-infra-1.test.com : ok=13   changed=2    unreachable=0    failed=0
ghuang-auto1-ocp-master-0.test.com : ok=25   changed=4    unreachable=0    failed=0
ghuang-auto1-ocp-master-1.test.com : ok=21   changed=2    unreachable=0    failed=0
ghuang-auto1-ocp-master-2.test.com : ok=21   changed=2    unreachable=0    failed=0
ghuang-auto1-ocp-node-46d9dby2.test.com : ok=13   changed=2    unreachable=0    failed=0
ghuang-auto1-ocp-node-bd543w26.test.com : ok=149  changed=47   unreachable=0    failed=0
ghuang-auto1-ocp-node-p03yqbi0.test.com : ok=13   changed=2    unreachable=0    failed=0
loadbalancer               : ok=3    changed=0    unreachable=0    failed=0
localhost                  : ok=49   changed=11   unreachable=0    failed=0

3. The existing app and a newly created app work well after scaling up/down twice.
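For reference, a quick post-scale sanity check could look like the following sketch (hypothetical project, app, and route names, not the exact commands used in this verification):

# oc get nodes
# oc get pods -o wide -n myproject
# curl -I http://myapp-myproject.apps.test.com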