Bug 1294748

Summary: Failed to scale out nodes in pre-existing env
Product: OpenShift Container Platform
Reporter: Gan Huang <ghuang>
Component: Installer
Assignee: Samuel Munilla <smunilla>
Status: CLOSED ERRATA
QA Contact: Ma xiaoqiang <xiama>
Severity: high
Docs Contact:
Priority: medium
Version: 3.1.0
CC: aos-bugs, bleanhar, cryan, gpei, jokerman, mmccomas, smunilla, xtian
Target Milestone: ---
Flags: smunilla: needinfo-
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-27 19:43:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Gan Huang 2015-12-30 05:59:05 UTC
Description of problem:
Failed to scale out nodes in pre-existing env


Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.0.20-1.git.0.3703f1b.el7aos.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install an env with one master and two nodes using the quick installer.
2. Add a new node to the existing env using atomic-openshift-installer.

Actual results:
TASK: [openshift_manage_node | Wait for Node Registration] ******************** 
failed: [10.x.x.158] => (item=openshift_nodes) => {"attempts": 20, "changed": true, "cmd": ["oc", "get", "node", "openshift_nodes"], "delta": "0:00:00.257345", "end": "2015-12-24 18:28:39.257526", "failed": true, "item": "openshift_nodes", "rc": 1, "start": "2015-12-24 18:28:39.000181", "warnings": []}
stderr: Error from server: node "openshift_nodes" not found
msg: Task failed as maximum retries was encountered

FATAL: all hosts have already failed -- aborting

Expected results:
Install successfully

Additional info:
It works when adding nodes to a pre-existing env where the master and node are on one host.
Judging from the playbook, scaling out in a multi-node env requires a new_nodes group in the hosts file, but QE hasn't seen any new_nodes-related configuration in the ansible inventory after adding a new node.
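For illustration, a minimal sketch of the kind of inventory layout the scaleup playbook appears to expect; the host names and the variable values here are hypothetical, not taken from the QE environment:

[OSEv3:children]
masters
nodes
new_nodes

[OSEv3:vars]
# hypothetical value for this sketch
deployment_type=openshift-enterprise

[masters]
master1.example.com

[nodes]
node1.example.com
node2.example.com

[new_nodes]
node3.example.com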

Comment 1 Samuel Munilla 2016-01-11 14:43:15 UTC
Should be resolved by: https://github.com/openshift/openshift-ansible/pull/1143
which is currently being updated with the given suggestions.

Comment 2 Brenton Leanhardt 2016-01-12 22:03:17 UTC
In the end we decided to go with the approach in Comment #1.  Can you test with the latest in openshift-ansible?  We have some other PRs pending and haven't created a build yet.

Comment 3 Gan Huang 2016-01-13 09:42:19 UTC
Verified with the latest openshift-ansible. A new_nodes group is now added to the hosts file and the install succeeds. However, the ansible playbook itself has also changed slightly with regard to new_nodes.
These are the variables in the ansible playbook at version 3.0.20-1:
- include: ../../common/openshift-cluster/scaleup.yml
  vars:
    g_etcd_group: "{{ 'etcd' }}"
    g_masters_group: "{{ 'masters' }}"
    g_new_nodes_group: "{{ 'new_nodes' }}"
    g_lb_group: "{{ 'lb' }}"
    openshift_cluster_id: "{{ cluster_id | default('default') }}"
    openshift_debug_level: 2
    openshift_deployment_type: "{{ deployment_type }}"

And these are the variables in the latest ansible playbook:
---
g_etcd_hosts:   "{{ groups.etcd | default([]) }}"
g_lb_hosts:     "{{ groups.lb | default([]) }}"
g_master_hosts: "{{ groups.masters | default([]) }}"
g_node_hosts:   "{{ groups.nodes | default([]) }}"
g_nfs_hosts:   "{{ groups.nfs | default([]) }}"
g_all_hosts:    "{{ g_master_hosts | union(g_node_hosts) | union(g_etcd_hosts)
                    | union(g_lb_hosts) | default([]) }}"

So this modification does not seem to take effect, even though the install succeeds.
Maybe we should make the relationship between the quick installer and the ansible playbooks clear.
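For illustration only: if the newer grouping style were to pick up a new_nodes group, it would presumably follow the pattern of the existing g_*_hosts variables, along these lines (the variable name g_new_node_hosts is an assumption made for this sketch, not copied from the actual playbook):

# Hypothetical sketch, following the existing g_*_hosts pattern:
g_new_node_hosts: "{{ groups.new_nodes | default([]) }}"
g_all_hosts:      "{{ g_master_hosts | union(g_node_hosts) | union(g_new_node_hosts)
                      | union(g_etcd_hosts) | union(g_lb_hosts) | default([]) }}"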

Comment 4 Gan Huang 2016-01-14 04:57:18 UTC
Hi, Brenton
As described in Comment #3, the latest ansible playbook has also changed with respect to adding new nodes, so this commit seems useless if the playbook will not change again for adding new nodes. I want to know which approach (new_nodes group or no new_nodes group) will be used in the end, so that this bug won't come back.

Comment 6 errata-xmlrpc 2016-01-27 19:43:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0075