Bug 1709004 - Labeled "compute" role badly to new master added after scaleup.yml
Summary: Labeled "compute" role badly to new master added after scaleup.yml
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.9.z
Assignee: Vadim Rutkovsky
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-12 12:39 UTC by Daein Park
Modified: 2019-07-05 06:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The playbooks incorrectly applied the all-in-one check when labeling nodes.
Consequence: Newly scaled-up masters were mistakenly labeled as compute nodes.
Fix: The node role playbook now checks for scaled-up masters before adding the compute label.
Result: A scaled-up master no longer gets an extra compute label.
Clone Of:
Environment:
Last Closed: 2019-07-05 06:58:57 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:1642 | 0 | None | None | None | 2019-07-05 06:59:13 UTC

Description Daein Park 2019-05-12 12:39:53 UTC
Description of problem:

When "openshift-master/scaleup.yml" is run to add masters, it always adds the "compute" role label to the new master, which is incorrect.

The following test shows the result of adding "master-3.ocp.example.com".

* Inventory file
~~~
[OSEv3:children]
masters
nodes
etcd
new_nodes
new_masters

...
[new_nodes]
master-3.ocp.example.com

[new_masters]
master-3.ocp.example.com
~~~

* Before running "openshift-master/scaleup.yml"
~~~
# oc get node
NAME                       STATUS    ROLES            AGE       VERSION
infra-1.ocp.example.com    Ready     infra            5h        v1.9.1+a0ce1bc657
master-1.ocp.example.com   Ready     master           5h        v1.9.1+a0ce1bc657
master-2.ocp.example.com   Ready     master           5h        v1.9.1+a0ce1bc657
node-1.ocp.example.com     Ready     compute          5h        v1.9.1+a0ce1bc657
~~~

* After running "openshift-master/scaleup.yml"
~~~
# oc get node
NAME                       STATUS    ROLES            AGE       VERSION
infra-1.ocp.example.com    Ready     infra            5h        v1.9.1+a0ce1bc657
master-1.ocp.example.com   Ready     master           5h        v1.9.1+a0ce1bc657
master-2.ocp.example.com   Ready     master           5h        v1.9.1+a0ce1bc657
master-3.ocp.example.com   Ready     compute,master   4m        v1.9.1+a0ce1bc657
node-1.ocp.example.com     Ready     compute          5h        v1.9.1+a0ce1bc657
~~~


Version-Release number of the following components:
rpm -q atomic-openshift-utils
atomic-openshift-utils-3.9.74-1.git.0.70a0a63.el7.noarch

rpm -q ansible
ansible-2.6.11-1.el7ae.noarch

ansible --version
ansible 2.6.11
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible:

Always. It happens every time a new master node is added using "openshift-master/scaleup.yml".

Steps to Reproduce:
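1. Add the new master host to the [new_nodes] and [new_masters] groups in the inventory file.
2. Run "openshift-master/scaleup.yml".
3. Check the node roles with "oc get node".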

Actual results:
The new master node gets both node-role labels and is listed with the roles "compute,master".

Expected results:
The new master node gets only the "master" node-role label.
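
A quick way to spot affected nodes is to filter the "oc get node" output for hosts whose ROLES column contains both "master" and "compute". The following is a minimal sketch and not part of the installer; the function name find_compute_masters is hypothetical, and it assumes the default column layout shown above (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical helper: reads `oc get node` output on stdin and prints the
# names of nodes whose ROLES column lists both "master" and "compute".
# Assumes the default column layout: NAME STATUS ROLES AGE VERSION.
find_compute_masters() {
  awk 'NR > 1 &&
       $3 ~ /(^|,)master(,|$)/ &&
       $3 ~ /(^|,)compute(,|$)/ { print $1 }'
}

# Usage (requires a logged-in oc client):
#   oc get node | find_compute_masters
```

An empty result would mean that no master carries the extra "compute" role.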

Additional info:

The task "Label all-in-one master as a compute node" is what incorrectly adds the "compute" label to the new master:

~~~
TASK [openshift_manage_node : Label all-in-one master as a compute node] *********************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_manage_node/tasks/set_default_node_role.yml:23
Using module file /usr/share/ansible/openshift-ansible/roles/lib_openshift/library/oc_label.py
<10.0.1.10> ESTABLISH SSH CONNECTION FOR USER: quicklab
<10.0.1.10> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=900s -o GSSAPIAuthentication=no -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=quicklab -o ConnectTimeout=10 -o ControlPath=/home/quicklab/.ansible/cp/9ac3eceffa 10.0.1.10 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-kuczpnvvhprzshkcozwtwbuvijoktkbr; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<10.0.1.10> (0, '\n{"invocation": {"module_args": {"kind": "node", "name": "master-3.ocp.example.com", "labels": [{"value": "true", "key": "node-role.kubernetes.io/compute"}], "namespace": null, "kubeconfig": "/etc/origin/master/admin.kubeconfig", "state": "add", "debug": false, "selector": null}}, "state": "add", "changed": true, "results": {"returncode": 0, "cmd": "/bin/oc label node master-3.ocp.example.com node-role.kubernetes.io/compute=true --overwrite", "results": {}}}\n', '')
changed: [10.0.1.9 -> 10.0.1.10] => {
    "changed": true, 
    "failed": false, 
    "invocation": {
        "module_args": {
            "debug": false, 
            "kind": "node", 
            "kubeconfig": "/etc/origin/master/admin.kubeconfig", 
            "labels": [
                {
                    "key": "node-role.kubernetes.io/compute", 
                    "value": "true"
                }
            ], 
            "name": "master-3.ocp.example.com", 
            "namespace": null, 
            "selector": null, 
            "state": "add"
        }
    }, 
    "results": {
        "cmd": "/bin/oc label node master-3.ocp.example.com node-role.kubernetes.io/compute=true --overwrite", 
        "results": {}, 
        "returncode": 0
    }, 
    "state": "add"
}
~~~
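
On a cluster that is already affected, the extra label can probably be removed by hand. This is only a sketch of a manual cleanup, not the fix itself and not something the playbooks do: it relies on the standard "oc label" syntax in which a trailing "-" after the key removes the label. The function name remove_compute_label and the OC override variable are hypothetical and exist only so the command can be dry-run.

```shell
# Hypothetical helper: removes the compute role label from one node.
# A key followed by "-" tells `oc label` to delete that label.
# OC can be overridden (for example OC=echo) to preview the command.
remove_compute_label() {
  "${OC:-oc}" label node "$1" node-role.kubernetes.io/compute-
}

# Usage (requires cluster-admin access):
#   remove_compute_label master-3.ocp.example.com
# Dry run that only prints the arguments:
#   OC=echo remove_compute_label master-3.ocp.example.com
```

Running "oc get node" afterwards should then list the node with only the "master" role.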

Comment 1 Daein Park 2019-05-12 13:04:34 UTC
I've also opened a PR with a fix: https://github.com/openshift/openshift-ansible/pull/11602

Comment 2 Vadim Rutkovsky 2019-05-14 14:37:05 UTC
Thanks, PR looks good

Comment 3 Vadim Rutkovsky 2019-06-10 07:27:28 UTC
Fix included in openshift-ansible-3.9.82-1

Comment 4 Weihua Meng 2019-06-12 13:26:12 UTC
Fixed.
openshift-ansible-3.9.82-1.git.0.3c8ce52.el7

$ oc get nodes 
NAME                                STATUS    ROLES            AGE       VERSION
wmengha3913-master-etcd-1           Ready     master           1h        v1.9.1+a0ce1bc657
wmengha3913-master-etcd-2           Ready     master           1h        v1.9.1+a0ce1bc657
wmengha3913-master-etcd-3           Ready     master           1h        v1.9.1+a0ce1bc657
wmengha3913-node-primary-1          Ready     compute          1h        v1.9.1+a0ce1bc657
wmengha3913-node-primary-2          Ready     compute          1h        v1.9.1+a0ce1bc657
wmengha3913-nrri-1                  Ready     infra            1h        v1.9.1+a0ce1bc657
wmengha3913-nrri-2                  Ready     infra            1h        v1.9.1+a0ce1bc657
new-master1                         Ready     master           1h        v1.9.1+a0ce1bc657

Comment 6 errata-xmlrpc 2019-07-05 06:58:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1642

