Bug 1777061

Summary: Scale up master fails on - TASK [openshift_ca : Generate the loopback master client config]
Product: OpenShift Container Platform
Component: Installer
Installer sub component: openshift-ansible
Reporter: Vladislav Walek <vwalek>
Assignee: Russell Teague <rteague>
QA Contact: Gaoyun Pei <gpei>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: bleanhar, dhellard, jrosenta, openshift-bugs-escalate, rbergami, rteague
Version: 3.11.0
Keywords: Reopened, UpcomingSprint
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2020-11-18 14:09:55 UTC

Description Vladislav Walek 2019-11-26 20:50:49 UTC
Description of problem:

The master scale-up playbook fails on the following task:

TASK [openshift_ca : Generate the loopback master client config] *********************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_ca/tasks/main.yml:201
fatal: [10.10.92.199]: FAILED! => {
    "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'master'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_ca/tasks/main.yml': line 201, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n    register: openshift_ca_loopback_tmpdir\n  - name: Generate the loopback master client config\n    ^ here\n"
}
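Based on Comment 11 below, the undefined 'master' attribute corresponds to the cached OpenShift local facts on the existing master that the CA tasks are delegated to. A minimal sketch for checking those facts, assuming root SSH access; the host name is a placeholder:

# Sketch only: verify the cached local facts exist and contain a 'master' section
# on the existing master that the openshift_ca tasks run against.
ssh root@existing-master.example.com 'ls -l /etc/ansible/facts.d/openshift.fact && cat /etc/ansible/facts.d/openshift.fact' | grep -c '"master"'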


Version-Release number of the following components:
rpm -q openshift-ansible:
------------------------
openshift-ansible-3.11.153-2.git.0.ee699b5.el7.noarch

rpm -q ansible:
------------------------
ansible-2.6.20-1.el7ae.noarch

ansible --version:
------------------------
ansible 2.6.20
  config file = /home/quicklab/ansible.cfg
  configured module search path = [u'/home/quicklab/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jun 11 2019, 14:33:56) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

How reproducible:
Reproducible in my lab.

Steps to Reproduce:
1.
2.
3.

Actual results:
The scale-up fails on the "Generate the loopback master client config" task with the undefined variable error shown above.

Expected results:
The master scale-up playbook completes without errors.

Additional info:

Comment 4 Scott Dodson 2020-04-06 18:49:37 UTC
With no active cases associated, this bug is being closed.

Comment 11 Russell Teague 2020-11-09 16:17:27 UTC
The /etc/ansible/facts.d/openshift.fact file should be identical on all masters, with the exception of the 'no_proxy_etcd_host_ips' item, which has the local host IP first in the list.  As a workaround, the file can be copied from another master.
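A minimal sketch of that workaround, assuming root SSH access; host names are placeholders for a master that still has the file and the master where it is missing:

# Sketch only: copy the cached facts from a master that still has them
# to the master where the file is missing.
ssh root@target-master.example.com 'mkdir -p /etc/ansible/facts.d'
scp root@source-master.example.com:/etc/ansible/facts.d/openshift.fact /tmp/openshift.fact
scp /tmp/openshift.fact root@target-master.example.com:/etc/ansible/facts.d/openshift.fact
# Afterwards, adjust 'no_proxy_etcd_host_ips' so the target host's IP is first in the list.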

Comment 16 Gaoyun Pei 2020-11-16 08:45:24 UTC
Verified this issue with openshift-ansible-3.11.318-1.git.0.bccee5b.el7.noarch.rpm.

1. Set up a 3-master 3.11.318 cluster and remove the Ansible local facts on each master (a sketch of the removal follows the outputs below).

[root@gpei-311hamaster-etcd-1 ~]# cat /etc/ansible/facts.d/openshift.fact
cat: /etc/ansible/facts.d/openshift.fact: No such file or directory

[root@gpei-311hamaster-etcd-2 ~]# cat /etc/ansible/facts.d/openshift.fact
cat: /etc/ansible/facts.d/openshift.fact: No such file or directory

[root@gpei-311hamaster-etcd-3 ~]# cat /etc/ansible/facts.d/openshift.fact
cat: /etc/ansible/facts.d/openshift.fact: No such file or directory
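A sketch of how the cached facts can be cleared, assuming root SSH access to the three masters; short host names taken from the prompts above, adjust as needed:

# Sketch only: remove the cached OpenShift local facts on each existing master.
for h in gpei-311hamaster-etcd-1 gpei-311hamaster-etcd-2 gpei-311hamaster-etcd-3; do
    ssh root@"$h" 'rm -f /etc/ansible/facts.d/openshift.fact'
done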


2. Launch a new VM as the new master host and add the required host entries to the Ansible inventory file:
[OSEv3:children]
masters
nodes
etcd
lb
nfs
new_masters
new_nodes

[new_masters]
ci-vm-10-0-150-189.hosted.upshift.rdu2.redhat.com openshift_public_hostname=ci-vm-10-0-150-189.hosted.upshift.rdu2.redhat.com

[new_nodes]
ci-vm-10-0-150-189.hosted.upshift.rdu2.redhat.com openshift_public_hostname=ci-vm-10-0-150-189.hosted.upshift.rdu2.redhat.com  openshift_node_group_name='qe-master'


3. Run playbooks/openshift-master/openshift_node_group.yml and playbooks/openshift-master/scaleup.yml to scale up the new master host. No errors occurred; see the command sketch and task output below.
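A sketch of the invocations, assuming the RPM install location shown in the task path above; the inventory path is a placeholder:

# Sketch only: run the node group and master scale-up playbooks with verbose output.
ansible-playbook -vvv -i /path/to/inventory \
    /usr/share/ansible/openshift-ansible/playbooks/openshift-master/openshift_node_group.yml
ansible-playbook -vvv -i /path/to/inventory \
    /usr/share/ansible/openshift-ansible/playbooks/openshift-master/scaleup.yml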

11-16 15:34:55  TASK [openshift_ca : Generate the loopback master client config] ***************
11-16 15:34:56  changed: [ci-vm-10-0-150-189.hosted.upshift.rdu2.redhat.com -> ci-vm-10-0-148-99.hosted.upshift.rdu2.redhat.com] => {"changed": true, "cmd": ["oc", "adm", "create-api-client-config", "--certificate-authority=/etc/origin/master/ca.crt", "--client-dir=/tmp/openshift-ansible-u5j0rW", "--groups=system:masters,system:openshift-master", "--master=https://gpei-311hamaster-etcd-1.int.1116-hkw.qe.rhcloud.com:443", "--public-master=https://gpei-311hamaster-etcd-1.int.1116-hkw.qe.rhcloud.com:443", "--signer-cert=/etc/origin/master/ca.crt", "--signer-key=/etc/origin/master/ca.key", "--signer-serial=/etc/origin/master/ca.serial.txt", "--user=system:openshift-master", "--basename=openshift-master", "--expire-days=730"], "delta": "0:00:00.398165", "end": "2020-11-16 02:34:55.382511", "rc": 0, "start": "2020-11-16 02:34:54.984346", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}


4. After the playbooks finished, check the node status. The new master joined the cluster.
[root@gpei-311hamaster-etcd-1 ~]# oc get node 
NAME                               STATUS    ROLES     AGE       VERSION
gpei-311ha-newrhel-1               Ready     master    1m        v1.11.0+d4cacc0
gpei-311hamaster-etcd-1            Ready     master    5h        v1.11.0+d4cacc0
gpei-311hamaster-etcd-2            Ready     master    5h        v1.11.0+d4cacc0
gpei-311hamaster-etcd-3            Ready     master    5h        v1.11.0+d4cacc0
gpei-311hanode-1                   Ready     compute   5h        v1.11.0+d4cacc0
gpei-311hanode-2                   Ready     compute   5h        v1.11.0+d4cacc0
gpei-311hanode-registry-router-1   Ready     <none>    5h        v1.11.0+d4cacc0

Comment 23 errata-xmlrpc 2020-11-18 14:09:55 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.318 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5107