Bug 1816959

Summary: Failed to add a RHEL node to Openshift 4.3 cluster as scale up playbook failed at TASK [openshift_node : Approve node-bootstrapper CSR]
Product: OpenShift Container Platform
Reporter: Arnab Ghosh <arghosh>
Component: Installer
Assignee: Joseph Callen <jcallen>
Installer sub component: openshift-ansible
QA Contact: Yunfei Jiang <yunjiang>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: abhinkum, adahiya, fan-wxa, jcallen, jmalde, mfuruta, redhat-bz, rh-container, vpagar, yanyang, yunjiang
Version: 4.5
Keywords: Reopened
Target Milestone: ---
Target Release: 4.3.z
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-25 17:19:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1818961, 1824150
Bug Blocks:

Description Arnab Ghosh 2020-03-25 09:22:38 UTC
Created attachment 1673347: Ansible playbook run log

Description of problem:

While trying to add a RHEL worker node, the Ansible playbook fails at the task below.

~~~
     TASK [openshift_node : Approve node-bootstrapper CSR] *******************************************************************************
     task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/config.yml:165
     failed: [ip-172-32-61-223.us-east-2.compute.internal -> localhost] (item=ip-172-32-61-223.us-east-2.compute.internal) => {
         "ansible_loop_var": "item",
         "attempts": 6,
         "changed": true,
         "cmd": "count=0; for csr in `oc --config=/home/ec2-user/.kube/config get csr --no-headers  | grep \" system:serviceaccount:openshift-machine-config-operator:node-bootstrapper \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/ec2-user/.kube/config describe csr/$csr    | grep \" system:node:ip-172-32-61-223.us-east-2.compute.internal$\";\n  if [ $? -eq 0 ];\n  then\n    oc --config=/home/ec2-user/.kube/config adm certificate approve ${csr};\n    if [ $? -eq 0 ];\n    then\n      count=$((count+1));\n    fi;\n  fi;\ndone; exit $((!count));\n",
         "delta": "0:00:00.211704",
         "end": "2020-03-02 00:14:47.509885",
         "invocation": {
             "module_args": {
                 "_raw_params": "count=0; for csr in `oc --config=/home/ec2-user/.kube/config get csr --no-headers  | grep \" system:serviceaccount:openshift-machine-config-operator:node-bootstrapper \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/ec2-user/.kube/config describe csr/$csr    | grep \" system:node:ip-172-32-61-223.us-east-2.compute.internal$\";\n  if [ $? -eq 0 ];\n  then\n    oc --config=/home/ec2-user/.kube/config adm certificate approve ${csr};\n    if [ $? -eq 0 ];\n    then\n      count=$((count+1));\n    fi;\n  fi;\ndone; exit $((!count));\n",
                 "_uses_shell": true,
                 "argv": null,
                 "chdir": null,
                 "creates": null,
                 "executable": null,
                 "removes": null,
                 "stdin": null,
                 "stdin_add_newline": true,
                 "strip_empty_ends": true,
                 "warn": true
             }
         },
         "item": "ip-172-32-61-223.us-east-2.compute.internal",
         "msg": "non-zero return code",
         "rc": 1,
         "start": "2020-03-02 00:14:47.298181",
         "stderr": "No resources found in openshift-machine-config-operator namespace.",
         "stderr_lines": [
             "No resources found in openshift-machine-config-operator namespace."
         ],
         "stdout": "",
         "stdout_lines": []
     }
     
     
     PLAY RECAP *******************************************************************************
     ip-172-32-61-223.us-east-2.compute.internal : ok=40   changed=19   unreachable=0    failed=1    skipped=1    rescued=0    ignored=0
     localhost                  : ok=1    changed=1    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
~~~

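For readability, the escaped command string in the "cmd" field of the failed task can be unfolded into the sketch below. The logic follows the logged command; the function name `approve_bootstrap_csrs` and the variables `OC`, `KUBECONFIG_PATH`, and `NODE` are illustrative and do not come from the playbook, and `grep -q` is used here only to keep the output quiet. The function succeeds only if at least one matching CSR was approved. In the log, stdout is empty and stderr reports "No resources found", so the loop has nothing to iterate over, `count` stays at 0, and the command exits with 1, which matches "rc": 1.

```shell
#!/usr/bin/env bash
# Readable form of the logged "cmd" string. Names below are illustrative only.
OC="${OC:-oc}"
KUBECONFIG_PATH="${KUBECONFIG_PATH:-/home/ec2-user/.kube/config}"
NODE="${NODE:-ip-172-32-61-223.us-east-2.compute.internal}"

approve_bootstrap_csrs() {
  local count=0 csr
  # List CSRs requested by the node-bootstrapper service account.
  for csr in $("$OC" --config="$KUBECONFIG_PATH" get csr --no-headers \
                 | grep " system:serviceaccount:openshift-machine-config-operator:node-bootstrapper " \
                 | cut -d " " -f1); do
    # Approve only the CSRs whose subject is the node being added.
    if "$OC" --config="$KUBECONFIG_PATH" describe "csr/$csr" \
         | grep -q " system:node:${NODE}\$"; then
      if "$OC" --config="$KUBECONFIG_PATH" adm certificate approve "$csr"; then
        count=$((count + 1))
      fi
    fi
  done
  # Return 0 if at least one CSR was approved, 1 if none were.
  return $((!count))
}
```

Running `approve_bootstrap_csrs; echo "rc=$?"` against the affected cluster should reproduce the return code seen in the task output, assuming the same kubeconfig and node name.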
Version-Release number of the following components:

On Ansible controller node:

~~~
     $ rpm -qa | grep -e openshift -e ansible
     openshift-clients-4.3.2-202002070552.git.1.a322635.el7.x86_64
     ansible-2.8.8-1.el7ae.noarch
     openshift-ansible-4.3.2-202002070552.git.174.36281a2.el7.noarch
~~~


How reproducible:
Always

Steps to Reproduce:
1. Try to add a RHEL worker node to an OpenShift 4.3 cluster hosted on AWS.

Actual results:
Node addition fails at the "Approve node-bootstrapper CSR" task.

Expected results:
RHEL worker node should be added to the cluster

Additional info:
Logs attached

Inventory File:

===============================================
[all:vars]
ansible_user=ec2-user
ansible_become=True

openshift_kubeconfig_path="~/.kube/kubeconfig"

[new_workers]
ip-172-32-61-223.us-east-2.compute.internal
================================================

Comment 4 Masaki Furuta ( RH ) 2020-04-09 04:59:05 UTC
Dear Joseph Callen,

I'm sorry to rush you, but could you please triage this BZ?

I'm having a hard time because two weeks have passed since we filed these BZs and there has been no news since then.
My TAM customer, NEC, has been asking why we cannot proceed with this BZ and where it currently stands.

I would be grateful for your help with triage.

Thank you.

BR,
Masaki

Comment 23 errata-xmlrpc 2020-05-11 21:20:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2006

Comment 27 Scott Dodson 2020-08-25 17:19:46 UTC
Arnab, please open a new bug with the full set of normal Bugzilla inputs and debugging information. We never reopen bugs that were closed with ERRATA status.