Bug 1741432 - Should approve node csr before checking nodes to be ready
Summary: Should approve node csr before checking nodes to be ready
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.2.0
Assignee: Jeremiah Stuever
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-08-15 05:57 UTC by Gaoyun Pei
Modified: 2019-10-16 06:36 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:35:55 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 11847 0 'None' closed Bug 1741432: openshift-node to approve node CSRs 2020-11-05 13:49:33 UTC
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:36:10 UTC

Description Gaoyun Pei 2019-08-15 05:57:46 UTC
Description of problem:
The RHEL worker scale-up playbook failed at "Wait for nodes to report ready"

TASK [openshift_node : Wait for nodes to report ready] *************************
Thursday 15 August 2019  13:26:21 +0800 (0:00:30.580)       0:06:20.154 ******* 
FAILED - RETRYING: Wait for nodes to report ready (36 retries left).
FAILED - RETRYING: Wait for nodes to report ready (35 retries left).
...
FAILED - RETRYING: Wait for nodes to report ready (2 retries left).
FAILED - RETRYING: Wait for nodes to report ready (1 retries left).
failed: [rhel-1.qe-gpei-420815.qe.devcluster.openshift.com -> localhost] (item=rhel-1.qe-gpei-420815.qe.devcluster.openshift.com) => {"ansible_loop_var": "item", "attempts": 36, "changed": true, "cmd": ["oc", "get", "node", "rhel-1.qe-gpei-420815.qe.devcluster.openshift.com", "--config=/home/slave1/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:00.135821", "end": "2019-08-15 13:29:30.917780", "item": "rhel-1.qe-gpei-420815.qe.devcluster.openshift.com", "msg": "non-zero return code", "rc": 1, "start": "2019-08-15 13:29:30.781959", "stderr": "Error from server (NotFound): nodes \"rhel-1.qe-gpei-420815.qe.devcluster.openshift.com\" not found", "stderr_lines": ["Error from server (NotFound): nodes \"rhel-1.qe-gpei-420815.qe.devcluster.openshift.com\" not found"], "stdout": "", "stdout_lines": []}


After approve the pending nodes csr manually, the RHEL nodes are available.


Version-Release number of the following components:
openshift-ansible-4.2.0-201908142219.git.188.7254b39.el7

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Jeremiah Stuever 2019-08-20 21:50:58 UTC
Nodes are approved automatically in my initial testing. I expect this is because they were rolled out using machine sets. Need to test how this looks when nodes are created outside of the cluster.

Comment 2 Jeremiah Stuever 2019-08-20 23:43:15 UTC
Confirmed, node certificates are not automatically approved if the node is not created by the cluster. This causes the playbook to fail at "Wait for nodes to report ready"... introduced in #11808.

https://github.com/openshift/openshift-ansible/pull/11808

Comment 4 Gaoyun Pei 2019-08-26 05:12:23 UTC
Verify this bug with openshift-ansible-4.2.0-201908251340.git.198.9c1655b.el7


TASK [openshift_node : Approve node-bootstrapper CSR] **************************
Monday 26 August 2019  11:42:36 +0800 (0:01:09.436)       0:04:20.220 ********* 
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-51-169.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 1, "changed": true, "cmd": "count=0; for csr in `oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig get csr --no-headers  | grep \" system:serviceaccount:openshift-machine-config-operator:node-bootstrapper \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig describe csr/$csr    | grep \" system:node:ip-10-0-51-169.us-east-2.compute.internal$\";\n  if [ $? -eq 0 ];\n  then\n    oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig adm certificate approve ${csr};\n    if [ $? -eq 0 ];\n    then\n      count=$((count+1));\n    fi;\n  fi;\ndone; exit $((!count));\n", "delta": "0:00:02.862661", "end": "2019-08-26 11:42:39.206067", "item": "ip-10-0-51-169.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:42:36.343406", "stderr": "", "stderr_lines": [], "stdout": "         Common Name:    system:node:ip-10-0-51-169.us-east-2.compute.internal\ncertificatesigningrequest.certificates.k8s.io/csr-6hmxb approved", "stdout_lines": ["         Common Name:    system:node:ip-10-0-51-169.us-east-2.compute.internal", "certificatesigningrequest.certificates.k8s.io/csr-6hmxb approved"]}
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-50-20.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 1, "changed": true, "cmd": "count=0; for csr in `oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig get csr --no-headers  | grep \" system:serviceaccount:openshift-machine-config-operator:node-bootstrapper \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig describe csr/$csr    | grep \" system:node:ip-10-0-50-20.us-east-2.compute.internal$\";\n  if [ $? -eq 0 ];\n  then\n    oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig adm certificate approve ${csr};\n    if [ $? -eq 0 ];\n    then\n      count=$((count+1));\n    fi;\n  fi;\ndone; exit $((!count));\n", "delta": "0:00:02.756084", "end": "2019-08-26 11:42:42.115446", "item": "ip-10-0-50-20.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:42:39.359362", "stderr": "", "stderr_lines": [], "stdout": "         Common Name:    system:node:ip-10-0-50-20.us-east-2.compute.internal\ncertificatesigningrequest.certificates.k8s.io/csr-rx4fv approved", "stdout_lines": ["         Common Name:    system:node:ip-10-0-50-20.us-east-2.compute.internal", "certificatesigningrequest.certificates.k8s.io/csr-rx4fv approved"]}

TASK [openshift_node : Approve node CSR] ***************************************
Monday 26 August 2019  11:42:42 +0800 (0:00:05.966)       0:04:26.187 ********* 
FAILED - RETRYING: Approve node CSR (6 retries left).
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-51-169.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 2, "changed": true, "cmd": "count=0; for csr in `oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig get csr --no-headers  | grep \" system:node:ip-10-0-51-169.us-east-2.compute.internal \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig adm certificate approve ${csr};\n  if [ $? -eq 0 ];\n  then\n    count=$((count+1));\n  fi;\ndone; exit $((!count));\n", "delta": "0:00:00.493370", "end": "2019-08-26 11:42:48.156555", "item": "ip-10-0-51-169.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:42:47.663185", "stderr": "", "stderr_lines": [], "stdout": "certificatesigningrequest.certificates.k8s.io/csr-gd6rz approved", "stdout_lines": ["certificatesigningrequest.certificates.k8s.io/csr-gd6rz approved"]}
FAILED - RETRYING: Approve node CSR (6 retries left).
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-50-20.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 2, "changed": true, "cmd": "count=0; for csr in `oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig get csr --no-headers  | grep \" system:node:ip-10-0-50-20.us-east-2.compute.internal \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig adm certificate approve ${csr};\n  if [ $? -eq 0 ];\n  then\n    count=$((count+1));\n  fi;\ndone; exit $((!count));\n", "delta": "0:00:00.504103", "end": "2019-08-26 11:42:54.152217", "item": "ip-10-0-50-20.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:42:53.648114", "stderr": "", "stderr_lines": [], "stdout": "certificatesigningrequest.certificates.k8s.io/csr-b9kwz approved", "stdout_lines": ["certificatesigningrequest.certificates.k8s.io/csr-b9kwz approved"]}

TASK [openshift_node : Wait for nodes to report ready] *************************
Monday 26 August 2019  11:42:54 +0800 (0:00:12.039)       0:04:38.226 ********* 
FAILED - RETRYING: Wait for nodes to report ready (36 retries left).
FAILED - RETRYING: Wait for nodes to report ready (35 retries left).
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-51-169.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 3, "changed": true, "cmd": ["oc", "get", "node", "ip-10-0-51-169.us-east-2.compute.internal", "--config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:01.233045", "end": "2019-08-26 11:43:06.273282", "item": "ip-10-0-51-169.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:43:05.040237", "stderr": "", "stderr_lines": [], "stdout": "True", "stdout_lines": ["True"]}
FAILED - RETRYING: Wait for nodes to report ready (36 retries left).
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-50-20.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 2, "changed": true, "cmd": ["oc", "get", "node", "ip-10-0-50-20.us-east-2.compute.internal", "--config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:00.240301", "end": "2019-08-26 11:43:12.017293", "item": "ip-10-0-50-20.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:43:11.776992", "stderr": "", "stderr_lines": [], "stdout": "True", "stdout_lines": ["True"]}

Comment 5 errata-xmlrpc 2019-10-16 06:35:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922


Note You need to log in before you can comment on or make changes to this bug.