1741432 – Should approve node csr before checking nodes to be ready

Bug 1741432 - Should approve node csr before checking nodes to be ready

Summary: Should approve node csr before checking nodes to be ready

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Jeremiah Stuever
QA Contact:	Gaoyun Pei
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2019-08-15 05:57 UTC by Gaoyun Pei
Modified:	2019-10-16 06:36 UTC (History)
CC List:	2 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-16 06:35:55 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift openshift-ansible pull 11847	0	'None'	closed	Bug 1741432: openshift-node to approve node CSRs	2020-11-05 13:49:33 UTC
Red Hat Product Errata	RHBA-2019:2922	0	None	None	None	2019-10-16 06:36:10 UTC

Description Gaoyun Pei 2019-08-15 05:57:46 UTC

Description of problem:
The RHEL worker scale-up playbook failed at "Wait for nodes to report ready"

TASK [openshift_node : Wait for nodes to report ready] *************************
Thursday 15 August 2019  13:26:21 +0800 (0:00:30.580)       0:06:20.154 ******* 
FAILED - RETRYING: Wait for nodes to report ready (36 retries left).
FAILED - RETRYING: Wait for nodes to report ready (35 retries left).
...
FAILED - RETRYING: Wait for nodes to report ready (2 retries left).
FAILED - RETRYING: Wait for nodes to report ready (1 retries left).
failed: [rhel-1.qe-gpei-420815.qe.devcluster.openshift.com -> localhost] (item=rhel-1.qe-gpei-420815.qe.devcluster.openshift.com) => {"ansible_loop_var": "item", "attempts": 36, "changed": true, "cmd": ["oc", "get", "node", "rhel-1.qe-gpei-420815.qe.devcluster.openshift.com", "--config=/home/slave1/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:00.135821", "end": "2019-08-15 13:29:30.917780", "item": "rhel-1.qe-gpei-420815.qe.devcluster.openshift.com", "msg": "non-zero return code", "rc": 1, "start": "2019-08-15 13:29:30.781959", "stderr": "Error from server (NotFound): nodes \"rhel-1.qe-gpei-420815.qe.devcluster.openshift.com\" not found", "stderr_lines": ["Error from server (NotFound): nodes \"rhel-1.qe-gpei-420815.qe.devcluster.openshift.com\" not found"], "stdout": "", "stdout_lines": []}


After approve the pending nodes csr manually, the RHEL nodes are available.


Version-Release number of the following components:
openshift-ansible-4.2.0-201908142219.git.188.7254b39.el7

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Jeremiah Stuever 2019-08-20 21:50:58 UTC

Nodes are approved automatically in my initial testing. I expect this is because they were rolled out using machine sets. Need to test how this looks when nodes are created outside of the cluster.

Comment 2 Jeremiah Stuever 2019-08-20 23:43:15 UTC

Confirmed, node certificates are not automatically approved if the node is not created by the cluster. This causes the playbook to fail at "Wait for nodes to report ready"... introduced in #11808.

https://github.com/openshift/openshift-ansible/pull/11808

Comment 4 Gaoyun Pei 2019-08-26 05:12:23 UTC

Verify this bug with openshift-ansible-4.2.0-201908251340.git.198.9c1655b.el7


TASK [openshift_node : Approve node-bootstrapper CSR] **************************
Monday 26 August 2019  11:42:36 +0800 (0:01:09.436)       0:04:20.220 ********* 
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-51-169.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 1, "changed": true, "cmd": "count=0; for csr in `oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig get csr --no-headers  | grep \" system:serviceaccount:openshift-machine-config-operator:node-bootstrapper \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig describe csr/$csr    | grep \" system:node:ip-10-0-51-169.us-east-2.compute.internal$\";\n  if [ $? -eq 0 ];\n  then\n    oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig adm certificate approve ${csr};\n    if [ $? -eq 0 ];\n    then\n      count=$((count+1));\n    fi;\n  fi;\ndone; exit $((!count));\n", "delta": "0:00:02.862661", "end": "2019-08-26 11:42:39.206067", "item": "ip-10-0-51-169.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:42:36.343406", "stderr": "", "stderr_lines": [], "stdout": "         Common Name:    system:node:ip-10-0-51-169.us-east-2.compute.internal\ncertificatesigningrequest.certificates.k8s.io/csr-6hmxb approved", "stdout_lines": ["         Common Name:    system:node:ip-10-0-51-169.us-east-2.compute.internal", "certificatesigningrequest.certificates.k8s.io/csr-6hmxb approved"]}
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-50-20.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 1, "changed": true, "cmd": "count=0; for csr in `oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig get csr --no-headers  | grep \" system:serviceaccount:openshift-machine-config-operator:node-bootstrapper \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig describe csr/$csr    | grep \" system:node:ip-10-0-50-20.us-east-2.compute.internal$\";\n  if [ $? -eq 0 ];\n  then\n    oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig adm certificate approve ${csr};\n    if [ $? -eq 0 ];\n    then\n      count=$((count+1));\n    fi;\n  fi;\ndone; exit $((!count));\n", "delta": "0:00:02.756084", "end": "2019-08-26 11:42:42.115446", "item": "ip-10-0-50-20.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:42:39.359362", "stderr": "", "stderr_lines": [], "stdout": "         Common Name:    system:node:ip-10-0-50-20.us-east-2.compute.internal\ncertificatesigningrequest.certificates.k8s.io/csr-rx4fv approved", "stdout_lines": ["         Common Name:    system:node:ip-10-0-50-20.us-east-2.compute.internal", "certificatesigningrequest.certificates.k8s.io/csr-rx4fv approved"]}

TASK [openshift_node : Approve node CSR] ***************************************
Monday 26 August 2019  11:42:42 +0800 (0:00:05.966)       0:04:26.187 ********* 
FAILED - RETRYING: Approve node CSR (6 retries left).
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-51-169.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 2, "changed": true, "cmd": "count=0; for csr in `oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig get csr --no-headers  | grep \" system:node:ip-10-0-51-169.us-east-2.compute.internal \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig adm certificate approve ${csr};\n  if [ $? -eq 0 ];\n  then\n    count=$((count+1));\n  fi;\ndone; exit $((!count));\n", "delta": "0:00:00.493370", "end": "2019-08-26 11:42:48.156555", "item": "ip-10-0-51-169.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:42:47.663185", "stderr": "", "stderr_lines": [], "stdout": "certificatesigningrequest.certificates.k8s.io/csr-gd6rz approved", "stdout_lines": ["certificatesigningrequest.certificates.k8s.io/csr-gd6rz approved"]}
FAILED - RETRYING: Approve node CSR (6 retries left).
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-50-20.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 2, "changed": true, "cmd": "count=0; for csr in `oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig get csr --no-headers  | grep \" system:node:ip-10-0-50-20.us-east-2.compute.internal \"  | cut -d \" \" -f1`;\ndo\n  oc --config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig adm certificate approve ${csr};\n  if [ $? -eq 0 ];\n  then\n    count=$((count+1));\n  fi;\ndone; exit $((!count));\n", "delta": "0:00:00.504103", "end": "2019-08-26 11:42:54.152217", "item": "ip-10-0-50-20.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:42:53.648114", "stderr": "", "stderr_lines": [], "stdout": "certificatesigningrequest.certificates.k8s.io/csr-b9kwz approved", "stdout_lines": ["certificatesigningrequest.certificates.k8s.io/csr-b9kwz approved"]}

TASK [openshift_node : Wait for nodes to report ready] *************************
Monday 26 August 2019  11:42:54 +0800 (0:00:12.039)       0:04:38.226 ********* 
FAILED - RETRYING: Wait for nodes to report ready (36 retries left).
FAILED - RETRYING: Wait for nodes to report ready (35 retries left).
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-51-169.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 3, "changed": true, "cmd": ["oc", "get", "node", "ip-10-0-51-169.us-east-2.compute.internal", "--config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:01.233045", "end": "2019-08-26 11:43:06.273282", "item": "ip-10-0-51-169.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:43:05.040237", "stderr": "", "stderr_lines": [], "stdout": "True", "stdout_lines": ["True"]}
FAILED - RETRYING: Wait for nodes to report ready (36 retries left).
changed: [ip-10-0-51-169.us-east-2.compute.internal -> localhost] => (item=ip-10-0-50-20.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 2, "changed": true, "cmd": ["oc", "get", "node", "ip-10-0-50-20.us-east-2.compute.internal", "--config=/home/slave3/workspace/ocp4-rhel-scaleup-runner/jenkins_kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:00.240301", "end": "2019-08-26 11:43:12.017293", "item": "ip-10-0-50-20.us-east-2.compute.internal", "rc": 0, "start": "2019-08-26 11:43:11.776992", "stderr": "", "stderr_lines": [], "stdout": "True", "stdout_lines": ["True"]}

Comment 5 errata-xmlrpc 2019-10-16 06:35:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Note You need to log in before you can comment on or make changes to this bug.