Description of problem:
I used the "Add Hosts" tab to add another worker to the cluster using the assisted day-2 flow, then waited several minutes, but the node's request to join the cluster never showed up in the UI. On the command line, you can see that the CSR already exists.

Version-Release number of selected component (if applicable):
4.9.4

How reproducible:
100%

Steps to Reproduce:
1. Install a cluster with the assisted installer.
2. If needed, update the cluster to the latest version, 4.9.4.
3. Use the "Add Hosts" feature to create a day-2 ISO and boot an additional node with it.
4. Install the day-2 node.
5. After the installation is finished, wait for the CSR to appear in the UI under the "Nodes" section.

Actual results:
The CSR never shows up. I refreshed the page (F5) many times.

Expected results:
The user should be able to approve the request to join the cluster using the UI.

Additional info:
From the CLI you can see that the CSR already exists (created 12 minutes ago):

# oc get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-hz289   12m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending

# oc describe csr csr-hz289
Name:               csr-hz289
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Wed, 03 Nov 2021 13:52:27 -0400
Requesting User:    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
Signer:             kubernetes.io/kube-apiserver-client-kubelet
Status:             Pending
Subject:
  Common Name:    system:node:worker-0-3
  Serial Number:
  Organization:   system:nodes
Events:  <none>
Hi Udi, thanks for reporting the issue. It's not really obvious from the description how to reproduce the issue. Could you please be more specific? Thank you.
Creating a Jira story for this issue: https://issues.redhat.com/browse/HAC-231. Closing this bug since it was filed against the incorrect product (OCP).
Based on the last comment on HAC-231, reopening. The issue is in the OCP console. When a new node is added (not only through cloud.redhat.com), a CSR should be shown in the Nodes list, and in 4.9.4 it is not shown.
Adjusting the severity to "high" because it blocks functionality but isn't an outage.
Launched an upi-on-baremetal cluster on AWS with payload 4.10.0-0.nightly-2021-12-23-153012, which contains the fix. When adding a node to the cluster, before the CSR is approved, the node is not shown on the Nodes list page and no CSR is shown for approval on the Nodes page.

# oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-46pcq   3m31s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
csr-z7zxt   2m40s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending

[root@MiWiFi-R1CM ~]# oc describe csr csr-46pcq
Name:               csr-46pcq
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Tue, 04 Jan 2022 15:18:27 +0800
Requesting User:    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
Signer:             kubernetes.io/kube-apiserver-client-kubelet
Status:             Pending
Subject:
  Common Name:    system:node:ip-10-0-51-189.us-east-2.compute.internal
  Serial Number:
  Organization:   system:nodes
Events:  <none>

[root@MiWiFi-R1CM ~]# oc describe csr csr-z7zxt
Name:               csr-z7zxt
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Tue, 04 Jan 2022 15:19:18 +0800
Requesting User:    system:serviceaccount:openshift-machine-config-operator:node-bootstrapper
Signer:             kubernetes.io/kube-apiserver-client-kubelet
Status:             Pending
Subject:
  Common Name:    system:node:ip-10-0-51-189.us-east-2.compute.internal
  Serial Number:
  Organization:   system:nodes
Events:  <none>

At this point there is no new node in the node list, from either the client or the console.

# oc get node
NAME             STATUS   ROLES           AGE     VERSION
ip-10-0-50-250   Ready    master,worker   4h29m   v1.22.1+6859754
ip-10-0-60-161   Ready    master,worker   4h29m   v1.22.1+6859754
ip-10-0-68-131   Ready    master,worker   4h29m   v1.22.1+6859754

After approving the CSRs manually from the client, the new node appears in both the console and the client.
# oc adm certificate approve csr-46pcq
certificatesigningrequest.certificates.k8s.io/csr-46pcq approved
# oc adm certificate approve csr-z7zxt
certificatesigningrequest.certificates.k8s.io/csr-z7zxt approved
# oc get node
NAME                                        STATUS   ROLES           AGE     VERSION
ip-10-0-50-250                              Ready    master,worker   4h30m   v1.22.1+6859754
ip-10-0-51-189.us-east-2.compute.internal   Ready    worker          71s     v1.22.1+6859754
ip-10-0-60-161                              Ready    master,worker   4h30m   v1.22.1+6859754
ip-10-0-68-131                              Ready    master,worker   4h31m   v1.22.1+6859754

The expected fix is that the CSR is shown and can be approved from the console, but this does not work.
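While the console is not showing them, the pending requests can only be found from the CLI with `oc get csr`, as above. A minimal sketch for picking them out, assuming the default `oc get csr` column layout shown in this thread (whitespace-separated columns, CONDITION last). The helper name `pending_csr_names` is hypothetical and is not part of `oc`.

```python
from typing import List


def pending_csr_names(oc_get_csr_output: str) -> List[str]:
    """Return the NAME of every CSR whose CONDITION column is exactly "Pending".

    Hypothetical helper. Assumes the default `oc get csr` table layout:
    NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
    and that no column value contains spaces.
    """
    lines = [line for line in oc_get_csr_output.splitlines() if line.strip()]
    if not lines:
        return []
    # Locate the NAME column from the header row instead of hard-coding it.
    name_idx = lines[0].split().index("NAME")
    names = []
    for row in lines[1:]:
        fields = row.split()
        # CONDITION is the last column, e.g. "Pending" or "Approved,Issued".
        if fields and fields[-1] == "Pending":
            names.append(fields[name_idx])
    return names
```

Passing the `oc get csr` output from the upi-on-baremetal test above would return `["csr-46pcq", "csr-z7zxt"]`. Each name can then be given to `oc adm certificate approve`, ideally only after checking with `oc describe csr` that the Subject Common Name matches the node you added.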
@yanpzhan Is the metal3 plugin enabled in the UI? Can you see the Compute/Bare Metal Hosts page in the navigation?
@Rastislav, there is no Compute/Bare Metal Hosts page on my upi-on-baremetal cluster. I am not sure how to enable the metal3 plugin on the cluster, but I know the Bare Metal Hosts page exists on an ipi-on-baremetal cluster. If a cluster is not a bare metal cluster, is there no CSR approval function in the console when a new node is added?
The metal3 plugin gets enabled when the cluster platform === BareMetal. The metal3 plugin is the one that adds CSR approval capabilities to the console. Can you please try with ipi-on-baremetal? As you said, the Bare Metal Hosts page exists there, which means the metal3 plugin is enabled there.
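The rule above (the metal3 plugin, and with it CSR approval in the console, is enabled only when the cluster platform is BareMetal) can be checked on a test cluster before trying to reproduce. A minimal sketch, assuming the platform is read from the `cluster` Infrastructure resource (`status.platformStatus.type`, falling back to the older `status.platform`). The helper `metal3_plugin_expected` is hypothetical and is not the console's actual code.

```python
from typing import Any, Dict


def metal3_plugin_expected(infrastructure: Dict[str, Any]) -> bool:
    """Return True if the platform rule says the metal3 console plugin should be on.

    Hypothetical helper, not console code. `infrastructure` is assumed to be the
    parsed JSON of `oc get infrastructure cluster -o json`.
    """
    status = infrastructure.get("status") or {}
    # Prefer status.platformStatus.type; fall back to the older status.platform field.
    platform = (status.get("platformStatus") or {}).get("type") or status.get("platform")
    return platform == "BareMetal"
```

For a quick manual check, `oc get infrastructure cluster -o jsonpath='{.status.platformStatus.type}'` prints the same value. If a cluster reports a platform other than BareMetal, this rule would explain why no CSR approval appears in the console, as in the upi-on-baremetal test above.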
I tested on an IPI bare metal cluster with 3 worker nodes: I deprovisioned one BMH to remove one worker node, then reprovisioned the BMH and set the MachineSet replicas to 3. After a while, a new Machine was created and then a new node was added (the CSRs were approved automatically).
I checked the bug on a 4.9.11 cluster (installed with the assisted installer), bridging the console with the latest code containing the fix. When the new node is added and its CSR is pending, the new node can be seen on the Nodes list page, showing that approval is required. Clicking it opens the CSR approval modal, from which the CSR can be approved. After the CSRs are approved from the console, the node becomes Ready. The CSR details page also shows the message: "This CSR was approved via OpenShift Console".
According to the test result in Comment 17, the bug is fixed, so moving it to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056