Bug 2100033

Summary: OCP 4.11 IPI - Some csr remain "Pending" post deployment
Product: OpenShift Container Platform
Component: Cloud Compute
Cloud Compute sub component: Cloud Controller Manager
Status: CLOSED ERRATA
Reporter: pdsilva
Assignee: Karthik K N <kabhat>
QA Contact: pdsilva
CC: mkumatag
Severity: high
Priority: unspecified
Version: 4.11
Target Release: 4.11.0
Hardware: ppc64le
OS: Linux
Type: Bug
Last Closed: 2022-08-10 11:19:11 UTC

Description pdsilva 2022-06-22 08:44:54 UTC
Description of problem:
After deploying OCP 4.11 via IPI on Power, some CSRs remain Pending:

# oc get csr | grep Pending
csr-8vv9t                                        51m     kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-5v5pb                              <none>              Pending
csr-f7g26                                        36m     kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-5v5pb                              <none>              Pending
csr-g89ws                                        51m     kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-dwdng                              <none>              Pending
csr-gf8qt                                        6m26s   kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-dwdng                              <none>              Pending
csr-j4h6b                                        21m     kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-dwdng                              <none>              Pending
csr-mhz9p                                        36m     kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-xcjd8                              <none>              Pending
csr-p5m77                                        6m20s   kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-xcjd8                              <none>              Pending
csr-p5qmk                                        6m30s   kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-5v5pb                              <none>              Pending
csr-qlmb8                                        21m     kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-5v5pb                              <none>              Pending
csr-sn8ms                                        51m     kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-xcjd8                              <none>              Pending
csr-t5cbh                                        36m     kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-dwdng                              <none>              Pending
csr-ww2rj                                        21m     kubernetes.io/kubelet-serving                 system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-xcjd8                              <none>              Pending



Cluster status:

# oc get clusterversion
NAME      VERSION                                      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         18m     Cluster version is 4.11.0-0.nightly-ppc64le-2022-06-16-003709

# oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
rdr-ocp-j17-pravind-i-hqvzv-master-0       Ready    master   58m   v1.24.0+cb71478
rdr-ocp-j17-pravind-i-hqvzv-master-1       Ready    master   58m   v1.24.0+cb71478
rdr-ocp-j17-pravind-i-hqvzv-master-2       Ready    master   58m   v1.24.0+cb71478
rdr-ocp-j17-pravind-i-hqvzv-worker-5v5pb   Ready    worker   27m   v1.24.0+cb71478
rdr-ocp-j17-pravind-i-hqvzv-worker-dwdng   Ready    worker   27m   v1.24.0+cb71478
rdr-ocp-j17-pravind-i-hqvzv-worker-xcjd8   Ready    worker   27m   v1.24.0+cb71478

# oc get co
NAME                                       VERSION                                      AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      13m
baremetal                                  4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      50m
cloud-controller-manager                   4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      54m
cloud-credential                           4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      71m
cluster-autoscaler                         4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      50m
config-operator                            4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      52m
console                                    4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      19m
csi-snapshot-controller                    4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      50m
dns                                        4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      50m
etcd                                       4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      50m
image-registry                             4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      22m
ingress                                    4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      22m
insights                                   4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      24m
kube-apiserver                             4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      40m
kube-controller-manager                    4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      48m
kube-scheduler                             4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      48m
kube-storage-version-migrator              4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      51m
machine-api                                4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      46m
machine-approver                           4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      50m
machine-config                             4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      49m
marketplace                                4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      50m
monitoring                                 4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      19m
network                                    4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      52m
node-tuning                                4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      50m
openshift-apiserver                        4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      46m
openshift-controller-manager               4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      47m
openshift-samples                          4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      45m
operator-lifecycle-manager                 4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      51m
operator-lifecycle-manager-catalog         4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      51m
operator-lifecycle-manager-packageserver   4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      46m
service-ca                                 4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      52m
storage                                    4.11.0-0.nightly-ppc64le-2022-06-16-003709   True        False         False      52m

# oc get pods -A | grep -v Running| grep -v Completed
NAMESPACE                                          NAME                                                                  READY   STATUS      RESTARTS       AGE
openshift-kube-apiserver                           installer-6-rdr-ocp-j17-pravind-i-hqvzv-master-1                      0/1     Error       0              36m
openshift-kube-controller-manager                  installer-7-rdr-ocp-j17-pravind-i-hqvzv-master-0                      0/1     Error       0              44m
openshift-kube-scheduler                           installer-5-rdr-ocp-j17-pravind-i-hqvzv-master-1                      0/1     Error       0              50m
openshift-operator-lifecycle-manager               collect-profiles-27591150-8rsbg                                       0/1     Error       0              7m1s



How reproducible:
Always

Steps to Reproduce:
1. Deploy a cluster via IPI.
2. Check the CSRs with: oc get csr
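Step 2 can also be scripted. The sketch below is a minimal, hypothetical helper (not part of any OpenShift tool) that reads JSON shaped like the output of "oc get csr -o json" and groups still-Pending kubelet-serving CSRs by requesting user. It assumes the certificates.k8s.io/v1 object layout (spec.signerName, spec.username, status.conditions) and treats a CSR as Pending when it has no Approved, Denied, or Failed condition. The sample objects are trimmed and hand-written, reusing names from the listings in this bug.

```python
import json
from collections import defaultdict

SERVING_SIGNER = "kubernetes.io/kubelet-serving"


def pending_serving_csrs(csr_list_json: str) -> dict[str, list[str]]:
    """Group Pending kubelet-serving CSR names by requesting user.

    Expects JSON shaped like `oc get csr -o json`: a List whose items are
    certificates.k8s.io/v1 CertificateSigningRequest objects.
    """
    pending = defaultdict(list)
    for csr in json.loads(csr_list_json).get("items", []):
        spec = csr.get("spec", {})
        if spec.get("signerName") != SERVING_SIGNER:
            continue
        # A CSR shows as Pending while it has no Approved/Denied/Failed condition.
        conditions = csr.get("status", {}).get("conditions") or []
        if not any(c.get("type") in ("Approved", "Denied", "Failed") for c in conditions):
            pending[spec.get("username", "<unknown>")].append(csr["metadata"]["name"])
    return dict(pending)


# Trimmed, hand-written objects reusing names from the listings in this bug.
sample = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "csr-8vv9t"},
         "spec": {"signerName": SERVING_SIGNER,
                  "username": "system:node:rdr-ocp-j17-pravind-i-hqvzv-worker-5v5pb"},
         "status": {}},
        {"metadata": {"name": "csr-9lwd8"},
         "spec": {"signerName": SERVING_SIGNER,
                  "username": "system:node:rdr-ipi-jl12-pravin-s-psmq6-master-1"},
         "status": {"conditions": [{"type": "Approved", "status": "True"}]}},
    ],
})

print(pending_serving_csrs(sample))
```

An empty dictionary after deployment corresponds to the expected result below.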

Actual results:
Worker-related CSRs remain Pending.


Expected results:
CSRs should be auto-approved.

Comment 1 Joel Speed 2022-06-22 11:25:43 UTC
Serving certs not being approved likely means that the IPs the kubelet is reporting do not match the IPs that the Machine API provider is reporting. I would suggest looking at the machine-approver logs to be certain.
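The mismatch described in this comment can be sketched as a set comparison. This is a minimal illustration, not machine-approver code: it assumes both sides are available as Kubernetes-style address lists ({"type": ..., "address": ...}, as in a Node's or Machine's status.addresses), and the function name and sample values are hypothetical.

```python
IP_TYPES = {"InternalIP", "ExternalIP"}


def unmatched_node_ips(node_addresses, machine_addresses):
    """Return node-reported IPs that the Machine's status does not list.

    Both arguments use the Kubernetes address shape: [{"type": ..., "address": ...}].
    """
    machine_ips = {a["address"] for a in machine_addresses if a["type"] in IP_TYPES}
    return sorted(
        a["address"]
        for a in node_addresses
        if a["type"] in IP_TYPES and a["address"] not in machine_ips
    )


# Illustrative values: the Machine reports only a DNS name, the kubelet reports an IP.
node = [
    {"type": "InternalIP", "address": "192.168.0.81"},
    {"type": "Hostname", "address": "rdr-kn24-f9jtx7mkm5-ks54l"},
]
machine = [{"type": "InternalDNS", "address": "rdr-kn24-f9jtx7mkm5-ks54l"}]
print(unmatched_node_ips(node, machine))
```

A non-empty result is the situation in which serving CSRs would be expected to stay Pending.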

Comment 2 Karthik K N 2022-06-22 14:54:40 UTC
Yeah, on debugging this I made the following observations.

cluster-machine-approver logs:


I0621 06:49:29.065796       1 controller.go:121] Reconciling CSR: csr-zwsfb
I0621 06:49:29.105553       1 csr_check.go:157] csr-zwsfb: CSR does not appear to be client csr
E0621 06:49:29.110604       1 csr_check.go:420] csr-zwsfb: IP address '192.168.0.81' not in machine addresses: 
I0621 06:49:29.113715       1 controller.go:233] csr-zwsfb: CSR not authorized
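To find every CSR rejected for this reason in a longer log, a small filter helps. The sketch below assumes only the error-line shape shown above ("<csr>: IP address '<ip>' not in machine addresses"); other machine-approver messages, or a changed wording in later releases, would not be matched. The helper name is hypothetical.

```python
import re

# Matches the machine-approver error shape shown above, e.g.
# "csr-zwsfb: IP address '192.168.0.81' not in machine addresses:"
NOT_IN_MACHINE = re.compile(
    r"(?P<csr>csr-[a-z0-9]+): IP address '(?P<ip>[^']+)' not in machine addresses"
)


def ip_mismatch_rejections(log_text: str) -> list[tuple[str, str]]:
    """Return (csr_name, ip) pairs for CSRs rejected over an IP mismatch."""
    return [(m["csr"], m["ip"]) for m in NOT_IN_MACHINE.finditer(log_text)]


log = """\
I0621 06:49:29.065796       1 controller.go:121] Reconciling CSR: csr-zwsfb
I0621 06:49:29.105553       1 csr_check.go:157] csr-zwsfb: CSR does not appear to be client csr
E0621 06:49:29.110604       1 csr_check.go:420] csr-zwsfb: IP address '192.168.0.81' not in machine addresses:
I0621 06:49:29.113715       1 controller.go:233] csr-zwsfb: CSR not authorized
"""
print(ip_mismatch_rejections(log))
```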

1. It's a server CSR request; to be approved, it has to meet a few conditions (https://github.com/openshift/cluster-machine-approver#node-server-csr-approval-workflow).
2. One of these is that the IP in the CSR request must match the machine's InternalIP.
3. Currently, the machine does not have the InternalIP set:

karthikkn@Karthiks-MacBook-Pro .ssh % oc -n openshift-machine-api describe machine rdr-kn24-f9jtx-master-0
Status:
  Addresses:
    Address:  rdr-kn24-f9jtx7mkm5-ks54l
    Type:     InternalDNS

4. But the CSR expects an IP address to match:

karthikkn@Karthiks-MacBook-Pro karthik-openshift-workspace % oc describe csr csr-zwsfb                            
Name:               csr-zwsfb
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Mon, 20 Jun 2022 15:07:31 +0530
Requesting User:    system:node:rdr-kn24-f9jtx7mkm5-ks54l
Signer:             kubernetes.io/kubelet-serving
Status:             Pending
Subject:
  Common Name:    system:node:rdr-kn24-f9jtx7mkm5-ks54l
  Serial Number:  
  Organization:   system:nodes
Subject Alternative Names:
         DNS Names:     rdr-kn24-f9jtx7mkm5-ks54l
         IP Addresses:  192.168.0.81
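Putting observations 1 to 4 together, the check that fails can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the cluster-machine-approver implementation: it covers only signer, requesting user versus CN, organization, and SAN-versus-machine-address matching, omits the approver's other checks, and assumes which address types count as DNS names (InternalDNS, ExternalDNS, Hostname) and IPs (InternalIP, ExternalIP). ServingCSR and serving_csr_problems are hypothetical names; the values are taken from the machine and CSR output above.

```python
from dataclasses import dataclass, field

DNS_TYPES = {"InternalDNS", "ExternalDNS", "Hostname"}  # assumption
IP_TYPES = {"InternalIP", "ExternalIP"}                 # assumption


@dataclass
class ServingCSR:
    """Hypothetical container for fields shown by `oc describe csr`."""
    requesting_user: str
    signer: str
    common_name: str
    organization: str
    dns_names: list[str] = field(default_factory=list)
    ip_addresses: list[str] = field(default_factory=list)


def serving_csr_problems(csr: ServingCSR, machine_addresses: list[dict]) -> list[str]:
    """Return the reasons this simplified model would refuse to approve the CSR."""
    problems = []
    if csr.signer != "kubernetes.io/kubelet-serving":
        problems.append(f"unexpected signer {csr.signer}")
    if not csr.common_name.startswith("system:node:") or csr.requesting_user != csr.common_name:
        problems.append("requesting user and CN are not the same system:node:<name>")
    if csr.organization != "system:nodes":
        problems.append("organization is not system:nodes")
    dns = {a["address"] for a in machine_addresses if a["type"] in DNS_TYPES}
    ips = {a["address"] for a in machine_addresses if a["type"] in IP_TYPES}
    problems += [f"DNS name '{d}' not in machine addresses" for d in csr.dns_names if d not in dns]
    problems += [f"IP address '{i}' not in machine addresses" for i in csr.ip_addresses if i not in ips]
    return problems


node = "rdr-kn24-f9jtx7mkm5-ks54l"
csr = ServingCSR(
    requesting_user=f"system:node:{node}",
    signer="kubernetes.io/kubelet-serving",
    common_name=f"system:node:{node}",
    organization="system:nodes",
    dns_names=[node],
    ip_addresses=["192.168.0.81"],
)
before_fix = [{"type": "InternalDNS", "address": node}]
after_fix = before_fix + [{"type": "InternalIP", "address": "192.168.0.81"}]
print(serving_csr_problems(csr, before_fix))
print(serving_csr_problems(csr, after_fix))
```

With only InternalDNS on the machine, the IP SAN is the one item left unmatched; adding an InternalIP entry clears it in this model.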


So I will be making the necessary changes in the Power VS machine-api-provider to add the required fields.
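The shape of the missing data can be illustrated independently of the provider's actual code, which is not shown in this bug. The sketch below is a hypothetical helper that builds a Machine-style status.addresses list containing the InternalIP entries alongside the InternalDNS name already reported; how the Power VS provider obtains the instance IPs is assumed and not modelled here.

```python
import ipaddress


def machine_status_addresses(hostname: str, internal_ips: list[str]) -> list[dict]:
    """Hypothetical helper: Machine status.addresses with IPs next to the DNS name."""
    addresses = [{"type": "InternalDNS", "address": hostname}]
    for ip in internal_ips:
        # Validate and normalise each address; raises ValueError on malformed input.
        addresses.append({"type": "InternalIP", "address": str(ipaddress.ip_address(ip))})
    return addresses


print(machine_status_addresses("rdr-kn24-f9jtx7mkm5-ks54l", ["192.168.0.81"]))
```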

Comment 6 pdsilva 2022-07-12 09:23:13 UTC
Verified with OCP 4.11.0-rc.1.

No Pending CSRs seen post-deployment.

# oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.1   True        False         6m2s    Cluster version is 4.11.0-rc.1

# oc get csr
NAME                                             AGE   SIGNERNAME                                    REQUESTOR                                                                         REQUESTEDDURATION   CONDITION
csr-2ml94                                        37m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         <none>              Approved,Issued
csr-9lwd8                                        36m   kubernetes.io/kubelet-serving                 system:node:rdr-ipi-jl12-pravin-s-psmq6-master-1                                  <none>              Approved,Issued
csr-9srxt                                        13m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         <none>              Approved,Issued
csr-crnmx                                        36m   kubernetes.io/kubelet-serving                 system:node:rdr-ipi-jl12-pravin-s-psmq6-master-2                                  <none>              Approved,Issued
csr-dhq98                                        13m   kubernetes.io/kubelet-serving                 system:node:rdr-ipi-jl12-pravin-s-psmq6-worker-9n9sk                              <none>              Approved,Issued
csr-dk5zd                                        37m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         <none>              Approved,Issued
csr-gfwdg                                        14m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         <none>              Approved,Issued
csr-nhp6p                                        13m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         <none>              Approved,Issued
csr-nzb7d                                        13m   kubernetes.io/kubelet-serving                 system:node:rdr-ipi-jl12-pravin-s-psmq6-worker-dkkvk                              <none>              Approved,Issued
csr-sjwct                                        36m   kubernetes.io/kubelet-serving                 system:node:rdr-ipi-jl12-pravin-s-psmq6-master-0                                  <none>              Approved,Issued
csr-w5jp7                                        14m   kubernetes.io/kubelet-serving                 system:node:rdr-ipi-jl12-pravin-s-psmq6-worker-q75v6                              <none>              Approved,Issued
csr-w899t                                        37m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         <none>              Approved,Issued
system:openshift:openshift-authenticator-2dhbp   34m   kubernetes.io/kube-apiserver-client           system:serviceaccount:openshift-authentication-operator:authentication-operator   <none>              Approved,Issued
system:openshift:openshift-monitoring-7hnqf      33m   kubernetes.io/kube-apiserver-client           system:serviceaccount:openshift-monitoring:cluster-monitoring-operator            <none>              Approved,Issued

Comment 7 errata-xmlrpc 2022-08-10 11:19:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069