Bug 2002961 - CSR reconciler report error constantly when BYOH CSR approved by other Approver
Summary: CSR reconciler report error constantly when BYOH CSR approved by other Approver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Windows Containers
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Mansi Kulkarni
QA Contact: Ronnie Rasouli
URL:
Whiteboard:
Depends On:
Blocks: 2008942
Reported: 2021-09-10 08:54 UTC by gaoshang
Modified: 2022-03-28 09:36 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2003788 2008942
Environment:
Last Closed: 2022-03-28 09:36:28 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift windows-machine-config-operator pull 668 0 None open Bug 2002961: Prevent error loop when a CSR is queued and then approved externally 2021-09-13 17:43:59 UTC
Red Hat Product Errata RHSA-2022:0577 0 None None None 2022-03-28 09:36:45 UTC

Description gaoshang 2021-09-10 08:54:57 UTC
Description of problem:
It looks like the BYOH node's CSR expires after 24 hours (or less) and is then renewed. When the renewed CSR is approved by the Node CSR Approver, WMCO reports the error "WMCO CSR Approver could not approve csr-ctlst CSR"; see below.

# oc logs -f deployment.apps/windows-machine-config-operator -n openshift-windows-machine-config-operator
...
2021-09-10T06:52:16.474Z	DEBUG	wc ip-10-0-57-110.us-east-2.compute.internal	initializing SSH connection
2021-09-10T06:52:16.995Z	DEBUG	wc ip-10-0-57-110.us-east-2.compute.internal	run	{"cmd": "powershell.exe -NonInteractive -ExecutionPolicy Bypass hostname", "out": "sgao-win1\r\n"}
2021-09-10T06:52:17.001Z	ERROR	controller-runtime.manager.controller.certificatesigningrequest	Reconciler error	{"reconciler group": "certificates.k8s.io", "reconciler kind": "CertificateSigningRequest", "name": "csr-ctlst", "namespace": "", "error": "WMCO CSR Approver could not approve csr-ctlst CSR: could not update conditions for approval CSR: csr-ctlst: Operation cannot be fulfilled on certificatesigningrequests.certificates.k8s.io \"csr-ctlst\": the object has been modified; please apply your changes to the latest version and try again", "errorVerbose": "Operation cannot be fulfilled on certificatesigningrequests.certificates.k8s.io \"csr-ctlst\": the object has been modified; please apply your changes to the latest version and try again\ncould not update conditions for approval CSR: csr-ctlst\ngithub.com/openshift/windows-machine-config-operator/pkg/csr.(*Approver).Approve\n\t/build/windows-machine-config-operator/pkg/csr/csr.go:99\ngithub.com/openshift/windows-machine-config-operator/controllers.(*certificateSigningRequestsReconciler).reconcileCSR\n\t/build/windows-machine-config-operator/controllers/certificatesigningrequests_controller.go:96\ngithub.com/openshift/windows-machine-config-operator/controllers.(*certificateSigningRequestsReconciler).Reconcile\n\t/build/windows-machine-config-operator/controllers/certificatesigningrequests_controller.go:86\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controll
er.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\nWMCO CSR Approver could not approve csr-ctlst CSR\ngithub.com/openshift/windows-machine-config-operator/controllers.(*certificateSigningRequestsReconciler).reconcileCSR\n\t/build/windows-machine-config-operator/controllers/certificatesigningrequests_controller.go:97\ngithub.com/openshift/windows-machine-config-operator/controllers.(*certificateSigningRequestsReconciler).Reconcile\n\t/build/windows-machine-config-operator/controllers/certificatesigningrequests_controller.go:86\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214


# oc get node -l kubernetes.io/os=windows
NAME        STATUS   ROLES    AGE   VERSION
sgao-win1   Ready    worker   16h   v1.22.1-1660+bbcc9aea9e4bef

# oc get csr
NAME        AGE   SIGNERNAME                                    REQUESTOR                    REQUESTEDDURATION   CONDITION
...
csr-ctlst   98m   kubernetes.io/kubelet-serving                 system:node:sgao-win1        <none>              Approved,Issued

# oc get csr csr-ctlst -oyaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  creationTimestamp: "2021-09-10T06:52:16Z"
  generateName: csr-
  managedFields:
  - apiVersion: certificates.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:certificate: {}
    manager: kube-controller-manager
    operation: Update
    subresource: status
    time: "2021-09-10T06:52:16Z"
  - apiVersion: certificates.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
    manager: kubelet.exe
    operation: Update
    time: "2021-09-10T06:52:16Z"
  - apiVersion: certificates.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .: {}
          k:{"type":"Approved"}:
            .: {}
            f:lastTransitionTime: {}
            f:lastUpdateTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
    manager: machine-approver
    operation: Update
    subresource: approval
    time: "2021-09-10T06:52:16Z"
  name: csr-ctlst
  resourceVersion: "303883"
  uid: 6e5e416e-10a0-4779-9b22-49481b241405
spec:
  groups:
  - system:nodes
  - system:authenticated
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQkh6Q0J4Z0lCQURBM01SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14SGpBY0JnTlZCQU1URlhONQpjM1JsYlRwdWIyUmxPbk5uWVc4dGQybHVNVEJaTUJNR0J5cUdTTTQ5QWdFR0NDcUdTTTQ5QXdFSEEwSUFCRHd4CkkvQWlBTk41WDBzcjBmOStlMGprZXN1bCtIZ0RDN01sQndDSW1VK01EN00veFh0ZHZLZ1d4TEcvYjFZZWJHSWYKaWdHUW5EQUxBZjFmbmROSlVmMmdMVEFyQmdrcWhraUc5dzBCQ1E0eEhqQWNNQm9HQTFVZEVRUVRNQkdDQ1hObgpZVzh0ZDJsdU1ZY0VDZ0E1YmpBS0JnZ3Foa2pPUFFRREFnTklBREJGQWlBaGFWUWdCU3RRUU5zd2tpWVJLRGxHCkk3dE85eENlZTh2bW9zZmJEVGRVMkFJaEFOd1ptQ1JZb0g1RS9nS01ldU9CekFmcnBsa3orcCtXam03ZE1XQlQKWThaMQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
  signerName: kubernetes.io/kubelet-serving
  usages:
  - digital signature
  - key encipherment
  - server auth
  username: system:node:sgao-win1
status:
  certificate: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNqakNDQVhhZ0F3SUJBZ0lRUTUrUnFwVG5tK3BmVmxxdGlGMXZjekFOQmdrcWhraUc5dzBCQVFzRkFEQW0KTVNRd0lnWURWUVFEREJ0cmRXSmxMV056Y2kxemFXZHVaWEpmUURFMk16RXhPVEEwTVRVd0hoY05NakV3T1RFdwpNRFkwTnpFMldoY05NakV3T1RFd01USXdORFExV2pBM01SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14CkhqQWNCZ05WQkFNVEZYTjVjM1JsYlRwdWIyUmxPbk5uWVc4dGQybHVNVEJaTUJNR0J5cUdTTTQ5QWdFR0NDcUcKU000OUF3RUhBMElBQkR3eEkvQWlBTk41WDBzcjBmOStlMGprZXN1bCtIZ0RDN01sQndDSW1VK01EN00veFh0ZAp2S2dXeExHL2IxWWViR0lmaWdHUW5EQUxBZjFmbmROSlVmMmpjakJ3TUE0R0ExVWREd0VCL3dRRUF3SUZvREFUCkJnTlZIU1VFRERBS0JnZ3JCZ0VGQlFjREFUQU1CZ05WSFJNQkFmOEVBakFBTUI4R0ExVWRJd1FZTUJhQUZNek8KTFM1M0lBWEtrQlliekZHUXdSb0NLUzJNTUJvR0ExVWRFUVFUTUJHQ0NYTm5ZVzh0ZDJsdU1ZY0VDZ0E1YmpBTgpCZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFleDJXaVAvaWlpS3dtVXo3dmpmQnlRZ0Y3c3p6M212UkcvZnMyUXNPCnJtSUwvMFpPcm4vc3BqM2J0ekM3L0pPOFprOHVLTldadFVRN1F2eVJPUWFHSHQzRDZSNWJKdGVHekFaNE5obzUKYldwNDRQdWlYdVdIQ1hOR21xU1VlbEFaSENyZlk0blJ0bzhod2VJMjJ4WkFRR2RDZGlpOXhhZmRrb2tBSjlrUQpSOWZOcDBvWTY4NG5vUW1hdWl0UXlKRXBtVUxZZXZxRWJrVW0xN1U4SHJVOFNHSWFtbzcvSG1WMFNPNkU5QkFJClJheGxvUlNtbDlYa05wTWJiMkNlWlhiT2tVU0hXcnhWbU5yQWNRakpUekZIVFVvODFDODRMZ1NhcDlUb0dsbG4KWUdBUlRjb0tuOEIyZVZOVVpxKzVpeU9uK25VZlRwaE1MNEFzMlhSOWNUQVlCUT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  conditions:
  - lastTransitionTime: "2021-09-10T06:52:16Z"
    lastUpdateTime: "2021-09-10T06:52:16Z"
    message: This CSR was approved by the Node CSR Approver
    reason: NodeCSRApprove
    status: "True"
    type: Approved

Version-Release number of selected component (if applicable):
OCP version: 4.9.0-0.nightly-2021-09-08-233235
WMCO master branch commit: 6901ac3ed2891b7627a1431ab66703358b902671

How reproducible:
Always

Steps to Reproduce:
1. Configure a BYOH Windows instance
2. Wait about 24 hours for the CSR to be renewed, then check the WMCO log

Actual results:
The CSR reconciler reports the error constantly.

Expected results:
The CSR reconciler should not report the error constantly.

Additional info:
Because the CSR is already approved, the error disappears once the CSR record drops out of the `oc get csr` output after a while, or once it is deleted with `oc delete csr csr-ctlst`.
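
For illustration only, one way a reconciler could avoid this loop is to skip any CSR that already carries an Approved or Denied condition before trying to approve it. The sketch below is a minimal, self-contained Go example under that assumption. `Condition` and `isFinalized` are hypothetical names, and `Condition` is a simplified stand-in for the real certificates.k8s.io/v1 condition type; this is not the WMCO code.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for the
// certificates.k8s.io/v1 CSR condition type (not the real API type).
type Condition struct {
	Type   string
	Status string
}

// isFinalized reports whether some approver has already decided the CSR,
// i.e. whether it carries an Approved or Denied condition.
func isFinalized(conds []Condition) bool {
	for _, c := range conds {
		if c.Type == "Approved" || c.Type == "Denied" {
			return true
		}
	}
	return false
}

func main() {
	pending := []Condition{}
	approved := []Condition{{Type: "Approved", Status: "True"}}

	// A reconciler would return early (without error) for finalized CSRs
	// instead of attempting another approval.
	fmt.Println(isFinalized(pending))  // false
	fmt.Println(isFinalized(approved)) // true
}
```

Returning early without an error for such CSRs means the controller does not requeue them, so the repeated "Reconciler error" entries would stop.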

Comment 1 Aravindh Puthiyaparambil 2021-09-10 15:05:12 UTC
@sgao are you talking about cluster-machine-approver [0] when you say node CSR approver?


What we need to figure out is why the node CSR approver approved a CSR for a BYOH node, since it checks for a node linked to a Machine before approving. This could point to an issue with cluster-machine-approver.

[0] https://github.com/openshift/cluster-machine-approver/

Comment 2 gaoshang 2021-09-11 11:40:33 UTC
Yes, I think so. I got the name `Node CSR Approver` from the following CSR info:

# oc get csr csr-ctlst -oyaml
...
  conditions:
  - lastTransitionTime: "2021-09-10T06:52:16Z"
    lastUpdateTime: "2021-09-10T06:52:16Z"
    message: This CSR was approved by the Node CSR Approver
    reason: NodeCSRApprove
    status: "True"
    type: Approved

I have another cluster with this bug, where the CSR (csr-nqbh7) was approved by openshift-cluster-machine-approver.

# oc logs deployment.apps/machine-approver machine-approver-controller -n openshift-cluster-machine-approver | grep "csr-nqbh7"
I0911 02:55:19.980810       1 controller.go:114] Reconciling CSR: csr-nqbh7
I0911 02:55:19.991065       1 csr_check.go:150] csr-nqbh7: CSR does not appear to be client csr
I0911 02:55:20.014731       1 controller.go:179] CSR csr-nqbh7 approved

2021-09-11T03:50:40.409Z	DEBUG	wc 10.0.61.32	initializing SSH connection
2021-09-11T03:50:40.882Z	DEBUG	wc 10.0.61.32	run	{"cmd": "powershell.exe -NonInteractive -ExecutionPolicy Bypass hostname", "out": "sgao-win1\r\n"}
2021-09-11T03:50:40.887Z	ERROR	controller-runtime.manager.controller.certificatesigningrequest	Reconciler error	{"reconciler group": "certificates.k8s.io", "reconciler kind": "CertificateSigningRequest", "name": "csr-nqbh7", "namespace": "", "error": "WMCO CSR Approver could not approve csr-nqbh7 CSR: could not update conditions for approval CSR: csr-nqbh7: CertificateSigningRequest.certificates.k8s.io \"csr-nqbh7\" is invalid: status.conditions[1].type: Duplicate value: \"Approved\"", "errorVerbose": "CertificateSigningRequest.certificates.k8s.io \"csr-nqbh7\" is invalid: status.conditions[1].type: Duplicate value: \"Approved\"\ncould not update conditions for approval CSR: csr-nqbh7\ngithub.com/openshift/windows-machine-config-operator/pkg/csr.(*Approver).Approve\n\t/build/windows-machine-config-operator/pkg/csr/csr.go:99\ngithub.com/openshift/windows-machine-config-operator/controllers.(*certificateSigningRequestsReconciler).reconcileCSR\n\t/build/windows-machine-config-operator/controllers/certificatesigningrequests_controller.go:96\ngithub.com/openshift/windows-machine-config-operator/controllers.(*certificateSigningRequestsReconciler).Reconcile\n\t/build/windows-machine-config-operator/controllers/certificatesigningrequests_controller.go:86\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\nWMCO CSR Approver could not approve 
csr-nqbh7 CSR\ngithub.com/openshift/windows-machine-config-operator/controllers.(*certificateSigningRequestsReconciler).reconcileCSR\n\t/build/windows-machine-config-operator/controllers/certificatesigningrequests_controller.go:97\ngithub.com/openshift/windows-machine-config-operator/controllers.(*certificateSigningRequestsReconciler).Reconcile\n\t/build/windows-machine-config-operator/controllers/certificatesigningrequests_controller.go:86\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
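
The two error messages seen so far ("the object has been modified" and "Duplicate value: \"Approved\"") are consistent with two approvers racing on the same CSR. The toy Go model below is only a sketch of that reading: `Store`, `CSR`, and `approve` are hypothetical names, and the resource-version and duplicate checks imitate the server-side behavior implied by the logs rather than reproducing any real API server code.

```go
package main

import (
	"errors"
	"fmt"
)

// CSR is a toy object: a resource version plus condition types only.
type CSR struct {
	ResourceVersion int
	Conditions      []string
}

var (
	ErrConflict  = errors.New("the object has been modified")
	ErrDuplicate = errors.New("duplicate condition type")
)

// Store is a hypothetical in-memory stand-in for the API server.
type Store struct{ current CSR }

// Get returns a copy of the stored CSR, like a client read.
func (s *Store) Get() CSR {
	c := s.current
	c.Conditions = append([]string(nil), s.current.Conditions...)
	return c
}

// Update rejects stale writes and duplicate condition types.
func (s *Store) Update(obj CSR) error {
	if obj.ResourceVersion != s.current.ResourceVersion {
		return ErrConflict
	}
	seen := map[string]bool{}
	for _, t := range obj.Conditions {
		if seen[t] {
			return ErrDuplicate
		}
		seen[t] = true
	}
	obj.ResourceVersion++
	s.current = obj
	return nil
}

// approve blindly appends an Approved condition and writes it back.
func approve(s *Store, obj CSR) error {
	obj.Conditions = append(obj.Conditions, "Approved")
	return s.Update(obj)
}

func main() {
	s := &Store{}
	stale := s.Get() // second approver reads before the first one writes

	fmt.Println(approve(s, s.Get())) // first approver succeeds
	fmt.Println(approve(s, stale))   // stale copy: conflict error
	fmt.Println(approve(s, s.Get())) // fresh copy, blind append: duplicate error
}
```

In this model, retrying with a fresh read turns the conflict into the duplicate error, so retrying alone never succeeds; the approver has to notice that the CSR is already approved and stop.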

Comment 11 gaoshang 2021-09-25 09:01:00 UTC
This bug has been verified on OCP 4.10.0-0.nightly-2021-09-23-210724 and passed, thanks.

Steps:
After waiting for the renewed CSR to be issued, the CSR reconciler no longer reports the error; CSRs that are already approved are now filtered out by checking the CSR status.


# oc logs deployment.apps/windows-machine-config-operator -n openshift-windows-machine-config-operator | grep "SR is already approved"
2021-09-25T02:56:37.217Z	INFO	controllers.CertificateSigningRequests	CSR is already approved/denied	{"Name": "system:openshift:openshift-authenticator-69dgg"}
2021-09-25T07:16:39.142Z	INFO	controllers.CertificateSigningRequests	CSR is already approved/denied	{"Name": "system:openshift:openshift-monitoring-z46bl"}

Comment 14 errata-xmlrpc 2022-03-28 09:36:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Windows Container Support for Red Hat OpenShift 5.0.0 [security update]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0577

