Bug 1978303 - KAS pod logs show: [SHOULD NOT HAPPEN] ...failed to convert new object...CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]
Summary: KAS pod logs show: [SHOULD NOT HAPPEN] ...failed to convert new object...Cert...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Michael McCune
QA Contact: Huali Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-01 13:49 UTC by Xingxing Xia
Modified: 2022-08-10 10:37 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The Cluster Machine Approver always appended a new Approved status condition to the CertificateSigningRequest's condition list. Consequence: The Kubernetes API Server emitted error logs containing the message "[SHOULD NOT HAPPEN] failed to update managedFields". Fix: The Cluster Machine Approver now checks the existing conditions before appending, updating the Approved condition in place where one already exists. Result: Conditions are no longer duplicated in the CertificateSigningRequest, and the Kubernetes API Server no longer emits error messages about the duplicate.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:36:52 UTC
Target Upstream Version:
Embargoed:


Attachments: (none)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-machine-approver pull 158 0 None open Bug 1978303: update approve condition logic 2022-03-07 19:48:45 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:37:19 UTC

Description Xingxing Xia 2021-07-01 13:49:13 UTC
Description of problem:
KAS pod logs show: [SHOULD NOT HAPPEN] ...failed to convert new object...CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-07-01-043852

How reproducible:
Not yet known

Steps to Reproduce:
1. Create a fresh env of aos-4_8/upi-on-aws/versioned-installer-sts-ci

2. Check kube-apiserver containers' logs

Actual results:
2. Found:
$ grep -nr "SHOULD NOT HAPPEN" logs/xxiasts0701dr2days/
logs/xxiasts0701dr2days/pre-dr_kube-apiserver-ip-10-0-57-59.ap-northeast-1.compute.internal/kube-apiserver.log:6252:2021-07-01T08:28:11.151344399Z E0701 08:28:11.151220      19 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
logs/xxiasts0701dr2days/pre-dr_kube-apiserver-ip-10-0-57-59.ap-northeast-1.compute.internal/kube-apiserver.log:6253:2021-07-01T08:28:12.248117168Z E0701 08:28:12.248024      19 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
...
logs/xxiasts0701dr2days/pre-dr_kube-apiserver-ip-10-0-67-200.ap-northeast-1.compute.internal/kube-apiserver.log:4595:2021-07-01T08:11:31.122751584Z E0701 08:11:31.122675      19 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
logs/xxiasts0701dr2days/pre-dr_kube-apiserver-ip-10-0-67-200.ap-northeast-1.compute.internal/kube-apiserver.log:4604:2021-07-01T08:11:32.230233653Z E0701 08:11:32.230127      19 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="

$ grep -nr "SHOULD NOT HAPPEN" logs/xxiasts0701dr2days/ | wc -l
8

Expected results:
2. No such errors.

Additional info:

Comment 1 Kevin Wiesmueller 2021-07-02 12:13:18 UTC
Can we get access to the cluster, or can it be recreated?
I'm not familiar with that part; can I get a cluster that should reproduce it from cluster-bot?

Comment 2 Kevin Wiesmueller 2021-07-02 15:45:19 UTC
I managed to reproduce this on a 4.8 cluster.

Sending a POST 
/apis/certificates.k8s.io/v1/certificatesigningrequests
{
  "kind": "CertificateSigningRequest",
  "apiVersion": "certificates.k8s.io/v1",
  "metadata": {
    "name": "testCSR1",
  },
  "spec": {
    "request": "...",
    "signerName": "kubernetes.io/kube-apiserver-client-kubelet",
    "usages": [
      "digital signature",
      "key encipherment",
      "client auth"
    ],
    "extra": {
      "scopes.authorization.openshift.io": [
        "user:full"
      ]
    }
  },
  "status": {
    "conditions": [
      {
        "type": "Approved",
        "status": "True",
        "reason": "AutoApproved",
        "message": "Auto approving kubelet client certificate after SubjectAccessReview.",
        "lastUpdateTime": "2021-07-02T13:10:35Z",
        "lastTransitionTime": "2021-07-02T13:10:35Z"
      },
      {
        "type": "Approved",
        "status": "True",
        "reason": "AutoApproved",
        "message": "Auto approving kubelet client certificate after SubjectAccessReview.",
        "lastUpdateTime": "2021-07-02T13:10:35Z",
        "lastTransitionTime": "2021-07-02T13:10:35Z"
      }
    ],
    "certificate": "..."
  }
}

causes this log entry.
I suspect it is because validation does not check status on creates. This then causes our schema validation to fail.

We added https://github.com/kubernetes/kubernetes/pull/99661, which makes the fieldManager ignore the fields described in the respective strategy.
It seems that it does not ignore those fields for conversion, though :-/
I'll have to take a closer look. But the question remains where this request is coming from, as it is not valid.
A PUT on an existing resource does not cause the same error and fails at validation, but any POST, even one that ultimately fails (because the resource already exists), causes this to happen.

Comment 3 Stefan Schimanski 2021-07-12 10:44:57 UTC
What is the consequence for the request at hand? Is it just noise or does it cause the request to fail?

Comment 4 Kevin Wiesmueller 2021-07-12 11:58:55 UTC
It's not failing the request, but it aborts the managedFields computation for the resource, and since this is a create, the fields are not added.
That means the resource doesn't get managedFields at all (possibly never).
For this specific resource I think this isn't a problem, as CSRs are mostly immutable anyway.

But we originally wanted to fail requests on this error in order to get more stable schema validation, so we should still look at how to fix it, both at the place where the request is coming from and in our validation in the apiserver.
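
For reference, a minimal client-go sketch (not part of this report) for checking whether a CSR ever received managedFields; the CSR name "example-csr" and the default kubeconfig path are hypothetical placeholders:

package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Assumes a kubeconfig in the default location.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// "example-csr" is a placeholder name.
	csr, err := client.CertificatesV1().CertificateSigningRequests().Get(
		context.TODO(), "example-csr", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// If the managedFields computation was aborted on create, this list stays empty.
	fmt.Printf("managedFields entries: %d\n", len(csr.ManagedFields))
}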

Comment 5 Michal Fojtik 2021-08-11 22:53:23 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 6 Rahul Gangwar 2021-08-12 04:29:57 UTC
We found the same "SHOULD NOT HAPPEN" error in the KAS pod logs on version 4.9.

Version:
oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-07-175228   True        False         3h26m   Cluster version is 4.9.0-0.nightly-2021-08-07-175228


2021-08-11T12:15:33.361189346+00:00 stderr F E0811 12:15:33.361127      17 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
2021-08-11T12:51:11.840273271+00:00 stderr F E0811 12:51:11.840184      17 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
2021-08-11T12:06:56.404943043+00:00 stderr F E0811 12:06:56.404877      19 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind=

Comment 7 Michal Fojtik 2021-09-11 05:27:56 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 9 Michal Fojtik 2021-10-30 23:29:59 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 10 Michal Fojtik 2021-12-01 02:58:57 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 11 Stefan Schimanski 2021-12-01 08:13:57 UTC
This is very likely caused by the csr approver from MCO, duplicating the approved condition.

Comment 12 Bob Fournier 2021-12-01 12:37:44 UTC
Reassigning to MCO team per Comment 11.

Comment 13 Ben Nemec 2021-12-01 15:04:47 UTC
Moving to MCO component since this is not related to on-prem deployments.

Comment 14 Yu Qi Zhang 2021-12-01 15:30:40 UTC
> This is very likely caused by the csr approver from MCO, duplicating the approved condition.

I don't believe the MCO team owns the CSR approver. Are you talking about https://github.com/openshift/cluster-machine-approver? That would be on the machine-api team.

Comment 15 Kirsten Garrison 2022-03-02 21:13:10 UTC
As per the above comment, moving to machine-api, which we believe owns the CSR approver, not MCO.

Comment 16 Joel Speed 2022-03-03 12:01:24 UTC
I think Stefan's comment is correct: for some reason, rather than checking for an existing condition in the list, we just add a new one. We will need to fix this by checking whether the Approved condition already exists. I'm surprised the API allows this, though.

https://github.com/openshift/cluster-machine-approver/blob/897738a5cd150cea7df0ab57c40e01a9de864a3b/pkg/controller/controller.go#L272-L289
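
For illustration only, a rough sketch of that check (this is not the code from the linked pull request); types are from k8s.io/api/certificates/v1, and the helper name setApprovedCondition is made up here:

package approver

import (
	certificatesv1 "k8s.io/api/certificates/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setApprovedCondition updates an existing Approved condition in place, or
// appends one if none exists yet, so the CSR never carries two entries with
// the same condition type.
func setApprovedCondition(csr *certificatesv1.CertificateSigningRequest, reason, message string) {
	now := metav1.Now()
	approved := certificatesv1.CertificateSigningRequestCondition{
		Type:               certificatesv1.CertificateApproved,
		Status:             corev1.ConditionTrue,
		Reason:             reason,
		Message:            message,
		LastUpdateTime:     now,
		LastTransitionTime: now,
	}

	// .status.conditions is a map-type list keyed on "type"; a duplicate key is
	// what makes the apiserver's managedFields conversion fail.
	for i := range csr.Status.Conditions {
		if csr.Status.Conditions[i].Type == certificatesv1.CertificateApproved {
			csr.Status.Conditions[i] = approved
			return
		}
	}
	csr.Status.Conditions = append(csr.Status.Conditions, approved)
}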

Comment 17 Michael McCune 2022-03-03 14:58:21 UTC
I'm just starting to take a look at this; we can definitely improve the way the cluster-machine-approver handles this condition.

Comment 19 Michael McCune 2022-03-10 02:47:06 UTC
While talking with colleagues about the fix for this, we found a library routine in the upstream Kubernetes apimachinery module that would perform the same function as the patch I wrote, with the additional benefit of being a common upstream component. If the QA tests have not started yet, would it be possible to hold off for another day so that I can create a new pull request?

Apologies for the inconvenience.
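
The comment does not name the routine, so the following is only an assumption about the kind of helper meant: apimachinery's meta.SetStatusCondition (k8s.io/apimachinery/pkg/api/meta) updates a condition in place instead of appending a duplicate. Note that it operates on metav1.Condition lists, while CSRs use certificatesv1.CertificateSigningRequestCondition, so it would not drop in without adaptation:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	var conditions []metav1.Condition

	// Calling the helper twice with the same Type leaves a single entry, which
	// is exactly what a map-type conditions list keyed on "type" requires.
	for i := 0; i < 2; i++ {
		meta.SetStatusCondition(&conditions, metav1.Condition{
			Type:    "Approved",
			Status:  metav1.ConditionTrue,
			Reason:  "AutoApproved",
			Message: "illustration only",
		})
	}

	fmt.Printf("conditions after two calls: %d\n", len(conditions)) // prints 1
}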

Comment 21 Huali Liu 2022-03-16 11:20:48 UTC
Reproduced on 4.8.0-0.nightly-2022-03-15-185859:

1.Create a fresh env of upi-on-aws/versioned-installer-sts-ci

2.Check the kube-apiserver logs on all master nodes

liuhuali@Lius-MacBook-Pro huali-test % oc logs kube-apiserver-ip-10-0-50-70.us-east-2.compute.internal  -n openshift-kube-apiserver|grep "SHOULD NOT HAPPEN"  
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-ip-10-0-50-70.us-east-2.compute.internal. Please use `kubectl.kubernetes.io/default-container` instead
E0316 06:26:36.860495      19 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
E0316 06:27:47.473794      19 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
liuhuali@Lius-MacBook-Pro huali-test % oc logs kube-apiserver-ip-10-0-56-248.us-east-2.compute.internal    -n openshift-kube-apiserver|grep "SHOULD NOT HAPPEN"
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-ip-10-0-56-248.us-east-2.compute.internal. Please use `kubectl.kubernetes.io/default-container` instead
liuhuali@Lius-MacBook-Pro huali-test % oc logs kube-apiserver-ip-10-0-79-239.us-east-2.compute.internal    -n openshift-kube-apiserver|grep "SHOULD NOT HAPPEN"
Using deprecated annotation `kubectl.kubernetes.io/default-logs-container` in pod/kube-apiserver-ip-10-0-79-239.us-east-2.compute.internal. Please use `kubectl.kubernetes.io/default-container` instead
E0316 06:42:59.945050      21 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
E0316 06:44:10.564888      21 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
E0316 06:59:40.013501      21 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
E0316 07:00:50.585777      21 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
E0316 07:16:20.051236      21 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
E0316 07:17:30.605206      21 fieldmanager.go:191] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (certificates.k8s.io/v1, Kind=CertificateSigningRequest) to smd typed: .status.conditions: duplicate entries for key [type=\"Approved\"]" VersionKind="/, Kind="
liuhuali@Lius-MacBook-Pro huali-test % 

Verified on 4.11.0-0.nightly-2022-03-15-060211; no "[SHOULD NOT HAPPEN]" error was found.

Steps:

1.Create a fresh env of upi-on-aws/versioned-installer-sts-ci

2.Check the kube-apiserver logs on all master nodes

liuhuali@Lius-MacBook-Pro huali-test % oc logs kube-apiserver-ip-10-0-52-129.us-east-2.compute.internal -n openshift-kube-apiserver|grep "SHOULD NOT HAPPEN"      
liuhuali@Lius-MacBook-Pro huali-test % oc logs kube-apiserver-ip-10-0-63-143.us-east-2.compute.internal -n openshift-kube-apiserver|grep "SHOULD NOT HAPPEN"      
liuhuali@Lius-MacBook-Pro huali-test % oc logs kube-apiserver-ip-10-0-75-16.us-east-2.compute.internal -n openshift-kube-apiserver|grep "SHOULD NOT HAPPEN"       
liuhuali@Lius-MacBook-Pro huali-test %

Comment 22 Michael McCune 2022-03-17 01:48:32 UTC
After discussing with the team and exploring the library option, we have decided to stay with the change I proposed, because it gives us an optimization when updating the resource. Please disregard my previous request.

Comment 24 errata-xmlrpc 2022-08-10 10:36:52 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

