Bug 1904133 - KubeletConfig flooded with failure conditions
Summary: KubeletConfig flooded with failure conditions
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Qi Wang
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-03 16:06 UTC by Artyom
Modified: 2022-04-19 23:42 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The controller did not check the condition message, so a failure condition with the same message was appended every minute. Fix: Do not add a new failure condition if the failure message is unchanged. Result: The same error condition appears only once, and its timestamp is updated.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:37:28 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2369 | 0 | None | closed | Bug 1904133: kubeletcfg: fix repeated status error msg | 2021-02-01 07:32:47 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:37:55 UTC

Description Artyom 2020-12-03 16:06:15 UTC
Description of problem:
When the KubeletConfig controller fails to find the MCP by label selector, it adds a failure condition to the resource and then keeps adding a new one every minute. I think the condition logic should be refactored so that, like other resources, it keeps only a limited number of conditions. For KubeletConfig, the conditions list should always have a length of 2.

Version-Release number of selected component (if applicable):
Client Version: 4.6.0-0.nightly-2020-07-25-091217
Server Version: 4.7.0-0.ci-2020-12-02-092627
Kubernetes Version: v1.19.2-1007+ad738ba548b6d6-dirty


How reproducible:
Always

Steps to Reproduce:
1. Create a KubeletConfig resource with an MCP label selector that does not match any existing MCP.
2. Wait a few minutes and check the KubeletConfig resource.

Actual results:
The KubeletConfig resource accumulates many identical failure conditions:
...
Status:
  Conditions:
    Last Transition Time:  2020-12-03T14:56:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T14:57:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T14:58:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T14:59:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:00:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:01:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:02:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:03:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:04:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:05:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:06:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:07:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:08:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:09:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:10:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:11:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:12:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:13:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:14:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:15:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:16:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:17:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:18:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:19:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:20:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2020-12-03T15:21:34Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
...

Expected results:
I expect only two conditions: one for failure and one for success. When the controller hits a failure while applying the KubeletConfig, it should set the Failure condition to True and the Success condition to False. When the controller succeeds, it should set the Failure condition to False and the Success condition to True.

Additional info:

Comment 1 Yu Qi Zhang 2020-12-07 22:09:33 UTC
This is controlled by the kubeletconfigcontroller, which I believe currently adds the error to the queue directly. Other sub-controllers should be updating the error state instead. Passing this to the Node team to take a look.

Comment 3 MinLi 2021-01-13 02:58:53 UTC
There is a similar bug in 4.6 that has been fixed: https://bugzilla.redhat.com/show_bug.cgi?id=1849538

Comment 4 Qi Wang 2021-01-13 20:00:02 UTC
The results:
```
Status:
  Conditions:
    Last Transition Time:  2021-01-13T19:48:16Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
Events:                    <none>
```

Client Version: v4.2.0-alpha.0-930-geeb9d6d
Server Version: 4.7.0-0.ci-2021-01-13-131322
Kubernetes Version: v1.20.0-983+31b56ef6b1cf67-dirty

The fix https://github.com/openshift/machine-config-operator/pull/1859 for 4.6 should have fixed this bug.

Comment 5 Qi Wang 2021-01-13 20:11:22 UTC
Closing this one since it's a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1849538.

*** This bug has been marked as a duplicate of bug 1849538 ***

Comment 6 Artyom 2021-01-25 10:36:42 UTC
I can still see the bug on:

oc version 
Client Version: 4.6.0-0.nightly-2020-07-25-091217
Server Version: 4.7.0-fc.3
Kubernetes Version: v1.20.0+d9c52cc

Status:
  Conditions:
    Last Transition Time:  2021-01-25T10:32:33Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2021-01-25T10:33:33Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2021-01-25T10:34:33Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2021-01-25T10:35:33Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
Events:                    <none>

Comment 7 Qi Wang 2021-01-25 15:41:35 UTC
I will check whether the fix is included in that MCO version.

Comment 8 Qi Wang 2021-01-25 17:29:27 UTC
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  logLevel: 5
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: small-pods
  kubeletConfig:
    maxPods: 100

Apply the above configuration:

$ oc create -f kubeconfig.yaml
kubeletconfig.machineconfiguration.openshift.io/set-max-pods created
$ oc describe kubeletconfig.machineconfiguration.openshift.io/set-
Status:
  Conditions:
    Last Transition Time:  2021-01-25T17:05:04Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
Events:                    <none>

$ oc version
Client Version: v4.2.0-alpha.0-930-geeb9d6d
Server Version: 4.7.0-fc.3
Kubernetes Version: v1.20.0+d9c52cc


@alukiano I could not reproduce this bug. Can you provide more details about the steps to reproduce?

Comment 9 Artyom 2021-01-26 16:03:18 UTC
How long did you wait? A new condition appears every minute. I applied your config and got the same problem:
Conditions:
    Last Transition Time:  2021-01-26T15:41:09Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
    Last Transition Time:  2021-01-26T15:42:09Z
    Message:               Error: could not find any MachineConfigPool set for KubeletConfig
    Status:                False
    Type:                  Failure
....

Comment 10 Qi Wang 2021-01-27 13:24:08 UTC
Thanks, I saw the repeated conditions after several minutes.

(In reply to Artyom from comment #9)
> How long did you wait, a new condition appears every minute. I applied your
> config and got the same problem
> Conditions:
>     Last Transition Time:  2021-01-26T15:41:09Z
>     Message:               Error: could not find any MachineConfigPool set
> for KubeletConfig
>     Status:                False
>     Type:                  Failure
>     Last Transition Time:  2021-01-26T15:42:09Z
>     Message:               Error: could not find any MachineConfigPool set
> for KubeletConfig
>     Status:                False
>     Type:                  Failure
> ....

Comment 12 MinLi 2021-02-01 09:56:24 UTC
Verified on version 4.7.0-0.nightly-2021-01-31-031653: the failure condition's timestamp is updated every minute, and only one item is shown in the KubeletConfig description.

status:
  conditions:
  - lastTransitionTime: "2021-02-01T09:52:00Z"
    message: 'Error: could not find any MachineConfigPool set for KubeletConfig'
    status: "False"
    type: Failure
...
status:
  conditions:
  - lastTransitionTime: "2021-02-01T09:53:10Z"
    message: 'Error: could not find any MachineConfigPool set for KubeletConfig'
    status: "False"
    type: Failure

Comment 15 errata-xmlrpc 2021-02-24 15:37:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

