Bug 1944986

Summary: Clarify the ContainerRuntimeConfiguration cr description on the validation
Product: OpenShift Container Platform Reporter: Harshal Patil <harpatil>
Component: Node Assignee: Harshal Patil <harpatil>
Node sub component: Kubelet QA Contact: MinLi <minmli>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: unspecified CC: aos-bugs, minmli
Version: 4.8   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:56:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Harshal Patil 2021-03-31 07:12:06 UTC
While attempting to fix https://bugzilla.redhat.com/show_bug.cgi?id=1930636#c3, we discovered that it is fairly simple to do basic validation of ContainerRuntimeConfiguration values within MCO.

As long as the value can be parsed as int64, the validation code within the controller responsible for ContainerRuntimeConfiguration gets invoked, and we do some basic validation such as boundary checks and sign checks for negative numbers.

However, if the input cannot be parsed into the int64 type at all, e.g. "9asadG", the execution flow fails even before it reaches the validation code for ContainerRuntimeConfiguration.

W0330 08:03:49.665463       1 reflector.go:436] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: watch of *v1.ContainerRuntimeConfig ended with: an error on the server ("unable to decode an event from the watch stream: unable to decode watch event: v1.ContainerRuntimeConfig.Spec: v1.ContainerRuntimeConfigSpec.MachineConfigPoolSelector: ContainerRuntimeConfig: v1.ContainerRuntimeConfiguration.OverlaySize: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|\":\"9asadG\"},\"machine|..., bigger context ...|:{\"containerRuntimeConfig\":{\"overlaySize\":\"9asadG\"},\"machineConfigPoolSelector\":{\"matchLabels\":{\"cus|...") has prevented the request from succeeding
E0330 08:03:50.810155       1 reflector.go:138] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.ContainerRuntimeConfig: failed to list *v1.ContainerRuntimeConfig: v1.ContainerRuntimeConfigList.Items: []v1.ContainerRuntimeConfig: v1.ContainerRuntimeConfig.Spec: v1.ContainerRuntimeConfigSpec.MachineConfigPoolSelector: ContainerRuntimeConfig: v1.ContainerRuntimeConfiguration.OverlaySize: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|":"9asadG"},"machine|..., bigger context ...|:{"containerRuntimeConfig":{"overlaySize":"9asadG"},"machineConfigPoolSelector":{"matchLabels":{"cus|...
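
The regular expression quoted in the log above can serve as a rough local pre-check before a ContainerRuntimeConfig is submitted. The following is only a sketch: the function name looks_like_quantity is hypothetical, and matching the pattern is a necessary condition taken from the error message, not a guarantee that the upstream parser will accept the value.

```python
import re

# Pattern copied verbatim from the error message in the log above.
QUANTITY_PATTERN = re.compile(r"^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$")


def looks_like_quantity(value: str) -> bool:
    """Return True if value matches the pattern quoted in the error message.

    Hypothetical helper for illustration only. A match does not prove the
    value is a valid quantity; a non-match means decoding would fail with
    the error shown in the log.
    """
    return QUANTITY_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # "9asadG" is the failing input from this report; the others are
    # illustrative values chosen for this sketch.
    for candidate in ["9asadG", "10G", "8Gi", "-1"]:
        print(candidate, looks_like_quantity(candidate))
```

Note that a negative value such as "-1" still matches the pattern. Values like that get past decoding and are left to the sign and boundary checks in the controller.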


As you can see, for an input that's not int64, the failure occurs in reflector.go in the upstream k8s Go client, which is trying to read the YAML file submitted by the user to update ContainerRuntimeConfiguration. In the past, we could handle validation of such input for KubeletConfig in MCO because it is defined as type &runtime.RawExtension. The upstream Go client will happily read whatever is thrown at it if the field is &runtime.RawExtension, which is why the execution flow could reach the controller for KubeletConfig, which could then do the validation.

But some of the fields of ContainerRuntimeConfiguration, such as LogSizeMax or OverlaySize, are of type resource.Quantity. This means that when reading the user input, the upstream Go client tries to make sure the input is indeed of type resource.Quantity, and this is where it fails for input like "9asadG".
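
The difference between the two field types can be sketched in a few lines. This is a simplified model, not MCO or client-go code: decode_typed, decode_raw, controller_validate and process are hypothetical names, and the pattern is the one quoted in the error message. It is only meant to show why a typed field fails before the controller runs, while an opaque field lets the controller see the bad value and report it.

```python
import json
import re

# Pattern quoted in the decode error for quantity-typed fields.
QUANTITY_PATTERN = re.compile(r"^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$")


def decode_typed(payload: str) -> dict:
    """Stand-in for a typed field: the value is checked while decoding."""
    obj = json.loads(payload)
    value = obj["overlaySize"]
    if QUANTITY_PATTERN.fullmatch(value) is None:
        raise ValueError(f"cannot decode overlaySize {value!r}")
    return obj


def decode_raw(payload: str) -> str:
    """Stand-in for an opaque field: the bytes are kept, not interpreted."""
    return payload


def controller_validate(payload: str) -> str:
    """Hypothetical controller-side validation that returns a status string."""
    value = json.loads(payload)["overlaySize"]
    if QUANTITY_PATTERN.fullmatch(value) is None:
        return f"rejected by controller: invalid overlaySize {value!r}"
    return "accepted"


def process(payload: str, typed: bool) -> str:
    """Decode first; the controller only runs if decoding succeeded."""
    try:
        if typed:
            decode_typed(payload)
        else:
            decode_raw(payload)
    except ValueError as err:
        return f"failed before controller: {err}"
    return controller_validate(payload)


if __name__ == "__main__":
    bad = '{"overlaySize": "9asadG"}'
    print(process(bad, typed=True))
    print(process(bad, typed=False))
```

In the typed case the error surfaces in the decoding layer, which in this bug corresponds to the reflector log lines rather than to anything the controller can attach to the resource.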

This clearly falls outside the domain of MCO, and there is little we can do from MCO's side to improve the situation. Hence I am going to update the docs and the CRD description for ContainerRuntimeConfiguration to alert the user that, while we do our best to validate the input, they will have to be more vigilant too, and to tell them where to look for logs in case of failure.

Comment 3 MinLi 2021-04-09 09:20:51 UTC
Verified on version: 4.8.0-0.nightly-2021-04-09-000946

$ oc explain ContainerRuntimeConfig.spec
KIND:     ContainerRuntimeConfig
VERSION:  machineconfiguration.openshift.io/v1

RESOURCE: spec <Object>

DESCRIPTION:
     ContainerRuntimeConfigSpec defines the desired state of
     ContainerRuntimeConfig

FIELDS:
   containerRuntimeConfig	<Object> -required-
     ContainerRuntimeConfiguration defines the tuneables of the container
     runtime. It's important to note that, since the fields of the
     ContainerRuntimeConfiguration are directly read by the upstream kubernetes
     golang client, the validation of those values is handled directly by that
     golang client which is outside of the controller for
     ContainerRuntimeConfiguration. Please ensure the valid values are used for
     those fields as invalid values may render cluster nodes unusable.

   machineConfigPoolSelector	<Object>
     A label selector is a label query over a set of resources. The result of
     matchLabels and matchExpressions are ANDed. An empty label selector matches
     all objects. A null label selector matches no objects.

Comment 6 errata-xmlrpc 2021-07-27 22:56:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438