Bug 1812649 - Deletion of a MachineConfig with multiple kernelArguments as a single string causes a Degraded MCP
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-11 18:37 UTC by Jiří Mencák
Modified: 2020-10-27 15:57 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Previously, kernel arguments specified in a MachineConfig had to be split into individual strings, one argument per array item. These kargs were not validated before being concatenated into an rpm-ostree command.
Consequence: Multiple kernel arguments joined by spaces in a single item, as the kernel command line itself allows, produced an invalid rpm-ostree command.
Fix: The MachineConfigController now parses each kernelArguments item in a manner similar to the kernel.
Result: Users can supply multiple space-separated arguments in a single item without errors.
Clone Of:
Environment:
Last Closed: 2020-10-27 15:57:04 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1563 0 None closed Bug 1812649: split kernel arguments 2021-02-05 00:54:49 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 15:57:06 UTC

Description Jiří Mencák 2020-03-11 18:37:21 UTC
Description of problem:
Deletion of a MachineConfig with multiple kernelArguments as a single string causes a Degraded MCP.

Version-Release number of selected component (if applicable):
All 4.4 and 4.5 OCP releases.

How reproducible:
Always

Steps to Reproduce:
1.
oc create -f - <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-rt
  name: 50-worker-rt
spec:
  kernelArguments:
  - a=1 b=2
EOF

2.

oc create -f - <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-rt
  labels:
    worker-rt: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-rt]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-rt: ""
  paused: false
EOF

3. 
oc label node <node> node-role.kubernetes.io/worker-rt=

4. Watch the node <node> reboot and become Ready.
5. oc delete mc/50-worker-rt
6. Watch mcp/worker-rt become Degraded.

Actual results:
MCP degraded.

Expected results:
The MCP should not become Degraded, as is the case with:

oc create -f - <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-rt
  name: 50-worker-rt
spec:
  kernelArguments:
  - a=1
  - b=2
EOF

Additional info:
Workaround: split multiple kernel parameters into an array as above.

Comment 1 Erica von Buelow 2020-03-11 19:29:06 UTC
What I believe is happening here is that we are creating an invalid rpm-ostree command: e.g. `rpm-ostree kargs --append=a=1 b=2` instead of `rpm-ostree kargs --append=a=1 --append=b=2`. That causes it to go degraded when it tries to apply them.
We could attempt to parse the kargs better, such as by using a simple string split on whitespace. That may cause other issues with fancier kargs that use quotes or something.
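
To make the difference concrete, here is a minimal Python sketch of the two command shapes described above. The helper names `naive_command` and `split_command` are hypothetical, and how the MCO actually assembles the command is not shown in this report; only the `rpm-ostree kargs --append=` form is taken from the comment.

```python
from typing import List


def naive_command(kargs: List[str]) -> str:
    """Emit one --append flag per array item, with no validation (the failure mode)."""
    return "rpm-ostree kargs " + " ".join(f"--append={k}" for k in kargs)


def split_command(kargs: List[str]) -> str:
    """Split every item on whitespace first, then emit one --append per argument."""
    args = [a for k in kargs for a in k.split()]
    return "rpm-ostree kargs " + " ".join(f"--append={a}" for a in args)


print(naive_command(["a=1 b=2"]))  # rpm-ostree kargs --append=a=1 b=2
print(split_command(["a=1 b=2"]))  # rpm-ostree kargs --append=a=1 --append=b=2
```

A plain whitespace split is enough for `a=1 b=2`, but it would also cut a quoted value such as `foo="x y"` in two, which is the concern raised about fancier kargs.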

Comment 2 Marc Sluiter 2020-03-12 08:07:32 UTC
> We could attempt to parse the kargs better

I think clarifying your expectation on what you want users to do with that field is a better approach than trying to parse whatever users try to do.
You could improve documentation by adding corresponding comments on (not only this) field in the Go code and the CRD, and extending the example on https://github.com/openshift/machine-config-operator/blob/master/docs/MachineConfiguration.md#kernelarguments with multiple kargs :)

Comment 3 Jiří Mencák 2020-03-12 08:32:58 UTC
(In reply to Marc Sluiter from comment #2)
> > We could attempt to parse the kargs better
> 
> I think clarifying your expectation on what you want users to do with that
> field is a better approach than trying to parse whatever users try to do.
> You could improve documentation by adding corresponding comments on (not
> only this) field in the Go code and the CRD, and extending the example on
> https://github.com/openshift/machine-config-operator/blob/master/docs/
> MachineConfiguration.md#kernelarguments with multiple kargs :)

I agree with the "better documentation" part, but I think the code should
also be more robust to handle this, otherwise this is probably not the last
BZ they see about this.  The tuned daemon, for example, is preparing kernel
boot parameters in /etc/tuned/bootcmdline as a single string with multiple
kernel parameters.  Something will have to do the parsing.  Looking at the kernel
code, the parsing hopefully shouldn't be all that difficult.  As a workaround,
I'm looking at writing some golang parameter parsing code for the node tuning
operator.
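
The kind of parsing discussed here could look roughly like the following Python sketch. It is loosely modeled on the kernel's behavior of treating whitespace inside double quotes as part of the value; it is not the kernel's parser, nor the code that eventually landed in the operator. The function name `split_kernel_arguments` is hypothetical, and keeping the quotes in the output is a simplifying assumption.

```python
from typing import List


def split_kernel_arguments(line: str) -> List[str]:
    """Split on whitespace, except inside double quotes; quotes are kept as-is."""
    args: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in line:
        if ch == '"':
            # Toggle quoted state; the quote character stays part of the argument.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch.isspace() and not in_quotes:
            # Unquoted whitespace ends the current argument, if one is in progress.
            if current:
                args.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        args.append("".join(current))
    return args


print(split_kernel_arguments('a=1 b=2'))
print(split_kernel_arguments('a=1 dyndbg="file x.c +p" quiet'))
```

A single string such as the one tuned writes to /etc/tuned/bootcmdline could be passed through a function like this to obtain one array item per kernel parameter. An unterminated quote is simply carried to the end of the line here; a stricter version might reject it instead.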

Comment 10 Michael Nguyen 2020-07-07 18:11:57 UTC
Verified on 4.6.0-0.nightly-2020-07-07-083718.  Deleting a MachineConfig whose kargs are given as a single string no longer causes a Degraded MCP.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-07-083718   True        False         3h16m   Cluster version is 4.6.0-0.nightly-2020-07-07-083718
$ oc create -f - <<EOF
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: 50-worker-custom
> spec:
>   kernelArguments:
>   - a=1 b=2
> EOF
machineconfig.machineconfiguration.openshift.io/50-worker-custom created
$ 
$ 
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
00-worker                                          34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
01-master-container-runtime                        34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
01-master-kubelet                                  34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
01-worker-container-runtime                        34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
01-worker-kubelet                                  34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
50-worker-custom                                                                                                10s
99-master-generated-registries                     34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
99-master-ssh                                                                                 2.2.0             3h16m
99-worker-generated-registries                     34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
99-worker-ssh                                                                                 2.2.0             3h16m
rendered-master-c1aacf3a48a81966a24864e984ea37bc   34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
rendered-worker-c01ae91040c73f0a3bf642a626d6a237   34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             5s
rendered-worker-f2257ee53059916ea51a9652971094c5   34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-f2257ee53059916ea51a9652971094c5   False     True       False      3              0                   0                     0                      3h11m
$ watch oc get node
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-c01ae91040c73f0a3bf642a626d6a237   True      False      False      3              3                   3                     0                      3h26m
$ oc delete mc/50-worker-custom
machineconfig.machineconfiguration.openshift.io "50-worker-custom" deleted
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-c01ae91040c73f0a3bf642a626d6a237   False     True       False      3              0                   0                     0                      3h27m
$ watch oc get node
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-f2257ee53059916ea51a9652971094c5   True      False      False      3              3                   3                     0                      3h38m

Comment 13 errata-xmlrpc 2020-10-27 15:57:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

