Description of problem:

Deletion of a MachineConfig with multiple kernelArguments as a single string causes a Degraded MCP.

Version-Release number of selected component (if applicable):

All 4.4 and 4.5 OCP releases.

How reproducible:

Always

Steps to Reproduce:

1. oc create -f - <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-rt
  name: 50-worker-rt
spec:
  kernelArguments:
  - a=1 b=2
EOF

2. oc create -f - <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-rt
  labels:
    worker-rt: ""
spec:
  machineConfigSelector:
    matchExpressions:
    - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-rt]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-rt: ""
  paused: false
EOF

3. oc label node <node> node-role.kubernetes.io/worker-rt=

4. Watch the node <node> reboot and become Ready.

5. oc delete mc/50-worker-rt

6. Watch mcp/worker-rt become Degraded.

Actual results:

MCP degraded.

Expected results:

MCP does not become degraded, as with:

oc create -f - <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-rt
  name: 50-worker-rt
spec:
  kernelArguments:
  - a=1
  - b=2
EOF

Additional info:

Workaround: split multiple kernel parameters into an array as above.
What I believe is happening here is that we are creating an invalid rpm-ostree command: e.g. `rpm-ostree kargs --append=a=1 b=2` instead of `rpm-ostree kargs --append=a=1 --append=b=2`. That causes the node to go degraded when it tries to apply them. We could attempt to parse the kargs better, for example with a simple string split on whitespace, though that may cause other issues with fancier kargs that use quotes.
> We could attempt to parse the kargs better

I think clarifying your expectation on what you want users to do with that field is a better approach than trying to parse whatever users try to do. You could improve documentation by adding corresponding comments on (not only this) field in the Go code and the CRD, and extending the example on https://github.com/openshift/machine-config-operator/blob/master/docs/MachineConfiguration.md#kernelarguments with multiple kargs :)
(In reply to Marc Sluiter from comment #2)
> > We could attempt to parse the kargs better
>
> I think clarifying your expectation on what you want users to do with that
> field is a better approach than trying to parse whatever users try to do.
> You could improve documentation by adding corresponding comments on (not
> only this) field in the Go code and the CRD, and extending the example on
> https://github.com/openshift/machine-config-operator/blob/master/docs/MachineConfiguration.md#kernelarguments
> with multiple kargs :)

I agree with the "better documentation" part, but I think the code should also be made more robust to handle this; otherwise this will probably not be the last BZ filed about this. The tuned daemon, for example, prepares kernel boot parameters in /etc/tuned/bootcmdline as a single string containing multiple kernel parameters. Something will have to do the parsing. Looking at the kernel code, the parsing hopefully shouldn't be all that difficult. As a workaround, I'm looking at writing some Go parameter-parsing code for the node tuning operator.
Verified on 4.6.0-0.nightly-2020-07-07-083718. Deletion of kargs in a single line does not cause a degraded MCP.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-07-083718   True        False         3h16m   Cluster version is 4.6.0-0.nightly-2020-07-07-083718

$ oc create -f - <<EOF
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: 50-worker-custom
> spec:
>   kernelArguments:
>   - a=1 b=2
> EOF
machineconfig.machineconfiguration.openshift.io/50-worker-custom created

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
00-worker                                          34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
01-master-container-runtime                        34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
01-master-kubelet                                  34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
01-worker-container-runtime                        34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
01-worker-kubelet                                  34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
50-worker-custom                                                                                               10s
99-master-generated-registries                     34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
99-master-ssh                                                                                 2.2.0             3h16m
99-worker-generated-registries                     34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
99-worker-ssh                                                                                 2.2.0             3h16m
rendered-master-c1aacf3a48a81966a24864e984ea37bc   34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m
rendered-worker-c01ae91040c73f0a3bf642a626d6a237   34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             5s
rendered-worker-f2257ee53059916ea51a9652971094c5   34d03f09dc395269de06ab290a0422a8274b8bd9   2.2.0             3h10m

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-f2257ee53059916ea51a9652971094c5   False     True       False      3              0                   0                     0                      3h11m

$ watch oc get node

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-c01ae91040c73f0a3bf642a626d6a237   True      False      False      3              3                   3                     0                      3h26m

$ oc delete mc/50-worker-custom
machineconfig.machineconfiguration.openshift.io "50-worker-custom" deleted

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-c01ae91040c73f0a3bf642a626d6a237   False     True       False      3              0                   0                     0                      3h27m

$ watch oc get node

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-f2257ee53059916ea51a9652971094c5   True      False      False      3              3                   3                     0                      3h38m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196