Description of problem:

Currently, the MCO updates kernel arguments by deleting them and appending them again. This does not work in scenarios where the order of arguments must be preserved. For example, on a node with two sizes of hugepages, each hugepages= count must immediately follow its hugepagesz=:

# cat /proc/cmdline
...hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=20

If I update the count of 1G pages (for example, set it to 4), the MCO deletes the argument and appends it again at the end of the line:

# Aug 05 19:34:03 worker-0 root[8641]: machine-config-daemon[3241]: Running rpm-ostree [kargs --delete=hugepages=1 --append=hugepages=4]
# cat /proc/cmdline
...hugepagesz=1G hugepagesz=2M hugepages=20 hugepages=4

As a result, no 1G hugepages are allocated:

# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
0

How reproducible:
100%

Steps to Reproduce:
1. Create a MachineConfig that adds 1G and 2M hugepages to the node.
2. Modify the number of 1G pages.
3. Wait for the MCP to finish updating the nodes.

Actual results:
The MCO does not preserve kernel argument order after updating one of the parameters.

Expected results:
Order is preserved.
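To illustrate the bug, here is a minimal Python sketch of the delete-then-append behavior. It only simulates the effect of `rpm-ostree kargs --delete=... --append=...` on the command line; it is not MCO code:

```python
def delete_then_append(cmdline, to_delete, to_append):
    """Simulate updating a karg by deleting it and appending the new
    value, which moves the argument to the end of the command line."""
    args = cmdline.split()
    args.remove(to_delete)   # --delete removes the matching argument
    args.append(to_append)   # --append adds the new value at the end
    return " ".join(args)

before = "hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=20"
print(delete_then_append(before, "hugepages=1", "hugepages=4"))
# hugepages=4 no longer follows hugepagesz=1G:
# hugepagesz=1G hugepagesz=2M hugepages=20 hugepages=4
```

Since the kernel associates each hugepages= count with the most recent hugepagesz=, the moved argument now applies to the 2M size instead of the 1G size.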
We had bugs at most layers around this; see https://github.com/ostreedev/ostree/commit/aadc4db012ed32455416864c6841e68a8b8ebf58 and https://github.com/coreos/rpm-ostree/commit/f295f543064f1a0b5833fefccd6bb203b3527623, which links to https://github.com/openshift/machine-config-operator/issues/1265. It may be that we never updated the MCO to rely on the new rpm-ostree code.
Is this fixed by https://github.com/openshift/machine-config-operator/pull/1899 and a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1853890 which will go into 4.6?
(In reply to Antonio Murdaca from comment #3)
> Is this fixed by
> https://github.com/openshift/machine-config-operator/pull/1899 and a dup of
> https://bugzilla.redhat.com/show_bug.cgi?id=1853890 which will go into 4.6?

Replying to myself: this is more about updating that list; we'll work on it.
Right, in the MCO we need to treat the key and value separately, and if the key already exists, update its value with rpm-ostree --replace instead of doing --delete and --append.
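A rough Python sketch of what key-aware replacement would look like (illustrative only, not MCO code; it approximates the behavior of `rpm-ostree kargs --replace=KEY=VALUE` and is only well-defined when the key occurs once):

```python
def replace_karg(args, new_karg):
    """Replace the value of an existing karg key in place, keeping its
    position on the command line. Falls back to append if the key is
    absent. Ambiguous when the same key appears more than once."""
    key = new_karg.split("=", 1)[0]
    for i, arg in enumerate(args):
        if arg.split("=", 1)[0] == key:
            args[i] = new_karg  # first match replaced, position preserved
            return args
    args.append(new_karg)       # key not present: append instead
    return args

args = "hugepagesz=1G hugepages=1".split()
print(replace_karg(args, "hugepages=4"))
# ['hugepagesz=1G', 'hugepages=4']
```

Note the "first match" behavior: with duplicate keys (as in the hugepages case above), this can still pick the wrong occurrence.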
After taking a closer look at this problem, it seems that updating kernelArguments is going to be tricky and may require special-case handling. Re-stating the problem here:

- Applied kargs: `hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=20`
- Now we want to change hugepages=1 to hugepages=4.
- The desired result is `hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=20`, but we are getting `hugepagesz=1G hugepagesz=2M hugepages=20 hugepages=4`.

Updating a key-value karg that occurs only once should be doable in the MCO without any issue, but here the kernel supports specifying hugepagesz= multiple times, interleaved with hugepages=. With the same key present multiple times, we would first need a way to know which key-value pair the user wants to replace; otherwise we may end up updating the wrong occurrence.

This also raises the question of whether there are more cases where having multiple kargs with the same key makes sense. Looking roughly at https://www.kernel.org/doc/html/v5.7/admin-guide/kernel-parameters.html, there are kernel parameters like virtio_mmio.device= which can be specified multiple times as well. In that situation, the user may want to either append or update the virtio_mmio.device= karg. Today, the MCO has no way to know which the user wants: an update, or a delete/append.

To solve this problem, one idea that comes to mind is to allow users to specify a replace keyword in the kernelArguments list, i.e. 'replace oldkarg newkarg'. For example:

kernelArguments:
- replace "hugepages=1" "hugepages=4"

When the MCO finds a karg starting with replace, it performs a replace operation instead of an append. Thoughts?
Uhm, the kargs list is a slice and maintains order; in the case of a replace of any key, shouldn't we just re-apply that list and leave the ordering to the user? I.e., if I have an MC with `hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=20`, replacing hugepages=1 would require editing that MC to put a 4, and the MCO code should apply the ordered list. The issue would be if I create a "new" MC with just `hugepages=4`, which could cause problems (or should it just append?). I think if this is about editing the MC to put hugepages=4, then we need to tweak the MCO to just re-apply the list in order.
Had a Slack conversation with Antonio, Jonathan, and Colin to figure out the best way to handle updating kernelArguments. Summary from the conversation:

There are certain use cases where we may end up updating the wrong kernel argument. For example: suppose the user has applied kargs `hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=4` using a MachineConfig. Now the user edits the MachineConfig to apply `hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=6`. If we ask rpm-ostree to replace hugepages=4 with hugepages=6 for the 2M size, we will run `rpm-ostree kargs --replace=hugepages=4=6`, but rpm-ostree replaces the first match found for the key hugepages, so the result will be `hugepagesz=1G hugepages=6 hugepagesz=2M hugepages=4`, which is not what the user wanted.

To solve it cleanly, we agreed on a simple solution: when we have kargs to apply, first delete all the kargs applied by the old rendered MachineConfig, then append the kargs from the new rendered MachineConfig. Both operations can be performed in one rpm-ostree transaction, as rpm-ostree honors the order, i.e. delete followed by append.

The MCO only keeps track of the kernelArguments applied by the user through MachineConfigs; it doesn't touch the default kernel arguments shipped with the base RHCOS. So the delete and append will be performed only for user-supplied kargs, which is what we want.
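The agreed approach above can be sketched in Python (a simulation of the single rpm-ostree transaction, not the actual MCO implementation): delete every karg from the old rendered MachineConfig, then append the full new list in order. Because the user-supplied kargs sit after the base RHCOS kargs, this preserves both the base arguments and the user's ordering.

```python
def apply_kargs_update(cmdline, old_kargs, new_kargs):
    """Simulate one rpm-ostree transaction: delete all previously
    applied kargs, then append the new rendered list in order."""
    args = cmdline.split()
    for karg in old_kargs:
        if karg in args:
            args.remove(karg)  # delete first occurrence, like --delete
    args.extend(new_kargs)     # append new kargs in MachineConfig order
    return " ".join(args)

base = "console=tty0 ignition.platform.id=gcp"
old = ["hugepagesz=1G", "hugepages=4", "hugepagesz=2M", "hugepages=4"]
new = ["hugepagesz=1G", "hugepages=4", "hugepagesz=2M", "hugepages=6"]
print(apply_kargs_update(base + " " + " ".join(old), old, new))
# console=tty0 ignition.platform.id=gcp hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=6
```

The ambiguity of duplicate keys disappears because we never try to match an individual occurrence: the whole old list is removed and the whole new list is appended.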
Verified with 4.6.0-0.nightly-2020-09-19-060512

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-19-060512   True        False         68m     Cluster version is 4.6.0-0.nightly-2020-09-19-060512

$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-j3tbpx2-f76d1-lrx8m-master-0         Ready    master   98m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-master-1         Ready    master   98m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-master-2         Ready    master   98m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7   Ready    worker   88m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-worker-c-pvg2x   Ready    worker   87m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-worker-d-zs8jq   Ready    worker   87m   v1.19.0+7f9e863

$ cat machineConfigs/bz1866546-initial.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 70-multi-kargs
spec:
  kernelArguments:
    - first=uno
    - second=dos
    - third=tres
    - fourth=cuatro

$ oc apply -f bz1866546-initial.yaml
machineconfig.machineconfiguration.openshift.io/70-multi-kargs created

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
00-worker                                          c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
01-master-container-runtime                        c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
01-master-kubelet                                  c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
01-worker-container-runtime                        c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
01-worker-kubelet                                  c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
70-multi-kargs                                                                                                 8m55s
99-master-generated-registries                     c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
99-master-ssh                                                                                3.1.0             100m
99-worker-generated-registries                     c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
99-worker-ssh                                                                                3.1.0             100m
rendered-master-9d39db7fc2ec3a03099836ae174057df   c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
rendered-worker-e6c79d53ce9a19fa5793a06663af7c76   c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             8m50s
rendered-worker-ff560ececef24a9e8da6f01097187105   c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m

$ oc debug node/ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7 -- chroot /host cat /proc/cmdline
Starting pod/ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7-debug ...
To use host binaries, run `chroot /host`
BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-882cf41eb3f328e33aa6801db4912ca12418ef540dea7da23ff86329a18a318e/vmlinuz-4.18.0-193.23.1.el8_2.x86_64 rhcos.root=crypt_rootfs random.trust_cpu=on console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ostree=/ostree/boot.1/rhcos/882cf41eb3f328e33aa6801db4912ca12418ef540dea7da23ff86329a18a318e/0 ignition.platform.id=gcp first=uno second=dos third=tres fourth=cuatro
Removing debug pod ...

$ oc edit mc/70-multi-kargs
machineconfig.machineconfiguration.openshift.io/70-multi-kargs edited

$ oc get mc/70-multi-kargs -o json | jq .spec
{
  "kernelArguments": [
    "first=uno",
    "second=deux",
    "third=tres",
    "fourth=quatre"
  ]
}

$ oc debug node/ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7 -- chroot /host cat /proc/cmdline
Starting pod/ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7-debug ...
To use host binaries, run `chroot /host`
BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-882cf41eb3f328e33aa6801db4912ca12418ef540dea7da23ff86329a18a318e/vmlinuz-4.18.0-193.23.1.el8_2.x86_64 rhcos.root=crypt_rootfs random.trust_cpu=on console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ostree=/ostree/boot.0/rhcos/882cf41eb3f328e33aa6801db4912ca12418ef540dea7da23ff86329a18a318e/0 ignition.platform.id=gcp first=uno second=deux third=tres fourth=quatre
Removing debug pod ...
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196