Bug 1866546 - After updating kernel argument it is placed at the end of the cmdline (arguments order is broken)
Summary: After updating kernel argument it is placed at the end of the cmdline (argume...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.6.0
Assignee: Sinny Kumari
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-08-05 20:47 UTC by Denys Shchedrivyi
Modified: 2020-10-27 16:25 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:25:04 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2014 0 None closed Bug 1866546: daemon: simplify kernel arguments processing 2021-01-29 08:23:24 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:25:24 UTC

Description Denys Shchedrivyi 2020-08-05 20:47:50 UTC
Description of problem:

   Currently, MCO is updating kernel arguments by removing and adding them again. It does not work in scenario when we need to keep the order of arguments. 
   For example, I have a node with two sizes of hugepages, their parameters should goes one by one:

# cat /proc/cmdline 
...hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=20
   
   
   If I want to update the count of 1G pages (for example set 4) - MCO will remove it and add it again at the end of the line:

# Aug 05 19:34:03 worker-0 root[8641]: machine-config-daemon[3241]: Running rpm-ostree [kargs --delete=hugepages=1 --append=hugepages=4]

# cat /proc/cmdline 
...hugepagesz=1G hugepagesz=2M hugepages=20 hugepages=4 


   As result - I will not have any 1G hugepages:

# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages 
0



How reproducible:
100%

Steps to Reproduce:
1. create machine config that adds 1G and 2M hugepages to the node
2. modify the number of 1G
3. wait when mcp complete updating nodes



Actual results:
  MCO does not preserve kernel arguments order after updating some of parameters

Expected results:
  Order is preserved

Comment 2 Colin Walters 2020-08-05 21:14:25 UTC
We had bugs at most layers around this, see
https://github.com/ostreedev/ostree/commit/aadc4db012ed32455416864c6841e68a8b8ebf58
and
https://github.com/coreos/rpm-ostree/commit/f295f543064f1a0b5833fefccd6bb203b3527623
which links to
https://github.com/openshift/machine-config-operator/issues/1265

It might be we never updated to rely on the new rpm-ostree code.

Comment 3 Antonio Murdaca 2020-08-06 11:16:39 UTC
Is this fixed by https://github.com/openshift/machine-config-operator/pull/1899 and a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1853890 which will go into 4.6?

Comment 4 Antonio Murdaca 2020-08-06 11:18:35 UTC
(In reply to Antonio Murdaca from comment #3)
> Is this fixed by
> https://github.com/openshift/machine-config-operator/pull/1899 and a dup of
> https://bugzilla.redhat.com/show_bug.cgi?id=1853890 which will go into 4.6?

replying to myself - this is more about updating that list, we'll work on it

Comment 5 Sinny Kumari 2020-08-07 09:29:19 UTC
right, in MCO we need to treat key and value separately and if key already exist update its value with rpm-ostree --replace instead of doing --delete and --append

Comment 8 Sinny Kumari 2020-08-18 10:10:46 UTC
After taking closer look at this problem, looks to me that updating kernelArg is going to be tricky and may require special case handling.

Re-stating the problem here:
- Applied kargs: `hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=20`
- Now, we want to modify hugepages=1 to hugepages=4.
- Desired result is `hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=20` but we are getting `hugepagesz=1G hugepagesz=2M hugepages=20 hugepages=4`

Updating a single occurrence key-value kargs should be doable in MCO without any issue but in this situation kernel parameters supports specifying multiple times hugepagesz= when interleaved with hugepages= . With same key present multiple times, we would first need a way to know which key-value pair user wants to replace. Otherwise, we may end up updating at wrong place.

This also raises question that do we have more cases where having multiple kargs with same key makes sense?

Looking roughly at https://www.kernel.org/doc/html/v5.7/admin-guide/kernel-parameters.html, we have kernel param like virtio_mmio.device= which can be specified multiple times as well. But in this situation, I believe user may want to append or update virtio_mmio.device= karg.

Today, MCO has no way to know what user want- whether update or delete/append a karg. To solve this problem, one idea come to my mind is allow users to specify replace keyword in kernelArguments listr 'replace replacekarg replacedkarg'. For example:

kerneArguments:
 - replace "hugepages=1" "hugepages=4"

When MCO finds a karg starting with replace, it performs replace operation instead of append.

Thoughts?

Comment 9 Antonio Murdaca 2020-08-18 11:18:45 UTC
Uhm, the kargs is a slice and maintains order, in case of a replace of any key, shouldn't we just reapply that list and leave order to the user? i.e. I have an MC with `hugepagesz=1G hugepages=1 hugepagesz=2M hugepages=20`, replacing hugepages=1 would require an edit of that MC and put a 4 and the MCO code should apply the ordered list. The issue could be if I create a "new" mc with just `hugepages=4` and that could cause issues (or just append?).
I think if this is an issue of `edit` the MC and put hugepages=4, then we need to tweak the MCO to just re-apply the list in order

Comment 10 Sinny Kumari 2020-08-19 15:20:53 UTC
Had a slack conversation with Antonio, Jonathan and Colin to figure out best way to handle updating kernelArguments.

Summary from the conversation:

There are certain usecase where we may end up updating wrong kernelArguments. 
For example: suppose user have applied kargs `hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=4` using MachineConfig. Now, user edits the MachineConfig to apply `hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=6`. In this situation when we ask rpm-ostree to replace to hugepages=6 for hugepagesz=2M, we will say `rpm-ostree kargs --replace=hugepages=4=6` but rpm-ostree will replace the first match found for key hugepages and result we will get is `hugepagesz=1G hugepages=6 hugepagesz=2M hugepages=4` which is not what user wanted.

To solve it cleanly, we agreed on simple solution. 
when we have kargs to apply, first delete all the applied kargs from old rendered MachineConfig and append kargs available in new rendered MachineCOnfig. We can perform both operation in one rpm-ostree transaction as rpm-ostree honors the order which means delete followed by append.

MCO only keep tracks of kerneArguments applied by user through MachineConfig and it doesn't touches default shipped KernelArguments with base RHCOS. So, deletion and append will be performed only for those kargs which is what we want.

Comment 13 Micah Abbott 2020-09-19 19:28:56 UTC
Verified with 4.6.0-0.nightly-2020-09-19-060512

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-19-060512   True        False         68m     Cluster version is 4.6.0-0.nightly-2020-09-19-060512

$ oc get nodes
NAME                                       STATUS                     ROLES    AGE   VERSION
ci-ln-j3tbpx2-f76d1-lrx8m-master-0         Ready                      master   98m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-master-1         Ready                      master   98m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-master-2         Ready                      master   98m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7   Ready                      worker   88m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-worker-c-pvg2x   Ready                      worker   87m   v1.19.0+7f9e863
ci-ln-j3tbpx2-f76d1-lrx8m-worker-d-zs8jq   Ready                      worker   87m   v1.19.0+7f9e863

$ cat machineConfigs/bz1866546-initial.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 70-multi-kargs
spec:
    kernelArguments:
      - first=uno
      - second=dos
      - third=tres
      - fourth=cuatro

$ oc apply -f bz1866546-initial.yaml
machineconfig.machineconfiguration.openshift.io/70-multi-kargs created

$ oc get mc                       
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
00-worker                                          c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
01-master-container-runtime                        c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
01-master-kubelet                                  c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
01-worker-container-runtime                        c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m                                                                                                                                                                                                           
01-worker-kubelet                                  c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
70-multi-kargs                                                                                                  8m55s                                          
99-master-generated-registries                     c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
99-master-ssh                                                                                 3.1.0             100m
99-worker-generated-registries                     c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
99-worker-ssh                                                                                 3.1.0             100m
rendered-master-9d39db7fc2ec3a03099836ae174057df   c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m
rendered-worker-e6c79d53ce9a19fa5793a06663af7c76   c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             8m50s
rendered-worker-ff560ececef24a9e8da6f01097187105   c08c048584ef0bf18ab2dd88fdddd93279e1c6a1   3.1.0             94m

$ oc debug node/ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7 -- chroot /host cat /proc/cmdline
Starting pod/ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7-debug ...
To use host binaries, run `chroot /host`                                  
BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-882cf41eb3f328e33aa6801db4912ca12418ef540dea7da23ff86329a18a318e/vmlinuz-4.18.0-193.23.1.el8_2.x86_64 rhcos.root=crypt_rootfs random.trust_cpu=on console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ostree=/ostree/boot.1/rhcos/882cf41eb3f328e33aa6801db4912ca12418ef540dea7da23
ff86329a18a318e/0 ignition.platform.id=gcp first=uno second=dos third=tres fourth=cuatro
                                                                               
Removing debug pod ...          

$ oc edit mc/70-multi-kargs
machineconfig.machineconfiguration.openshift.io/70-multi-kargs edited

$ oc get mc/70-multi-kargs -o json | jq .spec
{
  "kernelArguments": [
    "first=uno",
    "second=deux",
    "third=tres",
    "fourth=quatre"
  ]
}

$ oc debug node/ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7 -- chroot /host cat /proc/cmdline
Starting pod/ci-ln-j3tbpx2-f76d1-lrx8m-worker-b-25cc7-debug ...
To use host binaries, run `chroot /host`
BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-882cf41eb3f328e33aa6801db4912ca12418ef540dea7da23ff86329a18a318e/vmlinuz-4.18.0-193.23.1.el8_2.x86_64 rhcos.root=crypt_rootfs random.trust_cpu=on console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ostree=/ostree/boot.0/rhcos/882cf41eb3f328e33aa6801db4912ca12418ef540dea7da23ff86329a18a318e/0 ignition.platform.id=gcp first=uno second=deux third=tres fourth=quatre

Removing debug pod ...
```

Comment 15 errata-xmlrpc 2020-10-27 16:25:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196


Note You need to log in before you can comment on or make changes to this bug.