Bug 1766346 - kargs not applied to new nodes created by machine-api
Summary: kargs not applied to new nodes created by machine-api
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Sinny Kumari
QA Contact: Antonio Murdaca
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-10-28 19:49 UTC by Erica von Buelow
Modified: 2020-07-13 17:12 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:12:05 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1748 0 None closed Bug 1766346: docs: Workaround to properly apply kargs on new nodes created by machine-api 2021-01-29 06:15:00 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:12:27 UTC

Description Erica von Buelow 2019-10-28 19:49:29 UTC
Description of problem:
Kernel arguments added via a custom MachineConfig are applied to existing worker nodes, but are not applied to new nodes created by the machine-api (e.g. when a worker machineset is scaled up).

Version-Release number of selected component (if applicable):
4.2 (cluster installed with the openshift-install binary referenced in step 1 below)

How reproducible:
100%

Steps to Reproduce:
1. launch a 4.2 cluster (I used the openshift-install binary from https://openshift-release-artifacts.svc.ci.openshift.org/4.2.0-0.ci-2019-10-28-020646/ )
2. create a custom MachineConfig that sets a kernel arg targeting worker nodes
> cat << EOF > custom-worker-machineconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: custom-worker-00
spec:
  kernelArguments: ['evb']
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 2.2.0
    networkd: {}
    passwd: {}
    storage:
      files:
      - contents:
          source: data:,-----BEGIN--CUSTOM-WORKER-00--END-----%0A
          verification: {}
        filesystem: root
        mode: 420
        path: /etc/kubernetes/tmp.txt
    systemd: {}
  osImageURL: ""
EOF
> oc apply -f custom-worker-machineconfig.yaml

3. watch that the change gets rolled out:
> oc get mc
> # look for latest rendered-worker-<hash>
> oc get mcp
> # wait for all machines to be updated at latest rendered-worker-<hash> MC

4. confirm that the kernel arg got applied
> oc get nodes
> oc debug node/<worker node here>
> sh-4.2# chroot /host
> sh-4.4# rpm-ostree kargs
> sh-4.4# # check that `evb` shows in list
> sh-4.4# cat /etc/kubernetes/tmp.txt
> sh-4.4# # see that the custom file got laid down
> sh-4.4# exit
> sh-4.2# exit

5. scale up a worker machineset
> oc get machinesets -n openshift-machine-api
> # select one of the worker machinesets, for consistency choose the one for the node you checked in previous step
> oc scale --replicas=2 machineset/<worker machine set> -n openshift-machine-api

6. wait for the node to be created and showing the latest MC
> oc get nodes
> oc get node <new node name>
> # check that the machine-config annotation on the node is at the latest rendered config
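> # for example (a rough check; the node name is a placeholder), the annotations can be read directly:
> oc get node <new node name> -o yaml | grep machineconfiguration.openshift.io/
> # desiredConfig and currentConfig should both point at the latest rendered-worker-<hash>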

7. see if the kernel arg got applied
> oc get nodes
> oc debug node/<new worker node here>
> sh-4.2# chroot /host
> sh-4.4# rpm-ostree kargs
> sh-4.4# # see that `evb` does not show up in list
> sh-4.4# cat /etc/kubernetes/tmp.txt
> sh-4.4# # see that the custom file did get laid down
> sh-4.4# exit
> sh-4.2# exit

Actual results:
- kernel arg not set on the new node

Expected results:
- kernel arg applied to new nodes

Additional info:

- Creating a new MachineConfig with the same karg does not get it applied, since the MCD compares it against the current MC and concludes the karg is already applied (see the comparison sketch below)
- Creating a new MachineConfig with a different karg does get that new karg applied (but still not the old one)
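
For example (hashes are placeholders), comparing the kernelArguments of the old and new rendered configs shows why the MCD sees nothing to change:
> oc get mc rendered-worker-<old hash> -o jsonpath='{.spec.kernelArguments}'
> oc get mc rendered-worker-<new hash> -o jsonpath='{.spec.kernelArguments}'
> # both lists already contain 'evb', so the old-vs-new diff the MCD computes is empty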

Comment 1 Kirsten Garrison 2019-11-06 17:11:45 UTC
@erica was this covered by the epic? not sure if it needs to be updated. lmk!

Comment 2 Erica von Buelow 2019-11-07 02:20:29 UTC
(In reply to Kirsten Garrison from comment #1)
The epic addressed it for newly installed 4.3+ clusters, which now have day-1 kargs support. For clusters installed as 4.1 or 4.2 (even those updated to 4.3 after install) that need day-2 kargs configuration to apply to new nodes as well, this is still an issue.

Comment 6 Sinny Kumari 2020-03-11 18:15:10 UTC
My observations from looking at this bug in detail:

Affected OCP release versions: only 4.2-based clusters

How karg support works:
- In OCP 4.2, we added support in the MCO to update kernelArguments (kargs) as a day-2 operation.
- In 4.2, during firstboot, the machine-config-daemon-host.service unit calls `machine-config-daemon pivot` from the machine-config-daemon binary available on the RHCOS host. pivot calls rpm-ostree to rebase the host to the latest machine-os-content and reboots the host; it is not aware of kargs at all. In 4.3 we also have another service, machine-config-daemon-firstboot.service, which runs `machine-config-daemon firstboot-complete-machineconfig` and does process kernelArguments during firstboot. Since we can't update bootimages today, we can't add the capability to process the full MachineConfig during firstboot on 4.2 bootimages.
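
A rough way to confirm this on a node provisioned from a 4.2 bootimage (node name is a placeholder; what the firstboot unit logged can differ slightly across boot images):
$ oc debug node/<node name>
sh-4.2# chroot /host
sh-4.4# systemctl status machine-config-daemon-host.service
sh-4.4# journalctl -u machine-config-daemon-host.service --no-pager
sh-4.4# # the pivot/rebase to machine-os-content is logged here; there is no kernelArguments handling step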

Why is this bug happening?
Consider an existing OCP 4.2 cluster where we applied a MachineConfig to add kargs, for example the one mentioned in comment 0 (custom-worker-00). The MCO generates a new rendered config, creates the file /etc/kubernetes/tmp.txt, and adds the karg 'evb' on all the worker nodes. Now we add a new worker node (w1) to the cluster using machine-api. Worker w1 gets created with all the files/units defined in the newly rendered config, and machine-config-daemon-host.service runs `machine-config-daemon pivot`; w1 rebases to the latest machine-os-content and reboots. The kargs in the rendered config get ignored because the MCO doesn't know how to read them during firstboot. The newly created w1 worker node then joins the cluster without 'evb'.
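
A quick way to see the mismatch on w1 (the rendered config hash is a placeholder):
$ oc get mc rendered-worker-<hash> -o jsonpath='{.spec.kernelArguments}'
$ oc debug node/w1
sh-4.2# chroot /host
sh-4.4# rpm-ostree kargs
sh-4.4# # 'evb' is listed in the rendered config above but missing from w1's kernel arguments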

What happens when we create another MachineConfig adding another karg?
Suppose we create a new MachineConfig for worker nodes with kernelArguments: ['foo']. The MCO generates a new rendered config with 'foo' appended to kernelArguments, and 'foo' gets appended to the kargs on all worker nodes. All worker nodes except w1 will then have both 'evb' and 'foo', while w1 will only have 'foo'. This is because of the way the MCO decides which kargs to apply (see https://github.com/openshift/machine-config-operator/blob/db561314c7afae1d77c16cfdb95f0f0ce6b8977d/pkg/daemon/update.go#L582): it diffs the kargs between oldconfig and newconfig. Since oldconfig already contains 'evb', it doesn't know that some worker nodes, like w1, never had 'evb' applied; the only karg difference it sees between oldconfig and newconfig is 'foo'.
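
Roughly, the decision boils down to a diff like the following (hashes are placeholders; this is an illustration of the comparison, not the actual implementation):
$ OLD=$(oc get mc rendered-worker-<old hash> -o jsonpath='{.spec.kernelArguments[*]}')
$ NEW=$(oc get mc rendered-worker-<new hash> -o jsonpath='{.spec.kernelArguments[*]}')
$ comm -13 <(tr ' ' '\n' <<<"$OLD" | sort) <(tr ' ' '\n' <<<"$NEW" | sort)
$ # only 'foo' comes out of the diff, even though w1 is also missing 'evb'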

What happens when we try to remove the MachineConfig custom-worker-00?
Let's remove the custom-worker-00 MachineConfig to fix the problem. The MCO will compare oldconfig and newconfig and will try to delete the karg 'evb'. All worker nodes except w1 will delete 'evb' from their kargs list. On w1, the attempt to delete 'evb' using `rpm-ostree kargs --delete='evb'` fails because 'evb' was never applied there. This leaves the w1 worker node degraded.
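
If this does happen, the degradation shows up in the usual places, e.g. (the pod name is a placeholder):
$ oc get mcp worker
$ oc describe node/w1 | grep machineconfiguration.openshift.io/
$ oc -n openshift-machine-config-operator logs <machine-config-daemon pod running on w1>
$ # the state annotation goes to Degraded and the MCD log carries the error from the failed karg deletion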

Solution:
Solving this issue is tricky because we would need to know exactly which kargs have already been applied on the RHCOS node. Some kargs are set at install time, and we don't know which ones. We have an open issue in ostree to identify the base kargs: https://github.com/ostreedev/ostree/issues/479

Since this issue happens only on OCP 4.2 based clusters, perhaps we can document it in the MCO docs as a known issue with a workaround. There are two possible workarounds to avoid this situation when adding an RHCOS node:
1. First delete every MachineConfig that applies any kargs, then add the node, and then re-apply the deleted MachineConfigs.
2. Once a machine is scaled up, check the latest rendered config that was applied, fetch its kargs list, and then rsh/debug into the newly created node and apply the kargs manually (see the sketch below).
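
A minimal sketch of workaround 2 (rendered config hash and node name are placeholders; drain the node first if workloads are running):
$ oc get mc rendered-worker-<hash> -o jsonpath='{.spec.kernelArguments[*]}'
$ oc debug node/<new node>
sh-4.2# chroot /host
sh-4.4# rpm-ostree kargs --append='evb'    # repeat --append for each karg from the rendered config
sh-4.4# systemctl reboot                   # the appended kargs take effect on the next boot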

Comment 7 Colin Walters 2020-03-11 18:24:16 UTC
https://github.com/openshift/enhancements/pull/201 is intended to fix this by ensuring that "aleph version 4.2"¹ clusters have their bootimages upgraded.

This would allow us to drop the complicated technical debt you're talking about that we carry in the MCO and ensure kargs just work the same as 4.3+.

Now... https://github.com/openshift/machine-config-operator/pull/926 was intended to address one of the huge problems that occurs if this *does* happen on an "aleph 4.2" cluster today.

You're going down the path of trying to make this work on 4.2... it might be viable, but the complexity is high. It seems easier to ask people to upgrade to 4.3 in place, and manually do a bootimage update once that's done. I can write up the manual steps in the enhancement if that'd help.

¹ By this I mean, clusters which were originally installed as 4.2.  Why "aleph"?  See https://github.com/coreos/coreos-assembler/pull/768

Comment 8 Sinny Kumari 2020-03-12 14:17:23 UTC
Thanks Colin for the input!

(In reply to Colin Walters from comment #7)
> https://github.com/openshift/enhancements/pull/201 is intended to fix this
> by ensuring that "aleph version 4.2"¹ clusters have their bootimages
> upgraded.

+1, Can't wait to see bootimage update being implemented.

> It seems easier to ask people to upgrade to 4.3 in place, and manually do a bootimage update once that's done.

Can we already do this manually today, or will it only be possible once enhancement https://github.com/openshift/enhancements/pull/201 gets implemented?

> I can write up the manual steps in the enhancement if that'd help.

Yes, that will be great.

Comment 10 Antonio Murdaca 2020-04-24 22:09:01 UTC
I don't think it's beneficial to keep this BZ open forever - we do have a card, https://issues.redhat.com/browse/GRPA-1739, which we can use to track this and refer to whenever the issue arises - so I'm going to close this, as the fix is something that will come as a feature (bootimage updates, likely). Let me know if anybody feels otherwise, but this BZ is being pushed out at every release and it's already some 6 months old. (Closing as Deferred too, as I think it's the best status for this.)

Comment 11 Sinny Kumari 2020-05-21 10:16:54 UTC
Reopening this bug to document the workaround of manually updating the bootimage: https://github.com/openshift/machine-config-operator/pull/1748

Comment 18 errata-xmlrpc 2020-07-13 17:12:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

