Description of problem:

The realtime kernel can't be installed because of a missing dependency (see the journal of a worker):

Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal machine-config-daemon[1676]: I0409 10:02:17.181955 1676 update.go:1241] Switching to kernelType=realtime, invoking rpm-ostree ["override" "remove" "kernel" "kernel-core" "kern>
Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal root[1779]: machine-config-daemon[1676]: Switching to kernelType=realtime, invoking rpm-ostree ["override" "remove" "kernel" "kernel-core" "kernel-modules" "kernel-modules-extra">
Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal root[1779]: rged/kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm"]
Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal rpm-ostree[1688]: client(id:cli dbus:1.13 unit:machine-config-daemon-firstboot.service uid:0) added; new total=1
Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal rpm-ostree[1688]: Initiated txn UpdateDeployment for client(id:cli dbus:1.13 unit:machine-config-daemon-firstboot.service uid:0): /org/projectatomic/rpmostree1/rhcos
Apr 09 10:02:40 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal rpm-ostree[1688]: Preparing pkg txn; enabled repos: [] solvables: 0
Apr 09 10:02:40 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal rpm-ostree[1688]: Txn UpdateDeployment on /org/projectatomic/rpmostree1/rhcos failed: Could not depsolve transaction; 1 problem detected:
 Problem: conflicting requests
  - nothing provides perl-interpreter needed by kernel-rt-devel-4.18.0-147.8.1.rt24.101.el8_1.x86_64

Version-Release number of selected component (if applicable):
OCP 4.5.0-0.nightly-2020-04-09-073609

How reproducible:
Tried it once only.

Steps to Reproduce:
1. Deploy this MachineConfig:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime

2. $ oc get node -l node-role.kubernetes.io/worker -o custom-columns=NAME:.metadata.name,KERNEL-VERSION:.status.nodeInfo.kernelVersion

Actual results:

NAME                                                    KERNEL-VERSION
slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal   4.18.0-147.8.1.el8_1.x86_64
slinte-hfvs5-w-c-qlz8v.c.openshift-gce-devel.internal   4.18.0-147.8.1.el8_1.x86_64
slinte-hfvs5-w-d-fk7mm.c.openshift-gce-devel.internal   4.18.0-147.8.1.el8_1.x86_64

Expected results:
The workers should be running the rt kernel.

Additional info:
Be sure to use at least 4.5.0-0.nightly-2020-04-09-073609, else you will run into https://bugzilla.redhat.com/show_bug.cgi?id=1821888

It's also interesting that the error isn't noticed by the MCO; the MCP says all is fine:

$ k get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-c2f9dfa257d210cad7485d1c49881950   True      False      False      3              3                   3                      0                      47m
worker   rendered-worker-03da69decee65c0222ae355bbcc45326   True      False      False      3              3                   3                      0                      47m
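The kernel-version check in step 2 can be automated with a short script that inspects each worker node's reported kernel version for the realtime (`.rt`) marker. This is a hedged sketch: the node names and versions below are illustrative sample data; in practice you would feed it the output of `oc get node -l node-role.kubernetes.io/worker -o json`.

```python
import json

def realtime_workers(nodes_json: str) -> dict:
    """Map node name -> True if its kernel version looks like a realtime kernel."""
    nodes = json.loads(nodes_json)
    result = {}
    for item in nodes.get("items", []):
        name = item["metadata"]["name"]
        kver = item["status"]["nodeInfo"]["kernelVersion"]
        # Realtime kernel builds carry an ".rtNN" segment in the release string
        result[name] = ".rt" in kver
    return result

# Illustrative sample: one node on the stock kernel, one on the rt kernel
sample = json.dumps({"items": [
    {"metadata": {"name": "worker-a"},
     "status": {"nodeInfo": {"kernelVersion": "4.18.0-147.8.1.el8_1.x86_64"}}},
    {"metadata": {"name": "worker-b"},
     "status": {"nodeInfo": {"kernelVersion": "4.18.0-147.8.1.rt24.101.el8_1.x86_64"}}},
]})
print(realtime_workers(sample))  # {'worker-a': False, 'worker-b': True}
```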
Personally, I think the devel packages should not have been included in the first place; the missing dependency is just what revealed it.
Note: not sure if it's important, but I deployed said MachineConfig as a day-1 operation (running `openshift-install create manifests` and putting the MC into the manifests dir).
Ah, that could be the reason. We may need to update the bootimage to include the changes from fix https://github.com/openshift/machine-config-operator/pull/1612 in the machine-config-daemon binary on the host. So far, judging from the MCO e2e-gcp-op test, the day-2 kernel RT switch seems fine: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1632/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1755/artifacts/e2e-gcp-op/pods/openshift-machine-config-operator_machine-config-daemon-mz2jx_machine-config-daemon.log . To be extra sure, I will do an RT kernel switch as a day-2 operation with nightly 4.5.0-0.nightly-2020-04-09-082127. Can you let us know if an RT kernel switch as a day-2 operation works for your use case?
Day-2 RT kernel worked fine for me on a cluster installed from 4.5.0-0.nightly-2020-04-09-082127. machine-config-daemon pod log from one of the nodes:

I0409 12:27:13.476542 4087 rpm-ostree.go:366] Running captured: podman mount 79bb1b269fd08379f6bc9ac09f2e8658c64635ce67775c71a52b3e7aaed9cf90
I0409 12:27:13.593499 4087 update.go:1364] Switching to kernelType=realtime, invoking rpm-ostree ["override" "remove" "kernel" "kernel-core" "kernel-modules" "kernel-modules-extra" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-devel-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm"]
I0409 12:29:33.848483 4087 update.go:1364] Deleted container and removed OSContainer image
I0409 12:29:33.854996 4087 update.go:1364] initiating reboot: Node will reboot into config rendered-worker-fe4dd3f9e693875c16020459e2401324

Most likely we will need to update the installer bootimage.
https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/887
(Which reverts part of https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/877 which is what caused the problem)
I wonder why I don't see this issue with the day-2 RT kernel switch. This issue makes me think about how much we need support for shipping multiple OSTree refs. It may break again if a shipped RT kernel sub-package adds any dependency that is not shipped in base RHCOS.
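The breakage pattern described above (and in the original depsolve error) can be illustrated with a toy model: a sub-package declares a requirement, and nothing in the base image provides it. This is only a hedged sketch of the concept; the package names and requirements below are illustrative, not the actual RPM metadata.

```python
def missing_deps(requires: dict, provides: set) -> dict:
    """For each package, list requirements not satisfied by anything provided."""
    return {pkg: [r for r in reqs if r not in provides]
            for pkg, reqs in requires.items()
            if any(r not in provides for r in reqs)}

# Toy model of the failure: the RT devel sub-package pulls in perl-interpreter,
# which the base RHCOS ostree does not provide.
requires = {
    "kernel-rt-core": ["kernel-rt-modules"],
    "kernel-rt-devel": ["perl-interpreter"],
}
provides = {"kernel-rt-modules"}  # perl-interpreter absent from the base image
print(missing_deps(requires, provides))  # {'kernel-rt-devel': ['perl-interpreter']}
```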
Checked with nightly 4.5.0-0.nightly-2020-04-13-014528: machine-os-content doesn't include the kernel-rt-devel package. @Marc, can you please confirm that cluster installation with the rt-kernel works as expected for you with recent nightlies?
I can confirm that the rt kernel is installed using the day-1 method on 4.5.0-0.nightly-2020-04-14-055714.
Thanks, Marc, for confirming!

QE: This bug can be verified by successfully installing a latest 4.5 nightly cluster with kernel-rt set as a day-1 operation. Documentation is at https://github.com/openshift/installer/blob/master/docs/user/customization.md#Switching-RHCOS-host-kernel-using-KernelType . On the installed cluster, get access to one of the worker nodes (given that we applied the day-1 kernel-rt switch on worker nodes) and run `rpm-ostree status`. The latest deployment shouldn't have the kernel-rt-devel package layered.
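The verification step above can be sketched as a small script that checks a deployment's locally layered packages for kernel-rt-devel. Note the assumption: the field name "requested-local-packages" is taken from rpm-ostree's `--json` output format and should be confirmed against a real node; the sample JSON below is illustrative.

```python
import json

def has_devel_layered(status_json: str) -> bool:
    """Return True if any deployment layers a kernel-rt-devel local package."""
    status = json.loads(status_json)
    for dep in status.get("deployments", []):
        # "requested-local-packages" is assumed to list locally layered RPMs
        for pkg in dep.get("requested-local-packages", []):
            if pkg.startswith("kernel-rt-devel"):
                return True
    return False

# Illustrative sample mimicking a fixed deployment (no devel package layered)
sample = json.dumps({"deployments": [{
    "requested-local-packages": [
        "kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64",
        "kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64",
    ],
}]})
print(has_devel_layered(sample))  # False on a fixed build
```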
Verified RT kernel can be installed day-1 on 4.5.0-0.nightly-2020-04-13-122638

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-13-122638   True        False         3m33s   Cluster version is 4.5.0-0.nightly-2020-04-13-122638

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-140-248.us-west-2.compute.internal   Ready    master   28m   v1.17.1
ip-10-0-142-181.us-west-2.compute.internal   Ready    worker   11m   v1.17.1
ip-10-0-153-244.us-west-2.compute.internal   Ready    worker   12m   v1.17.1
ip-10-0-154-21.us-west-2.compute.internal    Ready    master   29m   v1.17.1
ip-10-0-166-176.us-west-2.compute.internal   Ready    worker   12m   v1.17.1
ip-10-0-170-140.us-west-2.compute.internal   Ready    master   28m   v1.17.1

$ oc get node -l node-role.kubernetes.io/worker -o custom-columns=NAME:.metadata.name,KERNEL-VERSION:.status.nodeInfo.kernelVersion
NAME                                         KERNEL-VERSION
ip-10-0-142-181.us-west-2.compute.internal   4.18.0-147.8.1.rt24.101.el8_1.x86_64
ip-10-0-153-244.us-west-2.compute.internal   4.18.0-147.8.1.rt24.101.el8_1.x86_64
ip-10-0-166-176.us-west-2.compute.internal   4.18.0-147.8.1.rt24.101.el8_1.x86_64

$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- chroot /host rpm-ostree status; done
Starting pod/ip-10-0-142-181us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2d924444fb86f6974dd65bcc2c67afb298937529bde40ef3e40d4fd31c77c16c
              CustomOrigin: Managed by machine-config-operator
                   Version: 45.81.202004130827-0 (2020-04-13T08:33:17Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.8.1.el8_1
             LocalPackages: kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64

  ostree://4e3f5cd998893cc6648888e77515c91e3de87732b220a84b5a6fa12c5d87f8ce
                   Version: 44.81.202003062006-0 (2020-03-06T20:11:30Z)
Removing debug pod ...
Starting pod/ip-10-0-153-244us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2d924444fb86f6974dd65bcc2c67afb298937529bde40ef3e40d4fd31c77c16c
              CustomOrigin: Managed by machine-config-operator
                   Version: 45.81.202004130827-0 (2020-04-13T08:33:17Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.8.1.el8_1
             LocalPackages: kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64

  ostree://4e3f5cd998893cc6648888e77515c91e3de87732b220a84b5a6fa12c5d87f8ce
                   Version: 44.81.202003062006-0 (2020-03-06T20:11:30Z)
Removing debug pod ...
Starting pod/ip-10-0-166-176us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2d924444fb86f6974dd65bcc2c67afb298937529bde40ef3e40d4fd31c77c16c
              CustomOrigin: Managed by machine-config-operator
                   Version: 45.81.202004130827-0 (2020-04-13T08:33:17Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.8.1.el8_1
             LocalPackages: kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64

  ostree://4e3f5cd998893cc6648888e77515c91e3de87732b220a84b5a6fa12c5d87f8ce
                   Version: 44.81.202003062006-0 (2020-03-06T20:11:30Z)
Removing debug pod ...
$
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409