+++ This bug was initially created as a clone of Bug #1914469 +++

Description of problem:

The Realtime (RT) variant of the RHEL kernel shipped in downstream RHCOS appears not to be synchronized with the standard kernel. For example, the latest stable version shipping at the time this BZ was written is 4.6.9. That corresponds to RHCOS 46.82.202012151054-0, which has kernel version 4.18.0-193.37.1.el8_2.x86_64. When switching to the RT variant of the kernel via the Performance AddOn Operator, one gets booted into RT kernel version 4.18.0-193.28.1.rt13.77.el8_2.x86_64. This should be a more recent kernel.

Additional info:

The 28.1 kernel is from October; the RT kernel should be on a later release, such as 37.1, which is from December.

The nightly builds for 4.6 still have the kernel version from October. This is what I see after mounting the machine-os-content of 4.6.0-0.nightly-2021-01-08-200800 and looking in the extensions folder:

./extensions/kernel-rt/kernel-headers-4.18.0-193.28.1.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-core-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-devel-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-kvm-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-modules-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-modules-extra-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm

There are numerous fixes in more recent RT kernel versions that are absolutely critical for low latency applications running on OpenShift 4.6.

--- Additional comment from Micah Abbott on 2021-01-11 14:44:34 UTC ---

RHCOS 4.6 is billed as an EUS release and uses the RHEL 8.2 EUS sources. The kernel-rt package does not have an EUS release; rather, it uses the moniker "Telecommunications Update Service" (TUS).
See the most recent advisory for `kernel-rt`: https://access.redhat.com/errata/RHSA-2020:5428

The RHCOS build process is using the wrong location for TUS updates on `kernel-rt`, so we'll have to update our build process/configuration to use the proper location.
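
For anyone who wants to check the mismatch from the description mechanically, here is a minimal sketch. It assumes the RT release string differs from the standard one only by an inserted `.rt<N>.<M>` component, which holds for the versions quoted in this bug but is not a documented guarantee. The function names `rt_base_release` and `kernels_in_sync` are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: drop the ".rt<N>.<M>" component from an RT kernel
# release string so it can be compared with a standard kernel release.
rt_base_release() {
  printf '%s\n' "$1" | sed -E 's/\.rt[0-9]+\.[0-9]+//'
}

# Hypothetical helper: succeed when the RT kernel ($2) is built from the
# same base release as the standard kernel ($1).
kernels_in_sync() {
  [ "$(rt_base_release "$2")" = "$1" ]
}

# Versions taken from the description of this bug.
std="4.18.0-193.37.1.el8_2.x86_64"
rt="4.18.0-193.28.1.rt13.77.el8_2.x86_64"

if kernels_in_sync "$std" "$rt"; then
  echo "in sync"
else
  echo "out of sync: standard=$std rt-base=$(rt_base_release "$rt")"
fi
```

With the versions from the description this reports the kernels as out of sync. With the RT version from the fixed build (4.18.0-193.37.1.rt13.87.el8_2.x86_64) the check passes.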
The RHCOS 4.6 build config is updated here: https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1207

We will need to force a build of RHCOS 4.6 via https://gitlab.cee.redhat.com/openshift-art/rhcos-upshift/-/merge_requests/217
The fix landed in RHCOS 46.82.202101111741-0

```
(1/5): kernel-rt-kvm-4.18.0-193.37.1.rt13.87.el 3.5 MB/s | 3.2 MB     00:00
(2/5): kernel-rt-modules-extra-4.18.0-193.37.1. 2.7 MB/s | 3.4 MB     00:01
(3/5): kernel-rt-modules-4.18.0-193.37.1.rt13.8 7.1 MB/s |  24 MB     00:03
(4/5): kernel-rt-core-4.18.0-193.37.1.rt13.87.e 7.2 MB/s |  27 MB     00:03
(5/5): kernel-rt-devel-4.18.0-193.37.1.rt13.87. 6.4 MB/s |  15 MB     00:02
```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-01-12-084037   True        False         37m     Cluster version is 4.6.0-0.nightly-2021-01-12-084037

$ cat 99-worker-realtime.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-realtime
spec:
  config:
    ignition:
      version: 3.1.0
  kernelType: realtime

$ oc create -f 99-worker-realtime.yaml
machineconfig.machineconfiguration.openshift.io/99-worker-realtime created

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
00-worker                                          eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
01-master-container-runtime                        eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
01-master-kubelet                                  eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
01-worker-container-runtime                        eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
01-worker-kubelet                                  eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
99-master-generated-registries                     eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
99-master-ssh                                                                                 3.1.0             38m
99-worker-generated-registries                     eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
99-worker-realtime                                                                            3.1.0             3s
99-worker-ssh                                                                                 3.1.0             38m
rendered-master-5d7bfa47cbeec95df59f71f33b975eb1   eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
rendered-worker-3913f58d5e37f4cbf0cce63c0b49a0a3   eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
rendered-worker-48aa8698ff39eae0d76d83d06b9f6978   eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             72s

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-3913f58d5e37f4cbf0cce63c0b49a0a3   False     True       False      3              2                   2                     0                      33m

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-3913f58d5e37f4cbf0cce63c0b49a0a3   False     True       False      3              2                   2                     0                      33m

$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-y3lkdbb-f76d1-hsvx6-master-0         Ready    master   54m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-master-1         Ready    master   54m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-master-2         Ready    master   54m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k   Ready    worker   43m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-worker-c-wrkgp   Ready    worker   43m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-worker-d-nrsfm   Ready    worker   43m   v1.19.0+9c69bdc

$ oc debug node/ci-ln-y3lkdbb-f76d1-hsvx6-master-0
Starting pod/ci-ln-y3lkdbb-f76d1-hsvx6-master-0-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# uname -a
Linux ci-ln-y3lkdbb-f76d1-hsvx6-master-0 4.18.0-193.37.1.el8_2.x86_64 #1 SMP Sun Dec 6 19:59:00 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

$ oc debug node/ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k
Starting pod/ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# uname -a
Linux ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k 4.18.0-193.37.1.rt13.87.el8_2.x86_64 #1 SMP PREEMPT RT Mon Dec 7 13:13:06 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# rpm -qa | grep kernel
kernel-rt-modules-extra-4.18.0-193.37.1.rt13.87.el8_2.x86_64
kernel-rt-modules-4.18.0-193.37.1.rt13.87.el8_2.x86_64
kernel-rt-kvm-4.18.0-193.37.1.rt13.87.el8_2.x86_64
kernel-rt-core-4.18.0-193.37.1.rt13.87.el8_2.x86_64
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

$ oc debug node/ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k -- chroot /host rpm-ostree status
Starting pod/ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k-debug ...
To use host binaries, run `chroot /host`
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb3a8ce181e114db2b7f859b349b840721d109c89261623d87fac96b2499b1b9
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202101111741-0 (2021-01-11T17:44:58Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-193.37.1.el8_2
           LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules kernel-rt-modules-extra

  ostree://cb0327325553e6922ff25822ea7eb1a2ec213e70c7cf6880965e7e2bb5ee7dea
                   Version: 46.82.202011260640-0 (2020-11-26T06:44:15Z)
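
The `rpm -qa | grep kernel` check in the verification can be condensed into a single comparison. This is a rough sketch, assuming package names of the form `kernel-rt-<subpackage>-<version>` as seen in this bug. The function name `kernel_rt_versions` is hypothetical, and the sample input is copied from the worker node output in this report rather than read from a live node.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `rpm -qa` style package names on stdin and
# print the distinct version-release strings of the kernel-rt-* packages.
kernel_rt_versions() {
  grep '^kernel-rt-' \
    | sed -E 's/^kernel-rt-[a-z-]+-([0-9].*)$/\1/' \
    | sort -u
}

# Sample input copied from the worker node output in this bug.
versions="$(kernel_rt_versions <<'EOF'
kernel-rt-modules-extra-4.18.0-193.37.1.rt13.87.el8_2.x86_64
kernel-rt-modules-4.18.0-193.37.1.rt13.87.el8_2.x86_64
kernel-rt-kvm-4.18.0-193.37.1.rt13.87.el8_2.x86_64
kernel-rt-core-4.18.0-193.37.1.rt13.87.el8_2.x86_64
EOF
)"

# All kernel-rt packages should share one version-release string.
if [ "$(printf '%s\n' "$versions" | wc -l)" -eq 1 ]; then
  echo "kernel-rt packages agree on: $versions"
else
  echo "kernel-rt packages have mixed versions:"
  printf '%s\n' "$versions"
fi
```

On a node the same function could be fed from `rpm -qa` inside `chroot /host`, and the single resulting string compared against `uname -r`.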
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.6.12 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0037