Bug 1914988 - [4.6.z] real-time kernel in RHCOS is not synchronized
Summary: [4.6.z] real-time kernel in RHCOS is not synchronized
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6.z
Hardware: All
OS: All
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.6.z
Assignee: Micah Abbott
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1914469
Blocks: 1922262 1922263
Reported: 2021-01-11 16:57 UTC by Micah Abbott
Modified: 2021-01-29 14:43 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1914469
Clones: 1922262
Environment:
Last Closed: 2021-01-18 18:00:27 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID               Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2021:0037   0        None      None    None     2021-01-18 18:00:39 UTC

Description Micah Abbott 2021-01-11 16:57:43 UTC
+++ This bug was initially created as a clone of Bug #1914469 +++

Description of problem:
The Realtime (RT) variant of the RHEL kernel shipped in downstream RHCOS appears not to be synchronized with the standard kernel.

For example, the latest stable release shipping at the time of writing is 4.6.9. It corresponds to RHCOS 46.82.202012151054-0, which has kernel version 4.18.0-193.37.1.el8_2.x86_64.

When switching to the RT kernel via the Performance Addon Operator, the node boots into RT kernel version 4.18.0-193.28.1.rt13.77.el8_2.x86_64, which is older than the standard kernel. The RT kernel should be a more recent build.
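
To make the mismatch concrete, here is a minimal shell sketch (not part of any RHCOS or OpenShift tooling) that strips the `.rtNN.MM` tag and the dist/arch suffix from a kernel release string and compares the remaining z-stream releases. The helper names `base_release` and `kernels_in_sync` are hypothetical, and the sketch assumes release strings follow the `4.18.0-193.37.1[.rt13.87].el8_2.x86_64` pattern seen in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assume kernel release strings shaped like
#   4.18.0-193.37.1.el8_2.x86_64            (standard kernel)
#   4.18.0-193.28.1.rt13.77.el8_2.x86_64    (kernel-rt)

# base_release prints only "<version>-<z-stream release>", e.g. 4.18.0-193.37.1
base_release() {
  local v="$1"
  v="${v%%.rt*}"   # drop the RT tag and everything after it, if present
  v="${v%%.el*}"   # drop the dist tag and arch, if still present
  printf '%s\n' "$v"
}

# kernels_in_sync succeeds when both kernels come from the same z-stream release
kernels_in_sync() {
  [ "$(base_release "$1")" = "$(base_release "$2")" ]
}

std=4.18.0-193.37.1.el8_2.x86_64          # kernel in RHCOS 46.82.202012151054-0
rt=4.18.0-193.28.1.rt13.77.el8_2.x86_64   # RT kernel booted via the Performance Addon Operator

if kernels_in_sync "$std" "$rt"; then
  echo "in sync: $(base_release "$std")"
else
  echo "out of sync: standard=$(base_release "$std") rt=$(base_release "$rt")"
fi
```

Comparing only the base release deliberately ignores the `rtNN.MM` part, since the RT build number is versioned independently of the standard kernel.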

Additional info:
The 193.28.1 RT kernel is from October; it should be on a later release, such as 193.37.1, which is from December.

The 4.6 nightly builds still carry the October RT kernel. Mounting the machine-os-content from 4.6.0-0.nightly-2021-01-08-200800 and looking in the extensions folder shows:

./extensions/kernel-rt/kernel-headers-4.18.0-193.28.1.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-core-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-devel-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-kvm-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-modules-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm
./extensions/kernel-rt/kernel-rt-modules-extra-4.18.0-193.28.1.rt13.77.el8_2.x86_64.rpm
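
The same listing can be checked mechanically. A minimal sketch, assuming the RPM file names follow the usual `NAME-VERSION-RELEASE.ARCH.rpm` pattern shown above; the helper name `rt_versions` is hypothetical. It prints each distinct `kernel-rt*` version and skips other packages in the folder, such as `kernel-headers`.

```shell
#!/usr/bin/env bash
# rt_versions (hypothetical helper): read RPM paths, one per line, on stdin and
# print the distinct kernel-rt* version-release.arch strings, e.g.
#   4.18.0-193.28.1.rt13.77.el8_2.x86_64
# Non-RT packages in the same folder (such as kernel-headers) produce no output.
rt_versions() {
  # \3 is the version-release.arch part after the kernel-rt[-subpackage] name
  sed -nE 's#^(.*/)?kernel-rt(-[a-z]+)*-([0-9][^/]*)\.rpm$#\3#p' | sort -u
}
```

With the content mounted, `find ./extensions/kernel-rt -name '*.rpm' | rt_versions` should print a single line; more than one line would mean the RT subpackages are out of step with each other.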

More recent RT kernel versions include numerous fixes that are critical for low-latency applications running on OpenShift 4.6.

--- Additional comment from Micah Abbott on 2021-01-11 14:44:34 UTC ---

RHCOS 4.6 is an Extended Update Support (EUS) release and uses the RHEL 8.2 EUS sources. The kernel-rt package has no EUS stream; instead it is shipped under the "Telecommunications Update Service" (TUS) name. See the most recent advisory for `kernel-rt`: https://access.redhat.com/errata/RHSA-2020:5428

The RHCOS build process is pulling TUS updates for `kernel-rt` from the wrong location, so we'll have to update our build process/configuration to use the correct one.

Comment 1 Micah Abbott 2021-01-11 17:02:40 UTC
The RHCOS 4.6 build config is updated here - https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1207

We will need to force a build of RHCOS 4.6 via https://gitlab.cee.redhat.com/openshift-art/rhcos-upshift/-/merge_requests/217

Comment 2 Micah Abbott 2021-01-11 19:54:03 UTC
The fix landed in RHCOS 46.82.202101111741-0

```
(1/5): kernel-rt-kvm-4.18.0-193.37.1.rt13.87.el 3.5 MB/s | 3.2 MB     00:00    
(2/5): kernel-rt-modules-extra-4.18.0-193.37.1. 2.7 MB/s | 3.4 MB     00:01    
(3/5): kernel-rt-modules-4.18.0-193.37.1.rt13.8 7.1 MB/s |  24 MB     00:03    
(4/5): kernel-rt-core-4.18.0-193.37.1.rt13.87.e 7.2 MB/s |  27 MB     00:03    
(5/5): kernel-rt-devel-4.18.0-193.37.1.rt13.87. 6.4 MB/s |  15 MB     00:02    
```

Comment 7 Michael Nguyen 2021-01-13 16:02:02 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-01-12-084037   True        False         37m     Cluster version is 4.6.0-0.nightly-2021-01-12-084037
$ cat 99-worker-realtime.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-realtime
spec:
  config:
    ignition:
      version: 3.1.0
  kernelType: realtime

$ oc create -f 99-worker-realtime.yaml 
machineconfig.machineconfiguration.openshift.io/99-worker-realtime created
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
00-worker                                          eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
01-master-container-runtime                        eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
01-master-kubelet                                  eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
01-worker-container-runtime                        eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
01-worker-kubelet                                  eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
99-master-generated-registries                     eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
99-master-ssh                                                                                 3.1.0             38m
99-worker-generated-registries                     eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
99-worker-realtime                                                                            3.1.0             3s
99-worker-ssh                                                                                 3.1.0             38m
rendered-master-5d7bfa47cbeec95df59f71f33b975eb1   eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
rendered-worker-3913f58d5e37f4cbf0cce63c0b49a0a3   eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             31m
rendered-worker-48aa8698ff39eae0d76d83d06b9f6978   eab9c35dfbeb0d21be6e1db3887acbbb93592d34   3.1.0             72s
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-3913f58d5e37f4cbf0cce63c0b49a0a3   False     True       False      3              2                   2                     0                      33m
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-3913f58d5e37f4cbf0cce63c0b49a0a3   False     True       False      3              2                   2                     0                      33m
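
Rather than re-running `oc get mcp/worker` by hand until the rollout finishes, the pool status can be polled. A minimal sketch, assuming the column layout printed above; `mcp_rollout_done` is a hypothetical helper that locates columns by header name and succeeds only when the pool reports `UPDATED=True`, `UPDATING=False`, and every machine is updated.

```shell
#!/usr/bin/env bash
# mcp_rollout_done (hypothetical helper): read `oc get mcp/<pool>` output on stdin.
# Columns are looked up by header name, so reordered columns are tolerated.
# Succeeds only if the (last) pool row is fully rolled out; fails if there is no row.
mcp_rollout_done() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
       {
         ok = ($col["UPDATED"] == "True" &&
               $col["UPDATING"] == "False" &&
               $col["UPDATEDMACHINECOUNT"] == $col["MACHINECOUNT"])
       }
       END { exit (ok ? 0 : 1) }'
}
```

For example, `until oc get mcp/worker | mcp_rollout_done; do sleep 30; done` would block until all three workers have rebooted into the new rendered config; the 30-second interval is an arbitrary choice.
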
$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-y3lkdbb-f76d1-hsvx6-master-0         Ready    master   54m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-master-1         Ready    master   54m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-master-2         Ready    master   54m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k   Ready    worker   43m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-worker-c-wrkgp   Ready    worker   43m   v1.19.0+9c69bdc
ci-ln-y3lkdbb-f76d1-hsvx6-worker-d-nrsfm   Ready    worker   43m   v1.19.0+9c69bdc
$ oc debug node/ci-ln-y3lkdbb-f76d1-hsvx6-master-0
Starting pod/ci-ln-y3lkdbb-f76d1-hsvx6-master-0-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# uname -a
Linux ci-ln-y3lkdbb-f76d1-hsvx6-master-0 4.18.0-193.37.1.el8_2.x86_64 #1 SMP Sun Dec 6 19:59:00 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
$ oc debug node/ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k
Starting pod/ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# uname -a
Linux ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k 4.18.0-193.37.1.rt13.87.el8_2.x86_64 #1 SMP PREEMPT RT Mon Dec 7 13:13:06 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# rpm -qa | grep kernel
kernel-rt-modules-extra-4.18.0-193.37.1.rt13.87.el8_2.x86_64
kernel-rt-modules-4.18.0-193.37.1.rt13.87.el8_2.x86_64
kernel-rt-kvm-4.18.0-193.37.1.rt13.87.el8_2.x86_64
kernel-rt-core-4.18.0-193.37.1.rt13.87.el8_2.x86_64
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
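
The manual `uname -a` check above can be scripted per node. A minimal sketch, assuming the release string comes from `uname -r` on the host; `node_runs_rt_release` and `check_workers` are hypothetical helpers, and `check_workers` additionally assumes that the debug pod's banner lines ("Starting pod/...", "Removing debug pod ...") are written to stderr rather than stdout.

```shell
#!/usr/bin/env bash
# node_runs_rt_release KREL EXPECTED (hypothetical helper)
#   KREL     kernel release reported by the node, i.e. `uname -r`
#   EXPECTED base release the RT kernel should match, e.g. 4.18.0-193.37.1
# Succeeds only if KREL is an RT build of exactly that base release.
node_runs_rt_release() {
  local krel="$1" expected="$2"
  case "$krel" in
    *.rt*) ;;          # RT builds carry an .rtNN.MM tag in the release
    *) return 1 ;;     # standard kernel: not what kernelType: realtime asks for
  esac
  [ "${krel%%.rt*}" = "$expected" ]
}

# check_workers EXPECTED (hypothetical): run the check on every worker via `oc debug`.
# Assumes banner lines from `oc debug` go to stderr, which is discarded here.
check_workers() {
  local expected="$1" node krel
  for node in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
    krel="$(oc debug "$node" -- chroot /host uname -r 2>/dev/null)"
    if node_runs_rt_release "$krel" "$expected"; then
      echo "OK   $node $krel"
    else
      echo "FAIL $node $krel"
    fi
  done
}
```

On this cluster, `check_workers 4.18.0-193.37.1` would be expected to report each worker as OK once the rollout has finished.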

$ oc debug node/ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k -- chroot /host rpm-ostree status
Starting pod/ci-ln-y3lkdbb-f76d1-hsvx6-worker-b-dd88k-debug ...
To use host binaries, run `chroot /host`
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb3a8ce181e114db2b7f859b349b840721d109c89261623d87fac96b2499b1b9
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202101111741-0 (2021-01-11T17:44:58Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-193.37.1.el8_2
           LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules
                            kernel-rt-modules-extra

  ostree://cb0327325553e6922ff25822ea7eb1a2ec213e70c7cf6880965e7e2bb5ee7dea
                   Version: 46.82.202011260640-0 (2020-11-26T06:44:15Z)

Comment 9 errata-xmlrpc 2021-01-18 18:00:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.6.12 bug fix and security update) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0037

