Bug 1811630 - RT kernel RPMs missing in latest OCP 4.5
Summary: RT kernel RPMs missing in latest OCP 4.5
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Micah Abbott
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1771572
Reported: 2020-03-09 11:41 UTC by Marc Sluiter
Modified: 2020-07-13 17:19 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Trying to install kernel-rt on RHCOS nodes
Consequence: Failure to install kernel-rt
Fix: Include kernel-rt as part of the machine-os-content
Result: Installation of kernel-rt is successful
Clone Of:
Environment:
Last Closed: 2020-07-13 17:18:56 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:19:27 UTC

Description Marc Sluiter 2020-03-09 11:41:22 UTC
Description of problem:

Since around yesterday, the OCP 4.5 machine-os-content image no longer contains the RT kernel RPMs in its root directory. As a result, machine-config-daemon fails to install the RT kernel, and the node becomes degraded.

Version-Release number of selected component (if applicable):

OCP 4.5.0-0.ci-2020-03-09-035935
machine-os-content                             registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-035935@sha256:84e3cbc499264b663ed9dea874d294920bc27ae92a4519f31307bee4a39fd251

How reproducible:
always

Steps to Reproduce:
1. sudo podman pull --authfile=./all-the-pull-secrets.json registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-035935@sha256:84e3cbc499264b663ed9dea874d294920bc27ae92a4519f31307bee4a39fd251
2. Look up the image ID (sudo podman images)
3. containerId=$(sudo podman create --net=none --name test <imageID>)
4. containerPath=$(sudo podman mount $containerId)
5. sudo ls -ll $containerPath
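The manual inspection in the steps above could be scripted. The following is only a minimal sketch: the function name `has_rt_rpms` is hypothetical, and it assumes the image has already been mounted and that the RT kernel RPMs, when present, sit at the top level of the mount point with names starting with `kernel-rt-`, as in the 4.4 listing.

```shell
#!/bin/sh
# Hypothetical helper (not part of podman or the MCO): succeed if the
# directory given as $1 contains at least one kernel-rt RPM at its top level.
has_rt_rpms() {
    dir=$1
    for f in "$dir"/kernel-rt-*.rpm; do
        # When the glob matches nothing, $f is the literal pattern
        # and the -e test fails, so we fall through to "return 1".
        [ -e "$f" ] && return 0
    done
    return 1
}
```

Called as `has_rt_rpms "$containerPath"`, it returns 0 when the RPMs are there and 1 otherwise. It would likely need to run as root, since the listing was obtained with `sudo ls`.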

Actual results:
total 68
drwxr-xr-x 2 root root 4096 Dec 14  2017 opt
drwxr-xr-x 2 root root 4096 Dec 14  2017 mnt
drwxr-xr-x 2 root root 4096 Dec 14  2017 media
dr-xr-xr-x 2 root root 4096 Dec 14  2017 boot
drwxr-xr-x 2 root root 4096 Dec 10 18:24 sys
drwxr-xr-x 2 root root 4096 Dec 10 18:24 proc
drwxr-xr-x 2 root root 4096 Dec 10 18:24 dev
drwxr-xr-x 1 root root 4096 Dec 10 18:25 usr
lrwxrwxrwx 1 root root    8 Dec 10 18:25 sbin -> usr/sbin
lrwxrwxrwx 1 root root    9 Dec 10 18:25 lib64 -> usr/lib64
lrwxrwxrwx 1 root root    7 Dec 10 18:25 lib -> usr/lib
lrwxrwxrwx 1 root root    7 Dec 10 18:25 bin -> usr/bin
drwxr-xr-x 1 root root 4096 Dec 10 18:26 var
drwxr-xr-x 2 root root 4096 Dec 10 18:27 home
dr-xr-x--- 1 root root 4096 Dec 10 18:33 root
drwxr-xr-x 1 root root 4096 Dec 19 22:08 etc
drwxrwxrwt 1 root root 4096 Dec 19 22:08 tmp
drwxr-xr-x 1 root root 4096 Dec 19 22:08 run
drwxr-xr-x 1 root root 4096 Feb 19 03:07 srv
drwx------ 5 root root 4096 Mar  9 12:17 ..
drwxr-xr-x 1 root root 4096 Mar  9 12:17 .

Expected results:
This is from OCP 4.4:

total 50076
-rw-r--r-- 1 msluiter msluiter 26067104 Mar  8 09:53 kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
-rw-r--r-- 1 msluiter msluiter 22879500 Mar  8 09:53 kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
-rw-r--r-- 1 msluiter msluiter  2293020 Mar  8 09:53 kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
-rw-rw-r-- 1 root     root        14031 Mar  8 09:53 pkglist.txt
drwxrwxr-x 3 root     root         4096 Mar  8 09:53 srv

Additional info:

Comment 1 Colin Walters 2020-03-09 14:08:22 UTC
A lot more info in https://github.com/openshift/machine-config-operator/pull/1545

TL;DR: the RHCOS 4.5 builds haven't been finalized yet, and what's in the 4.5 CI stream is wrong.

Comment 2 Micah Abbott 2020-03-09 20:07:12 UTC
RHCOS 4.5 job was finally successful; versions 45.81.202003091535-0 and newer should have the RT kernel included.

```
$  oc image info --output json $(oc adm release info -a ~/openshift-cluster-installs/all-the-pull-secrets.json --image-for=machine-os-content registry.svc.ci.openshift.org/ocp/release:4.5.0-0.ci-2020-03-09-173642) | jq .name
"registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-173642@sha256:06bb3cffbd17a62b3b8db5a1296b8e74ccb2497ffb952eb8dff2e7922ec6736e"

$ sudo podman pull --authfile  ~/openshift-cluster-installs/all-the-pull-secrets.json registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-173642@sha256:06bb3cffbd17a62b3b8db5a1296b8e74ccb2497ffb952eb8dff2e7922ec6736e
Trying to pull registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-173642@sha256:06bb3cffbd17a62b3b8db5a1296b8e74ccb2497ffb952eb8dff2e7922ec6736e...
Getting image source signatures
Copying blob 9ed0ef8a459c done  
Copying config 84b7a363f3 done  
Writing manifest to image destination
Storing signatures
84b7a363f3e062142bda783e6e18c891dce93c876f3478eb0d4e3ae8013ac1dc

$ ctr=$(sudo podman create registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-173642@sha256:06bb3cffbd17a62b3b8db5a1296b8e74ccb2497ffb952eb8dff2e7922ec6736e)
$ mnt=$(sudo podman mount $ctr)
$ sudo ls $mnt
kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm  kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm  kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm  pkglist.txt  srv
```
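The "45.81.202003091535-0 and newer" threshold from the comment above can be checked mechanically. This is a sketch, not an official tool: the function name `rhcos_has_rt` and the variable `min_rt_version` are hypothetical, and it assumes RHCOS version strings order correctly under GNU `sort -V`.

```shell
#!/bin/sh
# First RHCOS 4.5 build reported in this bug to include the RT kernel RPMs.
min_rt_version="45.81.202003091535-0"

# Hypothetical helper: succeed if the RHCOS version in $1 is the minimum
# version or newer, using GNU sort's version ordering (-V).
rhcos_has_rt() {
    oldest=$(printf '%s\n%s\n' "$min_rt_version" "$1" | sort -V | head -n 1)
    # If the minimum sorts first (or both are equal), $1 is new enough.
    [ "$oldest" = "$min_rt_version" ]
}
```

For example, `rhcos_has_rt 45.81.202003091535-0` succeeds, while a build stamped earlier on the same day would fail the check.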

Comment 3 Marc Sluiter 2020-03-09 20:17:11 UTC
Just noticed it on our CI, thanks for the quick fix!

Comment 7 Michael Nguyen 2020-03-13 18:26:12 UTC
Verified on 4.5.0-0.nightly-2020-03-13-113617 

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-03-13-113617   True        False         4m6s    Cluster version is 4.5.0-0.nightly-2020-03-13-113617
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-138-0.us-west-2.compute.internal     Ready    worker   15m   v1.17.1
ip-10-0-141-46.us-west-2.compute.internal    Ready    master   30m   v1.17.1
ip-10-0-147-245.us-west-2.compute.internal   Ready    worker   15m   v1.17.1
ip-10-0-155-42.us-west-2.compute.internal    Ready    master   30m   v1.17.1
ip-10-0-169-101.us-west-2.compute.internal   Ready    worker   15m   v1.17.1
ip-10-0-173-114.us-west-2.compute.internal   Ready    master   29m   v1.17.1
$ oc debug node/ip-10-0-138-0.us-west-2.compute.internal -- chroot /host rpm -q kernel
Starting pod/ip-10-0-138-0us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
kernel-4.18.0-147.5.1.el8_1.x86_64

Removing debug pod ...
$ cat << EOF > worker-kernel.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: "worker"
>   name: worker-kerneltype
> spec:
>   kernelType: realtime
> EOF
$ oc apply -f worker-kernel.yaml 
machineconfig.machineconfiguration.openshift.io/worker-kerneltype created
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
00-worker                                                   d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
01-master-container-runtime                                 d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
01-master-kubelet                                           d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
01-worker-container-runtime                                 d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
01-worker-kubelet                                           d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
99-master-3fcc6cbf-cde7-4d60-b15e-c00e340663d0-registries   d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
99-master-ssh                                                                                          2.2.0             28m
99-worker-5e5c295c-f8ca-42a3-8837-f239d06f2d17-registries   d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
99-worker-ssh                                                                                          2.2.0             28m
rendered-master-a99ee50734a219676caba2c8a360ac19            d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
rendered-worker-3e647a75d51a4bb8f3b865b232ee1a3f            d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
rendered-worker-870a926c40f9e3d2b3e9cc819be21c2d            d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             0s
worker-kerneltype                                                                                                        5s
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-3e647a75d51a4bb8f3b865b232ee1a3f   False     True       False      3              0                   0                     0                      28m
$ watch oc get node
$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-138-0.us-west-2.compute.internal     Ready    worker   32m   v1.17.1
ip-10-0-141-46.us-west-2.compute.internal    Ready    master   47m   v1.17.1
ip-10-0-147-245.us-west-2.compute.internal   Ready    worker   32m   v1.17.1
ip-10-0-155-42.us-west-2.compute.internal    Ready    master   47m   v1.17.1
ip-10-0-169-101.us-west-2.compute.internal   Ready    worker   32m   v1.17.1
ip-10-0-173-114.us-west-2.compute.internal   Ready    master   46m   v1.17.1
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-870a926c40f9e3d2b3e9cc819be21c2d   True      False      False      3              3                   3                     0                      44m
$ oc debug node/ip-10-0-138-0.us-west-2.compute.internal -- chroot /host rpm -q kernel
Starting pod/ip-10-0-138-0us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
package kernel is not installed

Removing debug pod ...
$ oc debug node/ip-10-0-138-0.us-west-2.compute.internal -- chroot /host rpm -qa kernel*
Starting pod/ip-10-0-138-0us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64
kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64

Removing debug pod ...

$ oc debug node/ip-10-0-138-0.us-west-2.compute.internal -- chroot /host uname -a
Starting pod/ip-10-0-138-0us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Linux ip-10-0-138-0 4.18.0-147.5.1.rt24.98.el8_1.x86_64 #1 SMP PREEMPT RT Tue Jan 14 16:03:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Removing debug pod ...
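The two checks in the verification above (the worker pool has finished updating, and the node then runs an RT kernel) can be expressed as small parsing helpers. This is a sketch under assumptions: both function names are hypothetical, `mcp_updated` expects the two-line `oc get mcp/worker` table on stdin, and `is_rt_kernel` expects a `uname -a` string. The `PREEMPT_RT` spelling is included for other kernel builds and is not taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get mcp/<pool>` output on stdin and succeed
# if the UPDATED column of the first data row is "True".
mcp_updated() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "UPDATED") col = i }
        NR == 2 { found = (col && $col == "True") }
        END     { exit !found }
    '
}

# Hypothetical helper: succeed if the `uname -a` string in $1 reports a
# realtime kernel, as in "#1 SMP PREEMPT RT ..." above.
is_rt_kernel() {
    case $1 in
        *"PREEMPT RT"*|*"PREEMPT_RT"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

One possible use is `oc get mcp/worker | mcp_updated` in a polling loop in place of `watch oc get node`, followed by `is_rt_kernel` on the `uname -a` output collected through `oc debug`.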

Comment 11 errata-xmlrpc 2020-07-13 17:18:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

