Description of problem:
Since around yesterday, OCP's 4.5 machine-os-content image no longer contains the RT kernel RPMs in its root directory. As a result, machine-config-daemon fails to install the RT kernel and the node becomes degraded.

Version-Release number of selected component (if applicable):
OCP 4.5.0-0.ci-2020-03-09-035935
machine-os-content registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-035935@sha256:84e3cbc499264b663ed9dea874d294920bc27ae92a4519f31307bee4a39fd251

How reproducible:
always

Steps to Reproduce:
1. sudo podman pull --authfile=./all-the-pull-secrets.json registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-035935@sha256:84e3cbc499264b663ed9dea874d294920bc27ae92a4519f31307bee4a39fd251
2. Look up the image ID of the pulled image (sudo podman images)
3. containerId=$(sudo podman create --net=none --name test <imageID>)
4. containerPath=$(sudo podman mount $containerId)
5. sudo ls -ll $containerPath

Actual results:
total 68
drwxr-xr-x 2 root root 4096 Dec 14  2017 opt
drwxr-xr-x 2 root root 4096 Dec 14  2017 mnt
drwxr-xr-x 2 root root 4096 Dec 14  2017 media
dr-xr-xr-x 2 root root 4096 Dec 14  2017 boot
drwxr-xr-x 2 root root 4096 Dec 10 18:24 sys
drwxr-xr-x 2 root root 4096 Dec 10 18:24 proc
drwxr-xr-x 2 root root 4096 Dec 10 18:24 dev
drwxr-xr-x 1 root root 4096 Dec 10 18:25 usr
lrwxrwxrwx 1 root root    8 Dec 10 18:25 sbin -> usr/sbin
lrwxrwxrwx 1 root root    9 Dec 10 18:25 lib64 -> usr/lib64
lrwxrwxrwx 1 root root    7 Dec 10 18:25 lib -> usr/lib
lrwxrwxrwx 1 root root    7 Dec 10 18:25 bin -> usr/bin
drwxr-xr-x 1 root root 4096 Dec 10 18:26 var
drwxr-xr-x 2 root root 4096 Dec 10 18:27 home
dr-xr-x--- 1 root root 4096 Dec 10 18:33 root
drwxr-xr-x 1 root root 4096 Dec 19 22:08 etc
drwxrwxrwt 1 root root 4096 Dec 19 22:08 tmp
drwxr-xr-x 1 root root 4096 Dec 19 22:08 run
drwxr-xr-x 1 root root 4096 Feb 19 03:07 srv
drwx------ 5 root root 4096 Mar  9 12:17 ..
drwxr-xr-x 1 root root 4096 Mar  9 12:17 .

Expected results:
This is from OCP 4.4:
total 50076
-rw-r--r-- 1 msluiter msluiter 26067104 Mar  8 09:53 kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
-rw-r--r-- 1 msluiter msluiter 22879500 Mar  8 09:53 kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
-rw-r--r-- 1 msluiter msluiter  2293020 Mar  8 09:53 kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
-rw-rw-r-- 1 root     root        14031 Mar  8 09:53 pkglist.txt
drwxrwxr-x 3 root     root         4096 Mar  8 09:53 srv

Additional info:
See https://github.com/openshift/machine-config-operator/pull/1545 for a lot more info. TL;DR: the RHCOS 4.5 builds haven't been finalized yet, and what's in the 4.5 CI stream is wrong.
RHCOS 4.5 job was finally successful; versions 45.81.202003091535-0 and newer should have the RT kernel included.
```
$ oc image info --output json $(oc adm release info -a ~/openshift-cluster-installs/all-the-pull-secrets.json --image-for=machine-os-content registry.svc.ci.openshift.org/ocp/release:4.5.0-0.ci-2020-03-09-173642) | jq .name
"registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-173642@sha256:06bb3cffbd17a62b3b8db5a1296b8e74ccb2497ffb952eb8dff2e7922ec6736e"
$ sudo podman pull --authfile ~/openshift-cluster-installs/all-the-pull-secrets.json registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-173642@sha256:06bb3cffbd17a62b3b8db5a1296b8e74ccb2497ffb952eb8dff2e7922ec6736e
Trying to pull registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-173642@sha256:06bb3cffbd17a62b3b8db5a1296b8e74ccb2497ffb952eb8dff2e7922ec6736e...
Getting image source signatures
Copying blob 9ed0ef8a459c done
Copying config 84b7a363f3 done
Writing manifest to image destination
Storing signatures
84b7a363f3e062142bda783e6e18c891dce93c876f3478eb0d4e3ae8013ac1dc
$ ctr=$(sudo podman create registry.svc.ci.openshift.org/ocp/4.5-2020-03-09-173642@sha256:06bb3cffbd17a62b3b8db5a1296b8e74ccb2497ffb952eb8dff2e7922ec6736e)
$ mnt=$(sudo podman mount $ctr)
$ sudo ls $mnt
kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64.rpm
pkglist.txt
srv
```
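For anyone who wants to script the same check in CI, here is a minimal sketch of it. The function name `has_rt_kernel_rpms` is made up for illustration and is not part of podman or the MCO; it only assumes the three kernel-rt RPM name prefixes visible in the listings in this bug.

```shell
# Hypothetical helper: succeed only if DIR contains at least one RPM for each
# of the three kernel-rt packages seen in the machine-os-content root.
has_rt_kernel_rpms() {
    root=$1
    for prefix in kernel-rt-core kernel-rt-modules kernel-rt-modules-extra; do
        found=no
        # "[0-9]" after the prefix keeps kernel-rt-modules-* from also
        # matching kernel-rt-modules-extra-*.
        for rpm in "$root"/"$prefix"-[0-9]*.rpm; do
            # An unmatched glob stays literal, so confirm the path exists.
            if [ -e "$rpm" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || return 1
    done
    return 0
}

# Usage (assumption: run as root, since the podman mount point is root-only):
#   mnt=$(podman mount "$ctr")
#   has_rt_kernel_rpms "$mnt" && echo "RT kernel RPMs present"
```

It checks file names only, so it would not catch a truncated or otherwise broken RPM.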
Just noticed it on our CI, thanks for the quick fix!
Verified on 4.5.0-0.nightly-2020-03-13-113617

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-03-13-113617   True        False         4m6s    Cluster version is 4.5.0-0.nightly-2020-03-13-113617

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-138-0.us-west-2.compute.internal     Ready    worker   15m   v1.17.1
ip-10-0-141-46.us-west-2.compute.internal    Ready    master   30m   v1.17.1
ip-10-0-147-245.us-west-2.compute.internal   Ready    worker   15m   v1.17.1
ip-10-0-155-42.us-west-2.compute.internal    Ready    master   30m   v1.17.1
ip-10-0-169-101.us-west-2.compute.internal   Ready    worker   15m   v1.17.1
ip-10-0-173-114.us-west-2.compute.internal   Ready    master   29m   v1.17.1

$ oc debug node/ip-10-0-138-0.us-west-2.compute.internal -- chroot /host rpm -q kernel
Starting pod/ip-10-0-138-0us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
kernel-4.18.0-147.5.1.el8_1.x86_64
Removing debug pod ...

$ cat << EOF > worker-kernel.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: "worker"
>   name: worker-kerneltype
> spec:
>   kernelType: realtime
> EOF

$ oc apply -f worker-kernel.yaml
machineconfig.machineconfiguration.openshift.io/worker-kerneltype created

$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                                   d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
00-worker                                                   d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
01-master-container-runtime                                 d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
01-master-kubelet                                           d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
01-worker-container-runtime                                 d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
01-worker-kubelet                                           d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
99-master-3fcc6cbf-cde7-4d60-b15e-c00e340663d0-registries   d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
99-master-ssh                                                                                          2.2.0             28m
99-worker-5e5c295c-f8ca-42a3-8837-f239d06f2d17-registries   d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
99-worker-ssh                                                                                          2.2.0             28m
rendered-master-a99ee50734a219676caba2c8a360ac19            d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
rendered-worker-3e647a75d51a4bb8f3b865b232ee1a3f            d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             27m
rendered-worker-870a926c40f9e3d2b3e9cc819be21c2d            d737c575cf5ab31cf303fd75cb7f104563eeb9b4   2.2.0             0s
worker-kerneltype                                                                                                        5s

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-3e647a75d51a4bb8f3b865b232ee1a3f   False     True       False      3              0                   0                     0                      28m

$ watch oc get node

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-138-0.us-west-2.compute.internal     Ready    worker   32m   v1.17.1
ip-10-0-141-46.us-west-2.compute.internal    Ready    master   47m   v1.17.1
ip-10-0-147-245.us-west-2.compute.internal   Ready    worker   32m   v1.17.1
ip-10-0-155-42.us-west-2.compute.internal    Ready    master   47m   v1.17.1
ip-10-0-169-101.us-west-2.compute.internal   Ready    worker   32m   v1.17.1
ip-10-0-173-114.us-west-2.compute.internal   Ready    master   46m   v1.17.1

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-870a926c40f9e3d2b3e9cc819be21c2d   True      False      False      3              3                   3                     0                      44m

$ oc debug node/ip-10-0-138-0.us-west-2.compute.internal -- chroot /host rpm -q kernel
Starting pod/ip-10-0-138-0us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
package kernel is not installed
Removing debug pod ...

$ oc debug node/ip-10-0-138-0.us-west-2.compute.internal -- chroot /host rpm -qa kernel*
Starting pod/ip-10-0-138-0us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
kernel-rt-core-4.18.0-147.5.1.rt24.98.el8_1.x86_64
kernel-rt-modules-4.18.0-147.5.1.rt24.98.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.5.1.rt24.98.el8_1.x86_64
Removing debug pod ...

$ oc debug node/ip-10-0-138-0.us-west-2.compute.internal -- chroot /host uname -a
Starting pod/ip-10-0-138-0us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Linux ip-10-0-138-0 4.18.0-147.5.1.rt24.98.el8_1.x86_64 #1 SMP PREEMPT RT Tue Jan 14 16:03:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Removing debug pod ...
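
The final check above (is the node actually running the RT kernel?) could be automated roughly as follows. This is a sketch only: `is_rt_kernel_release` is a hypothetical name, and it relies on the assumption that RT builds carry an `.rtNN` component in the release string, as both kernel versions in this bug suggest.

```shell
# Hypothetical helper: succeed if a kernel release string (what `uname -r`
# prints) looks like a real-time build, e.g.
# 4.18.0-147.5.1.rt24.98.el8_1.x86_64.
# Assumption: RT releases contain ".rt" followed by a digit; stock ones do not.
is_rt_kernel_release() {
    case "$1" in
        *.rt[0-9]*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Usage, run on the node itself (for example inside `chroot /host`):
#   is_rt_kernel_release "$(uname -r)" && echo "running RT kernel"
```

Matching on the release string is a heuristic; the `PREEMPT RT` marker in the `uname -a` output is another signal that could be checked instead.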
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409