Description of problem:
Bootstrapping an OCP 4.5 cluster with an additional MachineConfig that deploys the realtime kernel on workers has recently started to fail.

Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.5.0-0.nightly-2020-04-07-141621
Kubernetes Version: v1.18.0-rc.1

How reproducible:
always

Steps to Reproduce:
1. ./openshift-install create manifests
2. Add this MachineConfig to manifests:

$ cat realtime-worker-machine-config.yml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime

3. ./openshift-install create cluster

Actual results:
Cluster bootstrap fails. bootkube complains that the MachineConfig CRD is not available.

Logs of machine-config-controller:

[root@slinte-d2q72-b core]# crictl logs 3e008718f135c
I0407 17:37:24.117091       1 bootstrap.go:40] Version: v4.5.0-202004062301-dirty (71a2fb3a9765351668b7c56cc2e1cfe88768e25b)
F0407 17:37:25.619601       1 bootstrap.go:47] error running MCC[BOOTSTRAP]: parsing Ignition config failed with error: not a config (empty)
Report:

Expected results:
Cluster creation succeeds.

Additional info:
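The fatal log line suggests the MCC bootstrap path rejects a MachineConfig that carries no Ignition payload at all (this MC only sets spec.kernelType). As a rough illustration only — a hypothetical Python analogue, not the actual Go code in machine-config-operator — a strict parser handed an empty config would fail in exactly this way:

```python
import json

def parse_ignition_config(raw):
    """Hypothetical analogue of the behaviour seen above: an empty Ignition
    payload is rejected outright rather than being treated as a no-op."""
    if raw is None or not raw.strip():
        raise ValueError("parsing Ignition config failed with error: not a config (empty)")
    return json.loads(raw)

# A MachineConfig that only sets kernelType contributes no Ignition payload:
try:
    parse_ignition_config("")
except ValueError as e:
    print(e)  # parsing Ignition config failed with error: not a config (empty)
```

If this is what happens, the fix would presumably be to treat a missing/empty spec.config as a no-op during bootstrap merging.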
Another link to the broken jobs: https://prow.svc.ci.openshift.org/job-history/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.5

Might be related to https://github.com/openshift/machine-config-operator/pull/996, which went in over the weekend.
Confirmed this in the most recent run: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.5/398/artifacts/e2e-gcp/installer/

Download https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.5/398/artifacts/e2e-gcp/installer/log-bundle-20200407011342.tar and look at kubelet.log in /log-bundle-20200407011342/bootstrap/journals/:

Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: I0407 00:29:22.730520 1979 kubelet_getters.go:173] status for pod bootstrap-machine-config-operator-ci-op-8wldh-b.c.openshift-gce-devel-ci.internal updated to {Pending [{Initialized False 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC ContainersNotInitialized containers with incomplete status: [machine-config-controller]} {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC ContainersNotReady containers with unready status: [machine-config-server]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC ContainersNotReady containers with unready status: [machine-config-server]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC }] [] 2020-04-07 00:29:08 +0000 UTC [{machine-config-controller {nil nil &ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:I0407 00:29:12.882901 1 bootstrap.go:40] Version: v4.5.0-202004062301-dirty (71a2fb3a9765351668b7c56cc2e1cfe88768e25b)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: F0407 00:29:14.322597 1 bootstrap.go:47] error running MCC[BOOTSTRAP]: parsing Ignition config failed with error: not a config (empty)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: Report:
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: ,StartedAt:2020-04-07 00:29:12 +0000 UTC,FinishedAt:2020-04-07 00:29:14 +0000 UTC,ContainerID:cri-o://f3c722599e9f6ef39f28c9bb4f0f3fc37412725514bef5842cd326d75de482a2,}} {nil nil &ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:I0407 00:29:09.712355 1 bootstrap.go:40] Version: v4.5.0-202004062301-dirty (71a2fb3a9765351668b7c56cc2e1cfe88768e25b)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: F0407 00:29:11.149021 1 bootstrap.go:47] error running MCC[BOOTSTRAP]: parsing Ignition config failed with error: not a config (empty)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: Report:
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: ,StartedAt:2020-04-07 00:29:09 +0000 UTC,FinishedAt:2020-04-07 00:29:11 +0000
Reassigned to Jerry, as he may already be working on it.
QE: two example MCs for testing:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime
```

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kargs-loglevel
spec:
  kernelArguments:
    - 'loglevel=7'
```
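For the second MC, the applied kernel argument can be verified on a worker by inspecting /proc/cmdline. A small helper for that check — the function name and the sample cmdline are mine, purely illustrative:

```python
def has_kernel_arg(cmdline, arg):
    """Return True if a kernel argument (e.g. 'loglevel=7') appears in a
    /proc/cmdline string. Kernel args are whitespace-separated tokens."""
    return arg in cmdline.split()

# Illustrative cmdline as it might look after 99-worker-kargs-loglevel lands;
# the BOOT_IMAGE/root values here are made up:
sample = "BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos/vmlinuz root=UUID=abcd ro loglevel=7"
print(has_kernel_arg(sample, "loglevel=7"))  # True
```

On a node this would read the real contents of /proc/cmdline (e.g. via `oc debug node/...` and `cat /host/proc/cmdline`).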
Created clusters with these two MCs with no errors:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime
```

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kargs-loglevel
spec:
  kernelArguments:
    - 'loglevel=7'
```

$ ./openshift-install create manifests --dir=testcluster --log-level debug
$ cd openshift/
$ cat << EOF > rt-kernel.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: realtime-worker
> spec:
>   kernelType: realtime
> EOF
$ cd ../..
$ ./openshift-install create cluster --dir=testcluster --log-level debug
..install..
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-116.us-west-2.compute.internal   Ready    master   33m   v1.17.1
ip-10-0-141-102.us-west-2.compute.internal   Ready    worker   17m   v1.17.1
ip-10-0-144-215.us-west-2.compute.internal   Ready    worker   17m   v1.17.1
ip-10-0-158-216.us-west-2.compute.internal   Ready    master   34m   v1.17.1
ip-10-0-163-69.us-west-2.compute.internal    Ready    worker   16m   v1.17.1
ip-10-0-172-1.us-west-2.compute.internal     Ready    master   34m   v1.17.1
$ oc debug node/ip-10-0-141-102.us-west-2.compute.internal
Starting pod/ip-10-0-141-102us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
chroot /host
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm -q | grep kernel
rpm: no arguments given for query
sh-4.4# rpm -qa | grep kernel
kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-headers-4.18.0-147.8.1.el8_1.x86_64
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-devel-4.18.0-147.8.1.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
sh-4.4# uname -a
Linux ip-10-0-141-102 4.18.0-147.8.1.rt24.101.el8_1.x86_64 #1 SMP PREEMPT RT Wed Feb 26 16:43:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-13-122638   True        False         10m     Cluster version is 4.5.0-0.nightly-2020-04-13-122638
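The manual `uname` check above could be scripted for future verification runs. A sketch, assuming (based on the release strings in this transcript) that RHEL 8 realtime kernels embed an `.rtNN.` tag in `uname -r` output; the helper name is mine:

```python
def is_realtime_kernel(release):
    """Heuristic from the transcript above: RHEL realtime kernel releases
    carry an '.rt' tag (e.g. 4.18.0-147.8.1.rt24.101.el8_1.x86_64)."""
    return ".rt" in release

print(is_realtime_kernel("4.18.0-147.8.1.rt24.101.el8_1.x86_64"))  # True
print(is_realtime_kernel("4.18.0-147.8.1.el8_1.x86_64"))           # False
```

On a node, `release` would come from `uname -r` (e.g. `platform.release()` in Python, or shelling out via `oc debug`).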
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409