Bug 1821888
Summary: | Day 1 deployment of realtime kernel fails on OCP 4.5 recently | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Marc Sluiter <msluiter>
Component: | Machine Config Operator | Assignee: | Yu Qi Zhang <jerzhang>
Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen>
Severity: | urgent | Priority: | unspecified
Version: | 4.5 | CC: | amurdaca, kgarriso, skumari, yanyang
Target Release: | 4.5.0 | Target Milestone: | ---
Hardware: | Unspecified | OS: | Unspecified
Last Closed: | 2020-07-13 17:26:04 UTC | Type: | Bug
Bug Blocks: | 1771572 | |
Description
Marc Sluiter
2020-04-07 18:55:22 UTC
Another link to the broken jobs: https://prow.svc.ci.openshift.org/job-history/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.5

Might be related to https://github.com/openshift/machine-config-operator/pull/996, which went in over the weekend.

Confirmed this in the most recent run: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.5/398/artifacts/e2e-gcp/installer/

Download https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.5/398/artifacts/e2e-gcp/installer/log-bundle-20200407011342.tar and look at kubelet.log in /log-bundle-20200407011342/bootstrap/journals/:

```
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: I0407 00:29:22.730520 1979 kubelet_getters.go:173] status for pod bootstrap-machine-config-operator-ci-op-8wldh-b.c.openshift-gce-devel-ci.internal updated to {Pending [{Initialized False 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC ContainersNotInitialized containers with incomplete status: [machine-config-controller]} {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC ContainersNotReady containers with unready status: [machine-config-server]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC ContainersNotReady containers with unready status: [machine-config-server]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC }] [] 2020-04-07 00:29:08 +0000 UTC [{machine-config-controller {nil nil &ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:I0407 00:29:12.882901 1 bootstrap.go:40] Version: v4.5.0-202004062301-dirty (71a2fb3a9765351668b7c56cc2e1cfe88768e25b)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: F0407 00:29:14.322597 1 bootstrap.go:47] error running MCC[BOOTSTRAP]: parsing Ignition config failed with error: not a config (empty)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: Report:
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: ,StartedAt:2020-04-07 00:29:12 +0000 UTC,FinishedAt:2020-04-07 00:29:14 +0000 UTC,ContainerID:cri-o://f3c722599e9f6ef39f28c9bb4f0f3fc37412725514bef5842cd326d75de482a2,}} {nil nil &ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:I0407 00:29:09.712355 1 bootstrap.go:40] Version: v4.5.0-202004062301-dirty (71a2fb3a9765351668b7c56cc2e1cfe88768e25b)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: F0407 00:29:11.149021 1 bootstrap.go:47] error running MCC[BOOTSTRAP]: parsing Ignition config failed with error: not a config (empty)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: Report:
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: ,StartedAt:2020-04-07 00:29:09 +0000 UTC,FinishedAt:2020-04-07 00:29:11 +0000
```

Reassigned to Jerry as he may already be working on it.

QE: 2 example MCs for testing:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime
```

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kargs-loglevel
spec:
  kernelArguments:
  - 'loglevel=7'
```

Created clusters with these two MCs with no errors:
```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime
```
```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kargs-loglevel
spec:
  kernelArguments:
  - 'loglevel=7'
```
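The failing step in the log above is the bootstrap MCC rejecting an empty rendered Ignition config ("not a config (empty)"). A minimal Python sketch of that failure mode, using a hypothetical `parse_ignition` helper (the real MCO is written in Go and delegates to the Ignition library; this only illustrates the check):

```python
import json

def parse_ignition(raw: bytes) -> dict:
    # Hypothetical helper, not the actual MCO code: an empty rendered
    # config is rejected before any JSON parsing happens, which is the
    # error surfaced by the bootstrap controller above.
    if not raw.strip():
        raise ValueError("not a config (empty)")
    cfg = json.loads(raw)
    if "version" not in cfg.get("ignition", {}):
        raise ValueError("not a config: missing ignition.version")
    return cfg

# An empty payload fails immediately, mirroring the MCC bootstrap error.
try:
    parse_ignition(b"")
except ValueError as err:
    print(err)

# A minimal well-formed Ignition config parses fine.
print(parse_ignition(b'{"ignition": {"version": "2.2.0"}}'))
```

The point is that the bug is upstream of parsing: the day-1 MC rendering produced no config at all, so even valid user MCs hit this error.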
$ ./openshift-install create manifests --dir=testcluster --log-level debug
$ cd testcluster/openshift/
$ cat << EOF > rt-kernel.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: realtime-worker
> spec:
>   kernelType: realtime
> EOF
$ cd ../..
$ ./openshift-install create cluster --dir=testcluster --log-level debug
..install..
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-116.us-west-2.compute.internal   Ready    master   33m   v1.17.1
ip-10-0-141-102.us-west-2.compute.internal   Ready    worker   17m   v1.17.1
ip-10-0-144-215.us-west-2.compute.internal   Ready    worker   17m   v1.17.1
ip-10-0-158-216.us-west-2.compute.internal   Ready    master   34m   v1.17.1
ip-10-0-163-69.us-west-2.compute.internal    Ready    worker   16m   v1.17.1
ip-10-0-172-1.us-west-2.compute.internal     Ready    master   34m   v1.17.1
$ oc debug node/ip-10-0-141-102.us-west-2.compute.internal
Starting pod/ip-10-0-141-102us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
chroot /host
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm -q | grep kernel
rpm: no arguments given for query
sh-4.4# rpm -qa | grep kernel
kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-headers-4.18.0-147.8.1.el8_1.x86_64
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-devel-4.18.0-147.8.1.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
sh-4.4# uname -a
Linux ip-10-0-141-102 4.18.0-147.8.1.rt24.101.el8_1.x86_64 #1 SMP PREEMPT RT Wed Feb 26 16:43:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-13-122638   True        False         10m     Cluster version is 4.5.0-0.nightly-2020-04-13-122638
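A realtime node can also be spot-checked from the `uname -r` string alone. This is a heuristic sketch, not an official check: it assumes, as in the debug-pod output above, that RHEL realtime kernel releases carry a dot-separated `rtN` component (e.g. `4.18.0-147.8.1.rt24.101.el8_1.x86_64`):

```python
def is_realtime_kernel(kernel_release: str) -> bool:
    # Heuristic (an assumption, not an official API): RHEL realtime kernel
    # releases contain a dot-separated "rt<N>" component, e.g.
    # "4.18.0-147.8.1.rt24.101.el8_1.x86_64".
    return any(part.startswith("rt") and part[2:].isdigit()
               for part in kernel_release.split("."))

# Matches the realtime worker from the transcript above.
print(is_realtime_kernel("4.18.0-147.8.1.rt24.101.el8_1.x86_64"))  # True
# A standard (non-rt) kernel release does not match.
print(is_realtime_kernel("4.18.0-147.8.1.el8_1.x86_64"))           # False
```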
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409