Bug 1821888 - Day 1 deployment of realtime kernel fails on OCP 4.5 recently
Summary: Day 1 deployment of realtime kernel fails on OCP 4.5 recently
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Yu Qi Zhang
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1771572
 
Reported: 2020-04-07 18:55 UTC by Marc Sluiter
Modified: 2020-07-13 17:26 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:26:04 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1631 0 None closed Bug 1821888: controller: do not error on empty Ignition configs 2020-11-21 15:39:37 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:26:31 UTC

Description Marc Sluiter 2020-04-07 18:55:22 UTC
Description of problem:
Bootstrapping an OCP 4.5 cluster with an additional MachineConfig that deploys the realtime kernel on workers has recently started failing.

Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.5.0-0.nightly-2020-04-07-141621
Kubernetes Version: v1.18.0-rc.1

How reproducible:
always

Steps to Reproduce:
1. ./openshift-install create manifests
2. add this MachineConfig to manifests:

$ cat realtime-worker-machine-config.yml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime

3. ./openshift-install create cluster

Actual results:
Cluster bootstrap fails.

bootkube complains that the MachineConfig CRD is not available.

Logs of the machine-config-controller:
[root@slinte-d2q72-b core]# crictl logs 3e008718f135c
I0407 17:37:24.117091       1 bootstrap.go:40] Version: v4.5.0-202004062301-dirty (71a2fb3a9765351668b7c56cc2e1cfe88768e25b)
F0407 17:37:25.619601       1 bootstrap.go:47] error running MCC[BOOTSTRAP]: parsing Ignition config failed with error: not a config (empty)
Report:

Expected results:
Cluster creation succeeds.

Additional info:

Comment 2 Kirsten Garrison 2020-04-08 00:55:26 UTC
Another link to the broken jobs: https://prow.svc.ci.openshift.org/job-history/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.5

Might be related to https://github.com/openshift/machine-config-operator/pull/996, which went in over the weekend.

Comment 3 Kirsten Garrison 2020-04-08 01:04:11 UTC
Confirmed this in most recent run: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.5/398/artifacts/e2e-gcp/installer/

Download https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.5/398/artifacts/e2e-gcp/installer/log-bundle-20200407011342.tar

Look in kubelet.log in /log-bundle-20200407011342/bootstrap/journals/

Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: I0407 00:29:22.730520    1979 kubelet_getters.go:173] status for pod bootstrap-machine-config-operator-ci-op-8wldh-b.c.openshift-gce-devel-ci.internal updated to {Pending [{Initialized False 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC ContainersNotInitialized containers with incomplete status: [machine-config-controller]} {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC ContainersNotReady containers with unready status: [machine-config-server]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC ContainersNotReady containers with unready status: [machine-config-server]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-04-07 00:29:08 +0000 UTC  }]      [] 2020-04-07 00:29:08 +0000 UTC [{machine-config-controller {nil nil &ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:I0407 00:29:12.882901       1 bootstrap.go:40] Version: v4.5.0-202004062301-dirty (71a2fb3a9765351668b7c56cc2e1cfe88768e25b)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: F0407 00:29:14.322597       1 bootstrap.go:47] error running MCC[BOOTSTRAP]: parsing Ignition config failed with error: not a config (empty)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: Report:
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: ,StartedAt:2020-04-07 00:29:12 +0000 UTC,FinishedAt:2020-04-07 00:29:14 +0000 UTC,ContainerID:cri-o://f3c722599e9f6ef39f28c9bb4f0f3fc37412725514bef5842cd326d75de482a2,}} {nil nil &ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Error,Message:I0407 00:29:09.712355       1 bootstrap.go:40] Version: v4.5.0-202004062301-dirty (71a2fb3a9765351668b7c56cc2e1cfe88768e25b)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: F0407 00:29:11.149021       1 bootstrap.go:47] error running MCC[BOOTSTRAP]: parsing Ignition config failed with error: not a config (empty)
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: Report:
Apr 07 00:29:22 ci-op-8wldh-b.c.openshift-gce-devel-ci.internal hyperkube[1979]: ,StartedAt:2020-04-07 00:29:09 +0000 UTC,FinishedAt:2020-04-07 00:29:11 +0000

Comment 4 Sinny Kumari 2020-04-08 09:12:28 UTC
Reassigned to Jerry, as he may already be working on it.

Comment 5 Kirsten Garrison 2020-04-08 20:08:48 UTC
Two example MCs for QE testing:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime
```

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kargs-loglevel
spec:
  kernelArguments:
    - 'loglevel=7'
```

Comment 8 Michael Nguyen 2020-04-13 17:01:39 UTC
Created clusters with these two MCs with no errors:


```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime
```

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: 99-worker-kargs-loglevel
spec:
  kernelArguments:
    - 'loglevel=7'
```


$ ./openshift-install create manifests --dir=testcluster --log-level debug
$ cd openshift/
$ cat << EOF > rt-kernel.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: realtime-worker
> spec:
>   kernelType: realtime
> EOF
$ cd ../..
$ ./openshift-install create cluster --dir=testcluster --log-level debug

..install..

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-116.us-west-2.compute.internal   Ready    master   33m   v1.17.1
ip-10-0-141-102.us-west-2.compute.internal   Ready    worker   17m   v1.17.1
ip-10-0-144-215.us-west-2.compute.internal   Ready    worker   17m   v1.17.1
ip-10-0-158-216.us-west-2.compute.internal   Ready    master   34m   v1.17.1
ip-10-0-163-69.us-west-2.compute.internal    Ready    worker   16m   v1.17.1
ip-10-0-172-1.us-west-2.compute.internal     Ready    master   34m   v1.17.1
$ oc debug node/ip-10-0-141-102.us-west-2.compute.internal
Starting pod/ip-10-0-141-102us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
chroot /host
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm -q | grep kernel
rpm: no arguments given for query
sh-4.4# rpm -qa | grep kernel
kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-headers-4.18.0-147.8.1.el8_1.x86_64
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-devel-4.18.0-147.8.1.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
sh-4.4# uname -a
Linux ip-10-0-141-102 4.18.0-147.8.1.rt24.101.el8_1.x86_64 #1 SMP PREEMPT RT Wed Feb 26 16:43:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# exit
exit
sh-4.2# exit   
exit

Removing debug pod ...
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-13-122638   True        False         10m     Cluster version is 4.5.0-0.nightly-2020-04-13-122638

Comment 9 errata-xmlrpc 2020-07-13 17:26:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

