Bug 1822558 - realtime kernel can't be installed because of a missing dependency
Summary: realtime kernel can't be installed because of a missing dependency
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Sinny Kumari
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1771572
 
Reported: 2020-04-09 10:36 UTC by Marc Sluiter
Modified: 2020-07-13 17:27 UTC
CC: 1 user

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:26:42 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:27:10 UTC

Description Marc Sluiter 2020-04-09 10:36:47 UTC
Description of problem:
The realtime kernel can't be installed because of a missing dependency (see the journal excerpt from a worker below):

Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal machine-config-daemon[1676]: I0409 10:02:17.181955    1676 update.go:1241] Switching to kernelType=realtime, invoking rpm-ostree ["override" "remove" "kernel" "kernel-core" "kern>
Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal root[1779]: machine-config-daemon[1676]: Switching to kernelType=realtime, invoking rpm-ostree ["override" "remove" "kernel" "kernel-core" "kernel-modules" "kernel-modules-extra">
Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal root[1779]: rged/kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm"]
Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal rpm-ostree[1688]: client(id:cli dbus:1.13 unit:machine-config-daemon-firstboot.service uid:0) added; new total=1
Apr 09 10:02:17 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal rpm-ostree[1688]: Initiated txn UpdateDeployment for client(id:cli dbus:1.13 unit:machine-config-daemon-firstboot.service uid:0): /org/projectatomic/rpmostree1/rhcos
Apr 09 10:02:40 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal rpm-ostree[1688]: Preparing pkg txn; enabled repos: [] solvables: 0
Apr 09 10:02:40 slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal rpm-ostree[1688]: Txn UpdateDeployment on /org/projectatomic/rpmostree1/rhcos failed: Could not depsolve transaction; 1 problem detected:
                                                                                         Problem: conflicting requests
                                                                                          - nothing provides perl-interpreter needed by kernel-rt-devel-4.18.0-147.8.1.rt24.101.el8_1.x86_64


Version-Release number of selected component (if applicable):
OCP 4.5.0-0.nightly-2020-04-09-073609

How reproducible:
Tried it only once so far.

Steps to Reproduce:
1. Deploy this MachineConfig:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: realtime-worker
spec:
  kernelType: realtime

2. Check the worker nodes' kernel versions:

$ oc get node -l node-role.kubernetes.io/worker -o custom-columns=NAME:.metadata.name,KERNEL-VERSION:.status.nodeInfo.kernelVersion

Actual results:
NAME                                                    KERNEL-VERSION
slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal   4.18.0-147.8.1.el8_1.x86_64
slinte-hfvs5-w-c-qlz8v.c.openshift-gce-devel.internal   4.18.0-147.8.1.el8_1.x86_64
slinte-hfvs5-w-d-fk7mm.c.openshift-gce-devel.internal   4.18.0-147.8.1.el8_1.x86_64

Expected results:
The worker nodes report the realtime (rt) kernel version.

Additional info:
Be sure to use at least 4.5.0-0.nightly-2020-04-09-073609; otherwise you will run into https://bugzilla.redhat.com/show_bug.cgi?id=1821888

It's also interesting that the error isn't noticed by the MCO; the MachineConfigPools report that everything is fine:

$ k get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-c2f9dfa257d210cad7485d1c49881950   True      False      False      3              3                   3                     0                      47m
worker   rendered-worker-03da69decee65c0222ae355bbcc45326   True      False      False      3              3                   3                     0                      47m
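
For reference, one way to surface the failure from a node is to grep its journal (a sketch; the node name is just the example from above, and the grep pattern matches the depsolve error in the journal excerpt):

$ oc debug node/slinte-hfvs5-w-b-zqv4k.c.openshift-gce-devel.internal -- chroot /host journalctl -u machine-config-daemon-firstboot.service | grep -i depsolve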

Comment 1 Martin Sivák 2020-04-09 10:57:44 UTC
Personally, I think the devel packages should not have been included in the first place. The missing dependency is just what revealed it.

Comment 2 Marc Sluiter 2020-04-09 11:13:57 UTC
Note: not sure if it's important, but I deployed said MachineConfig as a day-1 operation (running "openshift-install create manifests" and putting the MC into the manifests dir).
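
For clarity, the day-1 flow was roughly the following (a sketch; the install directory and manifest filename are arbitrary):

$ openshift-install create manifests --dir=install-dir
$ cp realtime-worker.yaml install-dir/manifests/99-worker-realtime-kernel.yaml
$ openshift-install create cluster --dir=install-dir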

Comment 3 Sinny Kumari 2020-04-09 11:27:38 UTC
Ah, that could be the reason. We may need to update the bootimage so that the machine-config-daemon binary on the host includes the changes from https://github.com/openshift/machine-config-operator/pull/1612.
So far, judging from the MCO e2e-gcp-op test, the day-2 RT kernel switch seems fine: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1632/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1755/artifacts/e2e-gcp-op/pods/openshift-machine-config-operator_machine-config-daemon-mz2jx_machine-config-daemon.log

To be extra sure, I will also do an RT kernel switch as a day-2 operation with nightly 4.5.0-0.nightly-2020-04-09-082127.
Can you let us know if an RT kernel switch as a day-2 operation works for your use case?
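
For reference, by a day-2 switch I mean applying the same MachineConfig to an already running cluster and waiting for the worker pool to roll out; roughly (a sketch, the filename is arbitrary):

$ oc apply -f realtime-worker.yaml
$ oc get mcp worker -w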

Comment 4 Sinny Kumari 2020-04-09 12:37:11 UTC
The day-2 RT kernel switch worked fine for me on a cluster installed from 4.5.0-0.nightly-2020-04-09-082127.

machine-config-daemon pod log from one of the nodes:
I0409 12:27:13.476542    4087 rpm-ostree.go:366] Running captured: podman mount 79bb1b269fd08379f6bc9ac09f2e8658c64635ce67775c71a52b3e7aaed9cf90
I0409 12:27:13.593499    4087 update.go:1364] Switching to kernelType=realtime, invoking rpm-ostree ["override" "remove" "kernel" "kernel-core" "kernel-modules" "kernel-modules-extra" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-devel-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm" "--install" "/var/lib/containers/storage/overlay/171c5505c7ab73d83602a769a7f64237e0b2d245aa7afa697627c3fd0b8aa607/merged/kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm"]
I0409 12:29:33.848483    4087 update.go:1364] Deleted container and removed OSContainer image
I0409 12:29:33.854996    4087 update.go:1364] initiating reboot: Node will reboot into config rendered-worker-fe4dd3f9e693875c16020459e2401324

Most likely we will need to update the installer bootimage.

Comment 6 Colin Walters 2020-04-09 12:42:37 UTC
(Which reverts part of https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/877 which is what caused the problem)

Comment 7 Sinny Kumari 2020-04-09 13:09:51 UTC
I wonder why I don't see this issue with the day-2 RT kernel switch.
This issue makes me think about how much we need support for shipping multiple OSTree refs. It may break again if a shipped RT kernel sub-package adds any dependencies that are not shipped in the base RHCOS.

Comment 8 Sinny Kumari 2020-04-13 09:33:14 UTC
Checked with nightly 4.5.0-0.nightly-2020-04-13-014528; machine-os-content no longer includes the kernel-rt-devel package.
@Marc, can you please confirm that cluster installation with the RT kernel works as expected for you with recent nightlies?

Comment 9 Marc Sluiter 2020-04-14 10:50:30 UTC
I can confirm that the RT kernel is installed using the day-1 method on 4.5.0-0.nightly-2020-04-14-055714.

Comment 10 Sinny Kumari 2020-04-14 12:03:35 UTC
Thanks Marc for confirming!


QE: This bug can be verified by successfully installing a cluster from the latest 4.5 nightly with kernel-rt set as a day-1 configuration. Documentation is at https://github.com/openshift/installer/blob/master/docs/user/customization.md#Switching-RHCOS-host-kernel-using-KernelType

On the installed cluster, get access to one of the worker nodes (assuming the day-1 kernel-rt switch was applied to the worker pool) and run `rpm-ostree status`. The latest deployment shouldn't have the kernel-rt-devel package layered.
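
A minimal spot check along those lines (a sketch; it simply greps the rpm-ostree status output of each worker for a layered kernel-rt-devel package, so no output is the expected result):

$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- chroot /host rpm-ostree status | grep kernel-rt-devel; done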

Comment 11 Michael Nguyen 2020-04-14 14:19:11 UTC
Verified that the RT kernel can be installed day-1 on 4.5.0-0.nightly-2020-04-13-122638.


$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-13-122638   True        False         3m33s   Cluster version is 4.5.0-0.nightly-2020-04-13-122638
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-140-248.us-west-2.compute.internal   Ready    master   28m   v1.17.1
ip-10-0-142-181.us-west-2.compute.internal   Ready    worker   11m   v1.17.1
ip-10-0-153-244.us-west-2.compute.internal   Ready    worker   12m   v1.17.1
ip-10-0-154-21.us-west-2.compute.internal    Ready    master   29m   v1.17.1
ip-10-0-166-176.us-west-2.compute.internal   Ready    worker   12m   v1.17.1
ip-10-0-170-140.us-west-2.compute.internal   Ready    master   28m   v1.17.1
$ oc get node -l node-role.kubernetes.io/worker -o custom-columns=NAME:.metadata.name,KERNEL-VERSION:.status.nodeInfo.kernelVersion
NAME                                         KERNEL-VERSION
ip-10-0-142-181.us-west-2.compute.internal   4.18.0-147.8.1.rt24.101.el8_1.x86_64
ip-10-0-153-244.us-west-2.compute.internal   4.18.0-147.8.1.rt24.101.el8_1.x86_64
ip-10-0-166-176.us-west-2.compute.internal   4.18.0-147.8.1.rt24.101.el8_1.x86_64
$ for i in $(oc get node -l node-role.kubernetes.io/worker -o name); do oc debug $i -- chroot /host rpm-ostree status; done
Starting pod/ip-10-0-142-181us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2d924444fb86f6974dd65bcc2c67afb298937529bde40ef3e40d4fd31c77c16c
              CustomOrigin: Managed by machine-config-operator
                   Version: 45.81.202004130827-0 (2020-04-13T08:33:17Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.8.1.el8_1
             LocalPackages: kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64

  ostree://4e3f5cd998893cc6648888e77515c91e3de87732b220a84b5a6fa12c5d87f8ce
                   Version: 44.81.202003062006-0 (2020-03-06T20:11:30Z)

Removing debug pod ...
Starting pod/ip-10-0-153-244us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2d924444fb86f6974dd65bcc2c67afb298937529bde40ef3e40d4fd31c77c16c
              CustomOrigin: Managed by machine-config-operator
                   Version: 45.81.202004130827-0 (2020-04-13T08:33:17Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.8.1.el8_1
             LocalPackages: kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64

  ostree://4e3f5cd998893cc6648888e77515c91e3de87732b220a84b5a6fa12c5d87f8ce
                   Version: 44.81.202003062006-0 (2020-03-06T20:11:30Z)

Removing debug pod ...
Starting pod/ip-10-0-166-176us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2d924444fb86f6974dd65bcc2c67afb298937529bde40ef3e40d4fd31c77c16c
              CustomOrigin: Managed by machine-config-operator
                   Version: 45.81.202004130827-0 (2020-04-13T08:33:17Z)
       RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-147.8.1.el8_1
             LocalPackages: kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
                            kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64

  ostree://4e3f5cd998893cc6648888e77515c91e3de87732b220a84b5a6fa12c5d87f8ce
                   Version: 44.81.202003062006-0 (2020-03-06T20:11:30Z)

Removing debug pod ...
$

Comment 14 errata-xmlrpc 2020-07-13 17:26:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

