Description of problem:
updating a cluster from 4.4 to 4.5, a worker RT node gets a new kernel where the kvm module is missing, although it is present in a fresh 4.5 install

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. deploy a 4.4 cluster
2. mark a node as realtime
3. upgrade the cluster to 4.5
4. check whether the kvm module is present on the upgraded node

Actual results:
kvm module is not there

Expected results:
kvm module should be there

Additional info:
it can be argued that kvm was not present on the 4.4 RT worker, but since it is present when deploying 4.5 from the beginning, an upgrade to 4.5 should bring the node to the same state as a fresh install
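A minimal sketch of the check in step 4. On a live cluster the package list would come from `oc debug node/<node> -- chroot /host rpm -qa`; here the same grep-based presence test runs against a captured 4.4 RT worker package list, so it can be tried without a cluster:

```shell
# Decide from a node's `rpm -qa` output whether the realtime KVM package
# is installed. The package list below is a sample from a 4.4 RT worker.
pkgs='kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64'

if printf '%s\n' "$pkgs" | grep -q '^kernel-rt-kvm-'; then
    echo "kernel-rt-kvm: present"
else
    echo "kernel-rt-kvm: missing"
fi
```

For this sample list the check reports the package as missing, matching the actual results above.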
Uhm, I'm moving this to RHCOS to better assess why it's missing and what we can do - I'll move it back to MCO if we need to install that ourselves (but it would be very weird, I think)
Note: the realtime kvm module was added recently and is available in OCP 4.5 only. When installing OCP 4.5 directly it works as expected, but somehow that module is missed during the upgrade from 4.4 to 4.5.
The way the RT kernel is implemented currently, we only upgrade RT-kernel-related packages which are already installed on the host. Since the kernel-rt-kvm package was not shipped in 4.4, it was never installed on 4.4-based RHCOS nodes, and hence it is not getting updated/installed.

One hacky way to deal with a 4.4-based cluster is to install kernel-rt-kvm manually on the 4.4-based nodes by pulling the latest 4.5 machine-os-content image the RHCOS node is using and running `rpm-ostree install <local kernel-rt-kvm package location>`. This is a one-time task; during the next upgrade the MCO should handle it, since kernel-rt-kvm has already been installed.
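A rough sketch of that one-time workaround. The pullspec and RPM path below are hypothetical, and the pull/extract steps are shown only as comments because they need cluster and node access; `layer_rt_kvm` just echoes the final command (a dry run), so drop the `echo` to actually run it on the node:

```shell
# Sketch of the one-time manual workaround on a 4.4-based RT node.
# Pull/extract steps (need a live cluster/node, shown as comments only):
#   MOC=$(oc adm release info "$RELEASE" --image-for=machine-os-content)
#   podman pull "$MOC"
#   MNT=$(podman image mount "$MOC")
#   RPM=$(find "$MNT" -name 'kernel-rt-kvm-*.rpm' | head -n1)

layer_rt_kvm() {
    # rpm-ostree layers the package on top of the base ostree commit, so on
    # the next upgrade the MCO sees it as "already installed" and updates it.
    echo rpm-ostree install "$1"   # dry run: prints the command instead of running it
}

layer_rt_kvm "/var/tmp/kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64.rpm"
```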
I think we should change the MCO to try installing kernel-rt-kvm if it exists.
> I think we should change the MCO to try installing kernel-rt-kvm if it exists.

To elaborate, when doing an upgrade, if we have `kernel-rt`, also install `kernel-rt-kvm` if we don't already have it.
(In reply to Colin Walters from comment #5)
> > I think we should change the MCO to try installing kernel-rt-kvm if it exists.
>
> To elaborate, when doing an upgrade, if we have `kernel-rt`, also install
> `kernel-rt-kvm` if we don't already have it.

I think this is what we may want to do, +1

(In reply to Sinny Kumari from comment #3)
> The way RT kernel is implemented currently, we only upgrade RT kernel
> related packages which are already installed on the host. Since
> kernel-rt-kvm package was not shipped in 4.4, it was never installed on 4.4
> based RHCOS node and hence they are not getting updated/installed.
>
> One hacky way to deal with 4.4 based cluster is to install kernel-rt-kvm
> manually on 4.4 based nodes by pulling latest 4.5 machine-os-content image
> RHCOS node is using and running `rpm-ostree install <local kernel-rt-kvm
> package location>`. This is a one time task, during next upgrade MCO should
> handle it since kernel-rt-kvm is already been installed.

If I got Colin's suggestion right, we'd just need to try to install that package once we boot into the 4.5 RHCOS: when a sync in the MCD happens, we would look for that package and, if it's missing, install it.
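The proposed MCD-side behavior can be sketched as a small decision function. This is an illustration only, not actual MCO code; it takes `rpm -qa`-style output as an argument and reports whether kernel-rt-kvm should be queued for installation:

```shell
# Sketch of the proposed check: on upgrade, if the RT kernel is installed
# but kernel-rt-kvm is not, install it; otherwise do nothing.
ensure_rt_kvm() {
    installed=$1   # package list, one name per line (as from `rpm -qa`)
    if printf '%s\n' "$installed" | grep -q '^kernel-rt-core-' &&
       ! printf '%s\n' "$installed" | grep -q '^kernel-rt-kvm-'; then
        echo "install kernel-rt-kvm"
    else
        echo "nothing to do"
    fi
}

# A 4.4-based RT worker: kernel-rt present, kernel-rt-kvm missing.
ensure_rt_kvm 'kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64'
```

Because the check only fires when `kernel-rt-core` is already layered, non-RT nodes are untouched, which keeps the extra condition narrow (Sinny's concern below about a growing list of conditions still applies).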
(In reply to Colin Walters from comment #4)
> I think we should change the MCO to try installing kernel-rt-kvm if it
> exists.

Yes, we can. I am a bit concerned that every additional condition increases the chance of error if the list grows with time ...
(In reply to Sinny Kumari from comment #7)
> (In reply to Colin Walters from comment #4)
> > I think we should change the MCO to try installing kernel-rt-kvm if it
> > exists.
>
> yes, we can. I am a bit concerned that every additional conditions increases
> chance of error if the list grows with time ...

s/error/bugs/
@Karim The fix should be available in the latest 4.5 nightly. Let us know if a cluster upgrade with the RT kernel works as expected for you. Thanks.
verified on 4.5.0-0.nightly-2020-04-30-112808

$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-134-199.us-west-2.compute.internal   Ready    worker   103m   v1.17.1
ip-10-0-143-138.us-west-2.compute.internal   Ready    master   116m   v1.17.1
ip-10-0-144-170.us-west-2.compute.internal   Ready    worker   104m   v1.17.1
ip-10-0-156-222.us-west-2.compute.internal   Ready    master   117m   v1.17.1
ip-10-0-163-220.us-west-2.compute.internal   Ready    worker   104m   v1.17.1
ip-10-0-165-195.us-west-2.compute.internal   Ready    master   116m   v1.17.1

$ oc debug node/ip-10-0-134-199.us-west-2.compute.internal
Starting pod/ip-10-0-134-199us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# uname -a
Linux ip-10-0-134-199 4.18.0-147.8.1.rt24.101.el8_1.x86_64 #1 SMP PREEMPT RT Wed Feb 26 16:43:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# rpm -qa | grep kernel
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...

$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-04-30-112808 --force
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-04-30-112808

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-04-30-051505   True        True          45m     Working towards 4.5.0-0.nightly-2020-04-30-112808: 84% complete

$ watch oc get clusterversion
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-30-112808   True        False         5m42s   Cluster version is 4.5.0-0.nightly-2020-04-30-112808

$ oc debug node/ip-10-0-134-199.us-west-2.compute.internal -- chroot /host uname -a
Starting pod/ip-10-0-134-199us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Linux ip-10-0-134-199 4.18.0-147.8.1.rt24.101.el8_1.x86_64 #1 SMP PREEMPT RT Wed Feb 26 16:43:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Removing debug pod ...

$ oc debug node/ip-10-0-134-199.us-west-2.compute.internal -- chroot /host rpm -qa | grep kernel
Starting pod/ip-10-0-134-199us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
kernel-rt-kvm-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-headers-4.18.0-147.8.1.el8_1.x86_64
kernel-rt-core-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-rt-modules-extra-4.18.0-147.8.1.rt24.101.el8_1.x86_64
kernel-devel-4.18.0-147.8.1.el8_1.x86_64
kernel-rt-modules-4.18.0-147.8.1.rt24.101.el8_1.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409