Bug 1888853

Summary: MCO extension kernel-headers is invalid
Product: OpenShift Container Platform
Reporter: Evan Dunn <edunn>
Component: Machine Config Operator
Assignee: Sinny Kumari <skumari>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: high
Priority: unspecified
Version: 4.6
CC: brueckner, danili, dustymabe, jomiller, skumari, walters
Target Release: 4.6.z
Hardware: All
OS: Linux
Last Closed: 2020-11-30 16:45:28 UTC
Type: Bug
Bug Depends On: 1890074

Description Evan Dunn 2020-10-15 23:46:09 UTC
Description of problem:
The kernel-headers package is now absent from RHCOS, but MCO extensions cannot be used to install it.

Version-Release number of selected component (if applicable):
4.6.0-rc.3

Steps to Reproduce:
1. oc debug node/... 
2. chroot /host
3. rpm -qa | grep kernel
4. Apply the following MachineConfig:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 03-worker-extensions
spec:
  config:
    ignition:
      version: 3.1.0
  extensions:
    - kernel-headers
    - kernel-devel
5. The MCO rejects it with: invalid extensions found: kernel-headers

EDIT: Note that listing only kernel-devel under extensions does not install the kernel-headers package either.
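The error in step 5 suggests that the MCO checks every name under `extensions` against a fixed set of supported extension names and rejects the rest. The following is a minimal shell sketch of such a check, not the MCO's actual code. It assumes (not stated in this report) that the supported set in 4.6 is `usbguard` and `kernel-devel`; the function name `validate_extensions` is hypothetical, and the error text simply mirrors the message quoted above.

```shell
#!/bin/sh
# Hypothetical sketch: the supported set below is an ASSUMPTION for OCP 4.6,
# not taken from the MCO source.
SUPPORTED_EXTENSIONS="usbguard kernel-devel"

# validate_extensions NAME...: print an error listing every unsupported
# extension and return 1; return 0 if all names are supported.
validate_extensions() {
    invalid=""
    for ext in "$@"; do
        found=0
        for ok in $SUPPORTED_EXTENSIONS; do
            if [ "$ext" = "$ok" ]; then
                found=1
                break
            fi
        done
        # Collect unsupported names, space-separated.
        if [ "$found" -eq 0 ]; then
            invalid="${invalid:+$invalid }$ext"
        fi
    done
    if [ -n "$invalid" ]; then
        echo "invalid extensions found: $invalid" >&2
        return 1
    fi
    return 0
}
```

With this sketch, `validate_extensions kernel-headers kernel-devel` fails and names only kernel-headers, which matches the behavior reported in step 5: kernel-headers is a package, not a supported extension name.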

Actual results:

RHCOS no longer ships kernel-headers and kernel-devel, and MCO extensions cannot install both of them:

rpm -qa | grep kernel

kernel-core-4.18.0-193.24.1.el8_2.dt1.x86_64
kernel-modules-4.18.0-193.24.1.el8_2.dt1.x86_64
kernel-modules-extra-4.18.0-193.24.1.el8_2.dt1.x86_64
kernel-4.18.0-193.24.1.el8_2.dt1.x86_64

Expected results:

Both kernel-headers and kernel-devel are available on RHCOS workers, for example (ppc64le node):

kernel-core-4.18.0-193.14.3.el8_2.ppc64le
kernel-modules-extra-4.18.0-193.14.3.el8_2.ppc64le
kernel-modules-4.18.0-193.14.3.el8_2.ppc64le
kernel-4.18.0-193.14.3.el8_2.ppc64le
kernel-devel-4.18.0-193.14.3.el8_2.ppc64le
kernel-headers-4.18.0-193.14.3.el8_2.ppc64le

Comment 1 Sinny Kumari 2020-10-21 09:36:34 UTC
Thank you for reporting this issue. We will work on a fix to include kernel-headers as part of the kernel-devel extension.

Comment 2 Hendrik Brueckner 2020-10-21 09:47:53 UTC
Sinny, could you share some insights?

This problem impacts the Spectrum Scale beta program.

Dan, could you add this bug to the agenda for Thursday's multi-arch call?

Comment 4 Sinny Kumari 2020-10-21 10:21:17 UTC
(In reply to Hendrik Brueckner from comment #2)
> Sinny, could you share some insights?

Currently, the MCO implements the kernel-devel extension by installing only the kernel-devel package (https://github.com/openshift/machine-config-operator/blob/master/pkg/daemon/update.go#L912) plus any missing dependencies. The kernel-headers package does not appear to be an install dependency of kernel-devel, so it does not get installed.

To fix the issue, we need to patch the MCO to also install the kernel-headers package when kernel-devel is specified as an extension in a MachineConfig.
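The fix described above amounts to expanding one extension name into several packages. Below is a minimal shell sketch of that expansion, assuming a hypothetical mapping in which kernel-devel expands to kernel-devel plus kernel-headers and usbguard maps to itself. The real logic lives in the MCO's Go code and is not reproduced here; the function names are hypothetical, and the step that hands the package list to rpm-ostree is omitted.

```shell
#!/bin/sh
# Hypothetical sketch of extension-to-package expansion. The mapping is an
# ASSUMPTION modeled on the fix: kernel-headers is listed explicitly because
# RPM does not pull it in as a dependency of kernel-devel.

# packages_for_extension EXT: print the packages to install for one extension.
packages_for_extension() {
    case "$1" in
        kernel-devel) echo "kernel-devel kernel-headers" ;;
        usbguard)     echo "usbguard" ;;
        *)            return 1 ;;  # unsupported extension name
    esac
}

# packages_for_extensions EXT...: print the combined package list, one per line.
packages_for_extensions() {
    for ext in "$@"; do
        pkgs=$(packages_for_extension "$ext") || return 1
        for pkg in $pkgs; do
            echo "$pkg"
        done
    done
}
```

For example, `packages_for_extensions kernel-devel` prints kernel-devel and kernel-headers on separate lines. Keeping a per-extension package table lets the user-facing extension name stay stable while the set of packages behind it changes, which is why the fix does not require users to list kernel-headers themselves.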

> This problem impacts the Spectrum Scale beta program.

We understand the urgency; this bug is on our top-priority list.

Comment 6 Hendrik Brueckner 2020-10-21 12:30:38 UTC
Sinny, thanks for the update.

Comment 12 Sinny Kumari 2020-10-22 14:25:17 UTC
A patch has been submitted to the upstream master branch: https://github.com/openshift/machine-config-operator/pull/2170. Once the pull request is merged and verified, we will backport it to 4.6.

Comment 13 Dan Li 2020-10-26 12:10:04 UTC
Hi @Hendrik, please see Sinny's Comment 12. Do we still need to add this bug to Thursday's meeting discussion? If so, I will add it to the list.

Comment 14 Hendrik Brueckner 2020-10-26 13:41:45 UTC
@danili With the PR already merged, there has been great progress here. The Spectrum Scale team is also in contact with Red Hat on this topic.

Comment 15 Sinny Kumari 2020-10-29 13:24:30 UTC
The backported fix is in PR https://github.com/openshift/machine-config-operator/pull/2187.

Comment 16 Dan Li 2020-11-06 18:18:36 UTC
Added this bug to the multi-arch (MA) meeting agenda. I'm removing the "needinfo" flag.

Comment 19 Michael Nguyen 2020-11-16 18:50:15 UTC
Verified on 4.6.0-0.nightly-2020-11-15-104235. The kernel-devel extension installed successfully, and kernel-headers was installed along with it.


$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-30rd65b-f76d1-zd8c9-master-0         Ready    master   33m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-master-1         Ready    master   33m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-master-2         Ready    master   33m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-worker-b-9xktv   Ready    worker   24m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-worker-c-m5zjk   Ready    worker   25m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-worker-d-lz6q2   Ready    worker   25m   v1.19.0+9f84db3
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-11-15-104235   True        False         106s    Cluster version is 4.6.0-0.nightly-2020-11-15-104235
$ oc debug node/ci-ln-30rd65b-f76d1-zd8c9-worker-b-9xktv -- chroot /host rpm -qa | grep kernel
Starting pod/ci-ln-30rd65b-f76d1-zd8c9-worker-b-9xktv-debug ...
To use host binaries, run `chroot /host`
kernel-core-4.18.0-193.29.1.el8_2.x86_64
kernel-modules-4.18.0-193.29.1.el8_2.x86_64
kernel-4.18.0-193.29.1.el8_2.x86_64
kernel-modules-extra-4.18.0-193.29.1.el8_2.x86_64

Removing debug pod ...
$ cat << EOF > 03-worker-extensions.yaml
> apiVersion: machineconfiguration.openshift.io/v1
> kind: MachineConfig
> metadata:
>   labels:
>     machineconfiguration.openshift.io/role: worker
>   name: 03-worker-extensions
> spec:
>   config:
>     ignition:
>       version: 3.1.0
>   extensions:
>     - kernel-devel
> EOF
$ oc create -f 03-worker-extensions.yaml 
machineconfig.machineconfiguration.openshift.io/03-worker-extensions created
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
00-worker                                          fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
01-master-container-runtime                        fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
01-master-kubelet                                  fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
01-worker-container-runtime                        fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
01-worker-kubelet                                  fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
03-worker-extensions                                                                          3.1.0             4s
99-master-generated-registries                     fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
99-master-ssh                                                                                 3.1.0             38m
99-worker-generated-registries                     fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
99-worker-ssh                                                                                 3.1.0             38m
rendered-master-3b5af39131e1d19c816b48cfd34e0192   fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
rendered-worker-57f8c43ccd91bebc038c12a94565c3a8   fbb7093f17ef1183b4a2e620daf064e50a56e720   3.1.0             32m
$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-57f8c43ccd91bebc038c12a94565c3a8   False     True       False      3              0                   0                     0                      33m
$ watch oc get nodes
$ oc get nodes
NAME                                       STATUS                     ROLES    AGE   VERSION
ci-ln-30rd65b-f76d1-zd8c9-master-0         Ready                      master   35m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-master-1         Ready                      master   35m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-master-2         Ready                      master   35m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-worker-b-9xktv   Ready                      worker   26m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-worker-c-m5zjk   Ready,SchedulingDisabled   worker   26m   v1.19.0+9f84db3
ci-ln-30rd65b-f76d1-zd8c9-worker-d-lz6q2   Ready                      worker   26m   v1.19.0+9f84db3
$ watch oc get mcp/worker 
$ oc debug node/ci-ln-30rd65b-f76d1-zd8c9-worker-c-m5zjk -- chroot /host rpm -qa | grep kernel
Starting pod/ci-ln-30rd65b-f76d1-zd8c9-worker-c-m5zjk-debug ...
To use host binaries, run `chroot /host`
kernel-core-4.18.0-193.29.1.el8_2.x86_64
kernel-headers-4.18.0-193.29.1.el8_2.x86_64
kernel-modules-4.18.0-193.29.1.el8_2.x86_64
kernel-4.18.0-193.29.1.el8_2.x86_64
kernel-devel-4.18.0-193.29.1.el8_2.x86_64
kernel-modules-extra-4.18.0-193.29.1.el8_2.x86_64

Removing debug pod ...
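The manual `rpm -qa | grep kernel` check above can be scripted so that a missing package produces a non-zero exit status. This is a minimal sketch that assumes only the `name-version` naming visible in the output above; the function name `check_kernel_packages` is hypothetical.

```shell
#!/bin/sh
# check_kernel_packages: read `rpm -qa` output on stdin, report whether
# kernel-devel and kernel-headers are installed, and return 1 if either
# is missing. Hypothetical helper, not part of oc or the MCO.
check_kernel_packages() {
    rpms=$(cat)
    status=0
    for name in kernel-devel kernel-headers; do
        # Anchor on "<name>-<digit>" so e.g. kernel-4.18... does not match.
        if printf '%s\n' "$rpms" | grep -q "^${name}-[0-9]"; then
            echo "found: $name"
        else
            echo "missing: $name"
            status=1
        fi
    done
    return $status
}
```

Usage, with `<node-name>` as a placeholder: `oc debug node/<node-name> -- chroot /host rpm -qa | check_kernel_packages`. Against the first worker listing above (before the MachineConfig was applied) it would report both packages missing; against the updated worker it would report both found.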

Comment 21 errata-xmlrpc 2020-11-30 16:45:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5115