Bug 1895919 - NTO fails to load kernel modules
Summary: NTO fails to load kernel modules
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: jmencak
QA Contact: Simon
URL:
Whiteboard:
Depends On:
Blocks: 1896381
 
Reported: 2020-11-09 12:49 UTC by Kevin Pouget
Modified: 2021-02-24 15:31 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:31:28 UTC
Target Upstream Version:


Attachments
YAML tuned resource requesting a kernel module (441 bytes, text/plain)
2020-11-09 12:49 UTC, Kevin Pouget


Links
System | ID | Private | Priority | Status | Summary | Last Updated
GitHub | openshift cluster-node-tuning-operator pull 175 | 0 | None | closed | Bug 1895919: Add a weak dependency on kmod to tuned. | 2021-01-07 19:57:47 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:31:58 UTC

Description Kevin Pouget 2020-11-09 12:49:16 UTC
Created attachment 1727788 [details]
YAML tuned resource requesting a kernel module


Description of problem:

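The Node Tuning Operator (NTO) fails to load kernel modules requested through the [modules] section of a Tuned profile. According to the tuned pod logs below, the 'modinfo' and 'modprobe' commands are not found in the tuned container.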


Version-Release number of selected component (if applicable):

4.6.1


How reproducible:

100%


Steps to Reproduce:
1. Create the attached Tuned resource, changing the hostname if necessary.
2. Find the tuned pod running on the target node in the openshift-cluster-node-tuning-operator namespace:
# oc get pods -n openshift-cluster-node-tuning-operator -owide | grep worker1
tuned-c5h4m                                     1/1     Running   0          4d22h   192.168.222.31   worker1         <none>           <none>

3. Get the logs of that pod:
# oc logs tuned-c5h4m
2020-11-09 12:27:44,194 INFO     tuned.daemon.daemon: Using 'openshift-fuse' profile
2020-11-09 12:27:44,195 INFO     tuned.profiles.loader: loading profile: openshift-fuse
2020-11-09 12:27:44,227 INFO     tuned.daemon.daemon: starting tuning
2020-11-09 12:27:44,288 INFO     tuned.plugins.base: instance cpu: assigning devices cpu4, cpu8, cpu12, cpu2, cpu15, cpu11, cpu3, cpu0, cpu6, cpu14, cpu7, cpu9, cpu5, cpu10, cpu1, cpu13
2020-11-09 12:27:44,289 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2020-11-09 12:27:44,292 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2020-11-09 12:27:44,307 INFO     tuned.plugins.base: instance disk: assigning devices sdb, dm-0, sda
2020-11-09 12:27:44,309 INFO     tuned.plugins.base: instance net: assigning devices enp1s0
2020-11-09 12:27:44,318 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2020-11-09 12:27:44,331 ERROR    tuned.utils.commands: Executing modinfo error: [Errno 2] No such file or directory: 'modinfo': 'modinfo'
2020-11-09 12:27:44,331 WARNING  tuned.plugins.plugin_modules: 'modinfo' command not found, not checking kernel modules
2020-11-09 12:27:44,333 ERROR    tuned.utils.commands: Executing modprobe error: [Errno 2] No such file or directory: 'modprobe': 'modprobe'
2020-11-09 12:27:44,333 WARNING  tuned.plugins.plugin_modules: 'modprobe' command not found, cannot reload kernel modules, reboot is required
2020-11-09 12:27:44,333 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-fuse' applied
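
The two plugin_modules WARNING lines are the relevant part of this log. As a rough sketch, they can be pulled out of a tuned log with a small shell function. `missing_commands` is a hypothetical helper name, not something shipped with tuned or NTO, and the pattern assumes the exact message wording shown in the log above.

```shell
# Hypothetical helper (not part of tuned or NTO): read a tuned log on stdin
# and print the name of each command tuned reported as missing, one per line.
missing_commands() {
    # Matches lines such as:
    #   ... WARNING  tuned.plugins.plugin_modules: 'modprobe' command not found, ...
    # and keeps only the quoted command name.
    sed -n "s/.*'\([^']*\)' command not found.*/\1/p" | sort -u
}
```

With the function defined in the current shell, the pod log from step 3 can be piped into it, for example: oc logs tuned-c5h4m -n openshift-cluster-node-tuning-operator | missing_commands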

Actual results:

The kernel module is not loaded.


Expected results:

The kernel module is loaded.


Additional info:

# oc describe pod/tuned-c5h4m | grep Image
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:72323ce541f8a26fbad17ef65ff21b51498863bb851635a0faa8d5b1ac6ce0e4

Changing the image hash to an older one (e.g. 09a7dea10cd584c6048f8df3dcec67dd9a8432eb44051353e180dfeb350c6310) works around the problem.

Comment 1 jmencak 2020-11-09 19:05:40 UTC
Filed an upstream tuned issue:
https://github.com/redhat-performance/tuned/issues/304
A workaround for NTO will follow.
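
One way to confirm whether a given tuned image is affected is to check for the kmod tools directly inside the container. The sketch below assumes a POSIX shell is available there. `check_tools` is a hypothetical helper, not part of tuned or NTO.

```shell
# Hypothetical helper: print "missing: <name>" for every given command that
# is not found on PATH. Returns non-zero if at least one command is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}
```

Pasted into a shell opened in the tuned pod (for instance with oc rsh), running check_tools modinfo modprobe should print nothing on an unaffected image and list both commands on an affected one.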

Comment 3 Simon 2020-11-13 18:39:14 UTC
POSITIVE VERIFICATION

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          63m     Unable to apply 4.7.0-0.nightly-2020-11-12-063401

oc project openshift-cluster-node-tuning-operator
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.skordas1311.qe.devcluster.openshift.com:6443".

oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-138-224.us-east-2.compute.internal   Ready    master   65m   v1.19.2+5fc3f4d
ip-10-0-148-170.us-east-2.compute.internal   Ready    worker   59m   v1.19.2+5fc3f4d
ip-10-0-163-226.us-east-2.compute.internal   Ready    master   65m   v1.19.2+5fc3f4d
ip-10-0-180-132.us-east-2.compute.internal   Ready    worker   59m   v1.19.2+5fc3f4d
ip-10-0-209-84.us-east-2.compute.internal    Ready    master   64m   v1.19.2+5fc3f4d
ip-10-0-210-158.us-east-2.compute.internal   Ready    worker   59m   v1.19.2+5fc3f4d

worker=ip-10-0-148-170.us-east-2.compute.internal

oc get pods -o wide | grep $worker
tuned-prj8c                                     1/1     Running   0          61m   10.0.148.170   ip-10-0-148-170.us-east-2.compute.internal   <none>           <none>

pod=tuned-prj8c

oc get node $worker --show-labels
NAME                                         STATUS   ROLES    AGE   VERSION           LABELS
ip-10-0-148-170.us-east-2.compute.internal   Ready    worker   62m   v1.19.2+5fc3f4d   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-148-170,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.large,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a

oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: fuse-for-buildah
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=An OpenShift profile to load 'fuse' module
      include=openshift-node
      [modules]
      fuse=+r
    name: openshift-fuse
  recommend:
  - match:
    - label: kubernetes.io/hostname
      value: ip-10-0-148-170
    priority: 5
    profile: openshift-fuse
EOF

oc get tuned
NAME               AGE
default            70m
fuse-for-buildah   4s
rendered           70m

for pr in $(oc get profiles -n openshift-cluster-node-tuning-operator --no-headers | cut -d ' ' -f 1); do echo $pr; oc get profile $pr -n openshift-cluster-node-tuning-operator -o json | jq ".spec.config.tunedProfile"; done
ip-10-0-138-224.us-east-2.compute.internal
"openshift-control-plane"
ip-10-0-148-170.us-east-2.compute.internal
"openshift-fuse"
ip-10-0-163-226.us-east-2.compute.internal
"openshift-control-plane"
ip-10-0-180-132.us-east-2.compute.internal
"openshift-node"
ip-10-0-209-84.us-east-2.compute.internal
"openshift-control-plane"
ip-10-0-210-158.us-east-2.compute.internal
"openshift-node"

** Correct profile on $worker **

oc logs $pod
2020-11-13 18:27:51,175 INFO     tuned.daemon.daemon: stopping tuning
2020-11-13 18:27:51,193 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2020-11-13 18:27:51,199 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2020-11-13 18:27:51,200 INFO     tuned.daemon.daemon: Using 'openshift-fuse' profile
2020-11-13 18:27:51,201 INFO     tuned.profiles.loader: loading profile: openshift-fuse
2020-11-13 18:27:51,245 INFO     tuned.daemon.daemon: starting tuning
2020-11-13 18:27:51,248 INFO     tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2020-11-13 18:27:51,249 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2020-11-13 18:27:51,251 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2020-11-13 18:27:51,253 INFO     tuned.plugins.base: instance disk: assigning devices dm-0
2020-11-13 18:27:51,254 INFO     tuned.plugins.base: instance net: assigning devices ens5
2020-11-13 18:27:51,257 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2020-11-13 18:27:51,295 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-fuse' applied

** No problems loading module **
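
The absence of modinfo/modprobe errors in the log does not by itself show that the module is present on the node. A minimal sketch for checking it directly, assuming the standard /proc/modules format in which the module name is the first field of each line. `module_loaded` is a hypothetical helper, not part of tuned or NTO.

```shell
# Hypothetical helper: succeed if the named module is listed in a file in
# /proc/modules format. The file defaults to /proc/modules itself.
module_loaded() {
    module=$1
    modules_file=${2:-/proc/modules}
    # Compare the first field exactly so that "fuse" does not match "cuse".
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}
```

Run on the worker itself (for example from an oc debug node/$worker session), module_loaded fuse returns 0 when the module is loaded. A module that is built into the kernel does not appear in /proc/modules, so a non-zero result is only meaningful where fuse is built as a loadable module.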

Comment 6 errata-xmlrpc 2021-02-24 15:31:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

