Bug 1896381 - NTO fails to load kernel modules
Summary: NTO fails to load kernel modules
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: jmencak
QA Contact: Simon
URL:
Whiteboard:
Depends On: 1895919
Blocks:
 
Reported: 2020-11-10 12:33 UTC by jmencak
Modified: 2020-11-30 16:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-30 16:46:09 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-node-tuning-operator pull 176 | 0 | None | closed | [release-4.6] Bug 1896381: Add a weak dependency on kmod to tuned. | 2020-11-30 07:40:02 UTC
Red Hat Product Errata | RHBA-2020:5115 | 0 | None | None | None | 2020-11-30 16:46:29 UTC

Description jmencak 2020-11-10 12:33:26 UTC
This bug was initially created as a copy of Bug #1895919

I am copying this bug because: 



Created attachment 1727788
YAML tuned resource requesting a kernel module

Description of problem:

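Node Tuning Operator fails to load kernel modules requested by a Tuned profile; the tuned pod log shows that the 'modinfo' and 'modprobe' commands are not found.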


Version-Release number of selected component (if applicable):

4.6.1


How reproducible:

100%


Steps to Reproduce:
1. Create the attached Tuned resource, changing the hostname if necessary.
2. Find the tuned pod running on the target node in the openshift-cluster-node-tuning-operator namespace:
# oc get pods -n openshift-cluster-node-tuning-operator -owide | grep worker1
tuned-c5h4m                                     1/1     Running   0          4d22h   192.168.222.31   worker1         <none>           <none>

3. Get the logs of the pod:
# oc logs tuned-c5h4m
2020-11-09 12:27:44,194 INFO     tuned.daemon.daemon: Using 'openshift-fuse' profile
2020-11-09 12:27:44,195 INFO     tuned.profiles.loader: loading profile: openshift-fuse
2020-11-09 12:27:44,227 INFO     tuned.daemon.daemon: starting tuning
2020-11-09 12:27:44,288 INFO     tuned.plugins.base: instance cpu: assigning devices cpu4, cpu8, cpu12, cpu2, cpu15, cpu11, cpu3, cpu0, cpu6, cpu14, cpu7, cpu9, cpu5, cpu10, cpu1, cpu13
2020-11-09 12:27:44,289 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2020-11-09 12:27:44,292 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2020-11-09 12:27:44,307 INFO     tuned.plugins.base: instance disk: assigning devices sdb, dm-0, sda
2020-11-09 12:27:44,309 INFO     tuned.plugins.base: instance net: assigning devices enp1s0
2020-11-09 12:27:44,318 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2020-11-09 12:27:44,331 ERROR    tuned.utils.commands: Executing modinfo error: [Errno 2] No such file or directory: 'modinfo': 'modinfo'
2020-11-09 12:27:44,331 WARNING  tuned.plugins.plugin_modules: 'modinfo' command not found, not checking kernel modules
2020-11-09 12:27:44,333 ERROR    tuned.utils.commands: Executing modprobe error: [Errno 2] No such file or directory: 'modprobe': 'modprobe'
2020-11-09 12:27:44,333 WARNING  tuned.plugins.plugin_modules: 'modprobe' command not found, cannot reload kernel modules, reboot is required
2020-11-09 12:27:44,333 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-fuse' applied
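
The two ERROR lines above suggest that the tuned container image does not ship the modinfo and modprobe binaries (both are normally provided by the kmod package). A minimal sketch for confirming that from a shell inside the pod, assuming a POSIX shell is available there; the helper name missing_tools is hypothetical and not part of tuned or the operator:

```shell
# Hypothetical helper: print the name of every command in the argument
# list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# The two commands the tuned log complains about.
missing_tools modinfo modprobe
```

Empty output would mean both commands are on PATH; on an affected image one would expect both names to be printed.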

Actual results:

kernel module not loaded


Expected results:

kernel module loaded
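
One way to tell the actual and expected results apart is to look for the module in /proc/modules. The sketch below is only illustrative: module_loaded is a hypothetical helper, and a module compiled into the kernel would not be listed there even though its functionality is available.

```shell
# Hypothetical helper: succeed if the named module appears in a
# /proc/modules-style listing. The second argument defaults to
# /proc/modules and can point at a saved copy instead.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# 'fuse' is the module requested by the profile in this report.
if module_loaded fuse; then
    echo "fuse is loaded"
else
    echo "fuse is not listed in /proc/modules"
fi
```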


Additional info:

# oc describe pod/tuned-c5h4m | grep Image
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:72323ce541f8a26fbad17ef65ff21b51498863bb851635a0faa8d5b1ac6ce0e4

Changing the image hash to an older one (e.g. 09a7dea10cd584c6048f8df3dcec67dd9a8432eb44051353e180dfeb350c6310) works around the problem.

Comment 3 Simon 2020-11-24 20:39:51 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-11-22-160856   True        False         50m     Cluster version is 4.6.0-0.nightly-2020-11-22-160856

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-154-108.us-east-2.compute.internal   Ready    master   74m   v1.19.0+43983cd
ip-10-0-156-254.us-east-2.compute.internal   Ready    worker   67m   v1.19.0+43983cd
ip-10-0-168-148.us-east-2.compute.internal   Ready    worker   70m   v1.19.0+43983cd
ip-10-0-176-62.us-east-2.compute.internal    Ready    master   75m   v1.19.0+43983cd
ip-10-0-219-123.us-east-2.compute.internal   Ready    worker   67m   v1.19.0+43983cd
ip-10-0-222-120.us-east-2.compute.internal   Ready    master   76m   v1.19.0+43983cd

$ worker=ip-10-0-156-254.us-east-2.compute.internal

$ oc get pods -o wide | grep $worker
tuned-dqskq                                     1/1     Running   0          68m   10.0.156.254   ip-10-0-156-254.us-east-2.compute.internal   <none>           <none>

$ pod=tuned-dqskq

$ oc get node $worker --show-labels
NAME                                         STATUS   ROLES    AGE   VERSION           LABELS
ip-10-0-156-254.us-east-2.compute.internal   Ready    worker   68m   v1.19.0+43983cd   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-156-254,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.large,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a

$ # kubernetes.io/hostname=ip-10-0-156-254 label

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: fuse-for-buildah
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=An OpenShift profile to load 'fuse' module
      include=openshift-node
      [modules]
      fuse=+r
    name: openshift-fuse
  recommend:
  - match:
    - label: kubernetes.io/hostname
      value: ip-10-0-156-254
    priority: 5
    profile: openshift-fuse
EOF
tuned.tuned.openshift.io/fuse-for-buildah created

$ for pr in $(oc get profiles -n openshift-cluster-node-tuning-operator --no-headers | cut -d ' ' -f 1); do echo $pr; oc get profile $pr -n openshift-cluster-node-tuning-operator -o json | jq ".spec.config.tunedProfile"; done
ip-10-0-154-108.us-east-2.compute.internal
"openshift-control-plane"
ip-10-0-156-254.us-east-2.compute.internal
"openshift-fuse"
ip-10-0-168-148.us-east-2.compute.internal
"openshift-node"
ip-10-0-176-62.us-east-2.compute.internal
"openshift-control-plane"
ip-10-0-219-123.us-east-2.compute.internal
"openshift-node"
ip-10-0-222-120.us-east-2.compute.internal
"openshift-control-plane"

$ # Correct profile on correct node

$ oc logs $pod

2020-11-24 19:47:03,616 INFO     tuned.plugins.base: instance disk: assigning devices dm-0
2020-11-24 19:47:03,618 INFO     tuned.plugins.base: instance net: assigning devices ens5
2020-11-24 19:47:03,647 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2020-11-24 19:47:03,653 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node' applied
I1124 20:35:24.183017    2280 tuned.go:281] extracting Tuned profiles
I1124 20:35:24.330718    2280 tuned.go:315] recommended Tuned profile openshift-node content unchanged
I1124 20:35:24.330895    2280 tuned.go:359] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile openshift-fuse
I1124 20:35:25.385964    2280 tuned.go:563] active profile (openshift-node) != recommended profile (openshift-fuse)
I1124 20:35:25.386008    2280 tuned.go:445] reloading tuned...
I1124 20:35:25.386014    2280 tuned.go:448] sending HUP to PID 3530
2020-11-24 20:35:25,386 INFO     tuned.daemon.daemon: stopping tuning
2020-11-24 20:35:25,402 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2020-11-24 20:35:25,409 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2020-11-24 20:35:25,409 INFO     tuned.daemon.daemon: Using 'openshift-fuse' profile
2020-11-24 20:35:25,410 INFO     tuned.profiles.loader: loading profile: openshift-fuse
2020-11-24 20:35:25,455 INFO     tuned.daemon.daemon: starting tuning
2020-11-24 20:35:25,458 INFO     tuned.plugins.base: instance cpu: assigning devices cpu1, cpu0
2020-11-24 20:35:25,459 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2020-11-24 20:35:25,461 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2020-11-24 20:35:25,463 INFO     tuned.plugins.base: instance disk: assigning devices dm-0
2020-11-24 20:35:25,465 INFO     tuned.plugins.base: instance net: assigning devices ens5
2020-11-24 20:35:25,469 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2020-11-24 20:35:25,511 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-fuse' applied

$ # ^^ No errors
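
Rather than reading the log by eye, the check for "no errors" could be narrowed to the two loggers that reported the problem in the original description. A rough sketch, assuming the log line format shown in this report; tuned_module_errors is a hypothetical helper name:

```shell
# Hypothetical helper: read a tuned log on stdin and print only the ERROR
# and WARNING lines emitted by tuned.utils.commands or
# tuned.plugins.plugin_modules. The exit status is non-zero when no line
# matches, which is grep's usual behaviour.
tuned_module_errors() {
    grep -E '(ERROR|WARNING) +tuned\.(utils\.commands|plugins\.plugin_modules):'
}

# Possible use against a live pod (needs cluster access, so left commented out):
#   oc logs "$pod" | tuned_module_errors
```

With the fix in place the filter would be expected to print nothing for the log above, whereas the log from the original description would yield four lines.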

Comment 5 errata-xmlrpc 2020-11-30 16:46:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5115

