Description of problem:
When the tuned net plugin is set to scan devices and apply net queue configuration, it can throw an unhandled exception if a device reports a combined channel value of n/a.

Version-Release number of selected component (if applicable):
OCP 4.8 and OCP 4.9

How reproducible:
Always

Steps to Reproduce:
1. See https://bugzilla.redhat.com/show_bug.cgi?id=1974071#c0

Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1974071 # original BZ
https://github.com/redhat-performance/tuned/pull/360 # upstream fix for tuned
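The failure mode can be illustrated with a minimal Python sketch. This is not the tuned plugin's code: `parse_channels` is a hypothetical helper, and it assumes `ethtool -l` output in which any channel count may be reported as `n/a`. Passing such a value straight to `int()` raises `ValueError`; the sketch maps it to `None` instead.

```python
from typing import Dict, Optional


def parse_channels(ethtool_output: str) -> Dict[str, Dict[str, Optional[int]]]:
    """Parse `ethtool -l <dev>` text into {"max": {...}, "current": {...}}.

    Hypothetical helper for illustration only. A value of "n/a" becomes None
    rather than being passed to int(), which would raise ValueError.
    """
    result: Dict[str, Dict[str, Optional[int]]] = {"max": {}, "current": {}}
    section = None
    for raw in ethtool_output.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            # Only plain digits are converted; "n/a" (or anything else) maps to None.
            result[section][key.strip().lower()] = int(value) if value.isdigit() else None
    return result


# Sample text shaped like `ethtool -l` output for a device with one channel type.
SAMPLE = """\
Channel parameters for ens5:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       1
"""

channels = parse_channels(SAMPLE)
```

A caller can then skip any channel type whose value is `None` instead of failing on the conversion.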
Cluster version: 4.9.0-0.nightly-2021-06-22-193627

$ nto=openshift-cluster-node-tuning-operator
$ oc project $nto
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.skordas0623.qe.devcluster.openshift.com:6443".
$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-133-153.us-east-2.compute.internal   Ready    master   3h32m   v1.21.0-rc.0+120883f
ip-10-0-135-253.us-east-2.compute.internal   Ready    worker   3h25m   v1.21.0-rc.0+120883f
ip-10-0-164-131.us-east-2.compute.internal   Ready    master   3h32m   v1.21.0-rc.0+120883f
ip-10-0-170-30.us-east-2.compute.internal    Ready    worker   3h25m   v1.21.0-rc.0+120883f
ip-10-0-203-162.us-east-2.compute.internal   Ready    worker   3h25m   v1.21.0-rc.0+120883f
ip-10-0-208-216.us-east-2.compute.internal   Ready    master   3h32m   v1.21.0-rc.0+120883f
$ node=ip-10-0-135-253.us-east-2.compute.internal
$ oc get pods -o wide | grep $node
tuned-2d28g   1/1   Running   0   3h25m   10.0.135.253   ip-10-0-135-253.us-east-2.compute.internal   <none>   <none>
$ pod=tuned-2d28g
$ oc label pod $pod tuned.openshift.io/elasticsearch=
pod/tuned-2d28g labeled
$ oc get pods --show-labels
NAME                                           READY   STATUS    RESTARTS   AGE     LABELS
cluster-node-tuning-operator-bf7d4d84f-55xxf   1/1     Running   1          3h37m   name=cluster-node-tuning-operator,pod-template-hash=bf7d4d84f
tuned-2d28g                                    1/1     Running   0          3h28m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1,tuned.openshift.io/elasticsearch=
tuned-4bxsk                                    1/1     Running   0          3h28m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1
tuned-7v7zz                                    1/1     Running   0          3h33m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1
tuned-hg7jc                                    1/1     Running   0          3h33m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1
tuned-nf2nj                                    1/1     Running   0          3h33m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1
tuned-r98xl                                    1/1     Running   0          3h28m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1
$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: net-plugin
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Test BZ 1974277
      include=openshift-control-plane
      [net]
      channels=combined 1
    name: testnetplugin
  recommend:
  - match:
    - label: tuned.openshift.io/elasticsearch
      type: pod
    priority: 5
    profile: testnetplugin
EOF
tuned.tuned.openshift.io/net-plugin created
$ oc get tuned
NAME         AGE
default      3h34m
net-plugin   8s
rendered     3h34m
$ oc get profiles
NAME                                         TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-133-153.us-east-2.compute.internal   openshift-control-plane   True      False      3h34m
ip-10-0-135-253.us-east-2.compute.internal   testnetplugin             True      False      3h28m
ip-10-0-164-131.us-east-2.compute.internal   openshift-control-plane   True      False      3h34m
ip-10-0-170-30.us-east-2.compute.internal    openshift-node            True      False      3h28m
ip-10-0-203-162.us-east-2.compute.internal   openshift-node            True      False      3h28m
ip-10-0-208-216.us-east-2.compute.internal   openshift-control-plane   True      False      3h34m
$ oc logs $pod
[...]
I0623 17:03:22.152275 2538 tuned.go:312] extracting Tuned profiles
I0623 17:03:22.287298 2538 tuned.go:346] recommended Tuned profile openshift-node content unchanged
I0623 17:03:22.648719 2538 tuned.go:390] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile testnetplugin
I0623 17:03:23.343180 2538 tuned.go:644] active profile (openshift-node) != recommended profile (testnetplugin)
I0623 17:03:23.343218 2538 tuned.go:499] reloading tuned...
I0623 17:03:23.343223 2538 tuned.go:502] sending HUP to PID 3086
2021-06-23 17:03:23,343 INFO tuned.daemon.daemon: stopping tuning
2021-06-23 17:03:23,360 INFO tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-06-23 17:03:23,366 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-06-23 17:03:23,366 INFO tuned.daemon.daemon: Using 'testnetplugin' profile
2021-06-23 17:03:23,367 INFO tuned.profiles.loader: loading profile: testnetplugin
2021-06-23 17:03:23,400 INFO tuned.daemon.daemon: starting tuning
2021-06-23 17:03:23,402 INFO tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2021-06-23 17:03:23,403 INFO tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2021-06-23 17:03:23,406 WARNING tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2021-06-23 17:03:23,407 WARNING tuned.plugins.base: instance disk: no matching devices available
2021-06-23 17:03:23,409 INFO tuned.plugins.base: instance net: assigning devices ens5
2021-06-23 17:03:23,412 INFO tuned.plugins.plugin_sysctl: reapplying system sysctl
2021-06-23 17:03:23,436 INFO tuned.daemon.daemon: static tuning from profile 'testnetplugin' applied
I0623 17:03:24.217018 2538 tuned.go:842] updated Profile ip-10-0-135-253.us-east-2.compute.internal stalld=<nil>, bootcmdline:
I0623 17:03:24.217302 2538 tuned.go:390] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile testnetplugin
I0623 17:03:25.406223 2538 tuned.go:655] active and recommended profile (testnetplugin) match; profile change will not trigger profile reload
$ oc debug node/$node
Starting pod/ip-10-0-135-253us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.135.253
If you don't see a command prompt, try pressing enter.
sh-4.4# find /sys/class/net -type l -not -lname *virtual* -printf '%f\n'
ens5
sh-4.4# ethtool -l ens5
Channel parameters for ens5:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       1
sh-4.4# exit
$ oc get tuned
NAME         AGE
default      3h39m
net-plugin   4m52s
rendered     3h39m
sh-4.4# exit
exit
Removing debug pod ...
$ oc delete tuned net-plugin
tuned.tuned.openshift.io "net-plugin" deleted
$ oc get profiles
NAME                                         TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-133-153.us-east-2.compute.internal   openshift-control-plane   True      False      3h39m
ip-10-0-135-253.us-east-2.compute.internal   openshift-node            True      False      3h33m
ip-10-0-164-131.us-east-2.compute.internal   openshift-control-plane   True      False      3h39m
ip-10-0-170-30.us-east-2.compute.internal    openshift-node            True      False      3h33m
ip-10-0-203-162.us-east-2.compute.internal   openshift-node            True      False      3h33m
ip-10-0-208-216.us-east-2.compute.internal   openshift-control-plane   True      False      3h39m
$ oc debug node/$node
Starting pod/ip-10-0-135-253us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.135.253
If you don't see a command prompt, try pressing enter.
sh-4.4# find /sys/class/net -type l -not -lname *virtual* -printf '%f\n'
ens5
sh-4.4# ethtool -l ens5
Channel parameters for ens5:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
sh-4.4# exit
exit
Removing debug pod ...
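The behaviour verified above (a `channels=combined 1` request applied to a device whose RX, TX and Other channels are `n/a`) can be summarised in a small Python sketch. It is an illustration under stated assumptions, not the logic in tuned: `plan_channel_update` is a hypothetical helper, and `None` stands for a value that `ethtool -l` reported as `n/a`.

```python
from typing import Dict, Optional


def plan_channel_update(
    requested: Dict[str, int],
    maximums: Dict[str, Optional[int]],
    current: Dict[str, Optional[int]],
) -> Dict[str, int]:
    """Return the requested channel settings that can be applied and differ from current.

    Hypothetical helper for illustration only. None represents "n/a".
    """
    plan: Dict[str, int] = {}
    for name, wanted in requested.items():
        limit = maximums.get(name)
        if limit is None:
            continue  # device does not expose this channel type; skip, do not raise
        if wanted > limit:
            continue  # request exceeds the pre-set maximum
        if current.get(name) != wanted:
            plan[name] = wanted
    return plan


# Values from the ens5 output with the profile removed (Combined back at 2).
maximums = {"rx": None, "tx": None, "other": None, "combined": 2}
current = {"rx": None, "tx": None, "other": None, "combined": 2}

# The test profile requested `channels=combined 1`.
plan = plan_channel_update({"combined": 1}, maximums, current)
```

With these inputs only the combined channel ends up in the plan; a request for a channel type reported as `n/a` is dropped rather than causing an error.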
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759