Bug 1974277 - Tuned net plugin fails to handle net devices with n/a value for a channel
Summary: Tuned net plugin fails to handle net devices with n/a value for a channel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Jiří Mencák
QA Contact: Simon
URL:
Whiteboard:
Depends On:
Blocks: 1974718
Reported: 2021-06-21 09:31 UTC by Yanir Quinn
Modified: 2021-10-18 17:36 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:35:50 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-node-tuning-operator pull 239 0 None open Bug 1974277: Fix conditional order for setting net device param. 2021-06-21 09:51:14 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:36:09 UTC

Description Yanir Quinn 2021-06-21 09:31:35 UTC
Description of problem:
When the tuned net plugin is set to scan devices and apply net queue configuration,
it can throw an unhandled exception if a device reports the n/a value for its combined channel.
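
To illustrate the failure mode, here is a minimal Python sketch of parsing `ethtool -l` output so that n/a values are tolerated instead of raising. It is not the tuned plugin code or the upstream fix; the helper names parse_ethtool_channels and can_set_channel, and the rule that a channel is settable only when its pre-set maximum is numeric, are assumptions made for this example.

```python
from typing import Dict, Optional

Channels = Dict[str, Dict[str, Optional[int]]]


def parse_ethtool_channels(output: str) -> Channels:
    """Parse `ethtool -l` text into {"max": {...}, "current": {...}}.

    Hypothetical helper: values printed as "n/a" are stored as None
    rather than being passed to int(), which would raise ValueError.
    """
    sections: Channels = {"max": {}, "current": {}}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            sections[section][key.strip().lower()] = (
                int(value) if value.isdigit() else None
            )
    return sections


def can_set_channel(channels: Channels, name: str, wanted: int) -> bool:
    """Assumed rule: only set a channel whose maximum is a number >= wanted."""
    maximum = channels["max"].get(name)
    return maximum is not None and 0 < wanted <= maximum


SAMPLE = """Channel parameters for ens5:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       1
"""

if __name__ == "__main__":
    parsed = parse_ethtool_channels(SAMPLE)
    print(parsed["max"])
    print(can_set_channel(parsed, "combined", 1))
    print(can_set_channel(parsed, "rx", 1))
```

Mapping n/a to None keeps the unsupported case explicit, so a caller can skip the device or channel instead of crashing on a failed integer conversion.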

Version-Release number of selected component (if applicable):
OCP 4.8 and OCP 4.9

How reproducible:
Always

Steps to Reproduce:
1. See https://bugzilla.redhat.com/show_bug.cgi?id=1974071#c0


Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1974071  # original BZ
https://github.com/redhat-performance/tuned/pull/360 # u/s fix for tuned

Comment 2 Simon 2021-06-23 17:14:28 UTC
Cluster version: 4.9.0-0.nightly-2021-06-22-193627

$ nto=openshift-cluster-node-tuning-operator

$ oc project $nto
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.skordas0623.qe.devcluster.openshift.com:6443".

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-133-153.us-east-2.compute.internal   Ready    master   3h32m   v1.21.0-rc.0+120883f
ip-10-0-135-253.us-east-2.compute.internal   Ready    worker   3h25m   v1.21.0-rc.0+120883f
ip-10-0-164-131.us-east-2.compute.internal   Ready    master   3h32m   v1.21.0-rc.0+120883f
ip-10-0-170-30.us-east-2.compute.internal    Ready    worker   3h25m   v1.21.0-rc.0+120883f
ip-10-0-203-162.us-east-2.compute.internal   Ready    worker   3h25m   v1.21.0-rc.0+120883f
ip-10-0-208-216.us-east-2.compute.internal   Ready    master   3h32m   v1.21.0-rc.0+120883f

$ node=ip-10-0-135-253.us-east-2.compute.internal

$ oc get pods -o wide | grep $node
tuned-2d28g                                    1/1     Running   0          3h25m   10.0.135.253   ip-10-0-135-253.us-east-2.compute.internal   <none>           <none>

$ pod=tuned-2d28g

$ oc label pod $pod tuned.openshift.io/elasticsearch=
pod/tuned-2d28g labeled

$ oc get pods --show-labels
NAME                                           READY   STATUS    RESTARTS   AGE     LABELS
cluster-node-tuning-operator-bf7d4d84f-55xxf   1/1     Running   1          3h37m   name=cluster-node-tuning-operator,pod-template-hash=bf7d4d84f
tuned-2d28g                                    1/1     Running   0          3h28m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1,tuned.openshift.io/elasticsearch=
tuned-4bxsk                                    1/1     Running   0          3h28m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1
tuned-7v7zz                                    1/1     Running   0          3h33m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1
tuned-hg7jc                                    1/1     Running   0          3h33m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1
tuned-nf2nj                                    1/1     Running   0          3h33m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1
tuned-r98xl                                    1/1     Running   0          3h28m   controller-revision-hash=5b7c8977bf,openshift-app=tuned,pod-template-generation=1

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: net-plugin
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Test BZ 1974277
      include=openshift-control-plane

      [net]
      channels=combined 1

    name: testnetplugin

  recommend:
  - match:
    - label: tuned.openshift.io/elasticsearch
      type: pod
    priority: 5
    profile: testnetplugin
EOF
tuned.tuned.openshift.io/net-plugin created

$ oc get tuned
NAME         AGE
default      3h34m
net-plugin   8s
rendered     3h34m

$ oc get profiles
NAME                                         TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-133-153.us-east-2.compute.internal   openshift-control-plane   True      False      3h34m
ip-10-0-135-253.us-east-2.compute.internal   testnetplugin             True      False      3h28m
ip-10-0-164-131.us-east-2.compute.internal   openshift-control-plane   True      False      3h34m
ip-10-0-170-30.us-east-2.compute.internal    openshift-node            True      False      3h28m
ip-10-0-203-162.us-east-2.compute.internal   openshift-node            True      False      3h28m
ip-10-0-208-216.us-east-2.compute.internal   openshift-control-plane   True      False      3h34m

$ oc logs $pod 
[...]
I0623 17:03:22.152275    2538 tuned.go:312] extracting Tuned profiles
I0623 17:03:22.287298    2538 tuned.go:346] recommended Tuned profile openshift-node content unchanged
I0623 17:03:22.648719    2538 tuned.go:390] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile testnetplugin
I0623 17:03:23.343180    2538 tuned.go:644] active profile (openshift-node) != recommended profile (testnetplugin)
I0623 17:03:23.343218    2538 tuned.go:499] reloading tuned...
I0623 17:03:23.343223    2538 tuned.go:502] sending HUP to PID 3086
2021-06-23 17:03:23,343 INFO     tuned.daemon.daemon: stopping tuning
2021-06-23 17:03:23,360 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-06-23 17:03:23,366 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-06-23 17:03:23,366 INFO     tuned.daemon.daemon: Using 'testnetplugin' profile
2021-06-23 17:03:23,367 INFO     tuned.profiles.loader: loading profile: testnetplugin
2021-06-23 17:03:23,400 INFO     tuned.daemon.daemon: starting tuning
2021-06-23 17:03:23,402 INFO     tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2021-06-23 17:03:23,403 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2021-06-23 17:03:23,406 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2021-06-23 17:03:23,407 WARNING  tuned.plugins.base: instance disk: no matching devices available
2021-06-23 17:03:23,409 INFO     tuned.plugins.base: instance net: assigning devices ens5
2021-06-23 17:03:23,412 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2021-06-23 17:03:23,436 INFO     tuned.daemon.daemon: static tuning from profile 'testnetplugin' applied
I0623 17:03:24.217018    2538 tuned.go:842] updated Profile ip-10-0-135-253.us-east-2.compute.internal stalld=<nil>, bootcmdline: 
I0623 17:03:24.217302    2538 tuned.go:390] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile testnetplugin
I0623 17:03:25.406223    2538 tuned.go:655] active and recommended profile (testnetplugin) match; profile change will not trigger profile reload

$ oc debug node/$node
Starting pod/ip-10-0-135-253us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.135.253
If you don't see a command prompt, try pressing enter.

sh-4.4# find /sys/class/net -type l -not -lname *virtual* -printf '%f\n'
ens5
sh-4.4# ethtool -l ens5
Channel parameters for ens5:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       1
sh-4.4# exit

$ oc get tuned
NAME         AGE
default      3h39m
net-plugin   4m52s
rendered     3h39m

sh-4.4# exit
exit

Removing debug pod ...

$ oc delete tuned net-plugin
tuned.tuned.openshift.io "net-plugin" deleted

$ oc get profiles
NAME                                         TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-133-153.us-east-2.compute.internal   openshift-control-plane   True      False      3h39m
ip-10-0-135-253.us-east-2.compute.internal   openshift-node            True      False      3h33m
ip-10-0-164-131.us-east-2.compute.internal   openshift-control-plane   True      False      3h39m
ip-10-0-170-30.us-east-2.compute.internal    openshift-node            True      False      3h33m
ip-10-0-203-162.us-east-2.compute.internal   openshift-node            True      False      3h33m
ip-10-0-208-216.us-east-2.compute.internal   openshift-control-plane   True      False      3h39m

$ oc debug node/$node
Starting pod/ip-10-0-135-253us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.135.253
If you don't see a command prompt, try pressing enter.
sh-4.4# find /sys/class/net -type l -not -lname *virtual* -printf '%f\n'
ens5
sh-4.4# ethtool -l ens5                                                 
Channel parameters for ens5:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
sh-4.4# exit
exit

Removing debug pod ...

Comment 5 errata-xmlrpc 2021-10-18 17:35:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

