Bug 1974718 - Tuned net plugin fails to handle net devices with n/a value for a channel
Summary: Tuned net plugin fails to handle net devices with n/a value for a channel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Jiří Mencák
QA Contact: Simon
URL:
Whiteboard:
Depends On: 1974277
Blocks: 1973502 1975803
Reported: 2021-06-22 11:35 UTC by OpenShift BugZilla Robot
Modified: 2021-07-27 23:13 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:13:19 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-node-tuning-operator pull 240 | 0 | None | open | [release-4.8] Bug 1974718: Fix conditional order for setting net device param. | 2021-06-22 11:35:50 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:13:37 UTC

Description OpenShift BugZilla Robot 2021-06-22 11:35:27 UTC
+++ This bug was initially created as a clone of Bug #1974277 +++

Description of problem:
When the tuned net plugin is set to scan devices and apply net queue configuration,
it can throw an unhandled exception if a device has a combined channel with the value n/a.
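
As an illustration of the failure mode only (this is not the tuned source, and the helper names below are hypothetical), a channel value reported as n/a cannot be passed straight to int(). A minimal sketch, assuming the channel values arrive as strings keyed by channel name, checks for n/a before converting:

```python
from typing import Dict, Optional


def parse_channel_value(raw: str) -> Optional[int]:
    """Return the channel count, or None when the value is reported as 'n/a'.

    Hypothetical helper for illustration; not part of tuned.
    """
    raw = raw.strip()
    # Check for the placeholder first: int("n/a") would raise ValueError.
    if raw.lower() == "n/a":
        return None
    return int(raw)


def supported_channels(reported: Dict[str, str]) -> Dict[str, int]:
    """Keep only the channels that carry a numeric value."""
    result: Dict[str, int] = {}
    for name, raw in reported.items():
        value = parse_channel_value(raw)
        if value is not None:
            result[name] = value
    return result
```

With this ordering, a device that reports n/a for rx, tx and other but a number for combined yields only the combined entry, so later code never attempts arithmetic on a missing value.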

Version-Release number of selected component (if applicable):
OCP 4.8 and OCP 4.9

How reproducible:
Always

Steps to Reproduce:
1. See https://bugzilla.redhat.com/show_bug.cgi?id=1974071#c0


Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1974071  # original BZ
https://github.com/redhat-performance/tuned/pull/360 # u/s fix for tuned

Comment 2 Simon 2021-06-24 12:48:26 UTC
Clusterversion: 4.8.0-0.nightly-2021-06-23-232238

$ nto=openshift-cluster-node-tuning-operator

$ oc project $nto
Now using project "openshift-cluster-node-tuning-operator" on server "https://api.skordas0624.qe.devcluster.openshift.com:6443".

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-148-91.us-east-2.compute.internal    Ready    worker   51m   v1.21.0-rc.0+766a5fe
ip-10-0-153-206.us-east-2.compute.internal   Ready    master   56m   v1.21.0-rc.0+766a5fe
ip-10-0-166-117.us-east-2.compute.internal   Ready    master   56m   v1.21.0-rc.0+766a5fe
ip-10-0-166-134.us-east-2.compute.internal   Ready    worker   49m   v1.21.0-rc.0+766a5fe
ip-10-0-207-169.us-east-2.compute.internal   Ready    worker   49m   v1.21.0-rc.0+766a5fe
ip-10-0-221-168.us-east-2.compute.internal   Ready    master   55m   v1.21.0-rc.0+766a5fe

$ node=ip-10-0-148-91.us-east-2.compute.internal

$ oc get pods -o wide | grep $node
tuned-zd4q4                                     1/1     Running   0          50m   10.0.148.91    ip-10-0-148-91.us-east-2.compute.internal    <none>           <none>

$ pod=tuned-zd4q4

$ oc label pod $pod tuned.openshift.io/elasticsearch=
pod/tuned-zd4q4 labeled

$ oc get pods --show-labels
NAME                                            READY   STATUS    RESTARTS   AGE   LABELS
cluster-node-tuning-operator-5957c5df4f-fktgv   1/1     Running   1          65m   name=cluster-node-tuning-operator,pod-template-hash=5957c5df4f
tuned-28bzx                                     1/1     Running   0          50m   controller-revision-hash=649d574bbd,openshift-app=tuned,pod-template-generation=1
tuned-fm8tn                                     1/1     Running   0          50m   controller-revision-hash=649d574bbd,openshift-app=tuned,pod-template-generation=1
tuned-hlxg9                                     1/1     Running   0          54m   controller-revision-hash=649d574bbd,openshift-app=tuned,pod-template-generation=1
tuned-rfmhw                                     1/1     Running   0          54m   controller-revision-hash=649d574bbd,openshift-app=tuned,pod-template-generation=1
tuned-zd4q4                                     1/1     Running   0          51m   controller-revision-hash=649d574bbd,openshift-app=tuned,pod-template-generation=1,tuned.openshift.io/elasticsearch=
tuned-zhjgz                                     1/1     Running   0          54m   controller-revision-hash=649d574bbd,openshift-app=tuned,pod-template-generation=1

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: net-plugin
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Test BZ 1974718
      include=openshift-control-plane

      [net]
      channels=combined 1

    name: testnetplugin

  recommend:
  - match:
    - label: tuned.openshift.io/elasticsearch
      type: pod
    priority: 5
    profile: testnetplugin
EOF
tuned.tuned.openshift.io/net-plugin created

$ oc get tuned
NAME         AGE
default      55m
net-plugin   10s
rendered     55m

$ oc get profiles
NAME                                         TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-148-91.us-east-2.compute.internal    testnetplugin             True      False      52m
ip-10-0-153-206.us-east-2.compute.internal   openshift-control-plane   True      False      55m
ip-10-0-166-117.us-east-2.compute.internal   openshift-control-plane   True      False      55m
ip-10-0-166-134.us-east-2.compute.internal   openshift-node            True      False      50m
ip-10-0-207-169.us-east-2.compute.internal   openshift-node            True      False      50m
ip-10-0-221-168.us-east-2.compute.internal   openshift-control-plane   True      False      55m

$ oc logs $pod
[...]
I0624 12:43:43.905653    3046 tuned.go:312] extracting Tuned profiles
I0624 12:43:44.036297    3046 tuned.go:346] recommended Tuned profile openshift-node content unchanged
I0624 12:43:44.049606    3046 tuned.go:390] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile testnetplugin
I0624 12:43:44.708138    3046 tuned.go:644] active profile (openshift-node) != recommended profile (testnetplugin)
I0624 12:43:44.708182    3046 tuned.go:499] reloading tuned...
I0624 12:43:44.708188    3046 tuned.go:502] sending HUP to PID 4522
2021-06-24 12:43:44,708 INFO     tuned.daemon.daemon: stopping tuning
2021-06-24 12:43:44,731 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2021-06-24 12:43:44,745 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2021-06-24 12:43:44,747 INFO     tuned.daemon.daemon: Using 'testnetplugin' profile
2021-06-24 12:43:44,748 INFO     tuned.profiles.loader: loading profile: testnetplugin
2021-06-24 12:43:44,817 INFO     tuned.daemon.daemon: starting tuning
2021-06-24 12:43:44,819 INFO     tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2021-06-24 12:43:44,820 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2021-06-24 12:43:44,823 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2021-06-24 12:43:44,826 WARNING  tuned.plugins.base: instance disk: no matching devices available
2021-06-24 12:43:44,830 INFO     tuned.plugins.base: instance net: assigning devices ens5
2021-06-24 12:43:44,834 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2021-06-24 12:43:44,859 INFO     tuned.daemon.daemon: static tuning from profile 'testnetplugin' applied
I0624 12:43:45.585700    3046 tuned.go:390] written "/etc/tuned/recommend.d/50-openshift.conf" to set Tuned profile testnetplugin
I0624 12:43:45.585931    3046 tuned.go:842] updated Profile ip-10-0-148-91.us-east-2.compute.internal stalld=<nil>, bootcmdline: 
I0624 12:43:45.710848    3046 tuned.go:655] active and recommended profile (testnetplugin) match; profile change will not trigger profile reload

$ oc debug node/$node
Starting pod/ip-10-0-148-91us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.148.91
If you don't see a command prompt, try pressing enter.
sh-4.4# find /sys/class/net -type l -not -lname *virtual* -printf '%f\n'
ens5
sh-4.4# ethtool -l ens5
Channel parameters for ens5:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       1
sh-4.4# exit
exit

Removing debug pod ...

$ oc delete tuned net-plugin
tuned.tuned.openshift.io "net-plugin" deleted

$ oc get profiles
NAME                                         TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-148-91.us-east-2.compute.internal    openshift-node            True      False      55m
ip-10-0-153-206.us-east-2.compute.internal   openshift-control-plane   True      False      58m
ip-10-0-166-117.us-east-2.compute.internal   openshift-control-plane   True      False      58m
ip-10-0-166-134.us-east-2.compute.internal   openshift-node            True      False      53m
ip-10-0-207-169.us-east-2.compute.internal   openshift-node            True      False      53m
ip-10-0-221-168.us-east-2.compute.internal   openshift-control-plane   True      False      58m

$ oc debug node/$node
Starting pod/ip-10-0-148-91us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.148.91
If you don't see a command prompt, try pressing enter.
sh-4.4# find /sys/class/net -type l -not -lname *virtual* -printf '%f\n'
ens5
sh-4.4# ethtool -l ens5
Channel parameters for ens5:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
sh-4.4# exit
exit

Removing debug pod ...
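
The two ethtool -l checks above can also be compared programmatically. The following is a rough sketch, assuming the output layout shown in this comment (a title line, then section headers with an empty value, then "name: value" rows); parse_ethtool_channels and SAMPLE are illustrative names, not part of tuned or ethtool:

```python
from typing import Dict, Optional

# Output copied from the first `ethtool -l ens5` run in this comment.
SAMPLE = """\
Channel parameters for ens5:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       2
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       1
"""


def parse_ethtool_channels(output: str) -> Dict[str, Dict[str, Optional[int]]]:
    """Split `ethtool -l` text into sections; 'n/a' values become None."""
    sections: Dict[str, Dict[str, Optional[int]]] = {}
    current: Optional[Dict[str, Optional[int]]] = None
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("Channel parameters"):
            continue  # skip blank lines and the title line
        key, _, value = line.partition(":")
        value = value.strip()
        if not value:
            # A line such as "Pre-set maximums:" opens a new section.
            current = sections.setdefault(key.lower(), {})
            continue
        if current is None:
            continue  # value row before any section header; ignore it
        current[key.lower()] = None if value.lower() == "n/a" else int(value)
    return sections


if __name__ == "__main__":
    parsed = parse_ethtool_channels(SAMPLE)
    print(parsed["current hardware settings"]["combined"])
```

Running the parser on the output captured while the testnetplugin profile was active, and again after the profile was deleted, lets the "combined" entry under "current hardware settings" be compared directly (1 versus 2 in the transcript above).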

Comment 5 errata-xmlrpc 2021-07-27 23:13:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

