Bug 2108557 - [4.9] [Webscale] High OVS cpu usage causing performance issues
Summary: [4.9] [Webscale] High OVS cpu usage causing performance issues
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.9.z
Assignee: Yanir Quinn
QA Contact: Niranjan Mallapadi Raghavender
URL:
Whiteboard:
Duplicates: 2108556
Depends On: 2100544
Blocks: 2108556
Reported: 2022-07-19 10:35 UTC by OpenShift BugZilla Robot
Modified: 2022-09-20 15:43 UTC (History)
CC List: 33 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-12 13:36:57 UTC
Target Upstream Version:
Embargoed:
cback: needinfo-



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift-kni performance-addon-operators pull 922 | 0 | None | open | [release-4.9] Bug 2108557: Set rps for virtual interfaces only in crio hook | 2022-08-15 16:27:11 UTC
Github | openshift-kni performance-addon-operators pull 923 | 0 | None | open | [release-4.9] Bug 2108557: Fix RPS default physical and virtual settings | 2022-08-15 16:27:21 UTC
Red Hat Product Errata | RHBA-2022:6408 | 0 | None | None | None | 2022-09-12 13:37:10 UTC

Comment 2 Yanir Quinn 2022-08-15 16:26:34 UTC
*** Bug 2108556 has been marked as a duplicate of this bug. ***

Comment 5 Shereen Haj Makhoul 2022-09-12 08:19:31 UTC
Verification:

Versions:
OCP: 4.9.48
PAO: 4.9.11-2


Steps: 

- PerformanceProfile (PP):
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  cpu:
    isolated: "0-2"
    reserved: "3"
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/workercnf: ""

- Guaranteed (GU) pod (the node name is set explicitly for simplicity):
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
spec:
  nodeName: worker-1
  containers:
  - name: test
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3d7f41e3c7e242e67035f94abb8d0faf35bee1c45449ba9c2712a211670914b
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c"]
    args: [ "while true; do sleep 100000; done;" ]
    resources:
      requests:
        cpu: 2
        memory: "200M"
      limits:
        cpu: 2
        memory: "200M"
  nodeSelector:
    node-role.kubernetes.io/workercnf: ""
  runtimeClassName: performance-manual
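
The pod above is meant to land in the Guaranteed QoS class, which requires CPU and memory limits on every container with requests equal to them. The following is only a rough sketch of that rule, not the kubelet's implementation: the helper name `is_guaranteed` is hypothetical, quantities are compared as plain strings (so "1" and "1000m" would wrongly count as different), and init containers are ignored.

```python
def is_guaranteed(containers: list[dict]) -> bool:
    """Approximate the Guaranteed QoS rule for a list of container specs.

    Assumption: values are compared as strings, which is stricter than
    Kubernetes' own quantity comparison.
    """
    if not containers:
        return False
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in ("cpu", "memory"):
            if name not in limits:
                return False
            # An unset request defaults to the limit.
            if str(requests.get(name, limits[name])) != str(limits[name]):
                return False
    return True


# Resources copied from the "test" container in the pod spec above.
test_container = {
    "name": "test",
    "resources": {
        "requests": {"cpu": 2, "memory": "200M"},
        "limits": {"cpu": 2, "memory": "200M"},
    },
}
print(is_guaranteed([test_container]))
```

On a live cluster the class actually assigned is reported in the pod's `status.qosClass` field.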

- Check that RPS is updated only for virtual (veth) devices:
[root@registry ~]# oc rsh test 
sh-4.4# find /sys/devices/virtual/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008
sh-4.4# find /sys/devices/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008

Both searches list the same devices, so only virtual devices show up with rps_cpus set.
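
For readers unfamiliar with the mask format: rps_cpus is a CPU bitmap printed as comma-separated hexadecimal words, most significant word first. The sketch below decodes the value seen above and compares it with the `reserved: "3"` setting from the profile. The helper names are hypothetical, and the comparison assumes the mask is expected to equal the reserved CPU set.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand a cpuset list such as "0-2,5" into {0, 1, 2, 5}."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def decode_rps_mask(mask: str) -> set[int]:
    """Decode a sysfs CPU bitmap (comma-separated hex words) into CPU ids."""
    value = int(mask.strip().replace(",", ""), 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


observed = decode_rps_mask("0000,00000000,00000008")  # value from the pod
reserved = parse_cpu_list("3")                        # spec.cpu.reserved
print(sorted(observed), observed == reserved)         # prints: [3] True
```

The mask 0x8 has only bit 3 set, so the virtual devices steer receive processing to CPU 3 and away from the isolated CPUs 0-2.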

- Check that the new annotation works properly:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  annotations:
      performance.openshift.io/enable-physical-dev-rps: "true"
  name: manual
spec:
 ...
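
Instead of editing the whole profile, the annotation could also be added with a JSON merge patch. The snippet below is a minimal sketch that only builds the patch document. The function name is hypothetical, and the annotation key and the "true" value are taken from the profile above.

```python
import json

# Annotation key as used in the PerformanceProfile above.
ANNOTATION = "performance.openshift.io/enable-physical-dev-rps"


def physical_rps_patch() -> str:
    """Return a JSON merge patch that sets the annotation to "true"."""
    patch = {"metadata": {"annotations": {ANNOTATION: "true"}}}
    return json.dumps(patch)


print(physical_rps_patch())
```

The printed string could then be passed to something like `oc patch performanceprofile manual --type=merge -p '<patch>'`, assuming the profile is named `manual` as in this report.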

Before setting the annotation, check the following on the node:
sh-4.4# chroot /host
sh-4.4# systemctl list-units -all | grep update-rps@
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask

Then apply the updated profile and check that the physical devices are now listed:

sh-4.4# systemctl list-units -all | grep update-rps@
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
  update-rps  loaded  inactive  dead  Sets network devices RPS mask
sh-4.4# 

Note that after the profile is updated, the physical enoX devices are listed as expected.

Comment 7 errata-xmlrpc 2022-09-12 13:36:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.48 low-latency extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6408

