2108557 – [4.9] [Webscale] High OVS cpu usage causing performance issues

Summary: [4.9] [Webscale] High OVS cpu usage causing performance issues

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Performance Addon Operator
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	4.9.z
Assignee:	Yanir Quinn
QA Contact:	Niranjan Mallapadi Raghavender
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	2108556
Depends On:	2100544
Blocks:	2108556

Reported:	2022-07-19 10:35 UTC by OpenShift BugZilla Robot
Modified:	2022-09-20 15:43 UTC
CC List:	33 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-09-12 13:36:57 UTC
Target Upstream Version:
Embargoed:
Flags:	cback: needinfo-

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift-kni performance-addon-operators pull 922	None	open	[release-4.9] Bug 2108557: Set rps for virtual interfaces only in crio hook	2022-08-15 16:27:11 UTC
Github	openshift-kni performance-addon-operators pull 923	None	open	[release-4.9] Bug 2108557: Fix RPS default physical and virtual settings	2022-08-15 16:27:21 UTC
Red Hat Product Errata	RHBA-2022:6408	None	None	None	2022-09-12 13:37:10 UTC

Comment 2 Yanir Quinn 2022-08-15 16:26:34 UTC

*** Bug 2108556 has been marked as a duplicate of this bug. ***

Comment 5 Shereen Haj Makhoul 2022-09-12 08:19:31 UTC

Verification:

Versions:
OCP: 4.9.48
PAO: 4.9.11-2


Steps: 

- PerformanceProfile (PP):
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  cpu:
    isolated: "0-2"
    reserved: "3"
  realTimeKernel:
    enabled: true
  nodeSelector:
    node-role.kubernetes.io/workercnf: ""
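
The isolated and reserved CPU sets in the profile above are written as CPU lists, while sysfs files such as rps_cpus use hexadecimal bitmasks. The following is a minimal sketch of that conversion. The function names are hypothetical and are not taken from the operator's code.

```python
def parse_cpu_list(spec: str) -> list[int]:
    """Expand a CPU list such as "0-2,5" into [0, 1, 2, 5]."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpu_list_to_mask(spec: str) -> int:
    """Return an integer with one bit set per CPU in the list."""
    mask = 0
    for cpu in parse_cpu_list(spec):
        mask |= 1 << cpu
    return mask


print(hex(cpu_list_to_mask("0-2")))  # isolated set -> 0x7
print(hex(cpu_list_to_mask("3")))    # reserved set -> 0x8
```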

- Guaranteed (GU) pod (the node name is specified for simplicity):
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
spec:
  nodeName: worker-1
  containers:
  - name: test
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f3d7f41e3c7e242e67035f94abb8d0faf35bee1c45449ba9c2712a211670914b
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c"]
    args: [ "while true; do sleep 100000; done;" ]
    resources:
      requests:
        cpu: 2
        memory: "200M"
      limits:
        cpu: 2
        memory: "200M"
  nodeSelector:
    node-role.kubernetes.io/workercnf: ""
  runtimeClassName: performance-manual
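
The pod above is in the Guaranteed QoS class because its CPU and memory requests equal its limits. A simplified sketch of that check follows. It compares the values as plain strings and does not normalize Kubernetes quantities (for example "1000m" versus "1"). The helper name is hypothetical.

```python
def is_guaranteed(pod: dict) -> bool:
    """Simplified check: every container sets cpu and memory limits,
    and any request that is present equals the matching limit."""
    for container in pod["spec"]["containers"]:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for key in ("cpu", "memory"):
            if key not in limits:
                return False
            # An omitted request defaults to the limit.
            if str(requests.get(key, limits[key])) != str(limits[key]):
                return False
    return True


# Mirrors only the resources section of the test pod.
pod = {
    "spec": {
        "containers": [
            {
                "name": "test",
                "resources": {
                    "requests": {"cpu": 2, "memory": "200M"},
                    "limits": {"cpu": 2, "memory": "200M"},
                },
            }
        ]
    }
}

print(is_guaranteed(pod))  # True
```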

- Check that RPS is updated only for veth devices:
[root@registry ~]# oc rsh test 
sh-4.4# find /sys/devices/virtual/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008
sh-4.4# find /sys/devices/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008

Both searches return the same devices, so rps_cpus is set only on virtual devices.
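
The rps_cpus values shown above are comma-separated hexadecimal bitmasks in which bit N stands for CPU N. As a minimal sketch (the function name is hypothetical), the mask can be decoded as follows. The value read in the pod decodes to CPU 3, which matches the reserved set in the profile used for this test.

```python
def rps_mask_to_cpus(mask_text: str) -> list[int]:
    """Decode a sysfs CPU mask such as "0000,00000000,00000008"."""
    # The commas only group the hex digits in 32-bit words.
    value = int(mask_text.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(rps_mask_to_cpus("0000,00000000,00000008"))  # [3]
```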

- Check the new annotation works properly:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  annotations:
    performance.openshift.io/enable-physical-dev-rps: "true"
  name: manual
spec:
 ...

Before setting the annotation, check the following on the node:
sh-4.4# chroot /host
sh-4.4# systemctl list-units -all | grep update-rps@
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask

Then apply the updated profile and check that the physical devices are now listed:

systemctl list-units -all | grep update-rps@
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
sh-4.4# 

Note that after the profile was updated, the enoX devices were listed as expected.
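
The before/after comparison can also be done by counting rows rather than reading the listings by eye. The sketch below counts the update-rps rows in saved `systemctl list-units` output. The function name is hypothetical, and the sample rows use the bare unit name because the instance names are not present in the output captured in this comment. The captured output has 6 rows before the annotation and 14 after.

```python
def count_rps_units(list_units_output: str) -> int:
    """Count rows whose first column starts with "update-rps"."""
    count = 0
    for line in list_units_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("update-rps"):
            count += 1
    return count


# Sample rows modeled on the captured output (instance names unavailable).
ROW = "  update-rps    loaded    inactive dead      Sets network devices RPS mask"
before = "\n".join([ROW] * 6)
after = "\n".join([ROW] * 14)

added = count_rps_units(after) - count_rps_units(before)
print(added)  # 8 additional units after enabling the annotation
```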

Comment 7 errata-xmlrpc 2022-09-12 13:36:57 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.48 low-latency extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6408

ajuarez
amorenoz
anbhat
atheurer
bbennett
bhershbe
browsell
cback
cfields
cgaynor
ctrautma
dacarpen
ealcaniz
eglottma
fbaudin
fdeutsch
ffernand
fleitner
gdiotte
grajaiya
i.maximets
jerward
jhsiao
mkennell
openshift-bugs-escalate
ralongi
rcernin
rkhan
shajmakh
surya
vjaypurk
vnema
yquinn