It looks like we have 2 reserved cores, 1 per NUMA node. The suspected issue is that burstable pods on the affected nodes are running on the reserved cores, which is causing instability with OVS. Within OVS, we've noted numerous upcalls, flow limits being reached, and "Failed to acquire udpif_key corresponding to unexpected flow" messages. This issue does not seem to happen immediately upon introducing load: we were moving log-transformer pods (a log-aggregation-style service) to attempt to induce the error, but only saw results after a little bit of time.

Example snippet:
---------
May 16 19:33:02 worker-00 ovs-vswitchd[3619]: ovs|03050|ofproto_dpif_upcall(revalidator12)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:95aa52e6-93d7-43d2-b8a7-a68bc4c1cc61
May 16 19:33:03 worker-00 ovs-vswitchd[3619]: ovs|02756|ofproto_dpif_upcall(revalidator11)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:3b2808d0-4575-4c67-8335-de91d373d25d
May 16 19:33:13 worker-00 ovs-vswitchd[3619]: ovs|03053|ofproto_dpif_upcall(revalidator12)|WARN|Dropped 673 log messages in last 10 seconds (most recently, 6 seconds ago) due to excessive rate
May 16 19:33:13 worker-00 ovs-vswitchd[3619]: ovs|03054|ofproto_dpif_upcall(revalidator12)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:5d18f80e-58ab-4e51-ac8b-a7add4ccd9b2
May 16 19:33:28 worker-00 ovs-vswitchd[3619]: ovs|03059|ofproto_dpif_upcall(revalidator12)|INFO|Spent an unreasonably long 2296ms dumping flows
May 16 19:33:30 worker-00 ovs-vswitchd[3619]: ovs|03062|ofproto_dpif_upcall(revalidator12)|INFO|Spent an unreasonably long 2020ms dumping flows

From the application side, it results in the following:
----------
E0516 14:24:56.492221 1 connections.go:150] Cannot connect to 192.168.x.x:6380 err: dial tcp 192.168.21.64:6380: i/o timeout
E0516 15:09:55.348744 1 client.go:76] radix.Dial - err: dial tcp 192.168.y.y:6380: i/o timeout
E0516 15:09:55.348779 1 connections.go:150] Cannot connect to 192.168.y.y:6380 err: dial tcp 192.168.y.y:6380: i/o timeout
E0516 15:44:56.296411 1 client.go:76] radix.Dial - err: dial tcp 192.168.z.z:6380: i/o timeout
E0516 15:44:56.296436 1 connections.go:150] Cannot connect to 192.168.x.x:6380 err: dial tcp 192.168.z.z:6380: i/o timeout
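For anyone else chasing the flow-limit / revalidator pressure described above, a rough way to watch it live on the affected node is ovs-appctl (run on the host where ovs-vswitchd runs; this is a sketch, not output captured from this node):

# flow counts, current flow limit and dump duration as seen by the revalidators
ovs-appctl upcall/show
# raw count of installed datapath flows
ovs-appctl dpctl/dump-flows | wc -l

If the flow count sits near the reported flow limit while the revalidators take seconds to dump flows, that lines up with the "Spent an unreasonably long ... dumping flows" messages.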
Currently, we're looking for concrete evidence that points to load being transferred to, or running on, the reserved cores. So far we can only speculate, reasonably, based on the conditions we're seeing.
I'd expect to see either redis or log-transformer processes spilling over onto the reserved cores, if it really is just the usage/execution of those workloads that is causing OVS to bleed in this odd way.
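One way to get that concrete evidence would be to compare, on an affected node, the reserved cpuset against the CPUs the suspect processes are allowed on and actually running on. A sketch only; the process names and the kubelet config path are assumptions and should be adjusted to the node:

# reserved cores as rendered into the kubelet config by the performance profile
grep -i reservedSystemCPUs /etc/kubernetes/kubelet.conf
# allowed CPU list and last-run CPU (psr) per thread for the suspect workloads
for pid in $(pgrep -f 'redis|log-transformer'); do
  grep Cpus_allowed_list /proc/$pid/status
  ps -L -o pid,tid,psr,comm -p $pid
done

If threads belonging to burstable pods keep reporting psr values inside the reserved set, that's the spillover being speculated about.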
*** Bug 2094935 has been marked as a duplicate of this bug. ***
*** Bug 2093267 has been marked as a duplicate of this bug. ***
Verification:
OCP version: 4.11.0-rc.1
Verified on a BM machine with SR-IOV support.

Steps:

- Check that the RPS mask is set only on virtual devices by default: apply the following performance profile and wait for the nodes to be updated:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  cpu:
    isolated: "0-2"
    reserved: "3"
  nodeSelector:
    node-role.kubernetes.io/worker: ""

then apply a gu pod:

apiVersion: v1
kind: Pod
metadata:
  name: test2
  annotations:
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
spec:
  containers:
  - name: test
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c2949f380f89e12ce91655e3b6631b4fa5001f1331f0f85e1713232bcb8e66f1
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c"]
    args: [ "while true; do sleep 100000; done;" ]
    resources:
      requests:
        cpu: 2
        memory: "200M"
      limits:
        cpu: 2
        memory: "200M"
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  runtimeClassName: performance-manual

Log in to the pod and check that the RPS mask is set for the veth devices:

# oc rsh test2
sh-4.4# find /sys/devices/virtual/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008
sh-4.4# find /sys/devices/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008

As can be seen, both commands show only the veth devices.

- Verify the new annotation works properly: when the profile does not enable the annotation, only the virtual devices are seen in systemctl:

sh-4.4# systemctl list-units -all | grep update-rps@
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask

and when the profile enables the annotation, all devices are shown (note that the virtual devices had their names changed upon the update of the profile):

sh-4.4# systemctl list-units -all | grep update-rps@
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps <-- loaded inactive dead Sets network devices RPS mask
update-rps <-- loaded inactive dead Sets network devices RPS mask
update-rps <-- loaded inactive dead Sets network devices RPS mask
update-rps <-- loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask

Bug verified.
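As a sanity check on the masks above: the profile reserves CPU 3, and 0000,00000000,00000008 is exactly bit 3 set (1 << 3 = 0x8), so the virtual devices' RX queues are being steered to the reserved core as expected. Quick bit-math check in plain shell:

$ printf '%x\n' $((1 << 3))
8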
Enabling the annotation in the profile would look as follows:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  annotations:
    performance.openshift.io/enable-physical-dev-rps: "true"   <---
  name: manual
spec:
  ...
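With the annotation enabled, the physical NIC RX queues should also pick up the reserved-core RPS mask, not just lo/veth. A rough way to spot-check that from the node, pruning the virtual devices (a sketch; device paths will vary per machine):

find /sys/devices -path '*/virtual' -prune -o -name rps_cpus -print -exec cat {} \;

Physical queues such as .../net/<nic>/queues/rx-0/rps_cpus should then show the same 0000,00000000,00000008 mask.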
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.11 low-latency extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:5869
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days