Bug 2081852 - [Webscale] High OVS cpu usage causing performance issues
Summary: [Webscale] High OVS cpu usage causing performance issues
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.11.0
Assignee: Yanir Quinn
QA Contact: Niranjan Mallapadi Raghavender
URL:
Whiteboard:
Duplicates: 2093267 2094935
Depends On:
Blocks: 2096703 2100544
 
Reported: 2022-05-04 20:06 UTC by Gabriel Diotte
Modified: 2023-09-18 04:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause:
1. By default, the RPS CPU mask was set to the reserved CPUs taken from the performance profile for all interfaces.
2. A background script that runs as an OCI pre-start hook (low-latency-hooks) for guaranteed pods sets the RPS mask of all network devices visible from the root devices path (/sys/devices).

Consequence:
1. This is not practical for SR-IOV VFs and similar interfaces that carry RAN payload traffic, because it does not scale.
2. The intent of the pre-start hook is to set the RPS mask of virtual interfaces (i.e. the veth), but instead it sets the RPS mask of all network devices visible in the container.

Fix:
1. Set the RPS CPU mask to the reserved CPUs taken from the performance profile, excluding physical and veth interfaces.
   *** NOTE: this is an emergency hook - not to be officially documented ***
   Add a new annotation option to the performance profile: "performance.openshift.io/enable-physical-dev-rps". When added to a performance profile, it enables RPS mask setting with systemd for all network devices by including physical interfaces in the netdev-rps rule.
   Usage example for enabling the RPS mask for all network interfaces: run "oc edit <performanceprofile>" for an existing profile, or add the following annotation when creating a new performance profile:

   - apiVersion: performance.openshift.io/v2
     kind: PerformanceProfile
     metadata:
       annotations:
         performance.openshift.io/enable-physical-dev-rps: "true"
     spec:
       cpu:
         reserved: <reserved cpus set>

   When set to "true": set the RPS mask to the reserved CPU set for all network interfaces.
   When omitted or set to "false": set the RPS mask for network interfaces excluding physical and veth interfaces (default behavior).
2. Change the devices path in the script to set the RPS mask only for virtual interfaces.

Result:
1. RPS mask setting defaults to network interfaces excluding physical and veth devices. An annotation was added to include all network interfaces, as described in the Fix section.
2. Only virtual interfaces will be set with the RPS mask in the CRI-O pre-start hook.

More info can be found in the following slide deck: https://docs.google.com/presentation/d/1ZKlMWwpzI50cdvEOaI1weB1hNhvNyD4HBeOY4ZhTQ3Y/edit?usp=sharing
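
For readers unfamiliar with how a reserved CPU set turns into an RPS mask, here is a minimal Python sketch. It is not the operator's code: the helper names `parse_cpu_list` and `rps_mask` are made up for illustration, and the output always uses full 8-digit hex words, whereas the kernel may print a shorter leading word depending on the CPU count.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Parse a CPU list such as "0-2,7" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def rps_mask(cpus: set[int], total_cpus: int) -> str:
    """Format a CPU set as comma-separated 32-bit hex words, most significant first."""
    value = sum(1 << c for c in cpus)
    words = max(1, (total_cpus + 31) // 32)  # number of 32-bit words needed
    groups = [(value >> (32 * i)) & 0xFFFFFFFF for i in reversed(range(words))]
    return ",".join(f"{g:08x}" for g in groups)


# Reserved CPU "3" on a hypothetical 32-CPU node sets bit 3 only.
print(rps_mask(parse_cpu_list("3"), 32))
```

With reserved CPU 3, only bit 3 is set, so the lowest word is 00000008.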
Clone Of:
Clones: 2100544
Environment:
Last Closed: 2022-08-10 12:16:31 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-node-tuning-operator pull 371 | 0 | None | Merged | Bug 2094935: Set rps for virtual interfaces only in crio hook | 2022-06-23 09:30:50 UTC
Github | openshift cluster-node-tuning-operator pull 377 | 0 | None | Merged | Bug 2093267: Exclude physical interfaces from netdev-rps rule | 2022-06-23 09:31:35 UTC
Github | openshift cluster-node-tuning-operator pull 378 | 0 | None | Merged | Bug 2093267: New annotation for enabling RPS on all devices | 2022-06-23 09:31:35 UTC
Github | openshift cluster-node-tuning-operator pull 483 | 0 | None | Merged | OCPBUGS-3324: RPS mask test fixes | 2022-11-07 14:20:13 UTC
Red Hat Issue Tracker | FD-1978 | 0 | None | None | None | 2022-05-18 14:35:21 UTC
Red Hat Product Errata | RHBA-2022:5869 | 0 | None | None | None | 2022-08-10 12:17:00 UTC

Internal Links: 2108217

Comment 13 Darren Carpenter 2022-05-17 15:28:02 UTC
It looks like we have 2 reserved cores, 1 per NUMA node. The suspected issue is that burstable pods on the affected nodes are running on the reserved cores, which is causing instability with OVS.

Within OVS, we've noted numerous uplink calls, flow limits being reached, along with "Failed to acquire udpif_key"/unexpected flow messages.

This issue does not seem to happen immediately upon introducing load: we moved log-transformer pods (a log-aggregation-like service) to try to induce the error, but only saw results after some time.

example snippet:
---------

May 16 19:33:02 worker-00 ovs-vswitchd[3619]: ovs|03050|ofproto_dpif_upcall(revalidator12)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:95aa52e6-93d7-43d2-b8a7-a68bc4c1cc61
May 16 19:33:03 worker-00 ovs-vswitchd[3619]: ovs|02756|ofproto_dpif_upcall(revalidator11)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:3b2808d0-4575-4c67-8335-de91d373d25d
May 16 19:33:13 worker-00 ovs-vswitchd[3619]: ovs|03053|ofproto_dpif_upcall(revalidator12)|WARN|Dropped 673 log messages in last 10 seconds (most recently, 6 seconds ago) due to excessive rate
May 16 19:33:13 worker-00 ovs-vswitchd[3619]: ovs|03054|ofproto_dpif_upcall(revalidator12)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:5d18f80e-58ab-4e51-ac8b-a7add4ccd9b2
May 16 19:33:28 worker-00 ovs-vswitchd[3619]: ovs|03059|ofproto_dpif_upcall(revalidator12)|INFO|Spent an unreasonably long 2296ms dumping flows
May 16 19:33:30 worker-00 ovs-vswitchd[3619]: ovs|03062|ofproto_dpif_upcall(revalidator12)|INFO|Spent an unreasonably long 2020ms dumping flows
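
To quantify the slow flow dumps in a longer journal capture, a small Python sketch like the following could pull out the reported durations. The function name `slow_dump_durations` is hypothetical; the regular expression only matches the message format shown in the snippet above.

```python
import re

# Matches the revalidator message seen in the ovs-vswitchd log snippet.
DUMP_RE = re.compile(r"Spent an unreasonably long (\d+)ms dumping flows")


def slow_dump_durations(lines):
    """Return the flow-dump durations (in ms) found in the given log lines."""
    durations = []
    for line in lines:
        match = DUMP_RE.search(line)
        if match:
            durations.append(int(match.group(1)))
    return durations


sample = [
    "May 16 19:33:28 worker-00 ovs-vswitchd[3619]: ovs|03059|ofproto_dpif_upcall(revalidator12)|INFO|Spent an unreasonably long 2296ms dumping flows",
    "May 16 19:33:30 worker-00 ovs-vswitchd[3619]: ovs|03062|ofproto_dpif_upcall(revalidator12)|INFO|Spent an unreasonably long 2020ms dumping flows",
]
print(slow_dump_durations(sample))  # [2296, 2020]
```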

From the application side, it results in the following:
----------
E0516 14:24:56.492221       1 connections.go:150] Cannot connect to 192.168.x.x:6380 err: dial tcp 192.168.21.64:6380: i/o timeout
E0516 15:09:55.348744       1 client.go:76] radix.Dial - err: dial tcp 192.168.y.y:6380: i/o timeout
E0516 15:09:55.348779       1 connections.go:150] Cannot connect to 192.168.y.y:6380 err: dial tcp 192.168.y.y:6380: i/o timeou
E0516 15:44:56.296411       1 client.go:76] radix.Dial - err: dial tcp 192.168.z.z:6380: i/o timeout
E0516 15:44:56.296436       1 connections.go:150] Cannot connect to 192.168.x.x:6380 err: dial tcp 192.168.z.z:6380: i/o timeout

Comment 14 Darren Carpenter 2022-05-17 15:29:04 UTC
Currently, we're looking for concrete evidence that load is being transferred to or running on the reserved cores. We seem to be able to reasonably speculate based on conditions.

Comment 15 Darren Carpenter 2022-05-17 15:44:39 UTC
I'd expect to see either redis or log-transformer processes spilling over.... if it is simply the usage/execution of those causing ovs to do some weird bleed.
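
One possible way to look for such spill-over is to sample which CPU each process last ran on (the `psr` column of `ps -eo pid,psr,comm`) and filter by the reserved set. The Python sketch below assumes that approach; the sample output, the reserved CPU ids {0, 40}, and every PID except the ovs-vswitchd one from the log are invented for illustration. `psr` is a point-in-time value, so repeated sampling would be needed to draw conclusions.

```python
def on_reserved(ps_output: str, reserved: set[int]) -> list[tuple[int, int, str]]:
    """Return (pid, cpu, command) for processes whose last-run CPU is reserved."""
    hits = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, psr, comm = line.split(None, 2)
        if int(psr) in reserved:
            hits.append((int(pid), int(psr), comm.strip()))
    return hits


# Hypothetical output of `ps -eo pid,psr,comm`; not taken from the affected node.
SAMPLE = """\
  PID PSR COMMAND
 3619   0 ovs-vswitchd
 8123   0 redis-server
 9001   5 log-transformer
"""
RESERVED = {0, 40}  # assumed ids: one reserved core per NUMA node

for pid, cpu, comm in on_reserved(SAMPLE, RESERVED):
    print(pid, cpu, comm)
```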

Comment 34 Yanir Quinn 2022-06-23 09:27:37 UTC
*** Bug 2094935 has been marked as a duplicate of this bug. ***

Comment 35 Yanir Quinn 2022-06-23 09:29:44 UTC
*** Bug 2093267 has been marked as a duplicate of this bug. ***

Comment 45 Shereen Haj Makhoul 2022-07-14 15:17:32 UTC
Verification:

OCP version: 4.11.0-rc.1
Verified on BM machine with SRIOV support.

Steps:

- Check that the RPS mask is set only on virtual devices by default:

Apply the following performance profile and wait for the nodes to be updated:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  cpu:
    isolated: "0-2"
    reserved: "3"
  nodeSelector:
    node-role.kubernetes.io/worker: ""

Then apply a guaranteed (gu) pod:

apiVersion: v1
kind: Pod
metadata:
  name: test2
  annotations:
     irq-load-balancing.crio.io: "disable"
     cpu-quota.crio.io: "disable"
spec:
  containers:
  - name: test
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c2949f380f89e12ce91655e3b6631b4fa5001f1331f0f85e1713232bcb8e66f1
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c"]
    args: [ "while true; do sleep 100000; done;" ]
    resources:
      requests:
        cpu: 2
        memory: "200M"
      limits:
        cpu: 2
        memory: "200M"
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  runtimeClassName: performance-manual

Log in to the pod and check that the RPS mask is set for the veth:

# oc rsh test2
sh-4.4# find /sys/devices/virtual/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008
sh-4.4# find /sys/devices/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008

As can be seen, in both commands only the virtual devices are shown.
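
To confirm that the value printed in rps_cpus corresponds to the reserved CPU from the profile, the mask can be decoded back into CPU ids. A minimal Python sketch (the helper name `cpus_in_mask` is made up for illustration):

```python
def cpus_in_mask(mask: str) -> list[int]:
    """Decode a comma-separated hex mask, as printed in rps_cpus, into CPU ids."""
    value = int(mask.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(cpus_in_mask("0000,00000000,00000008"))  # [3]
```

The decoded CPU 3 matches `reserved: "3"` in the performance profile applied above.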

- Verify the new annotation works properly:

When the profile doesn't enable the annotation, only the virtual devices are seen in systemctl:

sh-4.4# systemctl list-units -all | grep update-rps@
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask
  update-rps    loaded    inactive dead      Sets network devices RPS mask

When the profile enables the annotation, all devices are shown (note that the virtual devices' names changed when the profile was updated):

sh-4.4# systemctl list-units -all | grep update-rps@
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                                     loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                                    loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                     <--                                                                                                                 loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                     <--                                                                                                                 loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                     <--                                                                                                                  loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                     <--                                                                                                                  loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                                    loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                                    loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                           loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                            loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                                        loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                               loaded    inactive dead      Sets network devices RPS mask                                                                                                        
  update-rps                                                                                                                                                loaded    inactive dead      Sets network devices RPS mask

Bug verified.

Comment 46 Shereen Haj Makhoul 2022-07-14 15:25:54 UTC
Enabling the annotation in the profile looks like the following:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  annotations:
      performance.openshift.io/enable-physical-dev-rps: "true"   <---
  name: manual
spec:
  ...
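
The annotation semantics described in Doc Text ("true" enables RPS on all devices; omitted or "false" keeps the default) can be sketched as a small Python check over a profile loaded as a dictionary. This only illustrates the documented behavior; the function name `physical_rps_enabled` is hypothetical, and the exact string comparison is an assumption about how the operator interprets the value.

```python
ANNOTATION = "performance.openshift.io/enable-physical-dev-rps"


def physical_rps_enabled(profile: dict) -> bool:
    """True only when the annotation is present and set to the string "true"."""
    annotations = (profile.get("metadata") or {}).get("annotations") or {}
    return annotations.get(ANNOTATION) == "true"


profile = {
    "apiVersion": "performance.openshift.io/v2",
    "kind": "PerformanceProfile",
    "metadata": {"name": "manual", "annotations": {ANNOTATION: "true"}},
}
print(physical_rps_enabled(profile))  # True
```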

Comment 50 errata-xmlrpc 2022-08-10 12:16:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.11 low-latency extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5869

Comment 52 Red Hat Bugzilla 2023-09-18 04:36:35 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

