It looks like we have 2 reserved cores, 1 per NUMA node. The suspected issue is that burstable pods on the affected nodes are running on the reserved cores, which is causing instability with OVS. Within OVS, we've noted numerous upcalls, flow limits being reached, and "Failed to acquire udpif_key corresponding to unexpected flow" messages. This issue does not seem to happen immediately upon introducing load: we were moving log-transformer pods (a log-aggregation-style service) to attempt to induce the error, but only saw results after a little bit of time.

Example snippet:
---------
May 16 19:33:02 worker-00 ovs-vswitchd[3619]: ovs|03050|ofproto_dpif_upcall(revalidator12)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:95aa52e6-93d7-43d2-b8a7-a68bc4c1cc61
May 16 19:33:03 worker-00 ovs-vswitchd[3619]: ovs|02756|ofproto_dpif_upcall(revalidator11)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:3b2808d0-4575-4c67-8335-de91d373d25d
May 16 19:33:13 worker-00 ovs-vswitchd[3619]: ovs|03053|ofproto_dpif_upcall(revalidator12)|WARN|Dropped 673 log messages in last 10 seconds (most recently, 6 seconds ago) due to excessive rate
May 16 19:33:13 worker-00 ovs-vswitchd[3619]: ovs|03054|ofproto_dpif_upcall(revalidator12)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (Invalid argument): ufid:5d18f80e-58ab-4e51-ac8b-a7add4ccd9b2
May 16 19:33:28 worker-00 ovs-vswitchd[3619]: ovs|03059|ofproto_dpif_upcall(revalidator12)|INFO|Spent an unreasonably long 2296ms dumping flows
May 16 19:33:30 worker-00 ovs-vswitchd[3619]: ovs|03062|ofproto_dpif_upcall(revalidator12)|INFO|Spent an unreasonably long 2020ms dumping flows

From the application side, it results in the following:
----------
E0516 14:24:56.492221 1 connections.go:150] Cannot connect to 192.168.x.x:6380 err: dial tcp 192.168.21.64:6380: i/o timeout
E0516 15:09:55.348744 1 client.go:76] radix.Dial - err: dial tcp 192.168.y.y:6380: i/o timeout
E0516 15:09:55.348779 1 connections.go:150] Cannot connect to 192.168.y.y:6380 err: dial tcp 192.168.y.y:6380: i/o timeout
E0516 15:44:56.296411 1 client.go:76] radix.Dial - err: dial tcp 192.168.z.z:6380: i/o timeout
E0516 15:44:56.296436 1 connections.go:150] Cannot connect to 192.168.x.x:6380 err: dial tcp 192.168.z.z:6380: i/o timeout
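For anyone else chasing the flow-limit / revalidator pressure described above, a rough way to watch it live on the affected node is ovs-appctl (run on the host where ovs-vswitchd runs; this is a sketch, not output captured from this node):

# flow counts, current flow limit and dump duration as seen by the revalidators
ovs-appctl upcall/show
# raw count of installed datapath flows
ovs-appctl dpctl/dump-flows | wc -l

If the flow count sits near the reported flow limit while the revalidators take seconds to dump flows, that lines up with the "Spent an unreasonably long ... dumping flows" messages.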
Currently, we're looking for concrete evidence that points to load being transferred to, or running on, the reserved cores. So far we can only speculate, reasonably, based on the conditions we're seeing.
I'd expect to see either redis or log-transformer processes spilling over onto the reserved cores, if it really is just the usage/execution of those workloads that is causing OVS to bleed in this odd way.
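One way to get that concrete evidence would be to compare, on an affected node, the reserved cpuset against the CPUs the suspect processes are allowed on and actually running on. A sketch only; the process names and the kubelet config path are assumptions and should be adjusted to the node:

# reserved cores as rendered into the kubelet config by the performance profile
grep -i reservedSystemCPUs /etc/kubernetes/kubelet.conf
# allowed CPU list and last-run CPU (psr) per thread for the suspect workloads
for pid in $(pgrep -f 'redis|log-transformer'); do
  grep Cpus_allowed_list /proc/$pid/status
  ps -L -o pid,tid,psr,comm -p $pid
done

If threads belonging to burstable pods keep reporting psr values inside the reserved set, that's the spillover being speculated about.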
*** Bug 2094935 has been marked as a duplicate of this bug. ***
*** Bug 2093267 has been marked as a duplicate of this bug. ***
Verification:
OCP version: 4.11.0-rc.1
Verified on a BM machine with SR-IOV support.

Steps:

- Check that the RPS mask is set only on virtual devices by default: apply the following performance profile and wait for the nodes to be updated:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  cpu:
    isolated: "0-2"
    reserved: "3"
  nodeSelector:
    node-role.kubernetes.io/worker: ""

then apply a gu pod:

apiVersion: v1
kind: Pod
metadata:
  name: test2
  annotations:
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
spec:
  containers:
  - name: test
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c2949f380f89e12ce91655e3b6631b4fa5001f1331f0f85e1713232bcb8e66f1
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c"]
    args: [ "while true; do sleep 100000; done;" ]
    resources:
      requests:
        cpu: 2
        memory: "200M"
      limits:
        cpu: 2
        memory: "200M"
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  runtimeClassName: performance-manual

Log in to the pod and check that the RPS mask is set for the veth devices:

# oc rsh test2
sh-4.4# find /sys/devices/virtual/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008
sh-4.4# find /sys/devices/ -name rps_cpus -printf '%p\n' -exec cat {} \;
/sys/devices/virtual/net/lo/queues/rx-0/rps_cpus
0000,00000000,00000008
/sys/devices/virtual/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000008

As can be seen, both commands show only the veth devices.

- Verify the new annotation works properly: when the profile does not enable the annotation, only the virtual devices are seen in systemctl:

sh-4.4# systemctl list-units -all | grep update-rps@
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask

and when the profile enables the annotation, all devices are shown (note that the virtual devices had their names changed upon the update of the profile):

sh-4.4# systemctl list-units -all | grep update-rps@
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps <-- loaded inactive dead Sets network devices RPS mask
update-rps <-- loaded inactive dead Sets network devices RPS mask
update-rps <-- loaded inactive dead Sets network devices RPS mask
update-rps <-- loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask
update-rps loaded inactive dead Sets network devices RPS mask

Bug verified.
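As a sanity check on the masks above: the profile reserves CPU 3, and 0000,00000000,00000008 is exactly bit 3 set (1 << 3 = 0x8), so the virtual devices' RX queues are being steered to the reserved core as expected. Quick bit-math check in plain shell:

$ printf '%x\n' $((1 << 3))
8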
Enabling the annotation in the profile would look as follows:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  annotations:
    performance.openshift.io/enable-physical-dev-rps: "true"   <---
  name: manual
spec:
  ...
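With the annotation enabled, the physical NIC RX queues should also pick up the reserved-core RPS mask, not just lo/veth. A rough way to spot-check that from the node, pruning the virtual devices (a sketch; device paths will vary per machine):

find /sys/devices -path '*/virtual' -prune -o -name rps_cpus -print -exec cat {} \;

Physical queues such as .../net/<nic>/queues/rx-0/rps_cpus should then show the same 0000,00000000,00000008 mask.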
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.11 low-latency extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:5869
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days