Bug 2077357

Summary: [release-4.11] 200ms packet delay with OVN controller turn on
Product: OpenShift Container Platform Reporter: Aaron Park <aapark>
Component: Networking Assignee: mcambria <mcambria>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: anusaxen, dcbw, ealcaniz, eglottma, jseunghw, mcambria, openshift-bugs-escalate, xmu, zzhao
Version: 4.9   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Clones: 2079044 Environment:
Last Closed: 2022-08-10 11:08:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2079044    

Description Aaron Park 2022-04-21 07:10:10 UTC
Description of problem:

Call failures and packet drops occurred on OCP 4.8.

After upgrading the cluster to 4.9:
  - No packet drops attributable to upcalls were found.
  - Call failures occurred under all conditions until 4/8, but have not occurred since 4/11.
  - No difference between the 4/8 and 4/11 environments has been identified.

Result with the OVN controller paused:
  - The 200ms delay does not occur.
  - Upcalls do not occur, but out-of-order packets still occur.

The above result was confirmed in the following two cases:
1) Pod eth0 <---> ovs interface: packets sent by the pod are out of order when ovs receives them.
2) genev interface <---> genev interface: packets arrive out of order between the two interfaces.
In both cases the 200ms delay did not occur, and after five or fewer DUP_ACKs the reordering was resolved without retransmission.

Result with the OVN controller turned on:
  - The '200ms delay' caused by out-of-order packets occurs again.
  - Call failures currently do not occur with the OVN controller either turned on or paused, so it is not possible to determine whether they are related.
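
The out-of-order and 200ms observations above could be checked against a packet capture with a small script. What follows is a minimal sketch and not part of the original report. The names `find_reorder_events` and `ReorderEvent` are hypothetical, and so is the input shape: one (arrival time in seconds, TCP sequence number) pair per segment, in capture order. It ignores sequence-number wraparound and does not tell retransmissions apart from true reordering.

```python
from typing import List, NamedTuple, Tuple

DELAY_THRESHOLD_S = 0.200  # the '200ms delay' described in this report


class ReorderEvent(NamedTuple):
    index: int      # position of the late segment in the capture
    seq: int        # TCP sequence number of the late segment
    gap_s: float    # time since the previous segment arrived
    delayed: bool   # True if gap_s reached the threshold


def find_reorder_events(
    segments: List[Tuple[float, int]],
    threshold_s: float = DELAY_THRESHOLD_S,
) -> List[ReorderEvent]:
    """Flag segments whose sequence number is below the highest seen so far.

    Assumption: one flow, one direction, no sequence wraparound.
    """
    events: List[ReorderEvent] = []
    highest_seq = None
    prev_time = None
    for index, (arrival_s, seq) in enumerate(segments):
        if highest_seq is not None and seq < highest_seq:
            # Segment arrived after a later one: out of order.
            gap_s = arrival_s - prev_time
            events.append(
                ReorderEvent(index, seq, gap_s, gap_s >= threshold_s)
            )
        if highest_seq is None or seq > highest_seq:
            highest_seq = seq
        prev_time = arrival_s
    return events
```

Running this on captures taken with the OVN controller paused and turned on would show whether the out-of-order segments are accompanied by a gap of 200ms or more only in the second case.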

Version-Release number of selected component (if applicable):

- OCP 4.9.23 (ovn-kubernetes)
- baremetal 
- IPv6

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

When the reproduction test is performed, the '200ms delay' caused by out-of-order packets occurs.


Expected results:

Identify why so many upcalls occur when the OVN controller is turned on. The '200ms delay' should not occur.
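
One way to quantify the upcalls is to sample the kernel datapath counters twice and compare them. The sketch below is an illustration only and was not used in the original report. It assumes the counters line printed by `ovs-dpctl show` has the form `lookups: hit:N missed:N lost:N`, where `missed` counts packets that missed the datapath flow table and were sent to userspace as upcalls. The helper names `parse_lookups` and `missed_per_second` are hypothetical.

```python
import re
from typing import Dict

# Assumed format of the counters line in `ovs-dpctl show` output.
LOOKUPS_RE = re.compile(r"lookups:\s*hit:(\d+)\s+missed:(\d+)\s+lost:(\d+)")


def parse_lookups(dpctl_output: str) -> Dict[str, int]:
    """Extract the hit/missed/lost counters from `ovs-dpctl show` text."""
    match = LOOKUPS_RE.search(dpctl_output)
    if match is None:
        raise ValueError("no 'lookups:' line found in the given output")
    hit, missed, lost = (int(value) for value in match.groups())
    return {"hit": hit, "missed": missed, "lost": lost}


def missed_per_second(
    before: Dict[str, int], after: Dict[str, int], interval_s: float
) -> float:
    """Average number of datapath misses (upcalls) per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (after["missed"] - before["missed"]) / interval_s
```

Taking two samples a known number of seconds apart, once with the OVN controller paused and once with it turned on, would give comparable upcall rates for the two states.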


Additional info:

Comment 9 errata-xmlrpc 2022-08-10 11:08:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069