@amorenoz has done some really great work on this, so I'll try to summarize his most recent findings in terms of both OpenShift and OVS.

For OpenShift 4.10, we can no longer reproduce this issue.

For OpenShift 4.9, we're hoping https://github.com/openshift/ovn-kubernetes/pull/914 resolves the issue, based on these findings from Adrian's testing.

Taken straight from Adrian regarding his test in OVS to reproduce the issue there:

```
The test is:
- Add 254 interfaces
- For each interface, add about 2k flows (maybe this is too much?). If the
  insertion fails (ovs-ofproto returns != 0), wait 0.5 seconds and retry.

Round 1: Just run the test normally with a clean OVS:
  Mean: 0.013 s
  Max:  0.216 s

Round 2: Do not clean up the previous 254 interfaces and repeat the test:
  Mean: 12.0 s
  Max:  42 s

So, the main culprit right now seems to be the number of stale ports. They are
all re-evaluated on each run of the bridge loop. So, vswitchd reads around 10
new interfaces, loops over the 254 stale ones, each one generating a context
switch due to the ioctl, which makes it process ofp-flows without the benefit
of bulk processing...
```

Also of note, the above delay only occurs on the RT kernel, so I think we should have some other way to track the continued OVS work exploring other issues caused by kernel datapath + RT + CPU pinning and their effects on OVS.

I will re-assign this bug to ovn-kubernetes/myself and move it to ON_QA so that QE can verify that the issue is no longer reproducible on both SNO 4.9 and SNO 4.10.

Thanks,
Andrew
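For reference, the retry-and-measure loop in the test above can be sketched roughly as follows. This is an illustrative Python harness, not Adrian's actual script: `insert_flows_for` is a hypothetical stand-in for whatever invokes the real flow insertion (e.g. shelling out to `ovs-ofctl`), and the function names are my own.

```python
import statistics
import time

def add_flows_with_retry(insert_flows, retry_delay=0.5):
    """Attempt one flow insertion, retrying on failure; return elapsed time.

    `insert_flows` is a placeholder for the real operation; it is expected
    to return 0 on success and non-zero on failure, mirroring the exit
    status of the flow-insertion command.
    """
    start = time.monotonic()
    while insert_flows() != 0:
        time.sleep(retry_delay)  # back off 0.5 s before retrying, as in the test
    return time.monotonic() - start

def run_test(interfaces, insert_flows_for, retry_delay=0.5):
    """Insert flows for each interface, collecting per-interface latencies.

    Returns (mean, max) of the per-interface insertion times, which are the
    two numbers reported for each round of the test.
    """
    durations = [
        add_flows_with_retry(lambda iface=iface: insert_flows_for(iface), retry_delay)
        for iface in interfaces
    ]
    return statistics.mean(durations), max(durations)
```

The key point the numbers illustrate is that the per-insertion latency, not the retry logic, blows up when stale ports accumulate: the same loop that averages ~13 ms against a clean OVS averages ~12 s once 254 stale interfaces are left in place.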
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056