Description of problem:
Restarting ovn-controller when there are an appreciable number of flows causes (new) connections to be interrupted.

Version-Release number of selected component (if applicable):
ovn2.13-2.13.0-39.el7fdp.x86_64

How reproducible:
Very reproducible; scale dependent

Steps to Reproduce:
1. Create a lot of flows. I did this by creating 100 pods, then creating 20 services that all referenced those pods. If you like, I can share a copy-and-paste reproducer.
2. In a new pod, run a simple loop. Something like
   while true; do curl http://<service ip>; sleep 0.5; done
3. Restart ovn-controller on the node hosting the curl pod, e.g.
   oc -n openshift-ovn-kubernetes delete pod ovnkube-node-dw9km

Actual results:
When ovn-controller restarts, new connections are interrupted for about 5 seconds in my test. And this is a small cluster.

Expected results:
New connections (almost) always succeed.

Additional info:
Users at higher scale are punished much more by this, and can experience outages lasting tens of seconds. There is a thread about it on the ovs-devel / ovn-devel mailing lists.
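To put a number on the interruption in step 2, the curl loop can be wrapped so that failed attempts are counted and timestamped. The following is only a sketch: the helper name count_failures and its arguments are hypothetical, not part of any OVN or OpenShift tooling, and the probe command is whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVN/OpenShift): run a probe command
# a fixed number of times and count how many attempts fail.
# Usage: count_failures ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# The body runs in a subshell so its variables do not leak to the caller.
count_failures() (
    attempts=$1
    interval=$2
    shift 2
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard the probe's own output; only its exit status matters.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
            # Timestamp each failure on stderr so the outage window is visible.
            echo "$(date +%T) attempt $i failed" >&2
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    # Print the total on stdout so callers can capture it.
    echo "$failures"
)
```

For example, `count_failures 120 0.5 curl --silent --max-time 2 http://<service ip>` probes for roughly a minute; with a 0.5 s interval, the number of failures times 0.5 gives a rough lower bound on the outage in seconds (failed attempts that wait for the timeout take longer).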
Hi Casey, can you please attach the OVN north db to the BZ? I think that would be helpful for reproducing the issue and for testing the fix. Thanks
Created attachment 1711630 [details] ovn-northd backup
Created attachment 1711634 [details] ovn-northd backup
Wouldn't the recent discussions here https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050520.html be relevant? Also maybe Han's recent patches for incremental flow installation are also relevant? http://patchwork.ozlabs.org/project/openvswitch/list/?series=197009