Bug 1869295
| Summary: | Restarting ovn-controller should not interrupt connectivity | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Casey Callendrello <cdc> |
| Component: | OVN | Assignee: | Numan Siddique <nusiddiq> |
| Status: | NEW --- | QA Contact: | Jianlin Shi <jishi> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | RHEL 8.0 | CC: | ctrautma, dcbw, mmichels, nusiddiq |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Hi Casey,

Can you please attach the OVN north db to the BZ? I think that would be helpful in reproducing the issue and while testing the fix.

Thanks

Created attachment 1711630 [details]
ovn-northd backup
Created attachment 1711634 [details]
ovn-northd backup

Wouldn't the recent discussion at https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050520.html be relevant? Han's recent patches for incremental flow installation may also be relevant: http://patchwork.ozlabs.org/project/openvswitch/list/?series=197009

Description of problem:
Restarting ovn-controller when there are an appreciable number of flows causes (new) connections to be interrupted.

Version-Release number of selected component (if applicable):
ovn2.13-2.13.0-39.el7fdp.x86_64

How reproducible:
Very reproducible; scale dependent

Steps to Reproduce:
1. Create a lot of flows. I did this by creating 100 pods, then creating 20 services that all referenced those pods. If you like, I can share a copy-and-paste reproducer.
2. In a new pod, run a simple loop. Something like
   while true; do curl http://<service ip>; sleep 0.5; done
3. Restart ovn-controller on the node hosting the curl, e.g.
   oc -n openshift-ovn-kubernetes delete pod ovnkube-node-dw9km

Actual results:
When ovn-controller restarts, new connections are interrupted for, in my test, about 5 seconds. And this is a small cluster.

Expected results:
New connections (almost) always succeed.

Additional info:
Users at higher scale are punished much more by this, and can experience outages lasting tens of seconds. There is a thread about it on the ovs-devel / ovn-devel mailing lists.
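
The curl loop in step 2 shows that connections fail, but not for how long. A minimal sketch of a timestamped variant follows, in case it helps when comparing runs or testing a fix. The function names `probe` and `longest_outage`, the log format, and the 2-second curl timeout are assumptions made for illustration and are not part of the original reproducer.

```shell
#!/bin/sh
# probe URL [INTERVAL]: hypothetical helper. Prints one "epoch-seconds ok|fail"
# line per attempt. An attempt counts as "fail" when curl exits non-zero
# (connection refused, timeout, ...); HTTP error statuses still count as "ok".
probe() {
    url=$1
    interval=${2:-0.5}          # same 0.5 s pause as the loop in step 2
    while true; do
        # --max-time 2 is an assumed per-request limit; tune it as needed.
        if curl --silent --output /dev/null --max-time 2 "$url"; then
            status=ok
        else
            status=fail
        fi
        printf '%s %s\n' "$(date +%s)" "$status"
        sleep "$interval"
    done
}

# longest_outage: hypothetical helper. Reads "epoch status" lines on stdin and
# prints the longest span, in seconds, from the first failure of a run to the
# next success (or to the last line, if the log ends during an outage).
longest_outage() {
    awk '
        { last = $1 }
        $2 == "fail" && !down { down = 1; start = $1 }
        $2 == "ok" && down    { down = 0; if ($1 - start > max) max = $1 - start }
        END {
            if (down && last - start > max) max = last - start
            print max + 0
        }
    '
}
```

One way to use it: after sourcing the file inside the test pod, run `probe` against the service URL with its output redirected to a file such as `probe.log`, restart ovn-controller as in step 3, stop the loop with Ctrl-C, and then run `longest_outage < probe.log`. Timestamps have one-second resolution, so the result is only a rough estimate of the outage.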