Bug 1869295 - Restarting ovn-controller should not interrupt connectivity
Summary: Restarting ovn-controller should not interrupt connectivity
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Numan Siddique
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-17 12:41 UTC by Casey Callendrello
Modified: 2023-07-13 07:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments
ovn-northd backup (43.61 KB, application/gzip)
2020-08-17 15:29 UTC, Casey Callendrello
ovn-northd backup (43.76 KB, application/gzip)
2020-08-17 15:40 UTC, Casey Callendrello


Links
System: Red Hat Issue Tracker
ID: FD-817
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2021-10-29 13:58:23 UTC

Description Casey Callendrello 2020-08-17 12:41:21 UTC
Description of problem: Restarting ovn-controller when there is an appreciable number of flows causes (new) connections to be interrupted.


Version-Release number of selected component (if applicable): ovn2.13-2.13.0-39.el7fdp.x86_64


How reproducible: Very reproducible; scale dependent


Steps to Reproduce:
1. Create a lot of flows. I did this by creating 100 pods, then creating 20 services that all referenced those pods. If you like, I can share a copy-and-paste reproducer.

2. In a new pod, run a simple loop, something like:
    while true; do curl http://<service ip>; sleep 0.5; done


3. Restart ovn-controller on the node hosting the curl pod, e.g.:
    oc -n openshift-ovn-kubernetes delete pod ovnkube-node-dw9km
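For step 1, the services could be generated rather than written by hand. This is a minimal sketch and not the reporter's reproducer: the function name gen_services, the label app=flow-test, the service name prefix, and the port numbers are all hypothetical, and it assumes the 100 pods already exist and carry that label.

```shell
#!/usr/bin/env bash
# Sketch only: emit N ClusterIP Service manifests that all select the same pods.
# The default label value, the "flow-test-svc-" name prefix, and the ports
# are hypothetical placeholders, not values taken from this bug.
gen_services() {
  local count="$1" label="${2:-flow-test}"
  local i
  for i in $(seq 1 "$count"); do
    # One YAML document per service, separated by "---".
    cat <<EOF
---
apiVersion: v1
kind: Service
metadata:
  name: flow-test-svc-${i}
spec:
  selector:
    app: ${label}
  ports:
  - port: 80
    targetPort: 8080
EOF
  done
}
```

Something like `gen_services 20 | oc apply -f -` would then create the 20 services. Because every service selects the same pods, each one fans out to all of the backends.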

Actual results:

When ovn-controller restarts, new connections are interrupted for about 5 seconds in my test, and this is a small cluster.
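To put a number on the interruption, the probe loop can log a timestamp with each attempt and the log can be summarized afterwards. This is a minimal sketch under stated assumptions: the function names probe_once and longest_outage are hypothetical, the one-second timeout is arbitrary, and `date +%s.%N` assumes GNU date.

```shell
#!/usr/bin/env bash
# Sketch only: timestamped connectivity probe plus an outage summary.

probe_once() {
  # $1 = URL to probe. Prints "<epoch seconds> ok|fail".
  # curl exits non-zero on connection errors and timeouts; HTTP error
  # statuses still count as "ok" because --fail is not used.
  local url="$1" status=fail
  if curl --silent --output /dev/null --max-time 1 "$url"; then
    status=ok
  fi
  printf '%s %s\n' "$(date +%s.%N)" "$status"
}

longest_outage() {
  # Reads "<timestamp> ok|fail" lines on stdin and prints the longest span,
  # in seconds, from the first failure of a run to the next successful probe.
  # A run of failures that never recovers is not counted.
  awk '
    $2 == "fail" && !down { down = 1; start = $1 }
    $2 == "ok" && down    { d = $1 - start; if (d > max) max = d; down = 0 }
    END                   { printf "%.1f\n", max + 0 }
  '
}
```

A run might look like `while true; do probe_once "http://<service ip>"; sleep 0.5; done | tee probe.log`, followed by `longest_outage < probe.log` once ovn-controller is back. The result is only as precise as the probe interval plus the curl timeout.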


Expected results: New connections (almost) always succeed.


Additional info:

Users at higher scale are hit much harder by this and can experience outages lasting tens of seconds. There is a thread about it on the ovs-devel / ovn-devel mailing lists.

Comment 1 Numan Siddique 2020-08-17 13:47:10 UTC
Hi Casey,

Can you please attach the OVN northbound DB to the BZ?

I think that would be helpful both in reproducing the issue and in testing the fix.

Thanks

Comment 2 Casey Callendrello 2020-08-17 15:29:48 UTC
Created attachment 1711630
ovn-northd backup

Comment 3 Casey Callendrello 2020-08-17 15:40:24 UTC
Created attachment 1711634
ovn-northd backup

Comment 4 Dan Williams 2020-08-24 20:46:37 UTC
Wouldn't the recent discussions here https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050520.html be relevant?

Maybe Han's recent patches for incremental flow installation are also relevant? http://patchwork.ozlabs.org/project/openvswitch/list/?series=197009

