Bug 1748429 - Recommendations on how to use OSP with OVN
Summary: Recommendations on how to use OSP with OVN
Keywords:
Status: CLOSED DUPLICATE of bug 2017906
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: RHOS Documentation Team
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-09-03 14:47 UTC by Daniel Alvarez Sanchez
Modified: 2023-12-15 16:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-02 14:48:55 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-485 0 None None None 2021-12-22 19:16:49 UTC

Description Daniel Alvarez Sanchez 2019-09-03 14:47:58 UTC
In the context of BZ1733374 [0], we've seen that ovn-controller can consume 100% CPU when GARPs are sent from the external network, which seems like a common thing to expect.

Using OSP13 with OVN in certain ways can make ovn-controller take a long time to process the logical flows (in [0] in particular, we saw processing times of more than 23 seconds). It's unclear where ovn-controller is spending the time in this particular setup (maybe ACLs?), but over time we have seen heavy use of certain types of Security Group Rules (such as the "port > 0 and < 65535" kind) that we know unnecessarily increase the number of OVN Logical Flows and hence the processing time.
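As a rough illustration of why a wide port range can cost more than a single port, the sketch below covers an inclusive port range with bitwise (value, mask) pairs, which is the only kind of match an OpenFlow field supports. That OVN and OVS expand range comparisons in exactly this way is an assumption here, and the function name is hypothetical; the point is only that "port > 0 and < 65535" (ports 1 to 65534) cannot be expressed as one masked match, whereas a single port or the full range can.

```python
def range_to_masked_matches(lo: int, hi: int, width: int = 16) -> list[tuple[int, int]]:
    """Cover the inclusive range [lo, hi] with aligned (value, mask) pairs.

    Hypothetical helper for illustration only; not taken from OVN or OVS.
    """
    full = (1 << width) - 1
    matches = []
    while lo <= hi:
        # Largest power-of-two block that can start at lo (alignment limit).
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it no longer overshoots hi.
        while lo + size - 1 > hi:
            size >>= 1
        matches.append((lo, full & ~(size - 1)))
        lo += size
    return matches


if __name__ == "__main__":
    print(len(range_to_masked_matches(22, 22)))     # a single port
    print(len(range_to_masked_matches(0, 65535)))   # every port
    print(len(range_to_masked_matches(1, 65534)))   # "port > 0 and < 65535"
```

Under this model, a rule that allows all ports is far cheaper than one that excludes only the two endpoints, so when "any port" is what is meant, leaving the port range out of the rule is the safer choice.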

The purpose of this BZ is to better understand what we should recommend (and what we should avoid) in OVN setups from an OSP perspective. We will follow up on each of those items in independent BZs if required.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1733374

Comment 1 Daniel Alvarez Sanchez 2019-09-04 12:30:07 UTC
As an example, I have come up with this info for the OSP13 case:



> Just in general, would:
> openstack security group rule create --remote-ip 10.4.0.0/8 --dst-port 5000 --protocol tcp --ingress

> have any advantage over:
> openstack security group rule create --remote-group SG_MME --dst-port 5000 --protocol tcp --ingress

In general, you won't see any difference in terms of number of ACLs:

For the first case, this would be the match expression in the OVN ACL:

match               : "outport == \"f79e33d0-43ad-4f22-9ec1-9747fddd156d\" && ip4 && ip4.src == 192.168.10.0/24 && tcp && tcp.dst == 5000"

And for the second case:

match               : "outport == \"f79e33d0-43ad-4f22-9ec1-9747fddd156d\" && ip4 && ip4.src == $as_ip4_63c9e87a_4ec6_4a4f_a234_245a947ae7cc && tcp && tcp.dst == 5000"

However, in terms of OpenFlow rules (and hence work for ovn-controller), using remote-ip is much more efficient.
Let me illustrate with an example.

I have created two networks, net1 and net2, each with 100 ports. Ports on net1 are in security group sg1, while ports on net2 are in security group sg2.
Only one port from net1 and one port from net2 are bound to this compute node:

# of ACLs:
[root@controller-0 neutron]# docker exec -it ovn-dbs-bundle-docker-0 ovn-nbctl list ACL | grep _uuid -c
1304
# of OF rules in the compute node:
[root@compute-0 ~]# ovs-ofctl dump-flows br-int | wc -l
1376

Now, I'm going to add a security group rule to allow SSH traffic on SG2 coming from SG1:

$ openstack security group rule create sg2 --remote-group sg1 --protocol tcp --dst-port 22

[root@controller-0 neutron]# docker exec -it ovn-dbs-bundle-docker-0 ovn-nbctl list ACL | grep _uuid -c
1402
[root@compute-0 ~]# ovs-ofctl dump-flows br-int | wc -l
1670

I'll delete the previous rule and add an equivalent one that uses remote-ip instead:

[root@controller-0 neutron]# docker exec -it ovn-dbs-bundle-docker-0 ovn-nbctl list ACL | grep _uuid -c
1402
[root@compute-0 ovs]# ovs-ofctl dump-flows br-int | wc -l
1379

As you can see, the number of OpenFlow rules increased by only 3 (1376 to 1379) when using 'remote-ip', but by 294 (1376 to 1670) when referencing a remote security group.
This becomes more noticeable as the number of ports in the SGs increases.

The reason is that, when a rule references a remote SG, ovn-controller installs a flow for each IP address of every port in the referenced SG, so the flow count grows with the size of that group.
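The per-address cost can be sketched with a deliberately simplified model: a match that references an address set is expanded into one concrete match per address, while a CIDR match stays a single entry. The set name `as_sg1`, the addresses, and the helper below are all hypothetical, and the real flow count depends on how ovn-controller compiles the match, so treat this as an illustration of the scaling rather than of the actual implementation.

```python
def expand_address_set(match: str, address_sets: dict[str, list[str]]) -> list[str]:
    """Return one concrete match per address of the first $set referenced.

    Simplified model for illustration; not how ovn-controller is implemented.
    """
    for name, addresses in address_sets.items():
        ref = f"${name}"
        if ref in match:
            return [match.replace(ref, addr) for addr in addresses]
    # No address set referenced: the match is already concrete.
    return [match]


# Hypothetical address set for sg1: 100 ports, one IPv4 address each.
address_sets = {"as_sg1": [f"192.168.10.{i}" for i in range(1, 101)]}

remote_group = "ip4 && ip4.src == $as_sg1 && tcp && tcp.dst == 22"
remote_ip = "ip4 && ip4.src == 192.168.10.0/24 && tcp && tcp.dst == 22"

if __name__ == "__main__":
    print(len(expand_address_set(remote_group, address_sets)))
    print(len(expand_address_set(remote_ip, address_sets)))
```

In this model the remote-group form scales linearly with the number of addresses in the referenced group, while the remote-ip form stays constant no matter how many ports sit behind the CIDR.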

In later versions of OSP13 and OVN, both the number of ACLs and the number of OpenFlow rules are reduced thanks to a feature called Port Groups [0][1].
We are considering moving to OVN 2.11 in OSP13 and bringing in Port Groups as well, since we see a lot of benefit in them.


Thanks,
Daniel

[0] https://docs.openstack.org/networking-ovn/latest/contributor/design/acl_optimizations.html
[1] http://dani.foroselectronica.es/implementing-security-groups-in-openstack-using-ovn-port-groups-478/

Comment 5 Numan Siddique 2020-10-07 08:36:03 UTC
Moving this BZ to networking-ovn. Please move back to OVN if this needs some work in OVN.

Comment 15 James Smith 2021-11-02 14:48:55 UTC

*** This bug has been marked as a duplicate of bug 2017906 ***

