Bug 1533696 - [RFE] Loop protection: implement verification that prohibits the assignment of 2 bonds or 2 interfaces or a combination thereof to the same bridge
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP Team
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-01-12 00:39 UTC by Andreas Karis
Modified: 2022-08-24 09:57 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-17 20:10:29 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-2770 0 None None None 2022-08-24 09:57:39 UTC

Description Andreas Karis 2018-01-12 00:39:33 UTC
Let's assume a catastrophic misconfiguration of the neutron templates which assigns 2 bonds to the same bridge, br-ex.
~~~
ovs-vsctl_-t_5_show
2362197c-5660-4f7e-8d63-78561f670154
    Bridge br-ex
        fail_mode: standalone
        Port "vlan1165"
            tag: 1165
            Interface "vlan1165"
                type: internal
        Port "vlan1166"
            tag: 1166
            Interface "vlan1166"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port "vlan1161"
            tag: 1161
            Interface "vlan1161"
                type: internal
        Port "vlan1163"
            tag: 1163
            Interface "vlan1163"
                type: internal
        Port "vlan1164"
            tag: 1164
            Interface "vlan1164"
                type: internal
        Port "vlan1162"
            tag: 1162
            Interface "vlan1162"
                type: internal
        Port "bond0"
            Interface "bond0"
        Port "bond1"
            Interface "bond1"
    ovs_version: "2.6.1"
~~~

By the way, if the user had been using 2 virtual bridges, br-ex and br-ex-two, for example, and had connected one bond to each, then those bridges would only be connected via a virtual patch cable, via br-int, as soon as neutron comes up and configures the flows. Hence, the same issue would not have happened. But this is not the point here for the time being.

Due to:
https://bugzilla.redhat.com/show_bug.cgi?id=1386299
https://bugzilla.redhat.com/show_bug.cgi?id=1372370

OVS needs to start with `fail_mode: standalone` and with the NORMAL action, meaning that it will flood all traffic and will function without an SDN controller. Once neutron talks to OVS, it will configure flows accordingly, but before this happens, OVS will flood. It's a chicken/egg problem: if the control plane passes via OVS (the default configuration), then neutron and the rest of the control plane components need to talk to the other controllers first. Then, they can configure OVS according to the control plane exchange. If we blocked all traffic until it was configured by an SDN controller (`fail_mode: secure`), then neutron (the SDN controller in this case) could never exchange the control plane status, and hence could not configure OVS correctly.

We have both conditions in the sosreport. The bridge runs in standalone mode:
~~~
fail_mode: standalone
~~~
The only flow on br-ex is the default NORMAL action:
~~~
[akaris@collab-shell sosreport-20180109-142638]$ cat ./oscar01ctr001.bc/sos_commands/openvswitch/ovs-ofctl_dump-flows_br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=39779.969s, table=0, n_packets=9279622416, n_bytes=625420450894, idle_age=1118, priority=0 actions=NORMAL
~~~

But we also **disable STP during the standalone flooding stage**. Spanning-tree is disabled by default, as this lab run shows:
~~~
[root@overcloud-controller-0 ~]# systemctl stop neutron-openvswitch-agent
[root@overcloud-controller-0 ~]# rm -f /etc/openvswitch/conf.db 
[root@overcloud-controller-0 ~]# systemctl restart openvswitch
[root@overcloud-controller-0 ~]# systemctl restart network
[root@overcloud-controller-0 ~]# ovs-vsctl show
d6cdf226-1f72-4008-8e9a-85c284cda586
    Bridge br-ex
        fail_mode: standalone
        Port "eth1"
            Interface "eth1"
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.6.1"
[root@overcloud-controller-0 ~]# ovs-vsctl list Bridge br-ex | grep -i stp
rstp_enable         : false
rstp_status         : {}
stp_enable          : false
[root@overcloud-controller-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=100.066s, table=0, n_packets=3337, n_bytes=464726, idle_age=0, priority=0 actions=NORMAL
[root@overcloud-controller-0 ~]# 
~~~

Of course, this opens the door for users to create catastrophic loops in their networks. We allow the attachment of 2 bonds (or interfaces, for that matter) to the same bridge, then we enable normal L2 behavior and disable spanning-tree.
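
To spot this condition on a node or in a sosreport, the `ovs-vsctl show` output can be scanned for bridges that carry more than one port without a `type:` line. This is only a rough sketch: it assumes that physical uplinks (bonds and NICs) are exactly the ports printed without `type:`, as in the outputs quoted in this bug, and the function names are made up for illustration.

```python
from collections import defaultdict

# Trimmed copy of the br-ex output from the description.
SAMPLE = '''2362197c-5660-4f7e-8d63-78561f670154
    Bridge br-ex
        fail_mode: standalone
        Port "vlan1165"
            tag: 1165
            Interface "vlan1165"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port "bond0"
            Interface "bond0"
        Port "bond1"
            Interface "bond1"
    ovs_version: "2.6.1"
'''


def untyped_ports_per_bridge(show_output):
    """Map each bridge name to its ports that have no 'type:' line.

    Assumption: ports shown without a type (not internal, patch, ...)
    are the physical uplinks.
    """
    bridges = defaultdict(list)
    bridge = None
    port = None
    port_has_type = False

    def flush():
        # Record the port we just finished reading, if it had no type.
        if bridge is not None and port is not None and not port_has_type:
            bridges[bridge].append(port)

    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Bridge "):
            flush()
            bridge = line.split(None, 1)[1].strip('"')
            port = None
            bridges[bridge]  # make sure the bridge is listed even if empty
        elif line.startswith("Port "):
            flush()
            port = line.split(None, 1)[1].strip('"')
            port_has_type = False
        elif line.startswith("type:"):
            port_has_type = True
    flush()
    return dict(bridges)


def loop_candidates(show_output):
    """Return only the bridges with more than one untyped port."""
    ports = untyped_ports_per_bridge(show_output)
    return {name: members for name, members in ports.items() if len(members) > 1}


if __name__ == "__main__":
    for name, members in loop_candidates(SAMPLE).items():
        print(f"{name}: possible L2 loop via {', '.join(members)}")
```

A hit is only a hint that a loop is possible. Whether one actually forms depends on how the upstream switches are wired.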

I think there are two possible solutions for this issue. At least one of them is mandatory; both could be implemented to be safe:

a) fix this via OVS configuration 
==> enable spanning-tree during the flooding state 
==> once neutron takes over, configures the flows and sets `fail_mode: secure`, disable spanning-tree

b) implement an OSP Director verification that prohibits the assignment of 2 bonds or 2 interfaces or a combination thereof to the same bridge
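
To illustrate what the verification in b) could look like, here is a minimal sketch. It is not how OSP Director does or would implement this. It assumes the NIC configuration has already been loaded into an os-net-config style list of dicts, where an `ovs_bridge` entry has a `members` list and uplinks have the types `interface`, `ovs_bond` or `linux_bond`. The function names and the example data are hypothetical.

```python
# Member types treated as physical uplinks (an assumption of this sketch).
UPLINK_TYPES = {"interface", "ovs_bond", "linux_bond"}


def bridges_with_multiple_uplinks(network_config):
    """Return {bridge name: [uplink names]} for bridges with >1 uplink."""
    problems = {}
    for entry in network_config:
        if entry.get("type") != "ovs_bridge":
            continue
        uplinks = [
            member.get("name", "<unnamed>")
            for member in entry.get("members", [])
            if member.get("type") in UPLINK_TYPES
        ]
        if len(uplinks) > 1:
            problems[entry.get("name", "<unnamed>")] = uplinks
    return problems


def validate(network_config):
    """Raise ValueError if any bridge has more than one uplink."""
    problems = bridges_with_multiple_uplinks(network_config)
    if problems:
        details = "; ".join(
            f"{bridge}: {', '.join(uplinks)}"
            for bridge, uplinks in sorted(problems.items())
        )
        raise ValueError(f"more than one bond/interface on a bridge: {details}")


# Hypothetical configuration mirroring the br-ex layout from the description.
EXAMPLE_CONFIG = [
    {
        "type": "ovs_bridge",
        "name": "br-ex",
        "members": [
            {"type": "ovs_bond", "name": "bond0"},
            {"type": "ovs_bond", "name": "bond1"},
            {"type": "vlan", "vlan_id": 1161},
        ],
    },
]

if __name__ == "__main__":
    try:
        validate(EXAMPLE_CONFIG)
    except ValueError as err:
        print(f"validation failed: {err}")
```

VLAN members are deliberately ignored, since they are internal ports on the bridge and not additional uplinks.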

Comment 1 Andreas Karis 2018-01-12 00:41:39 UTC
This bugzilla here is to address:
 b) implement an OSP Director verification that prohibits the assignment of 2 bonds or 2 interfaces or a combination thereof to the same bridge

Comment 2 pweeks 2021-08-17 20:10:29 UTC
customer case closed, no progress for 3 years
reduction in capacity impacting backlog priorities
closing wontfix
please reopen should this require attention.

