Bug 1533697

Summary: Loop protection: Enable spanning-tree in fail-mode: standalone and disable spanning-tree in fail-mode secure
Product: Red Hat OpenStack
Reporter: Andreas Karis <akaris>
Component: openstack-neutron
Assignee: Brent Eagles <beagles>
Status: CLOSED DUPLICATE
QA Contact: Toni Freger <tfreger>
Severity: high
Priority: high
Version: 10.0 (Newton)
CC: amuller, chrisw, jelle.hoylaerts, jlibosva, nyechiel, srevivo
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2018-01-22 14:50:14 UTC
Type: Bug
Bug Blocks: 1533698

Description Andreas Karis 2018-01-12 00:40:59 UTC
Let's assume a catastrophic misconfiguration of the neutron templates that assigns two bonds to the same bridge, br-ex:
~~~
ovs-vsctl_-t_5_show
2362197c-5660-4f7e-8d63-78561f670154
    Bridge br-ex
        fail_mode: standalone
        Port "vlan1165"
            tag: 1165
            Interface "vlan1165"
                type: internal
        Port "vlan1166"
            tag: 1166
            Interface "vlan1166"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port "vlan1161"
            tag: 1161
            Interface "vlan1161"
                type: internal
        Port "vlan1163"
            tag: 1163
            Interface "vlan1163"
                type: internal
        Port "vlan1164"
            tag: 1164
            Interface "vlan1164"
                type: internal
        Port "vlan1162"
            tag: 1162
            Interface "vlan1162"
                type: internal
        Port "bond0"
            Interface "bond0"
        Port "bond1"
            Interface "bond1"
    ovs_version: "2.6.1"
~~~

As an aside: if the user had instead used two virtual bridges, for example br-ex and br-ex-two, with one bond connected to each, those bridges would only be connected through br-int via virtual patch ports, and only once neutron comes up and configures the flows. The same issue would therefore not have happened. That is beside the point here, though.

Due to:
https://bugzilla.redhat.com/show_bug.cgi?id=1386299
https://bugzilla.redhat.com/show_bug.cgi?id=1372370

OVS needs to start with `fail_mode: standalone` and the NORMAL action, meaning that it forwards like an ordinary L2 switch (flooding broadcast and unknown-unicast traffic) and functions without an SDN controller. Once neutron talks to OVS, it configures the flows accordingly, but until that happens, OVS floods. It's a chicken-and-egg problem: if the control plane passes through OVS (the default configuration), then neutron and the rest of the control plane components first need to talk to the other controllers. Only then can they configure OVS according to the control plane exchange. If we blocked all traffic until an SDN controller had configured the bridge (`fail_mode: secure`), then neutron (the SDN controller in this case) could never exchange control plane state, and hence could never configure OVS correctly.

We have:
* 
~~~
fail_mode: standalone
~~~
* 
~~~
[akaris@collab-shell sosreport-20180109-142638]$ cat ./oscar01ctr001.bc/sos_commands/openvswitch/ovs-ofctl_dump-flows_br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=39779.969s, table=0, n_packets=9279622416, n_bytes=625420450894, idle_age=1118, priority=0 actions=NORMAL
~~~

But we also **disable STP during the standalone flooding stage**. Spanning tree is off by default, as this lab reproduction shows:
~~~
[root@overcloud-controller-0 ~]# systemctl stop neutron-openvswitch-agent
[root@overcloud-controller-0 ~]# rm -f /etc/openvswitch/conf.db 
[root@overcloud-controller-0 ~]# systemctl restart openvswitch
[root@overcloud-controller-0 ~]# systemctl restart network
[root@overcloud-controller-0 ~]# ovs-vsctl show
d6cdf226-1f72-4008-8e9a-85c284cda586
    Bridge br-ex
        fail_mode: standalone
        Port "eth1"
            Interface "eth1"
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.6.1"
[root@overcloud-controller-0 ~]# ovs-vsctl list Bridge br-ex | grep -i stp
rstp_enable         : false
rstp_status         : {}
stp_enable          : false
[root@overcloud-controller-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=100.066s, table=0, n_packets=3337, n_bytes=464726, idle_age=0, priority=0 actions=NORMAL
[root@overcloud-controller-0 ~]# 
~~~

Of course, this opens the door for users to create catastrophic loops in their networks: we allow two bonds (or interfaces, for that matter) to be attached to the same bridge, then we enable NORMAL L2 behavior and disable spanning tree.
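
To spot this combination on a running node, something like the following could be used. It is only a sketch: the function name `flooding_without_stp` is made up for illustration, it expects the text output of `ovs-vsctl list Bridge <name>` on stdin, and it only looks at the database fields shown above (it does not inspect the flow table for the NORMAL action).

```shell
#!/bin/sh
# Hypothetical helper, not part of OVS or neutron.
# Reads `ovs-vsctl list Bridge <name>` output on stdin and reports whether
# the bridge is outside fail_mode secure with both STP and RSTP disabled.
flooding_without_stp() {
    awk -F' *: *' '
        # Lines look like "stp_enable          : false"
        $1 == "fail_mode"   { mode = $2 }
        $1 == "stp_enable"  { stp = $2 }
        $1 == "rstp_enable" { rstp = $2 }
        END {
            if (mode != "secure" && stp == "false" && rstp == "false") {
                print "at risk"
                exit 1
            }
            print "ok"
        }
    '
}
```

Usage would be `ovs-vsctl list Bridge br-ex | flooding_without_stp`; a non-zero exit status means the bridge is not in secure mode and has no loop protection.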

I see two solutions for this issue. At least one of them is mandatory; both could be implemented to be safe:

a) fix this via OVS configuration 
==> enable spanning-tree during the flooding state 
==> once neutron takes over, configures the flows and sets `fail-mode: secure`, disable spanning-tree

b) implement an OSP Director verification that prohibits the assignment of 2 bonds or 2 interfaces or a combination thereof to the same bridge
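
A rough sketch of what the check in b) could look like, written against the text output of `ovs-vsctl show` rather than against the Director templates themselves. The function name `check_bridge_uplinks` is hypothetical, and the sketch assumes that any port whose interface has no `type:` line is a physical uplink (NIC or bond), as in the output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of OSP Director.
# Reads `ovs-vsctl show` output on stdin and prints every bridge that has
# more than one port without an explicit interface type. Exits non-zero
# if any such bridge is found.
check_bridge_uplinks() {
    awk '
        function flush_port() {
            # Count the previous port as an uplink if no "type:" line followed it.
            if (port != "" && !typed) uplinks[bridge]++
            port = ""; typed = 0
        }
        $1 == "Bridge" { flush_port(); bridge = $2; gsub(/"/, "", bridge) }
        $1 == "Port"   { flush_port(); port = $2 }
        $1 == "type:"  { typed = 1 }
        END {
            flush_port()
            for (b in uplinks)
                if (uplinks[b] > 1) { print b ": " uplinks[b] " uplinks"; bad = 1 }
            exit bad + 0
        }
    '
}
```

Fed the `ovs-vsctl show` output from the description (`ovs-vsctl show | check_bridge_uplinks`), this would flag br-ex because bond0 and bond1 are both attached to it.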

Comment 1 Andreas Karis 2018-01-12 00:41:58 UTC
This bug addresses the following option from the neutron side of things:
a) fix this via OVS configuration 
==> enable spanning-tree during the flooding state 
==> once neutron takes over, configures the flows and sets `fail-mode: secure`, disable spanning-tree
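
For illustration only, the toggle in a) could be sketched as below. The helper names `stp_for_fail_mode` and `sync_stp` and the `OVS_VSCTL` override are assumptions made for this example, not existing neutron or OVS code; the only OVS commands used are `ovs-vsctl get-fail-mode` and `ovs-vsctl set Bridge <name> stp_enable=<bool>`. An unset fail mode is treated here as standalone.

```shell
#!/bin/sh
# Hypothetical helpers, not part of neutron or OVS.
# OVS_VSCTL can be overridden (for example with "echo") for a dry run.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}

# Map a bridge fail mode to the desired stp_enable value:
# secure -> false (neutron has installed the flows),
# anything else -> true (standalone or unset, i.e. the flooding stage).
stp_for_fail_mode() {
    case "$1" in
        secure) echo false ;;
        *)      echo true ;;
    esac
}

# Read the current fail mode of bridge $1 and set stp_enable to match.
sync_stp() {
    mode=$($OVS_VSCTL get-fail-mode "$1") || return 1
    $OVS_VSCTL set Bridge "$1" "stp_enable=$(stp_for_fail_mode "$mode")"
}
```

Running `sync_stp br-ex` once at OVS startup and again after the agent has switched the bridge to secure mode would give the behavior described in a). Whether the agent itself or a separate hook should do this is left open here.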