+++ This bug was initially created as a clone of Bug #1533697 +++

Let's assume a catastrophic misconfiguration of the neutron templates which assigns 2 bonds to the same bridge, br-ex.

~~~
ovs-vsctl_-t_5_show
2362197c-5660-4f7e-8d63-78561f670154
    Bridge br-ex
        fail_mode: standalone
        Port "vlan1165"
            tag: 1165
            Interface "vlan1165"
                type: internal
        Port "vlan1166"
            tag: 1166
            Interface "vlan1166"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port "vlan1161"
            tag: 1161
            Interface "vlan1161"
                type: internal
        Port "vlan1163"
            tag: 1163
            Interface "vlan1163"
                type: internal
        Port "vlan1164"
            tag: 1164
            Interface "vlan1164"
                type: internal
        Port "vlan1162"
            tag: 1162
            Interface "vlan1162"
                type: internal
        Port "bond0"
            Interface "bond0"
        Port "bond1"
            Interface "bond1"
    ovs_version: "2.6.1"
~~~

By the way, if the user had been using 2 virtual bridges, br-ex and br-ex-two, for example, and had connected one bond to each, then those bridges would only be connected via a virtual patch cable, via br-int, as soon as neutron comes up and configures the flows. Hence, the same issue would not have happened. But this is not the point here for the time being.

Due to:
https://bugzilla.redhat.com/show_bug.cgi?id=1386299
https://bugzilla.redhat.com/show_bug.cgi?id=1372370

OVS needs to start with `fail_mode: standalone` and with the NORMAL action, meaning that it will flood traffic like a regular L2 switch and will function without an SDN controller. Once neutron talks to OVS, it will configure flows accordingly, but before this happens, OVS will flood.

It's a chicken/egg problem: if the control plane passes via OVS (the default configuration), then neutron and the rest of the control plane components need to talk to the other controllers first. Only then can they configure OVS according to the control plane exchange.
If, however, we blocked all traffic until OVS was configured by an SDN controller (`fail_mode: secure`), then neutron (the SDN controller in this case) could never exchange the control plane status, and hence could not configure OVS correctly.

We have:

* ~~~
  fail_mode: standalone
  ~~~
* ~~~
  [akaris@collab-shell sosreport-20180109-142638]$ cat ./oscar01ctr001.bc/sos_commands/openvswitch/ovs-ofctl_dump-flows_br-ex
  NXST_FLOW reply (xid=0x4):
   cookie=0x0, duration=39779.969s, table=0, n_packets=9279622416, n_bytes=625420450894, idle_age=1118, priority=0 actions=NORMAL
  ~~~

But we also **disable STP during the standalone flooding stage**.

From a lab: we do disable spanning-tree by default:

~~~
[root@overcloud-controller-0 ~]# systemctl stop neutron-openvswitch-agent
[root@overcloud-controller-0 ~]# rm -f /etc/openvswitch/conf.db
[root@overcloud-controller-0 ~]# systemctl restart openvswitch
[root@overcloud-controller-0 ~]# systemctl restart network
[root@overcloud-controller-0 ~]# ovs-vsctl show
d6cdf226-1f72-4008-8e9a-85c284cda586
    Bridge br-ex
        fail_mode: standalone
        Port "eth1"
            Interface "eth1"
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.6.1"
[root@overcloud-controller-0 ~]# ovs-vsctl list Bridge br-ex | grep -i stp
rstp_enable : false
rstp_status : {}
stp_enable : false
[root@overcloud-controller-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=100.066s, table=0, n_packets=3337, n_bytes=464726, idle_age=0, priority=0 actions=NORMAL
[root@overcloud-controller-0 ~]#
~~~

Of course, this opens the door for users to create catastrophic loops in their networks. We allow the attachment of 2 bonds (or interfaces, for that matter) to the same bridge, then we enable normal L2 behavior and disable spanning-tree.
I think that the solutions for this issue are the following. At least one of them is mandatory; both could be implemented to be safe:

a) fix this via OVS configuration
==> enable spanning-tree during the flooding state
==> once neutron takes over, configures the flows and sets `fail-mode: secure`, disable spanning-tree

b) implement an OSP Director verification that prohibits the assignment of 2 bonds or 2 interfaces or a combination thereof to the same bridge
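As an illustration of option b), the check could look roughly like the following. This is only a sketch under assumptions: the input is a simplified, hypothetical mapping of bridge name to member list, not the real os-net-config or Director schema, and the function name `find_loop_prone_bridges` and the set of "uplink" type names are made up for the example.

```python
# Sketch of a pre-deployment check for option b).
# ASSUMPTION: `bridges` is a simplified, hypothetical structure
# (bridge name -> list of {"name": ..., "type": ...} members),
# not the actual os-net-config / Director schema.

# Member types treated as physical uplinks in this sketch (illustrative only).
UPLINK_TYPES = {"interface", "ovs_bond", "linux_bond"}


def find_loop_prone_bridges(bridges):
    """Return {bridge: [uplink names]} for bridges with more than one uplink."""
    offenders = {}
    for bridge, members in bridges.items():
        uplinks = [m["name"] for m in members if m["type"] in UPLINK_TYPES]
        if len(uplinks) > 1:
            offenders[bridge] = uplinks
    return offenders


if __name__ == "__main__":
    # Mirrors the misconfiguration from this report: bond0 and bond1 on br-ex.
    config = {
        "br-ex": [
            {"name": "bond0", "type": "ovs_bond"},
            {"name": "bond1", "type": "ovs_bond"},
            {"name": "vlan1161", "type": "vlan"},
        ],
        "br-tenant": [{"name": "eth3", "type": "interface"}],
    }
    for bridge, uplinks in find_loop_prone_bridges(config).items():
        print(f"{bridge}: more than one uplink attached: {uplinks}")
```

VLAN and internal ports are deliberately ignored here, since only multiple physical uplinks on one bridge create the loop described above.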
This bugzilla is to address a):

fix this via OVS configuration
==> enable spanning-tree during the flooding state
==> once neutron takes over, configures the flows and sets `fail-mode: secure`, disable spanning-tree

It covers the tripleo-heat-templates side of things: enabling STP by default whenever `fail-mode: standalone` is set.
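The rule proposed in a) can be sketched as a small helper that derives the STP setting from the bridge's fail mode. This is a minimal sketch, not how tripleo-heat-templates or neutron implement anything: the helper name `stp_command` is hypothetical, and it only builds the `ovs-vsctl set bridge <name> stp_enable=<bool>` command line without running it.

```python
# Sketch of the rule in a): STP on while the bridge is in standalone
# (flooding) mode, STP off once it is in secure mode.
# ASSUMPTION: `stp_command` is a hypothetical helper for illustration;
# it only builds the command line and does not execute it.


def stp_command(bridge, fail_mode):
    """Return the ovs-vsctl argv that sets stp_enable to match fail_mode."""
    if fail_mode not in ("standalone", "secure"):
        raise ValueError(f"unexpected fail_mode: {fail_mode!r}")
    value = "true" if fail_mode == "standalone" else "false"
    return ["ovs-vsctl", "set", "bridge", bridge, f"stp_enable={value}"]


if __name__ == "__main__":
    # Before neutron takes over: bridge floods, so STP should be on.
    print(" ".join(stp_command("br-ex", "standalone")))
    # After neutron has configured flows and set secure mode: STP off.
    print(" ".join(stp_command("br-ex", "secure")))
```

Keeping the decision in one place makes the intended invariant explicit: a bridge should never be in standalone mode with STP disabled.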
This request to change OVS configuration falls under the networking DFG, so moving it there.
*** Bug 1533697 has been marked as a duplicate of this bug. ***
Layering in workarounds for handling unsupported network configuration is not a sustainable approach. A more reasonable approach would be to treat this as an RFE for some form of validation on deployment that the network configuration is reasonable.
Related bug to this one would be:
https://bugzilla.redhat.com/show_bug.cgi?id=1533697 (to fix this with neutron/OVS)

The BZ for the verification of user input is https://bugzilla.redhat.com/show_bug.cgi?id=1533696

~~~
implement an OSP Director verification that prohibits the assignment of 2 bonds or 2 interfaces or a combination thereof to the same bridge
~~~

I agree that 1533696 is the quicker win and easier to fix, so we should focus on that one (I wouldn't call this an RFE though, I still think it's a bug).

I'd still like to see some review of our current bridge configuration eventually, because I think that STP should be enabled there while in normal switch mode (before neutron takes over as a controller).
Andreas, do you feel that we should enable STP on all OVS bridges, or only on the ones used by neutron for tenant traffic? We can absolutely configure bridges with stp_enable passed in through heat templates, but that's error-prone from the user perspective. What are your thoughts on enabling it by default in os-net-config instead?
The current release supports the validations framework, and DFGs are able to create their own functional validations. Moving back to networking to prioritize or close.