Bug 1480242

Summary: OVS 2.6 does not work when bridges using the same bond have the same MAC address
Product: Red Hat OpenStack
Reporter: Jeremy <jmelvin>
Component: openvswitch
Assignee: Eelco Chaudron <echaudro>
Status: CLOSED NOTABUG
QA Contact: Ofer Blaut <oblaut>
Severity: high
Priority: high
Docs Contact:
Version: 10.0 (Newton)
CC: aloughla, apevec, chrisw, echaudro, jmelvin, rhos-maint, srevivo
Target Milestone: ---
Keywords: Unconfirmed
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-03 14:02:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jeremy 2017-08-10 13:20:49 UTC
Description of problem: In an environment running OVS 2.6, br-pub1 and br-pub2 are both attached to bond0 (via VLAN sub-interfaces) and therefore carry duplicate MAC addresses, and the setup does not work. The fix was to change one bridge's MAC address. The same configuration with duplicate MAC addresses works in OVS 2.5.


Version-Release number of selected component (if applicable):
openvswitch-2.6.1-10.git20161206.el7fdp.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create two external network bridges, each using a sub-interface of the same bond, so both uplinks carry the same MAC address (a minimal reproducer sketch follows this list).
2. Because both bridges have the same MAC address, the setup does not work: the connections between the Neutron OVS agent and the bridges keep resetting.
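
A minimal reproducer sketch, assuming a pre-existing bond0 with VLAN sub-interfaces bond0.205 and bond0.216 (interface names taken from the customer case below; they will differ per environment):

  # Create two external bridges and attach a VLAN sub-interface of the same
  # bond to each; both sub-interfaces inherit bond0's MAC address.
  ovs-vsctl --may-exist add-br br-pub1
  ovs-vsctl --may-exist add-br br-pub2
  ovs-vsctl --may-exist add-port br-pub1 bond0.205
  ovs-vsctl --may-exist add-port br-pub2 bond0.216

  # Confirm that both sub-interfaces report the same MAC address:
  ip -o link show bond0.205
  ip -o link show bond0.216

  # Map both bridges in the Neutron OVS agent (bridge_mappings) and restart it:
  systemctl restart neutron-openvswitch-agent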

Actual results:
OVS bridges whose uplink ports share a MAC address do not work in OVS 2.6; the same setup works in OVS 2.5.

Expected results:
Both bridges work despite the duplicate MAC addresses, since both uplinks are sub-interfaces of the same bond.

Additional info:


The Neutron OVS agent is using the native ovsdb_interface (see the configuration sketch below).
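
For reference, a minimal sketch of the relevant agent setting; placing it in the [ovs] section matches the Newton-era option layout, but verify against the deployed openvswitch_agent.ini:

  # /etc/neutron/plugins/ml2/openvswitch_agent.ini (excerpt)
  [ovs]
  ovsdb_interface = native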

### Info from the customer case:

The deployment uses multiple flat networks. With a single external flat network everything works fine. However, when more than one external flat network is configured, the connections between neutron-openvswitch-agent and OVS reset continuously. One way to watch this from the OVS side is sketched below.
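
A diagnostic sketch (bridge names come from this case; the Controller table's is_connected column and ovs-vsctl's --columns option are standard):

  # Show the controller connection state per bridge; with the bug present,
  # is_connected flaps between br-pub1 and br-pub2 every few seconds.
  ovs-vsctl --columns=target,is_connected list Controller
  # or poll it:
  watch -n1 'ovs-vsctl --columns=target,is_connected list Controller'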

Failing environment (with the latest patches applied):

[root@srv-51d3-16 ~]# ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.6.1
OpenFlow versions 0x1:0x4

[root@srv-51d3-12 ~]# cat /etc/neutron/plugins/ml2/openvswitch_agent.ini | grep bridge_mappings
bridge_mappings=OS-PFL-Internet:br-pub2,OS-PFL-OCW:br-pub1

[root@srv-51d3-12 ~]# ovs-vsctl list-br
br-int
br-pub1
br-pub2
br-tun

[root@srv-51d3-12 ~]# cat /var/log/openvswitch/ovs-vswitchd.log
<snip>
2017-08-09T11:14:12.826Z|21764|rconn|INFO|br-pub2<->tcp:127.0.0.1:6633: connected
2017-08-09T11:14:12.827Z|21765|rconn|INFO|br-pub1<->tcp:127.0.0.1:6633: connection closed by peer
2017-08-09T11:14:13.825Z|21766|rconn|INFO|br-pub1<->tcp:127.0.0.1:6633: connecting...
2017-08-09T11:14:13.826Z|21767|rconn|INFO|br-pub1<->tcp:127.0.0.1:6633: connected
2017-08-09T11:14:13.827Z|21768|rconn|INFO|br-pub2<->tcp:127.0.0.1:6633: connection closed by peer
2017-08-09T11:14:21.826Z|21769|rconn|INFO|br-pub2<->tcp:127.0.0.1:6633: connected
2017-08-09T11:14:21.827Z|21770|rconn|INFO|br-pub1<->tcp:127.0.0.1:6633: connection closed by peer
2017-08-09T11:14:22.824Z|21771|rconn|INFO|br-pub1<->tcp:127.0.0.1:6633: connecting...
2017-08-09T11:14:22.826Z|21772|rconn|INFO|br-pub1<->tcp:127.0.0.1:6633: connected
2017-08-09T11:14:22.826Z|21773|rconn|INFO|br-pub2<->tcp:127.0.0.1:6633: connection closed by peer
2017-08-09T11:14:30.826Z|21774|rconn|INFO|br-pub2<->tcp:127.0.0.1:6633: connected
2017-08-09T11:14:30.827Z|21775|rconn|INFO|br-pub1<->tcp:127.0.0.1:6633: connection closed by peer
2017-08-09T11:14:31.824Z|21776|rconn|INFO|br-pub1<->tcp:127.0.0.1:6633: connecting...
2017-08-09T11:14:31.825Z|21777|rconn|INFO|br-pub1<->tcp:127.0.0.1:6633: connected
2017-08-09T11:14:31.826Z|21778|rconn|INFO|br-pub2<->tcp:127.0.0.1:6633: connection closed by peer

[root@srv-51d3-12 ~]# cat /var/log/messages | grep ovs
Aug  9 12:37:00 srv-51d3-16 ovs-vswitchd: ovs|06403|rconn|ERR|br-tun<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting
Aug  9 12:37:00 srv-51d3-16 ovs-vswitchd: ovs|06404|rconn|ERR|br-pub2<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting

The connection is constantly being reset, so the agent is therefore unable to install the OVS flows on br-pub2:
[root@srv-51d3-12 ~]# ovs-ofctl dump-flows br-pub2
NXST_FLOW reply (xid=0x4):

Therefore I cannot reach any instance going through network node srv-51d3-12.


===========================


Working environment:
When br-pub1 is removed from the OVS bridges (and from bridge_mappings), everything works:

[root@srv-51d3-12 ~]# ovs-vsctl --may-exist del-br br-pub1

[root@srv-51d3-12 ~]# cat /etc/neutron/plugins/ml2/openvswitch_agent.ini | grep bridge_mappings
bridge_mappings=OS-PFL-Internet:br-pub2

[root@srv-51d3-12 ~]# systemctl restart neutron-openvswitch-agent neutron-l3-agent

no connection resets in /var/log/openvswitch/ovs-vswitchd.log

It created the flows:
[root@srv-51d3-12 ~]# ovs-ofctl dump-flows br-pub2
NXST_FLOW reply (xid=0x4):
 cookie=0x9fe9ff1beb556098, duration=171.321s, table=0, n_packets=4, n_bytes=304, idle_age=167, priority=4,in_port=2,dl_vlan=3 actions=strip_vlan,NORMAL
 cookie=0x9fe9ff1beb556098, duration=178.966s, table=0, n_packets=602, n_bytes=35636, idle_age=0, priority=2,in_port=2 actions=drop
 cookie=0x9fe9ff1beb556098, duration=179.079s, table=0, n_packets=85, n_bytes=5834, idle_age=2, priority=0 actions=NORMAL

Can ping instance.

===========================
OK, so we dug in a bit deeper and eventually got br-pub1 and br-pub2 working together. We found this workaround by accident, after noticing that a certain combination of bridge mappings (br-pub2 with br-pub-customerX) did work. The only difference between the working pair (br-pub2 and br-pub-customerX) and the failing pair (br-pub2 and br-pub1) is that br-pub-customerX runs on a different bonding interface:

[root@srv-51d3-12 ~]# ovs-vsctl show #copied only relevant information
    Bridge "br-pub2"
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "br-pub2"
            Interface "br-pub2"
                type: internal
        Port "phy-br-pub2"
            Interface "phy-br-pub2"
                type: patch
                options: {peer="int-br-pub2"}
        Port "bond0.216"
            Interface "bond0.216"
    Bridge "br-pub1"
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "bond0.205"
            Interface "bond0.205"
        Port "br-pub1"
            Interface "br-pub1"
                type: internal
        Port "phy-br-pub1"
            Interface "phy-br-pub1"
                type: patch
                options: {peer="int-br-pub1"}
    Bridge br-pub-customerX
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port phy-br-pub-customerX
            Interface phy-br-pub-customerX
                type: patch
                options: {peer=int-br-pub-customerX}
        Port br-pub-customerX
            Interface br-pub-customerX
                type: internal
        Port "bond2"
            Interface "bond2"

So, as you can see, br-pub-customerX runs on bond2, while br-pub1 and br-pub2 run on bond0.205 and bond0.216 respectively.

Now, the only difference between bond2 and the other bond interfaces is that bond2 has a unique MAC address:
[root@srv-51d3-12 ~]# ip a #copied only relevant information
27: bond0.205@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP qlen 1000
    link/ether 00:90:fa:ae:1a:cd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:feae:1acb/64 scope link 
       valid_lft forever preferred_lft forever
29: bond0.216@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP qlen 1000
    link/ether 00:90:fa:ae:1a:cd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::290:faff:feae:1acd/64 scope link 
       valid_lft forever preferred_lft forever
22: bond2: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP qlen 1000
    link/ether 0c:c4:7a:bc:65:90 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ec4:7aff:febc:6590/64 scope link 
       valid_lft forever preferred_lft forever

So bond0.216 and bond0.205 both have "00:90:fa:ae:1a:cd", while bond2 has "0c:c4:7a:bc:65:90" (see the check sketched below).
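
A quick check from the OVS side (a diagnostic sketch; mac_in_use is a read-only column of the Interface table that reports the MAC actually in use):

  # Compare the MAC each bridge's local (internal) port ended up with; if the
  # bridges inherited it from the bond sub-interfaces, the failing pair is
  # expected to show the same address here.
  ovs-vsctl get Interface br-pub1 mac_in_use
  ovs-vsctl get Interface br-pub2 mac_in_use
  ovs-vsctl get Interface br-pub-customerX mac_in_use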

So we changed the last octet of the MAC address on bond0.205:
[root@srv-51d3-12 ~]# ip link set addr 00:90:fa:ae:1a:cb dev bond0.205
[root@srv-51d3-12 ~]# ovs-vsctl --may-exist add-br br-pub1
[root@srv-51d3-12 ~]# ovs-vsctl --may-exist add-port br-pub1 bond0.205
[root@srv-51d3-12 ~]# systemctl restart neutron-openvswitch-agent neutron-l3-agent.service
[root@srv-51d3-12 ~]# cat /etc/neutron/plugins/ml2/openvswitch_agent.ini | grep bridge_mappings
bridge_mappings=OS-PFL-Internet:br-pub2,OS-PFL-OCW:br-pub1

And the problems are gone! So Open vSwitch 2.6.1 apparently has trouble when multiple bridges sit on different bond interfaces that share the same MAC address. A sketch for persisting the workaround follows.
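
If the MAC change needs to persist across reboots, one option (a sketch, not a verified fix for this case; other-config:hwaddr on the Bridge table is the standard way to pin a bridge MAC, but whether pinning it is acceptable here is an assumption) is to set a distinct hardware address on one bridge directly in OVSDB instead of changing it on the kernel interface:

  # Pin br-pub1's MAC so it no longer collides with br-pub2's.
  # 00:90:fa:ae:1a:cb is the example address used in the workaround above.
  ovs-vsctl set bridge br-pub1 other-config:hwaddr=\"00:90:fa:ae:1a:cb\"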