Bug 1393562

Summary: OVS VXLAN port does not receive packets in OSP director ODL deployment
Product: Red Hat OpenStack
Reporter: Tim Rozet <trozet>
Component: openstack-tripleo-heat-templates
Assignee: Tim Rozet <trozet>
Status: CLOSED ERRATA
QA Contact: Itzik Brown <itbrown>
Severity: high
Docs Contact:
Priority: high
Version: 10.0 (Newton)
CC: apevec, chrisw, egarver, jschluet, mburns, nyechiel, rhel-osp-director-maint, rhos-maint, srevivo, trozet
Target Milestone: rc
Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-5.1.0-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
N/A
Last Closed: 2016-12-14 16:31:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1258832

Description Tim Rozet 2016-11-09 21:07:46 UTC
Description of problem:
VXLAN packets destined for a host are received by the host's Ethernet interface, but the vxlan sys port does not pick them up.


Version-Release number of selected component (if applicable):

OSP stable/newton overcloud

[root@controller-0 ~]# cat /etc/*release*
cat: /etc/lsb-release.d: Is a directory
NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.3 (Maipo)"

[root@controller-0 ~]# uname -a
Linux controller-0.localdomain 3.10.0-513.el7.x86_64 #1 SMP Wed Oct 12 09:41:28 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@controller-0 ~]# rpm -q openvswitch
openvswitch-2.5.0-14.git20160727.el7fdb.x86_64

How reproducible:
Reproduced 3 times (on different setups) with a single compute and single control node

Steps to Reproduce:
1.  Configure bridges with vxlan ports on each node.
2.  Verify ping between nodes across IPs used for vxlan ports.
3.  Configure the bridge's local port with an IP on each node.
4.  Ping from one local bridge port to the other.
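
The steps above can be sketched as a small script. This is only an illustration, not the exact commands used on the reported setups: the `run` and `setup_node` helpers are hypothetical, the bridge and port names (br1, vxlan0) and addresses are taken from the outputs below, and the /24 prefix is inferred from the netmask shown there. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps (assumed commands, not taken from the report).
# Prints the commands by default; set DRY_RUN=0 and run as root to execute them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper.
# usage: setup_node BRIDGE_CIDR REMOTE_UNDERLAY_IP REMOTE_BRIDGE_IP
setup_node() {
    bridge_cidr="$1"
    remote_underlay="$2"
    remote_bridge="$3"
    # Step 1: bridge with a VXLAN port pointing at the peer's underlay address.
    run ovs-vsctl add-br br1
    run ovs-vsctl add-port br1 vxlan0 -- set interface vxlan0 type=vxlan "options:remote_ip=${remote_underlay}"
    # Step 2: check underlay reachability between the VXLAN endpoints.
    run ping -c 3 "$remote_underlay"
    # Step 3: put an IP on the bridge's local port.
    run ip addr add "$bridge_cidr" dev br1
    run ip link set br1 up
    # Step 4: ping the peer's bridge local port across the overlay.
    run ping -c 3 "$remote_bridge"
}

# Control node values from the outputs below; on the compute node the
# equivalent call would be: setup_node 16.0.0.1/24 15.0.0.1 16.0.0.2
setup_node 16.0.0.2/24 15.0.0.2 16.0.0.1
```

With the firewall problem present, step 2 succeeds and step 4 fails, which matches the results reported below.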

Actual results:
The ARP request packet is seen on ingress to the node, with a correct VXLAN header.  However, the packet never gets picked up by the OVS vxlan port and does not make it to the OVS local port.

Expected results:
ARP request enters OVS bridge to local port and ARP reply is sent back to the opposite node.

Additional info:

Outputs:
###control node###
[root@controller-0 ~]# ifconfig br1
br1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 16.0.0.2  netmask 255.255.255.0  broadcast 16.0.0.255
        inet6 fe80::7cac:ccff:fed5:bb4e  prefixlen 64  scopeid 0x20<link>
        ether 7e:ac:cc:d5:bb:4e  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5  bytes 438 (438.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@controller-0 ~]# ifconfig eth1
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 15.0.0.1  netmask 255.255.255.0  broadcast 15.0.0.255
        inet6 fe80::5054:ff:feb7:e86d  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:b7:e8:6d  txqueuelen 1000  (Ethernet)
        RX packets 19610  bytes 975230 (952.3 KiB)
        RX errors 0  dropped 9114  overruns 0  frame 0
        TX packets 365  bytes 41420 (40.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@controller-0 ~]# ovs-vsctl show
adad57f0-efae-4976-9f32-3d2e9a3af3e2
    Manager "ptcp:6640"
    Bridge "br1"
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {remote_ip="15.0.0.2"}
        Port "br1"
            Interface "br1"
                type: internal
    ovs_version: "2.5.0"

[root@controller-0 ~]# ping 15.0.0.2
PING 15.0.0.2 (15.0.0.2) 56(84) bytes of data.
64 bytes from 15.0.0.2: icmp_seq=1 ttl=64 time=0.272 ms
64 bytes from 15.0.0.2: icmp_seq=2 ttl=64 time=0.226 ms
64 bytes from 15.0.0.2: icmp_seq=3 ttl=64 time=0.213 ms
^C
--- 15.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 0.213/0.237/0.272/0.025 ms

###compute node ####

[root@compute-0 ~]# ifconfig br1
br1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 16.0.0.1  netmask 255.255.255.0  broadcast 16.0.0.255
        inet6 fe80::2887:54ff:fe7f:6648  prefixlen 64  scopeid 0x20<link>
        ether 2a:87:54:7f:66:48  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 356  bytes 15180 (14.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@compute-0 ~]# ifconfig eth1
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 15.0.0.2  netmask 255.255.255.0  broadcast 15.0.0.255
        inet6 fe80::5054:ff:fe79:998a  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:79:99:8a  txqueuelen 1000  (Ethernet)
        RX packets 9742  bytes 532432 (519.9 KiB)
        RX errors 0  dropped 249  overruns 0  frame 0
        TX packets 10352  bytes 494510 (482.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@compute-0 ~]# ovs-vsctl show
9477dad5-0379-4d18-988d-c2416ce67726
    Manager "ptcp:6640"
    Bridge "br1"
        Port "br1"
            Interface "br1"
                type: internal
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {remote_ip="15.0.0.1"}
    ovs_version: "2.5.0"


####ping from compute br1 port to br1 port on control node###
[root@compute-0 ~]# ping 16.0.0.2
PING 16.0.0.2 (16.0.0.2) 56(84) bytes of data.
From 16.0.0.1 icmp_seq=1 Destination Host Unreachable
From 16.0.0.1 icmp_seq=2 Destination Host Unreachable
From 16.0.0.1 icmp_seq=3 Destination Host Unreachable

####capture of arp request on control node ETH1###
[root@controller-0 ~]# tcpdump -i eth1 port 4789 -e -xx -n 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
20:45:02.823458 52:54:00:79:99:8a > 52:54:00:b7:e8:6d, ethertype IPv4 (0x0800), length 92: 15.0.0.2.57311 > 15.0.0.1.4789: VXLAN, flags [I] (0x08), vni 0
2a:87:54:7f:66:48 > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 16.0.0.2 tell 16.0.0.1, length 28
        0x0000:  5254 00b7 e86d 5254 0079 998a 0800 4500
        0x0010:  004e e5eb 4000 4011 36b1 0f00 0002 0f00
        0x0020:  0001 dfdf 12b5 003a 0000 0800 0000 0000
        0x0030:  0000 ffff ffff ffff 2a87 547f 6648 0806
        0x0040:  0001 0800 0604 0001 2a87 547f 6648 1000
        0x0050:  0001 0000 0000 0000 1000 0002
^C
1 packet captured
1 packet received by filter
0 packets dropped by kernel
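
As a cross-check, the outer headers of the frame captured above can be decoded by hand. The sketch below is illustrative only: the `field` helper is hypothetical, the bytes are copied from the first four rows of the hex dump, and the offsets assume an untagged Ethernet frame with a 20-byte IPv4 header (the dump starts the IP header with 0x45).

```shell
#!/bin/sh
# Decode the outer UDP and VXLAN headers of the captured frame.
set -eu

# First 64 bytes of the tcpdump -xx output above, whitespace removed.
frame="525400b7e86d52540079998a08004500"
frame="${frame}004ee5eb4000401136b10f0000020f00"
frame="${frame}0001dfdf12b5003a0000080000000000"
frame="${frame}0000ffffffffffff2a87547f66480806"

# Hypothetical helper: print LEN bytes starting at byte OFFSET as hex digits.
field() {
    start=$(( $1 * 2 + 1 ))
    end=$(( ($1 + $2) * 2 ))
    printf '%s' "$frame" | cut -c "${start}-${end}"
}

# Offsets: 14-byte Ethernet header, then a 20-byte IPv4 header (assumed, IHL=5).
ip_proto=$(( 0x$(field 23 1) ))     # IPv4 protocol field
udp_src=$(( 0x$(field 34 2) ))      # UDP source port
udp_dst=$(( 0x$(field 36 2) ))      # UDP destination port
vxlan_flags=$(( 0x$(field 42 1) ))  # VXLAN flags byte (0x08 = VNI valid)
vni=$(( 0x$(field 46 3) ))          # 24-bit VXLAN network identifier

echo "ip_proto=${ip_proto} udp_src=${udp_src} udp_dst=${udp_dst} vxlan_flags=${vxlan_flags} vni=${vni}"
```

The decoded values agree with tcpdump's own summary line (UDP 57311 > 4789, flags 0x08, vni 0), so the frame arriving on eth1 is a well-formed VXLAN packet for the standard port.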

###capture of arp request on control node br1 (nothing)###
[root@controller-0 ~]# tcpdump -i br1 -e -xx -n 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

###controller ovs stats##
[root@controller-0 ~]# ovs-ofctl dump-ports br1
OFPST_PORT reply (xid=0x2): 2 ports
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=5, bytes=438, drop=0, errs=0, coll=0
  port  2: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=5, bytes=438, drop=0, errs=0, coll=0

###capture on compute node ETH1####
[root@compute-0 ~]# tcpdump -i eth1 port 4789 -n -xx -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
20:47:50.864632 52:54:00:79:99:8a > 52:54:00:b7:e8:6d, ethertype IPv4 (0x0800), length 92: 15.0.0.2.57311 > 15.0.0.1.4789: VXLAN, flags [I] (0x08), vni 0
2a:87:54:7f:66:48 > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 16.0.0.2 tell 16.0.0.1, length 28
        0x0000:  5254 00b7 e86d 5254 0079 998a 0800 4500
        0x0010:  004e 4067 4000 4011 dc35 0f00 0002 0f00
        0x0020:  0001 dfdf 12b5 003a 0000 0800 0000 0000
        0x0030:  0000 ffff ffff ffff 2a87 547f 6648 0806
        0x0040:  0001 0800 0604 0001 2a87 547f 6648 1000
        0x0050:  0001 0000 0000 0000 1000 0002
^C
1 packet captured
1 packet received by filter
0 packets dropped by kernel

###capture on compute node br1###
[root@compute-0 ~]# tcpdump -i br1 -n -xx -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br1, link-type EN10MB (Ethernet), capture size 65535 bytes
20:48:35.877332 2a:87:54:7f:66:48 > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 16.0.0.2 tell 16.0.0.1, length 28
        0x0000:  ffff ffff ffff 2a87 547f 6648 0806 0001
        0x0010:  0800 0604 0001 2a87 547f 6648 1000 0001
        0x0020:  0000 0000 0000 1000 0002

###compute node ovs stats###
[root@compute-0 ~]# ovs-ofctl dump-ports br1
OFPST_PORT reply (xid=0x2): 2 ports
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=596, bytes=25260, drop=0, errs=0, coll=0
  port  1: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=596, bytes=25260, drop=0, errs=0, coll=0

Comment 1 Eric Garver 2016-11-10 22:08:15 UTC
Hi Tim,

I looked at your setup today. The VXLAN UDP ports were being blocked by iptables. Adding an exception allowed traffic to pass on the overlay.

[heat-admin@compute-0 ~]$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     udp  --  anywhere             anywhere             udp dpt:4789
...


[heat-admin@compute-0 ~]$ ping 16.0.0.1
PING 16.0.0.1 (16.0.0.1) 56(84) bytes of data.
64 bytes from 16.0.0.1: icmp_seq=1 ttl=64 time=0.611 ms
64 bytes from 16.0.0.1: icmp_seq=2 ttl=64 time=0.206 ms
64 bytes from 16.0.0.1: icmp_seq=3 ttl=64 time=0.236 ms
^C
--- 16.0.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.206/0.351/0.611/0.184 ms
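
For reference, an exception like the one shown in the iptables listing above could be added along these lines. This is a sketch under assumptions: the exact command used on the setup is not recorded in this report, the `allow_vxlan` helper is hypothetical, and a rule added this way does not persist across reboots or firewall reconfiguration. By default the script only prints the command.

```shell
#!/bin/sh
# Allow inbound VXLAN (UDP 4789) through iptables, as a manual workaround.
# Prints the command by default; set DRY_RUN=0 and run as root to apply it.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper. usage: allow_vxlan [UDP_PORT]
allow_vxlan() {
    port="${1:-4789}"
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "iptables -I INPUT -p udp --dport ${port} -j ACCEPT"
        return 0
    fi
    # -C exits non-zero when the rule is absent, so insert only if missing.
    iptables -C INPUT -p udp --dport "$port" -j ACCEPT 2>/dev/null ||
        iptables -I INPUT -p udp --dport "$port" -j ACCEPT
}

allow_vxlan 4789
```

The rule has to be present on every node that terminates a VXLAN tunnel, since each side drops the other's encapsulated traffic otherwise.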

Comment 2 Tim Rozet 2016-11-11 18:44:53 UTC
Thanks Eric for debugging on my setup.  As you thought, it looks like there is a bug in how the firewall is being configured with TripleO.  I filed it upstream as
https://bugs.launchpad.net/tripleo/+bug/1641191

There is no bug in OVS or the kernel. The problem is that VXLAN traffic is blocked by iptables because the TripleO firewall is not configured to allow it when the neutron OVS agent is not in use.

Going to move this bug to OSP Director and provide a fix upstream.

Comment 3 Nir Yechiel 2016-11-14 09:44:44 UTC
The code is merged in the master branch. It will need to be backported to stable/newton.

Comment 4 Nir Yechiel 2016-11-14 09:52:41 UTC
(In reply to Tim Rozet from comment #2)
> Thanks Eric for debugging on my the setup.  As you thought it looks like
> there is a bug in how firewall is being configured with TripleO.  I filed it
> upstream as
> https://bugs.launchpad.net/tripleo/+bug/1641191
> 
> There is no bug with OVS or the kernel, the problem is VXLAN traffic is
> being blocked by iptables because TripleO firewall is not configured to
> allow it if neutron OVS agent is not being used.
> 
> Going to move this bug to OSP Director and provide a fix upstream.

Thanks Eric and Tim for the collaboration and quick turnaround!

/Nir

Comment 6 Itzik Brown 2016-11-21 07:06:18 UTC
Verified with openstack-tripleo-heat-templates-5.1.0-3.el7ost.noarch

Comment 9 errata-xmlrpc 2016-12-14 16:31:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html