Bug 1123458 - setting libvirt_vif_driver to LibvirtHybridOVSBridgeDriver causes multicast to fail
Summary: setting libvirt_vif_driver to LibvirtHybridOVSBridgeDriver causes multicast to fail
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.0
Assignee: Brent Eagles
QA Contact: Ami Jeain
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-07-25 17:57 UTC by Jeff Dexter
Modified: 2019-09-09 14:49 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-06 16:56:32 UTC
Target Upstream Version:
Embargoed:


Description Jeff Dexter 2014-07-25 17:57:02 UTC
Description of problem:
Using the LibvirtGenericVIFDriver driver causes multicast traffic to be dropped after about 200 seconds.

Disabling multicast snooping on the Linux bridges that the tap devices are attached to works around the problem:

[root@gss-rhos-4 ~]# echo 0 > /sys/devices/virtual/net/tap751c39bc-db/brport/bridge/bridge/multicast_snooping
[root@gss-rhos-4 ~]# echo 0 > /sys/devices/virtual/net/tap18b1e1ef-5a/brport/bridge/bridge/multicast_snooping
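The commands above target two specific tap devices by name. Before changing anything on a host, it can help to list which tap devices sit on a Linux bridge that still has snooping enabled. The following is a minimal read-only sketch, assuming the instance interfaces are named tap* (as in this report) and the standard sysfs layout under /sys/class/net. The function name report_tap_snooping and its optional sysfs-root argument are hypothetical; the argument exists only so the function can be tried against a mock directory tree.

```shell
#!/bin/sh
# Hypothetical diagnostic sketch: for every tap device attached to a Linux
# bridge, print "<tap> <bridge> <multicast_snooping>" (1 = snooping enabled).
# The optional first argument (sysfs net root) is an assumption added for testing.
report_tap_snooping() {
    net="${1:-/sys/class/net}"
    for brport in "$net"/tap*/brport; do
        # Skip the literal glob when no tap device is a bridge port.
        [ -d "$brport" ] || continue
        tap=$(basename "$(dirname "$brport")")
        # brport/bridge is a symlink to the sysfs directory of the owning bridge.
        bridge=$(basename "$(readlink -f "$brport/bridge")")
        snooping=$(cat "$brport/bridge/bridge/multicast_snooping")
        printf '%s %s %s\n' "$tap" "$bridge" "$snooping"
    done
}
```

On a compute node, calling report_tap_snooping with no argument prints one line per bridged tap device; a trailing 1 means snooping is still enabled on that tap's bridge.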

Version-Release number of selected component (if applicable):
Havana (current release)



How reproducible:
Always

Steps to Reproduce:
1. Set libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtHybridOVSBridgeDriver
2. On node1 and node2, which are the multicast receivers, run: iperf -s -u -B 224.0.67.67 -i 1
3. On node3, which is the multicast sender, run: iperf -c 224.0.67.67 -u --ttl 5 -t 3600
4. On node3, the following output appears:

------------------------------------------------------------
Client connecting to 224.0.67.67, UDP port 5001
Sending 1470 byte datagrams
Setting multicast TTL to 5
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.11.8 port 35976 connected with 224.0.67.67 port 5001



Actual results:
Node1 and node2 receive multicast traffic for roughly the first 200 seconds, after which reception stops. Output around the 200-second mark:

[  3] 197.0-198.0 sec   128 KBytes  1.05 Mbits/sec   0.036 ms    0/   89 (0%)
[  3] 198.0-199.0 sec   129 KBytes  1.06 Mbits/sec   0.046 ms    0/   90 (0%)
[  3] 199.0-200.0 sec   128 KBytes  1.05 Mbits/sec   0.041 ms    0/   89 (0%)
[  3] 200.0-201.0 sec   128 KBytes  1.05 Mbits/sec   0.043 ms    0/   89 (0%)



Expected results:
Multicast traffic continues to be received for the whole duration of the test, for example:

[  3] 958.0-959.0 sec   128 KBytes  1.05 Mbits/sec   0.015 ms    0/   89 (0%)
[  3] 959.0-960.0 sec   128 KBytes  1.05 Mbits/sec   0.019 ms    0/   89 (0%)
[  3] 960.0-961.0 sec   128 KBytes  1.05 Mbits/sec   0.012 ms    0/   89 (0%)
[  3] 961.0-962.0 sec   129 KBytes  1.06 Mbits/sec   0.015 ms    0/   90 (0%)
[  3] 962.0-963.0 sec   128 KBytes  1.05 Mbits/sec   0.027 ms    0/   89 (0%)
[  3] 963.0-964.0 sec   128 KBytes  1.05 Mbits/sec   0.025 ms    0/   89 (0%)


Additional info:
Switching to libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtGenericVIFDriver is not an option, because it does not provide security group support in the customer's deployment.

Comment 1 Brent Eagles 2014-07-28 15:40:31 UTC
Is there an error in the description of this BZ? The BZ title indicates the generic VIF driver, but the steps to reproduce indicate configuring the HybridOVSBridgeDriver.

Comment 2 Brent Eagles 2014-07-28 15:49:05 UTC
Please see above comment for reason for NEEDINFO.

Comment 3 Brent Eagles 2014-07-28 18:30:58 UTC
Clearing NEEDINFO. The hybrid driver does appear to have been configured, as would have been required for security group support in the release indicated.

Comment 4 Brent Eagles 2014-07-28 19:03:16 UTC
This appears to be the same issue as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=902922.

Comment 5 Jeff Dexter 2014-07-29 21:24:11 UTC
Brent, I updated the title. The issue occurs when the customer uses the LibvirtHybridOVSBridgeDriver, which supports security groups: multicast_snooping is enabled on the Linux bridge that each tap device is attached to (reached through the tap device's brport).

Comment 6 Brent Eagles 2014-07-30 14:38:38 UTC
Considering the similarity to the BZ mentioned above, I'd say multicast snooping on the bridge is a source of the problem, and the problem is not necessarily OpenStack-specific. To address it properly, we need to:

- Determine whether disabling multicast snooping is expected to be necessary in this kind of setup. If it should not be necessary, this looks like a bug in Linux bridging or similar, and we should fix it there.
- If it is not a bug in Linux bridging and disabling snooping is expected, determine whether this is something libvirt should do when constructing bridges for the VMs, etc. If so, the issue should either be reassigned to libvirt or associated with similar bugs already reported against libvirt.
- Regardless of either of the above, there should probably be some discussion of whether it is appropriate to work around this within OpenStack.

Comment 7 Jeff Dexter 2014-07-30 17:39:46 UTC
Brent,
The issue is fixed in RHOS 5, as it no longer requires the use of the OVSHybridDriver. At this point, a workaround that would allow them to disable multicast_snooping for an entire host would be useful.
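A host-wide variant of the workaround from the description can be sketched by applying the same sysfs write to every Linux bridge on the host instead of to individual taps. This is a hypothetical helper (disable_all_bridge_snooping is not part of Nova or any RHOS tooling), assuming root access and the standard /sys/class/net layout. Open vSwitch bridges such as br-int are not Linux bridges, have no bridge/multicast_snooping entry, and are therefore not matched. The optional sysfs-root argument is an assumption added only for testing against a mock tree.

```shell
#!/bin/sh
# Hypothetical workaround sketch: disable multicast snooping on every Linux
# bridge currently present on the host (must run as root on a real host).
# The optional first argument (sysfs net root) is an assumption added for testing.
disable_all_bridge_snooping() {
    net="${1:-/sys/class/net}"
    for knob in "$net"/*/bridge/multicast_snooping; do
        # Only Linux bridges have a bridge/ subdirectory; skip an unmatched glob.
        [ -e "$knob" ] || continue
        echo 0 > "$knob"
        echo "multicast_snooping disabled on $(basename "$(dirname "$(dirname "$knob")")")"
    done
}
```

This only affects bridges that exist when it runs. The kernel enables snooping by default on newly created bridges, so bridges created later for new instances would still need the same treatment, for example by re-running the function after instances are started.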

Comment 8 Brent Eagles 2014-07-30 19:23:19 UTC
The HybridOVSBridgeDriver was obsoleted, but its functionality was rolled into the generic driver. Are you implying that Linux bridges are therefore no longer used, rendering this issue moot in 5? Linux bridges are actually still created to implement security groups, so if this works in 5, it is for some other reason.

Multicast reliability seems to have been related to kernel versions (e.g. https://bugzilla.redhat.com/show_bug.cgi?id=880035) so maybe there is a kernel fix underway already.

Comment 10 Jeff Dexter 2014-07-31 12:48:53 UTC
Brent,
The issue is that the customer currently has to use the OVSHybrid driver, because security groups do not work otherwise. However, when they use the OVSHybrid driver, they lose the ability to use multicast.

This becomes a moot point once they upgrade to 5, because both security groups and multicast work with the generic driver there. However, they are looking for a workaround or a fix for 4, as that is what they are currently running.

Comment 12 Jeff Dexter 2014-07-31 19:47:15 UTC
Upgrading kernel to 2.6.32-431.23.3.el6.x86_64 solved the issue. Pushing update to customer.
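To confirm that a compute node is running at least the kernel reported above as fixing the issue, a version comparison can be sketched as below. It assumes that later kernels in the same RHEL 6 stream retain the fix, and it uses GNU `sort -V`, which only approximates RPM's own version ordering but is adequate for kernel strings of this shape. The function name kernel_at_least is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed if the running kernel version ($1) is at least
# the required version ($2). GNU sort -V approximates RPM version ordering.
kernel_at_least() {
    running="$1"
    required="$2"
    # The lower of the two versions sorts first.
    lowest=$(printf '%s\n%s\n' "$required" "$running" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}
```

Usage on a host: `kernel_at_least "$(uname -r)" 2.6.32-431.23.3.el6.x86_64 && echo "kernel is new enough" || echo "kernel upgrade needed"`.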

Comment 13 Brent Eagles 2014-08-06 16:56:32 UTC
I'm closing this report as the root cause of the bug is a kernel issue.

