Bug 1123458

Summary: setting libvirt_vif_driver to LibvirtHybridOVSBridgeDriver causes multicast to fail
Product: Red Hat OpenStack
Component: openstack-nova
Version: 4.0
Target Release: 4.0
Status: CLOSED NOTABUG
Severity: high
Priority: high
Hardware: Unspecified
OS: Unspecified
Reporter: Jeff Dexter <jdexter>
Assignee: Brent Eagles <beagles>
QA Contact: Ami Jeain <ajeain>
CC: beagles, benglish, jdexter, ndipanov, yeylon
Keywords: ZStream
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-08-06 16:56:32 UTC

Description Jeff Dexter 2014-07-25 17:57:02 UTC
Description of problem:
Using the LibvirtHybridOVSBridgeDriver driver causes multicast traffic to be dropped after about 200 seconds.

Disabling multicast snooping on the bridges that the instances' tap devices are attached to works around the problem:
[root@gss-rhos-4 ~]# echo 0 > /sys/devices/virtual/net/tap751c39bc-db/brport/bridge/bridge/multicast_snooping
[root@gss-rhos-4 ~]# echo 0 > /sys/devices/virtual/net/tap18b1e1ef-5a/brport/bridge/bridge/multicast_snooping
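
A host-wide variant of this workaround could be sketched as a loop over every Linux bridge on the compute node. This is only a sketch under assumptions: the function name disable_snooping and its optional sysfs-root argument are made up for illustration, and it relies on each bridge exposing bridge/multicast_snooping under /sys/class/net. The setting is not persistent, so it would have to be reapplied after a reboot and for bridges created later.

```shell
# Hypothetical helper: turn off multicast snooping on every Linux bridge
# found under the given sysfs network directory (default: /sys/class/net).
disable_snooping() {
    root="${1:-/sys/class/net}"
    for knob in "$root"/*/bridge/multicast_snooping; do
        # Skip the unexpanded glob when no bridge exists.
        [ -e "$knob" ] || continue
        echo 0 > "$knob"
        echo "snooping disabled: $knob"
    done
}
```

On a real host this would be run as root, for example by calling disable_snooping with no argument.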

Version-Release number of selected component (if applicable):
Havana Current



How reproducible:
always

Steps to Reproduce:
1. Set libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtHybridOVSBridgeDriver
2. On node1 and node2, which are the multicast receivers, execute the command: iperf -s -u -B 224.0.67.67 -i 1
3. On node3, which is the multicast sender, execute the command: iperf -c 224.0.67.67 -u --ttl 5 -t 3600

On node3 we see the following output:

------------------------------------------------------------
Client connecting to 224.0.67.67, UDP port 5001
Sending 1470 byte datagrams
Setting multicast TTL to 5
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.11.8 port 35976 connected with 224.0.67.67 port 5001



Actual results:
On node1 and node2 we see that the nodes receive multicast traffic for about 200 seconds, after which nothing more arrives:

[  3] 197.0-198.0 sec   128 KBytes  1.05 Mbits/sec   0.036 ms    0/   89 (0%)
[  3] 198.0-199.0 sec   129 KBytes  1.06 Mbits/sec   0.046 ms    0/   90 (0%)
[  3] 199.0-200.0 sec   128 KBytes  1.05 Mbits/sec   0.041 ms    0/   89 (0%)
[  3] 200.0-201.0 sec   128 KBytes  1.05 Mbits/sec   0.043 ms    0/   89 (0%)



Expected results:
Multicast traffic keeps arriving on node1 and node2 for the duration of the test:

[  3] 958.0-959.0 sec   128 KBytes  1.05 Mbits/sec   0.015 ms    0/   89 (0%)
[  3] 959.0-960.0 sec   128 KBytes  1.05 Mbits/sec   0.019 ms    0/   89 (0%)
[  3] 960.0-961.0 sec   128 KBytes  1.05 Mbits/sec   0.012 ms    0/   89 (0%)
[  3] 961.0-962.0 sec   129 KBytes  1.06 Mbits/sec   0.015 ms    0/   90 (0%)
[  3] 962.0-963.0 sec   128 KBytes  1.05 Mbits/sec   0.027 ms    0/   89 (0%)
[  3] 963.0-964.0 sec   128 KBytes  1.05 Mbits/sec   0.025 ms    0/   89 (0%)


Additional info:
With libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtGenericVIFDriver, security groups are not supported, so the customer cannot use that driver.

Comment 1 Brent Eagles 2014-07-28 15:40:31 UTC
Is there an error in the description of this BZ? The BZ title indicates the generic VIF driver, but the steps to reproduce indicate configuring the HybridOVSBridgeDriver.

Comment 2 Brent Eagles 2014-07-28 15:49:05 UTC
Please see above comment for reason for NEEDINFO.

Comment 3 Brent Eagles 2014-07-28 18:30:58 UTC
Clearing NEEDINFO. The hybrid driver does appear to have been configured as would have been required for security group support for the release indicated.

Comment 4 Brent Eagles 2014-07-28 19:03:16 UTC
This appears to be the same issue as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=902922.

Comment 5 Jeff Dexter 2014-07-29 21:24:11 UTC
Brent, I updated the title. The issue is that when the customer uses the LibvirtHybridOVSBridgeDriver driver, which supports security groups, multicast_snooping is also enabled on the bridge behind the brport of each tap device.
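
To confirm which tap devices sit behind a bridge with snooping enabled, something like the following could be used. It is a sketch only: the function name show_tap_snooping and the optional sysfs-root argument are hypothetical, and it assumes the same brport/bridge/bridge/multicast_snooping layout shown in the description.

```shell
# Hypothetical helper: print the multicast_snooping flag of the bridge
# that each tap device is attached to.
show_tap_snooping() {
    root="${1:-/sys/class/net}"
    for port in "$root"/tap*/brport; do
        # Skip the unexpanded glob and tap devices not attached to a bridge.
        [ -d "$port" ] || continue
        tap=$(basename "$(dirname "$port")")
        flag=$(cat "$port/bridge/bridge/multicast_snooping")
        echo "$tap multicast_snooping=$flag"
    done
}
```

A value of 1 means snooping is enabled on that bridge; 0 means it is disabled.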

Comment 6 Brent Eagles 2014-07-30 14:38:38 UTC
Considering the similarity to the bz mentioned above, I'd say this is a source of the problem and is not necessarily OpenStack specific. In order to properly address this, we need to:

- Determine whether it is expected to have to disable multicast snooping or not when doing this kind of thing. If it should not be necessary, then it looks like a bug in Linux bridging or similar, and we should fix it there.
- If it is not a bug in Linux bridging and it is expected that snooping be disabled, then we need to determine whether this is something libvirt should do when constructing bridges for the VMs, etc. If so, the issue should either be reassigned to libvirt or associated with other similar bugs already reported against libvirt.
- Regardless of either of the above, there should probably be some discussion of whether this is something that is appropriate to work around within OpenStack.

Comment 7 Jeff Dexter 2014-07-30 17:39:46 UTC
Brent,
The issue is fixed in RHOS 5, as it no longer requires the use of the OVSHybridDriver. At this point, finding a workaround that would let them set multicast_snooping for an entire host would be useful.

Comment 8 Brent Eagles 2014-07-30 19:23:19 UTC
The HybridOVSBridgeDriver was obsoleted, but the functionality was rolled up into the generic driver. Are you implying that Linux bridges are therefore no longer used, rendering this issue moot in 5? Linux bridges are actually still created to implement security groups, so if this works in 5 it is for some other reason.

Multicast reliability seems to have been related to kernel versions (e.g. https://bugzilla.redhat.com/show_bug.cgi?id=880035) so maybe there is a kernel fix underway already.

Comment 10 Jeff Dexter 2014-07-31 12:48:53 UTC
Brent,
The issue we have is that the customer currently has to use the OVSHybrid driver because security groups do not work otherwise. However, when they use the OVSHybrid driver they lose the ability to use multicast.

This becomes a moot point when they upgrade to 5, because then both security groups and multicast work with the generic driver. However, they are looking for a workaround or a fix for 4, as that is what they are currently on.

Comment 12 Jeff Dexter 2014-07-31 19:47:15 UTC
Upgrading kernel to 2.6.32-431.23.3.el6.x86_64 solved the issue. Pushing update to customer.
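
A rough way to check whether a compute node already runs this kernel or a later one is sketched below. The function name kernel_at_least is hypothetical, and GNU sort -V is only an approximation of RPM version ordering, so the result should be treated as a hint rather than proof. Nothing here establishes which kernel build first contained the fix; only the release named above was confirmed to work.

```shell
# Hypothetical helper: succeed when kernel release $1 sorts at or after
# release $2 according to GNU version sort.
kernel_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, kernel_at_least "$(uname -r)" 2.6.32-431.23.3.el6.x86_64 returns success on a node running that release or a newer one.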

Comment 13 Brent Eagles 2014-08-06 16:56:32 UTC
I'm closing this report as the root cause of the bug is a kernel issue.