Bug 1464061

Summary: Traffic between two VMs having FIP is not working if the VMs are in the same compute node
Product: Red Hat OpenStack Reporter: Aswin Suryanarayanan <asuryana>
Component: opendaylight Assignee: Aswin Suryanarayanan <asuryana>
Status: CLOSED ERRATA QA Contact: Itzik Brown <itbrown>
Severity: high Docs Contact:
Priority: high    
Version: 10.0 (Newton) CC: aloughla, amuller, apevec, asuryana, atragler, chrisw, fleitner, lpeer, mkolesni, nusiddiq, nyechiel, oblaut, rhos-maint, rkhan, sgaddam, shague, srevivo, sukulkar, tbarron
Target Milestone: Upstream M3 Keywords: Triaged
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: opendaylight-8.0.0-3.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1501415 Environment:
N/A
Last Closed: 2018-06-27 13:31:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1501418    
Bug Blocks:    

Description Aswin Suryanarayanan 2017-06-22 11:09:33 UTC
Description of problem:
Traffic between two VMs that each have a FIP does not work if the VMs are on the same compute node when OpenStack is installed with OpenDaylight as the network controller.

The packets are dropped by the security groups, which are implemented using OVS conntrack. Netfilter fails to receive some of the packets submitted from the pipeline and, as a result, marks the traffic as invalid.

Version-Release number of selected component (if applicable):


How reproducible:
An OpenStack setup with OpenDaylight is required.
Steps to Reproduce:
1. Spawn two VMs on the same compute node.
2. Associate a FIP with each of the VMs.
3. SSH from vm1 to vm2 using the FIP.

Actual results:
SSH fails.

Expected results:
SSH should succeed.

Additional info: There is a thread regarding the issue on ovs-discuss [1]. A similar issue is observed with the OVN controller as well.

[1]https://mail.openvswitch.org/pipermail/ovs-discuss/2017-June/044613.html

Comment 1 Flavio Leitner 2017-06-22 18:06:54 UTC
Please attach a sosreport from the system reproducing the issue.

Comment 3 Numan Siddique 2017-06-22 18:35:04 UTC
The issue can be reproduced using the script here [1] when OVN is used.

[1] - https://gist.github.com/russellb/4ab0a9641f12f8ac66fdd6822ee7789e

Comment 4 Numan Siddique 2017-06-22 18:37:20 UTC
I tried fixing the issue and proposed the RFC patch - https://patchwork.ozlabs.org/patch/739796/, but that was not the right approach.
Please see the comments for more details.

Comment 6 Aswin Suryanarayanan 2017-06-23 08:34:02 UTC
(In reply to Flavio Leitner from comment #1)
> Please attach a sosreport from the system reproducing the issue.

The issue can be reproduced with two namespaces using the steps in [1] on OVS 2.7.

With [1], ping/ssh from 10.100.5.8 to 10.100.5.9 works, but ping/ssh from 10.100.5.8 to 192.168.56.32 does not.

It does seem to work, however, if I track them in two different ct zones, as below (in tables 40, 41, 251, 252):

"table=40,priority=61010,ip,dl_src=fa:16:3e:1d:3d:01,nw_src=10.100.5.8,actions=ct(table=41,zone=5001)"
"table=40,priority=61010,ip,dl_src=fa:16:3e:13:85:be,nw_src=10.100.5.9,actions=ct(table=41,zone=5002)"

"table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:1d:3d:01,nw_src=10.100.5.8,actions=ct(commit,zone=5001),resubmit(,21)"
"table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:13:85:be,nw_src=10.100.5.9,actions=ct(commit,zone=5002),resubmit(,21)"

[1]https://gist.github.com/aswinsuryan/c22919576ae19e14ed489bf1f6c668cb
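The per-port zone workaround above can be sketched as a small generator for the flow strings. This is only an illustration: the function name, the idea of allocating zones sequentially from a base value, and the default table numbers are assumptions taken from the four example flows, not from the OpenDaylight code.

```python
# Hypothetical helper: builds the tracking and commit flows shown above,
# giving each port its own conntrack zone so that two VMs on the same
# compute node are never tracked in a shared zone.
def per_port_zone_flows(ports, base_zone=5001, track_table=40,
                        commit_table=41, next_table=21):
    flows = []
    for offset, (mac, ip) in enumerate(ports):
        zone = base_zone + offset  # assumption: one sequential zone per port
        # Send the packet through conntrack in the port's own zone.
        flows.append(
            f"table={track_table},priority=61010,ip,dl_src={mac},nw_src={ip},"
            f"actions=ct(table={commit_table},zone={zone})"
        )
        # Commit new connections in the same zone, then continue the pipeline.
        flows.append(
            f"table={commit_table},priority=1000,ct_state=+new+trk,ip,"
            f"dl_src={mac},nw_src={ip},"
            f"actions=ct(commit,zone={zone}),resubmit(,{next_table})"
        )
    return flows


# The two ports used in the reproduction above.
ports = [
    ("fa:16:3e:1d:3d:01", "10.100.5.8"),
    ("fa:16:3e:13:85:be", "10.100.5.9"),
]
flows = per_port_zone_flows(ports)

if __name__ == "__main__":
    for flow in flows:
        print(flow)
```

Each printed string could then be passed to ovs-ofctl add-flow on the integration bridge; the bridge name depends on the setup and is not given here.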

Comment 7 Nir Yechiel 2017-07-05 13:49:43 UTC
This bug affects both OVN and OpenDaylight, and is therefore high priority for RHOSP use cases.

Comment 9 Eric Garver 2017-07-05 18:48:27 UTC
(In reply to Aswin Suryanarayanan from comment #6)
> (In reply to Flavio Leitner from comment #1)
> > Please attach a sosreport from the system reproducing the issue.
> 
> The issue can be reproduced with
> two namespace using the steps in [1] in ovs 2.7.

I verified that it affects current upstream/master as well.

> 
> With [1]
> 
> From 10.100.5.8 if I try to ping/ssh 10.100.5.9 it works, but not when I
> try ping/ssh to 192.168.56.32 from 10.100.5.8.
> 
> But it seems to work if I track them in two different ct zones as below(in
> 40,41,251,252)
> 
> "table=40,priority=61010,ip,dl_src=fa:16:3e:1d:3d:01,nw_src=10.100.5.8,
> actions=ct(table=41,zone=5001)"
> "table=40,priority=61010,ip,dl_src=fa:16:3e:13:85:be,nw_src=10.100.5.9,
> actions=ct(table=41,zone=5002)"
> 
> "table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:1d:3d:01,
> nw_src=10.100.5.8,actions=ct(commit,zone=5001),resubmit(,21)"
> "table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:13:85:be,
> nw_src=10.100.5.9,actions=ct(commit,zone=5002),resubmit(,21)"
> 
> [1]https://gist.github.com/aswinsuryan/c22919576ae19e14ed489bf1f6c668cb

I also verified that using different zones works, so that is the current workaround.

Comment 10 Numan Siddique 2017-07-06 09:41:16 UTC
I did some testing locally and shared my observations here: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-July/044879.html

It looks to me like the workaround is either to use a different zone, as Eric mentioned, or to bypass connection tracking for ICMP packets for the router IP.

Comment 21 Nir Yechiel 2017-07-26 11:21:56 UTC
BZ 1475273 was reported to track an immediate fix in OpenDaylight/Netvirt. 

This bug is going to be used to track a long term fix in OVS.

Comment 27 Aswin Suryanarayanan 2018-02-22 15:51:20 UTC
Once the fix for the dependent OVS bug is merged, the temporary workaround needs to be removed and we need to use the new ct_clear action in the ODL pipeline.
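
As a rough sketch of what a ct_clear-based flow could look like, the snippet below builds a flow string that clears the conntrack state before resubmitting to a later table. The function name, the table numbers and the priority are hypothetical; where such a flow would actually sit in the ODL pipeline is not stated in this report.

```python
# Hypothetical helper: builds a flow that drops any conntrack state attached
# to the packet (ct_clear) before it continues to the next table, so a later
# ct() lookup starts from an untracked packet.
def ct_clear_flow(table, next_table, priority=10):
    return (
        f"table={table},priority={priority},ip,"
        f"actions=ct_clear,resubmit(,{next_table})"
    )


# Table numbers 100 and 101 are placeholders, not real ODL pipeline tables.
example = ct_clear_flow(100, 101)

if __name__ == "__main__":
    print(example)
```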

Comment 29 Itzik Brown 2018-03-26 02:22:11 UTC
Verified with
ovs 2.9.0
opendaylight-8.0.0-3.el7ost.noarch

Comment 31 errata-xmlrpc 2018-06-27 13:31:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086