Bug 1464061 - [Open vSwitch] Traffic between two VMs having FIP is not working if the VMs are in the same compute node [NEEDINFO]
Status: ASSIGNED
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 10.0 (Newton)
Hardware: Unspecified Unspecified
Priority: high Severity: high
Target Milestone: Upstream M3
Target Release: 13.0 (Queens)
Assigned To: lpeer
QA Contact: Itzik Brown
Keywords: Triaged
Depends On: 1501418
Blocks:
Reported: 2017-06-22 07:09 EDT by Aswin Suryanarayanan
Modified: 2017-10-12 10:36 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1501415
Environment:
Last Closed:
Type: Bug
nyechiel: needinfo? (amuller)


Attachments: None
Description Aswin Suryanarayanan 2017-06-22 07:09:33 EDT
Description of problem:
Traffic between two VMs that each have a floating IP (FIP) does not work when both VMs are on the same compute node and OpenStack is installed with OpenDaylight as the network controller.

The packets are dropped by the security groups, which are implemented using OVS conntrack. Netfilter fails to receive some of the packets submitted from the pipeline and marks the traffic as invalid.

Version-Release number of selected component (if applicable):


How reproducible:
An OpenStack setup with OpenDaylight is required.
Steps to Reproduce:
1. Spawn two VMs on the same compute node.
2. Associate a FIP with each VM.
3. SSH from vm1 to vm2 using the FIP.

Actual results:
SSH fails.

Expected results:
SSH should succeed.

Additional info: The issue is discussed in an ovs-discuss thread [1]. A similar issue is observed with the OVN controller as well.

[1]https://mail.openvswitch.org/pipermail/ovs-discuss/2017-June/044613.html
Comment 1 Flavio Leitner 2017-06-22 14:06:54 EDT
Please attach a sosreport from the system reproducing the issue.
Comment 3 Numan Siddique 2017-06-22 14:35:04 EDT
The issue can be reproduced using the script in [1] when OVN is used.

[1] - https://gist.github.com/russellb/4ab0a9641f12f8ac66fdd6822ee7789e
Comment 4 Numan Siddique 2017-06-22 14:37:20 EDT
I tried fixing the issue and proposed the RFC patch - https://patchwork.ozlabs.org/patch/739796/, but that was not the right approach.
Please see the comments for more details.
Comment 6 Aswin Suryanarayanan 2017-06-23 04:34:02 EDT
(In reply to Flavio Leitner from comment #1)
> Please attach a sosreport from the system reproducing the issue.

The issue can be reproduced with two namespaces using the steps in [1] on OVS 2.7.

With [1]:

From 10.100.5.8, ping/ssh to 10.100.5.9 works, but ping/ssh to
192.168.56.32 from 10.100.5.8 does not.

But it seems to work if I track them in two different ct zones as below (in
tables 40, 41, 251, 252):

"table=40,priority=61010,ip,dl_src=fa:16:3e:1d:3d:01,nw_src=10.100.5.8,actions=ct(table=41,zone=5001)"
"table=40,priority=61010,ip,dl_src=fa:16:3e:13:85:be,nw_src=10.100.5.9,actions=ct(table=41,zone=5002)"

"table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:1d:3d:01,nw_src=10.100.5.8,actions=ct(commit,zone=5001),resubmit(,21)"
"table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:13:85:be,nw_src=10.100.5.9,actions=ct(commit,zone=5002),resubmit(,21)"

[1]https://gist.github.com/aswinsuryan/c22919576ae19e14ed489bf1f6c668cb
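
For illustration only, here is a minimal Python sketch of the per-port zone idea above: it builds the table 40 and table 41 flow strings with one ct zone per VM port. The function name, the base zone of 5001, and the assumption that zones are simply numbered consecutively are hypothetical choices made for this example; this is not how OpenDaylight/Netvirt or OVN allocate zones. The output is plain text that could be passed to ovs-ofctl add-flow by hand.

```python
from typing import List, Tuple

MAX_CT_ZONE = 65535  # the conntrack zone is a 16-bit value


def per_port_zone_flows(
    ports: List[Tuple[str, str]],
    base_zone: int = 5001,      # hypothetical starting zone, taken from the flows above
    ct_table: int = 40,
    commit_table: int = 41,
    resubmit_table: int = 21,
) -> List[str]:
    """Return flow strings that track each (mac, ip) port in its own ct zone."""
    flows = []
    for index, (mac, ip) in enumerate(ports):
        zone = base_zone + index
        if zone > MAX_CT_ZONE:
            raise ValueError(f"ct zone {zone} does not fit in 16 bits")
        # First pass: send the packet through conntrack in the port's zone.
        flows.append(
            f"table={ct_table},priority=61010,ip,dl_src={mac},nw_src={ip},"
            f"actions=ct(table={commit_table},zone={zone})"
        )
        # Second pass: commit new connections in the same zone, then continue.
        flows.append(
            f"table={commit_table},priority=1000,ct_state=+new+trk,ip,"
            f"dl_src={mac},nw_src={ip},"
            f"actions=ct(commit,zone={zone}),resubmit(,{resubmit_table})"
        )
    return flows


if __name__ == "__main__":
    vm_ports = [
        ("fa:16:3e:1d:3d:01", "10.100.5.8"),
        ("fa:16:3e:13:85:be", "10.100.5.9"),
    ]
    for flow in per_port_zone_flows(vm_ports):
        print(flow)
```

With the two ports from this comment, the sketch reproduces the four flows listed above, using zone 5001 for 10.100.5.8 and zone 5002 for 10.100.5.9.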
Comment 7 Nir Yechiel 2017-07-05 09:49:43 EDT
This bug affects both OVN and OpenDaylight, and is therefore high priority for RHOSP use cases.
Comment 9 Eric Garver 2017-07-05 14:48:27 EDT
(In reply to Aswin Suryanarayanan from comment #6)
> (In reply to Flavio Leitner from comment #1)
> > Please attach a sosreport from the system reproducing the issue.
> 
> The issue can be reproduced with
> two namespace using the steps in [1] in ovs 2.7.

I verified that it affects current upstream/master as well.

> 
> With [1]
> 
> From 10.100.5.8 if I try to ping/ssh 10.100.5.9 it works, but not when I
> try ping/ssh to 192.168.56.32 from 10.100.5.8.
> 
> But it seems to work if I track them in two different ct zones as below(in
> 40,41,251,252)
> 
> "table=40,priority=61010,ip,dl_src=fa:16:3e:1d:3d:01,nw_src=10.100.5.8,
> actions=ct(table=41,zone=5001)"
> "table=40,priority=61010,ip,dl_src=fa:16:3e:13:85:be,nw_src=10.100.5.9,
> actions=ct(table=41,zone=5002)"
> 
> "table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:1d:3d:01,
> nw_src=10.100.5.8,actions=ct(commit,zone=5001),resubmit(,21)"
> "table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:13:85:be,
> nw_src=10.100.5.9,actions=ct(commit,zone=5002),resubmit(,21)"
> 
> [1]https://gist.github.com/aswinsuryan/c22919576ae19e14ed489bf1f6c668cb

I also verified that using different zones works, so that is the current workaround.
Comment 10 Numan Siddique 2017-07-06 05:41:16 EDT
I did some testing locally and shared my observations here - https://mail.openvswitch.org/pipermail/ovs-discuss/2017-July/044879.html.

It looks to me like the workaround is either to use a different zone, as Eric mentioned, or to bypass connection tracking for ICMP packets for the router IP.
Comment 21 Nir Yechiel 2017-07-26 07:21:56 EDT
BZ 1475273 was reported to track an immediate fix in OpenDaylight/Netvirt. 

This bug is going to be used to track a long term fix in OVS.
