Description of problem:
Traffic between two VMs with floating IPs (FIPs) does not work when both VMs are on the same compute node and OpenStack is installed with OpenDaylight as the network controller. The packets are dropped by the security groups, which are implemented using OVS conntrack. Netfilter fails to receive some of the packets submitted from the pipeline and marks packets as invalid.

Version-Release number of selected component (if applicable):

How reproducible:
An OpenStack setup with OpenDaylight is required.

Steps to Reproduce:
1. Spawn two VMs on the same compute node.
2. Associate a FIP with each VM.
3. SSH from vm1 to vm2 using the FIP.

Actual results:
SSH fails.

Expected results:
SSH should succeed.

Additional info:
The issue is discussed in an ovs-discuss thread [1]. A similar issue is observed with the OVN controller as well.

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2017-June/044613.html
Please attach a sosreport from the system reproducing the issue.
The issue can be reproduced with OVN using the script in [1]. [1] - https://gist.github.com/russellb/4ab0a9641f12f8ac66fdd6822ee7789e
I tried fixing the issue and proposed the RFC patch - https://patchwork.ozlabs.org/patch/739796/, but that was not the right approach. Please see the comments for more details.
(In reply to Flavio Leitner from comment #1)
> Please attach a sosreport from the system reproducing the issue.

The issue can be reproduced with two namespaces using the steps in [1] on OVS 2.7.

With [1], ping/ssh from 10.100.5.8 to 10.100.5.9 works, but ping/ssh from 10.100.5.8 to 192.168.56.32 does not.

It does seem to work if I track the two VMs in two different ct zones, as below (in tables 40, 41, 251 and 252):

"table=40,priority=61010,ip,dl_src=fa:16:3e:1d:3d:01,nw_src=10.100.5.8,actions=ct(table=41,zone=5001)"
"table=40,priority=61010,ip,dl_src=fa:16:3e:13:85:be,nw_src=10.100.5.9,actions=ct(table=41,zone=5002)"

"table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:1d:3d:01,nw_src=10.100.5.8,actions=ct(commit,zone=5001),resubmit(,21)"
"table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:13:85:be,nw_src=10.100.5.9,actions=ct(commit,zone=5002),resubmit(,21)"

[1] https://gist.github.com/aswinsuryan/c22919576ae19e14ed489bf1f6c668cb
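The per-VM zone workaround above follows a regular pattern, so the flows can be generated rather than typed by hand. The following is only a sketch: the helper name `per_vm_zone_flows` and the bridge name `br-int` are assumptions, not taken from the reproducer, and only the table 40/41 rules quoted above are covered (the table 251/252 rules are not shown in this comment).

```python
def per_vm_zone_flows(mac, ip, zone, ct_table=40, commit_table=41, next_table=21):
    """Build the two rules that track one VM's traffic in its own ct zone.

    Hypothetical helper: table numbers and priorities default to the
    values quoted in the comment above.
    """
    match = f"ip,dl_src={mac},nw_src={ip}"
    return [
        # Send the packet through conntrack in the VM's own zone.
        f"table={ct_table},priority=61010,{match},"
        f"actions=ct(table={commit_table},zone={zone})",
        # Commit new connections in the same zone, then continue the pipeline.
        f"table={commit_table},priority=1000,ct_state=+new+trk,{match},"
        f"actions=ct(commit,zone={zone}),resubmit(,{next_table})",
    ]


if __name__ == "__main__":
    vms = [
        ("fa:16:3e:1d:3d:01", "10.100.5.8", 5001),
        ("fa:16:3e:13:85:be", "10.100.5.9", 5002),
    ]
    for mac, ip, zone in vms:
        for flow in per_vm_zone_flows(mac, ip, zone):
            # "br-int" is an assumed bridge name; adjust to the local setup.
            print(f'ovs-ofctl add-flow br-int "{flow}"')
```

The script only prints the `ovs-ofctl add-flow` commands, so the output can be reviewed before anything is applied to the bridge.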
This bug affects both OVN and OpenDaylight, and therefore is high prio for RHOSP use cases.
(In reply to Aswin Suryanarayanan from comment #6)
> (In reply to Flavio Leitner from comment #1)
> > Please attach a sosreport from the system reproducing the issue.
>
> The issue can be reproduced with
> two namespace using the steps in [1] in ovs 2.7.

I verified that it affects current upstream/master as well.

> With [1]
>
> From 10.100.5.8 if I try to ping/ssh 10.100.5.9 it works, but not when I
> try ping/ssh to 192.168.56.32 from 10.100.5.8.
>
> But it seems to work if I track them in two different ct zones as below(in
> 40,41,251,252)
>
> "table=40,priority=61010,ip,dl_src=fa:16:3e:1d:3d:01,nw_src=10.100.5.8,
> actions=ct(table=41,zone=5001)"
> "table=40,priority=61010,ip,dl_src=fa:16:3e:13:85:be,nw_src=10.100.5.9,
> actions=ct(table=41,zone=5002)"
>
> "table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:1d:3d:01,
> nw_src=10.100.5.8,actions=ct(commit,zone=5001),resubmit(,21)"
> "table=41,priority=1000,ct_state=+new+trk,ip,dl_src=fa:16:3e:13:85:be,
> nw_src=10.100.5.9,actions=ct(commit,zone=5002),resubmit(,21)"
>
> [1]https://gist.github.com/aswinsuryan/c22919576ae19e14ed489bf1f6c668cb

I also verified that using different zones works. So that's the workaround for now.
I did some testing locally and shared my observations here: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-July/044879.html. It looks to me like the workaround is either to use a different zone, as Eric mentioned, or to bypass connection tracking for ICMP packets for the router IP.
BZ 1475273 was reported to track an immediate fix in OpenDaylight/Netvirt. This bug is going to be used to track a long term fix in OVS.
Once the fix for the dependent OVS bug is merged, the temporary workaround needs to be removed and the ODL pipeline needs to use the new ct_clear action.
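For illustration only, a flow using ct_clear might be built as sketched here. ct_clear is the OVS action that clears connection tracking state (ct_state, ct_zone, ct_mark, ct_label) from the packet. The helper name, the table numbers and the priority are placeholders and do not reflect the actual ODL/Netvirt pipeline.

```python
def ct_clear_flow(table, next_table, match="ip", priority=100):
    """Build a rule that clears conntrack state before the next table.

    Hypothetical helper: table numbers, match and priority are
    placeholders, not values from the Netvirt pipeline.
    """
    return (
        f"table={table},priority={priority},{match},"
        f"actions=ct_clear,resubmit(,{next_table})"
    )


if __name__ == "__main__":
    # Placeholder tables 100 -> 101; substitute the real pipeline tables.
    print(ct_clear_flow(table=100, next_table=101))
```

With the conntrack state cleared, a later ct() action in the pipeline sees the packet as untracked again, which is what the per-VM zones were approximating.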
Verified with ovs 2.9.0 and opendaylight-8.0.0-3.el7ost.noarch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086