Bug 1849162 - Traffic fails to unDNAT without an allow-related ACL existing on the logical switch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.13
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Numan Siddique
QA Contact: ying xu
URL:
Whiteboard:
Depends On:
Blocks: 1867844
 
Reported: 2020-06-19 18:03 UTC by Tim Rozet
Modified: 2020-08-11 13:49 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1867844
Environment:
Last Closed: 2020-07-27 05:11:50 UTC
Target Upstream Version:


Attachments
logs, dbs (145.18 KB, application/gzip)
2020-06-19 18:03 UTC, Tim Rozet
logs and dbs for when things work (59.11 KB, application/gzip)
2020-06-19 20:24 UTC, Tim Rozet


Links:
System ID: Red Hat Product Errata RHBA-2020:3150
Last Updated: 2020-07-27 05:11:53 UTC

Description Tim Rozet 2020-06-19 18:03:00 UTC
Description of problem:
While deploying ovn-kubernetes without configuring the normal allow-related ACL for mgmt traffic, deployments will sometimes fail because coredns pods cannot become ready. They cannot become ready because they are unable to contact the K8S API server (north/south traffic). From tcpdump it can be seen that the packet does make it from the pod to the API server and is SNATed and DNATed accordingly. However, the return traffic arrives back at the pod with a SYN-ACK but is not unDNATed. This causes the pod to send a TCP RST to the unknown endpoint IP:

[root@pod1 /]# tcpdump -i any -nn  -vv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
15:57:38.161176 IP (tos 0x0, ttl 64, id 19562, offset 0, flags [DF], proto TCP (6), length 60)
    10.244.0.4.45584 > 10.96.0.1.443: Flags [S], cksum 0x1587 (incorrect -> 0x0d60), seq 1779773811, win 65280, options [mss 1360,sackOK,TS val 2393262256 ecr 0,nop,wscale 7], length 0
15:57:38.163072 IP (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    172.17.0.2.6443 > 10.244.0.4.45584: Flags [S.], cksum 0xc2ed (correct), seq 550974793, ack 1779773812, win 65160, options [mss 1460,sackOK,TS val 853084249 ecr 2393262256,nop,wscale 7], length 0
15:57:38.163102 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.244.0.4.45584 > 172.17.0.2.6443: Flags [R], cksum 0x9210 (correct), seq 1779773812, win 0, length 0
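
One way to confirm the missing conntrack state from the node is to dump the OVS datapath conntrack table (a diagnostic sketch; the addresses are taken from the capture above):

# Dump the OVS datapath conntrack table on the node hosting the pod. In the
# failing case, no entry maps the VIP 10.96.0.1:443 to the backend
# 172.17.0.2:6443, which is why the reply never gets unDNATed.
ovs-appctl dpctl/dump-conntrack | grep 10.96.0.1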

Simply adding an allow-related ACL that matches nothing fixes the problem (because it forces all traffic on the switch into conntrack):

ovn-nbctl acl-add ovn-control-plane from-lport 1001 ip4.dst=1.1.1.1 allow-related

17:57:24.431979 IP 10.244.0.5.59458 > 10.96.0.1.443: Flags [S], seq 3193775616, win 65280, options [mss 1360,sackOK,TS val 3432435366 ecr 0,nop,wscale 7], length 0
17:57:24.434159 IP 10.96.0.1.443 > 10.244.0.5.59458: Flags [S.], seq 1148510621, ack 3193775617, win 64704, options [mss 1360,sackOK,TS val 3640936395 ecr 3432435366,nop,wscale 7], length 0
17:57:24.434193 IP 10.244.0.5.59458 > 10.96.0.1.443: Flags [.], ack 1, win 510, options [nop,nop,TS val 3432435368 ecr 3640936395], length 0
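
One way to verify that the ACL is forcing traffic through conntrack is to inspect the logical flows for the switch (a sketch; stage names and actions vary by OVN version):

# With an allow-related ACL present, the pre-ACL stages set reg0[0] = 1,
# which causes the packet to be sent to conntrack in the following stage.
ovn-sbctl dump-flows ovn-control-plane | grep -E 'ls_(in|out)_pre_acl'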

More info can be found here:
https://gist.github.com/trozet/d6e42b71f5d8cc3e04dc49a5111f789c

Comment 1 Tim Rozet 2020-06-19 18:03:35 UTC
Created attachment 1698139 [details]
logs, dbs

Comment 2 Tim Rozet 2020-06-19 18:27:06 UTC
For some reason this problem does not happen on every deployment; I would say it happens around 50% of the time. I'll attach all the logs and DBs from a working setup as well so they can be compared.

Comment 3 Tim Rozet 2020-06-19 20:24:39 UTC
Created attachment 1698172 [details]
logs and dbs for when things work

Comment 4 Numan Siddique 2020-07-01 19:27:25 UTC
This problem can be hard to address without using conntrack.

I'm working on an approach that sends traffic to conntrack only when necessary, as opposed to sending all the traffic to conntrack whenever there is a single ACL with an allow-related action.

I'm still not sure whether that approach will work out, but I'm giving it a try and working on a POC. I'll keep updating the status here.

There is another related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1836804

If this approach works, ovn-k8s can continue to use allow-related (or a new type, allow-reply) ACLs.

Thanks
Numan

Comment 5 Numan Siddique 2020-07-07 13:22:09 UTC
Found the issue. I've submitted a patch to fix it: https://patchwork.ozlabs.org/project/openvswitch/patch/20200707131622.581859-1-numans@ovn.org/

Comment 6 Numan Siddique 2020-07-08 10:24:25 UTC
Steps to reproduce the issue
--------

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1 -- lsp-set-addresses ls1p1 "10:14:00:00:00:04 10.0.0.4"
ovn-nbctl lsp-add ls1 ls1p2 -- lsp-set-addresses ls1p2 "10:14:00:00:00:05 10.0.0.5"

ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:ff:01 10.0.0.1/24
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-addresses ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1

ovn-nbctl lb-add lb1 "10.0.0.10" "10.0.0.5"
ovn-nbctl ls-lb-add ls1 lb1
ovn-nbctl lr-lb-add lr1 lb1

ovn-nbctl lb-add lb2 "10.0.0.20" "10.0.0.5"
ovn-nbctl ls-lb-add ls1 lb2
ovn-nbctl lr-lb-add lr1 lb2
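
Optionally, sanity-check that both load balancers and their VIPs are configured before testing (not part of the original reproducer):

# Expect lb1 with VIP 10.0.0.10 and lb2 with VIP 10.0.0.20, both
# pointing at the backend 10.0.0.5.
ovn-nbctl lb-list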

# On any node where ovn-controller is running

ovs-vsctl add-port br-int ls1p1 -- set interface ls1p1 type=internal
ip netns add ls1p1
ip link set ls1p1 netns ls1p1
ip netns exec ls1p1 ip link set lo up
ip netns exec ls1p1 ip link set ls1p1 up
ip netns exec ls1p1 ip link set ls1p1 address 10:14:00:00:00:04
ip netns exec ls1p1 ip addr add 10.0.0.4/24 dev ls1p1
ip netns exec ls1p1 ip route add default via 10.0.0.1 dev ls1p1
ovs-vsctl set Interface ls1p1 external_ids:iface-id=ls1p1


ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal
ip netns add ls1p2
ip link set ls1p2 netns ls1p2
ip netns exec ls1p2 ip link set lo up
ip netns exec ls1p2 ip link set ls1p2 up
ip netns exec ls1p2 ip link set ls1p2 address 10:14:00:00:00:05
ip netns exec ls1p2 ip addr add 10.0.0.5/24 dev ls1p2
ip netns exec ls1p2 ip route add default via 10.0.0.1 dev ls1p2
ovs-vsctl set Interface ls1p2 external_ids:iface-id=ls1p2
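
Before pinging, it can help to confirm that both ports are bound on this chassis (an optional check; the output format varies by version):

# ls1p1 and ls1p2 should appear as ports bound to this chassis.
ovn-sbctl show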


# ping to vips. Should work fine
ip netns exec ls1p1 ping 10.0.0.10 -c3
ip netns exec ls1p1 ping 10.0.0.20 -c3


lb=$(ovn-nbctl --bare --columns load_balancer list logical_switch ls1 | cut  -d ' '   -f2)
ovn-nbctl clear load_balancer $lb vips

# Now ping from ls1p1 to the load balancer vip which is still set
lb1=$(ovn-nbctl --bare --columns load_balancer list logical_switch ls1 | cut  -d ' '   -f1)
ovn-nbctl get load_balancer $lb1  vips

If the VIP still set on $lb1 is 10.0.0.20, then:

Actual

[root@ovn-chassis-1 ~]# ip netns exec ls1p1 ping 10.0.0.20
PING 10.0.0.20 (10.0.0.20) 56(84) bytes of data.
64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=1.13 ms
64 bytes from 10.0.0.5: icmp_seq=2 ttl=64 time=0.126 ms

This is wrong; the reply should come from the VIP (10.0.0.20).

Expected

[root@ovn-chassis-1 ~]# ip netns exec ls1p1 ping 10.0.0.20
PING 10.0.0.20 (10.0.0.20) 56(84) bytes of data.
64 bytes from 10.0.0.20: icmp_seq=1 ttl=64 time=2.19 ms
64 bytes from 10.0.0.20: icmp_seq=2 ttl=64 time=1.30 ms
64 bytes from 10.0.0.20: icmp_seq=3 ttl=64 time=0.165 ms
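
For debugging, ovn-trace can simulate the request and show which logical flows it hits (a sketch; the eth.dst value assumes the router port answers ARP for the VIP, and the other values come from the reproducer above):

# Trace an ICMP echo request from ls1p1 to the VIP 10.0.0.20 and check
# whether the load-balancer stages apply ct_lb/DNAT to it.
ovn-trace ls1 'inport == "ls1p1" && eth.src == 10:14:00:00:00:04 && eth.dst == 00:00:00:00:ff:01 && ip4.src == 10.0.0.4 && ip4.dst == 10.0.0.20 && ip.ttl == 64 && icmp4.type == 8'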

Comment 8 Dan Williams 2020-07-08 21:03:04 UTC
I tagged the build into OCP 4.6 since we're still under development branch rules there.

Comment 11 ying xu 2020-07-14 02:16:51 UTC
Using the reproducer in comment 6, I can reproduce the issue on version:
# rpm -qa|grep ovn
ovn2.13-host-2.13.0-37.el8fdp.x86_64
ovn2.13-2.13.0-37.el8fdp.x86_64
ovn2.13-central-2.13.0-37.el8fdp.x86_64

About half of the time, the ping gets the wrong reply IP.
# ip netns exec ls1p1 ping 10.0.0.20
PING 10.0.0.20 (10.0.0.20) 56(84) bytes of data.
64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=1.13 ms
64 bytes from 10.0.0.5: icmp_seq=2 ttl=64 time=0.126 ms


On the latest version:
# rpm -qa|grep ovn
ovn2.13-host-2.13.0-39.el8fdp.x86_64
ovn2.13-2.13.0-39.el8fdp.x86_64
ovn2.13-central-2.13.0-39.el8fdp.x86_64

I ran it many times, and the ping got the correct reply IP every time.
# ip netns exec ls1p1 ping 10.0.0.20
PING 10.0.0.20 (10.0.0.20) 56(84) bytes of data.
64 bytes from 10.0.0.20: icmp_seq=1 ttl=64 time=2.19 ms
64 bytes from 10.0.0.20: icmp_seq=2 ttl=64 time=1.30 ms

Comment 13 errata-xmlrpc 2020-07-27 05:11:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3150

