Bug 1718372
| Summary: | [RFE]: OVN sctp support (load balancing) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | lorenzo bianconi <lorenzo.bianconi> |
| Component: | OVN | Assignee: | Mark Michelson <mmichels> |
| Status: | CLOSED ERRATA | QA Contact: | ying xu <yinxu> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | RHEL 8.0 | CC: | ctrautma, fsimonce, jishi, mmichels, rkhan, trozet |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-20 19:43:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1815217 | | |
| Bug Blocks: | 1771572, 1817657, 1818860, 1818862 | | |
| Attachments: | logs, dbs (attachment 1670559) | | |
Description (lorenzo bianconi, 2019-06-07 14:56:35 UTC)
Branch that adds SCTP load balancing: https://github.com/putnopvut/ovn/tree/sctp_lb

However, it does not add SCTP health checks, so turning health checks on for SCTP load balancers will likely result in bad things happening.

Hey Mark, the current patch I believe is programming the wrong flows (both logical and openflow). Pod-to-pod SCTP traffic works, but SCTP traffic through a load balancer (service) is punted to controller in table 14 (OF). Table 18 (OF) has the load balancer flow:

cookie=0x85e1a82c, duration=226167.380s, table=18, n_packets=0, n_bytes=0, priority=120,ct_state=+new+trk,sctp,metadata=0x2,nw_dst=10.101.165.35,tp_dst=62324 actions=group:2

[root@ovn-control-plane ~]# ovs-ofctl -O openflow13 dump-groups br-int
OFPST_GROUP_DESC reply (OF1.3) (xid=0x2):
group_id=4,type=select,bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.3:53)),bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.4:53))
group_id=5,type=select,bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.3:9153)),bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.4:9153))
group_id=1,type=select,bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=172.17.0.2:6443))
group_id=2,type=select,bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.5:62324))

Traffic never makes it there because it is being punted in table 14:

cookie=0x92de86d0, duration=228619.854s, table=14, n_packets=21, n_bytes=1722, priority=2000,ct_state=-est+trk,sctp,metadata=0x2,nw_dst=10.101.165.35,tp_dst=62324 actions=load:0->NXM_NX_XXREG0[96..127],push:NXM_OF_ETH_SRC[],push:NXM_OF_ETH_DST[],pop:NXM_OF_ETH_SRC[],pop:NXM_OF_ETH_DST[],push:NXM_OF_IP_SRC[],push:NXM_OF_IP_DST[],pop:NXM_OF_IP_SRC[],pop:NXM_OF_IP_DST[],controller(userdata=00.00.00.0a.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1b.00.00.00.01.1c.04.00.20.00.00.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1b.00.00.00.01.1e.04.00.20.00.00.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1c.00.00.00.01.1c.04.00.20.00.00.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1c.00.00.00.01.1e.04.00.20.00.00.00.00.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.20.00.00.00)

Looking at ovn-nb, it looks like something is wrong there with the logical flows. Running ovn-trace results in the same behavior as above. I'm attaching all traces, outputs and dbs.

Created attachment 1670559 [details]
logs, dbs
Hi Tim.
The issue appears to be twofold.
First there's an ACL reject rule in place that is causing the flow in table 14 to be installed.
"b290336e-9641-46fb-9248-c17355d0137b":{"action":"reject","match":"ip4.dst==10.101.165.35 && sctp && sctp.dst==62324","name":"69a2ee26-1086-42b6-a4f9-1b880cd047d7-10.101.165.35:62324","priority":1000,"direction":"from-lport"}
If that ACL is removed, then the packet should reach table 18 with no issue.
However, there is a bug in here. I overlooked that ACL reject rules would also need special handling for SCTP. Currently TCP reject rules respond with a TCP RST, and UDP reject rules respond with an ICMP destination unreachable. SCTP appears to be treated like TCP right now, resulting in our attempting to respond with a TCP RST. I will correct this in my next version of the patch. In the meantime, you should be able to get SCTP load balancers working by removing the ACL in question.
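A minimal sketch of removing such an ACL with ovn-nbctl (the switch name "ls0" is hypothetical; the direction, priority, and match are taken from the ACL record above):

# List the ACLs on the logical switch to find the offending reject rule.
ovn-nbctl acl-list ls0

# Delete the from-lport reject ACL by direction, priority, and match.
ovn-nbctl acl-del ls0 from-lport 1000 'ip4.dst==10.101.165.35 && sctp && sctp.dst==62324'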
As it turns out, I was wrong about the TCP-like behavior of ACL reject rules for SCTP. Reject rules are generated in an interesting way: the ones that send a TCP RST ensure that the L4 protocol is TCP, but they also copy the ACL match verbatim. So the resulting logical flow is this:
table=6 (ls_in_acl ), priority=2010 , match=(((!ct.trk || !ct.est || (ct.est && ct_label.blocked == 1))) && ip4 && tcp && (ip4.dst==10.101.165.35 && sctp && sctp.dst==62324)), action=(reg0 = 0; eth.dst <-> eth.src; ip4.dst <-> ip4.src; tcp_reset { outport <-> inport; output; };)
Notice that in order to match, a packet would have to be both TCP and SCTP at once, which makes this flow impossible to ever match. This behavior isn't restricted to SCTP; the same would happen if you had a UDP ACL. The actual flow you're hitting is lower down:
table=6 (ls_in_acl ), priority=2000 , match=(((!ct.trk || !ct.est || (ct.est && ct_label.blocked == 1))) && ip4 && (ip4.dst==10.101.165.35 && sctp && sctp.dst==62324)), action=(reg0 = 0; eth.dst <-> eth.src; ip4.dst <-> ip4.src; icmp4 { outport <-> inport; output; };)
In this case, your SCTP packet is being responded to with an ICMP destination unreachable message. There's probably a more SCTP-friendly way to respond, but this isn't as bad as us trying to send a TCP RST in response to an SCTP packet.
In other words, there's no immediate need on my part to alter the ACL reject behavior. The issue you're running into should be fixed by removing the offending ACL.
Thanks Mark. The ACL is coming from the service reject stuff we worked on. It just exposed a bug in my code where accidentally both the reject ACL and the load balancer are being configured: https://github.com/ovn-org/ovn-kubernetes/pull/1096#discussion_r393283840 Will fix it and try again.

After fixing that issue, regular SCTP service works. I then tried a nodePort service and that is not working. Not sure why yet. Will look into it tomorrow morning.

Debugging this shows that the INIT and INIT ACK packets are working as expected. However the follow-up COOKIE ECHO does not make it to the server. OVN entities:
_uuid : 8b99786c-3dd5-4250-ab9e-b6b47f981679
external_ids : {SCTP_lb_gateway_router=GR_ovn-control-plane}
health_check : []
ip_port_mappings : {}
name : ""
protocol : sctp
vips : {"169.254.33.2:31790"="10.244.0.6:62324"}
router 44ad05d5-4e9b-4926-b4a6-705818ab8c2d (GR_ovn-control-plane)
port rtoe-GR_ovn-control-plane
mac: "e2:0f:73:06:da:4a"
networks: ["169.254.33.2/24"]
port rtoj-GR_ovn-control-plane
mac: "0A:58:64:40:00:01"
networks: ["100.64.0.1/29"]
nat 7f139299-a8f9-4690-8255-289b9c3ef7e1
external ip: "169.254.33.2"
logical ip: "10.244.0.0/16"
type: "snat"
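As an aside, a load balancer record like the one above can be created with ovn-nbctl; a minimal sketch, assuming the gateway router name from the listing (the load balancer name "lb-sctp-nodeport" is hypothetical, and the sctp protocol argument is exactly what this RFE adds):

# Create an SCTP load balancer mapping the nodePort VIP to the backend pod.
ovn-nbctl lb-add lb-sctp-nodeport "169.254.33.2:31790" "10.244.0.6:62324" sctp

# Attach it to the gateway router so ingress traffic gets balanced.
ovn-nbctl lr-lb-add GR_ovn-control-plane lb-sctp-nodeport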
In the tcpdump output below, the flow is like this:
169.254.33.1 (node IP), 169.254.33.2 (OVN router IP on the node), node port 31790, 10.244.0.6 (SCTP server pod at port 62324), 100.64.0.1 (OVN router IP towards the pod)
Flow:
SCTP request from 169.254.33.1 <random port> -> 169.254.33.2 31790
OVN router SNAT 169.254.33.1 -> 100.64.0.1
OVN router DNAT 169.254.33.2 31790 -> 10.244.0.6 62324 (load balancer)
100.64.0.1 <random port> -> 10.244.0.6 62324 (SCTP server)
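The captures below can be reproduced with any SCTP client/server pair; a hypothetical sketch using nmap's ncat (which supports --sctp), assuming the server pod listens on 62324 and the client targets the nodePort:

# On the server pod: listen for SCTP on the service port.
ncat --sctp -l 62324

# On the client node: connect through the nodePort VIP and send a payload.
echo "Hello, Server!" | ncat --sctp 169.254.33.2 31790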
### CLIENT TCP DUMP
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
15:46:32.882074 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [INIT] [init tag: 3519555769] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 1882501108]
15:46:32.882755 Out 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.55628 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 3519555769] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 1882501108]
15:46:32.882805 P 1e:1b:89:f4:00:07 ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
10.244.0.6.62324 > 100.64.0.1.55628: sctp
1) [INIT ACK] [init tag: 640125419] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 2012023407]
15:46:32.883379 In e2:0f:73:06:da:4a ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
169.254.33.2.31790 > 169.254.33.1.55628: sctp
1) [INIT ACK] [init tag: 640125419] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 2012023407]
15:46:32.883398 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:46:35.945987 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 1, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:46:42.409896 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 2, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:46:54.697993 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 3, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:47:18.761936 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 4, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:48:06.891002 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 5, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:49:08.329989 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 6, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:50:09.769943 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 7, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:51:11.210000 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 8, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
### SERVER
[root@sctpserver /]# tcpdump -i any -vvv -en sctp
dropped privs to tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
15:27:27.915884 In 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.53356 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 3891781292] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3710072835]
15:27:27.916048 Out 1e:1b:89:f4:00:07 ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
10.244.0.6.62324 > 100.64.0.1.53356: sctp
1) [INIT ACK] [init tag: 2005164658] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 2868032396]
15:40:59.487706 In 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.41928 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 583010785] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3194207059]
15:40:59.487814 Out 1e:1b:89:f4:00:07 ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
10.244.0.6.62324 > 100.64.0.1.41928: sctp
1) [INIT ACK] [init tag: 3289569359] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 1428594883]
15:46:32.882758 In 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.55628 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 3519555769] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 1882501108]
15:46:32.882801 Out 1e:1b:89:f4:00:07 ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
10.244.0.6.62324 > 100.64.0.1.55628: sctp
1) [INIT ACK] [init tag: 640125419] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 2012023407]
You can see in the above the COOKIE ECHO is never making it to the server. I see in conntrack:
sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=52412,dport=31790),reply=(src=10.244.0.6,dst=169.254.33.1,sport=62324,dport=52412),zone=8,protoinfo=(state=COOKIE_ECHOED,vtag_orig=2905074738,vtag_reply=3931006177)
sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=52412,dport=31790),reply=(src=169.254.33.2,dst=169.254.33.1,sport=31790,dport=52412),protoinfo=(state=COOKIE_ECHOED,vtag_orig=2905074738,vtag_reply=3931006177)
sctp,orig=(src=169.254.33.1,dst=10.244.0.6,sport=52412,dport=62324),reply=(src=10.244.0.6,dst=100.64.0.1,sport=62324,dport=52412),zone=6,protoinfo=(state=COOKIE_WAIT,vtag_orig=2905074738,vtag_reply=3931006177)
sctp,orig=(src=100.64.0.1,dst=10.244.0.6,sport=52412,dport=62324),reply=(src=10.244.0.6,dst=100.64.0.1,sport=62324,dport=52412),zone=16,protoinfo=(state=COOKIE_WAIT,vtag_orig=2905074738,vtag_reply=3931006177)
This looks to me like the COOKIE ECHO made it into conntrack because the state is transitioned to COOKIE_ECHOED, but somewhere it is dropped before making it out to the server. I cannot seem to locate where that is happening.
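For reference, a sketch of watching those state transitions while reproducing (filtering on sctp is just for readability):

# Kernel conntrack view, SCTP entries only.
conntrack -L -p sctp

# OVS view of the same tracked connections (also used later in this report).
ovs-appctl dpctl/dump-conntrack | grep sctp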
Also, in the tcpdump output we see the packet SNAT'ed by the router:
15:46:32.882755 Out 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.55628 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 3519555769] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 1882501108]
However we do not see the COOKIE ECHO packet here. So perhaps it is never getting SNAT'ed.
OVN trace:
[root@ovn-control-plane ~]# ovn-trace --ct new,trk --ovs ext_ovn-control-plane 'inport == "br-local_ovn-control-plane" && eth.dst == e2:0f:73:06:da:4a && eth.src==00:00:a9:fe:21:01 && sctp && sctp.dst==31790 && ip4.src==169.254.33.1 && ip4.dst==169.254.33.2 && ip.ttl==64'
# sctp,reg14=0x1,vlan_tci=0x0000,dl_src=00:00:a9:fe:21:01,dl_dst=e2:0f:73:06:da:4a,nw_src=169.254.33.1,nw_dst=169.254.33.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=31790
ingress(dp="ext_ovn-control-plane", inport="br-local_ovn-control-plane")
------------------------------------------------------------------------
0. ls_in_port_sec_l2 (ovn-northd.c:4511): inport == "br-local_ovn-control-plane", priority 50, uuid c660b3cc
cookie=0xc660b3cc, duration=75121.760s, table=8, n_packets=172151, n_bytes=90295439, priority=50,reg14=0x1,metadata=0x5 actions=resubmit(,9)
next;
13. ls_in_arp_rsp (ovn-northd.c:6079): inport == "br-local_ovn-control-plane", priority 100, uuid fdc35cc9
cookie=0xfdc35cc9, duration=75121.760s, table=21, n_packets=172151, n_bytes=90295439, priority=100,reg14=0x1,metadata=0x5 actions=resubmit(,22)
next;
19. ls_in_l2_lkup (ovn-northd.c:6747): eth.dst == e2:0f:73:06:da:4a, priority 50, uuid ec8722df
cookie=0xec8722df, duration=75121.761s, table=27, n_packets=172151, n_bytes=90295439, priority=50,metadata=0x5,dl_dst=e2:0f:73:06:da:4a actions=set_field:0x2->reg15,resubmit(,32)
outport = "etor-GR_ovn-control-plane";
output;
egress(dp="ext_ovn-control-plane", inport="br-local_ovn-control-plane", outport="etor-GR_ovn-control-plane")
------------------------------------------------------------------------------------------------------------
9. ls_out_port_sec_l2 (ovn-northd.c:4577): outport == "etor-GR_ovn-control-plane", priority 50, uuid 15a5862b
cookie=0x15a5862b, duration=75121.761s, table=49, n_packets=172151, n_bytes=90295439, priority=50,reg15=0x2,metadata=0x5 actions=resubmit(,64)
output;
/* output to "etor-GR_ovn-control-plane", type "l3gateway" */
ingress(dp="GR_ovn-control-plane", inport="rtoe-GR_ovn-control-plane")
----------------------------------------------------------------------
0. lr_in_admission (ovn-northd.c:7823): eth.dst == e2:0f:73:06:da:4a && inport == "rtoe-GR_ovn-control-plane", priority 50, uuid 62058325
cookie=0x62058325, duration=75121.679s, table=8, n_packets=172151, n_bytes=90295439, priority=50,reg14=0x2,metadata=0x3,dl_dst=e2:0f:73:06:da:4a actions=resubmit(,9)
next;
1. lr_in_lookup_neighbor (ovn-northd.c:7872): 1, priority 0, uuid e1e05851
cookie=0xe1e05851, duration=75121.778s, table=9, n_packets=356896, n_bytes=103190334, priority=0,metadata=0x3 actions=load:0x1->OXM_OF_PKT_REG4[3],resubmit(,10)
reg9[3] = 1;
next;
2. lr_in_learn_neighbor (ovn-northd.c:7877): reg9[3] == 1 || reg9[2] == 1, priority 100, uuid ae614fed
cookie=0xae614fed, duration=75121.778s, table=10, n_packets=0, n_bytes=0, priority=100,reg9=0x4/0x4,metadata=0x3 actions=resubmit(,11)
cookie=0xae614fed, duration=75121.776s, table=10, n_packets=356896, n_bytes=103190334, priority=100,reg9=0x8/0x8,metadata=0x3 actions=resubmit(,11)
next;
4. lr_in_defrag (ovn-northd.c:9166): ip && ip4.dst == 169.254.33.2, priority 100, uuid ccfa4178
cookie=0xccfa4178, duration=73292.927s, table=12, n_packets=167997, n_bytes=88230779, priority=100,ip,metadata=0x3,nw_dst=169.254.33.2 actions=ct(table=13,zone=NXM_NX_REG11[0..15])
ct_next;
ct_next(ct_state=new|trk)
-------------------------
5. lr_in_unsnat (ovn-northd.c:8755): ip && ip4.dst == 169.254.33.2, priority 90, uuid b6f6d0dd
cookie=0xb6f6d0dd, duration=75121.762s, table=13, n_packets=172148, n_bytes=90295313, priority=90,ip,metadata=0x3,nw_dst=169.254.33.2 actions=ct(table=14,zone=NXM_NX_REG12[0..15],nat)
ct_snat;
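To see how the COOKIE ECHO differs from the INIT, the same trace can be replayed with a different ct state; a sketch, assuming ovn-trace's --ct option accepts "trk" alone (which approximates a packet that is tracked but neither new nor established):

# Replay the trace as the COOKIE ECHO would appear to conntrack:
# tracked, but neither new nor established.
ovn-trace --ct trk --ovs ext_ovn-control-plane 'inport == "br-local_ovn-control-plane" && eth.dst == e2:0f:73:06:da:4a && eth.src==00:00:a9:fe:21:01 && sctp && sctp.dst==31790 && ip4.src==169.254.33.1 && ip4.dst==169.254.33.2 && ip.ttl==64'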
Hi Tim. I force-pushed changes to my sctp_lb branch (https://github.com/putnopvut/ovn/tree/sctp_lb). This has a couple of fixes. Most notably, it contains a fix for undnat: ovn-northd was previously installing a TCP undnat logical flow where it should have been putting in an SCTP flow. This particular issue seems most likely to be the culprit in the nodeport failure with SCTP. Please give it a try when you can.

The same problem still exists. I managed to work around it. The problem comes from table 14:

cookie=0x3bfb3d5b, duration=1724.090s, table=14, n_packets=0, n_bytes=0, idle_age=1724, priority=120,ct_state=+est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)
cookie=0xd84870e, duration=1724.090s, table=14, n_packets=6, n_bytes=636, idle_age=257, priority=120,ct_state=+new+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,group:2

The first packet (INIT) will always hit that second flow. However, the follow-up COOKIE ECHO packet from the client misses that flow. This is due to the ct_state. If instead I add a flow that ignores ct_state to the same table:

ovs-ofctl -O openflow15 add-flow br-int 'table=14,priority=121,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769,actions=set_field:0x8/0x8->reg10,group:2'

The entire connection then works. Similarly, if I add a flow that only includes ct_state trk, it works:

cookie=0x0, duration=122.593s, table=14, n_packets=11, n_bytes=1246, idle_age=3, priority=122,ct_state=+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,group:2

So the issue, I think, is that during INIT there must be a ct commit somewhere; the connection is no longer "new" when the COOKIE ECHO comes, but it also is not "est" because the COOKIE ACK has not occurred yet. That's my theory anyway :)

I've been googling about how conntrack states are set for SCTP, but so far haven't found any sort of documentation on the matter. So instead I looked at the netfilter code in Linux to see if I could determine what's going on. It's the best documentation.

Your theory is correct. The SCTP association is not considered "established" until the COOKIE_ACK is seen by the kernel. Therefore, when the COOKIE_ECHO is received by the kernel, the ct state is neither new nor est. That's why when you dump the conntrack tables you see that the state is COOKIE_ECHOED: it's a special state used by SCTP, and it doesn't cleanly map to OVS's honed-down set of universal conntrack states.

I think your method may have issues because I *think* that calling "group" each time may result in packets going to different load balancer backends [1]. In your setup, were there multiple load balancer destinations? Using the est state to nat likely forces the packet to go to the same destination it originally reached when in the new state. I think the correct thing to do here is to modify the first flow of table 14, specifically in the case of SCTP, to work on packets that are -new-est+trk. They're not new, they're not established, but they are tracked. What do you think?

[1] My certainty of this is not 100%.

Yeah, I think you are correct. I only had 1 load balancer backend, so for me it didn't matter, but you are right. I tested with flows like this and it works:
[root@ovn-control-plane ~]# ovs-ofctl -O openflow15 dump-flows br-int |grep table=14 |grep sctp
cookie=0x3bfb3d5b, duration=19421.897s, table=14, n_packets=0, n_bytes=0, idle_age=19427, priority=120,ct_state=+est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)
cookie=0xd84870e, duration=19421.893s, table=14, n_packets=8, n_bytes=680, idle_age=14, priority=120,ct_state=+new+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,group:1
cookie=0x0, duration=255.408s, table=14, n_packets=16, n_bytes=1968, idle_age=14, priority=120,ct_state=-new-est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)
The interesting thing is that no packets ever hit that first flow (+est+trk). I wonder if SCTP never reaches the est state in conntrack with OVS, or if it's a bug? I am definitely sending data after the connection gets set up:
01:45:34.825529 In de:59:8b:08:e4:43 ethertype IPv4 (0x0800), length 52: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 36)
169.254.33.2.31769 > 169.254.33.1.55447: sctp
1) [COOKIE ACK]
01:45:34.825841 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 80: (tos 0x2,ECT(0), ttl 64, id 1, offset 0, flags [DF], proto SCTP (132), length 64)
169.254.33.1.55447 > 169.254.33.2.31769: sctp
1) [DATA] (B)(E) [TSN: 1573904206] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
0x0000: 4865 6c6c 6f2c 2053 6572 7665 7221 00 Hello,.Server!.]
01:45:34.825958 Out 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 80: (tos 0x2,ECT(0), ttl 62, id 1, offset 0, flags [DF], proto SCTP (132), length 64)
100.64.0.1.55447 > 10.244.0.5.62324: sctp
1) [DATA] (B)(E) [TSN: 1573904206] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
0x0000: 4865 6c6c 6f2c 2053 6572 7665 7221 00 Hello,.Server!.]
01:45:34.826032 P 82:d3:0a:f4:00:06 ethertype IPv4 (0x0800), length 64: (tos 0x2,ECT(0), ttl 64, id 60359, offset 0, flags [DF], proto SCTP (132), length 48)
10.244.0.5.62324 > 100.64.0.1.55447: sctp
1) [SACK] [cum ack 1573904206] [a_rwnd 106481] [#gap acks 0] [#dup tsns 0]
01:45:34.826110 In de:59:8b:08:e4:43 ethertype IPv4 (0x0800), length 64: (tos 0x2,ECT(0), ttl 62, id 60359, offset 0, flags [DF], proto SCTP (132), length 48)
169.254.33.2.31769 > 169.254.33.1.55447: sctp
1) [SACK] [cum ack 1573904206] [a_rwnd 106481] [#gap acks 0] [#dup tsns 0]
01:45:34.826216 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 56: (tos 0x2,ECT(0), ttl 64, id 2, offset 0, flags [DF], proto SCTP (132), length 40)
169.254.33.1.55447 > 169.254.33.2.31769: sctp
1) [SHUTDOWN]
Either way we can just add the missing flow for now and then it should work.
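Based on the flow dump above, the missing flow is the -new-est+trk one; a sketch of adding it by hand until ovn-northd generates the equivalent itself (all match values are copied from the dump above):

# Temporary workaround: send mid-handshake SCTP packets (COOKIE ECHO)
# through NAT instead of letting them fall through.
ovs-ofctl -O OpenFlow15 add-flow br-int 'table=14,priority=120,ct_state=-new-est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769,actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)'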
Ooh, that is weird. I agree we should add the missing flow for the time being since it works. But I think it's worth bringing up with the OVS dev list that SCTP associations apparently never reach est. There are a lot of flows installed by ovn-northd that operate based on the ct state being "est". If we never reach that state, then this potentially means we'll be seeing odd bugs when dealing with SCTP.

I think I have identified the problem: the ct zones are being traversed in the wrong order in tables 13 and 14. See https://bugzilla.redhat.com/show_bug.cgi?id=1815217#c11 for more info. This looks to be an OVN bug. The flows programmed by OVN are like this:

cookie=0x85a8499, duration=6057.175s, table=13, n_packets=16083, n_bytes=8005026, idle_age=0, priority=90,ip,metadata=0x3,nw_dst=169.254.33.2 actions=ct(table=14,zone=NXM_NX_REG12[0..15],nat)
cookie=0x7ee7a259, duration=5152.425s, table=14, n_packets=0, n_bytes=0, idle_age=5152, priority=120,ct_state=+est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31291 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)

Zone REG12 (zone 6) is SNAT, while zone REG11 (zone 9) is DNAT. Therefore the first flow is saying SNAT, and the second is saying perform DNAT. This is a problem because the conntrack entries are:

[root@ovn-control-plane ~]# ovs-appctl dpctl/dump-conntrack |grep sctp
sctp,orig=(src=100.64.0.1,dst=10.244.0.5,sport=46525,dport=62324),reply=(src=10.244.0.5,dst=100.64.0.1,sport=62324,dport=46525),zone=15,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)
sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=46525,dport=31291),reply=(src=10.244.0.5,dst=169.254.33.1,sport=62324,dport=46525),zone=9,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)
sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=46525,dport=31291),reply=(src=169.254.33.2,dst=169.254.33.1,sport=31291,dport=46525),protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)
sctp,orig=(src=169.254.33.1,dst=10.244.0.5,sport=46525,dport=62324),reply=(src=10.244.0.5,dst=100.64.0.1,sport=62324,dport=46525),zone=6,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)

The problem with the above is that SNAT and DNAT are committed in inverse order. This means OVN will hit zone 6 first (table 13 flow):

sctp,orig=(src=169.254.33.1,dst=10.244.0.5,sport=46525,dport=62324),reply=(src=10.244.0.5,dst=100.64.0.1,sport=62324,dport=46525),zone=6,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)

However, that will result in ct invalid, because the src/dst IP on this packet is 169.254.33.1/169.254.33.2. The correct order of operations is to do the DNAT lookup first, and then SNAT:

sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=46525,dport=31291),reply=(src=10.244.0.5,dst=169.254.33.1,sport=62324,dport=46525),zone=9,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)

In the above this will match and DNAT will happen; then we can SNAT, and zone 6 will match.
This is achieved by adding the following flows:

[root@ovn-control-plane ~]# ovs-ofctl -O openflow15 dump-flows br-int table=13 |grep sctp
cookie=0x0, duration=1013.896s, table=13, n_packets=33, n_bytes=4490, idle_age=9, priority=95,sctp,metadata=0x3,nw_dst=169.254.33.2 actions=ct(table=14,zone=NXM_NX_REG11[0..15],nat)
[root@ovn-control-plane ~]# ovs-ofctl -O openflow15 dump-flows br-int table=14 |grep REG
cookie=0x0, duration=44.396s, table=14, n_packets=3, n_bytes=474, idle_age=4, priority=125,ct_state=+est+trk,sctp,metadata=0x3,nw_src=169.254.33.1 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG12[0..15],nat)

Notice that now zone=REG11 (zone 9, DNAT) happens first in table 13, then in table 14 +est matches, and then I do nat on zone=REG12 (zone 6, SNAT). The connection then works fine. I'm not sure how TCP works and why this just happens for SCTP. I'm going to look into that next.

Mark, now that https://bugzilla.redhat.com/show_bug.cgi?id=1815217 is fixed upstream, can you help push along your patch for sctp support upstream? Then we can get that in and kick off upstream builds. Thanks.

Patch has been pushed to upstream master. Changing state to POST.

Patch has been backported downstream. Changing to MODIFIED.

This feature wasn't supported in OVN before the fix.
Now I verified it on this version:
# rpm -qa|grep ovn
ovn2.13-central-2.13.0-11.el8fdp.x86_64
ovn2.13-host-2.13.0-11.el8fdp.x86_64
ovn2.13-2.13.0-11.el8fdp.x86_64
# rpm -qa|grep ovn
ovn2.13-central-2.13.0-11.el7fdp.x86_64
ovn2.13-host-2.13.0-11.el7fdp.x86_64
ovn2.13-2.13.0-11.el7fdp.x86_64
Use the topology below:

server0-----------------ls1------------lr1--------------public
                         |              |
                      server1          ls2
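A sketch of how ls1 and its connection to lr1 could be built with standard ovn-nbctl commands (the rest of the topology follows the same pattern; the full resulting configuration is in the ovn-nbctl show output below):

ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1ls1 00:01:02:0d:01:01 192.168.0.254/24 3001::a/64
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1lr1
ovn-nbctl lsp-set-type ls1lr1 router
ovn-nbctl lsp-set-addresses ls1lr1 router
ovn-nbctl lsp-set-options ls1lr1 router-port=lr1ls1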
# ovn-nbctl show
switch b3d69768-75a6-4b08-aaa6-d4c0c74198e5 (public)
port ln_public
type: localnet
addresses: ["unknown"]
port plr1
type: router
addresses: ["00:01:02:0d:0f:01 172.16.1.254 2002::a"]
router-port: lr1p
switch 726d126d-2139-4e87-860b-99c174a9f054 (ls1)
port ls1lr1
type: router
addresses: ["00:01:02:0d:01:01 192.168.0.254 3001::a"]
router-port: lr1ls1
port ls1p3
addresses: ["00:01:02:01:01:04"]
port ls1p2
addresses: ["00:01:02:01:01:02"]
port ls1p1
addresses: ["00:01:02:01:01:01"]
switch 81577be1-b4c0-41c9-96ee-9dcb58d54622 (ls2)
port ls2lr1
type: router
addresses: ["00:01:02:0d:01:02 192.168.1.254 3001:1::a"]
router-port: lr1ls2
port ls2p1
addresses: ["00:01:02:01:01:03"]
router 101fc633-a654-4b3f-9121-523b6afd5701 (lr1)
port lr1ls2
mac: "00:01:02:0d:01:02"
networks: ["192.168.1.254/24", "3001:1::a/64"]
port lr1p
mac: "00:01:02:0d:0f:01"
networks: ["172.16.1.254/24", "2002::a/64"]
gateway chassis: [hv1 hv0]
port lr1ls1
mac: "00:01:02:0d:01:01"
networks: ["192.168.0.254/24", "3001::a/64"]
nat 496c2377-0073-4873-b313-3160a6889337
external ip: "172.16.1.10"
logical ip: "192.168.2.1"
type: "dnat_and_snat"
nat c3afebe4-1cb9-4cbd-8a7a-235db6201a0a
external ip: "2002::100"
logical ip: "3000::100"
type: "dnat_and_snat"
Tested some scenarios such as: lb on ls, lb on lr, and lb on lr behind fip.
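The "lb on ls" and "lb on lr" scenarios correspond to attaching an SCTP load balancer to a logical switch or a logical router; a minimal sketch with ovn-nbctl (the load balancer name, VIP, and backends here are hypothetical; ls1 and lr1 come from the topology above):

# Hypothetical SCTP load balancer: VIP on the public network, backends on ls1.
ovn-nbctl lb-add lb-test "172.16.1.100:8000" "192.168.0.1:8000,192.168.0.2:8000" sctp

# "lb on ls": attach the load balancer to the logical switch.
ovn-nbctl ls-lb-add ls1 lb-test

# "lb on lr": attach the load balancer to the logical router.
ovn-nbctl lr-lb-add lr1 lb-test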
Set to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1501