Bug 1718372
| Summary: | [RFE]: OVN sctp support (load balancing) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | lorenzo bianconi <lorenzo.bianconi> |
| Component: | OVN | Assignee: | Mark Michelson <mmichels> |
| Status: | CLOSED ERRATA | QA Contact: | ying xu <yinxu> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | RHEL 8.0 | CC: | ctrautma, fsimonce, jishi, mmichels, rkhan, trozet |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-20 19:43:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1815217 | | |
| Bug Blocks: | 1771572, 1817657, 1818860, 1818862 | | |
| Attachments: | logs, dbs (attachment 1670559) | | |
Description (lorenzo bianconi, 2019-06-07 14:56:35 UTC)
Branch that adds SCTP load balancing: https://github.com/putnopvut/ovn/tree/sctp_lb

However, it does not add SCTP health checks, so turning health checks on for SCTP load balancers will likely result in bad things happening.

Hey Mark, the current patch I believe is programming the wrong flows (both logical and openflow). Pod-to-pod SCTP traffic works, but SCTP traffic through a load balancer (service) is punted to controller in table 14 (OF). Table 18 (OF) has the load balancer flow:

cookie=0x85e1a82c, duration=226167.380s, table=18, n_packets=0, n_bytes=0, priority=120,ct_state=+new+trk,sctp,metadata=0x2,nw_dst=10.101.165.35,tp_dst=62324 actions=group:2

[root@ovn-control-plane ~]# ovs-ofctl -O openflow13 dump-groups br-int
OFPST_GROUP_DESC reply (OF1.3) (xid=0x2):
group_id=4,type=select,bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.3:53)),bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.4:53))
group_id=5,type=select,bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.3:9153)),bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.4:9153))
group_id=1,type=select,bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=172.17.0.2:6443))
group_id=2,type=select,bucket=weight:100,actions=ct(commit,table=19,zone=NXM_NX_REG13[0..15],nat(dst=10.244.0.5:62324))

Traffic never makes it there because it is being punted in table 14:

cookie=0x92de86d0, duration=228619.854s, table=14, n_packets=21, n_bytes=1722, priority=2000,ct_state=-est+trk,sctp,metadata=0x2,nw_dst=10.101.165.35,tp_dst=62324 actions=load:0->NXM_NX_XXREG0[96..127],push:NXM_OF_ETH_SRC[],push:NXM_OF_ETH_DST[],pop:NXM_OF_ETH_SRC[],pop:NXM_OF_ETH_DST[],push:NXM_OF_IP_SRC[],push:NXM_OF_IP_DST[],pop:NXM_OF_IP_SRC[],pop:NXM_OF_IP_DST[],controller(userdata=00.00.00.0a.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1b.00.00.00.01.1c.04.00.20.00.00.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1b.00.00.00.01.1e.04.00.20.00.00.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1c.00.00.00.01.1c.04.00.20.00.00.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1c.00.00.00.01.1e.04.00.20.00.00.00.00.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.20.00.00.00)

Looking at ovn-nb, it looks like something is wrong there with the logical flows. Running ovn-trace results in the same behavior as above. I'm attaching all traces, outputs and dbs.

Created attachment 1670559 [details]
logs, dbs
Hi Tim.
The issue appears to be twofold.
First there's an ACL reject rule in place that is causing the flow in table 14 to be installed.
"b290336e-9641-46fb-9248-c17355d0137b":{"action":"reject","match":"ip4.dst==10.101.165.35 && sctp && sctp.dst==62324","name":"69a2ee26-1086-42b6-a4f9-1b880cd047d7-10.101.165.35:62324","priority":1000,"direction":"from-lport"}
If that ACL is removed, then the packet should reach table 18 with no issue.
However, there is a bug in here. I overlooked that ACL reject rules would also need special handling for SCTP. Currently TCP reject rules respond with a TCP RST, and UDP reject rules respond with an ICMP destination unreachable. SCTP appears to be treated like TCP right now, resulting in our attempting to respond with a TCP RST. I will correct this in my next version of the patch. In the meantime, you should be able to get SCTP load balancers working by removing the ACL in question.
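A minimal sketch of removing such an ACL with ovn-nbctl (the switch name "ls0" is hypothetical; the direction, priority, and match are taken from the ACL record above):

# List the ACLs on the logical switch to find the offending reject rule.
ovn-nbctl acl-list ls0

# Delete the from-lport reject ACL by direction, priority, and match.
ovn-nbctl acl-del ls0 from-lport 1000 'ip4.dst==10.101.165.35 && sctp && sctp.dst==62324'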
As it turns out, I was wrong about the TCP-like behavior of ACL reject rules for SCTP. Reject rules are generated in an interesting way: the ones that send a TCP RST ensure that the L4 protocol is TCP, but they also copy the ACL match verbatim. So the resulting logical flow is this:
table=6 (ls_in_acl ), priority=2010 , match=(((!ct.trk || !ct.est || (ct.est && ct_label.blocked == 1))) && ip4 && tcp && (ip4.dst==10.101.165.35 && sctp && sctp.dst==62324)), action=(reg0 = 0; eth.dst <-> eth.src; ip4.dst <-> ip4.src; tcp_reset { outport <-> inport; output; };)
Notice that in order to match, a packet would have to be both TCP and SCTP at once, which makes this flow impossible to ever match. This behavior isn't restricted to SCTP; the same would happen if you had a UDP ACL. The actual flow you're hitting is lower down:
table=6 (ls_in_acl ), priority=2000 , match=(((!ct.trk || !ct.est || (ct.est && ct_label.blocked == 1))) && ip4 && (ip4.dst==10.101.165.35 && sctp && sctp.dst==62324)), action=(reg0 = 0; eth.dst <-> eth.src; ip4.dst <-> ip4.src; icmp4 { outport <-> inport; output; };)
In this case, your SCTP packet is being responded to with an ICMP destination unreachable message. There's probably a more SCTP-friendly way to respond, but this isn't as bad as us trying to send a TCP RST in response to an SCTP packet.
In other words, there's no immediate need on my part to alter the ACL reject behavior. The issue you're running into should be fixed by removing the offending ACL.
Thanks Mark. The ACL is coming from the service reject stuff we worked on. It just exposed a bug in my code where accidentally both the reject ACL and the load balancer are being configured: https://github.com/ovn-org/ovn-kubernetes/pull/1096#discussion_r393283840 Will fix it and try again.

After fixing that issue, regular SCTP service works. I then tried a nodePort service and that is not working. Not sure why yet. Will look into it tomorrow morning.

Debugging this shows that the INIT and INIT ACK packets are working as expected. However the follow-up COOKIE ECHO does not make it to the server. OVN entities:
_uuid : 8b99786c-3dd5-4250-ab9e-b6b47f981679
external_ids : {SCTP_lb_gateway_router=GR_ovn-control-plane}
health_check : []
ip_port_mappings : {}
name : ""
protocol : sctp
vips : {"169.254.33.2:31790"="10.244.0.6:62324"}
router 44ad05d5-4e9b-4926-b4a6-705818ab8c2d (GR_ovn-control-plane)
port rtoe-GR_ovn-control-plane
mac: "e2:0f:73:06:da:4a"
networks: ["169.254.33.2/24"]
port rtoj-GR_ovn-control-plane
mac: "0A:58:64:40:00:01"
networks: ["100.64.0.1/29"]
nat 7f139299-a8f9-4690-8255-289b9c3ef7e1
external ip: "169.254.33.2"
logical ip: "10.244.0.0/16"
type: "snat"
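As an aside, a load balancer record like the one above can be created with ovn-nbctl; a minimal sketch, assuming the gateway router name from the listing (the load balancer name "lb-sctp-nodeport" is hypothetical, and the sctp protocol argument is exactly what this RFE adds):

# Create an SCTP load balancer mapping the nodePort VIP to the backend pod.
ovn-nbctl lb-add lb-sctp-nodeport "169.254.33.2:31790" "10.244.0.6:62324" sctp

# Attach it to the gateway router so ingress traffic gets balanced.
ovn-nbctl lr-lb-add GR_ovn-control-plane lb-sctp-nodeport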
In the tcpdump output below, the flow is like this:
169.254.33.1 (node IP), 169.254.33.2 (OVN router IP on the node), node port 31790, 10.244.0.6 (SCTP server pod at port 62324), 100.64.0.1 (OVN router IP towards the pod)
Flow:
SCTP request from 169.254.33.1 <random port> -> 169.254.33.2 31790
OVN router SNAT 169.254.33.1 -> 100.64.0.1
OVN router DNAT 169.254.33.2 31790 -> 10.244.0.6 62324 (load balancer)
100.64.0.1 <random port> -> 10.244.0.6 62324 (SCTP server)
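The captures below can be reproduced with any SCTP client/server pair; a hypothetical sketch using nmap's ncat (which supports --sctp), assuming the server pod listens on 62324 and the client targets the nodePort:

# On the server pod: listen for SCTP on the service port.
ncat --sctp -l 62324

# On the client node: connect through the nodePort VIP and send a payload.
echo "Hello, Server!" | ncat --sctp 169.254.33.2 31790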
### CLIENT TCP DUMP
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
15:46:32.882074 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [INIT] [init tag: 3519555769] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 1882501108]
15:46:32.882755 Out 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.55628 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 3519555769] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 1882501108]
15:46:32.882805 P 1e:1b:89:f4:00:07 ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
10.244.0.6.62324 > 100.64.0.1.55628: sctp
1) [INIT ACK] [init tag: 640125419] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 2012023407]
15:46:32.883379 In e2:0f:73:06:da:4a ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
169.254.33.2.31790 > 169.254.33.1.55628: sctp
1) [INIT ACK] [init tag: 640125419] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 2012023407]
15:46:32.883398 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:46:35.945987 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 1, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:46:42.409896 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 2, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:46:54.697993 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 3, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:47:18.761936 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 4, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:48:06.891002 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 5, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:49:08.329989 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 6, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:50:09.769943 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 7, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
15:51:11.210000 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 312: (tos 0x2,ECT(0), ttl 64, id 8, offset 0, flags [DF], proto SCTP (132), length 296)
169.254.33.1.55628 > 169.254.33.2.31790: sctp
1) [COOKIE ECHO]
### SERVER
[root@sctpserver /]# tcpdump -i any -vvv -en sctp
dropped privs to tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
15:27:27.915884 In 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.53356 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 3891781292] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3710072835]
15:27:27.916048 Out 1e:1b:89:f4:00:07 ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
10.244.0.6.62324 > 100.64.0.1.53356: sctp
1) [INIT ACK] [init tag: 2005164658] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 2868032396]
15:40:59.487706 In 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.41928 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 583010785] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3194207059]
15:40:59.487814 Out 1e:1b:89:f4:00:07 ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
10.244.0.6.62324 > 100.64.0.1.41928: sctp
1) [INIT ACK] [init tag: 3289569359] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 1428594883]
15:46:32.882758 In 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.55628 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 3519555769] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 1882501108]
15:46:32.882801 Out 1e:1b:89:f4:00:07 ethertype IPv4 (0x0800), length 340: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 324)
10.244.0.6.62324 > 100.64.0.1.55628: sctp
1) [INIT ACK] [init tag: 640125419] [rwnd: 106496] [OS: 5] [MIS: 5] [init TSN: 2012023407]
You can see in the above the COOKIE ECHO is never making it to the server. I see in conntrack:
sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=52412,dport=31790),reply=(src=10.244.0.6,dst=169.254.33.1,sport=62324,dport=52412),zone=8,protoinfo=(state=COOKIE_ECHOED,vtag_orig=2905074738,vtag_reply=3931006177)
sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=52412,dport=31790),reply=(src=169.254.33.2,dst=169.254.33.1,sport=31790,dport=52412),protoinfo=(state=COOKIE_ECHOED,vtag_orig=2905074738,vtag_reply=3931006177)
sctp,orig=(src=169.254.33.1,dst=10.244.0.6,sport=52412,dport=62324),reply=(src=10.244.0.6,dst=100.64.0.1,sport=62324,dport=52412),zone=6,protoinfo=(state=COOKIE_WAIT,vtag_orig=2905074738,vtag_reply=3931006177)
sctp,orig=(src=100.64.0.1,dst=10.244.0.6,sport=52412,dport=62324),reply=(src=10.244.0.6,dst=100.64.0.1,sport=62324,dport=52412),zone=16,protoinfo=(state=COOKIE_WAIT,vtag_orig=2905074738,vtag_reply=3931006177)
This looks to me like the COOKIE ECHO made it into conntrack because the state is transitioned to COOKIE_ECHOED, but somewhere it is dropped before making it out to the server. I cannot seem to locate where that is happening.
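For reference, a sketch of watching those state transitions while reproducing (filtering on sctp is just for readability):

# Kernel conntrack view, SCTP entries only.
conntrack -L -p sctp

# OVS view of the same tracked connections (also used later in this report).
ovs-appctl dpctl/dump-conntrack | grep sctp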
Also, in the tcpdump output we see the packet SNAT'ed by the router:
15:46:32.882755 Out 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 108: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 92)
100.64.0.1.55628 > 10.244.0.6.62324: sctp
1) [INIT] [init tag: 3519555769] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 1882501108]
However we do not see the COOKIE ECHO packet here. So perhaps it is never getting SNAT'ed.
OVN trace:
[root@ovn-control-plane ~]# ovn-trace --ct new,trk --ovs ext_ovn-control-plane 'inport == "br-local_ovn-control-plane" && eth.dst == e2:0f:73:06:da:4a && eth.src==00:00:a9:fe:21:01 && sctp && sctp.dst==31790 && ip4.src==169.254.33.1 && ip4.dst==169.254.33.2 && ip.ttl==64'
# sctp,reg14=0x1,vlan_tci=0x0000,dl_src=00:00:a9:fe:21:01,dl_dst=e2:0f:73:06:da:4a,nw_src=169.254.33.1,nw_dst=169.254.33.2,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=31790
ingress(dp="ext_ovn-control-plane", inport="br-local_ovn-control-plane")
------------------------------------------------------------------------
0. ls_in_port_sec_l2 (ovn-northd.c:4511): inport == "br-local_ovn-control-plane", priority 50, uuid c660b3cc
cookie=0xc660b3cc, duration=75121.760s, table=8, n_packets=172151, n_bytes=90295439, priority=50,reg14=0x1,metadata=0x5 actions=resubmit(,9)
next;
13. ls_in_arp_rsp (ovn-northd.c:6079): inport == "br-local_ovn-control-plane", priority 100, uuid fdc35cc9
cookie=0xfdc35cc9, duration=75121.760s, table=21, n_packets=172151, n_bytes=90295439, priority=100,reg14=0x1,metadata=0x5 actions=resubmit(,22)
next;
19. ls_in_l2_lkup (ovn-northd.c:6747): eth.dst == e2:0f:73:06:da:4a, priority 50, uuid ec8722df
cookie=0xec8722df, duration=75121.761s, table=27, n_packets=172151, n_bytes=90295439, priority=50,metadata=0x5,dl_dst=e2:0f:73:06:da:4a actions=set_field:0x2->reg15,resubmit(,32)
outport = "etor-GR_ovn-control-plane";
output;
egress(dp="ext_ovn-control-plane", inport="br-local_ovn-control-plane", outport="etor-GR_ovn-control-plane")
------------------------------------------------------------------------------------------------------------
9. ls_out_port_sec_l2 (ovn-northd.c:4577): outport == "etor-GR_ovn-control-plane", priority 50, uuid 15a5862b
cookie=0x15a5862b, duration=75121.761s, table=49, n_packets=172151, n_bytes=90295439, priority=50,reg15=0x2,metadata=0x5 actions=resubmit(,64)
output;
/* output to "etor-GR_ovn-control-plane", type "l3gateway" */
ingress(dp="GR_ovn-control-plane", inport="rtoe-GR_ovn-control-plane")
----------------------------------------------------------------------
0. lr_in_admission (ovn-northd.c:7823): eth.dst == e2:0f:73:06:da:4a && inport == "rtoe-GR_ovn-control-plane", priority 50, uuid 62058325
cookie=0x62058325, duration=75121.679s, table=8, n_packets=172151, n_bytes=90295439, priority=50,reg14=0x2,metadata=0x3,dl_dst=e2:0f:73:06:da:4a actions=resubmit(,9)
next;
1. lr_in_lookup_neighbor (ovn-northd.c:7872): 1, priority 0, uuid e1e05851
cookie=0xe1e05851, duration=75121.778s, table=9, n_packets=356896, n_bytes=103190334, priority=0,metadata=0x3 actions=load:0x1->OXM_OF_PKT_REG4[3],resubmit(,10)
reg9[3] = 1;
next;
2. lr_in_learn_neighbor (ovn-northd.c:7877): reg9[3] == 1 || reg9[2] == 1, priority 100, uuid ae614fed
cookie=0xae614fed, duration=75121.778s, table=10, n_packets=0, n_bytes=0, priority=100,reg9=0x4/0x4,metadata=0x3 actions=resubmit(,11)
cookie=0xae614fed, duration=75121.776s, table=10, n_packets=356896, n_bytes=103190334, priority=100,reg9=0x8/0x8,metadata=0x3 actions=resubmit(,11)
next;
4. lr_in_defrag (ovn-northd.c:9166): ip && ip4.dst == 169.254.33.2, priority 100, uuid ccfa4178
cookie=0xccfa4178, duration=73292.927s, table=12, n_packets=167997, n_bytes=88230779, priority=100,ip,metadata=0x3,nw_dst=169.254.33.2 actions=ct(table=13,zone=NXM_NX_REG11[0..15])
ct_next;
ct_next(ct_state=new|trk)
-------------------------
5. lr_in_unsnat (ovn-northd.c:8755): ip && ip4.dst == 169.254.33.2, priority 90, uuid b6f6d0dd
cookie=0xb6f6d0dd, duration=75121.762s, table=13, n_packets=172148, n_bytes=90295313, priority=90,ip,metadata=0x3,nw_dst=169.254.33.2 actions=ct(table=14,zone=NXM_NX_REG12[0..15],nat)
ct_snat;
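To see how the COOKIE ECHO differs from the INIT, the same trace can be replayed with a different ct state; a sketch, assuming ovn-trace's --ct option accepts "trk" alone (which approximates a packet that is tracked but neither new nor established):

# Replay the trace as the COOKIE ECHO would appear to conntrack:
# tracked, but neither new nor established.
ovn-trace --ct trk --ovs ext_ovn-control-plane 'inport == "br-local_ovn-control-plane" && eth.dst == e2:0f:73:06:da:4a && eth.src==00:00:a9:fe:21:01 && sctp && sctp.dst==31790 && ip4.src==169.254.33.1 && ip4.dst==169.254.33.2 && ip.ttl==64'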
Hi Tim. I force-pushed changes to my sctp_lb branch (https://github.com/putnopvut/ovn/tree/sctp_lb). This has a couple of fixes. Most notably, it contains a fix for undnat: ovn-northd was previously installing a TCP undnat logical flow where it should have been putting in an SCTP flow. This particular issue seems most likely to be the culprit in the nodeport failure with SCTP. Please give it a try when you can.

The same problem still exists. I managed to work around it. The problem comes from table 14:

cookie=0x3bfb3d5b, duration=1724.090s, table=14, n_packets=0, n_bytes=0, idle_age=1724, priority=120,ct_state=+est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)
cookie=0xd84870e, duration=1724.090s, table=14, n_packets=6, n_bytes=636, idle_age=257, priority=120,ct_state=+new+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,group:2

The first packet (INIT) will always hit that second flow. However, the follow-up COOKIE ECHO packet from the client misses that flow. This is due to the ct_state. If instead I add a flow that ignores ct_state to the same table:

ovs-ofctl -O openflow15 add-flow br-int 'table=14,priority=121,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769,actions=set_field:0x8/0x8->reg10,group:2'

The entire connection then works. Similarly, if I add a flow that only includes ct_state trk, it works:

cookie=0x0, duration=122.593s, table=14, n_packets=11, n_bytes=1246, idle_age=3, priority=122,ct_state=+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,group:2

So the issue, I think, is that during INIT there must be a ct commit somewhere; the connection is no longer "new" when the COOKIE ECHO comes, but it also is not "est" because the COOKIE ACK has not occurred yet. That's my theory anyway :)

I've been googling about how conntrack states are set for SCTP, but so far haven't found any sort of documentation on the matter. So instead I looked at the netfilter code in Linux to see if I could determine what's going on. It's the best documentation.

Your theory is correct. The SCTP association is not considered "established" until the COOKIE_ACK is seen by the kernel. Therefore, when the COOKIE_ECHO is received by the kernel, the ct state is neither new nor est. That's why when you dump the conntrack tables you see that the state is COOKIE_ECHOED: it's a special state used by SCTP, and it doesn't cleanly map to OVS's honed-down set of universal conntrack states.

I think your method may have issues because I *think* that calling "group" each time may result in packets going to different load balancer backends [1]. In your setup, were there multiple load balancer destinations? Using the est state to nat likely forces the packet to go to the same destination it originally reached when in the new state. I think the correct thing to do here is to modify the first flow of table 14, specifically in the case of SCTP, to work on packets that are -new-est+trk. They're not new, they're not established, but they are tracked. What do you think?

[1] My certainty of this is not 100%.

Yeah, I think you are correct. I only had 1 load balancer backend, so for me it didn't matter, but you are right. I tested with flows like this and it works:
[root@ovn-control-plane ~]# ovs-ofctl -O openflow15 dump-flows br-int |grep table=14 |grep sctp
cookie=0x3bfb3d5b, duration=19421.897s, table=14, n_packets=0, n_bytes=0, idle_age=19427, priority=120,ct_state=+est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)
cookie=0xd84870e, duration=19421.893s, table=14, n_packets=8, n_bytes=680, idle_age=14, priority=120,ct_state=+new+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,group:1
cookie=0x0, duration=255.408s, table=14, n_packets=16, n_bytes=1968, idle_age=14, priority=120,ct_state=-new-est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)
The interesting thing is that no packets ever hit that first flow (+est+trk). I wonder if SCTP never reaches the est state in conntrack with OVS, or if it's a bug? I am definitely sending data after the connection gets set up:
01:45:34.825529 In de:59:8b:08:e4:43 ethertype IPv4 (0x0800), length 52: (tos 0x2,ECT(0), ttl 62, id 0, offset 0, flags [DF], proto SCTP (132), length 36)
169.254.33.2.31769 > 169.254.33.1.55447: sctp
1) [COOKIE ACK]
01:45:34.825841 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 80: (tos 0x2,ECT(0), ttl 64, id 1, offset 0, flags [DF], proto SCTP (132), length 64)
169.254.33.1.55447 > 169.254.33.2.31769: sctp
1) [DATA] (B)(E) [TSN: 1573904206] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
0x0000: 4865 6c6c 6f2c 2053 6572 7665 7221 00 Hello,.Server!.]
01:45:34.825958 Out 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 80: (tos 0x2,ECT(0), ttl 62, id 1, offset 0, flags [DF], proto SCTP (132), length 64)
100.64.0.1.55447 > 10.244.0.5.62324: sctp
1) [DATA] (B)(E) [TSN: 1573904206] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
0x0000: 4865 6c6c 6f2c 2053 6572 7665 7221 00 Hello,.Server!.]
01:45:34.826032 P 82:d3:0a:f4:00:06 ethertype IPv4 (0x0800), length 64: (tos 0x2,ECT(0), ttl 64, id 60359, offset 0, flags [DF], proto SCTP (132), length 48)
10.244.0.5.62324 > 100.64.0.1.55447: sctp
1) [SACK] [cum ack 1573904206] [a_rwnd 106481] [#gap acks 0] [#dup tsns 0]
01:45:34.826110 In de:59:8b:08:e4:43 ethertype IPv4 (0x0800), length 64: (tos 0x2,ECT(0), ttl 62, id 60359, offset 0, flags [DF], proto SCTP (132), length 48)
169.254.33.2.31769 > 169.254.33.1.55447: sctp
1) [SACK] [cum ack 1573904206] [a_rwnd 106481] [#gap acks 0] [#dup tsns 0]
01:45:34.826216 Out 00:00:a9:fe:21:01 ethertype IPv4 (0x0800), length 56: (tos 0x2,ECT(0), ttl 64, id 2, offset 0, flags [DF], proto SCTP (132), length 40)
169.254.33.1.55447 > 169.254.33.2.31769: sctp
1) [SHUTDOWN]
Either way we can just add the missing flow for now and then it should work.
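Based on the flow dump above, the missing flow is the -new-est+trk one; a sketch of adding it by hand until ovn-northd generates the equivalent itself (all match values are copied from the dump above):

# Temporary workaround: send mid-handshake SCTP packets (COOKIE ECHO)
# through NAT instead of letting them fall through.
ovs-ofctl -O OpenFlow15 add-flow br-int 'table=14,priority=120,ct_state=-new-est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31769,actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)'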
Ooh, that is weird. I agree we should add the missing flow for the time being since it works. But I think it's worth bringing up with the OVS dev list that SCTP associations apparently never reach est. There are a lot of flows installed by ovn-northd that operate based on the ct state being "est". If we never reach that state, then this potentially means we'll be seeing odd bugs when dealing with SCTP.

I think I have identified the problem: the ct zones are being traversed in the wrong order in tables 13 and 14. See https://bugzilla.redhat.com/show_bug.cgi?id=1815217#c11 for more info. This looks to be an OVN bug. The flows programmed by OVN are like this:

cookie=0x85a8499, duration=6057.175s, table=13, n_packets=16083, n_bytes=8005026, idle_age=0, priority=90,ip,metadata=0x3,nw_dst=169.254.33.2 actions=ct(table=14,zone=NXM_NX_REG12[0..15],nat)
cookie=0x7ee7a259, duration=5152.425s, table=14, n_packets=0, n_bytes=0, idle_age=5152, priority=120,ct_state=+est+trk,sctp,metadata=0x3,nw_dst=169.254.33.2,tp_dst=31291 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG11[0..15],nat)

Zone REG12 (zone 6) is SNAT, while zone REG11 (zone 9) is DNAT. Therefore the first flow is saying SNAT, and the second is saying perform DNAT. This is a problem because the conntrack entries are:

[root@ovn-control-plane ~]# ovs-appctl dpctl/dump-conntrack |grep sctp
sctp,orig=(src=100.64.0.1,dst=10.244.0.5,sport=46525,dport=62324),reply=(src=10.244.0.5,dst=100.64.0.1,sport=62324,dport=46525),zone=15,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)
sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=46525,dport=31291),reply=(src=10.244.0.5,dst=169.254.33.1,sport=62324,dport=46525),zone=9,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)
sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=46525,dport=31291),reply=(src=169.254.33.2,dst=169.254.33.1,sport=31291,dport=46525),protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)
sctp,orig=(src=169.254.33.1,dst=10.244.0.5,sport=46525,dport=62324),reply=(src=10.244.0.5,dst=100.64.0.1,sport=62324,dport=46525),zone=6,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)

The problem with the above is that SNAT and DNAT are committed in inverse order. This means OVN will hit zone 6 first (table 13 flow):

sctp,orig=(src=169.254.33.1,dst=10.244.0.5,sport=46525,dport=62324),reply=(src=10.244.0.5,dst=100.64.0.1,sport=62324,dport=46525),zone=6,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)

However, that will result in ct invalid, because the src/dst IP on this packet is 169.254.33.1/169.254.33.2. The correct order of operations is to do the DNAT lookup first, and then SNAT:

sctp,orig=(src=169.254.33.1,dst=169.254.33.2,sport=46525,dport=31291),reply=(src=10.244.0.5,dst=169.254.33.1,sport=62324,dport=46525),zone=9,protoinfo=(state=ESTABLISHED,vtag_orig=3990749122,vtag_reply=409279775)

In the above this will match and DNAT will happen; then we can SNAT, and zone 6 will match.
This is achieved by adding the following flows:

[root@ovn-control-plane ~]# ovs-ofctl -O openflow15 dump-flows br-int table=13 |grep sctp
cookie=0x0, duration=1013.896s, table=13, n_packets=33, n_bytes=4490, idle_age=9, priority=95,sctp,metadata=0x3,nw_dst=169.254.33.2 actions=ct(table=14,zone=NXM_NX_REG11[0..15],nat)
[root@ovn-control-plane ~]# ovs-ofctl -O openflow15 dump-flows br-int table=14 |grep REG
cookie=0x0, duration=44.396s, table=14, n_packets=3, n_bytes=474, idle_age=4, priority=125,ct_state=+est+trk,sctp,metadata=0x3,nw_src=169.254.33.1 actions=set_field:0x8/0x8->reg10,ct(table=15,zone=NXM_NX_REG12[0..15],nat)

Notice that now zone=REG11 (zone 9, DNAT) happens first in table 13, then in table 14 +est matches, and then I do nat on zone=REG12 (zone 6, SNAT). The connection then works fine. I'm not sure how TCP works and why this just happens for SCTP. I'm going to look into that next.

Mark, now that https://bugzilla.redhat.com/show_bug.cgi?id=1815217 is fixed upstream, can you help push along your patch for sctp support upstream? Then we can get that in and kick off upstream builds. Thanks.

Patch has been pushed to upstream master. Changing state to POST.

Patch has been backported downstream. Changing to MODIFIED.

This feature wasn't supported in OVN before the fix.
Now I verified it on this version:
# rpm -qa|grep ovn
ovn2.13-central-2.13.0-11.el8fdp.x86_64
ovn2.13-host-2.13.0-11.el8fdp.x86_64
ovn2.13-2.13.0-11.el8fdp.x86_64
# rpm -qa|grep ovn
ovn2.13-central-2.13.0-11.el7fdp.x86_64
ovn2.13-host-2.13.0-11.el7fdp.x86_64
ovn2.13-2.13.0-11.el7fdp.x86_64
Use the topology below:

server0-----------------ls1------------lr1--------------public
                         |              |
                      server1          ls2
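A sketch of how ls1 and its connection to lr1 could be built with standard ovn-nbctl commands (the rest of the topology follows the same pattern; the full resulting configuration is in the ovn-nbctl show output below):

ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1ls1 00:01:02:0d:01:01 192.168.0.254/24 3001::a/64
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1lr1
ovn-nbctl lsp-set-type ls1lr1 router
ovn-nbctl lsp-set-addresses ls1lr1 router
ovn-nbctl lsp-set-options ls1lr1 router-port=lr1ls1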
# ovn-nbctl show
switch b3d69768-75a6-4b08-aaa6-d4c0c74198e5 (public)
port ln_public
type: localnet
addresses: ["unknown"]
port plr1
type: router
addresses: ["00:01:02:0d:0f:01 172.16.1.254 2002::a"]
router-port: lr1p
switch 726d126d-2139-4e87-860b-99c174a9f054 (ls1)
port ls1lr1
type: router
addresses: ["00:01:02:0d:01:01 192.168.0.254 3001::a"]
router-port: lr1ls1
port ls1p3
addresses: ["00:01:02:01:01:04"]
port ls1p2
addresses: ["00:01:02:01:01:02"]
port ls1p1
addresses: ["00:01:02:01:01:01"]
switch 81577be1-b4c0-41c9-96ee-9dcb58d54622 (ls2)
port ls2lr1
type: router
addresses: ["00:01:02:0d:01:02 192.168.1.254 3001:1::a"]
router-port: lr1ls2
port ls2p1
addresses: ["00:01:02:01:01:03"]
router 101fc633-a654-4b3f-9121-523b6afd5701 (lr1)
port lr1ls2
mac: "00:01:02:0d:01:02"
networks: ["192.168.1.254/24", "3001:1::a/64"]
port lr1p
mac: "00:01:02:0d:0f:01"
networks: ["172.16.1.254/24", "2002::a/64"]
gateway chassis: [hv1 hv0]
port lr1ls1
mac: "00:01:02:0d:01:01"
networks: ["192.168.0.254/24", "3001::a/64"]
nat 496c2377-0073-4873-b313-3160a6889337
external ip: "172.16.1.10"
logical ip: "192.168.2.1"
type: "dnat_and_snat"
nat c3afebe4-1cb9-4cbd-8a7a-235db6201a0a
external ip: "2002::100"
logical ip: "3000::100"
type: "dnat_and_snat"
Tested some scenarios such as: lb on ls, lb on lr, and lb on lr behind fip.
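The "lb on ls" and "lb on lr" scenarios correspond to attaching an SCTP load balancer to a logical switch or a logical router; a minimal sketch with ovn-nbctl (the load balancer name, VIP, and backends here are hypothetical; ls1 and lr1 come from the topology above):

# Hypothetical SCTP load balancer: VIP on the public network, backends on ls1.
ovn-nbctl lb-add lb-test "172.16.1.100:8000" "192.168.0.1:8000,192.168.0.2:8000" sctp

# "lb on ls": attach the load balancer to the logical switch.
ovn-nbctl ls-lb-add ls1 lb-test

# "lb on lr": attach the load balancer to the logical router.
ovn-nbctl lr-lb-add lr1 lb-test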
Set to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1501