Bug 2126083
| Summary: | OVN Load Balancers should allow all "related" ICMP error messages to pass through | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Surya Seetharaman <surya> |
| Component: | ovn22.12 | Assignee: | Ales Musil <amusil> |
| Status: | CLOSED UPSTREAM | QA Contact: | ying xu <yinxu> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | FDP 22.L | CC: | amusil, aygarg, ctrautma, dceara, dumitrus.smail, echaudro, fbaudin, fwestpha, jiji, jishi, jpradhan, kkarampo, mmichels, sgurnale |
| Target Milestone: | --- | Flags: | yinxu: needinfo- |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovn22.12-22.12.0-15.el8fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2025-02-10 04:01:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2137754, 2137756 | | |
| Bug Blocks: | 2041746 | | |
Description: Surya Seetharaman, 2022-09-12 11:31:07 UTC
Retrying this with a generic non-protocol load balancer:

sh-5.1# ovn-nbctl create load_balancer vips:10.96.210.8="10.244.1.6"
c07d4e97-a796-4c1e-928a-88b6aec8ee04

_uuid               : c07d4e97-a796-4c1e-928a-88b6aec8ee04
external_ids        : {}
health_check        : []
ip_port_mappings    : {}
name                : ""
options             : {}
protocol            : []
selection_fields    : []
vips                : {"10.96.210.8"="10.244.1.6"}

sh-5.1# ovn-nbctl list load-balancer-group
_uuid               : fa77927a-a424-485b-95d2-2fee2bef8c1a
load_balancer       : [0550b9ae-7db8-4de1-bf0d-33912916e36f, 186af76e-8c14-4b48-8930-6e3d52d886b0, 56e12768-0aad-41ff-b0a8-2c911290dfef]
name                : clusterLBGroup

sh-5.1# ovn-nbctl add load_balancer_group fa77927a-a424-485b-95d2-2fee2bef8c1a load_balancer c07d4e97-a796-4c1e-928a-88b6aec8ee04
sh-5.1# ovn-nbctl list load-balancer-group
_uuid               : fa77927a-a424-485b-95d2-2fee2bef8c1a
load_balancer       : [0550b9ae-7db8-4de1-bf0d-33912916e36f, 186af76e-8c14-4b48-8930-6e3d52d886b0, 56e12768-0aad-41ff-b0a8-2c911290dfef, c07d4e97-a796-4c1e-928a-88b6aec8ee04]
name                : clusterLBGroup

sh-5.1# ovn-nbctl list logical-router GR_ovn-worker
_uuid               : 0201b2b9-a757-41c5-ae20-70abb2dec872
copp                : 2750b4bb-dcdd-4049-b2ab-1811f4993148
enabled             : []
external_ids        : {physical_ip="172.18.0.3", physical_ips="172.18.0.3"}
load_balancer       : [9bb40843-0987-4fbc-bd43-69cff2bc75ac, f74ccd3e-4a20-4c08-9dcc-94352a9f3105]
load_balancer_group : [fa77927a-a424-485b-95d2-2fee2bef8c1a]
name                : GR_ovn-worker
nat                 : [66ee587f-9969-49a5-875f-049392b8f81b, 6d720d51-0bcf-48a2-be3b-2b23318d65a8, bd738d52-8784-4894-b536-688082f2d39d]
options             : {always_learn_from_arp_request="false", chassis="8ae63ea6-86f1-4c91-b1e8-13d94b069eee", dynamic_neigh_routers="true", lb_force_snat_ip=router_ip, snat-ct-zone="0"}
policies            : []
ports               : [19a900b4-5140-4c35-b9af-e2798df1789a, 3d9305e6-9f02-4c38-bca0-b4b32d828223]
static_routes       : [5ed2caa0-5c36-40fe-95fb-64ec3d325602, 6d508443-21d9-408a-9593-bf03993f7acd]

So I set up the generic LB.
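A quick sanity check that the VIP actually landed in the router's pipeline is to grep the southbound logical flows for it; a minimal sketch, assuming ovn-sbctl can reach the southbound DB and accepts the datapath by name (the stage names in the comment are the usual OVN router stages, not taken from this report):

# Look for the VIP in the gateway router's logical flows; it should
# appear in the defrag (lr_in_defrag) and DNAT (lr_in_dnat) stages.
ovn-sbctl dump-flows GR_ovn-worker | grep 10.96.210.8

# Cross-check the VIP-to-backend mapping from the northbound side.
ovn-nbctl list load_balancer c07d4e97-a796-4c1e-928a-88b6aec8ee04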
Next, when trying to curl the VIP:

    10.96.210.8.80 > 169.254.169.2.40156: Flags [.], seq 0:1348, ack 1, win 505, options [nop,nop,TS val 3234942472 ecr 682470886], length 1348: HTTP
12:25:04.667201 43921d1e48f7a25 P ifindex 10 0a:58:0a:f4:01:06 ethertype IPv4 (0x0800), length 1420: (tos 0x0, ttl 64, id 32897, offset 0, flags [DF], proto TCP (6), length 1400)
    10.244.1.6.80 > 100.64.0.3.34662: Flags [.], cksum 0x75a7 (incorrect -> 0x09d7), seq 1:1349, ack 1, win 505, options [nop,nop,TS val 3234949640 ecr 682460951], length 1348: HTTP
12:25:04.669052 breth0 In ifindex 6 02:42:fb:cb:dc:e3 ethertype IPv4 (0x0800), length 1420: (tos 0x0, ttl 62, id 32897, offset 0, flags [DF], proto TCP (6), length 1400)
    10.96.210.8.80 > 172.19.0.3.34662: Flags [.], cksum 0xf194 (correct), seq 1:1349, ack 1, win 505, options [nop,nop,TS val 3234949640 ecr 682460951], length 1348: HTTP
12:25:04.669090 breth0 Out ifindex 6 02:42:ac:12:00:03 ethertype IPv4 (0x0800), length 1420: (tos 0x0, ttl 61, id 32897, offset 0, flags [DF], proto TCP (6), length 1400)
    192.168.10.0.80 > 172.19.0.3.34662: Flags [.], cksum 0x0355 (correct), seq 1:1349, ack 1, win 505, options [nop,nop,TS val 3234949640 ecr 682460951], length 1348: HTTP
12:25:04.669103 eth0 Out ifindex 507 02:42:ac:12:00:03 ethertype IPv4 (0x0800), length 1420: (tos 0x0, ttl 61, id 32897, offset 0, flags [DF], proto TCP (6), length 1400)
    192.168.10.0.80 > 172.19.0.3.34662: Flags [.], cksum 0x0355 (correct), seq 1:1349, ack 1, win 505, options [nop,nop,TS val 3234949640 ecr 682460951], length 1348: HTTP
12:25:04.669222 eth0 In ifindex 507 02:42:ac:12:00:06 ethertype IPv4 (0x0800), length 596: (tos 0xc0, ttl 64, id 7195, offset 0, flags [none], proto ICMP (1), length 576)
    172.18.0.6 > 192.168.10.0: ICMP 172.19.0.3 unreachable - need to frag (mtu 1200), length 556
    (tos 0x0, ttl 61, id 32897, offset 0, flags [DF], proto TCP (6), length 1400)
    192.168.10.0.80 > 172.19.0.3.34662: Flags [.], seq 1:1349, ack 1, win 505, options [nop,nop,TS val 3234949640 ecr 682460951], length 1348: HTTP
12:25:04.669246 breth0 In ifindex 6 02:42:ac:12:00:06 ethertype IPv4 (0x0800), length 596: (tos 0xc0, ttl 64, id 7195, offset 0, flags [none], proto ICMP (1), length 576)
    172.18.0.6 > 192.168.10.0: ICMP 172.19.0.3 unreachable - need to frag (mtu 1200), length 556
    (tos 0x0, ttl 61, id 32897, offset 0, flags [DF], proto TCP (6), length 1400)
    192.168.10.0.80 > 172.19.0.3.34662: Flags [.], seq 1:1349, ack 1, win 505, options [nop,nop,TS val 3234949640 ecr 682460951], length 1348: HTTP
12:25:04.669258 breth0 Out ifindex 6 02:42:ac:12:00:03 ethertype IPv4 (0x0800), length 596: (tos 0xc0, ttl 63, id 7195, offset 0, flags [none], proto ICMP (1), length 576)
    172.18.0.6 > 10.96.210.8: ICMP 172.19.0.3 unreachable - need to frag (mtu 1200), length 556
    (tos 0x0, ttl 61, id 32897, offset 0, flags [DF], proto TCP (6), length 1400)
    10.96.210.8.80 > 172.19.0.3.34662: Flags [.], seq 1:1349, ack 1, win 505, options [nop,nop,TS val 3234949640 ecr 682460951], length 1348: HTTP
12:25:04.669302 eth0 Out ifindex 507 02:42:ac:12:00:03 ethertype IPv4 (0x0800), length 596: (tos 0xc0, ttl 62, id 7195, offset 0, flags [none], proto ICMP (1), length 576)

I still can't see the ICMP packet reaching the pod, though TCP packets are passing through. Attaching the dpctl output from the node while the ICMP frag-needed is hitting it.
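To isolate just the PMTUD errors in captures like the one above, a pcap filter that matches only ICMP type 3 code 4 (destination unreachable, fragmentation needed) keeps the TCP noise out; a sketch, with breth0 as an assumed interface of interest:

# Capture only ICMP "need to frag" errors on the bridge interface;
# icmp-unreach is ICMP type 3, and code 4 is "fragmentation needed
# and DF set".
tcpdump -nvei breth0 'icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'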
Adding notes here from Dumitru, who took a look at this:

recirc_id(0),in_port(5),eth(),eth_type(0x0800),ipv4(dst=10.96.0.0/255.255.0.0,frag=no), packets:15, bytes:4264, used:0.923s, flags:SP., actions:ct(commit,zone=64001,nat(src=169.254.169.2)),recirc(0xd82)
recirc_id(0xd82),in_port(5),ct_state(-new-est+trk),ct_mark(0/0x2),eth(src=02:42:ac:12:00:03,dst=02:42:fb:cb:dc:e3),eth_type(0x0800),ipv4(src=168.0.0.0/252.0.0.0,dst=10.96.210.8,proto=1,ttl=63,frag=no), packets:4, bytes:2360, used:0.923s, actions:set(eth(dst=02:42:ac:12:00:03)),ct(zone=11,nat),recirc(0xd94)
recirc_id(0xd94),in_port(5),ct_state(-new-est+rel-rpl-inv+trk),ct_mark(0x2/0x3),eth(src=02:42:ac:12:00:03,dst=02:42:ac:12:00:03),eth_type(0x0800),ipv4(dst=10.96.0.0/255.255.0.0,ttl=63,frag=no), packets:4, bytes:2360, used:0.923s, actions:set(eth(dst=02:42:fb:cb:dc:e3)),set(ipv4(ttl=62)),ct(zone=11,nat),recirc(0xd9e)
recirc_id(0xd9e),in_port(5),ct_state(-new-est+rel-rpl-inv+trk),ct_mark(0x2/0x3),eth(src=02:42:ac:12:00:03,dst=02:42:fb:cb:dc:e3),eth_type(0x0800),ipv4(src=128.0.0.0/128.0.0.0,dst=10.96.0.0/255.255.0.0,frag=no), packets:4, bytes:2360, used:0.923s, actions:ct_clear,clone(ct(commit,zone=64000,mark=0x1/0xffffffff),4)

port 0: ovs-system (internal)
port 1: br-int (internal)
port 2: genev_sys_6081 (geneve: packet_type=ptap)
port 3: ovn-k8s-mp0 (internal)
port 4: eth0
port 5: breth0 (internal)
port 6: 78e61e9a0a00ec8
port 7: 41a4d3b6aebb7b1
port 8: 43921d1e48f7a25

Based on the above tcpdumps, it seems conntrack isn't DNAT-ing the ICMP related packet. So 10.96.210.8 is not becoming 10.244.1.6 and the packet is going out via eth0?

recirc_id(0xd9e),in_port(5),ct_state(-new-est+rel-rpl-inv+trk),ct_mark(0x2/0x3),eth(src=02:42:ac:12:00:03,dst=02:42:fb:cb:dc:e3),eth_type(0x0800),ipv4(src=128.0.0.0/128.0.0.0,dst=10.96.0.0/255.255.0.0,frag=no), packets:4, bytes:2360, used:0.923s,

sh-5.1# ovs-dpctl dump-conntrack | grep 36363
tcp,orig=(src=172.19.0.3,dst=10.96.210.8,sport=36363,dport=80),reply=(src=10.96.210.8,dst=169.254.169.2,sport=80,dport=36363),zone=64001,protoinfo=(state=ESTABLISHED)
tcp,orig=(src=192.168.10.0,dst=172.19.0.3,sport=80,dport=36363),reply=(src=172.19.0.3,dst=192.168.10.0,sport=36363,dport=80),zone=64000,mark=2,protoinfo=(state=ESTABLISHED)
tcp,orig=(src=172.19.0.3,dst=192.168.10.0,sport=36363,dport=80),reply=(src=10.96.210.8,dst=172.19.0.3,sport=80,dport=36363),protoinfo=(state=ESTABLISHED)
tcp,orig=(src=169.254.169.2,dst=10.96.210.8,sport=36363,dport=80),reply=(src=10.244.1.6,dst=169.254.169.2,sport=80,dport=36363),zone=11,mark=2,protoinfo=(state=ESTABLISHED)
tcp,orig=(src=100.64.0.3,dst=10.244.1.6,sport=36363,dport=80),reply=(src=10.244.1.6,dst=100.64.0.3,sport=80,dport=36363),zone=15,protoinfo=(state=ESTABLISHED)
tcp,orig=(src=169.254.169.2,dst=10.244.1.6,sport=36363,dport=80),reply=(src=10.244.1.6,dst=100.64.0.3,sport=80,dport=36363),protoinfo=(state=ESTABLISHED)

Hi Florian, is there a way for me to confirm whether conntrack is DNAT-ing the related ICMP error packet or not?

We were discussing how to approach that, and to avoid any flow explosion we are considering making it available just for LBs applied to routers. Would that be enough for OCP, or should we think about a more generic approach that will also work for LBs applied to switches? Thanks, Ales

Hey Ales, we will need a generic approach, and it will need to work for LBs on switches as well; the functionality should be uniformly supported, and the scale aspect should come on top of that IMHO.
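For the conntrack question above, one general way to check from userspace, assuming the kernel datapath and the conntrack-tools package are available: related ICMP errors never get conntrack entries of their own, they are translated against the embedded TCP tuple, so the thing to verify is the parent entry in the LB DNAT zone (zone 11 in the dumps above):

# Dump the OVS datapath conntrack table and keep only the LB DNAT zone;
# the entry's reply tuple shows whether DNAT to the backend 10.244.1.6
# is in place for the parent TCP flow.
ovs-dpctl dump-conntrack | grep 'zone=11'

# Watch live events for the parent flow while reproducing; if the
# related ICMP error is being handled, the parent entry stays
# ESTABLISHED and the error should leave the zone with the
# translated outer addresses.
conntrack -E -p tcp --orig-port-dst 80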
Patches posted: https://patchwork.ozlabs.org/project/ovn/list/?series=326114

We have discovered two additional problems while working on this issue:
1) The defrag flows need to match on ICMP packets to actually get the +rel state into the DNAT stage.
2) We need to pass along the force_snat and skip_snat flags whenever we do ct_commit_nat.

Patches for the two additional issues: https://patchwork.ozlabs.org/project/ovn/list/?series=331610

ovn22.12 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2162592

I tested on version ovn22.12-22.12.0-20.el8fdp.x86_64, but it doesn't work.

server:
# ovn-nbctl show
switch 464a4b40-b843-4426-9d6e-bec060b91461 (public)
    port public_r1
        type: router
        router-port: lr1lp
    port ln_p1
        type: localnet
        addresses: ["unknown"]
switch 04797a41-c20e-40f6-9349-81737da51cd4 (ls1)
    port ls1lr1
        type: router
        addresses: ["00:01:02:0d:01:01 192.168.0.254"]
        router-port: lr1ls1
    port ls1p1
        addresses: ["00:01:02:01:01:01"]
router fd8ae6bd-cc6a-47cf-bb13-8d20585b0cb9 (lr1)
    port lr1lp
        mac: "00:01:02:0d:01:02"
        networks: ["172.100.1.253/24"]
        gateway chassis: [hv1]
    port lr1ls1
        mac: "00:01:02:0d:01:01"
        networks: ["192.168.0.254/24"]

[root@dell-per740-53 regression]# ovn-nbctl list load_balancer
_uuid               : b4eb1498-fdc2-40b0-afe8-11aebcf5c685
external_ids        : {}
health_check        : []
ip_port_mappings    : {}
name                : lb0
options             : {reject="true"}
protocol            : tcp
selection_fields    : []
vips                : {"172.100.1.10:80"="192.168.0.1:80"}

client:
# ovn-nbctl show
switch e0d77038-acee-4d69-ba7c-ba1102ab2f73 (public)
    port public_r1
        type: router
        router-port: lr1lp
    port ln_p1
        type: localnet
        addresses: ["unknown"]
switch 32b4732d-9b36-485a-9a24-1a90af366004 (ls1)
    port ls1lr1
        type: router
        addresses: ["00:02:02:0d:01:01 172.168.0.254"]
        router-port: lr1ls1
    port ls1p1
        addresses: ["00:02:02:01:01:01"]
router 191da699-f91f-4dc6-9bc8-8283b4dd39d8 (lr1)
    port lr1lp
        mac: "00:02:02:0d:01:02"
        networks: ["172.100.1.254/24"]
        gateway chassis: [hv1]
    port lr1ls1
        mac: "00:02:02:0d:01:01"
        networks: ["172.168.0.254/24"]
    nat 75cc9500-4243-481b-957c-88994ac264b8
        external ip: "172.100.1.20"
        logical ip: "172.168.0.0/24"
        type: "snat"

# ovn-nbctl list logical-router-port
_uuid               : bc66dffd-7a43-4f2d-916f-13cbb64056b0
enabled             : []
external_ids        : {}
gateway_chassis     : [de6e031b-600b-4d05-bca2-e662ea255173]
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "00:02:02:0d:01:02"
name                : lr1lp
networks            : ["172.100.1.254/24"]
options             : {}
peer                : []

_uuid               : e98907cf-a3c5-4189-b49d-0d733144412b
enabled             : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "00:02:02:0d:01:01"
name                : lr1ls1
networks            : ["172.168.0.254/24"]
options             : {gateway_mtu="800"}
peer                : []

Then start a TCP listener on the server:
ip netns exec server0 nc -l -p 80
and open a TCP connection from the client to the server:
ip netns exec client0 nc 172.100.1.10 80

Small packets pass, but big packets fail from server to client. From the packet capture, we can see that the server didn't receive the ICMP unreachable packets.
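For reference, a minimal way to drive the failing case in this topology (the client-side lr1ls1 port carries gateway_mtu="800", so any DF-marked server reply larger than that should make the router emit the frag-needed error); the 2000-byte payload size is an arbitrary choice, and the netns names follow the setup above:

# Server: serve ~2000 bytes on port 80; the DF-marked reply segments
# exceed gateway_mtu=800, so the client-side router should answer with
# an ICMP "need to frag" toward the server.
ip netns exec server0 sh -c 'dd if=/dev/zero bs=2000 count=1 | nc -l -p 80' &

# Client: connect to the VIP; with the fix working, the server should
# see the ICMP error, lower its path MTU, and retransmit smaller
# segments until the transfer completes.
ip netns exec client0 nc 172.100.1.10 80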