Bug 1983894
Summary: | Hostnetwork pod to service backed by hostnetwork on the same node is not working with OVN Kubernetes | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | zenghui.shi <zshi> | |
Component: | kernel | Assignee: | Xin Long <lxin> | |
kernel sub component: | Networking | QA Contact: | Li Shuang <shuali> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | high | CC: | adrianc, jiji, jishi, kzhang, lariel, lxin, mleitner, shuali, sukulkar | |
Version: | 8.4 | Keywords: | Triaged, ZStream | |
Target Milestone: | beta | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | kernel-4.18.0-355.el8 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2024410 2024411 | Environment: | |
Last Closed: | 2022-05-10 14:59:47 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1953278, 1961063, 1986662 | |||
Bug Blocks: | 2014673, 2024410, 2024411 |
Description
zenghui.shi
2021-07-20 06:01:43 UTC
(In reply to zenghui.shi from comment #0)
> Tested with the following ovs configurations
>
> 1. hw-offload=true + tc-policy=none -> Not working

More below.

> 2. hw-offload=true + tc-policy=skip_hw -> Not working

I don't see the details on this one?

> 3. hw-offload=true + tc-policy=skip_sw -> Working

this one ends up using dp:ovs everywhere, so it's effectively not using TC+CT.

> Additional info:
> 1. hw-offload=true + tc-policy=none -> Not working
...
>
> ======= Original direction =======
...
>
> ZONE 40
>
> ufid:d4666138-c62c-4f18-a51b-0c0bd5eeb476,
> recirc_id(0x2b9),dp_hash(0/0),skb_priority(0/0),in_port(br-ex),skb_mark(0/0),
> ct_state(0x21/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:
> 00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),
> eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=172.30.35.139,proto=6,tos=0/0,
> ttl=0/0,frag=no),tcp(src=0/0,dst=8081),tcp_flags(0/0), packets:0, bytes:0,
> used:never, dp:ovs, actions:hash(l4(0)),recirc(0x2ce)

This one is okay to be dp:ovs

...
> ufid:d0110001-df86-4b17-ae99-b1b63235c866,
> recirc_id(0x2ce),dp_hash(0x4/0xf),skb_priority(0/0),in_port(br-ex),
> skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
> eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:
> 00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,
> proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs,
> actions:ct(commit,zone=40,label=0x2/0x2,nat(dst=169.254.169.2:8081)),
> recirc(0x2bb)
>

But not this one. Same happens on the reply direction.

Ok, dp:ovs could still configure this conntrack entry to be committed and with that NAT info, but if nothing changes, this entry won't be on a flowtable and won't be offloaded. Yet, AFAICT ATM, this shouldn't break it. Just not offload it.

And they have 0 pkts handled..

With this use case, br-ex is the src and dst of all packets here. Zenghui, can you please capture packets on br-ex? Thanks.

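The check done by eye above (which flows carry a ct() action and whether they show up as dp:ovs or dp:tc) can be sketched in a few lines of Python. This is only an illustration, not a tool used in this bug: the names DpFlow, parse_flow and ct_flows_outside_tc are made up here, the regular expression assumes the "ufid:..., ..., packets:N, ..., dp:NAME, actions:..." layout seen in the dumps quoted in this thread, and the two sample flows are copied from the thread with the wrapped lines re-joined and most match fields trimmed for brevity.

```python
import re
from typing import List, NamedTuple


class DpFlow(NamedTuple):
    ufid: str
    recirc_id: int
    packets: int
    datapath: str  # the value printed after "dp:", e.g. "ovs" or "tc"
    actions: str


# Assumed layout (from the dumps quoted in this thread, not a formal grammar):
#   ufid:<uuid>, <match fields incl. recirc_id(N)>, packets:N, ..., dp:<name>, actions:<list>
_FLOW_RE = re.compile(
    r"ufid:(?P<ufid>[0-9a-f-]+),"
    r".*?recirc_id\((?P<recirc>0x[0-9a-f]+|\d+)\)"
    r".*?packets:(?P<packets>\d+),"
    r".*?dp:(?P<dp>\w+),\s*actions:(?P<actions>.*)",
    re.DOTALL,
)


def parse_flow(text: str) -> DpFlow:
    """Extract the few fields of interest from one dumped datapath flow."""
    m = _FLOW_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognised flow dump: {text[:40]!r}")
    return DpFlow(
        ufid=m["ufid"],
        recirc_id=int(m["recirc"], 0),  # base 0 accepts both "0" and "0x2ce"
        packets=int(m["packets"]),
        datapath=m["dp"],
        actions=m["actions"].strip(),
    )


def ct_flows_outside_tc(flows: List[DpFlow]) -> List[DpFlow]:
    """Flows that run a ct() action but were not installed as dp:tc."""
    return [f for f in flows if "ct(" in f.actions and f.datapath != "tc"]


# Two flows from this thread; match fields trimmed, values unchanged.
SAMPLE_FLOWS = [
    "ufid:d0110001-df86-4b17-ae99-b1b63235c866, recirc_id(0x2ce),"
    "dp_hash(0x4/0xf),in_port(br-ex),eth_type(0x0800), packets:0, bytes:0, "
    "used:never, dp:ovs, "
    "actions:ct(commit,zone=40,label=0x2/0x2,nat(dst=169.254.169.2:8081)),"
    "recirc(0x2bb)",
    "ufid:8d2f8f6e-dbb2-453b-9a26-e8af9740186e, ct_state(0/0),"
    "recirc_id(0x2d0),in_port(br-ex),eth_type(0x0800), packets:13, "
    "bytes:2858, used:0.030s, dp:tc, "
    "actions:set(eth(src=52:54:00:56:00:31,dst=0c:42:a1:08:0a:da)),br-ex",
]

flows = [parse_flow(s) for s in SAMPLE_FLOWS]
for flow in ct_flows_outside_tc(flows):
    print(flow.ufid, flow.datapath, flow.actions)
```

A flow reported by ct_flows_outside_tc is not necessarily broken; as noted above, a ct(commit) in dp:ovs should only mean the entry is not offloaded.
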
Oh oh oh. I was so focused on the broken flow that I missed this:

(In reply to zenghui.shi from comment #0)
> ZONE 40 (asymmetric paths for original and reply traffic )

So we have 2 bugs here:
- OVN needs to fix the asymmetric path
- Somehow dp:tc is differing from dp:ovs and is causing the flow to break

From comment #0, these zones are being used and they are quite asymmetric:

gap here swapped
vvvvv vv----------v
original: 64001 -> 40 -> 40 -> 0 -> 64001 -> 64002
reply: 64002 -> 64001 -> 40 -> 40 -> 0 -> 64001 -> 64002

Zenghui, can you please report a new bz, towards OVN, to fix the asymmetric paths? Then let's keep this one for the broken flow. Thanks.

AFAICT from the capture so far:
The peers are able to establish a connection - the TCP handshake is done.
The client issues an HTTP request, which is received by the server.
The server sends the HTTP reply, which is NOT received by the client.
After that:
The client keeps retransmitting the request, because it never got an ack from the server saying that it was received.
The server keeps retransmitting the reply, because, well, the client never got it.

The server HTTP reply has the TCP FIN flag on it already. There is special handling for TCP FIN in act_ct, but I don't see how it can cause issues here.

Hangbin, a special note on the kernel that was used: 4.18.0-322.el8.mr942_210708_1548.x86_64
It has these commits:
https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/942//commits
For this bz: bz1961063
and now they are being tracked at bz1980537 and bz1980532.

On how HWOL can be impacting here:
act_ct instantiates 1 flowtable per zone, regardless of the interfaces involved. That means traffic to another node can lead to act_ct being instantiated on offloaded filters, and that causes the conntrack entries to be offloaded to that NIC, even if they don't need to be for the use case here.

Something in dp:tc is not behaving similarly to dp:ovs.
It is possible that once the OVN team fixes the asymmetric path above, this issue will go away automatically, but we need to understand what is causing the flow to break here so that we can understand how impactful this difference is.

(In reply to Marcelo Ricardo Leitner from comment #9)
> Oh oh oh. I was so focused on the broken flow that I missed this:
>
> (In reply to zenghui.shi from comment #0)
> > ZONE 40 (asymmetric paths for original and reply traffic )
>
> So we have 2 bugs here:
> - OVN needs to fix the asymmetric path
> - Somehow dp:tc is differing from dp:ovs and is causing the flow to break
>
> From comment #0, these zones are being used and they are quite asymmetric:
>
> gap here swapped
> vvvvv vv----------v
> original: 64001 -> 40 -> 40 -> 0 -> 64001 -> 64002
> reply: 64002 -> 64001 -> 40 -> 40 -> 0 -> 64001 -> 64002
>
> Zenghui, can you please report a new bz, towards OVN, to fix the asymmetric
> paths? Then lets keep this one for the broken flow. Thanks.

Marcelo, I think you mean creating a bug towards ovn-k8s, right?

(In reply to zenghui.shi from comment #12)
> Marcelo, I think you mean creating a bug towards ovn-k8s, right?

Ah yes. Right.

(In reply to Marcelo Ricardo Leitner from comment #13)
> (In reply to zenghui.shi from comment #12)
> > Marcelo, I think you mean creating a bug towards ovn-k8s, right?
>
> Ah yes. Right.

New BZ created to track the asymmetric issue for host -> service -> host endpoint on same node flows:
https://bugzilla.redhat.com/show_bug.cgi?id=1986662

(In reply to zenghui.shi from comment #0)
> Description of problem:
>
> Hostnetwork pod to service backed by hostnetwork on the same node is not
> working with OVN Kubernetes when ovs hardware offload is enabled.
>
> Tested with the following ovs configurations
>
> 1. hw-offload=true + tc-policy=none -> Not working
> 2. hw-offload=true + tc-policy=skip_hw -> Not working
> 3. hw-offload=true + tc-policy=skip_sw -> Working
>
>
> Version-Release number of selected component (if applicable):
> Kernel: 4.18.0-322.el8.mr942_210708_1548.x86_64
> OVS: openvswitch2.15-2.15.0-24.el8fdp.x86_64
> OVN: ovn2.13-20.12.0-115.el8fdp.x86_64
> OVN-Kubernetes: Built with
> https://github.com/ovn-org/ovn-kubernetes/pull/2042
> 3. hw-offload=true + tc-policy=skip_sw -> Working
>
> Service IP: 172.30.128.247:8081
> Node IP: 192.168.111.27
> Pod IP: 192.168.111.27:8081
>
> ======= Original direction =======
>
> ufid:c1caaed1-9c3e-4762-9b2a-dbfb64418ba5,
> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(br-ex),skb_mark(0/0),
> ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:
> 00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),
> eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=172.30.0.0/255.255.0.0,proto=0/
> 0,tos=0/0,ttl=0/0,frag=no), packets:33, bytes:4133, used:0.596s, flags:SFP.,
> dp:ovs, actions:ct(commit,zone=64001,nat(src=169.254.169.2)),recirc(0x306)
>
> ZONE 64001
>
> ufid:d652a04b-9f09-4247-a78d-54ab6ba41347,
> recirc_id(0x306),dp_hash(0/0),skb_priority(0/0),in_port(br-ex),skb_mark(0/0),
> ct_state(0x21/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=0c:42:a1:
> 08:0a:da,dst=52:54:00:56:00:31),eth_type(0x0800),ipv4(src=128.0.0.0/192.0.0.
> 0,dst=172.30.128.247,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0),
> tcp_flags(0/0), packets:0, bytes:0, used:never, dp:ovs,
> actions:ct_clear,set(eth(dst=0c:42:a1:08:0a:da)),ct(zone=40),recirc(0x308)
>

CT(zone=40) flow

> ufid:8707e569-5bad-4cff-990e-8fec8c666390,
> recirc_id(0x306),dp_hash(0/0),skb_priority(0/0),in_port(br-ex),skb_mark(0/0),
> ct_state(0x22/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=0c:42:a1:
> 08:0a:da,dst=52:54:00:56:00:31),eth_type(0x0800),ipv4(src=128.0.0.0/192.0.0.
> 0,dst=172.30.128.247,proto=6,tos=0/0,ttl=64,frag=no),tcp(src=0/0,dst=0/0),
> tcp_flags(0/0), packets:4, bytes:264, used:0.596s, flags:F., dp:ovs,
> actions:ct_clear,set(eth(dst=0c:42:a1:08:0a:da)),ct(zone=40),recirc(0x308)
>
> ZONE 40
>
> ufid:0f4f9b52-b45d-4bdb-be81-12c5df2d690f,
> recirc_id(0x308),dp_hash(0/0),skb_priority(0/0),in_port(br-ex),skb_mark(0/0),
> ct_state(0x21/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:
> 00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),
> eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=172.30.128.247,proto=6,tos=0/0,
> ttl=0/0,frag=no),tcp(src=0/0,dst=8081),tcp_flags(0/0), packets:0, bytes:0,
> used:never, dp:ovs, actions:hash(l4(0)),recirc(0x372)
>
>
> ufid:964d18b5-7863-4484-9597-e134836e7e1a,
> recirc_id(0x308),dp_hash(0/0),skb_priority(0/0),in_port(br-ex),skb_mark(0/0),
> ct_state(0x22/0x22),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:
> 00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),
> eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=172.30.128.247,proto=6,tos=0/0,
> ttl=0/0,frag=no),tcp(src=0/0,dst=8081),tcp_flags(0/0), packets:5, bytes:413,
> used:0.596s, flags:FP., dp:ovs, actions:ct(zone=40,nat),recirc(0x309)
>
> ufid:fb55ccad-fb55-4823-b650-3023609984de,
> recirc_id(0x372),dp_hash(0xa/0xf),skb_priority(0/0),in_port(br-ex),
> skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
> eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:
> 00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,
> proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs,
> actions:ct(commit,zone=40,label=0x2/0x2,nat(dst=169.254.169.2:8081)),
> recirc(0x309)
>

CT(zone=40,nat) flow

> ZONE 40
> combination of CT/CT(nat) in the datapath (I assume this is fine as our
> issue here is broken traffic)

The combined use of CT/CT(nat) may not be the cause of broken flow, but it will prevent flows from being offloaded to Mellanox NICs (e.g. CX-5).

>
> The combined use of CT/CT(nat) may not be the cause of broken flow, but it
> will
> prevent flows from being offloaded to Mellanox NICs (e.g. CX-5).

Created bz1988189 to track the combined use of CT/CT(nat) issue.

Reran the test with ovn version 21.09-host-21.09.0-8.el8fdp; the original issue remains.

Talked with Sushil; assigning it back to nst-kernel.

I was reviewing this bug with Xin Long today and then we realized that the traffic is broken because it hits the same situation as https://bugzilla.redhat.com/show_bug.cgi?id=1961063.

(In reply to zenghui.shi from comment #0)
...
> Additional info:
>
> 1. hw-offload=true + tc-policy=none -> Not working
...
> ======= Original direction =======
>
> ufid:eec1af9e-6ab3-4af7-8dbc-64113a1924b9,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),
> ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(br-ex),packet_type(ns=0/0,
^^^^^^^^^^^^^^
...
> ZONE 64002
>
> ufid:8d2f8f6e-dbb2-453b-9a26-e8af9740186e,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),
> ct_label(0/0),recirc_id(0x2d0),dp_hash(0/0),in_port(br-ex),packet_type(ns=0/
> 0,id=0/0),eth(src=0c:42:a1:08:0a:da,dst=52:54:00:56:00:31),eth_type(0x0800),
> ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,
> frag=no), packets:13, bytes:2858, used:0.030s, dp:tc,
> actions:set(eth(src=52:54:00:56:00:31,dst=0c:42:a1:08:0a:da)),br-ex
^^^^^

That said, this bz now comes down to waiting for the fixes from bz1961063 and from bz1986662 (comment #14).
With that, I can take this bz for now as a placeholder.

(In reply to Marcelo Ricardo Leitner from comment #19)
> I was reviewing this bug with Xin Long today and then we realized that the
> traffic is broken because it hits the same situation of
> https://bugzilla.redhat.com/show_bug.cgi?id=1961063.
>

I tried running the test (same node) with a kernel that fixes the bz1961063 issue; the original issue in this bug remains. Is that an indication that additional fixes might be required for this bug?

Adding dependencies per comment #14 and #16.

Btw, the flows we're observing in the test environment are quite different from the ones in https://bugzilla.redhat.com/show_bug.cgi?id=1986662#c9.

Debugged with Marcelo and Zenghui, another fix is needed:
https://lore.kernel.org/netdev/cover.1632133123.git.lucien.xin@gmail.com/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1988
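
For anyone re-running the reproduction from comment #0 against a fixed kernel, the three tested combinations correspond, as far as I understand the Open vSwitch documentation, to the other_config:hw-offload and other_config:tc-policy keys of the Open_vSwitch table. The Python sketch below only builds the ovs-vsctl command lines and pairs them with the results reported in comment #0; it does not run anything, and the helper name ovs_vsctl_args is hypothetical.

```python
import shlex
from typing import List

# Results as reported in comment #0 (hw-offload=true in all three cases).
REPORTED_RESULTS = {
    "none": "Not working",
    "skip_hw": "Not working",
    "skip_sw": "Working",
}

VALID_TC_POLICIES = ("none", "skip_sw", "skip_hw")


def ovs_vsctl_args(tc_policy: str, hw_offload: bool = True) -> List[str]:
    """Build (but do not execute) an ovs-vsctl command for one test setup.

    Assumption: the settings live in other_config of the Open_vSwitch table.
    """
    if tc_policy not in VALID_TC_POLICIES:
        raise ValueError(f"unknown tc-policy: {tc_policy!r}")
    return [
        "ovs-vsctl", "set", "Open_vSwitch", ".",
        f"other_config:hw-offload={'true' if hw_offload else 'false'}",
        f"other_config:tc-policy={tc_policy}",
    ]


for policy, result in REPORTED_RESULTS.items():
    # shlex.join renders the argument list as a copy-pastable shell line.
    print(f"{shlex.join(ovs_vsctl_args(policy))}  # reported: {result}")
```

The reported results are the ones from the affected kernel in comment #0; the sketch makes no claim about behavior on kernel-4.18.0-355.el8 or later.
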