Bug 1903414
| Field | Value |
|---|---|
| Summary | NodePort is not working when configuring an egress IP address |
| Product | OpenShift Container Platform |
| Component | Networking |
| Networking sub component | openshift-sdn |
| Version | 4.6 |
| Target Release | 4.7.0 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | shishika |
| Assignee | Jacob Tanenbaum <jtanenba> |
| QA Contact | huirwang |
| CC | aconstan, anbhat, anowak, danw, huirwang, jdesousa, jtanenba, tkimura, zzhao |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Clones | 1986413 (view as bug list) |
| Bug Blocks | 1926662, 1986413 |
| Last Closed | 2021-02-24 15:37:21 UTC |
Description
shishika
2020-12-02 03:06:04 UTC
Hi Zhanqui, I think this may be a duplicate of BZ#1881882. Can you please try to reproduce this on RHEL nodes instead of RHCOS? I think this must be something in the kernel; to be more precise, I think this is conntrack.

huiran, could you help check whether this is the same issue as BZ#1881882?

Hello, this message is just the problem statement, feel free not to read it. The summary of the issue is that when a nodePort with externalTrafficPolicy: Local is reached from an egressIP, the node with the nodePort discards the egress traffic from the pod. There are two different scenarios here:

1. The client is on the node which is being used to reach the nodePort service. Here there is no egress IP because the packet is not leaving the node. This is working as intended.
2. The client is on a different node. This does not work combined with egress IP.
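For reference, a minimal sketch of the kind of Service involved here, written as a shell heredoc. The namespace, port and nodePort match the test output below; the selector label and the exact manifest are illustrative assumptions, not taken from this cluster:

$ oc -n test apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: hello-pod
spec:
  type: NodePort
  # Local: only nodes hosting a backend pod answer on the nodePort,
  # and the client source IP is preserved for the pod.
  externalTrafficPolicy: Local
  selector:
    app: hello-pod    # assumed label, not confirmed by the report
  ports:
  - port: 8000
    targetPort: 8000
    nodePort: 30011
EOF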
Now I have a simple test:

$ oc get netnamespace test
NAME   NETID      EGRESS IPS
test   12136949   ["172.31.249.201"]

$ oc get hostsubnet
NAME                                     HOST                                     HOST IP          SUBNET          EGRESS CIDRS   EGRESS IPS
huirwang-bug1903414-rrkmp-master-0       huirwang-bug1903414-rrkmp-master-0       172.31.249.123   10.130.0.0/23                  ["172.31.249.201"]
huirwang-bug1903414-rrkmp-worker-97znx   huirwang-bug1903414-rrkmp-worker-97znx   172.31.249.13    10.131.0.0/23   []
huirwang-bug1903414-rrkmp-worker-jmff2   huirwang-bug1903414-rrkmp-worker-jmff2   172.31.249.210   10.128.2.0/23
(some hostsubnets were deleted from the output for simplicity)

$ oc get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                                     NOMINATED NODE   READINESS GATES
test-rc-cc5n2   1/1     Running   0          70m   10.131.0.30   huirwang-bug1903414-rrkmp-worker-97znx   <none>           <none>
test-rc-rbglb   1/1     Running   0          70m   10.128.3.33   huirwang-bug1903414-rrkmp-worker-jmff2   <none>           <none>

$ oc get svc
NAME        TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
hello-pod   NodePort   172.30.131.139   <none>        8000:30011/TCP   13h
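For context, on openshift-sdn an egress IP like the one shown above is normally assigned by patching the project's NetNamespace and a node's HostSubnet. A sketch using the same namespace, node and address as this test, shown only to make the setup reproducible (the exact commands used were not recorded in this report):

$ oc patch netnamespace test --type=merge -p '{"egressIPs": ["172.31.249.201"]}'
$ oc patch hostsubnet huirwang-bug1903414-rrkmp-master-0 --type=merge -p '{"egressIPs": ["172.31.249.201"]}'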
I acquired a tcpdump while doing this simple test:

$ oc rsh test-rc-rbglb
sh-5.0$ curl 172.31.249.13:30011
^C
sh-5.0$ curl 172.31.249.13:30011
^C
sh-5.0$ curl 172.31.249.13:30011
^C
sh-5.0$ curl 172.31.249.13:30011
^C

And checking the tcpdump on the ens192 interface of the worker hosting the client pod:

$ tshark -r huirwang-bug1903414-rrkmp-worker-jmff2.pcap -Y 'tcp.port == 30011'
 207 2.571442 0.005239 15:59:23.744467 10.128.3.33 → 172.31.249.13 TCP 124 56812 → 30011 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3053704175 TSecr=0 WS=128
 244 3.621191 0.012833 15:59:24.794216 10.128.3.33 → 172.31.249.13 TCP 124 [TCP Retransmission] 56812 → 30011 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3053705225 TSecr=0 WS=128
 539 5.669215 0.003371 15:59:26.842240 10.128.3.33 → 172.31.249.13 TCP 124 [TCP Retransmission] 56812 → 30011 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3053707272 TSecr=0 WS=128
 1027 10.347814 0.005093 15:59:31.520839 10.128.3.33 → 172.31.249.13 TCP 124 56960 → 30011 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3053711951 TSecr=0 WS=128
 1110 11.365343 0.097540 15:59:32.538368 10.128.3.33 → 172.31.249.13 TCP 124 [TCP Retransmission] 56960 → 30011 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3053712969 TSecr=0 WS=128
 1551 13.413177 0.063932 15:59:34.586202 10.128.3.33 → 172.31.249.13 TCP 124 [TCP Retransmission] 56960 → 30011 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3053715016 TSecr=0 WS=128
 2054 17.649781 0.005325 15:59:38.822806 10.128.3.33 → 172.31.249.13 TCP 124 57096 → 30011 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3053719253 TSecr=0 WS=128
 2196 18.661183 0.007356 15:59:39.834208 10.128.3.33 → 172.31.249.13 TCP 124 [TCP Retransmission] 57096 → 30011 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3053720264 TSecr=0 WS=128
 6028 20.709184 0.003825 15:59:41.882209 10.128.3.33 → 172.31.249.13 TCP 124 [TCP Retransmission] 57096 → 30011 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3053722312 TSecr=0 WS=128

From this it's pretty obvious that either the traffic is not making it to the node, the pod is not getting it, or the server is not answering. Checking the server pod's tcpdump I see the pod ACKs the SYN, which means the issue happens on the server's node: for some reason the virtual switch discards this traffic.

$ tshark -r serverpod.pcap | head -6
 1 0.000000 0.000000 16:16:41.242155 172.31.249.201 → 10.131.0.30 TCP 74 47816 → 8000 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3054741674 TSecr=0 WS=128
 2 0.000043 0.000043 16:16:41.242198 10.131.0.30 → 172.31.249.201 TCP 74 8000 → 47816 [SYN, ACK] Seq=0 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=530118369 TSecr=3054741674 WS=128
 3 0.001140 0.001097 16:16:41.243295 172.31.249.201 → 10.131.0.30 TCP 54 47816 → 8000 [RST] Seq=1 Win=0 Len=0
 4 1.053287 1.052147 16:16:42.295442 172.31.249.201 → 10.131.0.30 TCP 74 [TCP Retransmission] 47816 → 8000 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=3054742728 TSecr=0 WS=128
 5 1.053326 0.000039 16:16:42.295481 10.131.0.30 → 172.31.249.201 TCP 74 [TCP Previous segment not captured] [TCP Port numbers reused] 8000 → 47816 [SYN, ACK] Seq=16457586 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=530119423 TSecr=3054742728 WS=128
 6 1.053793 0.000467 16:16:42.295948 172.31.249.201 → 10.131.0.30 TCP 54 47816 → 8000 [RST] Seq=1 Win=0 Len=0

Investigation notes:

Inside the pod we see the packet sent:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:01:12.517597 IP 172.31.249.201.33052 > 10.130.2.4.8000: Flags [S], seq 1370673013, win 28200, options [mss 1410,sackOK,TS val 47144581 ecr 0,nop,wscale 7], length 0
15:01:12.517646 IP 10.130.2.4.8000 > 172.31.249.201.33052: Flags [S.], seq 33490073, ack 1370673014, win 27960, options [mss 1410,sackOK,TS val 47144581 ecr 47144581,nop,wscale 7], length 0
15:01:12.518054 IP 172.31.249.201.33052 > 10.130.2.4.8000: Flags [R], seq 1370673014, win 0, length 0
15:01:13.520211 IP 172.31.249.201.33052 > 10.130.2.4.8000: Flags [S], seq 1370673013, win 28200, options [mss 1410,sackOK,TS val 47145584 ecr 0,nop,wscale 7], length 0
15:01:13.520272 IP 10.130.2.4.8000 > 172.31.249.201.33052: Flags [S.], seq 49156100, ack 1370673014, win 27960, options [mss 1410,sackOK,TS val 47145584 ecr 47145584,nop,wscale 7], length 0
15:01:13.520633 IP 172.31.249.201.33052 > 10.130.2.4.8000: Flags [R], seq 1370673014, win 0, length 0

In the node the conntrack entry isn't complete, it's just SYN_RECV:

sh-4.4# conntrack -L -p tcp | grep 30011
conntrack v1.4.4 (conntrack-tools): 202 flow entries have been shown.
tcp 6 59 SYN_RECV src=172.31.249.201 dst=172.31.249.158 sport=33926 dport=30011 src=10.130.2.4 dst=172.31.249.201 sport=8000 dport=33926 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

sh-4.4# ovs-ofctl dump-flows -O OpenFlow13 br0 | grep 10.130.2.4 | grep in_port=
cookie=0x0, duration=12898.275s, table=20, n_packets=31, n_bytes=1302, priority=100,arp,in_port=5,arp_spa=10.130.2.4,arp_sha=00:00:0a:82:02:04/00:00:ff:ff:ff:ff actions=load:0xb931f5->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=12898.275s, table=20, n_packets=52, n_bytes=3892, priority=100,ip,in_port=5,nw_src=10.130.2.4 actions=load:0xb931f5->NXM_NX_REG0[],goto_table:21

Following the packet inside the switch, we care about port 5, source IP 10.130.2.4 and destination IP 172.31.249.201.

table 0:

cookie=0x0, duration=48411.707s, table=0, n_packets=189023, n_bytes=53846847, priority=1000,ct_state=-trk,ip actions=ct(table=0) <- MATCHES conntrack table=0

# don't match
cookie=0x0, duration=48411.707s, table=0, n_packets=49787, n_bytes=4512768, priority=400,ip,in_port=tun0,nw_src=10.130.2.1 actions=goto_table:30
cookie=0x0, duration=48411.707s, table=0, n_packets=696, n_bytes=97440, priority=300,ip,in_port=tun0,nw_src=10.130.2.0/23,nw_dst=10.128.0.0/14 actions=goto_table:25
cookie=0x0, duration=48411.707s, table=0, n_packets=0, n_bytes=0, priority=250,ip,in_port=tun0,nw_dst=224.0.0.0/4 actions=drop
cookie=0x0, duration=48411.707s, table=0, n_packets=10922, n_bytes=458724, priority=200,arp,in_port=vxlan0,arp_spa=10.128.0.0/14,arp_tpa=10.130.2.0/23 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:10
cookie=0x0, duration=48411.707s, table=0, n_packets=68657, n_bytes=22547874, priority=200,ip,in_port=vxlan0,nw_src=10.128.0.0/14 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:10
cookie=0x0, duration=48411.707s, table=0, n_packets=48, n_bytes=2592, priority=200,ip,in_port=vxlan0,nw_dst=10.128.0.0/14 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:10
cookie=0x0, duration=48411.707s, table=0, n_packets=2291, n_bytes=96222, priority=200,arp,in_port=tun0,arp_spa=10.130.2.1,arp_tpa=10.128.0.0/14 actions=goto_table:30
cookie=0x0, duration=48411.707s, table=0, n_packets=20527, n_bytes=8734201, priority=200,ip,in_port=tun0 actions=goto_table:30
cookie=0x0, duration=48411.707s, table=0, n_packets=0, n_bytes=0, priority=150,in_port=vxlan0 actions=drop
cookie=0x0, duration=48411.707s, table=0, n_packets=14, n_bytes=1068, priority=150,in_port=tun0 actions=drop
cookie=0x0, duration=48411.707s, table=0, n_packets=11821, n_bytes=496482, priority=100,arp actions=goto_table:20

cookie=0x0, duration=48411.707s, table=0, n_packets=120306, n_bytes=31295397, priority=100,ip actions=goto_table:20 <- MATCHES go to table 20

table 20:

# don't match
cookie=0x0, duration=48536.838s, table=20, n_packets=1581, n_bytes=66402, priority=100,arp,in_port=veth07db5d4f,arp_spa=10.130.2.2,arp_sha=00:00:0a:82:02:02/00:00:ff:ff:ff:ff actions=load:0x390da7->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=48536.752s, table=20, n_packets=10242, n_bytes=430164, priority=100,arp,in_port=vethe3e9b323,arp_spa=10.130.2.3,arp_sha=00:00:0a:82:02:03/00:00:ff:ff:ff:ff actions=load:0x400329->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=13845.403s, table=20, n_packets=33, n_bytes=1386, priority=100,arp,in_port=veth105793d3,arp_spa=10.130.2.4,arp_sha=00:00:0a:82:02:04/00:00:ff:ff:ff:ff actions=load:0xb931f5->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=48536.838s, table=20, n_packets=19511, n_bytes=7245497, priority=100,ip,in_port=veth07db5d4f,nw_src=10.130.2.2 actions=load:0x390da7->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=48536.752s, table=20, n_packets=101058, n_bytes=24131939, priority=100,ip,in_port=vethe3e9b323,nw_src=10.130.2.3 actions=load:0x400329->NXM_NX_REG0[],goto_table:21

cookie=0x0, duration=13845.403s, table=20, n_packets=57, n_bytes=5375, priority=100,ip,in_port=veth105793d3,nw_src=10.130.2.4 actions=load:0xb931f5->NXM_NX_REG0[],goto_table:21 <- MATCHES REG0=0xb931f5

cookie=0x0, duration=48562.706s, table=20, n_packets=0, n_bytes=0, priority=0 actions=drop

table 21:

# doesn't match
cookie=0x0, duration=48648.757s, table=21, n_packets=97303, n_bytes=27643377, priority=200,ip,nw_dst=10.128.0.0/14 actions=ct(commit,table=30)

cookie=0x0, duration=48648.797s, table=21, n_packets=35399, n_bytes=4290695, priority=0 actions=goto_table:30 <- MATCH go to table 30

table 30:

# don't match:
cookie=0x0, duration=48684.809s, table=30, n_packets=2302, n_bytes=96684, priority=300,arp,arp_tpa=10.130.2.1 actions=output:tun0
cookie=0x0, duration=48684.809s, table=30, n_packets=48547, n_bytes=4630691, priority=300,ip,nw_dst=10.130.2.1 actions=output:tun0

cookie=0x0, duration=48684.809s, table=30, n_packets=60852, n_bytes=27968399, priority=300,ct_state=+rpl,ip,nw_dst=10.130.2.0/23 actions=ct(table=70,nat) <- MATCH nat conntrack table=70

# don't match
cookie=0x0, duration=48684.809s, table=30, n_packets=11883, n_bytes=499086, priority=200,arp,arp_tpa=10.130.2.0/23 actions=goto_table:40
cookie=0x0, duration=48684.809s, table=30, n_packets=78881, n_bytes=8146431, priority=200,ip,nw_dst=10.130.2.0/23 actions=goto_table:70
cookie=0x0, duration=48684.809s, table=30, n_packets=10979, n_bytes=461118, priority=100,arp,arp_tpa=10.128.0.0/14 actions=goto_table:50
cookie=0x0, duration=48684.809s, table=30, n_packets=52526, n_bytes=23326289, priority=100,ip,nw_dst=10.128.0.0/14 actions=goto_table:90
cookie=0x0, duration=48684.809s, table=30, n_packets=19677, n_bytes=3302902, priority=100,ip,nw_dst=172.30.0.0/16 actions=goto_table:60
cookie=0x0, duration=48684.809s, table=30, n_packets=0, n_bytes=0, priority=50,ip,in_port=vxlan0,nw_dst=224.0.0.0/4 actions=goto_table:120
cookie=0x0, duration=48684.809s, table=30, n_packets=0, n_bytes=0, priority=25,ip,nw_dst=224.0.0.0/4 actions=goto_table:110

cookie=0x0, duration=48684.809s, table=30, n_packets=3873, n_bytes=494356, priority=0,ip actions=goto_table:100 <- MATCH go to table 100

table 100:

# don't match
cookie=0x0, duration=48832.450s, table=100, n_packets=0, n_bytes=0, priority=300,udp,tp_dst=4789 actions=drop
cookie=0x0, duration=48832.450s, table=100, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.31.249.158,tp_dst=53 actions=output:tun0
cookie=0x0, duration=48832.450s, table=100, n_packets=3855, n_bytes=494856, priority=200,udp,nw_dst=172.31.249.158,tp_dst=53 actions=output:tun0

cookie=0x0, duration=48832.185s, table=100, n_packets=48, n_bytes=3552, priority=100,ip,reg0=0xb931f5 actions=ct(commit),move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.31.249.123->tun_dst,output:vxlan0 <- MATCH

Effective flows:

cookie=0x0, duration=48411.707s, table=0, n_packets=189023, n_bytes=53846847, priority=1000,ct_state=-trk,ip actions=ct(table=0) <- MATCHES conntrack table=0
cookie=0x0, duration=48411.707s, table=0, n_packets=120306, n_bytes=31295397, priority=100,ip actions=goto_table:20 <- MATCHES go to table 20
cookie=0x0, duration=13845.403s, table=20, n_packets=57, n_bytes=5375, priority=100,ip,in_port=veth105793d3,nw_src=10.130.2.4 actions=load:0xb931f5->NXM_NX_REG0[],goto_table:21 <- MATCHES REG0=0xb931f5 and go to table 21
cookie=0x0, duration=48648.797s, table=21, n_packets=35399, n_bytes=4290695, priority=0 actions=goto_table:30 <- MATCH go to table 30
cookie=0x0, duration=48684.809s, table=30, n_packets=60852, n_bytes=27968399, priority=300,ct_state=+rpl,ip,nw_dst=10.130.2.0/23 actions=ct(table=70,nat) <- MATCH nat conntrack table=70
cookie=0x0, duration=48684.809s, table=30, n_packets=3873, n_bytes=494356, priority=0,ip actions=goto_table:100 <- MATCH go to table 100
cookie=0x0, duration=48832.185s, table=100, n_packets=48, n_bytes=3552, priority=100,ip,reg0=0xb931f5 actions=ct(commit),move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.31.249.123->tun_dst,output:vxlan0 <- MATCH send the packet through the egressIP node, encapsulate with VNID 0xb931f5
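The walk above was done by hand against the dump-flows output. As a side note, OVS can do the same walk automatically with ofproto/trace; a sketch for this reply packet, using the port and addresses from the conntrack entry above (the exact flow fields and options may need adjusting, and on recent OVS the conntrack recirculation can be described with --ct-next):

sh-4.4# ovs-appctl ofproto/trace br0 "in_port=5,tcp,nw_src=10.130.2.4,nw_dst=172.31.249.201,tp_src=8000,tp_dst=33926"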
And we see the traffic sent back in the vxlan NIC:

$ tshark -r rhel-0-vxlan.pcap -Y 'ip.addr==10.130.2.4'
 11 3.559266 1.025974 15:33:47.650832 10.130.2.4 → 172.31.249.201 TCP 74 8000 → 50494 [SYN, ACK] Seq=0 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=49099714 TSecr=49099713 WS=128
 12 3.559990 0.000724 15:33:47.651556 172.31.249.201 → 10.130.2.4 TCP 54 50494 → 8000 [RST] Seq=1 Win=0 Len=0
 17 4.560400 0.571170 15:33:48.651966 10.130.2.4 → 172.31.249.201 TCP 74 [TCP Previous segment not captured] [TCP Port numbers reused] 8000 → 50494 [SYN, ACK] Seq=15644212 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=49100715 TSecr=49100716 WS=128
 18 4.560717 0.000317 15:33:48.652283 172.31.249.201 → 10.130.2.4 TCP 54 50494 → 8000 [RST] Seq=1 Win=0 Len=0
 19 7.266425 2.705708 15:33:51.357991 10.130.2.4 → 172.31.249.201 TCP 74 8000 → 50532 [SYN, ACK] Seq=0 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=49103421 TSecr=49103422 WS=128
 20 7.266925 0.000500 15:33:51.358491 172.31.249.201 → 10.130.2.4 TCP 54 50532 → 8000 [RST] Seq=1 Win=0 Len=0
 21 8.269347 1.002422 15:33:52.360913 10.130.2.4 → 172.31.249.201 TCP 74 [TCP Previous segment not captured] [TCP Port numbers reused] 8000 → 50532 [SYN, ACK] Seq=15670647 Ack=1 Win=27960 Len=0 MSS=1410 SACK_PERM=1 TSval=49104424 TSecr=49104425 WS=128
 22 8.269696 0.000349 15:33:52.361262 172.31.249.201 → 10.130.2.4 TCP 54 50532 → 8000 [RST] Seq=1 Win=0 Len=0

sh-4.4# ip route
default via 172.31.248.1 dev ens192 proto dhcp metric 100
10.128.0.0/14 dev tun0 scope link
172.30.0.0/16 dev tun0
172.31.248.0/23 dev ens192 proto kernel scope link src 172.31.249.158 metric 100

We have a default GW so this traffic should be sent there, so I'm inclined to think the packets are lost in iptables, however I added trace rules in iptables and the reply doesn't show up at all. Needs further investigation.
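For reference, trace rules of the kind mentioned above can be added roughly like this; a sketch that matches the reply as seen in this environment (pod source port 8000 towards the egress IP), not the exact rules that were used. The trace output appears in the kernel log, or in `xtables-monitor --trace` when the nft backend of iptables is in use; remember to delete the rules afterwards:

sh-4.4# iptables -t raw -A PREROUTING -p tcp --sport 8000 -d 172.31.249.201 -j TRACE
sh-4.4# iptables -t raw -A OUTPUT -p tcp --sport 8000 -d 172.31.249.201 -j TRACE
# ...reproduce the failing curl, inspect the TRACE entries, then clean up:
sh-4.4# iptables -t raw -D PREROUTING -p tcp --sport 8000 -d 172.31.249.201 -j TRACE
sh-4.4# iptables -t raw -D OUTPUT -p tcp --sport 8000 -d 172.31.249.201 -j TRACE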
OK I found the issue. The problem is this flow here:

We incorrectly try to encapsulate the traffic here:

cookie=0x0, duration=48832.185s, table=100, n_packets=48, n_bytes=3552, priority=100,ip,reg0=0xb931f5 actions=ct(commit),move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.31.249.123->tun_dst,output:vxlan0 <- MATCH send the packet through the egressIP node, encapsulate with VNID 0xb931f5

And the reason why we don't see the traffic going through the ens192 is because we have created the vxlan interface with 'options:remote_ip=flow' but we're not defining the field tun_dst, therefore OVS has no destination for it and drops it.

I can think of a couple of ways to fix it, but because this is an architectural change, it will be small in code but very high in complexity, and I'm fairly scared about possible side effects of the fix. I will need consensus here. Most of the team is on PTO and this is my last day before PTO. I'll raise a discussion about this when I'm back on the 2nd of January.

PS: I fixed this by adding a flow manually in my cluster which I don't think is going to break anything, but it may:

ovs-ofctl -O OpenFlow13 add-flow br0 table=100,priority=250,ip,nw_dst=172.31.249.201,reg0=0xb931f5,actions=output:tun0

In my previous comment I assumed the client was another pod within the cluster consuming an egressIP. Unfortunately the customer's client is external, which means we cannot rely on that constraint and my manually added flow won't work for this scenario. I need to discuss this with other SDN team members because I don't see a way to differentiate the client's egress traffic from the server's egress traffic so that we apply the egressIP only to the client...

(In reply to Juan Luis de Sousa-Valadas from comment #14)
> OK I found the issue. The problem is this flow here:
>
> We incorrectly try to encapsulate the traffic here:
>
> cookie=0x0, duration=48832.185s, table=100, n_packets=48, n_bytes=3552, priority=100,ip,reg0=0xb931f5
> actions=ct(commit),move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.31.249.123->tun_dst,output:vxlan0
> <- MATCH send the packet through the egressIP node, encapsulate with VNID 0xb931f5
>
> And the reason why we don't see the traffic going through the ens192 is
> because we have created the vxlan interface with 'options:remote_ip=flow'
> but we're not defining the field tun_dst, therefore OVS has no destination
> for it and drops it.

But we *are* defining tun_dst: "set_field:172.31.249.123->tun_dst"

(In reply to Dan Winship from comment #18)
> But we *are* defining tun_dst: "set_field:172.31.249.123->tun_dst"

You're right, I don't know why I didn't see the reply in the tcpdump of ens192. I think the filter 'tcp.port == 30011' is valid, but I don't have the tcpdump any more. Anyway, we still need to avoid hitting that flow when the pod is the server. Even if the packet is sent to the node that has the egress IP, and that node forwards it as we expect when the pod is the client, the client will get a reply with the wrong source IP.

ah... yes, I think you want to check `ct_state=-rpl`. Actually, all of table 100 should be bypassed for reply packets.

Hello, the reason why conntrack failed to skip table 100 is that I defined the flow as `ct_state=-rpl,actions=goto_table:101` instead of the current `ct_state=+rpl,actions=goto_table:101`.
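For reference, a minimal sketch of what such a reply-bypass flow could look like if added by hand with ovs-ofctl. This is only an illustration of the match/action quoted above; the real fix installs the flow through openshift-sdn, and the priority here is an assumption chosen only to sit above the reg0-based egressIP flow in table 100:

ovs-ofctl -O OpenFlow13 add-flow br0 "table=100,priority=300,ct_state=+rpl,ip,actions=goto_table:101"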
Jason verified both the nodePort and the egressIP work as expected in the PR.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633