Bug 2224260 - LB skip_snat improperly applied with affinity_timeout
Summary: LB skip_snat improperly applied with affinity_timeout
Keywords:
Status: POST
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn23.09
Version: FDP 23.K
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ales Musil
QA Contact: ying xu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-20 09:42 UTC by François Rigault
Modified: 2023-07-28 00:54 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:

Links
System ID                      Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker FD-3041  0        None      None    None     2023-07-20 09:45:36 UTC

Description François Rigault 2023-07-20 09:42:22 UTC
Description of problem:
Traffic is still SNATed when all of the following hold:
1- the load balancer has options:skip_snat=true
2- the logical router has options:lb_force_snat_ip=router_ip
3- the load balancer has options:affinity_timeout set
In the context of OVN-Kubernetes, this means traffic for services with externalTrafficPolicy: Local and sessionAffinity: ClientIP is still getting SNATed.

Version-Release number of selected component (if applicable):
ovn main branch (commit 30952c248d4f804c25af9b1c9565f23c0045e915)

How reproducible:
all the time

Steps to Reproduce:
(largely reusing the instructions from bz1995326)

1. In the OVN sandbox:
# Create the first logical switch with one port
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"

ovs-vsctl add-port br-int sw0-port1 -- set interface sw0-port1 type=internal external_ids:iface-id=sw0-port1
ip netns add sw0-port1
ip link set sw0-port1 netns sw0-port1
ip netns exec sw0-port1 ip link set sw0-port1 address 50:54:00:00:00:01
ip netns exec sw0-port1 ip link set sw0-port1 up
ip netns exec sw0-port1 ip addr add 192.168.0.2/24 dev sw0-port1
ip netns exec sw0-port1 ip route add default via 192.168.0.1

# Create the second logical switch with one port
ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:54:00:00:00:03 11.0.0.2"

ovs-vsctl add-port br-int sw1-port1 -- set interface sw1-port1 type=internal external_ids:iface-id=sw1-port1
ip netns add sw1-port1
ip link set sw1-port1 netns sw1-port1
ip netns exec sw1-port1 ip link set sw1-port1 address 50:54:00:00:00:03
ip netns exec sw1-port1 ip link set sw1-port1 up
ip netns exec sw1-port1 ip addr add 11.0.0.2/24 dev sw1-port1
ip netns exec sw1-port1 ip route add default via 11.0.0.1

# Create a logical router and attach both logical switches
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.0.1/24
ovn-nbctl lsp-add sw0 lrp0-attachment
ovn-nbctl lsp-set-type lrp0-attachment router
ovn-nbctl lsp-set-addresses lrp0-attachment 00:00:00:00:ff:01
ovn-nbctl lsp-set-options lrp0-attachment router-port=lrp0
ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 11.0.0.1/24
ovn-nbctl lsp-add sw1 lrp1-attachment
ovn-nbctl lsp-set-type lrp1-attachment router
ovn-nbctl lsp-set-addresses lrp1-attachment 00:00:00:00:ff:02
ovn-nbctl lsp-set-options lrp1-attachment router-port=lrp1

# Make lr0 a gateway router on chassis-1, force SNAT of load-balanced
# traffic to the router IP, and add a load balancer that should skip
# that SNAT while also using session affinity
ovn-nbctl set Logical_Router lr0 options:chassis=chassis-1
ovn-nbctl set Logical_Router lr0 options:lb_force_snat_ip=router_ip
ovn-nbctl lb-add lb0 11.0.0.200:1234 192.168.0.2:8080
ovn-nbctl set Load_Balancer lb0 options:skip_snat=true
ovn-nbctl set Load_Balancer lb0 options:affinity_timeout=1200
ovn-nbctl lr-lb-add lr0 lb0

# Wait for the hypervisor to apply the changes, then inspect the DNAT stage
ovn-nbctl --wait=hv sync
ovn-sbctl dump-flows lr0 | grep lr_in_dnat

# Start a backend HTTP server on sw0-port1 (192.168.0.2:8080)
ip netns exec sw0-port1 python3 -m http.server 8080 &

# Query the VIP twice from sw1-port1 (11.0.0.2)
ip netns exec sw1-port1 curl 11.0.0.200:1234
ip netns exec sw1-port1 curl 11.0.0.200:1234


Actual results:
At least the second curl succeeds, but the backend sees the SNATed source address (the router IP, 192.168.0.1):
192.168.0.1 - - [20/Jul/2023 09:24:39] "GET / HTTP/1.1" 200 -

Expected results:
Both curls succeed and the backend sees the original client IP (11.0.0.2):
11.0.0.2 - - [20/Jul/2023 09:27:27] "GET / HTTP/1.1" 200 -
11.0.0.2 - - [20/Jul/2023 09:27:27] "GET / HTTP/1.1" 200 -

(This is what happens once affinity_timeout is removed with:
ovn-nbctl remove load_balancer lb0 options affinity_timeout=1200
)
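To check which source address the backend saw without reading the log by eye, the client address can be taken from each access-log line. The following is a minimal sketch, assuming python3's default http.server log format (client address as the first field) and the router IP 192.168.0.1 from the setup above; the helpers client_ip and check_line are hypothetical and not part of OVN or the reproduction steps:

```shell
#!/bin/sh
# Sketch: classify python3 http.server access-log lines from the repro.
# Assumption: default http.server log format, whose first
# whitespace-separated field is the client address.
# client_ip and check_line are hypothetical helpers, not OVN tooling.

# Router IP that lb_force_snat_ip=router_ip uses on lrp0 in this setup.
ROUTER_IP=192.168.0.1

# Print the client address (first field) of one access-log line.
client_ip() {
    printf '%s\n' "$1" | awk '{ print $1 }'
}

# Print "snat" if the backend saw the router IP, otherwise
# "no-snat <address>" with the source address it did see.
check_line() {
    ip=$(client_ip "$1")
    if [ "$ip" = "$ROUTER_IP" ]; then
        echo "snat"
    else
        echo "no-snat $ip"
    fi
}

# The "Actual results" line, then an "Expected results" line.
check_line '192.168.0.1 - - [20/Jul/2023 09:24:39] "GET / HTTP/1.1" 200 -'  # prints: snat
check_line '11.0.0.2 - - [20/Jul/2023 09:27:27] "GET / HTTP/1.1" 200 -'     # prints: no-snat 11.0.0.2
```

Because http.server writes its access log to stderr, starting the backend with its stderr redirected to a file (for example 2>server.log) would let each recorded line be passed to check_line.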


Additional info:
Also tracked in RH support case https://access.redhat.com/support/cases/#/case/03563137

Comment 2 François Rigault 2023-07-20 14:05:30 UTC
I tried it and it works (thanks!), note that this fails now:


ip netns exec sw0-port1 curl 11.0.0.200:1234

(the fun case of the pod contacting the service for which it is its own endpoint, and thus requires the hairpin thing)

Comment 4 Scott Dodson 2023-07-20 14:08:09 UTC
@amusil Thanks, can you make sure that this gets backported to whichever version of OVN is present in OCP 4.12?

Comment 5 Ales Musil 2023-07-20 14:38:10 UTC
(In reply to François Rigault from comment #2)
> I tried it and it works (thanks!), note that this fails now:
> 
> 
> ip netns exec sw0-port1 curl 11.0.0.200:1234
> 
> (the fun case of the pod contacting the service for which it is its own
> endpoint, and thus requires the hairpin thing)

That also fails on current main when you remove the affinity_timeout. AFAIK that is the expected behavior.

(In reply to Scott Dodson from comment #4)
> @amusil Thanks, can you make sure that this gets backported to
> whichever version of OVN is present in OCP 4.12?

Yeah, I'll make sure it gets backported. 

Thanks,
Ales

