
Bug 2150533

Summary: [OVN] Session Affinity doesn't work as expected when a non-affinity service shares its backends with the affinity service
Product: Red Hat Enterprise Linux Fast Datapath
Component: ovn22.09
Version: FDP 22.L
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Surya Seetharaman <surya>
Assignee: Dumitru Ceara <dceara>
QA Contact: ying xu <yinxu>
CC: ctrautma, dceara, jiji, jishi
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovn22.09-22.09.0-24.el8fdp
Doc Type: If docs needed, set a value
Story Points: ---
Type: Bug
Regression: ---
Last Closed: 2023-02-09 00:27:09 UTC

Description Surya Seetharaman 2022-12-03 19:06:02 UTC
Description of problem:


OVN-Kubernetes (OVNK) environment:

nettest-2100         netserver-0                                 1/1     Running   0          21m   10.244.0.11   ovn-control-plane   <none>           <none>
nettest-2100         netserver-1                                 1/1     Running   0          21m   10.244.1.14   ovn-worker          <none>           <none>
nettest-2100         netserver-2                                 1/1     Running   0          21m   10.244.2.19   ovn-worker2         <none>           <none>
nettest-2100         test-container-pod                          1/1     Running   0          20m   10.244.2.20   ovn-worker2         <none>           <none>


$ oc get svc -n nettest-2100
NAME                       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                     AGE
node-port-service          NodePort   10.96.253.202   <none>        80:31455/TCP,90:31374/UDP   45m
session-affinity-service   NodePort   10.96.128.6     <none>        80:30744/TCP,90:30388/UDP   45m

$ oc get ep -n nettest-2100
NAME                       ENDPOINTS                                                        AGE
node-port-service          10.244.0.11:8083,10.244.1.14:8083,10.244.2.19:8083 + 3 more...   45m
session-affinity-service   10.244.0.11:8083,10.244.1.14:8083,10.244.2.19:8083 + 3 more...   45m

NOTE: "+ 3 more..." just means the same set of backend IPs, but with targetPort 8081 for the UDP traffic. The full service descriptions are below:
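
For completeness, the full endpoint list (both the 8083/TCP and 8081/UDP targets) could be confirmed with something along these lines; this command was not part of the original capture and its output is omitted here:

$ oc get ep -n nettest-2100 node-port-service -o yaml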

$ oc get svc -n nettest-2100 node-port-service -oyaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-12-03T17:50:49Z"
  name: node-port-service
  namespace: nettest-2100
  resourceVersion: "63639"
  uid: 914b0755-ea70-4f00-9793-80729cab3f54
spec:
  clusterIP: 10.96.253.202
  clusterIPs:
  - 10.96.253.202
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    nodePort: 31455
    port: 80
    protocol: TCP
    targetPort: 8083
  - name: udp
    nodePort: 31374
    port: 90
    protocol: UDP
    targetPort: 8081
  selector:
    selector-6d9ed6ed-e672-400d-b8da-297fc7d3a80c: "true"
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}


$ oc get svc -n nettest-2100 session-affinity-service -oyaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-12-03T17:50:49Z"
  name: session-affinity-service
  namespace: nettest-2100
  resourceVersion: "63643"
  uid: 85d32617-7889-4b38-baa9-273db9f04b09
spec:
  clusterIP: 10.96.128.6
  clusterIPs:
  - 10.96.128.6
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    nodePort: 30744
    port: 80
    protocol: TCP
    targetPort: 8083
  - name: udp
    nodePort: 30388
    port: 90
    protocol: UDP
    targetPort: 8081
  selector:
    selector-6d9ed6ed-e672-400d-b8da-297fc7d3a80c: "true"
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  type: NodePort
status:
  loadBalancer: {}

NOTE that the service with session affinity has UDP nodePort 30388, while the one without affinity has UDP nodePort 31374.

I try to reach node-port-service from netserver-0 (the test agent's /dial endpoint makes netserver-0 send a UDP request to the given host:port and report the responses):

bash-5.0# curl -g -q -s 'http://10.244.0.11:8083/dial?request=hostname&protocol=udp&host=172.19.0.4&port=31374&tries=1'                                                      
{"errors":["reading from udp connection failed. err:'read udp 10.244.0.11:37269-\u003e172.19.0.4:31374: i/o timeout'"]}

So the flow here is: source pod (netserver-0) on node ovn-control-plane -----> 172.19.0.4:31374 (nodePort of node-port-service, on the same node), i.e.

src pod ---> node logical switch (LB DNAT to backend) ---> backend pod

This traffic fails :/

OVN Trace

sh-5.2# ovn-trace --ct new 'inport=="nettest-2100_netserver-0" && eth.src==0a:58:0a:f4:00:0b && eth.dst==0a:58:0a:f4:00:01 && ip4.src==10.244.0.11 && ip4.dst==172.19.0.4 && ip.ttl==64 && udp && udp.src==42696 && udp.dst==31374'
# udp,reg14=0x3,vlan_tci=0x0000,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.11,nw_dst=172.19.0.4,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=42696,tp_dst=31374
                                                                                                                                                                             
ingress(dp="ovn-control-plane", inport="nettest-2100_netserver-0")                                                                                                           
------------------------------------------------------------------                                                                                                           
 0. ls_in_check_port_sec (northd.c:8126): 1, priority 50, uuid 34fbba79                                                                                                      
    reg0[15] = check_in_port_sec();                                                                                                                                          
    next;                                                                                                                                                                    
 4. ls_in_pre_acl (northd.c:5801): ip, priority 100, uuid 8a58787a                                                                                                           
    reg0[0] = 1;                                                                                                                                                             
    next;                                                                                                                                                                    
 5. ls_in_pre_lb (northd.c:5971): ip, priority 100, uuid 0bcb14a5                                                                                                            
    reg0[2] = 1;                                                                                                                                                             
    next;                                                                                                                                                                    
 6. ls_in_pre_stateful (northd.c:6963): ip4.dst == 172.19.0.4 && udp.dst == 31374, priority 120, uuid 2e4f4105                                                               
    reg1 = 172.19.0.4;                                                                                                                                                       
    reg2[0..15] = 31374;                                                                                                                                                     
    ct_lb_mark;                                                                                                                                                              
                                                                                                                                                                             
ct_lb_mark                                                                                                                                                                   
----------                                                                                                                                                                   
 7. ls_in_acl_hint (northd.c:6052): ct.new && !ct.est, priority 7, uuid b612a73a                                                                                             
    reg0[7] = 1;                                                                                                                                                             
    reg0[9] = 1;
    next;
 8. ls_in_acl (northd.c:6668): ip && !ct.est, priority 1, uuid b7a59514
    reg0[1] = 1;
    next;
12. ls_in_lb (northd.c:7257): ct.new && ip4.dst == 172.19.0.4 && udp.dst == 31374, priority 120, uuid 4b6b480b
    reg0[1] = 0;
    ct_lb_mark(backends=10.244.0.11:8081,10.244.1.14:8081,10.244.2.19:8081);

ct_lb_mark /* default (use --ct to customize) */
------------------------------------------------
13. ls_in_lb_aff_learn (northd.c:7118): reg9[6] == 0 && ct.new && ip4 && ip4.dst == 10.244.1.14 && reg1 == 172.19.0.4 && udp.dst == 8081, priority 100, uuid 7ac23c03
    commit_lb_aff(vip = "172.19.0.4:30388", backend = "10.244.1.14:8081", proto = udp, timeout = 10800);
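
Note that the commit_lb_aff above records the VIP as "172.19.0.4:30388", i.e. the UDP nodePort of session-affinity-service, even though the packet was destined to 31374 (node-port-service). A hedged way to see what actually ends up in the affinity table on the node would be to dump it directly; table 78 is taken from the learn() action in the flows below and this command was not part of the original capture:

sh-5.2# ovs-ofctl dump-flows br-int table=78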


OVS Trace:

sh-5.2# ovs-appctl ofproto/trace br-int in_port=13,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,udp,nw_src=10.244.0.11,nw_dst=172.19.0.4,tp_dst=31374,tp_src=42696,nw_ttl=64,dp_hash=1
Bad openflow flow syntax: in_port=13,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,udp,nw_src=10.244.0.11,nw_dst=172.19.0.4,tp_dst=31374,tp_src=42696,nw_ttl=64,dp_hash=1: prerequisites not met for setting tp_dst
ovs-appctl: ovs-vswitchd: server returned an error
sh-5.2# ovs-appctl ofproto/trace br-int in_port=13,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,udp,nw_src=10.244.0.11,nw_dst=172.19.0.4,udp_dst=31374,udp_src=42696,nw_ttl=64,dp_hash=1
Flow: dp_hash=0x1,udp,in_port=13,vlan_tci=0x0000,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.11,nw_dst=172.19.0.4,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=42696,tp_dst=31374

bridge("br-int")
----------------
 0. in_port=13, priority 100, cookie 0x549864ff
    set_field:0x1->reg13
    set_field:0x5->reg11
    set_field:0x7->reg12
    set_field:0x5->metadata
    set_field:0x3->reg14
    resubmit(,8)
 8. metadata=0x5, priority 50, cookie 0x34fbba79
    set_field:0/0x1000->reg10
    resubmit(,73)
    73. ip,reg14=0x3,metadata=0x5,dl_src=0a:58:0a:f4:00:0b,nw_src=10.244.0.11, priority 90, cookie 0x549864ff
            set_field:0/0x1000->reg10
    move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
     -> NXM_NX_XXREG0[111] is now 0
    resubmit(,9)
 9. metadata=0x5, priority 0, cookie 0xff69827e
    resubmit(,10)
10. metadata=0x5, priority 0, cookie 0x6af23947
    resubmit(,11)
11. metadata=0x5, priority 0, cookie 0x60947d4e
    resubmit(,12)
12. ip,metadata=0x5, priority 100, cookie 0x8a58787a
    set_field:0x1000000000000000000000000/0x1000000000000000000000000->xxreg0
    resubmit(,13)
13. ip,metadata=0x5, priority 100, cookie 0xbcb14a5
    set_field:0x4000000000000000000000000/0x4000000000000000000000000->xxreg0
    resubmit(,14)
14. udp,metadata=0x5,nw_dst=172.19.0.4,tp_dst=31374, priority 120, cookie 0x2e4f4105
    set_field:0xac1300040000000000000000/0xffffffff0000000000000000->xxreg0
    set_field:0x7a8e00000000/0xffff00000000->xxreg0
    ct(table=15,zone=NXM_NX_REG13[0..15],nat)
    nat
    -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 15.
     -> Sets the packet to an untracked state, and clears all the conntrack fields.

Final flow: dp_hash=0x1,udp,reg0=0x5,reg1=0xac130004,reg2=0x7a8e,reg11=0x5,reg12=0x7,reg13=0x1,reg14=0x3,metadata=0x5,in_port=13,vlan_tci=0x0000,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.11,nw_dst=172.19.0.4,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=42696,tp_dst=31374
Megaflow: recirc_id=0,eth,udp,in_port=13,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.11,nw_dst=172.19.0.4,nw_frag=no,tp_src=0x8000/0x8000,tp_dst=31374
Datapath actions: ct(zone=1,nat),recirc(0x9e6)

===============================================================================
recirc(0x9e6) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
Replacing src/dst IP/ports to simulate NAT:
 Initial flow: 
 Modified flow: 
===============================================================================

Flow: recirc_id=0x9e6,dp_hash=0x1,ct_state=new|trk,ct_zone=1,eth,udp,reg0=0x5,reg1=0xac130004,reg2=0x7a8e,reg11=0x5,reg12=0x7,reg13=0x1,reg14=0x3,metadata=0x5,in_port=13,vlan_tci=0x0000,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.11,nw_dst=172.19.0.4,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=42696,tp_dst=31374

bridge("br-int")
----------------
    thaw
        Resuming from table 15
15. ct_state=+new-est+trk,metadata=0x5, priority 7, cookie 0xb612a73a
    set_field:0x80000000000000000000000000/0x80000000000000000000000000->xxreg0
    set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0
    resubmit(,16)
16. ct_state=-est+trk,ip,metadata=0x5, priority 1, cookie 0xb7a59514
    set_field:0x2000000000000000000000000/0x2000000000000000000000000->xxreg0
    resubmit(,17)
17. metadata=0x5, priority 0, cookie 0xe95b10f8
    resubmit(,18)
18. metadata=0x5, priority 0, cookie 0x9a33e75b
    resubmit(,19)
19. metadata=0x5, priority 0, cookie 0xcfa98814
    resubmit(,20)
20. ct_state=+new+trk,udp,metadata=0x5,nw_dst=172.19.0.4,tp_dst=31374, priority 120, cookie 0x4b6b480b
    set_field:0/0x2000000000000000000000000->xxreg0
    group:12
     -> using bucket 1
    bucket 1
            ct(commit,table=21,zone=NXM_NX_REG13[0..15],nat(dst=10.244.1.14:8081),exec(set_field:0x2/0x2->ct_mark))
            nat(dst=10.244.1.14:8081)
            set_field:0x2/0x2->ct_mark
             -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 21.
             -> Sets the packet to an untracked state, and clears all the conntrack fields.

Final flow: recirc_id=0x9e6,dp_hash=0x1,ct_state=new|trk,ct_zone=1,eth,udp,reg0=0x285,reg1=0xac130004,reg2=0x7a8e,reg11=0x5,reg12=0x7,reg13=0x1,reg14=0x3,metadata=0x5,in_port=13,vlan_tci=0x0000,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.11,nw_dst=172.19.0.4,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=42696,tp_dst=31374
Megaflow: recirc_id=0x9e6,dp_hash=0x1/0xf,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x3,eth,udp,in_port=13,dl_dst=0a:58:0a:f4:00:01,nw_dst=172.19.0.4,nw_frag=no,tp_dst=31374
Datapath actions: ct(commit,zone=1,mark=0x2/0x2,nat(dst=10.244.1.14:8081)),recirc(0x9e7)

===============================================================================
recirc(0x9e7) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
Replacing src/dst IP/ports to simulate NAT:
 Initial flow: nw_src=10.244.0.11,tp_src=42696,nw_dst=172.19.0.4,tp_dst=31374
 Modified flow: nw_src=10.244.0.11,tp_src=42696,nw_dst=10.244.1.14,tp_dst=8081
===============================================================================
Flow: recirc_id=0x9e7,dp_hash=0x1,ct_state=new|trk,ct_zone=1,ct_mark=0x2,eth,udp,reg0=0x285,reg1=0xac130004,reg2=0x7a8e,reg11=0x5,reg12=0x7,reg13=0x1,reg14=0x3,metadata=0x5,in_port=13,vlan_tci=0x0000,dl_src=0a:58:0a:f4:00:0b,dl_dst=0a:58:0a:f4:00:01,nw_src=10.244.0.11,nw_dst=10.244.1.14,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=42696,tp_dst=8081

bridge("br-int")
----------------
    thaw
        Resuming from table 21
21. ct_state=+new+trk,udp,reg1=0xac130004,reg9=0/0x40,metadata=0x5,nw_dst=10.244.1.14,tp_dst=8081, priority 100, cookie 0x7ac23c03
    learn(table=78,idle_timeout=10800,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.19.0.4,nw_proto=17,udp_dst=30388,load:0x1->NXM_NX_REG10[14],load:0xaf4010e->NXM_NX_REG4[],load:0x1f91->NXM_NX_REG8[0..15])
     >> suppressing side effects, so learn action ignored

Final flow: unchanged
Megaflow: recirc_id=0x9e7,ct_state=+new+trk,eth,udp,in_port=13,nw_src=10.244.0.11,nw_dst=10.244.1.14,nw_frag=no,tp_dst=8081
Datapath actions: drop





Version-Release number of selected component (if applicable):
22.09.0-22

Comment 2 Surya Seetharaman 2022-12-03 19:11:30 UTC
Note that a trace towards the nodePort on the other node doesn't work either:

[root@ovn-control-plane ~]# ovn-trace --ct new 'inport=="breth0_ovn-worker" && eth.dst==02:42:ac:13:00:03 && ip4.src==172.19.0.4 && ip4.dst==172.19.0.3 && udp && ip.ttl==64 && udp.dst==31374 && udp.src==46519'
# udp,reg14=0x1,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=02:42:ac:13:00:03,nw_src=172.19.0.4,nw_dst=172.19.0.3,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=46519,tp_dst=31374
                                                                                                                                                                             
ingress(dp="ext_ovn-worker", inport="breth0_ovn-worker")                                                                                                                     
--------------------------------------------------------                                                                                                                     
 0. ls_in_check_port_sec (northd.c:8126): 1, priority 50, uuid 34fbba79                                                                                                      
    reg0[15] = check_in_port_sec();                                                                                                                                          
    next;                                                                                                                                                                    
 5. ls_in_pre_lb (northd.c:5687): ip && inport == "breth0_ovn-worker", priority 110, uuid 850b2e24                                                                           
    next;                                                                                                                                                                    
19. ls_in_arp_rsp (northd.c:8147): inport == "breth0_ovn-worker", priority 100, uuid bde6d400                                                                                
    next;                                                                                                                                                                    
25. ls_in_l2_lkup (northd.c:8790): eth.dst == 02:42:ac:13:00:03, priority 50, uuid 042863e9                                                                                  
    outport = "etor-GR_ovn-worker";                                                                                                                                          
    output;                                                                                                                                                                  
                                                                                                                                                                             
egress(dp="ext_ovn-worker", inport="breth0_ovn-worker", outport="etor-GR_ovn-worker")                                                                                        
-------------------------------------------------------------------------------------                                                                                        
 0. ls_out_pre_lb (northd.c:5690): ip && outport == "etor-GR_ovn-worker", priority 110, uuid 57eb6b70                                                                        
    next;                                                                                                                                                                    
 8. ls_out_check_port_sec (northd.c:5657): 1, priority 0, uuid 45576990                                                                                                      
    reg0[15] = check_out_port_sec();                                                                                                                                         
    next;
 9. ls_out_apply_port_sec (northd.c:5662): 1, priority 0, uuid 17cdfb51
    output;
    /* output to "etor-GR_ovn-worker", type "l3gateway" */
ingress(dp="GR_ovn-worker", inport="rtoe-GR_ovn-worker")
--------------------------------------------------------
 0. lr_in_admission (northd.c:11302): eth.dst == 02:42:ac:13:00:03 && inport == "rtoe-GR_ovn-worker", priority 50, uuid c0c5108a
    xreg0[0..47] = 02:42:ac:13:00:03;
    next;
 1. lr_in_lookup_neighbor (northd.c:11460): 1, priority 0, uuid d19378bd
    reg9[2] = 1;
    next;
 2. lr_in_learn_neighbor (northd.c:11469): reg9[2] == 1 || reg9[3] == 0, priority 100, uuid 90d32301
    next;
 4. lr_in_unsnat (northd.c:10366): ip4 && ip4.dst == 172.19.0.3 && udp && udp.dst == 31374, priority 120, uuid 99e99d89
    next;
 5. lr_in_defrag (northd.c:10609): ip && ip4.dst == 172.19.0.3 && udp, priority 110, uuid a763ac47
    reg0 = 172.19.0.3;
    reg9[16..31] = udp.dst;
    ct_dnat;

ct_dnat /* assuming no un-dnat entry, so no change */
-----------------------------------------------------
 7. lr_in_dnat (northd.c:10193): ct.new && ip4 && reg0 == 172.19.0.3 && udp && reg9[16..31] == 31374, priority 120, uuid 61ad7b2c
    flags.force_snat_for_lb = 1;
    ct_lb_mark(backends=10.244.0.11:8081,10.244.1.14:8081,10.244.2.19:8081);

ct_lb_mark /* default (use --ct to customize) */
------------------------------------------------
 8. lr_in_lb_aff_learn (northd.c:7118): reg9[6] == 0 && ct.new && ip4 && ip4.dst == 10.244.2.19 && reg0 == 172.19.0.3 && udp.dst == 8081, priority 100, uuid f1e7de2c
    commit_lb_aff(vip = "172.19.0.3:30388", backend = "10.244.2.19:8081", proto = udp, timeout = 10800);

It seems the problem affects both the switch and the router load balancers.

Comment 4 Surya Seetharaman 2022-12-03 19:33:33 UTC
sh-5.2# ovs-ofctl dump-flows br-int | grep 30388
 cookie=0xfd4ff2d6, duration=6032.801s, table=12, n_packets=0, n_bytes=0, idle_age=6032, priority=120,udp,metadata=0x6,nw_dst=172.19.0.4,tp_dst=30388 actions=resubmit(,13)
 cookie=0xab651266, duration=6032.802s, table=14, n_packets=0, n_bytes=0, idle_age=6032, priority=120,udp,metadata=0x5,nw_dst=172.19.0.4,tp_dst=30388 actions=load:0xac130004->NXM_NX_XXREG0[64..95],load:0x76b4->NXM_NX_XXREG0[32..47],ct(table=15,zone=NXM_NX_REG13[0..15],nat)
 cookie=0xfdc41308, duration=6032.800s, table=16, n_packets=0, n_bytes=0, idle_age=6032, priority=100,ct_state=+new+trk,udp,reg0=0xac130004,reg9=0/0x40,metadata=0x6,nw_dst=10.244.2.19,tp_dst=8081 actions=learn(table=78,idle_timeout=10800,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.19.0.4,nw_proto=17,udp_dst=30388,load:0x1->NXM_NX_REG10[14],load:0xaf40213->NXM_NX_REG4[],load:0x1f91->NXM_NX_REG8[0..15])
 cookie=0x15cade6b, duration=6032.800s, table=16, n_packets=0, n_bytes=0, idle_age=6032, priority=100,ct_state=+new+trk,udp,reg0=0xac130004,reg9=0/0x40,metadata=0x6,nw_dst=10.244.1.14,tp_dst=8081 actions=learn(table=78,idle_timeout=10800,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.19.0.4,nw_proto=17,udp_dst=30388,load:0x1->NXM_NX_REG10[14],load:0xaf4010e->NXM_NX_REG4[],load:0x1f91->NXM_NX_REG8[0..15])
 cookie=0xfe8526f0, duration=6032.800s, table=16, n_packets=0, n_bytes=0, idle_age=6032, priority=100,ct_state=+new+trk,udp,reg0=0xac130004,reg9=0/0x40,metadata=0x6,nw_dst=10.244.0.11,tp_dst=8081 actions=learn(table=78,idle_timeout=10800,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.19.0.4,nw_proto=17,udp_dst=30388,load:0x1->NXM_NX_REG10[14],load:0xaf4000b->NXM_NX_REG4[],load:0x1f91->NXM_NX_REG8[0..15])
 cookie=0xe49409e5, duration=6032.801s, table=20, n_packets=0, n_bytes=0, idle_age=6032, priority=120,ct_state=+new+trk,udp,metadata=0x5,nw_dst=172.19.0.4,tp_dst=30388 actions=load:0->NXM_NX_XXREG0[97],group:12
 cookie=0x7ac23c03, duration=6032.801s, table=21, n_packets=17, n_bytes=850, idle_age=5235, priority=100,ct_state=+new+trk,udp,reg1=0xac130004,reg9=0/0x40,metadata=0x5,nw_dst=10.244.1.14,tp_dst=8081 actions=learn(table=78,idle_timeout=10800,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.19.0.4,nw_proto=17,udp_dst=30388,load:0x1->NXM_NX_REG10[14],load:0xaf4010e->NXM_NX_REG4[],load:0x1f91->NXM_NX_REG8[0..15])
 cookie=0xbe5faf9a, duration=6032.801s, table=21, n_packets=15, n_bytes=750, idle_age=4984, priority=100,ct_state=+new+trk,udp,reg1=0xac130004,reg9=0/0x40,metadata=0x5,nw_dst=10.244.2.19,tp_dst=8081 actions=learn(table=78,idle_timeout=10800,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.19.0.4,nw_proto=17,udp_dst=30388,load:0x1->NXM_NX_REG10[14],load:0xaf40213->NXM_NX_REG4[],load:0x1f91->NXM_NX_REG8[0..15])
 cookie=0x8df75fac, duration=6032.801s, table=21, n_packets=9, n_bytes=450, idle_age=5763, priority=100,ct_state=+new+trk,udp,reg1=0xac130004,reg9=0/0x40,metadata=0x5,nw_dst=10.244.0.11,tp_dst=8081 actions=learn(table=78,idle_timeout=10800,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.19.0.4,nw_proto=17,udp_dst=30388,load:0x1->NXM_NX_REG10[14],load:0xaf4000b->NXM_NX_REG4[],load:0x1f91->NXM_NX_REG8[0..15])
 cookie=0x0, duration=6020.060s, table=78, n_packets=0, n_bytes=0, idle_timeout=10800, idle_age=6020, hard_age=4984, udp,metadata=0x5,nw_src=10.244.0.11,nw_dst=172.19.0.4,tp_dst=30388 actions=load:0x1->NXM_NX_REG10[14],load:0xaf40213->NXM_NX_REG4[],load:0x1f91->NXM_NX_REG8[0..15]

The flows with non-zero packet counters are highly suspicious: I never sent any traffic to port 30388, so they should not have been hit.
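
For comparison, a hedged way to see which flows actually match the 31374 traffic (this command is not part of the original capture):

sh-5.2# ovs-ofctl dump-flows br-int | grep 31374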

Comment 6 Surya Seetharaman 2022-12-03 19:39:57 UTC
Sorry, I forgot to paste the LB objects I am talking about; these are the two objects the confusion is about:

_uuid               : 9bfc143a-d955-4d06-86da-d1a6e6b86028
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="nettest-2100/session-affinity-service"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_nettest-2100/session-affinity-service_UDP_node_router+switch_ovn-control-plane"
options             : {affinity_timeout="10800", event="false", hairpin_snat_ip="169.254.169.5 fd69::5", reject="true", skip_snat="false"}
protocol            : udp
selection_fields    : []
vips                : {"172.19.0.4:30388"="10.244.0.11:8081,10.244.1.14:8081,10.244.2.19:8081"}

==== versus

_uuid               : ba4b6a74-b148-4faf-a5ee-511962ef3e09                                                                                                                   
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="nettest-2100/node-port-service"}                                                                     
health_check        : []                                                                                                                                                     
ip_port_mappings    : {}                                                                                                                                                     
name                : "Service_nettest-2100/node-port-service_UDP_node_router+switch_ovn-control-plane"                                                                      
options             : {event="false", hairpin_snat_ip="169.254.169.5 fd69::5", reject="true", skip_snat="false"}                                                             
protocol            : udp                                                                                                                                                    
selection_fields    : []                                                                                                                                                     
vips                : {"172.19.0.4:31374"="10.244.0.11:8081,10.244.1.14:8081,10.244.2.19:8081"}
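
Both rows come from the NB Load_Balancer table and share the same backend set; a hedged way to dump them (the exact command used above is not shown in the comment) is simply:

ovn-nbctl list load_balancer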

Comment 7 Dumitru Ceara 2022-12-05 13:04:22 UTC
V1 posted for review: https://patchwork.ozlabs.org/project/ovn/list/?series=331232&state=*

Comment 8 OVN Bot 2022-12-06 15:31:58 UTC
ovn22.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2151287

Comment 12 ying xu 2023-01-30 06:31:43 UTC
I can reproduce this issue on version:
# rpm -qa|grep ovn
ovn22.09-central-22.09.0-22.el8fdp.x86_64
ovn22.09-host-22.09.0-22.el8fdp.x86_64
ovn22.09-22.09.0-22.el8fdp.x86_64

topo:
#    foo -- R1 -- join - R2 -- alice
#           |          |
#    bar ----          - R3 --- bob
#

Set the LBs for R1 as below (a sketch of equivalent ovn-nbctl commands follows the listing):
# ovn-nbctl list load_balancer
_uuid               : 84152d94-1c68-4311-8fd7-36ae12a4377a
external_ids        : {}
health_check        : []
ip_port_mappings    : {}
name                : lb1-no-aff
options             : {}
protocol            : tcp
selection_fields    : []
vips                : {"172.16.1.101:8081"="192.168.1.2:80,192.168.2.2:80"}


_uuid               : 8b2caa4d-83c5-49bc-a779-20dc5cbf1730
external_ids        : {}
health_check        : []
ip_port_mappings    : {}
name                : lb1
options             : {affinity_timeout="3"}
protocol            : tcp
selection_fields    : []
vips                : {"172.16.1.101:8080"="192.168.1.2:80,192.168.2.2:80"}
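
A minimal sketch of how two such load balancers could be created and attached to R1 with ovn-nbctl; the exact commands used for this setup are not shown in the report, and the attachment to R1 is an assumption based on "set lb for R1":

# non-affinity LB: TCP VIP on port 8081, two backends
ovn-nbctl lb-add lb1-no-aff 172.16.1.101:8081 192.168.1.2:80,192.168.2.2:80 tcp
# affinity LB: same backends, TCP VIP on port 8080, 3-second affinity timeout
ovn-nbctl lb-add lb1 172.16.1.101:8080 192.168.1.2:80,192.168.2.2:80 tcp
ovn-nbctl set load_balancer lb1 options:affinity_timeout=3
# attach both LBs to logical router R1
ovn-nbctl lr-lb-add R1 lb1-no-aff
ovn-nbctl lr-lb-add R1 lb1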


Then, run "ip netns exec alice1 ncat 172.16.1.101 8081 <<< h" against the non-affinity VIP.
But we can see that the counters on the port-8080 affinity learn flows have increased:
# ovs-ofctl dump-flows br-int|grep 8080
 cookie=0x48842071, duration=3948.235s, table=16, n_packets=53, n_bytes=3922, idle_age=3361, priority=100,ct_state=+new+trk,tcp,reg0=0xac100165,reg9=0/0x40,metadata=0x2,nw_dst=192.168.1.2,tp_dst=80 actions=learn(table=78,idle_timeout=3,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.16.1.101,nw_proto=6,tcp_dst=8080,load:0x1->NXM_NX_REG10[14],load:0xc0a80102->NXM_NX_REG4[],load:0x50->NXM_NX_REG8[0..15])
 cookie=0x10fdf642, duration=3948.235s, table=16, n_packets=72, n_bytes=5328, idle_age=108, priority=100,ct_state=+new+trk,tcp,reg0=0xac100165,reg9=0/0x40,metadata=0x2,nw_dst=192.168.2.2,tp_dst=80 actions=learn(table=78,idle_timeout=3,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.16.1.101,nw_proto=6,tcp_dst=8080,load:0x1->NXM_NX_REG10[14],load:0xc0a80202->NXM_NX_REG4[],load:0x50->NXM_NX_REG8[0..15])
 cookie=0x95d6ca81, duration=3948.219s, table=16, n_packets=0, n_bytes=0, idle_age=3948, priority=100,ct_state=+new+trk,tcp,reg0=0xac10016f,reg9=0/0x40,metadata=0x2,nw_dst=192.168.1.2,tp_dst=80 actions=learn(table=78,idle_timeout=3,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.16.1.111,nw_proto=6,tcp_dst=8080,load:0x1->NXM_NX_REG10[14],load:0xc0a80102->NXM_NX_REG4[],load:0x50->NXM_NX_REG8[0..15])
 cookie=0x5e2f7f57, duration=3948.219s, table=16, n_packets=0, n_bytes=0, idle_age=3948, priority=100,ct_state=+new+trk,tcp,reg0=0xac10016f,reg9=0/0x40,metadata=0x2,nw_dst=192.168.2.2,tp_dst=80 actions=learn(table=78,idle_timeout=3,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.16.1.111,nw_proto=6,tcp_dst=8080,load:0x1->NXM_NX_REG10[14],load:0xc0a80202->NXM_NX_REG4[],load:0x50->NXM_NX_REG8[0..15])
 
verified on version:
# rpm -qa|grep ovn
ovn22.09-central-22.09.0-31.el8fdp.x86_64
ovn22.09-host-22.09.0-31.el8fdp.x86_64
ovn22.09-22.09.0-31.el8fdp.x86_64

We set two LBs on the logical router, one with affinity and one without, and ran
"ip netns exec alice1 ncat 172.16.1.101 8081 <<< h".
This time the counters on the port-8080 affinity learn flows are not increased:
# ovs-ofctl dump-flows br-int|grep 8080                   ------------all packets counters are 0.
 cookie=0x772eba22, duration=1700.642s, table=16, n_packets=0, n_bytes=0, idle_age=1700, priority=100,ct_state=+new+trk,tcp,reg0=0xac100165,reg9=0x1f900000/0xffff0040,metadata=0x2,nw_dst=192.168.1.2,tp_dst=80 actions=learn(table=78,idle_timeout=3,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.16.1.101,nw_proto=6,tcp_dst=8080,load:0x1->NXM_NX_REG10[14],load:0xc0a80102->NXM_NX_REG4[],load:0x50->NXM_NX_REG8[0..15])
 cookie=0x7937c223, duration=1700.642s, table=16, n_packets=0, n_bytes=0, idle_age=1700, priority=100,ct_state=+new+trk,tcp,reg0=0xac100165,reg9=0x1f900000/0xffff0040,metadata=0x2,nw_dst=192.168.2.2,tp_dst=80 actions=learn(table=78,idle_timeout=3,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.16.1.101,nw_proto=6,tcp_dst=8080,load:0x1->NXM_NX_REG10[14],load:0xc0a80202->NXM_NX_REG4[],load:0x50->NXM_NX_REG8[0..15])
 cookie=0x30b4c06c, duration=1700.628s, table=16, n_packets=0, n_bytes=0, idle_age=1700, priority=100,ct_state=+new+trk,tcp,reg0=0xac10016f,reg9=0x1f900000/0xffff0040,metadata=0x2,nw_dst=192.168.1.2,tp_dst=80 actions=learn(table=78,idle_timeout=3,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.16.1.111,nw_proto=6,tcp_dst=8080,load:0x1->NXM_NX_REG10[14],load:0xc0a80102->NXM_NX_REG4[],load:0x50->NXM_NX_REG8[0..15])
 cookie=0x933beece, duration=1700.628s, table=16, n_packets=0, n_bytes=0, idle_age=1700, priority=100,ct_state=+new+trk,tcp,reg0=0xac10016f,reg9=0x1f900000/0xffff0040,metadata=0x2,nw_dst=192.168.2.2,tp_dst=80 actions=learn(table=78,idle_timeout=3,delete_learned,OXM_OF_METADATA[],eth_type=0x800,NXM_OF_IP_SRC[],ip_dst=172.16.1.111,nw_proto=6,tcp_dst=8080,load:0x1->NXM_NX_REG10[14],load:0xc0a80202->NXM_NX_REG4[],load:0x50->NXM_NX_REG8[0..15])

Comment 14 errata-xmlrpc 2023-02-09 00:27:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn22.09 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0686