Bug 2127959 - [RFE] Add support for load balancer backend affinity (with timeout)
Summary: [RFE] Add support for load balancer backend affinity (with timeout)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn22.12
Version: FDP 22.E
Hardware: Unspecified
OS: Unspecified
Importance: high unspecified
Target Milestone: ---
Target Release: ---
Assignee: lorenzo bianconi
QA Contact: ying xu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-09-19 13:09 UTC by Dumitru Ceara
Modified: 2023-08-21 02:08 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-21 02:08:18 UTC
Target Upstream Version:
Embargoed:




Links
System ID                                Last Updated
Red Hat Issue Tracker FD-2296            2022-09-19 13:16:21 UTC
Red Hat Product Errata RHBA-2023:4677    2023-08-21 02:08:22 UTC

Description Dumitru Ceara 2022-09-19 13:09:17 UTC
Description of problem:

Currently OVN load balancing works by selecting a backend (DNAT IP)
according to a hash function (by default 5-tuple but can also be changed
to other hash functions, e.g., src-ip+dst-ip+proto).

Some CMS implementations might require more customization to the load
balancing algorithm.

For example, Kubernetes services have an option to select affinity based
on the client's IP address.  This translates to always selecting the
same backend if the same client IP is used to access a service (a load
balancer VIP).  ovn-kubernetes implements this by overriding the default
hash function of the load balancer and using the 3-tuple: src-ip + dst-ip + proto.
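
For reference, the hash inputs used by an OVN load balancer can be
narrowed via the NB Load_Balancer "selection_fields" column; a client-IP
based selection roughly corresponds to hashing only on the IP addresses
(the L4 protocol is already fixed per VIP).  A minimal sketch, assuming a
load balancer named lb-test like the one used later in this report:

  # Hash only on source/destination IP:
  ovn-nbctl set load_balancer lb-test selection_fields="ip_src,ip_dst"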

On top of this, Kubernetes also allows users to specify a timeout for
the "sticky" backend described above.  This BZ tracks adding native
support for load balancer backend affinity (with a timeout) in OVN.

A potential implementation is described below (this is likely not
the only option and it might also not be the most efficient):

Assuming a load balancer:
  ovn-nbctl lb-add lb-test 42.42.42.42:4242 43.43.43.43:4343 tcp

Applied to a logical switch:
  ovn-nbctl ls-lb-add sw0 lb-test

And to a logical router:
  ovn-nbctl lr-lb-add lr0 lb-test

The load balancer backend is selected for a new connection by using the
following logical flows:

  uuid=0xbb033afb, table=11(ls_in_lb           ), priority=120  , match=(ct.new && ip4.dst == 42.42.42.42 && tcp.dst == 4242), action=(reg0[1] = 0; ct_lb_mark(backends=43.43.43.43:4343);)
  uuid=0xdeacd560, table=6 (lr_in_dnat         ), priority=120  , match=(ct.new && ip4 && reg0 == 42.42.42.42 && tcp && reg9[16..31] == 4242), action=(ct_lb_mark(backends=43.43.43.43:4343);)

Action ct_lb_mark translates to the following OF flow/group:

  # For the switch:
  cookie=0xbb033afb, duration=346.664s, table=19, n_packets=0, n_bytes=0, idle_age=346, priority=120,ct_state=+new+trk,tcp,metadata=0x1,nw_dst=42.42.42.42,tp_dst=4242 actions=load:0->NXM_NX_XXREG0[97],group:1
  group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=43.43.43.43:4343),exec(load:0x1->NXM_NX_CT_MARK[1]))

  # For the router:
  cookie=0xdeacd560, duration=284.447s, table=14, n_packets=0, n_bytes=0, idle_age=284, priority=120,ct_state=+new+trk,tcp,reg0=0x2a2a2a2a,reg9=0x10920000/0xffff0000,metadata=0x3 actions=group:2
  group_id=2,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=15,zone=NXM_NX_REG11[0..15],nat(dst=43.43.43.43:4343),exec(load:0x1->NXM_NX_CT_MARK[1]))

In order to support affinity, for load balancers configured to use it,
we could add a new logical action, e.g., apply_lb_affinity(), and add
logical flows in the stages immediately preceding ls_in_lb and
lr_in_dnat that use this action and store its result in a register bit,
e.g., "reg0[16]".

Logically, the action would return true (1) if the packet matches an
existing session from the same client IP to the load balancer with
session affinity configured.  If so, then the backend to be used should
be the one already used by the session with affinity configured.

To implement this we could use an additional OF table, e.g.,
OFTABLE_CHK_LB_AFFINITY, similar to OFTABLE_CHK_LB_HAIRPIN/
OFTABLE_CHK_IN_PORT_SEC/etc.  For load balancers with affinity configured
we could change the group buckets associated with their backends and add
a learn action to insert rows in this new OF table.  These rows could
match on:

  ct.new && ip.src == <pkt.src_ip> && ip.dst == <lb.bip> && proto == <lb.proto> && l4.dst == <lb.port> then set reg0[16] to 1 and "load group-id and bucket-id into some registers"

The timeout of these learnt flows would be set to the load balancer's
affinity timeout configuration.
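
From the CMS point of view, one way the affinity timeout could eventually
be exposed is as a per-load-balancer option holding the timeout in
seconds; the exact knob is an implementation detail of the proposal
above.  A hypothetical sketch:

  # Hypothetical: enable backend affinity with a 3-hour timeout on lb-test
  ovn-nbctl set load_balancer lb-test options:affinity_timeout=10800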

A new (higher priority) logical flow needs to be added to tables ls_in_lb
and lr_in_dnat to check the result of the apply_lb_affinity() action and,
if it is set to true (1), bypass regular load balancing and use the
learnt backend.
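
Putting the pieces together, the switch pipeline could end up looking
roughly like the sketch below (the stage name, priorities and register
bit are hypothetical and follow the proposal above; the final
implementation may differ):

  # Hypothetical sketch only:
  table=10(ls_in_lb_aff_check ), priority=100, match=(ct.new && ip4.dst == 42.42.42.42 && tcp.dst == 4242), action=(reg0[16] = apply_lb_affinity(); next;)
  table=11(ls_in_lb           ), priority=150, match=(reg0[16] == 1 && ip4.dst == 42.42.42.42 && tcp.dst == 4242), action=(<dispatch to the learnt backend using the saved group/bucket ids>)
  table=11(ls_in_lb           ), priority=120, match=(ct.new && ip4.dst == 42.42.42.42 && tcp.dst == 4242), action=(reg0[1] = 0; ct_lb_mark(backends=43.43.43.43:4343);)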

Areas that need extra care:
- the learn action flows will likely be generated by ovn-controller
  which means that we need to change the SB.Load_Balancer table and
  also insert there rows that correspond to load balancers applied to
  routers (we currently only add rows that correspond to LBs
  applied to switches).
- we need to ensure that the datapath flows (both when session affinity
  is hit or not) are still generic enough and don't generate unnecessary
  upcalls.
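
As a rough sanity check for both points, once a prototype is available one
could confirm that load balancers applied to routers now show up in the
southbound database and that the datapath flows stay aggregated, e.g.:

  # Hypothetical checks on a chassis running a prototype:
  ovn-sbctl list load_balancer                 # router LBs should appear too
  ovs-ofctl dump-flows br-int | grep 'learn('  # learn flows for affinity LBs
  ovs-appctl dpctl/dump-flows                  # should not explode per client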

Comment 1 Dumitru Ceara 2022-09-19 13:13:28 UTC
Some context on the Kubernetes services requirements.  From https://kubernetes.io/docs/concepts/services-networking/service/ :

  If you want to make sure that connections from a particular client are
  passed to the same Pod each time, you can select the session affinity
  based on the client's IP addresses by setting service.spec.sessionAffinity
  to "ClientIP" (the default is "None"). You can also set the maximum
  session sticky time by setting
  service.spec.sessionAffinityConfig.clientIP.timeoutSeconds
  appropriately. (the default value is 10800, which works out to be 3
  hours).
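
For completeness, this is roughly how a user enables it on an existing
Service ("my-service" is a placeholder name):

  # Hypothetical example; "my-service" is a placeholder:
  kubectl patch svc my-service -p '{"spec":{"sessionAffinity":"ClientIP","sessionAffinityConfig":{"clientIP":{"timeoutSeconds":10800}}}}'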

Comment 4 lorenzo bianconi 2022-10-04 13:35:40 UTC
upstream series: https://patchwork.ozlabs.org/project/ovn/list/?series=321324

Comment 9 errata-xmlrpc 2023-08-21 02:08:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn22.12 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:4677

