Description of problem:

Currently OVN load balancing works by selecting a backend (DNAT IP) according to a hash function (by default the 5-tuple, but it can also be changed to other hash functions, e.g., src-ip + dst-ip + proto).

Some CMS implementations might require more customization of the load balancing algorithm. For example, Kubernetes services have an option to select affinity based on the client's IP address. This translates to always selecting the same backend if the same client IP is used to access a service (a load balancer VIP). ovn-kubernetes implements this by overriding the default hash function of the load balancer and using the 3-tuple: src-ip + dst-ip + proto. On top of this, Kubernetes also allows users to specify a timeout for the "sticky" backend described above.

This BZ tracks adding native support for load balancer backend affinity (with a timeout) in OVN. A potential implementation is described below (this is likely not the only option and it might also not be the most efficient).

Assuming a load balancer:

  ovn-nbctl lb-add lb-test 42.42.42.42:4242 43.43.43.43:4343 tcp

applied to a logical switch:

  ovn-nbctl ls-lb-add sw0 lb-test

and to a logical router:

  ovn-nbctl lr-lb-add lr0 lb-test

the load balancer backend is selected for a new connection by the following logical flows:

  uuid=0xbb033afb, table=11(ls_in_lb ), priority=120 , match=(ct.new && ip4.dst == 42.42.42.42 && tcp.dst == 4242), action=(reg0[1] = 0; ct_lb_mark(backends=43.43.43.43:4343);)
  uuid=0xdeacd560, table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip4 && reg0 == 42.42.42.42 && tcp && reg9[16..31] == 4242), action=(ct_lb_mark(backends=43.43.43.43:4343);)

The ct_lb_mark action translates to the following OF flows/groups:

  # For the switch:
  cookie=0xbb033afb, duration=346.664s, table=19, n_packets=0, n_bytes=0, idle_age=346, priority=120,ct_state=+new+trk,tcp,metadata=0x1,nw_dst=42.42.42.42,tp_dst=4242 actions=load:0->NXM_NX_XXREG0[97],group:1
  group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=43.43.43.43:4343),exec(load:0x1->NXM_NX_CT_MARK[1]))

  # For the router:
  cookie=0xdeacd560, duration=284.447s, table=14, n_packets=0, n_bytes=0, idle_age=284, priority=120,ct_state=+new+trk,tcp,reg0=0x2a2a2a2a,reg9=0x10920000/0xffff0000,metadata=0x3 actions=group:2
  group_id=2,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=15,zone=NXM_NX_REG11[0..15],nat(dst=43.43.43.43:4343),exec(load:0x1->NXM_NX_CT_MARK[1]))

In order to support affinity, if the load balancer is configured to implement it, we could add a new logical action, e.g., apply_lb_affinity(), and add logical flows in the stage immediately preceding ls_in_lb and lr_in_dnat that use this action and store its result in a register bit, e.g. "reg0[16]". Logically, the action would return true (1) if the packet matches an existing session from the same client IP to a load balancer with session affinity configured. If so, the backend to be used should be the one already used by the session with affinity configured.

To implement this we could use an additional OF table, e.g., OFTABLE_CHK_LB_AFFINITY, similar to OFTABLE_CHK_LB_HAIRPIN, OFTABLE_CHK_IN_PORT_SEC, etc. For load balancers with affinity configured we could change the group buckets associated with their backends and add a learn action to insert rows in this new OF table.
These rows could match on:

  ct.new && ip.src == <pkt.src_ip> && ip.dst == <lb.vip> && proto == <lb.proto> && l4.dst == <lb.port>

and would then set reg0[16] to 1 and load the group-id and bucket-id into some registers. The timeout of these learnt flows would be set to the load balancer's affinity timeout configuration (a rough sketch is given after the list below).

A new (higher priority) flow needs to be added to stages ls_in_lb and lr_in_dnat to check the result of the apply_lb_affinity() action and, if it is set to true (1), bypass regular load balancing and use the learnt backend.

Areas that need extra care:
- The learn action flows will likely be generated by ovn-controller, which means that we need to change the SB.Load_Balancer table and also insert rows there that correspond to load balancers applied to routers (we currently only add rows that correspond to LBs applied to switches).
- We need to ensure that the datapath flows (both when session affinity is hit and when it is not) are still generic enough and don't generate unnecessary upcalls.
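Purely as an illustration, here is a rough sketch of what the modified switch group bucket and the learnt affinity flow could look like for the example load balancer above. The table number (78), the register choices (reg0[16], REG4, REG8), the timeout and priority values are all hypothetical, and the sketch loads the learnt backend address/port into registers instead of a group/bucket id, just to keep the example concrete; this is not the merged implementation.

  # Hypothetical bucket for group_id=1: first learn an affinity flow in the
  # (hypothetical) OFTABLE_CHK_LB_AFFINITY table 78, then commit/DNAT the
  # connection to the selected backend exactly as today.
  group_id=1,type=select,selection_method=dp_hash,
    bucket=bucket_id:0,weight:100,actions=
      learn(table=78,idle_timeout=60,priority=100,
            eth_type=0x800,nw_proto=6,
            NXM_OF_IP_SRC[],
            nw_dst=42.42.42.42,tp_dst=4242,
            load:0x1->NXM_NX_REG0[16],
            load:0x2b2b2b2b->NXM_NX_REG4[],
            load:0x10f7->NXM_NX_REG8[0..15]),
      ct(commit,table=20,zone=NXM_NX_REG13[0..15],
         nat(dst=43.43.43.43:4343),exec(load:0x1->NXM_NX_CT_MARK[1]))

In this sketch the learnt flow matches the same client IP (NXM_OF_IP_SRC[] copies it from the packet that triggered the learn), the VIP 42.42.42.42 and port 4242, sets the "affinity hit" bit reg0[16] and stores the learnt backend 43.43.43.43:4343 (0x2b2b2b2b / 0x10f7) in registers, so a higher priority flow in ls_in_lb/lr_in_dnat could reuse that backend and bypass the select group.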
Some context on the Kubernetes Service requirements, from https://kubernetes.io/docs/concepts/services-networking/service/ :

  "If you want to make sure that connections from a particular client are passed to the same Pod each time, you can select the session affinity based on the client's IP addresses by setting service.spec.sessionAffinity to "ClientIP" (the default is "None"). You can also set the maximum session sticky time by setting service.spec.sessionAffinityConfig.clientIP.timeoutSeconds appropriately (the default value is 10800, which works out to be 3 hours)."
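For completeness, this is how the knob is set on the Kubernetes side, using the fields and default timeout quoted above (the Service name "my-service" is just a placeholder):

  # Enable client-IP session affinity on an existing Service with the
  # default 3 hour (10800 second) sticky timeout.
  kubectl patch service my-service -p \
    '{"spec":{"sessionAffinity":"ClientIP","sessionAffinityConfig":{"clientIP":{"timeoutSeconds":10800}}}}'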
upstream series: https://patchwork.ozlabs.org/project/ovn/list/?series=321324
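For reference, with the upstream series applied, my understanding is that affinity is requested per load balancer through an option in the NB Load_Balancer table; the option name and the timeout value below are based on the posted series and may differ in the final version:

  # Recreate the example load balancer and request client-IP affinity
  # with a 60 second timeout (value chosen arbitrarily).
  ovn-nbctl lb-add lb-test 42.42.42.42:4242 43.43.43.43:4343 tcp
  ovn-nbctl set load_balancer lb-test options:affinity_timeout=60
  ovn-nbctl ls-lb-add sw0 lb-test
  ovn-nbctl lr-lb-add lr0 lb-test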
This can be verified by the nat/lb_force_snat_ip case from https://bugzilla.redhat.com/show_bug.cgi?id=2150533:
https://beaker.engineering.redhat.com/recipes/14222802/tasks/162837788/logs/taskout.log
https://beaker.engineering.redhat.com/recipes/14222803/tasks/162837808/logs/taskout.log

Setting to verified. Verified on version: ovn22.12-22.12.0-108.el9fdp
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn22.12 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:4677