Bug 2112361

Summary:	Supplier should do periodic updates to avoid slow replication when a new direct update happens
Product:	Red Hat Enterprise Linux 9	Reporter:	thierry bordaz <tbordaz>
Component:	389-ds-base	Assignee:	mreynolds
Status:	CLOSED ERRATA	QA Contact:	LDAP QA Team <idm-ds-qe-bugs>
Severity:	unspecified	Docs Contact:
Priority:	high
Version:	9.0	CC:	bsmejkal, idm-ds-dev-bugs, mreynolds, pasik, tmihinto
Target Milestone:	rc	Keywords:	TestCaseProvided, Triaged
Target Release:	9.2
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	sync-to-jira
Fixed In Version:	389-ds-base-2.2.4-3.el9	Doc Type:	If docs needed, set a value
Last Closed:	2023-05-09 07:41:32 UTC	Type:	Bug

Description thierry bordaz 2022-07-29 13:34:12 UTC

Description of problem:
In a replicated topology, suppose a supplier has not received any direct update for a long time (for example, 3 months). When this supplier then receives a direct update, its RUV element (the one with the lowest CSN) will be selected as the starting point of every replication session, and each session will start by scanning 3 months of updates before finding something to send.
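The start-point selection can be sketched as a small Python simulation. This is an illustration under simplifying assumptions, not the 389-ds-base code: a CSN is reduced to a hypothetical `(time, rid)` pair (real CSNs also carry sequence numbers), the changelog is a sorted list, and each RUV is a dict mapping replica ID to the highest CSN seen. The session starts from the lowest consumer max-CSN among the replica IDs for which the supplier has something newer, then reads forward, skipping changes the consumer already has.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class CSN(NamedTuple):
    # Simplified change sequence number; real CSNs also carry seqnum/subseq fields.
    time: int  # seconds since an arbitrary epoch
    rid: int   # replica ID of the supplier that originated the change


Ruv = Dict[int, CSN]  # replica ID -> highest CSN known for that replica


def start_csn(supplier_ruv: Ruv, consumer_ruv: Ruv) -> Optional[CSN]:
    """Lowest consumer max-CSN among replica IDs where the supplier is ahead."""
    behind = [consumer_ruv[rid] for rid, csn in supplier_ruv.items()
              if csn > consumer_ruv[rid]]
    return min(behind) if behind else None


def scan(changelog: List[CSN], supplier_ruv: Ruv,
         consumer_ruv: Ruv) -> Tuple[int, Optional[CSN]]:
    """Return (changes read and skipped, first CSN actually sent)."""
    start = start_csn(supplier_ruv, consumer_ruv)
    if start is None:
        return 0, None  # consumer is already up to date
    skipped = 0
    for csn in changelog:  # changelog is sorted by CSN
        if csn <= start:
            continue  # a real changelog cursor would seek past these
        if csn <= consumer_ruv[csn.rid]:
            skipped += 1  # consumer already has this change
            continue
        return skipped, csn
    return skipped, None


# RID 4 was idle for 90 days while RIDs 1-3 wrote one change per day each;
# the consumer already has everything except RID 4's new update.
DAY = 86400
changelog = sorted([CSN(0, 4), CSN(90 * DAY + 10, 4)] +
                   [CSN(d * DAY + r, r) for d in range(1, 91) for r in (1, 2, 3)])
consumer = {1: CSN(90 * DAY + 1, 1), 2: CSN(90 * DAY + 2, 2),
            3: CSN(90 * DAY + 3, 3), 4: CSN(0, 4)}
supplier = {**consumer, 4: CSN(90 * DAY + 10, 4)}
print(scan(changelog, supplier, consumer))  # -> (270, CSN(time=7776010, rid=4))
```

Even at one change per day per supplier, the session reads 270 changes it does not need before reaching the single one to send; higher write rates make this proportionally worse. If RID 4's CSN on the consumer had been advanced recently (for example by a periodic dummy change), the start point would move forward accordingly, which is the effect the keep-alive proposal in Comment 1 aims for.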

The consequences are:
- the replication session will last a long time, preventing other suppliers from sending their own updates
- the replication session will be inefficient, sending few updates over a long period
- RUV comparison tools will show a falsely alarming replication lag (a 3-month lag when actually only a single update is missing)
- there is a risk of apparent replication breakage if the consumer closes the inactive replication connection (idletimeout or nsidletimeout)

Version-Release number of selected component (if applicable):
All versions


How reproducible:


Steps to Reproduce:
1. 4 suppliers, configured with idletimeout=15s
2. one update on all suppliers
3. multiple updates (200K) on 3 out of 4 suppliers, with no update on the remaining supplier; then check that all suppliers are in sync
4. update the last supplier
5. wait 1 min
6. do a single update on the 3 others


Actual results:
The updates from step 6 are likely not replicated, or are replicated only after a long delay.


Expected results:
Replication should get back in sync within a few seconds.

Additional info:

Comment 1 thierry bordaz 2022-07-29 13:44:18 UTC

One option is for each replica to register a slapi_eq_repeat task that periodically does a dummy update on a keep-alive entry.
It should use a non-indexed attribute (description?), something like:

dn: cn=keep alive 1, cn=<suffix>
objectclass: ..
description: touch at 2022-07-29 13:34:19
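A minimal Python sketch of this idea, for illustration only: the real task would live inside the server as a C callback scheduled with slapi_eq_repeat and would apply the change as an internal operation. Here the hypothetical `keepalive_ldif` just renders the proposed change as an LDIF modify (assuming the keep-alive entry already exists and using a placeholder suffix), and the hypothetical `Repeater` class plays the role of the event-queue scheduler. Each periodic write gives the replica a fresh CSN, so, as long as those writes replicate, other replicas' RUV element for it should never be much older than one interval.

```python
import threading
from datetime import datetime, timezone
from typing import Callable, Optional


def keepalive_ldif(rid: int, suffix: str, now: Optional[datetime] = None) -> str:
    """LDIF modify record touching the non-indexed 'description' of a keep-alive entry."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (f"dn: cn=keep alive {rid},{suffix}\n"
            "changetype: modify\n"
            "replace: description\n"
            f"description: touch at {now:%Y-%m-%d %H:%M:%S}\n"
            "-\n")


class Repeater:
    """Rough stand-in for slapi_eq_repeat: run `fn` every `interval` seconds until stopped."""

    def __init__(self, interval: float, fn: Callable[[], None]) -> None:
        self.interval = interval
        self.fn = fn
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait() returns False on timeout (run again) and True once stop() is called.
        while not self._stop.wait(self.interval):
            self.fn()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

For example, `Repeater(300, lambda: print(keepalive_ldif(1, "dc=example,dc=com"))).start()` would emit one touch every 5 minutes; in the server the callback would perform the modify rather than print it. Touching a non-indexed attribute keeps each write cheap, since no index needs updating.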

Comment 2 mreynolds 2022-08-02 19:36:48 UTC

Upstream ticket:

https://github.com/389ds/389-ds-base/issues/3903

Comment 3 thierry bordaz 2022-08-03 13:16:20 UTC

Note that the upstream ticket is focused on replication agreements. I think it could instead be replica-oriented, like the tombstone trimming thread (eq_cb_reap_tombstones).
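One practical difference between the two designs is how many keep-alive writes each supplier generates. The Python sketch below assumes, hypothetically, that an agreement-oriented design would issue one keep-alive write per outgoing agreement, while a replica-oriented design, scheduled once per replica like the tombstone reaper, issues one per replicated suffix. The `(suffix, consumer)` agreement shape and the fully meshed topology are illustrative assumptions, not taken from the ticket.

```python
from typing import Iterable, Tuple

Agreement = Tuple[str, str]  # (replicated suffix, consumer host): hypothetical shape


def keepalive_writes(agreements: Iterable[Agreement], per_replica: bool = True) -> int:
    """Keep-alive writes one supplier issues per interval under each design."""
    agreement_list = list(agreements)
    if per_replica:
        # One task per replicated suffix, however many agreements it has.
        return len({suffix for suffix, _consumer in agreement_list})
    # One task per outgoing replication agreement.
    return len(agreement_list)


# Supplier 1 in a hypothetical fully meshed 4-supplier topology, one suffix.
supplier1 = [("dc=example,dc=com", f"supplier{n}") for n in (2, 3, 4)]
print(keepalive_writes(supplier1), keepalive_writes(supplier1, per_replica=False))  # -> 1 3
```

Because every keep-alive write is itself replicated, the per-agreement variant scales replication traffic with the number of agreements, while the per-replica variant stays at one change per interval per replicated suffix regardless of topology.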

Comment 10 bsmejkal 2023-01-19 09:33:03 UTC

As per comment #c8, marking as VERIFIED.

Comment 12 errata-xmlrpc 2023-05-09 07:41:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds-base bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2274