Bug 2112361

Summary: Supplier should do a periodic update to avoid slow replication when a new direct update happens
Product: Red Hat Enterprise Linux 9
Reporter: thierry bordaz <tbordaz>
Component: 389-ds-base
Assignee: mreynolds
Status: CLOSED ERRATA
QA Contact: LDAP QA Team <idm-ds-qe-bugs>
Severity: unspecified
Priority: high
Version: 9.0
CC: bsmejkal, idm-ds-dev-bugs, mreynolds, pasik, tmihinto
Target Milestone: rc
Keywords: TestCaseProvided, Triaged
Target Release: 9.2
Hardware: Unspecified
OS: Unspecified
Whiteboard: sync-to-jira
Fixed In Version: 389-ds-base-2.2.4-3.el9
Last Closed: 2023-05-09 07:41:32 UTC
Type: Bug

Description thierry bordaz 2022-07-29 13:34:12 UTC
Description of problem:
In a replicated topology, suppose a supplier has not received any direct update for a long time (for example, 3 months). When this supplier receives a direct update, its RUV element will be selected for each replication session (lowest CSN), and the session will start by scanning 3 months of updates before finding something to send.

The consequences are:
- the replication session lasts a long time, preventing other suppliers from sending their own updates
- the replication session is inefficient, sending few updates over a long period
- RUV comparison tools show a false, alarming replication lag (a 3-month lag when actually only a single update is missing)
- there is a risk of apparent replication breakage if the consumer closes the inactive replication connection (idletimeout or nsidletimeout)
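
The cost of such a session can be illustrated with a toy model. This is only a sketch under stated assumptions, not the 389-ds-base implementation: CSNs are plain integers, the changelog is a list of (csn, replica_id) pairs, and the session is assumed to start scanning from the lowest CSN among the RUV elements on which the consumer is behind, as described above. All function names are hypothetical.

```python
def ruv_from(changelog):
    """Return {replica_id: highest CSN seen} for a list of (csn, replica_id)."""
    ruv = {}
    for csn, rid in changelog:
        ruv[rid] = max(ruv.get(rid, 0), csn)
    return ruv


def scanned_before_first_send(changelog, consumer_ruv):
    """Count changelog entries walked before the first update worth sending."""
    supplier_ruv = ruv_from(changelog)
    # RUV elements for which the consumer is missing updates.
    behind = [rid for rid, csn in supplier_ruv.items()
              if consumer_ruv.get(rid, 0) < csn]
    if not behind:
        return 0
    # Assumption: the scan starts at the lowest CSN among those elements.
    anchor = min(consumer_ruv.get(rid, 0) for rid in behind)
    scanned = 0
    for csn, rid in sorted(changelog):
        if csn <= anchor:
            continue
        if csn > consumer_ruv.get(rid, 0):
            return scanned  # first update the consumer does not have yet
        scanned += 1        # consumer already has it: scanned, nothing sent
    return scanned


def demo(n_other, keep_alive):
    """Replicas 1-3 are busy; replica 4 is idle, then gets one direct update."""
    log = [(10 * i, 1 + i % 3) for i in range(1, n_other + 1)]
    log.append((5, 4))                     # old update on the idle replica
    if keep_alive:
        log.append((10 * n_other - 5, 4))  # recent dummy update on replica 4
    consumer_ruv = ruv_from(log)           # consumer is in sync at this point
    log.append((10 * n_other + 5, 4))      # the new direct update
    return scanned_before_first_send(log, consumer_ruv)


print(demo(200_000, keep_alive=False))
print(demo(200_000, keep_alive=True))
```

In this model the number of entries scanned grows with the amount of traffic the other suppliers produced while the supplier was idle, whereas a recent update on the idle supplier keeps the scan short.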

Version-Release number of selected component (if applicable):
All versions


How reproducible:


Steps to Reproduce:
1. Set up 4 suppliers.
2. Configure idletimeout=15s.
3. Do one update on all suppliers.
4. Do multiple updates (200K) on 3 of the 4 suppliers, leaving one supplier without any update.
5. Check that all suppliers are in sync.
6. Update the last supplier.
7. Wait 1 min.
8. Do a single update on the 3 others.


Actual results:
The updates from step 8 are likely not replicated, or are replicated only after a long delay.


Expected results:
Replication should get back in sync within a few seconds.

Additional info:

Comment 1 thierry bordaz 2022-07-29 13:44:18 UTC
One option is for each replica to register a slapi_eq_repeat task that does a dummy update on a keep-alive entry.
It should use a non-indexed attribute (description?), something like:

dn: cn=keep alive 1, cn=<suffix>
objectclass:..
description: touch at 2022-07-29 13:34:19
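
The dummy update could be expressed as an LDIF modify record. The helper below is a minimal sketch that only builds the text of such a record. The DN layout follows the example above, while the function name, the choice of `replace` on `description`, and the sample suffix `dc=example,dc=com` are assumptions made for illustration.

```python
from datetime import datetime


def keep_alive_ldif(replica_id, suffix, now):
    """Build an LDIF modify record that touches the keep-alive entry.

    The DN format mirrors the example in this comment; the entry naming
    actually used by the server may differ.
    """
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return "\n".join([
        f"dn: cn=keep alive {replica_id},{suffix}",
        "changetype: modify",
        "replace: description",   # non-indexed attribute, as suggested above
        f"description: touch at {stamp}",
        "-",
        "",
    ])


# Hypothetical suffix and replica id, timestamp taken from the example above.
print(keep_alive_ldif(1, "dc=example,dc=com", datetime(2022, 7, 29, 13, 34, 19)))
```

A record like this could be applied by hand with a standard LDAP client to test the idea before any periodic task exists in the server.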

Comment 2 mreynolds 2022-08-02 19:36:48 UTC
Upstream ticket:

https://github.com/389ds/389-ds-base/issues/3903

Comment 3 thierry bordaz 2022-08-03 13:16:20 UTC
Note that the upstream ticket is focused on the replication agreement. I think it could instead be replica-oriented, like the tombstone trimming thread (eq_cb_reap_tombstones).

Comment 10 bsmejkal 2023-01-19 09:33:03 UTC
As per comment #c8, marking as VERIFIED.

Comment 12 errata-xmlrpc 2023-05-09 07:41:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds-base bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2274