
Bug 1908553

Summary: LDAP replication halt, nsState and large offsets, CSN poisoning example
Product: Red Hat Enterprise Linux 7
Reporter: Marc Sauton <msauton>
Component: 389-ds-base
Assignee: LDAP Maintainers <ldap-maint>
Status: CLOSED DUPLICATE
QA Contact: RHDS QE <ds-qe-bugs>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.7
CC: bsmejkal, ldap-maint, mreynolds, tbordaz
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-01 19:51:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Marc Sauton 2020-12-17 02:02:38 UTC
Description of problem:

this is a report to track another example of IPA LDAP replication halt with "CSN poisoning"

I speculate that the unproven root cause may be:
an LDAP change was processed for replication while the system time was unstable or was being changed on one or more replicas.


Version-Release number of selected component (if applicable):

389-ds-base-1.3.9.1-12.el7_7.x86_64
ipa-server-4.6.5-11.el7_7.4.x86_64
redhat-release-server-7.7-10.el7.x86_64


How reproducible:
N/A

Steps to Reproduce:
1. N/A
2.
3.


Actual results:

IPA LDAP replication halt

the error logs are full of events like this sample:

[09/Dec/2020:21:47:06.908139155 +0000] - ERR - agmt="cn=edited-host1-to-edited-host2" (edited-host2:389) - clcache_load_buffer - Can't locate CSN ffffffffe05601dc0000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
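
The CSN in this log line can be split into its fields to see why it looks wrong. The following is a minimal sketch, assuming the usual 389-ds textual CSN layout of 8 hex digits of timestamp, 4 of sequence number, 4 of replica ID and 4 of sub-sequence number. The `CSN` class and `parse_csn` helper are hypothetical names used for illustration, not part of any 389-ds tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CSN:
    timestamp: int   # seconds since the epoch
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields (assumed layout)."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}")
    return CSN(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        replica_id=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


csn = parse_csn("ffffffffe05601dc0000")
print(csn)
print(datetime.fromtimestamp(csn.timestamp, tz=timezone.utc))
```

For the CSN in the error above, the replica ID field is 0x01dc (476 decimal) and the timestamp field is 0xffffffff, the largest value that fits in 32 bits, which is far in the future relative to the December 2020 log date.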


Expected results:
yes


Additional info:

the ugly facts:

giant offsets in several replication agreement nsState values, edited example:

nsState is 3AEAAAAAAACvNdFfAAAAAAAAAAAAAAAAMapQowAAAAACAAAAAAAAAA==
Little Endian
For replica cn=replica,cn=dc\3Dedited\2Cdc\3Dedited,cn=mapping tree,cn=config
  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 476
    Sampled Time  : 1607546287
    Gen as csn    : 5fd135af000204760000
    Time as str   : Wed Dec  9 20:38:07 2020
    Local Offset  : 0
    Remote Offset : 2739972657
    Seq. num      : 2
    System time   : Thu Dec 17 01:18:32 2020
    Diff in sec.  : 621625
    Day:sec diff  : 7:16825

nsState is qQsAAAAAAADujzJlAAAAAJe0EwAAAAAAhUwBAAAAAABcTwAAAAAAAA==
Little Endian
For replica cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 2985
    Sampled Time  : 1697812462
    Gen as csn    : 65328fee2031629850000
    Time as str   : Fri Oct 20 14:34:22 2023
    Local Offset  : 1291415
    Remote Offset : 85125
    Seq. num      : 20316
    System time   : Thu Dec 17 01:18:32 2020
    Diff in sec.  : -89644550
    Day:sec diff  : -1038:38650
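
The dumps above can be reproduced from the raw nsState values. The following is a minimal sketch, assuming the 40-byte little-endian layout given by the `fmtstr` line (`H6x3QH6x`) and the field order shown in the dump (replica ID, sampled time, local offset, remote offset, sequence number). The `NsState` tuple and `decode_nsstate` helper are hypothetical names for illustration only.

```python
import base64
import struct
from typing import NamedTuple


class NsState(NamedTuple):
    replica_id: int
    sampled_time: int
    local_offset: int
    remote_offset: int
    seq_num: int


def decode_nsstate(b64: str, little_endian: bool = True) -> NsState:
    """Decode a base64 nsState value using the H6x3QH6x layout from the dump."""
    raw = base64.b64decode(b64)
    fmt = ("<" if little_endian else ">") + "H6x3QH6x"
    if len(raw) != struct.calcsize(fmt):
        raise ValueError(f"unexpected nsState length {len(raw)}")
    return NsState(*struct.unpack(fmt, raw))


first = decode_nsstate("3AEAAAAAAACvNdFfAAAAAAAAAAAAAAAAMapQowAAAAACAAAAAAAAAA==")
second = decode_nsstate("qQsAAAAAAADujzJlAAAAAJe0EwAAAAAAhUwBAAAAAABcTwAAAAAAAA==")
print(first)
print(second)
# Express the remote offset in days to make its size easier to judge.
print(first.remote_offset / 86400, "days of remote offset")
```

The remote offset of 2739972657 seconds in the first dump corresponds to more than 80 years, and in the second dump the sampled time is already in 2023 while the system time is still in 2020.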


the RUVs in some of the replication agreements have invalid CSNs, edited sample:

nsds50ruv: {replicageneration} 00278c29000000030000
nsds50ruv: {replica 1 ldap://edited-host1:389} 0000002d000001f50000 03201ac3000001f50000
nsds50ruv: {replica 2 ldap://edited-host2:389} 5b49720c000001db0000 03206fa0000001db0000
nsds50ruv: {replica 3 ldap://edited-host3:389} 5b496c72000001da0000 03206edf000001da0000
nsds50ruv: {replica 4 ldap://edited-host4:389} 5b4a1fd6000001e10000 5caf4aeb000501e10000
nsds50ruv: {replica 5 ldap://edited-host5:389} 5b49675f000001d90000 0320696d000001d90000
nsds50ruv: {replica 6 ldap://edited-host6:389} 5b4a6558000801eb0000 5caf500bdf4801eb0000
nsds50ruv: {replica 7 ldap://edited-host7:389} 00000027000002020000 02f3a7ec000102020000
nsds50ruv: {replica 8 ldap://edited-host8:389} 00000029000002070000 00ddbb78000002070000
nsds50ruv: {replica 9 ldap://edited-host9:389} 0000001c0000020e0000 00c1d7300003020e0000
nsds50ruv: {replica 10 ldap://edited-host10:389} 5ca3804d000001f10000 03206715000001f10000
nsds50ruv: {replica 11 ldap://edited-host11:389} 5b4a68c8000101ec0000 5caf561a000401ec0000
nsds50ruv: {replica 12 ldap://edited-host12:389} 5b497b17000101dc0000 ffffffffe05601dc0000
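
A quick way to spot the bad entries in an RUV like this one is to check the timestamp part of each CSN for plausibility. The following is a rough sketch, assuming the first 8 hex digits of a CSN are its timestamp. The plausibility window (not before 2000-01-01, not more than one day ahead of a caller-supplied "now") and the helper names are assumptions chosen for illustration, not thresholds used by 389-ds itself.

```python
import re
from datetime import datetime, timezone

# Matches "{replica <id> <url>} <min csn> <max csn>" inside an nsds50ruv value.
RUV_ELEMENT = re.compile(
    r"\{replica (\d+) (\S+)\}\s+([0-9a-f]{20})\s+([0-9a-f]{20})"
)
# Assumed lower bound for a sane CSN timestamp.
EARLIEST = int(datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp())


def csn_time(csn: str) -> int:
    """Return the timestamp field (first 8 hex digits) of a CSN string."""
    return int(csn[:8], 16)


def suspicious_ruv_elements(lines, now, max_future=86400):
    """Return (replica number, reasons) for RUV elements with implausible CSNs."""
    findings = []
    for line in lines:
        match = RUV_ELEMENT.search(line)
        if match is None:
            continue  # for example the {replicageneration} line
        rid, _url, min_csn, max_csn = match.groups()
        reasons = []
        for label, csn in (("min", min_csn), ("max", max_csn)):
            t = csn_time(csn)
            if t < EARLIEST:
                reasons.append(f"{label} CSN timestamp too old")
            elif t > now + max_future:
                reasons.append(f"{label} CSN timestamp in the future")
        if csn_time(max_csn) < csn_time(min_csn):
            reasons.append("max CSN older than min CSN")
        if reasons:
            findings.append((int(rid), reasons))
    return findings


sample = [
    "nsds50ruv: {replicageneration} 00278c29000000030000",
    "nsds50ruv: {replica 1 ldap://edited-host1:389} 0000002d000001f50000 03201ac3000001f50000",
    "nsds50ruv: {replica 4 ldap://edited-host4:389} 5b4a1fd6000001e10000 5caf4aeb000501e10000",
    "nsds50ruv: {replica 12 ldap://edited-host12:389} 5b497b17000101dc0000 ffffffffe05601dc0000",
]
now = 1608167912  # the system time from the nsState dumps (17 Dec 2020, UTC)
for rid, reasons in suspicious_ruv_elements(sample, now):
    print(rid, reasons)
```

Under these assumed thresholds, only the elements for replicas 4, 6 and 11 in the full sample above pass the check. All others have a minimum or maximum CSN with a timestamp outside the window.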

Comment 9 thierry bordaz 2021-12-08 16:21:04 UTC
Possibly fixed by Issue 4943 - Fix csn generator to limit time skew drift (#4946)

Comment 10 mreynolds 2022-08-01 19:51:21 UTC
This was potentially fixed in two bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=2049812 ---> fixed in 389-ds-base-1.3.10.2-15.el7_9

Not fixed in RHEL 7.9 yet:

https://bugzilla.redhat.com/show_bug.cgi?id=2113056 ---> Import may break replication because changelog starting csn may not be created 

A hotfix could be provided using this commit to see if it helps the issue:  https://github.com/389ds/389-ds-base/commit/2e4625fc533011a4214408612eb93eeb66a4ddb0

Since there is the 7.9 bug listed above and the customer case is closed, I am going to close this bug as a duplicate of BZ#2113056.

*** This bug has been marked as a duplicate of bug 2113056 ***