Bug 1908553 - LDAP replication halt, nsState and large offsets, CSN poisoning example
Summary: LDAP replication halt, nsState and large offsets, CSN poisoning example
Keywords:
Status: CLOSED DUPLICATE of bug 2113056
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.7
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: LDAP Maintainers
QA Contact: RHDS QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-17 02:02 UTC by Marc Sauton
Modified: 2023-07-04 17:00 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-01 19:51:21 UTC
Target Upstream Version:
Embargoed:


Description Marc Sauton 2020-12-17 02:02:38 UTC
Description of problem:

This report tracks another example of an IPA LDAP replication halt with "CSN poisoning".

My speculation on the unproven root cause:
an LDAP change was processed for replication while the system time was either unstable or being changed on one or more replicas.


Version-Release number of selected component (if applicable):

389-ds-base-1.3.9.1-12.el7_7.x86_64
ipa-server-4.6.5-11.el7_7.4.x86_64
redhat-release-server-7.7-10.el7.x86_64


How reproducible:
N/A

Steps to Reproduce:
1. N/A
2.
3.


Actual results:

IPA LDAP replication halts.

The error logs are full of events like this sample:

[09/Dec/2020:21:47:06.908139155 +0000] - ERR - agmt="cn=edited-host1-to-edited-host2" (edited-host2:389) - clcache_load_buffer - Can't locate CSN ffffffffe05601dc0000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
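
When the error log is flooded with this message, it can help to pull out which agreements and which CSNs are affected. The following is a minimal sketch, not a supported tool: the regular expression is only modelled on the single sample line above, and the names `MISSING_CSN` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the sample line above; other 389-ds versions may word it differently.
MISSING_CSN = re.compile(
    r'agmt="(?P<agmt>[^"]+)".*?'
    r"Can't locate CSN (?P<csn>[0-9a-fA-F]{20}) in the changelog "
    r"\(DB rc=(?P<rc>-?\d+)\)"
)


def summarize(lines):
    """Count occurrences of each (agreement, CSN) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = MISSING_CSN.search(line)
        if match:
            counts[(match.group("agmt"), match.group("csn").lower())] += 1
    return counts


sample = (
    '[09/Dec/2020:21:47:06.908139155 +0000] - ERR - '
    'agmt="cn=edited-host1-to-edited-host2" (edited-host2:389) - '
    "clcache_load_buffer - Can't locate CSN ffffffffe05601dc0000 in the "
    "changelog (DB rc=-30988). If replication stops, the consumer may need "
    "to be reinitialized."
)
result = summarize([sample])
for (agmt, csn), count in result.items():
    print(agmt, csn, count)
```

In practice the input would be the lines of the instance's errors log rather than a single hard-coded sample.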


Expected results:
yes


Additional info:

The ugly facts:

Giant offsets in several nsState values (edited examples):

nsState is 3AEAAAAAAACvNdFfAAAAAAAAAAAAAAAAMapQowAAAAACAAAAAAAAAA==
Little Endian
For replica cn=replica,cn=dc\3Dedited\2Cdc\3Dedited,cn=mapping tree,cn=config
  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 476
    Sampled Time  : 1607546287
    Gen as csn    : 5fd135af000204760000
    Time as str   : Wed Dec  9 20:38:07 2020
    Local Offset  : 0
    Remote Offset : 2739972657
    Seq. num      : 2
    System time   : Thu Dec 17 01:18:32 2020
    Diff in sec.  : 621625
    Day:sec diff  : 7:16825

nsState is qQsAAAAAAADujzJlAAAAAJe0EwAAAAAAhUwBAAAAAABcTwAAAAAAAA==
Little Endian
For replica cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 2985
    Sampled Time  : 1697812462
    Gen as csn    : 65328fee2031629850000
    Time as str   : Fri Oct 20 14:34:22 2023
    Local Offset  : 1291415
    Remote Offset : 85125
    Seq. num      : 20316
    System time   : Thu Dec 17 01:18:32 2020
    Diff in sec.  : -89644550
    Day:sec diff  : -1038:38650
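
The dumps above can be reproduced from the base64 nsState value with a few lines of Python. This is a minimal sketch that only assumes the `H6x3QH6x` little-endian layout printed in the dump. Two things in it are assumptions rather than facts from this report: that the generator's effective time is the sampled time plus both offsets, and that the "Gen as csn" line in the dump prints the sequence number and replica ID in decimal (which is what the two examples suggest). The names `CsnGenState`, `decode_nsstate` and `as_csn` are hypothetical.

```python
import base64
import struct
from typing import NamedTuple


class CsnGenState(NamedTuple):
    replica_id: int
    sampled_time: int
    local_offset: int
    remote_offset: int
    seq_num: int


def decode_nsstate(b64_value: str, little_endian: bool = True) -> CsnGenState:
    """Unpack a 40-byte nsState blob using the H6x3QH6x layout shown in the dump."""
    raw = base64.b64decode(b64_value)
    fmt = ("<" if little_endian else ">") + "H6x3QH6x"
    if len(raw) != struct.calcsize(fmt):
        raise ValueError(f"expected {struct.calcsize(fmt)} bytes, got {len(raw)}")
    return CsnGenState(*struct.unpack(fmt, raw))


def as_csn(state: CsnGenState) -> str:
    """Format time, sequence number and replica ID as hex fields, with sub-sequence 0."""
    return f"{state.sampled_time:08x}{state.seq_num:04x}{state.replica_id:04x}0000"


# First nsState value from this report.
state = decode_nsstate("3AEAAAAAAACvNdFfAAAAAAAAAAAAAAAAMapQowAAAAACAAAAAAAAAA==")

# Assumption: the generator adds both offsets to the sampled time.
effective_time = state.sampled_time + state.local_offset + state.remote_offset

print(state)
print("csn (all fields in hex):", as_csn(state))
print("effective time exceeds 32 bits:", effective_time > 0xFFFFFFFF)
```

Under that assumption, the remote offset of 2739972657 pushes the effective time for replica 476 past the 32-bit range, which would be consistent with a CSN whose time field is ffffffff.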


The RUVs in some of the replication agreements contain invalid CSNs (edited sample):

nsds50ruv: {replicageneration} 00278c29000000030000
nsds50ruv: {replica 1 ldap://edited-host1:389} 0000002d000001f50000 03201ac3000001f50000
nsds50ruv: {replica 2 ldap://edited-host2:389} 5b49720c000001db0000 03206fa0000001db0000
nsds50ruv: {replica 3 ldap://edited-host3:389} 5b496c72000001da0000 03206edf000001da0000
nsds50ruv: {replica 4 ldap://edited-host4:389} 5b4a1fd6000001e10000 5caf4aeb000501e10000
nsds50ruv: {replica 5 ldap://edited-host5:389} 5b49675f000001d90000 0320696d000001d90000
nsds50ruv: {replica 6 ldap://edited-host6:389} 5b4a6558000801eb0000 5caf500bdf4801eb0000
nsds50ruv: {replica 7 ldap://edited-host7:389} 00000027000002020000 02f3a7ec000102020000
nsds50ruv: {replica 8 ldap://edited-host8:389} 00000029000002070000 00ddbb78000002070000
nsds50ruv: {replica 9 ldap://edited-host9:389} 0000001c0000020e0000 00c1d7300003020e0000
nsds50ruv: {replica 10 ldap://edited-host10:389} 5ca3804d000001f10000 03206715000001f10000
nsds50ruv: {replica 11 ldap://edited-host11:389} 5b4a68c8000101ec0000 5caf561a000401ec0000
nsds50ruv: {replica 12 ldap://edited-host12:389} 5b497b17000101dc0000 ffffffffe05601dc0000
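
To spot the bad entries in a RUV like the one above, each CSN can be split into its fields: 8 hex digits of timestamp, 4 of sequence number, 4 of replica ID, and 4 of sub-sequence number. The sketch below is illustrative only. The plausibility window (`EARLIEST`, `LATEST`) is an assumption picked for this example, and the names `Csn`, `parse_csn` and `is_suspect` are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Csn(NamedTuple):
    timestamp: int
    seq_num: int
    replica_id: int
    sub_seq: int


def parse_csn(text: str) -> Csn:
    """Split a 20-hex-digit CSN string into its four fields."""
    if len(text) != 20:
        raise ValueError(f"CSN must be 20 hex digits, got {len(text)}")
    return Csn(
        int(text[0:8], 16),
        int(text[8:12], 16),
        int(text[12:16], 16),
        int(text[16:20], 16),
    )


def is_suspect(csn: Csn, earliest: int, latest: int) -> bool:
    """True when the CSN timestamp falls outside the plausible window."""
    return not (earliest <= csn.timestamp <= latest)


# Assumed window for this example: start of 2018 up to the day after the RUV was captured.
EARLIEST = int(datetime(2018, 1, 1, tzinfo=timezone.utc).timestamp())
LATEST = int(datetime(2020, 12, 18, tzinfo=timezone.utc).timestamp())

# Max CSNs of replicas 2, 4 and 12 from the RUV above.
for text in ("03206fa0000001db0000", "5caf4aeb000501e10000", "ffffffffe05601dc0000"):
    csn = parse_csn(text)
    flag = "SUSPECT" if is_suspect(csn, EARLIEST, LATEST) else "ok"
    print(f"{text} rid={csn.replica_id} time={csn.timestamp} {flag}")
```

The last CSN decodes to replica ID 476 (0x01dc), the same replica ID as in the first nsState dump, with a time field at the 32-bit maximum.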

Comment 9 thierry bordaz 2021-12-08 16:21:04 UTC
Possibly fixed by Issue 4943 - Fix csn generator to limit time skew drift (#4946)

Comment 10 mreynolds 2022-08-01 19:51:21 UTC
This was potentially fixed in two bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=2049812 ---> fixed in 389-ds-base-1.3.10.2-15.el7_9

Not fixed in RHEL 7.9 yet:

https://bugzilla.redhat.com/show_bug.cgi?id=2113056 ---> Import may break replication because changelog starting csn may not be created 

A hotfix could be provided using this commit to see if it helps the issue:  https://github.com/389ds/389-ds-base/commit/2e4625fc533011a4214408612eb93eeb66a4ddb0

Since there is the 7.9 bug listed above, and the customer case is closed, I am going to close this bug as a duplicate of BZ#2113056.

*** This bug has been marked as a duplicate of bug 2113056 ***
