Bug 2057059 - Import may break replication because changelog starting csn may not be created
Summary: Import may break replication because changelog starting csn may not be created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Directory Server
Classification: Red Hat
Component: 389-ds-base
Version: 12.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: DS12.0
Target Release: dirsrv-12.0
Assignee: Pierre Rogier
QA Contact: RHDS QE
Zuzana Zoubkova
URL:
Whiteboard: sync-to-jira
Depends On: 2057056 2057058 2113056
Blocks:
 
Reported: 2022-02-22 16:25 UTC by mreynolds
Modified: 2022-08-01 19:49 UTC (History)

Fixed In Version: redhat-ds-12-9000020220323222240.1674d574
Doc Type: Bug Fix
Doc Text:
.Import from an LDIF file with replication metadata now works correctly

Previously, importing an LDIF file with replication metadata could cause the replication to fail in certain cases:

In the first case, a replication update vector (RUV) entry placed before the suffix entry in an imported LDIF file was ignored. As a consequence, the replication with the imported replica failed, because of a generation ID mismatch. This update ensures that Directory Server writes the skipped RUV entry at the end of the import.

In the second case, a changelog reinitialized after an RUV mismatch did not contain the starting change sequence numbers (CSNs). As a consequence, the replication with the imported replica failed, because of a missing CSN in the changelog. This update ensures that Directory Server creates the RUV `maxcsn` entries, when reinitializing the changelog.

As a result, with this update, administrators do not have to reinitialize the replication after importing from an LDIF file that contains replication metadata.
Clone Of: 2057058
Environment:
Last Closed: 2022-05-18 15:28:06 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker IDMDS-2027 0 None None None 2022-02-22 16:40:18 UTC
Red Hat Issue Tracker IDMDS-2048 0 None None None 2022-02-28 10:55:42 UTC
Red Hat Issue Tracker IDMDS-2134 0 None None None 2022-03-23 21:02:35 UTC
Red Hat Product Errata RHEA-2022:4664 0 None None None 2022-05-18 15:28:18 UTC

Description mreynolds 2022-02-22 16:25:15 UTC
+++ This bug was initially created as a clone of Bug #2057058 +++

+++ This bug was initially created as a clone of Bug #2057056 +++

Description of problem:

Import and bulk import may break replication: the RUV may be lost, and the changelog starting CSN may not be created.

Investigation of a CI test failure while working on issue 4939 (LMDB import redesign) showed that the problem was not caused by the LMDB import code. Instead, there were several issues around:

1. The test itself: it should not have to reinitialize replication after the import (that is, the import should not break replication).

2. The bdb import: the RUV entry is skipped.

3. The changelog initialization code: if the changelog is empty, either the starting CSN of the database RUV should be updated, or dummy changes for the max CSN(s) should be created.

4. Replication should not be broken when reimporting a master from an LDIF that includes replication data.
So IMHO a step is missing in the CI test: checking that replication is still working after the import and before the total update.
(The total update may still be needed, because the hang that originally led to this bug occurred during the total update.)

5. While running the test on supplier1 in bdb mode:

[07/Jan/2022:22:34:49.975351682 +0100] - WARN - bdb_import_foreman - import userRoot: Skipping entry "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=example,dc=com" which has no parent, ending at line 15 of file "/home/progier/sb/i4939/tst/ci-install/var/lib/dirsrv/slapd-supplier1/ldif/supplier1.ldif"
...
[07/Jan/2022:22:34:50.647405077 +0100] - INFO - bdb_public_bdb_import_main - import userRoot: Import complete. Processed 20 entries (1 were skipped) in 1 seconds. (20.00 entries/sec)
...
[07/Jan/2022:22:34:55.111294779 +0100] - DEBUG - NSMMReplicationPlugin - changelog max RUV: {replicageneration} 61d8b1e2000000010000
[07/Jan/2022:22:34:55.114265534 +0100] - DEBUG - NSMMReplicationPlugin - changelog max RUV: {replica 1} 61d8b1e2000100010000 61d8b1e9000100010000 00000000
[07/Jan/2022:22:34:55.117366657 +0100] - DEBUG - NSMMReplicationPlugin - changelog max RUV: {replica 2} 61d8b1e6000100020000 61d8b1ea000200020000 00000000
[07/Jan/2022:22:34:55.120676337 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replicageneration} 61d8b1ff000000010000
[07/Jan/2022:22:34:55.123860710 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replica 1 ldap://linux.home:39001}
==> So the import recreated the database RUV empty, with a different replica generation,
which is clearly wrong because it breaks replication.
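
The bdb warning above says the RUV entry was skipped because it had no parent, which is what happens when it appears in the LDIF before the suffix entry. The following is a minimal sketch, not part of 389-ds-base, for checking an exported LDIF for that ordering before importing it. The function names are hypothetical, and the parser is deliberately simplified: it only reads plain `dn: ` lines and ignores folded lines and base64-encoded `dn:: ` values.

```python
# Sketch only: detect whether the RUV entry precedes the suffix entry in an LDIF.
# Assumption: DNs appear as plain, unfolded "dn: ..." lines.

# RDN of the RUV entry, as shown in the bdb_import_foreman warning.
RUV_RDN = "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff"


def entry_dns(ldif_text):
    """Return the DNs found in the LDIF text, in file order."""
    dns = []
    for line in ldif_text.splitlines():
        if line.lower().startswith("dn: "):
            dns.append(line[4:].strip())
    return dns


def ruv_precedes_suffix(ldif_text, suffix):
    """Return True if the RUV entry is listed before the suffix entry."""
    dns = [dn.lower() for dn in entry_dns(ldif_text)]
    ruv_dn = f"{RUV_RDN},{suffix}".lower()
    suffix_dn = suffix.lower()
    if ruv_dn not in dns or suffix_dn not in dns:
        # One of the two entries is missing, so there is no ordering to report.
        return False
    return dns.index(ruv_dn) < dns.index(suffix_dn)
```

A file for which `ruv_precedes_suffix(text, "dc=example,dc=com")` returns True has the layout that triggered the skipped entry in the log above.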
6. While running the test on supplier1 in lmdb mode:
[07/Jan/2022:22:35:58.666675869 +0100] - INFO - dbmdb_public_dbmdb_import_main - import userRoot: Import complete. Processed 20 entries in 1 seconds. (20.00 entries/sec)
...
[07/Jan/2022:22:36:02.388680247 +0100] - DEBUG - NSMMReplicationPlugin - changelog program - cldb_SetReplicaDB: cldb is set
[07/Jan/2022:22:36:02.430453701 +0100] - DEBUG - NSMMReplicationPlugin - changelog max RUV: {replicageneration} 61d8b229000000010000
[07/Jan/2022:22:36:02.433352533 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replicageneration} 61d8b229000000010000
[07/Jan/2022:22:36:02.436294638 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replica 1 ldap://linux.home:39001} 61d8b22a000000010000 61d8b231000100010000 00000000
[07/Jan/2022:22:36:02.439217513 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replica 2 ldap://linux.home:39002} 61d8b22e000100020000 61d8b232000100020000 00000000
...
[07/Jan/2022:22:36:13.797250968 +0100] - ERR - agmt="cn=002" (linux:39002) - clcache_load_buffer - Can't locate CSN 61d8b231000100010000 in the changelog (DB rc=-12797). If replication stops, the consumer may need to be reinitialized.
[07/Jan/2022:22:36:13.800517424 +0100] - DEBUG - agmt="cn=002" (linux:39002) - clcache_load_buffer - rc=-12797
[07/Jan/2022:22:36:13.803327610 +0100] - DEBUG - NSMMReplicationPlugin - changelog program - _cl5CheckMissingCSN - the change with 61d8b231000100010000 csn was never logged because it was imported during replica initialization


==> This time the database RUV is correctly preserved, but replication fails because the supplier1 max CSN is not in the changelog. Either the min CSN should have the same value so that the error is ignored (to be confirmed), or dummy changelog records should exist in the changelog.
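
To make the two log excerpts easier to compare, here is a small sketch that pulls the RUV elements out of `NSMMReplicationPlugin` debug lines and checks whether the changelog and the database agree on the replica generation. It is an illustration written for this report, not code from the server; the helper names are hypothetical. It assumes the usual 20-hex-digit CSN layout (8 digits of timestamp, then 4 each for sequence number, replica ID, and sub-sequence number) and that the values after a `{replica N ...}` element are min CSN, max CSN, and last-modified time.

```python
# Sketch only: parse RUV debug lines and compare replica generations.
import re
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int   # seconds since the epoch
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(text):
    """Split a 20-hex-digit CSN string into its four fields."""
    if not re.fullmatch(r"[0-9a-fA-F]{20}", text):
        raise ValueError(f"not a CSN: {text!r}")
    return CSN(int(text[0:8], 16), int(text[8:12], 16),
               int(text[12:16], 16), int(text[16:20], 16))


# Matches e.g. "database RUV: {replica 1 ldap://host:port} <mincsn> <maxcsn> <time>"
RUV_LINE = re.compile(
    r"(changelog max|database) RUV: "
    r"\{(replicageneration|replica \d+)[^}]*\}\s*(.*)$"
)


def collect_ruvs(log_lines):
    """Group RUV elements by source ('changelog max' or 'database')."""
    ruvs = {"changelog max": {}, "database": {}}
    for line in log_lines:
        match = RUV_LINE.search(line)
        if match:
            source, key, rest = match.groups()
            ruvs[source][key] = rest.split()
    return ruvs


def same_generation(ruvs):
    """True only if both RUVs carry the same replicageneration value."""
    changelog_gen = ruvs["changelog max"].get("replicageneration")
    database_gen = ruvs["database"].get("replicageneration")
    return changelog_gen is not None and changelog_gen == database_gen
```

Fed with the bdb lines, `same_generation` reports a mismatch (61d8b1e2... versus 61d8b1ff...); fed with the lmdb lines, the generations match, and `parse_csn("61d8b231000100010000")` shows that the CSN missing from the changelog belongs to replica ID 1.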



Upstream ticket:

https://github.com/389ds/389-ds-base/issues/5098

Comment 4 sgouvern 2022-04-15 13:39:53 UTC
With 389-ds-base-2.0.14-7.module+el9dsrv+14845+69b2f526.x86_64

[root@ci-vm-10-0-136-100 389-ds-base]# py.test -v dirsrvtests/tests/suites/replication/regression_m2_test.py::test_online_reinit_may_hang
============================================================ test session starts =============================================================
platform linux -- Python 3.9.10, pytest-7.1.1, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
389-ds-base: 2.0.14-7.module+el9dsrv+14845+69b2f526
nss: 3.71.0-7.el9
nspr: 4.32.0-9.el9
openldap: 2.4.59-4.el9
cyrus-sasl: not installed
FIPS: disabled
rootdir: /mnt/tests/rhds/install/389-ds-base/dirsrvtests, configfile: pytest.ini
collected 1 item                                                                                                                             

dirsrvtests/tests/suites/replication/regression_m2_test.py::test_online_reinit_may_hang PASSED                                         [100%]

======================================================= 1 passed, 5 warnings in 46.68s =======================================================

Marking as VERIFIED

Comment 9 errata-xmlrpc 2022-05-18 15:28:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (redhat-ds:12 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:4664

