Cause:
[1] The RUV entry is ignored if it is placed before the suffix entry in the LDIF file being imported.
[2] When the changelog is reinitialized after a RUV mismatch, it does not contain the starting CSNs.
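For reference, the CSNs in the log excerpts of the description are 20 hex digits. The sketch below decodes them, assuming the layout is 8 hex digits of Unix time followed by 4 hex digits each of sequence number, replica ID and sub-sequence number. This layout is consistent with the logged values (replica 2's CSNs carry 0002 in the replica ID position). The CSN class is a hypothetical helper for reading the logs, not the 389-ds-base API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True, order=True)
class CSN:
    """Hypothetical decoded CSN (assumed layout: time, seqnum, rid, subseq).

    order=True compares fields in declaration order, so CSNs sort by
    timestamp, then sequence number, then replica ID, then sub-sequence.
    """
    timestamp: int  # seconds since the Unix epoch (8 hex digits)
    seqnum: int     # sequence number within that second (4 hex digits)
    rid: int        # replica ID of the originating supplier (4 hex digits)
    subseq: int     # sub-sequence number (4 hex digits)

    @classmethod
    def parse(cls, text: str) -> "CSN":
        if len(text) != 20:
            raise ValueError(f"expected 20 hex digits, got {text!r}")
        return cls(
            timestamp=int(text[0:8], 16),
            seqnum=int(text[8:12], 16),
            rid=int(text[12:16], 16),
            subseq=int(text[16:20], 16),
        )

    def __str__(self) -> str:
        # Re-encode in the same fixed-width lowercase hex form as the logs.
        return f"{self.timestamp:08x}{self.seqnum:04x}{self.rid:04x}{self.subseq:04x}"

    def time(self) -> datetime:
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


csn = CSN.parse("61d8b1e2000100010000")  # replica 1 min CSN from the bdb changelog RUV
print(csn.rid, csn.time())  # 1 2022-01-07 21:34:26+00:00
```

The decoded time falls on 7 Jan 2022, the day of the test run shown in the logs.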
Consequence:
Replication with a replica whose database was imported is broken:
in case [1], because of a replica generation ID mismatch;
in case [2], because a CSN cannot be found in the changelog.
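The following minimal sketch shows how the two failure modes can be told apart from the RUV dumps. It assumes the changelog and database RUVs have been parsed into the hypothetical RUV structure below, with CSNs kept as 20-hex-digit strings. It is a diagnostic model only, not the replication plugin's logic. The sample data are the values printed in the bdb and lmdb log excerpts of the description.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RUV:
    """Hypothetical parsed RUV: generation ID plus, per replica ID, the
    (min CSN, max CSN) pair as 20-hex-digit strings, or None when the
    replica element carries no CSNs."""
    generation: str
    replicas: dict[int, tuple[str, str] | None] = field(default_factory=dict)


def diagnose(changelog_ruv: RUV, database_ruv: RUV, changelog_csns: set[str]) -> list[str]:
    """Return a description of each failure mode from this bug that applies."""
    problems: list[str] = []
    # Case [1]: the import rebuilt the database RUV with a new generation ID.
    if changelog_ruv.generation != database_ruv.generation:
        problems.append(
            f"generation mismatch: changelog {changelog_ruv.generation} "
            f"!= database {database_ruv.generation}"
        )
    # Case [2]: a replica's max CSN from the database RUV is absent from the
    # changelog, so a supplier cannot locate its starting point.
    for rid, csns in sorted(database_ruv.replicas.items()):
        if csns is not None and csns[1] not in changelog_csns:
            problems.append(f"replica {rid}: max CSN {csns[1]} not in changelog")
    return problems


# RUVs from the bdb-mode log excerpt (case [1]).
bdb_changelog = RUV("61d8b1e2000000010000", {
    1: ("61d8b1e2000100010000", "61d8b1e9000100010000"),
    2: ("61d8b1e6000100020000", "61d8b1ea000200020000"),
})
bdb_database = RUV("61d8b1ff000000010000", {1: None})

# RUVs from the lmdb-mode log excerpt (case [2]); the changelog holds no changes.
lmdb_changelog = RUV("61d8b229000000010000")
lmdb_database = RUV("61d8b229000000010000", {
    1: ("61d8b22a000000010000", "61d8b231000100010000"),
    2: ("61d8b22e000100020000", "61d8b232000100020000"),
})

print(diagnose(bdb_changelog, bdb_database, set()))
print(diagnose(lmdb_changelog, lmdb_database, set()))
```

The bdb data trips only the generation check. The lmdb data trips only the missing max CSN check, which matches the "Can't locate CSN" error in the log.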
Fix:
[1] Ensure that the skipped RUV entry is written at the end of the import.
[2] Ensure that the RUV max CSN entries are created when reinitializing the changelog.
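Fix [1] can be modeled as a toy import pass. Entries whose parent has not been imported yet are skipped, except the RUV tombstone entry (nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff, as in the bdb_import_foreman warning in the description). That entry is held back and written once the suffix exists. This is a hypothetical illustration: import_entries and parent_dn are not 389-ds-base functions, and the DN parsing is deliberately naive.

```python
from __future__ import annotations

# RDN of the RUV tombstone entry, as in the "Skipping entry" warning of the bdb log.
RUV_RDN = "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff"


def parent_dn(dn: str) -> str:
    # Naive: splits on the first comma and ignores escaped commas in real DNs.
    return dn.split(",", 1)[1] if "," in dn else ""


def import_entries(dns: list[str], suffix: str) -> tuple[list[str], list[str]]:
    """Toy import pass over DNs in LDIF order; returns (imported, skipped)."""
    imported: list[str] = []
    skipped: list[str] = []
    deferred: list[str] = []
    present: set[str] = set()
    for dn in dns:
        if dn == suffix or parent_dn(dn) in present:
            imported.append(dn)
            present.add(dn)
        elif dn.split(",", 1)[0].lower() == RUV_RDN:
            # Old behaviour: skipped like any orphan. Fix [1]: keep it for later.
            deferred.append(dn)
        else:
            skipped.append(dn)  # genuine orphan: still skipped
    # End of import: write the deferred RUV entry now that its parent exists.
    for dn in deferred:
        if parent_dn(dn) in present:
            imported.append(dn)
            present.add(dn)
        else:
            skipped.append(dn)
    return imported, skipped


# Same ordering problem as in the bdb log: RUV entry before the suffix entry.
ldif_order = [
    "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=example,dc=com",
    "dc=example,dc=com",
    "ou=people,dc=example,dc=com",
]
imported, skipped = import_entries(ldif_order, "dc=example,dc=com")
print(imported[-1], skipped)
```

Deferring only the RUV entry keeps the existing behaviour for genuinely orphaned entries, which are still skipped.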
Result: Replication no longer needs to be reinitialized after an import (provided the import file contains the replication metadata).
Description of problem:
Import and bulk import may break replication (the RUV may be lost, and the changelog starting CSN may not be created).
Investigating a CI test failure while working on issue 4939 (LMDB import redesign) showed that the problem was not caused by the LMDB import code but by several separate issues:
1. The test itself: it should not have to reinitialize replication after the import (i.e. the import should not break replication).
2. The bdb import: the RUV entry is skipped.
3. The changelog initialization code: if the changelog is empty, either the starting CSN of the database RUV should be updated or dummy changes for the max CSN(s) should be created.
4. Replication should not be broken when reimporting a master from an LDIF that includes replication data.
So IMHO a step is missing in the CI test: checking that replication still works after the import and before the total update.
(A total update may still be needed, because the hang that originally led to this bug occurred during the total update.)
5. While running the test on supplier1 in bdb mode:
[07/Jan/2022:22:34:49.975351682 +0100] - WARN - bdb_import_foreman - import userRoot: Skipping entry "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=example,dc=com" which has no parent, ending at line 15 of file "/home/progier/sb/i4939/tst/ci-install/var/lib/dirsrv/slapd-supplier1/ldif/supplier1.ldif"
...
[07/Jan/2022:22:34:50.647405077 +0100] - INFO - bdb_public_bdb_import_main - import userRoot: Import complete. Processed 20 entries (1 were skipped) in 1 seconds. (20.00 entries/sec)
...
[07/Jan/2022:22:34:55.111294779 +0100] - DEBUG - NSMMReplicationPlugin - changelog max RUV: {replicageneration} 61d8b1e2000000010000
[07/Jan/2022:22:34:55.114265534 +0100] - DEBUG - NSMMReplicationPlugin - changelog max RUV: {replica 1} 61d8b1e2000100010000 61d8b1e9000100010000 00000000
[07/Jan/2022:22:34:55.117366657 +0100] - DEBUG - NSMMReplicationPlugin - changelog max RUV: {replica 2} 61d8b1e6000100020000 61d8b1ea000200020000 00000000
[07/Jan/2022:22:34:55.120676337 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replicageneration} 61d8b1ff000000010000
[07/Jan/2022:22:34:55.123860710 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replica 1 ldap://linux.home:39001}
==> So the import recreated the database RUV empty (with a different replica generation),
which is clearly wrong because it breaks replication.
6. While running the test on supplier1 in lmdb mode:
[07/Jan/2022:22:35:58.666675869 +0100] - INFO - dbmdb_public_dbmdb_import_main - import userRoot: Import complete. Processed 20 entries in 1 seconds. (20.00 entries/sec)
...
[07/Jan/2022:22:36:02.388680247 +0100] - DEBUG - NSMMReplicationPlugin - changelog program - cldb_SetReplicaDB: cldb is set
[07/Jan/2022:22:36:02.430453701 +0100] - DEBUG - NSMMReplicationPlugin - changelog max RUV: {replicageneration} 61d8b229000000010000
[07/Jan/2022:22:36:02.433352533 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replicageneration} 61d8b229000000010000
[07/Jan/2022:22:36:02.436294638 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replica 1 ldap://linux.home:39001} 61d8b22a000000010000 61d8b231000100010000 00000000
[07/Jan/2022:22:36:02.439217513 +0100] - DEBUG - NSMMReplicationPlugin - database RUV: {replica 2 ldap://linux.home:39002} 61d8b22e000100020000 61d8b232000100020000 00000000
...
[07/Jan/2022:22:36:13.797250968 +0100] - ERR - agmt="cn=002" (linux:39002) - clcache_load_buffer - Can't locate CSN 61d8b231000100010000 in the changelog (DB rc=-12797). If replication stops, the consumer may need to be reinitialized.
[07/Jan/2022:22:36:13.800517424 +0100] - DEBUG - agmt="cn=002" (linux:39002) - clcache_load_buffer - rc=-12797
[07/Jan/2022:22:36:13.803327610 +0100] - DEBUG - NSMMReplicationPlugin - changelog program - _cl5CheckMissingCSN - the change with 61d8b231000100010000 csn was never logged because it was imported during replica initialization
==> This time the database RUV is correctly preserved, but replication fails because the supplier1 max CSN is not in the changelog. (Either the min CSN should have the same value so that the error is ignored (to be confirmed), or some dummy records should exist in the changelog.)
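Fix [2] (the second option above) can be modeled as follows. When the changelog is recreated empty, add one placeholder record per replica carrying that replica's max CSN from the database RUV. Looking up supplier1's max CSN then no longer fails as in the clcache_load_buffer error. reinit_changelog, locate and the record contents are hypothetical; the real changelog record format is not modeled here.

```python
from __future__ import annotations


def reinit_changelog(database_ruv: dict[int, tuple[str, str] | None]) -> dict[str, dict[str, object]]:
    """Hypothetical model of fix [2]: rebuild an empty changelog keyed by CSN,
    seeded with one placeholder record per replica holding its max CSN."""
    changelog: dict[str, dict[str, object]] = {}
    for rid, csns in sorted(database_ruv.items()):
        if csns is None:
            continue  # replica element without CSNs: nothing to seed
        max_csn = csns[1]
        changelog[max_csn] = {"rid": rid, "placeholder": True}
    return changelog


def locate(changelog: dict[str, dict[str, object]], csn: str) -> bool:
    # Stand-in for the lookup that failed with "Can't locate CSN ... in the changelog".
    return csn in changelog


# Database RUV from the lmdb-mode log excerpt: (min CSN, max CSN) per replica ID.
database_ruv = {
    1: ("61d8b22a000000010000", "61d8b231000100010000"),
    2: ("61d8b22e000100020000", "61d8b232000100020000"),
}
print(locate({}, "61d8b231000100010000"))                            # False: the reported error
print(locate(reinit_changelog(database_ruv), "61d8b231000100010000"))  # True once seeded
```

Seeding from the max CSN rather than rewriting the database RUV's min CSN leaves the RUV untouched. That is the trade-off between the two options listed above.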
Upstream ticket:
https://github.com/389ds/389-ds-base/issues/5098
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: 389-ds-base security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:8162