788745 – Data inconsistency during replication


Summary: Data inconsistency during replication

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	389-ds-base
Sub Component:
Version:	6.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Rich Megginson
QA Contact:	IDM QE LIST
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-02-08 22:43 UTC by Rich Megginson
Modified:	2020-09-13 19:45 UTC
CC List:	4 users
Fixed In Version:	389-ds-base-1.2.10.0-1.el6
Doc Type:	Bug Fix
Doc Text:	Cause: CSNs in the RUV were not refreshed when a replication role was changed. Consequence: This caused data inconsistency. Fix: CSNs are now refreshed when the role changes. Result: Data inconsistency is no longer observed.
Clone Of:
Environment:
Last Closed:	2012-06-20 07:14:10 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	389ds 389-ds-base issues 18	0	None	None	None	2020-09-13 19:45:20 UTC
Red Hat Product Errata	RHSA-2012:0813	0	normal	SHIPPED_LIVE	Low: 389-ds-base security, bug fix, and enhancement update	2012-06-19 19:29:15 UTC

Description Rich Megginson 2012-02-08 22:43:17 UTC

This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/18

https://bugzilla.redhat.com/show_bug.cgi?id=750425

{{{
Description of problem:

Data loss during the promotion operation (Slave to Master).
Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
Step-1:

Set up a topology in which the Master replicates to the Slave and the Slave
replicates to the Consumer.

Master -> Slave-> Consumer.

Step-2:
Make sure that all servers are in sync at this time. For example, assume all are
in sync up to CSN5 (5 records were added to the Master, CSN1 through CSN5).

Step-3:

Delete the replication agreements from Master to Slave and from Slave to
Consumer.

Step-4:

Promote the Slave to Master. The promotion steps are given below.

-       Delete the Supplier DN (cn=suppdn,cn=config) from the Slave.
-       Delete the "cn=replica" entry for the suffix "o=USA" using ldapmodify. As a
result, the changelog file is deleted.
Ex: dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
changetype: delete
-       Modify the cn=o=USA,cn=mapping tree,cn=config entry as below.
Ex: dn: cn=o=USA,cn=mapping tree,cn=config
changetype: modify
replace: nsslapd-state
nsslapd-state: backend

dn: cn=o=USA,cn=mapping tree,cn=config
changetype: modify
delete: nsslapd-referral
-       Recreate the "cn=replica" entry for the suffix as below.
dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
changetype: add
objectClass: nsds5replica
objectClass: top
nsDS5ReplicaRoot: o=USA
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 10  <-- Assign the same nsDS5ReplicaId value that the original
master had. In my case, the original master's replica ID was 10.
nsds5ReplicaPurgeDelay: 1
nsds5ReplicaTombstonePurgeInterval: -1
cn: replica
-       Restart the slapd process. The Slave is now the Master.

Am I missing anything in the promotion operation, or is this not the right
way to do the promotion?

Step -5:

Add the replication agreement between the Slave (newly promoted Master) and the
Consumer. At this time both the Slave and the Consumer are in sync up to CSN5.
When creating the agreement, do not initialize the Consumer.

           Slave (newly promoted as Master) -> Consumer.

Step-6:

Add 5 more entries to the Slave that was promoted to Master above. Assume the
CSNs for these 5 entries are CSN6 through CSN10.

Step-7:

You will now see that, of these last 5 entries, only the last few get
replicated, and replication does not halt.


Actual results:

Expected results:


Additional info:
}}}
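
The LDIF used in Step-4 above can be generated rather than typed by hand. The following is a minimal sketch, assuming the suffix and replica ID from the report (o=USA and 10); the function name `promotion_ldif` is hypothetical and not part of 389-ds-base. It only builds the LDIF text for ldapmodify. Deleting the Supplier DN and restarting slapd are not covered.

```python
def promotion_ldif(suffix: str, replica_id: int) -> str:
    """Build the ldapmodify input described in Step-4 (illustrative helper only)."""
    mapping_dn = f"cn={suffix},cn=mapping tree,cn=config"
    replica_dn = f"cn=replica,{mapping_dn}"
    records = [
        # Delete the existing replica entry (per the report, this also
        # removes the changelog file).
        [f"dn: {replica_dn}", "changetype: delete"],
        # Put the suffix back into the normal "backend" state.
        [f"dn: {mapping_dn}", "changetype: modify",
         "replace: nsslapd-state", "nsslapd-state: backend"],
        # Remove the referral that pointed clients at the old master.
        [f"dn: {mapping_dn}", "changetype: modify",
         "delete: nsslapd-referral"],
        # Recreate the replica entry, reusing the original master's replica ID.
        [f"dn: {replica_dn}", "changetype: add",
         "objectClass: nsds5replica", "objectClass: top",
         f"nsDS5ReplicaRoot: {suffix}",
         "nsDS5ReplicaType: 3", "nsDS5Flags: 1",
         f"nsDS5ReplicaId: {replica_id}",
         "nsds5ReplicaPurgeDelay: 1",
         "nsds5ReplicaTombstonePurgeInterval: -1",
         "cn: replica"],
    ]
    # LDIF records are separated by a blank line.
    return "\n\n".join("\n".join(record) for record in records) + "\n"


if __name__ == "__main__":
    print(promotion_ldif("o=USA", 10))
```

The output can be saved to a file and fed to ldapmodify against the Slave being promoted.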

Comment 2 mreynolds 2012-05-11 20:40:29 UTC

Verification steps    

[1]  Set up Master (replica ID = 1), Hub, and Consumer.

[2]  Shut down the Master, reconfigure the Hub into a master, and assign it the Master's replica ID (replica ID = 1).

[3]  Generate an agreement for the NewMaster (ex-Hub) pointing to the same Consumer.

[4]  Now, without initializing the Consumer from the NewMaster, add multiple entries to the NewMaster. For instance, add 5 entries, uid=test0, ..., uid=test4, with one ldapadd command line.

[5]  If all 5 are replicated to the Consumer correctly, the bug is verified.
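
For step [4], the five test entries can be produced as one LDIF file so that a single ldapadd invocation adds them all. This is a rough sketch only: the base DN ou=People,dc=example,dc=com and the inetOrgPerson object classes are assumptions (the base DN is taken from the log excerpts in this bug), and both helper names are hypothetical.

```python
from typing import Iterable, List


def build_test_entries(count: int = 5,
                       base: str = "ou=People,dc=example,dc=com") -> str:
    """Return LDIF for uid=test0 .. uid=test<count-1> under the assumed base DN."""
    records = []
    for i in range(count):
        uid = f"test{i}"
        records.append("\n".join([
            f"dn: uid={uid},{base}",
            "objectClass: top",
            "objectClass: person",
            "objectClass: organizationalPerson",
            "objectClass: inetOrgPerson",
            f"uid: {uid}",
            f"cn: {uid}",   # cn and sn are required by the person object class
            f"sn: {uid}",
        ]))
    return "\n\n".join(records) + "\n"


def missing_uids(expected: Iterable[str], found: Iterable[str]) -> List[str]:
    """List the uids that were added on the new master but not seen on the consumer."""
    return sorted(set(expected) - set(found))


if __name__ == "__main__":
    print(build_test_entries())
```

After the add, searching the Consumer for uid=test* and passing the returned uids to `missing_uids` gives an empty list when step [5] succeeds.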

Comment 3 Noriko Hosoi 2012-05-25 00:24:07 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.

    New Contents:
Cause: CSNs in the RUV were not refreshed when a replication role was changed.
Consequence: This caused data inconsistency.
Fix: CSNs are now refreshed when the role changes.
Result: Data inconsistency is no longer observed.
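
To picture why stale CSN information can lead to skipped updates, here is a simplified model, not the 389-ds-base code. It assumes a CSN string of 20 hex digits (8 for the timestamp, 4 for the sequence number, 4 for the replica ID, 4 for the sub-sequence number) and assumes a supplier sends a change only if its CSN is newer than the maximum CSN the consumer has recorded for that replica ID. The names `CSN`, `parse_csn` and `should_send` and the sample CSN values are hypothetical, and the bug report does not confirm that this is the exact mechanism involved.

```python
from typing import Dict, NamedTuple


class CSN(NamedTuple):
    """Simplified change sequence number; tuple order gives the comparison order."""
    timestamp: int
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(text: str) -> CSN:
    """Parse the assumed 20-hex-digit CSN layout: time(8) seq(4) rid(4) subseq(4)."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}")
    return CSN(int(text[0:8], 16), int(text[8:12], 16),
               int(text[12:16], 16), int(text[16:20], 16))


def should_send(change: CSN, consumer_max: Dict[int, CSN]) -> bool:
    """Send a change only if the consumer has not already seen a CSN at least as new."""
    seen = consumer_max.get(change.replica_id)
    return seen is None or change > seen


if __name__ == "__main__":
    # The consumer already holds changes from replica ID 10 up to sequence number 5.
    consumer_max = {10: parse_csn("4f32f6a50005000a0000")}
    for text in ("4f32f6a50003000a0000", "4f32f6a50006000a0000"):
        print(text, should_send(parse_csn(text), consumer_max))
```

In this model, a promoted server that reuses replica ID 10 but issues CSNs no newer than what the consumer has already recorded would see those changes silently skipped, while later ones go through.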

Comment 4 Noriko Hosoi 2012-05-25 00:27:08 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: CSNs in the RUV were not refreshed when a replication role was changed.
Consequence: This caused data inconsistency.
Fix: CSNs are now refreshed when the role changes.
Result: Data inconsistency is no longer observed.

Comment 5 Amita Sharma 2012-05-29 16:15:46 UTC

Single Master
===============
[root@dhcp201-194 ~]# cat /var/log/dirsrv/slapd-dhcp201-1942/errors
	389-Directory/1.2.10.2 B2012.144.1937
	dhcp201-194.englab.pnq.redhat.com:1389 (/etc/dirsrv/slapd-dhcp201-1942)

[29/May/2012:20:50:05 +051800] NSMMReplicationPlugin - agmt_delete: begin
[29/May/2012:21:04:50 +051800] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=Master-hub" (dhcp201-194:2389)".
[29/May/2012:21:04:53 +051800] NSMMReplicationPlugin - Finished total update of replica "agmt="cn=Master-hub" (dhcp201-194:2389)". Sent 16 entries.
[29/May/2012:21:13:13 +051800] createprlistensockets - PR_Bind() on All Interfaces port 1389 failed: Netscape Portable Runtime error -5982 (Local Network address is in use.)


Initially Hub, then new Master
=====================================
[root@dhcp201-194 ~]# cat /var/log/dirsrv/slapd-dhcp201-1943/errors
	389-Directory/1.2.10.2 B2012.144.1937
	dhcp201-194.englab.pnq.redhat.com:2389 (/etc/dirsrv/slapd-dhcp201-1943)

[29/May/2012:20:52:42 +051800] NSMMReplicationPlugin - agmt_delete: begin
[29/May/2012:21:04:49 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is going offline; disabling replication
[29/May/2012:21:04:50 +051800] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[29/May/2012:21:04:52 +051800] - import userRoot: Workers finished; cleaning up...
[29/May/2012:21:04:52 +051800] - import userRoot: Workers cleaned up.
[29/May/2012:21:04:52 +051800] - import userRoot: Indexing complete.  Post-processing...
[29/May/2012:21:04:52 +051800] - import userRoot: Generating numSubordinates complete.
[29/May/2012:21:04:52 +051800] - import userRoot: Flushing caches...
[29/May/2012:21:04:52 +051800] - import userRoot: Closing files...
[29/May/2012:21:04:52 +051800] - import userRoot: Import complete.  Processed 16 entries in 3 seconds. (5.33 entries/sec)
[29/May/2012:21:04:52 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is coming online; enabling replication
[29/May/2012:21:04:52 +051800] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=example,dc=com does not match the data in the changelog.
 Recreating the changelog file. This could affect replication with replica's  consumers in which case the consumers should be reinitialized.
[29/May/2012:21:06:16 +051800] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=hub-consumer" (dhcp201-194:24337)".
[29/May/2012:21:06:20 +051800] NSMMReplicationPlugin - Finished total update of replica "agmt="cn=hub-consumer" (dhcp201-194:24337)". Sent 16 entries.

Consumer
============
[25/May/2012:12:42:37 +051800] - 389-Directory/1.2.10.2 B2012.144.1937 starting up
[25/May/2012:12:42:37 +051800] - slapd started.  Listening on All Interfaces port 24337 for LDAP requests
[29/May/2012:11:53:46 +051800] - slapd shutting down - signaling operation threads
[29/May/2012:11:53:46 +051800] - slapd shutting down - closing down internal subsystems and plugins
[29/May/2012:11:53:48 +051800] - Waiting for 4 database threads to stop
[29/May/2012:11:53:49 +051800] - All database threads now stopped
[29/May/2012:11:53:49 +051800] - slapd stopped.
[29/May/2012:11:58:12 +051800] - 389-Directory/1.2.10.2 B2012.144.1937 starting up
[29/May/2012:11:58:15 +051800] - slapd started.  Listening on All Interfaces port 24337 for LDAP requests
[29/May/2012:21:06:16 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is going offline; disabling replication
[29/May/2012:21:06:16 +051800] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=92a07901-a3f911e1-b71d8fb2-352a3d0e,cn=x,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=7dc8e301-a40311e1-b71d8fb2-352a3d0e,uid=aami,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=85888781-a40311e1-bf1bdd56-24a2cdb7,uid=bb,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=eb4a5001-a57811e1-bf1bdd56-24a2cdb7,uid=tt,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=c10b9781-a63711e1-b71d8fb2-352a3d0e,uid=xz,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 10
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 11
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 12
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 13
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 14
[29/May/2012:21:06:19 +051800] - import userRoot: Workers finished; cleaning up...
[29/May/2012:21:06:19 +051800] - import userRoot: Workers cleaned up.
[29/May/2012:21:06:19 +051800] - import userRoot: Indexing complete.  Post-processing...
[29/May/2012:21:06:19 +051800] - import userRoot: Generating numSubordinates complete.
[29/May/2012:21:06:19 +051800] - import userRoot: Flushing caches...
[29/May/2012:21:06:19 +051800] - import userRoot: Closing files...
[29/May/2012:21:06:19 +051800] - import userRoot: Import complete.  Processed 16 entries (5 were skipped) in 3 seconds. (5.33 entries/sec)
[29/May/2012:21:06:19 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is coming online; enabling replication

NOTE: Entries were replicated between the new master and the consumer, so the bug is VERIFIED.
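
When comparing error logs like the ones above, the "Import complete" lines give a quick count of how many entries each server processed and skipped. A minimal sketch for extracting those numbers follows. The regular expression is written against the log lines quoted in this comment only, and the helper name `import_summary` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Import complete.  Processed 16 entries (5 were skipped) in 3 seconds."
IMPORT_DONE = re.compile(
    r"Import complete\.\s+Processed (\d+) entries(?: \((\d+) were skipped\))? in"
)


def import_summary(line: str) -> Optional[Tuple[int, int]]:
    """Return (processed, skipped) for an 'Import complete' log line, else None."""
    match = IMPORT_DONE.search(line)
    if match is None:
        return None
    # The "(N were skipped)" part is absent when nothing was skipped.
    return int(match.group(1)), int(match.group(2) or 0)


if __name__ == "__main__":
    sample = ("[29/May/2012:21:06:19 +051800] - import userRoot: Import complete.  "
              "Processed 16 entries (5 were skipped) in 3 seconds. (5.33 entries/sec)")
    print(import_summary(sample))
```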

Comment 6 errata-xmlrpc 2012-06-20 07:14:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0813.html
