This bug is created as a clone of upstream ticket: https://fedorahosted.org/389/ticket/18

https://bugzilla.redhat.com/show_bug.cgi?id=750425

{{{
Description of problem:
Data loss during the promotion operation (Slave to Master).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Step-1: Set up a topology in which Master replicates to Slave and Slave replicates to Consumer.
  Master -> Slave -> Consumer

Step-2: Make sure that all three are in sync at this time. For example, assume all are in sync up to CSN5 (5 records were added to Master, CSN1 to CSN5).

Step-3: Delete the replication agreement from Master to Slave and also the one from Slave to Consumer.

Step-4: Promote the Slave to master. The promotion steps are given below.

- Delete the supplier DN (cn=suppdn,cn=config) from Slave.

- Delete the "cn=replica" entry for the suffix "o=USA" using ldapmodify. As a result, the changelog file is deleted.
  Ex:
    dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
    changetype: delete

- Modify the cn=o=USA,cn=mapping tree,cn=config entry as below.
  Ex:
    dn: cn=o=USA,cn=mapping tree,cn=config
    changetype: modify
    replace: nsslapd-state
    nsslapd-state: backend

    dn: cn=o=USA,cn=mapping tree,cn=config
    changetype: modify
    delete: nsslapd-referral

- Recreate the "cn=replica" entry for the suffix as below.
    dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
    changetype: add
    objectClass: nsds5replica
    objectClass: top
    nsDS5ReplicaRoot: o=USA
    nsDS5ReplicaType: 3
    nsDS5Flags: 1
    nsDS5ReplicaId: 10    --> Please assign the same nsDS5ReplicaId value that the master had. In my case, the original master replica ID was 10.
    nsds5ReplicaPurgeDelay: 1
    nsds5ReplicaTombstonePurgeInterval: -1
    cn: replica

- Restart the slapd process. Now the Slave has become Master.

Am I missing anything in the promotion operation, or is this not the right way to do the promotion?

Step-5: Add the replication agreement between Slave (newly promoted Master) and Consumer. At this time both Slave and Consumer are in sync up to CSN5. During agreement creation, please do not initialize the consumer.
  Slave (newly promoted as master) -> Consumer

Step-6: Add 5 more entries to the Slave that was promoted to Master above. Assume the CSNs for these 5 entries are CSN6 to CSN10.

Step-7: Now you will see that, of the last 5 entries, only the last few get replicated, and replication does not halt.

Actual results:

Expected results:

Additional info:
}}}
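For reference, the LDIF records in Step-4 can be produced by a small script. This is only a sketch that mirrors the reporter's procedure, not a recommended promotion method. The function name `promotion_ldif` and its parameters are hypothetical, the DNs are copied in the unquoted form used in the report, and a suffix containing commas would need quoting or escaping in the mapping-tree RDN, which the sketch does not handle.

```python
def promotion_ldif(suffix: str, replica_id: int) -> str:
    """Build ldapmodify input for the promotion steps listed in the report.

    Hypothetical helper: attribute names and values are taken from the
    report; the DN layout is copied as written there (no escaping).
    """
    mapping_dn = f"cn={suffix},cn=mapping tree,cn=config"
    replica_dn = f"cn=replica,{mapping_dn}"
    records = [
        # Delete the old replica entry (per the report, this removes the changelog).
        [f"dn: {replica_dn}", "changetype: delete"],
        # Switch the mapping tree entry back to a plain backend.
        [f"dn: {mapping_dn}", "changetype: modify",
         "replace: nsslapd-state", "nsslapd-state: backend"],
        # Drop the referral that pointed clients at the old master.
        [f"dn: {mapping_dn}", "changetype: modify", "delete: nsslapd-referral"],
        # Recreate the replica entry as a master with the old master's replica ID.
        [f"dn: {replica_dn}", "changetype: add",
         "objectClass: nsds5replica", "objectClass: top",
         f"nsDS5ReplicaRoot: {suffix}", "nsDS5ReplicaType: 3", "nsDS5Flags: 1",
         f"nsDS5ReplicaId: {replica_id}", "nsds5ReplicaPurgeDelay: 1",
         "nsds5ReplicaTombstonePurgeInterval: -1", "cn: replica"],
    ]
    # LDIF records are separated by a blank line.
    return "\n\n".join("\n".join(record) for record in records) + "\n"


if __name__ == "__main__":
    print(promotion_ldif("o=USA", 10))
```

The output can be saved to a file and passed to ldapmodify on the server being promoted. Deleting the supplier DN and restarting slapd remain separate manual steps.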
Verification steps
[1] Set up Master (replica ID = 1), Hub, and Consumer.
[2] Shut down Master, reconfigure Hub into a master, and assign it the Master's replica ID (replica ID = 1).
[3] Generate an agreement for the NewMaster (ex-Hub) pointing to the same Consumer.
[4] Now, without initializing the Consumer from the NewMaster, add multiple entries to the NewMaster. For instance, add 5 entries: uid=test0, ..., uid=test4 with one ldapadd command line.
[5] If all 5 are replicated to Consumer correctly, the bug is verified.
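Steps [4] and [5] could be scripted roughly as follows. This is a sketch under assumptions: the helper names `build_test_entries` and `missing_uids` are hypothetical, the base DN `ou=People,dc=example,dc=com` and the inetOrgPerson attributes are illustrative choices, and the list of uids found on the Consumer is expected to come from a separate search that is not shown here.

```python
def build_test_entries(count, base_dn="ou=People,dc=example,dc=com"):
    """Return LDIF for uid=test0 .. uid=test<count-1>, suitable for one ldapadd run.

    The base DN and object classes are assumptions for illustration.
    """
    records = []
    for i in range(count):
        uid = f"test{i}"
        records.append("\n".join([
            f"dn: uid={uid},{base_dn}",
            "objectClass: top",
            "objectClass: person",
            "objectClass: organizationalPerson",
            "objectClass: inetOrgPerson",
            f"uid: {uid}",
            f"cn: {uid}",
            f"sn: {uid}",
        ]))
    # A blank line separates LDIF records.
    return "\n\n".join(records) + "\n"


def missing_uids(expected, found_on_consumer):
    """Return the expected uids that were not found on the Consumer, sorted."""
    return sorted(set(expected) - set(found_on_consumer))


if __name__ == "__main__":
    print(build_test_entries(5))
    expected = [f"test{i}" for i in range(5)]
    # Example only: a Consumer that received just the last two entries.
    print(missing_uids(expected, ["test3", "test4"]))
```

An empty list from `missing_uids` corresponds to the pass condition in step [5]. A non-empty list names the entries that did not reach the Consumer.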
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: CSNs in the RUV were not refreshed when a replication role was changed.
Consequence: This caused data inconsistency.
Fix: CSNs are now refreshed when the role changes.
Result: Data inconsistency is no longer observed.
Single Master
===============
[root@dhcp201-194 ~]# cat /var/log/dirsrv/slapd-dhcp201-1942/errors
389-Directory/1.2.10.2 B2012.144.1937
dhcp201-194.englab.pnq.redhat.com:1389 (/etc/dirsrv/slapd-dhcp201-1942)

[29/May/2012:20:50:05 +051800] NSMMReplicationPlugin - agmt_delete: begin
[29/May/2012:21:04:50 +051800] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=Master-hub" (dhcp201-194:2389)".
[29/May/2012:21:04:53 +051800] NSMMReplicationPlugin - Finished total update of replica "agmt="cn=Master-hub" (dhcp201-194:2389)". Sent 16 entries.
[29/May/2012:21:13:13 +051800] createprlistensockets - PR_Bind() on All Interfaces port 1389 failed: Netscape Portable Runtime error -5982 (Local Network address is in use.)

Initially Hub, then new Master
=====================================
[root@dhcp201-194 ~]# cat /var/log/dirsrv/slapd-dhcp201-1943/errors
389-Directory/1.2.10.2 B2012.144.1937
dhcp201-194.englab.pnq.redhat.com:2389 (/etc/dirsrv/slapd-dhcp201-1943)

[29/May/2012:20:52:42 +051800] NSMMReplicationPlugin - agmt_delete: begin
[29/May/2012:21:04:49 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is going offline; disabling replication
[29/May/2012:21:04:50 +051800] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[29/May/2012:21:04:52 +051800] - import userRoot: Workers finished; cleaning up...
[29/May/2012:21:04:52 +051800] - import userRoot: Workers cleaned up.
[29/May/2012:21:04:52 +051800] - import userRoot: Indexing complete. Post-processing...
[29/May/2012:21:04:52 +051800] - import userRoot: Generating numSubordinates complete.
[29/May/2012:21:04:52 +051800] - import userRoot: Flushing caches...
[29/May/2012:21:04:52 +051800] - import userRoot: Closing files...
[29/May/2012:21:04:52 +051800] - import userRoot: Import complete. Processed 16 entries in 3 seconds. (5.33 entries/sec)
[29/May/2012:21:04:52 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is coming online; enabling replication
[29/May/2012:21:04:52 +051800] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=example,dc=com does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[29/May/2012:21:06:16 +051800] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn=hub-consumer" (dhcp201-194:24337)".
[29/May/2012:21:06:20 +051800] NSMMReplicationPlugin - Finished total update of replica "agmt="cn=hub-consumer" (dhcp201-194:24337)". Sent 16 entries.

Consumer
============
[25/May/2012:12:42:37 +051800] - 389-Directory/1.2.10.2 B2012.144.1937 starting up
[25/May/2012:12:42:37 +051800] - slapd started. Listening on All Interfaces port 24337 for LDAP requests
[29/May/2012:11:53:46 +051800] - slapd shutting down - signaling operation threads
[29/May/2012:11:53:46 +051800] - slapd shutting down - closing down internal subsystems and plugins
[29/May/2012:11:53:48 +051800] - Waiting for 4 database threads to stop
[29/May/2012:11:53:49 +051800] - All database threads now stopped
[29/May/2012:11:53:49 +051800] - slapd stopped.
[29/May/2012:11:58:12 +051800] - 389-Directory/1.2.10.2 B2012.144.1937 starting up
[29/May/2012:11:58:15 +051800] - slapd started. Listening on All Interfaces port 24337 for LDAP requests
[29/May/2012:21:06:16 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is going offline; disabling replication
[29/May/2012:21:06:16 +051800] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=92a07901-a3f911e1-b71d8fb2-352a3d0e,cn=x,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=7dc8e301-a40311e1-b71d8fb2-352a3d0e,uid=aami,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=85888781-a40311e1-bf1bdd56-24a2cdb7,uid=bb,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=eb4a5001-a57811e1-bf1bdd56-24a2cdb7,uid=tt,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: Skipping entry "nsuniqueid=c10b9781-a63711e1-b71d8fb2-352a3d0e,uid=xz,nsuniqueid=d2fef783-a3f711e1-af33e12d-43d608a3,ou=People,dc=example,dc=com" which has no parent, ending at line 0 of file "(bulk import)"
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 10
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 11
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 12
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 13
[29/May/2012:21:06:17 +051800] - import userRoot: WARNING: bad entry: ID 14
[29/May/2012:21:06:19 +051800] - import userRoot: Workers finished; cleaning up...
[29/May/2012:21:06:19 +051800] - import userRoot: Workers cleaned up.
[29/May/2012:21:06:19 +051800] - import userRoot: Indexing complete. Post-processing...
[29/May/2012:21:06:19 +051800] - import userRoot: Generating numSubordinates complete.
[29/May/2012:21:06:19 +051800] - import userRoot: Flushing caches...
[29/May/2012:21:06:19 +051800] - import userRoot: Closing files...
[29/May/2012:21:06:19 +051800] - import userRoot: Import complete. Processed 16 entries (5 were skipped) in 3 seconds. (5.33 entries/sec)
[29/May/2012:21:06:19 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=example,dc=com is coming online; enabling replication

NOTE: Entries were replicated between the new master and the consumer, so the bug is VERIFIED.
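When comparing error logs across the three instances, the entry counts can be pulled out with a short script instead of being read by eye. The sketch below is only an illustration: the regular expressions are written against the message wording seen in the logs above and are assumed, not guaranteed, to match other server versions. The names `summarize` and `SAMPLE` are hypothetical.

```python
import re

# Patterns written against the message text seen in the logs in this report.
TOTAL_UPDATE_RE = re.compile(
    r'Finished total update of replica "(?P<agmt>.+?)"\. Sent (?P<sent>\d+) entries\.'
)
IMPORT_RE = re.compile(
    r"Import complete\.\s+Processed (?P<processed>\d+) entries"
    r"(?: \((?P<skipped>\d+) were skipped\))? in \d+ seconds"
)

# Two lines copied from the hub and consumer logs in this report.
SAMPLE = '''[29/May/2012:21:06:20 +051800] NSMMReplicationPlugin - Finished total update of replica "agmt="cn=hub-consumer" (dhcp201-194:24337)". Sent 16 entries.
[29/May/2012:21:06:19 +051800] - import userRoot: Import complete. Processed 16 entries (5 were skipped) in 3 seconds. (5.33 entries/sec)
'''


def summarize(log_text):
    """Collect (agreement, entries sent) and (processed, skipped) pairs from a log."""
    sent = [
        (m.group("agmt"), int(m.group("sent")))
        for m in TOTAL_UPDATE_RE.finditer(log_text)
    ]
    imports = [
        # A missing "(N were skipped)" clause is counted as zero skipped entries.
        (int(m.group("processed")), int(m.group("skipped") or 0))
        for m in IMPORT_RE.finditer(log_text)
    ]
    return {"sent": sent, "imports": imports}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running `summarize` over each instance's errors file gives one summary per server, which makes a mismatch between entries sent and entries imported easier to spot.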
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0813.html