Bug 790656

Summary: Retain the changelog db file when the replica entry is modified during the promotion or demotion operation on a replica
Product: [Retired] 389
Reporter: Jyoti ranjan das <jyoti-ranjan.das>
Component: Replication - General
Assignee: Rich Megginson <rmeggins>
Status: CLOSED DEFERRED
QA Contact: Ben Levenson <benl>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 1.2.10
CC: nhosoi
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-19 19:22:47 UTC
Type: ---

Description Jyoti ranjan das 2012-02-15 05:15:18 UTC
Description of problem:

This bug is opened with reference to bug 750425.

During the promotion or demotion of a replica (for example, when the Master is demoted to play the Hub role and the Hub is promoted to play the Master role), deleting the replica entry for the suffix in order to recreate a new replica for the same suffix in turn deletes the changelogdb file associated with that replica role. This causes data loss in a few scenarios, one of which is described below.

Scenario-1:
==========

Topology:

       Master ( Replica ID: 1)
         |
        Hub
      /     \
Consumer1  Consumer2

Assume that every server in this topology is in sync up to a specific CSN, say "CSN5". Consumer1 is then taken out of the topology for some reason, while the Master continues to accept updates. Later, the rest of the topology is in sync up to "CSN10", whereas Consumer1 is still at "CSN5" because it is out of the topology. The Master then fails, so to reduce downtime the Hub is promoted to play the Master role in the absence of the original Master. The steps followed to promote a Hub to the Master role are given below. After the promotion, the Hub is open to receive changes.
Now bring Consumer1 back into the topology without initializing it. In this case, Consumer1 will miss the changes from "CSN6" to "CSN10", because the Hub's changelogdb file was deleted during the promotion.
This is one of the potential data loss issues.
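
The gap described in this scenario can be illustrated with a small toy model. This is only a sketch and not 389 Directory Server code: the `Server` class, the `run_scenario` function, and the use of plain integers as CSNs are assumptions made for illustration (real CSNs are not plain integers). It shows that the Hub can replay "CSN6" to "CSN10" to Consumer1 only if its changelog survives the promotion.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy replica: tracks the highest applied CSN and a replayable changelog."""
    name: str
    max_csn: int = 0
    changelog: dict = field(default_factory=dict)  # csn -> change

    def apply(self, csn, change, keep_changelog=True):
        # Record the change as applied; servers that forward changes also log it.
        self.max_csn = max(self.max_csn, csn)
        if keep_changelog:
            self.changelog[csn] = change

    def missing_for(self, consumer):
        """Return the CSNs the consumer lacks that this server cannot replay."""
        needed = range(consumer.max_csn + 1, self.max_csn + 1)
        return [csn for csn in needed if csn not in self.changelog]


def run_scenario(delete_changelog_on_promotion):
    hub = Server("hub")
    consumer1 = Server("consumer1")

    # Everybody is in sync up to CSN5.
    for csn in range(1, 6):
        hub.apply(csn, f"change-{csn}")
        consumer1.apply(csn, f"change-{csn}", keep_changelog=False)

    # Consumer1 is out of the topology while CSN6..CSN10 reach the Hub.
    for csn in range(6, 11):
        hub.apply(csn, f"change-{csn}")

    # Promotion: deleting cn=replica also deletes the Hub's changelog.
    if delete_changelog_on_promotion:
        hub.changelog.clear()

    # Consumer1 rejoins without being initialized.
    return hub.missing_for(consumer1)


lost_when_deleted = run_scenario(delete_changelog_on_promotion=True)
lost_when_retained = run_scenario(delete_changelog_on_promotion=False)
print("unrecoverable CSNs (changelog deleted):", lost_when_deleted)
print("unrecoverable CSNs (changelog retained):", lost_when_retained)
```

In this model, the changes are unrecoverable only in the run where the changelog is cleared during promotion, which is the behavior this bug asks to avoid.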

Steps followed during Promotion:
================================
All of these steps are performed while the "ns-slapd" process is running.

  Note: The promotion operation is performed while the Hub is up and running.

  1:   Delete the supplier bind DN (cn=<suppdn>,cn=config) from the Slave.
       This is the DN that the Master used to communicate with the
       Hub/Consumer. It is used when the replication agreement is created.

   EX:    dn: cn=replication manager,cn=config
          objectClass: inetorgperson
          objectClass: top
          cn: replication manager
          sn: RM
          userPassword: password
          passwordExpirationTime: 20380119031407Z

        # /opt/dirsrv/bin/ldapmodify -h localhost -p  4601  
                               -D "cn=directory manager" -w <xxxxx>
          dn: cn=replication manager,cn=config
          changetype: delete
   
   2: Make a copy of the "nsDS5ReplicaName" attribute value from the replica
      entry.

   3:  Delete the cn=replica entry. This in turn deletes the changelogdb file.

        EX:  /opt/dirsrv/bin/ldapmodify -h localhost -p  4601  
                                      -D "cn=directory manager" -w <xxxx>
             dn: cn=replica,cn="dc=ind, dc=hp, dc=com",cn=mapping tree,cn=config
             changetype: delete

          Note: After this step there won't be any changelogdb file for the hub

   4:  Modify "cn=<suffix>,cn=mapping tree, cn=config" entry
       
            # /opt/dirsrv/bin/ldapmodify -h localhost -p  4601  
                                      -D "cn=directory manager" -w <xxxx>
              dn: cn="dc=ind, dc=hp, dc=com",cn=mapping tree,cn=config
              changetype: modify
              replace: nsslapd-state
              nsslapd-state: backend

            # /opt/dirsrv/bin/ldapmodify -h localhost -p  4601  
                                            -D "cn=directory manager" -w <xxxx>
              dn: cn="dc=ind, dc=hp, dc=com",cn=mapping tree,cn=config
              changetype: modify
              delete: nsslapd-referral

   5: Re-create the cn=replica entry.

           # /opt/dirsrv/bin/ldapmodify -h localhost -p  4601  
                                            -D "cn=directory manager" -w <xxxx>
             dn: cn=replica,cn="dc=ind, dc=hp, dc=com",cn=mapping tree,cn=config
             changetype: add
             objectClass: nsDS5Replica
             objectClass: top
             nsDS5ReplicaRoot: dc=ind, dc=hp, dc=com
             nsDS5ReplicaType: 3
             nsDS5Flags: 1
             nsDS5ReplicaId: 1  < This is the same replicaid which is being used by the Master >
             nsDS5ReplicaName: < Same value which was taken in step-2 above>
             cn: replica
             nsds5ReplicaPurgeDelay: 1
             nsds5ReplicaTombstonePurgeInterval: -1

   6: Restart the Hub. The Hub now acts as the Master.

There are a couple of other scenarios where this type of issue can occur.

Version-Release number of selected component (if applicable):


How reproducible:
Frequently

Steps to Reproduce:
Steps are given above to reproduce the issue.
  
Actual results:


Expected results:

The expectation is that the replica entry can be modified, instead of deleted, during a promotion or demotion operation, and that the changelog db file is retained when the replica entry is modified. This would resolve some of the data loss scenarios in the topology.
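
As a rough illustration of the requested behavior, the sketch below builds the LDIF for a single in-place modify of the cn=replica entry, using the attribute values from step 5 of the promotion procedure. The function name `build_promotion_ldif` is hypothetical. Whether the server accepts such a modify, and whether it keeps the changelog db file when it does, is exactly what this report asks for and is not confirmed here.

```python
def build_promotion_ldif(suffix, replica_id):
    """Build LDIF that modifies cn=replica in place (hypothetical helper).

    The values mirror the re-created entry from the promotion steps:
    nsDS5ReplicaType 3, nsDS5Flags 1, and the Master's replica ID.
    """
    dn = f'cn=replica,cn="{suffix}",cn=mapping tree,cn=config'
    lines = [
        f"dn: {dn}",
        "changetype: modify",
        "replace: nsDS5ReplicaType",
        "nsDS5ReplicaType: 3",
        "-",
        "replace: nsDS5Flags",
        "nsDS5Flags: 1",
        "-",
        "replace: nsDS5ReplicaId",
        f"nsDS5ReplicaId: {replica_id}",
        "-",
    ]
    return "\n".join(lines) + "\n"


ldif = build_promotion_ldif("dc=ind, dc=hp, dc=com", 1)
print(ldif)
```

The generated text could be fed to ldapmodify in the same way as the examples in the promotion steps. Because the entry is never deleted, nsDS5ReplicaName would not need to be copied and restored.
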
Additional info:

Please let me know if more details are needed.

Comment 1 Rich Megginson 2012-02-15 15:04:58 UTC
Upstream ticket:
https://fedorahosted.org/389/ticket/295

Comment 2 Rich Megginson 2012-02-15 15:08:32 UTC
We have a new ticket tracking system for 389.
http://directory.fedoraproject.org/wiki/Bugs
We have copied this bug to that system.  Please create a Fedora account and add yourself to the CC of that ticket.
If you have patches to contribute, please attach them to that ticket as attachments, and send an email to 389-devel.org asking for a review.

Thanks

Comment 4 Noriko Hosoi 2015-11-19 19:22:47 UTC
Closing this bug since we moved to the trac ticket: 
https://fedorahosted.org/389/ticket/295