Bug 830335

Summary: restore of replica ldif file on second master after deleting two records shows only 1 deletion
Product: Red Hat Enterprise Linux 6 Reporter: Nathan Kinder <nkinder>
Component: 389-ds-base Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: Sankar Ramalingam <sramling>
Severity: unspecified Docs Contact:
Priority: high    
Version: 6.4 CC: jgalipea, jrusnack, mreynolds
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.2.11.12-1.el6 Doc Type: Bug Fix
Doc Text:
Cause: Restoring an ldif from a replica that contains older changes not yet seen by the other servers.
Consequence: Those updates might not be replicated to the other replicas.
Fix: Check the CSNs and allow the older updates to be replicated.
Result: Replicas stay in sync.
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-21 08:17:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nathan Kinder 2012-06-08 21:13:25 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/369

steps:

1) On the IPA replica, create 4 IPA users: A, B, C and D. Then make a backup with 'db2ldif.pl -r ...'.

2) On the IPA replica, delete user D: 'ipa user-del D'.

3) On the IPA master, delete user C: 'ipa user-del C'.

4) Check the IPA master and the IPA replica: both show only two users, 'A' and 'B'. This is expected.

5) On the IPA replica, restore the backup with 'ldif2db.pl'.

6) Check the IPA replica immediately: 'ipa user-find' shows 4 users, 'A, B, C, D', at first.

7) Check the IPA master: 'ipa user-find' still shows only two users, 'A, B'.

8) Wait 3 minutes or so and check the IPA replica again: there are now only THREE users, 'A, B, D'. User 'C' has been deleted; that change propagated from the IPA master.

9) Check the IPA master again and again: there are still only two users, 'A, B'.

10) Check the IPA replica again and again: there are still three users, 'A, B, D'. This state differs from both the IPA master's 'A, B' and the backup's 'A, B, C, D'.

If the backup is created without the '-r' option, step 8 above always shows 'A, B, C, D', the same as the backup. With the '-r' option, the final result ends up somewhere in between.

I think the delete of D that first occurred on the replica should have been propagated to the master, and then back to the replica after the restore from ldif.

Comment 1 RHEL Program Management 2012-07-10 07:10:58 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 2 RHEL Program Management 2012-07-10 23:00:16 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 4 Sankar Ramalingam 2012-11-30 00:42:42 UTC
On a fourwayMMR setup with a winsync replication agreement, I created 4 users on M1 and took a backup, deleted user1 from M2, deleted user2 from M1, and then restored the backed-up file using ldif2db.pl.

Steps:

1. Added 4 users. ./AddEntry.sh Users 1189 "dc=passsync,dc=com" bug830335users 4 localhost

2. [root@weelie slapd-M1]# pwd
/usr/lib64/dirsrv/slapd-M1
[root@weelie slapd-M1]# ./db2ldif.pl -r -D "cn=Directory Manager" -w Secret123 -n passsync1189 -a /tmp/testbug830335.ldif

3. [root@weelie slapd-M1]# PORT=1289; userid="uid=bug830335users1,dc=passsync,dc=com"; /usr/lib64/mozldap/ldapdelete -h 10.65.206.72 -p $PORT -D "cn=Directory Manager" -w Secret123 "$userid"

4. PORT=1189; userid="uid=bug830335users2,dc=passsync,dc=com"; /usr/lib64/mozldap/ldapdelete -h 10.65.206.72 -p $PORT -D "cn=Directory Manager" -w Secret123 "$userid"

5. ldapsearch on M1 and M2 resulted in 2 entries.

6. [root@weelie slapd-M1]# ./ldif2db.pl  -D "cn=Directory Manager" -w Secret123 -n passsync1189 -i /tmp/testbug830335.ldif

7. Now, 4 users on M1 and two users on M2. Seems like the bug still exists...

Message from DS error logs...

[29/Nov/2012:19:29:51 -0500] - import passsync1189: Closing files...
[29/Nov/2012:19:29:52 -0500] - import passsync1189: Import complete.  Processed 68 entries in 3 seconds. (22.67 entries/sec)
[29/Nov/2012:19:29:52 -0500] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=passsync,dc=com is coming online; enabling replication
[29/Nov/2012:19:29:52 -0500] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=passsync,dc=com does not match the data in the changelog.
 Recreating the changelog file. This could affect replication with replica's  consumers in which case the consumers should be reinitialized.

Comment 5 mreynolds 2012-11-30 16:33:07 UTC
I think you need to make an update, after doing the ldif2db, to trigger replication.  The testcase says to wait 3 minutes for replication to catch up, but like I said, making some random update to the db should trigger it as well.

Comment 6 Sankar Ramalingam 2012-12-04 14:34:36 UTC
Varun and I tested the same problem in an IPA environment and found that the issue is not yet fixed.

We followed exactly the same steps as in the bug description. Additionally, we waited for 3 mins, then created a few users on the Master and checked whether they were also created on the Replica.

Result: FAIL - It has no impact on the users that were restored.

Users on Master: UserC and UserD
Users on Replica: UserA, UserC and UserD (Note: UserB got deleted after 3 mins).

Comment 7 mreynolds 2012-12-04 20:31:02 UTC
Ok, I can reproduce your failure, and I can also reproduce the fix working.

Recap:

"replica" - we delete entry "D".  This is also where we make our backup and do the restore

"master"  - we delete entry "C"

After the restore:

[1]  If you make an update on "replica" immediately after the restore, this updates its maxcsn, and now "master" will not replay updates that came from "replica" back to "replica" if they are before the "new" maxcsn.  So the delete of C is replicated, but not D.

[2]  But if you make an update on "master" instead, it will push all the missing updates, as the "replica" maxcsn has not been updated and contains the original older maxcsn.  So both C and D will be deleted.


I do not see a way to deal with scenario [1].  If you make an update to the restored replica before the other agreements can update it, then those changes are lost.  I do not think it is possible to prevent this.

The preferred process is not to restore a replica with old data.  You should make a backup ldif from an active "replica", and use that ldif to do the restore.  When you do a restore, it is expected that it is the most recent version of the data.

Anyway, to show the fix is working, just make an update on "master" after doing the restore, and that will trigger the correct behavior.
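
The two scenarios in the recap can be illustrated with a rough Python sketch. This is a toy model only, not the actual 389-ds code: the `Change` class, the `changes_to_send` helper, and the integer CSN values (10, 20, 30, 40) are made up for illustration, and real CSNs and RUVs are more involved. The only assumption it encodes is that a supplier sends a change when its CSN is newer than the consumer's maxcsn for the replica where that change originated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    csn: int   # simplified change sequence number (hypothetical integer)
    rid: str   # name of the server where the change originated
    op: str    # human-readable operation, e.g. "del C"


def changes_to_send(changelog, consumer_max_csn):
    """Return the changes newer than the consumer's maxcsn for their origin."""
    ordered = sorted(changelog, key=lambda c: c.csn)
    return [c for c in ordered if c.csn > consumer_max_csn.get(c.rid, 0)]


# Changelog held by "master" once both deletes have reached it.
# (The update that triggers the replication session is left out.)
master_changelog = [
    Change(csn=20, rid="replica", op="del D"),
    Change(csn=30, rid="master", op="del C"),
]

# maxcsn values on "replica" right after ldif2db: assumed to be the ones
# stored in the backup, which was taken before either delete (csn 10).
restored_maxcsn = {"replica": 10, "master": 10}

# Scenario [2]: "master" is updated first, "replica" still has the old maxcsn.
scenario2 = [c.op for c in changes_to_send(master_changelog, restored_maxcsn)]

# Scenario [1]: "replica" is updated first, so its own maxcsn jumps to 40.
maxcsn_after_local_update = dict(restored_maxcsn, replica=40)
scenario1 = [c.op for c in changes_to_send(master_changelog, maxcsn_after_local_update)]

print("scenario [2]:", scenario2)
print("scenario [1]:", scenario1)
```

In this model the restored (old) maxcsn lets both deletes through, while a local update on "replica" pushes its own maxcsn past the delete of D, so only the delete of C is sent.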

Mark

Comment 8 Sankar Ramalingam 2012-12-06 14:37:16 UTC
(In reply to comment #7)
> Ok, I can reproduce your failure, and I can also reproduce the fix working.
> 
> Recap:
> 
> "replica" - we delete entry "D".  This is also where we make our backup and
> do the restore
> 
> "master"  - we delete entry "C"
> 
> After the restore:
> 
> [1]  If you make an update on "replica" immediately after the restore, this
> updates its maxcsn, and now "master" will not replay updates that came from
> "replica" back to "replica" if they are before the "new" maxcsn.  So the
> delete of C is replicated, but not D.
I tried creating a new user on the replica after restoring the ldif file, but it didn't help.
Resulting entries on the Replica after 3 mins:
[root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-find |grep -i "User login"
  User login: admin
  User login: newtest1
  User login: newtest3
  User login: newtest4
  User login: newtest5
  User login: varun2
  User login: varun3
  User login: varunmylaraiah

newtest2 got deleted.
User creation is not syncing from Replica to Master. Added varun2 and varun3.

Resulting entries on the Master after 3 mins:
[root@hp-dl785g6-01 ~]# ipa user-find |grep -i "User login"
  User login: admin
  User login: newtest3
  User login: newtest4
  User login: newtest5
  User login: varunmylaraiah

> 
> [2]  But if you make an update on "master" instead, it will push all the
> missing updates, as the "replica" maxcsn has not been updated and contains
> the original older maxcsn.  So both C and D will be deleted.
> 
> 
> I do not see a way to deal with scenario [1].  If you make an update to the
> restored replica before the other agreements can update it, then those
> changes are lost.  I do not think it is possible to prevent this.
> 
> The preferred process is not to restore a replica with old data. 
> You should make a backup ldif from an active "replica", and use that
> ldif to do the restore.  When you do a restore, it is expected that it is
> the most recent version of the data.
> 
> Anyway, to show the fix is working, just make an update on "master" after
> doing the restore, and that will trigger the correct behavior.
> 
> Mark

Comment 9 Sankar Ramalingam 2012-12-06 14:42:27 UTC
Another test from replica:

[root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-find |grep -i "User login"
  User login: admin
  User login: varun3
  User login: varunmylaraiah
[root@hp-xw4600-01 scripts-TESTRELM-COM]# ./ldif2db.pl -D "cn=Directory Manager" -w Secret123 -n userRoot -i /tmp/newtestbug830335.ldif
adding new entry "cn=import_2012_12_6_9_35_5, cn=import, cn=tasks, cn=config"

[root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-find |grep -i "User login"
  User login: admin
  User login: newtest12
  User login: varun1
  User login: varun2
  User login: varun3
  User login: varun4
  User login: varun5
  User login: varunmylaraiah
[root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-add newtest18
First name: newtest18
Last name: newtest18
----------------------
Added user "newtest18"
----------------------
  User login: newtest18
  First name: newtest18
  Last name: newtest18
  Full name: newtest18 newtest18
  Display name: newtest18 newtest18
  Initials: nn
  Home directory: /home/newtest18
  GECOS field: newtest18 newtest18
  Login shell: /bin/sh
  Kerberos principal: newtest18
  Email address: newtest18
  UID: 615900010
  GID: 615900010
  Password: False
  Kerberos keys available: False
[root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-find |grep -i "User login"
  User login: admin
  User login: newtest12
  User login: newtest18
  User login: varun1
  User login: varun2
  User login: varun3
  User login: varunmylaraiah


Users at Master:

[root@hp-dl785g6-01 ~]# ipa user-find |grep -i "User login"
  User login: admin
  User login: newtest18
  User login: varun3
  User login: varunmylaraiah


The outcome is very inconsistent and leads to a lot of confusion. I am not sure what the expected behaviour is.

My gut feeling is, this issue must be documented for sure, if not fixed.
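
To make the mismatch easier to see than by eyeballing two terminal outputs, the `ipa user-find | grep -i "User login"` listings can be diffed with a few lines of Python. This is only a sketch: `parse_logins` is a hypothetical helper, and the two strings are copied from the last Replica and Master listings above.

```python
def parse_logins(output):
    """Collect the values of 'User login:' lines into a set."""
    logins = set()
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user login":
            logins.add(value.strip())
    return logins


# Last listing taken on the Replica (after the restore and 'ipa user-add').
replica_out = """\
  User login: admin
  User login: newtest12
  User login: newtest18
  User login: varun1
  User login: varun2
  User login: varun3
  User login: varunmylaraiah
"""

# Listing taken on the Master.
master_out = """\
  User login: admin
  User login: newtest18
  User login: varun3
  User login: varunmylaraiah
"""

replica = parse_logins(replica_out)
master = parse_logins(master_out)

only_on_replica = sorted(replica - master)
only_on_master = sorted(master - replica)

print("only on replica:", only_on_replica)
print("only on master:", only_on_master)
```

For the listings above, the entries that exist only on the Replica are newtest12, varun1 and varun2, and nothing exists only on the Master.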

Comment 10 mreynolds 2012-12-06 15:15:48 UTC
Ok, you aren't following the correct procedure...

Let me reword it:

We have Master A and Master B.

Add 4 users to master A (user1, user2, user3, user4)

Take a backup from master A (db2ldif -r)

Delete user4 from master A

Delete user3 from master B

Restore the ldif on Master A (ldif2db)

Make any kind of update on master B (not master A!).  If you update master A things will break - there is no way to fix this.

Wait 5 seconds at the most.

Now master A has the same entries as master B.
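
For an automated test, the final "wait and compare" step could poll both masters instead of sleeping for a fixed time. The Python sketch below is one possible shape for that, under assumptions: `wait_for_sync`, `fetch_a` and `fetch_b` are hypothetical names, and the fetch callables would have to wrap whatever search is used against each master (they are faked here so the example runs on its own).

```python
import time


def wait_for_sync(fetch_a, fetch_b, timeout=5.0, interval=0.5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll both servers until their entry sets match or the timeout expires.

    Returns (in_sync, entries_a, entries_b) from the last poll.
    """
    deadline = clock() + timeout
    while True:
        a, b = set(fetch_a()), set(fetch_b())
        if a == b:
            return True, a, b
        if clock() >= deadline:
            return False, a, b
        sleep(interval)


# Fake data: master A still shows the restored backup on the first poll,
# then shows the replayed deletes on every later poll.
a_states = [{"user1", "user2", "user3", "user4"}, {"user1", "user2"}]


def fake_fetch_a():
    return a_states.pop(0) if len(a_states) > 1 else a_states[0]


def fake_fetch_b():
    return {"user1", "user2"}


in_sync, a, b = wait_for_sync(fake_fetch_a, fake_fetch_b,
                              timeout=5.0, interval=0.0,
                              sleep=lambda seconds: None)
print(in_sync, sorted(a), sorted(b))
```

The 5 second default comes from the "Wait 5 seconds at the most" step; a test suite may want a larger timeout.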

Comment 11 Sankar Ramalingam 2012-12-18 11:18:26 UTC
(In reply to comment #10)
> Ok, you aren't following the correct procedure...
> 
> Let me reword it:
> 
> We have Master A and Master B.
> 
> Add 4 users to master A (user1, user2, user3, user4)
> 
> Take a backup from master A (db2ldif -r)
> 
> Delete user4 from master A
> 
> Delete user3 from master B
> 
> Restore the ldif on Master A (ldif2db)
> 
> Make any kind of update on master B (not master A!).

After the restore, I created and deleted a few entries on master B, but no luck this time either. I waited for more than 5 mins, and master A and master B ended up with different sets of entries.

> If you update master A
> things will break - there is no way to fix this.
> 
> Wait 5 seconds at the most.
> 
> Now master A has the same entries as master B.

Comment 12 Sankar Ramalingam 2012-12-19 02:02:17 UTC
Adding an automated test, "bug830335", to the fourwaymmr.sh test suite. Please apply fixes directly to this test case if it is not written correctly. I will re-assign the bug if the test case fails during the automated test execution.

Comment 13 Sankar Ramalingam 2013-01-15 02:31:02 UTC
Hi Mark, can you apply fixes to the scripts (for the bug830335 test in fourwaymmr) if it's not written correctly?

Comment 14 mreynolds 2013-01-16 16:15:16 UTC
I checked in a possible fix.  I am unable to get tet running (again), so I can not verify my fix.

You were missing the "-r" option for the db2ldif command.  This is a necessary step to verify the fix.  I have checked this change into the trunk and rhel64 tet branches.  I did not see anything else wrong with your testcase.

Can you run the test to see if it is passing now?

Thanks,
Mark

Comment 15 Sankar Ramalingam 2013-01-21 10:40:33 UTC
mmrepl fourwaymmr run: 100% (23/23). Hence marking the bug as verified.

User 1 added successfully to ou=People,dc=example,dc=com , return code, 0
Sleeping for 3 mins to sync M1 and M2
Running ldapsearch for ou=People,dc=example,dc=com to find out how many entries present after restoring M1, updating M2 and sleep for 180 secs
Entries after restoring M1, Updating M2 and sleeping for 3 mins: Users on M1-14 , Users on M2-14
Step3: Users at M1 and M2 are in sync, Users on M1-14 , Users on M2-14, test successful
After successful restoring of M1
TestCase [bug830335] result-> [PASS]

Comment 17 errata-xmlrpc 2013-02-21 08:17:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0503.html