Bug 830335
Summary: | restore of replica ldif file on second master after deleting two records shows only 1 deletion | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Nathan Kinder <nkinder>
Component: | 389-ds-base | Assignee: | Rich Megginson <rmeggins>
Status: | CLOSED ERRATA | QA Contact: | Sankar Ramalingam <sramling>
Severity: | unspecified | Docs Contact: |
Priority: | high | |
Version: | 6.4 | CC: | jgalipea, jrusnack, mreynolds
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | 389-ds-base-1.2.11.12-1.el6 | Doc Type: | Bug Fix
Doc Text: | Cause: restoring an LDIF from a replica which has older changes that were not seen by other servers. Consequence: those updates might not be replicated to other replicas. Fix: check the CSNs and allow older updates to be replicated. Result: replicas stay in sync. | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2013-02-21 08:17:24 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Nathan Kinder
2012-06-08 21:13:25 UTC
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

On a four-way MMR setup with a winsync replication agreement, I created 4 users on M1 and took a backup, deleted user1 from M2, deleted user2 from M1, and then restored the backed-up file using ldif2db.pl.

Steps:

1. Added 4 users:

    ./AddEntry.sh Users 1189 "dc=passsync,dc=com" bug830335users 4 localhost

2. Exported the backend from M1, including replication data:

    [root@weelie slapd-M1]# pwd
    /usr/lib64/dirsrv/slapd-M1
    [root@weelie slapd-M1]# ./db2ldif.pl -r -D "cn=Directory Manager" -w Secret123 -n passsync1189 -a /tmp/testbug830335.ldif

3. Deleted user1 from M2:

    [root@weelie slapd-M1]# PORT=1289; userid="uid=bug830335users1,dc=passsync,dc=com"; /usr/lib64/mozldap/ldapdelete -h 10.65.206.72 -p $PORT -D "cn=Directory Manager" -w Secret123 "$userid"

4. Deleted user2 from M1:

    PORT=1189; userid="uid=bug830335users2,dc=passsync,dc=com"; /usr/lib64/mozldap/ldapdelete -h 10.65.206.72 -p $PORT -D "cn=Directory Manager" -w Secret123 "$userid"

5. ldapsearch on M1 and M2 resulted in 2 entries.

6. Restored the backup on M1:

    [root@weelie slapd-M1]# ./ldif2db.pl -D "cn=Directory Manager" -w Secret123 -n passsync1189 -i /tmp/testbug830335.ldif

7. Now there are 4 users on M1 and two users on M2. It seems like the bug still exists...

Message from the DS error logs:

    [29/Nov/2012:19:29:51 -0500] - import passsync1189: Closing files...
    [29/Nov/2012:19:29:52 -0500] - import passsync1189: Import complete. Processed 68 entries in 3 seconds. (22.67 entries/sec)
    [29/Nov/2012:19:29:52 -0500] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=passsync,dc=com is coming online; enabling replication
    [29/Nov/2012:19:29:52 -0500] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=passsync,dc=com does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.

I think you need to make an update, after doing the ldif2db, to trigger replication. In the testcase it says to wait 3 minutes for replication to catch up, but like I said, making some random update to the db should trigger it as well.

Varun and I tested the same problem in an IPA environment and found that the issue is not yet fixed. We followed exactly the same steps as in the bug description. Additionally, we waited for 3 minutes and then created a few users on the Master and checked whether the same were created on the Replica.

Result: FAIL - it has no impact for the users which are restored.

Users on Master: UserC and UserD
Users on Replica: UserA, UserC and UserD (note: UserB got deleted after 3 mins).

Ok, I can reproduce your failure, and I can also reproduce the fix working.

Recap:

"replica" - we delete entry "D". This is also where we make our backup and do the restore.
"master" - we delete entry "C".

After the restore:

[1] If you make an update on "replica" immediately after the restore, this updates its maxcsn, and now "master" will not replay updates that came from "replica" back to "replica" if they are before the "new" maxcsn. So the delete of C is replicated, but not D.

[2] But if you make an update on "master" instead, it will push all the missing updates, as the "replica" maxcsn has not been updated and contains the original older maxcsn. So both C and D will be deleted.

I do not see a way to deal with scenario [1]. If you make an update to the restored replica before the other agreements can update it, then those changes are lost. I do not think it is possible to prevent this.

With the preferred process you would not be restoring a replica with old data. You should be making a backup ldif from an active "replica", and use that ldif to do the restore. When you do a restore, it is expected that it is the most recent version of the data.

Anyway, to show the fix is working, just make an update on "master" after doing the restore, and that will trigger the correct behavior.

Mark
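Not part of the original thread, but a quick way to see the maxCSN state discussed above: each 389-ds master stores its RUV in the replication tombstone entry under the replicated suffix, and the per-replica maxCSNs can be read from its nsds50ruv attribute. A minimal sketch, assuming the host, ports, and credentials from the M1/M2 steps above:

    # Hypothetical diagnostic (host/ports/credentials assumed from the steps above):
    # dump the RUV tombstone on each master so the per-replica maxCSNs can be
    # compared before and after the restore.
    for PORT in 1189 1289; do
        echo "== RUV on port $PORT =="
        /usr/lib64/mozldap/ldapsearch -h 10.65.206.72 -p $PORT \
            -D "cn=Directory Manager" -w Secret123 -b "dc=passsync,dc=com" \
            "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))" \
            nsds50ruv
    done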
(In reply to comment #7)

> Ok, I can reproduce your failure, and I can also reproduce the fix working.
>
> Recap:
>
> "replica" - we delete entry "D". This is also where we make our backup and
> do the restore.
>
> "master" - we delete entry "C".
>
> After the restore:
>
> [1] If you make an update on "replica" immediately after the restore, this
> updates its maxcsn, and now "master" will not replay updates that came from
> "replica" back to "replica" if they are before the "new" maxcsn. So the
> delete of C is replicated, but not D.

I tried creating a new user in the replica after restoring the ldif file. But it didn't help. Resulting entries in the Replica after 3 mins:

    [root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-find |grep -i "User login"
      User login: admin
      User login: newtest1
      User login: newtest3
      User login: newtest4
      User login: newtest5
      User login: varun2
      User login: varun3
      User login: varunmylaraiah

newtest2 got deleted - user creation is not syncing from Replica to Master. Added varun2 and varun3. Resulting entries in the Master after 3 mins:

    [root@hp-dl785g6-01 ~]# ipa user-find |grep -i "User login"
      User login: admin
      User login: newtest3
      User login: newtest4
      User login: newtest5
      User login: varunmylaraiah

> [2] But if you make an update on "master" instead, it will push all the
> missing updates, as the "replica" maxcsn has not been updated and contains
> the original older maxcsn. So both C and D will be deleted.
>
> I do not see a way to deal with scenario [1]. If you make an update to the
> restored replica before the other agreements can update it, then those
> changes are lost. I do not think it is possible to prevent this.
>
> With the preferred process you would not be restoring a replica with old data.
> You should be making a backup ldif from an active "replica", and use that
> ldif to do the restore. When you do a restore, it is expected that it is
> the most recent version of the data.
>
> Anyway, to show the fix is working, just make an update on "master" after
> doing the restore, and that will trigger the correct behavior.
>
> Mark
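For clarity, "make an update on master" means any write against the other master, not the server that was just restored. In the IPA test above the new user was created on the restored replica (hp-xw4600-01), which is scenario [1]; a hedged sketch of the intended trigger, run on the master (hp-dl785g6-01), with a placeholder user name that is not from the report:

    # Hypothetical trigger update: run on the MASTER (hp-dl785g6-01), not on the
    # restored replica, so the master opens a replication session and replays
    # the updates the restored replica is missing.
    ipa user-add triggeruser --first=Trigger --last=User

    # After a short wait, the user lists on master and replica should match.
    ipa user-find | grep -i "User login"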
Another test from the replica:

    [root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-find |grep -i "User login"
      User login: admin
      User login: varun3
      User login: varunmylaraiah
    [root@hp-xw4600-01 scripts-TESTRELM-COM]# ./ldif2db.pl -D "cn=Directory Manager" -w Secret123 -n userRoot -i /tmp/newtestbug830335.ldif
    adding new entry "cn=import_2012_12_6_9_35_5, cn=import, cn=tasks, cn=config"
    [root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-find |grep -i "User login"
      User login: admin
      User login: newtest12
      User login: varun1
      User login: varun2
      User login: varun3
      User login: varun4
      User login: varun5
      User login: varunmylaraiah
    [root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-add newtest18
    First name: newtest18
    Last name: newtest18
    ----------------------
    Added user "newtest18"
    ----------------------
      User login: newtest18
      First name: newtest18
      Last name: newtest18
      Full name: newtest18 newtest18
      Display name: newtest18 newtest18
      Initials: nn
      Home directory: /home/newtest18
      GECOS field: newtest18 newtest18
      Login shell: /bin/sh
      Kerberos principal: newtest18
      Email address: newtest18
      UID: 615900010
      GID: 615900010
      Password: False
      Kerberos keys available: False
    [root@hp-xw4600-01 scripts-TESTRELM-COM]# ipa user-find |grep -i "User login"
      User login: admin
      User login: newtest12
      User login: newtest18
      User login: varun1
      User login: varun2
      User login: varun3
      User login: varunmylaraiah

Users at the Master:

    [root@hp-dl785g6-01 ~]# ipa user-find |grep -i "User login"
      User login: admin
      User login: newtest18
      User login: varun3
      User login: varunmylaraiah

The outcome is very inconsistent and leads to a lot of confusion. I am not sure what the expected behaviour is. My gut feeling is that this issue must be documented for sure, if not fixed.

Ok, you aren't following the correct procedure...

Let me reword it:

We have Master A and Master B.

Add 4 users to master A (user1, user2, user3, user4)

Take a backup from master A (db2ldif -r)

Delete user4 from master A

Delete user3 from master B

Restore the ldif on Master A (ldif2db)

Make any kind of update on master B (not master A!). If you update master A things will break - there is no way to fix this.

Wait 5 seconds at the most.

Now master A has the same entries as master B.

(In reply to comment #10)
> Ok, you aren't following the correct procedure...
>
> Let me reword it:
>
> We have Master A and Master B.
>
> Add 4 users to master A (user1, user2, user3, user4)
>
> Take a backup from master A (db2ldif -r)
>
> Delete user4 from master A
>
> Delete user3 from master B
>
> Restore the ldif on Master A (ldif2db)
>
> Make any kind of update on master B (not master A!).

After restoring, I created/deleted a few entries in Master B. But no luck this time either. I waited for more than 5 minutes and Master A and Master B ended up having different sets of entries.

> If you update master A
> things will break - there is no way to fix this.
>
> Wait 5 seconds at the most.
>
> Now master A has the same entries as master B.

Adding an automated test in the fourwaymmr.sh test suite, "bug830335". Please apply fixes directly to this test case if the test is not written correctly. I will re-assign the bug if the test case fails on the automated test execution.

Hi Mark, can you apply fixes to the scripts (for the bug830335 test in fourwaymmr) if it is not written correctly?

I checked in a possible fix. I am unable to get tet running (again), so I cannot verify my fix. You were missing the "-r" option for the db2ldif command. This is a necessary step to verify the fix.

I have checked this change into the trunk and rhel64 tet branches. I did not see anything else wrong with your testcase. Can you run the test to see if it is passing now?

Thanks,
Mark
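A consolidated sketch of the procedure Mark rewords in comment #10, expressed with the same tools used in the original M1/M2 steps. It assumes the four bug830335users* entries from the earlier steps already exist; the host, ports, suffix, backend name, and backup file name are carried over from that example as assumptions, not values from the fourwaymmr test suite itself:

    # Sketch only - master A = M1 on port 1189, master B = M2 on port 1289,
    # both on 10.65.206.72; all names/paths assumed from the earlier steps.

    # Take a backup from master A, including replication data ("-r" is required,
    # otherwise the replication state is not exported and the fix cannot be verified).
    /usr/lib64/dirsrv/slapd-M1/db2ldif.pl -r -D "cn=Directory Manager" -w Secret123 \
        -n passsync1189 -a /tmp/bug830335-backup.ldif

    # Delete one user on master A and a different user on master B.
    /usr/lib64/mozldap/ldapdelete -h 10.65.206.72 -p 1189 -D "cn=Directory Manager" \
        -w Secret123 "uid=bug830335users4,dc=passsync,dc=com"
    /usr/lib64/mozldap/ldapdelete -h 10.65.206.72 -p 1289 -D "cn=Directory Manager" \
        -w Secret123 "uid=bug830335users3,dc=passsync,dc=com"

    # Restore the backup on master A.
    /usr/lib64/dirsrv/slapd-M1/ldif2db.pl -D "cn=Directory Manager" -w Secret123 \
        -n passsync1189 -i /tmp/bug830335-backup.ldif

    # Make any update on master B (not master A!) so it pushes the missing
    # updates back to the restored master A.
    {
      echo "dn: uid=bug830335users1,dc=passsync,dc=com"
      echo "changetype: modify"
      echo "replace: description"
      echo "description: trigger replication after restoring master A"
    } | /usr/lib64/mozldap/ldapmodify -h 10.65.206.72 -p 1289 \
          -D "cn=Directory Manager" -w Secret123

    # Wait a few seconds, then both masters should return the same entries.
    sleep 5
    for PORT in 1189 1289; do
        echo "== entries on port $PORT =="
        /usr/lib64/mozldap/ldapsearch -h 10.65.206.72 -p $PORT -D "cn=Directory Manager" \
            -w Secret123 -b "dc=passsync,dc=com" "(uid=bug830335users*)" dn
    done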
mmrepl fourwaymmr run 100% (23/23). Hence marking the bug as verified.

    User 1 added successfully to ou=People,dc=example,dc=com , return code, 0
    Sleeping for 3 mins to sync M1 and M2
    Running ldapsearch for ou=People,dc=example,dc=com to find out how many entries present after restoring M1, updating M2 and sleep for 180 secs
    Entries after restoring M1, Updating M2 and sleeping for 3 mins: Users on M1-14 , Users on M2-14
    Step3: Users at M1 and M2 are in sync, Users on M1-14 , Users on M2-14, test successful
    After successful restoring of M1 TestCase [bug830335] result-> [PASS]

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0503.html