Bug 1309963
| Summary: | keep alive entries can break replication | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jan Kurik <jkurik> |
| Component: | 389-ds-base | Assignee: | Noriko Hosoi <nhosoi> |
| Status: | CLOSED ERRATA | QA Contact: | Viktor Ashirov <vashirov> |
| Severity: | urgent | Priority: | urgent |
| Version: | 7.3 | CC: | ekeck, mkolaja, msauton, nhosoi, nkinder, rmeggins, sramling, tbordaz |
| Target Milestone: | rc | Keywords: | ZStream |
| Hardware: | All | OS: | Linux |
| Fixed In Version: | 389-ds-base-1.3.4.0-27.el7 | Doc Type: | Bug Fix |
| Clone Of: | 1307151 | Bug Depends On: | 1307151 |
| Last Closed: | 2016-03-31 22:04:39 UTC | | |
| Doc Text: | Previously, a keep alive entry was being created at too many opportunities during replication, potentially causing a race condition when adding the entry to the replica changelog and resulting in operations being dropped from the replication. With this update, unnecessary keep alive entry creation has been eliminated, and missing replication no longer occurs. | | |
Description
Jan Kurik
2016-02-19 06:05:08 UTC
Hi Noriko, to verify this bug I understand we need to create replication with a "cn=repl keep alive" entry under the suffix. However, it is not clear enough what the setup should be, so I request you to provide the steps to reproduce this.

Hi Thierry, you put a reproducer in this comment, which requires IPA: https://fedorahosted.org/389/ticket/48445#comment:1 Could there be easy steps that duplicate this scenario? It seems a total update triggered this bug. Is it good enough to just repeat the total updates and check that no data loss occurs, or is it safer to run the IPA install and check?

Hi Noriko,
The bug silently cleared (from the changelog) the ADD of the replica keep alive entry.
This ADD occurred (on the consumer side) at the end of the total init.
It was not detected until the replica needed to update the keep alive entry; those updates hit err=32 when they were replicated.
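For reference, a rough sketch of what such a keep alive entry looks like, using the replica ID and suffix that appear in the verification steps below; the exact attribute set is an assumption and can vary by 389-ds-base version:

```
# Hypothetical example of a replica keep alive entry (attribute set assumed,
# not taken verbatim from this bug's environment).
dn: cn=repl keep alive 2211,dc=passsync,dc=com
objectClass: top
objectClass: ldapsubentry
objectClass: extensibleObject
cn: repl keep alive 2211
# The server periodically updates a timestamp-style attribute on this entry;
# it is that MOD (and the initial ADD) that must replicate without err=32.
keepalivetimestamp: 20160301120000Z
```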
So the reproducer is:
- Create a Master and a Replica.
- Configure the Master with fractional replication so that it skips sending updates to 'description' (see the sketch after this list).
- Do a total init of the Replica
  (with the bug fix, this does not create the keep alive entry).
- On the Master, loop doing more than 100 updates of the 'description' attribute (skipped by the replication agreement).
- Check that the keep alive entry (ADD) is created and sent to the Replica.
- Check that the update of the keep alive entry is replicated.
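For the fractional-replication and total-init steps above, a minimal sketch of the corresponding commands, assuming a replication agreement named cn=meToReplica under the mapping tree entry for the suffix (the agreement name, host, port, and password here are illustrative, not taken from this bug):

```
# Exclude 'description' from the replication agreement (fractional replication).
ldapmodify -x -p 1189 -h localhost -D "cn=Directory Manager" -w Secret123 << 'EOF'
dn: cn=meToReplica,cn=replica,cn="dc=passsync,dc=com",cn=mapping tree,cn=config
changetype: modify
replace: nsDS5ReplicatedAttributeList
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE description
EOF

# Trigger a total init of the consumer over the same agreement.
ldapmodify -x -p 1189 -h localhost -D "cn=Directory Manager" -w Secret123 << 'EOF'
dn: cn=meToReplica,cn=replica,cn="dc=passsync,dc=com",cn=mapping tree,cn=config
changetype: modify
replace: nsDS5BeginReplicaRefresh
nsDS5BeginReplicaRefresh: start
EOF
```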
1. Created a 6-master replication setup.
2. Created a few users and groups on M1.
3. Total init completed from M1. No fractional replication configured.
4. Configured fractional replication for the telephoneNumber attribute.
5. Ran a total init to check whether telephoneNumber is removed from the masters and consumer other than M1.
6. It's successful: the telephoneNumber attribute is removed on all masters but M1.
7. Ran about 100 modify operations for the telephoneNumber attribute on M1:

```
[root@vm-idm-004 ~]# no=1 ; while [ $no -lt 101 ]; do ldapmodify -x -p 1189 -h localhost -D "cn=Directory Manager" -w Secret123 << EOF > /dev/null
dn: uid=users1189users7,ou=People,dc=passsync,dc=com
replace: telephoneNumber
telephoneNumber: 999999$no
EOF
sleep 0.3; no=`expr $no + 1`; done
```

8. Checked whether telephoneNumber attribute values exist on the other masters:

```
[root@vm-idm-004 ~]# for PORT in `echo "1189 1289 1389 1489 2189 2289"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "uid=users1189users7,ou=People,dc=passsync,dc=com" |grep -i tele ; RC=$? ; if [ $RC -eq 0 ]; then echo "telePhoneNumber fractional attribute only on PORT-$PORT"; fi ; done
telephoneNumber: 999999100
telePhoneNumber fractional attribute only on PORT-1189
```

9. Checked the "cn=repl keep alive" entry on all servers:

```
[root@vm-idm-004 ~]# grep -li "cn=repl keep" /var/log/dirsrv/slapd-*/errors
/var/log/dirsrv/slapd-M1/errors
/var/log/dirsrv/slapd-M4/errors
/var/log/dirsrv/slapd-M5/errors
```

I will redo the testing with the fractional replication set up from the start and update the bug with my findings.

This time on a fresh setup:

1. Created a 6-master replication setup with fractional replication for the description and telephoneNumber attributes.
2. Created a few users and groups on M1.
3. Total init completed from M1.
4. Checked whether "cn=repl keep alive $replica_id,dc=passsync,dc=com" is created.
5. "cn=repl keep alive 2211,dc=passsync,dc=com" is created on all masters:

```
for PORT in `echo "1189 1289 1389 1489 2189 2289 3189 3289"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "cn=repl keep alive 2211,dc=passsync,dc=com" -s base |grep -i "dn: cn=repl keep alive 2211" ; RC=$? ; if [ $RC -eq 0 ]; then echo "cn=repl keep alive entry created on PORT-$PORT"; fi ; done
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-1189
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-1289
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-1389
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-1489
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-2189
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-2289
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-3189
dn: cn=repl keep alive 2211,dc=passsync,dc=com
cn=repl keep alive entry created on PORT-3289
```

Note: I haven't yet run ldapmodify for the fractional attributes on M1. As per your comment, the entry "cn=repl keep alive 2211,dc=passsync,dc=com" should be created only after the ldapmodify operations. I would appreciate it if you could give us more instructions to proceed with the bug verification. Thanks.

Hi Sankar, the keep alive entry is created in two cases:
* If a supplier does a total init of a consumer, it first checks that its own keep alive entry exists.
* If any instance needs to update its keep alive entry (for example, when it has skipped more than 100 updates), it first creates the keep alive entry if it does not already exist.

In your case, the keep alive entry of M1 is created (although no updates were done on M1) because M1 did a total update of the other replicas.

thanks, Thierry

Thanks Thierry for the answers :). So, as per your comment #10, the feature seems to be working. Anyway, I modified fractional replication attributes more than 100 times and checked the "cn=repl keep alive" entry and the replica states. Everything seems to be working fine. Hence, marking the bug as Verified.

```
[root@vm-idm-004 ~]# rpm -qa |grep -i 389-ds
389-ds-base-debuginfo-1.3.4.0-27.el7_2.x86_64
389-ds-base-libs-1.3.4.0-27.el7_2.x86_64
389-ds-base-devel-1.3.4.0-27.el7_2.x86_64
389-ds-base-1.3.4.0-27.el7_2.x86_64
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0550.html