Bug 1342382
| Summary: | Running db2index with no options breaks replication | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jan Kurik <jkurik> |
| Component: | 389-ds-base | Assignee: | Noriko Hosoi <nhosoi> |
| Status: | CLOSED ERRATA | QA Contact: | Viktor Ashirov <vashirov> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.0 | CC: | batkisso, ekeck, hgraham, mreynolds, nhosoi, nkinder, pbokoc, rmeggins, sramling |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | 389-ds-base-1.2.11.15-75.el6_8 | Doc Type: | Bug Fix |
| Doc Text: | When running the db2index script with no options, the script failed to handle on-disk RUV entries because these entries have no parent entries. The existing RUV was skipped and a new one was generated instead, which subsequently caused the next replication to fail due to an ID mismatch. This update fixes handling of RUV entries in db2index, and running this script without specifying any options no longer causes replication failures. | Story Points: | --- |
| Clone Of: | 1150817 | Environment: | |
| Last Closed: | 2016-07-12 18:34:43 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1150817 | | |
| Bug Blocks: | | | |
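Background on the Doc Text above (an illustrative note, not part of the original report): in 389 Directory Server the on-disk RUV is kept as a tombstone entry stored directly under the replicated suffix, which is why it has no parent entry. It can be inspected with an ldapsearch along the following lines; the port, bind credentials, and suffix are taken from the reproduction steps below and may need adjusting:

ldapsearch -x -p 1189 -h localhost -D "cn=Directory Manager" -w Secret123 \
    -b "dc=passsync,dc=com" \
    "(&(objectclass=nstombstone)(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff))" nsds50ruv

Comparing the nsds50ruv values across the masters before and after the reindex is one way to observe the mismatch between the regenerated RUV and the changelog that the Doc Text describes.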
Description
Jan Kurik
2016-06-03 06:58:08 UTC
1). Verify that replication is working across the 4 masters
[root@vm-idm-004 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "ou=people,dc=passsync,dc=com" |grep -i "dn: uid=" |wc -l; done
99
99
99
99
2). Stop directory server instances
service dirsrv stop
3). Run db2index, with no options, from each instance directory (a note on the script's options follows step 6).
/usr/lib64/dirsrv/slapd-M1/db2index
/usr/lib64/dirsrv/slapd-M2/db2index
/usr/lib64/dirsrv/slapd-M3/db2index
/usr/lib64/dirsrv/slapd-M4/db2index
4). Start the directory server instances
service dirsrv start
5). Add a few entries to all masters.
[root@vm-idm-004 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289"`; do ./AddEntry.sh Users $PORT "ou=people,dc=passsync,dc=com" usr${PORT}nnnn 9 localhost ; done
6). Check the number of entries on all masters
[root@vm-idm-004 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "ou=people,dc=passsync,dc=com" |grep -i "dn: uid=" |wc -l; done
135
135
135
135
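A note on step 3 (not part of the original report): the failure described in this bug is specific to running db2index with no arguments, the code path that regenerated the RUV as described in the Doc Text. For comparison, a run scoped to a single backend and attribute would look roughly like the sketch below; the backend name userRoot and the exact option syntax are assumptions and should be checked against the usage output of the installed script:

/usr/lib64/dirsrv/slapd-M1/db2index -n userRoot -t uid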
There are no issues with replication. However, I see these errors in M1's error log:
[05/Jul/2016:11:38:10 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=passsync,dc=com is coming online; enabling replication
[05/Jul/2016:11:38:10 +051800] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=passsync,dc=com does not match the data in the changelog.
Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[05/Jul/2016:11:40:37 +051800] NSMMReplicationPlugin - changelog program - agmt="cn=1189_to_1626_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1626): CSN 577b4e16001008a50000 not found, we aren't as up to date, or we purged
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - changelog program - agmt="cn=1189_to_2616_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2616): CSN 577b4e16001008a50000 not found, we aren't as up to date, or we purged
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - agmt="cn=1189_to_2616_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2616): Data required to update replica has been purged. The replica must be reinitialized.
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - changelog program - agmt="cn=1189_to_1389_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1389): CSN 577b4e16001008a50000 not found, we aren't as up to date, or we purged
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - agmt="cn=1189_to_1389_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1389): Data required to update replica has been purged. The replica must be reinitialized.
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - changelog program - agmt="cn=1189_to_1489_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1489): CSN 577b4e16001008a50000 not found, we aren't as up to date, or we purged
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - agmt="cn=1189_to_1489_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1489): Data required to update replica has been purged. The replica must be reinitialized.
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - agmt="cn=1189_to_2616_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2616): Incremental update failed and requires administrator action
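One way to confirm what these messages imply (an illustrative check, not part of the original report) is to read the status attributes of the replication agreements on M1; the exact set of status attributes can vary between 389-ds-base versions:

ldapsearch -x -p 1189 -h localhost -D "cn=Directory Manager" -w Secret123 \
    -b "cn=config" "(objectclass=nsds5replicationagreement)" \
    nsds5replicaLastUpdateStatus nsds5replicaLastUpdateStart nsds5replicaLastUpdateEnd

An agreement whose last update status reports that the consumer must be reinitialized matches the "Data required to update replica has been purged" messages above.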
I also see these errors in M2's error log:
[05/Jul/2016:11:38:05 +051800] - resizing db cache size: 3301138432 -> 8000000
[05/Jul/2016:11:38:06 +051800] slapi_ldap_bind - Error: could not send bind request for id [cn=SyncManager,cn=config] mech [SIMPLE]: error -1 (Can't contact LDAP server) -5987 (Invalid function argument.) 107 (Transport endpoint is not connected)
[05/Jul/2016:11:38:06 +051800] NSMMReplicationPlugin - agmt="cn=2189_to_2626_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2626): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
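The "Can't contact LDAP server" failure above suggests that the peer on port 2626 was simply not reachable at that moment. A quick illustrative check (assuming anonymous reads of the root DSE are allowed; otherwise bind as Directory Manager) is:

ldapsearch -x -h vm-idm-004 -p 2626 -s base -b "" "(objectclass=*)" vendorVersion

If this also fails, the problem is connectivity or instance startup order rather than the replication plugin itself.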
Though there are a few error messages, as stated in comment #5, after steps 3 and 4, the replication seems to be working fine. I tried deleting/adding a few entries from M1 and it all worked fine.

[root@vm-idm-008 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289 1389 1489"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "ou=people,dc=passsync,dc=com" |grep -i "dn: uid=" |wc -l; done
0 0 0 0
[root@vm-idm-008 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289"`; do ./AddEntry.sh Users $PORT "ou=people,dc=passsync,dc=com" oou${PORT}neen 9 localhost ; done
No of entries added will be 9
[root@vm-idm-008 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289 1389 1489"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "ou=people,dc=passsync,dc=com" |grep -i "dn: uid=" |wc -l; done
36 36 36 36

The bug, which is about running db2index and checking whether replication breaks, seems to be fixed. Hence, marking the bug as Verified.

Platforms tested: i386 and x86_64

Packages tested:
[root@vm-idm-008 MMR_WINSYNC]# rpm -qa |grep -i 389-ds
389-ds-base-devel-1.2.11.15-75.el6_8.i686
389-ds-base-1.2.11.15-75.el6_8.i686
389-ds-base-debuginfo-1.2.11.15-75.el6_8.i686
389-ds-base-libs-1.2.11.15-75.el6_8.i686
[root@vm-idm-004 MMR_WINSYNC]# rpm -qa |grep -i 389-ds
389-ds-base-devel-1.2.11.15-75.el6_8.x86_64
389-ds-base-1.2.11.15-75.el6_8.x86_64
389-ds-base-debuginfo-1.2.11.15-75.el6_8.x86_64
389-ds-base-libs-1.2.11.15-75.el6_8.x86_64

Hi Sankar,

Something looks wrong to me... "Can't contact LDAP server" does not look right. The error "Invalid function argument" also looks odd to me... Do you have any idea why you got this? And was it recovered, or did it persist in your test? What does this agreement "cn=2189_to_2626_on_vm-idm-004.lab.eng.pnq.redhat.com" look like?

> I also see these errors from M2
> [05/Jul/2016:11:38:05 +051800] - resizing db cache size: 3301138432 -> 8000000
> [05/Jul/2016:11:38:06 +051800] slapi_ldap_bind - Error: could not send bind request for id [cn=SyncManager,cn=config] mech [SIMPLE]: error -1 (Can't contact LDAP server) -5987 (Invalid function argument.) 107 (Transport endpoint is not connected)
> [05/Jul/2016:11:38:06 +051800] NSMMReplicationPlugin - agmt="cn=2189_to_2626_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2626): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
> Though there are few error messages as stated in comment #5 after step 3 & 4, the replication seems to be working fine. I tried deleting/adding few entries from M1 and it worked all fine.

Is the replication topology a mesh style? If so, when one connection is broken, other paths could take over and the following modifications could still be replicated fine. But that does not mean the replication is healthy... Could you please rerun the test and gather more info? Thanks.
--noriko

Probably it's too late for 6.8.z. This bug fix is also in rhel-7.3 [1]. I'm updating the bug with the comments for more testing.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1340307

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1404