
Bug 1342382

Summary: Running db2index with no options breaks replication
Product: Red Hat Enterprise Linux 6
Reporter: Jan Kurik <jkurik>
Component: 389-ds-base
Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA
QA Contact: Viktor Ashirov <vashirov>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.0
CC: batkisso, ekeck, hgraham, mreynolds, nhosoi, nkinder, pbokoc, rmeggins, sramling
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: 389-ds-base-1.2.11.15-75.el6_8
Doc Type: Bug Fix
Doc Text:
When running the db2index script with no options, the script failed to handle on-disk RUV entries because these entries have no parent entries. The existing RUV was skipped and a new one was generated instead, which subsequently caused the next replication to fail due to an ID mismatch. This update fixes handling of RUV entries in db2index, and running this script without specifying any options no longer causes replication failures.
Story Points: ---
Clone Of: 1150817
Environment:
Last Closed: 2016-07-12 18:34:43 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1150817
Bug Blocks:

Description Jan Kurik 2016-06-03 06:58:08 UTC
This bug has been copied from bug #1150817 and has been proposed
to be backported to 6.8 z-stream (EUS).

Comment 5 Sankar Ramalingam 2016-07-05 06:22:40 UTC
1). Replication is working with 4 masters
[root@vm-idm-004 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "ou=people,dc=passsync,dc=com" |grep -i "dn: uid=" |wc -l; done
99
99
99
99

2). Stop directory server instances
service dirsrv stop

3). Run db2index from each instance directory.
/usr/lib64/dirsrv/slapd-M1/db2index
/usr/lib64/dirsrv/slapd-M2/db2index
/usr/lib64/dirsrv/slapd-M3/db2index
/usr/lib64/dirsrv/slapd-M4/db2index
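The four per-instance invocations above can be collapsed into a loop. A minimal sketch, assuming instances named M1 through M4; it only prints each command so it can run anywhere (drop the echo to actually execute db2index on a host where these instance directories exist, with the server stopped first):

```shell
# Collect and print the db2index command for each instance.
# Remove the "echo" to actually run the reindex.
CMDS=""
for INST in M1 M2 M3 M4; do
    CMD="/usr/lib64/dirsrv/slapd-$INST/db2index"
    echo "$CMD"
    CMDS="$CMDS $CMD"
done
```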

4). Start the directory server instances
service dirsrv start

5). Added a few entries to all masters.
[root@vm-idm-004 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289"`; do ./AddEntry.sh Users $PORT "ou=people,dc=passsync,dc=com" usr${PORT}nnnn 9 localhost ; done

6). Check how many entries on all masters
[root@vm-idm-004 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "ou=people,dc=passsync,dc=com" |grep -i "dn: uid=" |wc -l; done
135
135
135
135
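The entry counts in steps 1 and 6 are compared by eye; the same convergence check can be scripted. A sketch with the counts hard-coded for illustration (on a real host, COUNTS would be filled from the ldapsearch loop above):

```shell
# All masters should report the same entry count after replication settles.
COUNTS="135 135 135 135"
FIRST=""
IN_SYNC=yes
for C in $COUNTS; do
    [ -z "$FIRST" ] && FIRST=$C    # remember the first count seen
    [ "$C" = "$FIRST" ] || IN_SYNC=no
done
echo "replicas in sync: $IN_SYNC"
```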

No issues with replication. However, I see these errors in M1's error log.

[05/Jul/2016:11:38:10 +051800] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=passsync,dc=com is coming online; enabling replication
[05/Jul/2016:11:38:10 +051800] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=passsync,dc=com does not match the data in the changelog. Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[05/Jul/2016:11:40:37 +051800] NSMMReplicationPlugin - changelog program - agmt="cn=1189_to_1626_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1626): CSN 577b4e16001008a50000 not found, we aren't as up to date, or we purged
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - changelog program - agmt="cn=1189_to_2616_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2616): CSN 577b4e16001008a50000 not found, we aren't as up to date, or we purged
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - agmt="cn=1189_to_2616_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2616): Data required to update replica has been purged. The replica must be reinitialized.
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - changelog program - agmt="cn=1189_to_1389_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1389): CSN 577b4e16001008a50000 not found, we aren't as up to date, or we purged
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - agmt="cn=1189_to_1389_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1389): Data required to update replica has been purged. The replica must be reinitialized.
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - changelog program - agmt="cn=1189_to_1489_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1489): CSN 577b4e16001008a50000 not found, we aren't as up to date, or we purged
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - agmt="cn=1189_to_1489_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:1489): Data required to update replica has been purged. The replica must be reinitialized.
[05/Jul/2016:11:40:39 +051800] NSMMReplicationPlugin - agmt="cn=1189_to_2616_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2616): Incremental update failed and requires administrator action
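The repeated "CSN 577b4e16001008a50000 not found" lines can be roughly dated from the CSN itself: in 389-ds a CSN is, to my understanding, 8 hex digits of Unix timestamp followed by hex sequence-number, replica-ID, and sub-sequence fields. A sketch decoding the timestamp from the CSN above (the field layout is an assumption drawn from upstream conventions, not from this bug):

```shell
# Extract the leading 8 hex digits of the CSN and convert them to a
# Unix timestamp (field layout assumed: time, seqnum, replica id, subseq).
CSN=577b4e16001008a50000
TS_HEX=$(echo "$CSN" | cut -c1-8)
TS=$((16#$TS_HEX))
echo "CSN timestamp: $TS"   # falls in early July 2016, matching the log dates
```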

I also see these errors from M2

[05/Jul/2016:11:38:05 +051800] - resizing db cache size: 3301138432 -> 8000000
[05/Jul/2016:11:38:06 +051800] slapi_ldap_bind - Error: could not send bind request for id [cn=SyncManager,cn=config] mech [SIMPLE]: error -1 (Can't contact LDAP server) -5987 (Invalid function argument.) 107 (Transport endpoint is not connected)
[05/Jul/2016:11:38:06 +051800] NSMMReplicationPlugin - agmt="cn=2189_to_2626_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2626): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ((null))

Comment 6 Sankar Ramalingam 2016-07-05 09:02:41 UTC
Though there are a few error messages after steps 3 and 4, as stated in comment #5, replication seems to be working fine. I tried deleting/adding a few entries from M1 and it all worked fine.

[root@vm-idm-008 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289 1389 1489"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "ou=people,dc=passsync,dc=com" |grep -i "dn: uid=" |wc -l; done
0
0
0
0

[root@vm-idm-008 MMR_WINSYNC]#  for PORT in `echo "1189 1289 2189 2289"`; do ./AddEntry.sh Users $PORT "ou=people,dc=passsync,dc=com" oou${PORT}neen 9 localhost ; done
No of entries added will be 9

[root@vm-idm-008 MMR_WINSYNC]# for PORT in `echo "1189 1289 2189 2289 1389 1489"`; do ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b "ou=people,dc=passsync,dc=com" |grep -i "dn: uid=" |wc -l; done
36
36
36
36


The bug, which is about running db2index and checking whether replication breaks, seems to be fixed. Hence, marking the bug as Verified.

Platforms tested: i386 and x86_64

Packages tested:
[root@vm-idm-008 MMR_WINSYNC]# rpm -qa |grep -i 389-ds
389-ds-base-devel-1.2.11.15-75.el6_8.i686
389-ds-base-1.2.11.15-75.el6_8.i686
389-ds-base-debuginfo-1.2.11.15-75.el6_8.i686
389-ds-base-libs-1.2.11.15-75.el6_8.i686

[root@vm-idm-004 MMR_WINSYNC]# rpm -qa |grep -i 389-ds
389-ds-base-devel-1.2.11.15-75.el6_8.x86_64
389-ds-base-1.2.11.15-75.el6_8.x86_64
389-ds-base-debuginfo-1.2.11.15-75.el6_8.x86_64
389-ds-base-libs-1.2.11.15-75.el6_8.x86_64

Comment 7 Noriko Hosoi 2016-07-06 20:16:13 UTC
Hi Sankar,

Something looks wrong to me... "Can't contact LDAP server" does not look right, and the error "Invalid function argument" looks odd to me. Do you have any idea why you got this? And was it recovered, or did it persist throughout your test? What does the agreement "cn=2189_to_2626_on_vm-idm-004.lab.eng.pnq.redhat.com" look like?

> I also see these errors from M2
> [05/Jul/2016:11:38:05 +051800] - resizing db cache size: 3301138432 -> 8000000
> [05/Jul/2016:11:38:06 +051800] slapi_ldap_bind - Error: could not send bind request for id [cn=SyncManager,cn=config] mech [SIMPLE]: error -1 (Can't contact LDAP server) -5987 (Invalid function argument.) 107 (Transport endpoint is not connected)
> [05/Jul/2016:11:38:06 +051800] NSMMReplicationPlugin - agmt="cn=2189_to_2626_on_vm-idm-004.lab.eng.pnq.redhat.com" (vm-idm-004:2626): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ((null))

> Though there are a few error messages after steps 3 and 4, as stated in comment #5, replication seems to be working fine. I tried deleting/adding a few entries from M1 and it all worked fine.

Is the replication topology mesh-style? If so, when one connection is broken, other paths could take over and subsequent modifications could still be replicated fine. But that does not mean the replication is healthy...

Could you please rerun the test and gather more info?
Thanks.
--noriko

Comment 8 Noriko Hosoi 2016-07-06 20:30:07 UTC
Probably, it's too late for 6.8.z.  This bug fix is also in rhel-7.3 [1].

I'm updating the bug with the comments for more testing.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1340307

Comment 10 errata-xmlrpc 2016-07-12 18:34:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1404