Bug 1340307 - Running db2index with no options breaks replication
Summary: Running db2index with no options breaks replication
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base   
(Show other bugs)
Version: 7.3
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Noriko Hosoi
QA Contact: Viktor Ashirov
Petr Bokoc
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-05-27 01:21 UTC by Noriko Hosoi
Modified: 2016-11-03 20:42 UTC (History)
4 users (show)

Fixed In Version: 389-ds-base-1.3.5.5-1.el7
Doc Type: Bug Fix
Doc Text:
Running *db2index* with no options no longer causes replication failures When running the *db2index* script with no options, the script failed to handle on-disk Replica Update Vector (RUV) entries because these entries have no parent entries. The existing RUV was skipped and a new one was generated instead, which subsequently caused the next replication to fail due to an ID mismatch. This update fixes handling of RUV entries in *db2index*, and running this script without specifying any options no longer causes replication failures.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 20:42:14 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:2594 normal SHIPPED_LIVE Moderate: 389-ds-base security, bug fix, and enhancement update 2016-11-03 12:11:08 UTC

Description Noriko Hosoi 2016-05-27 01:21:09 UTC
{{{
Description of problem:

It was found that running db2index with no command line arguments results in
breaking replication by changing the slave's RUV tombstone.

Version-Release number of selected component (if applicable):
389-ds-base-1.2.11.15-32.el6_5.x86_64 and
389-ds-base-1.2.11.15-32.1.el6_5.bug1009122.x86_64

How reproducible:

This happened on 10+ slaves in our production environment, however, it did not
occur in pre-prod environments.  In production, I noticed some DB errors, which
may be related:

[08/Oct/2014:14:47:55 -0400] upgrade DB - userRoot: Start upgradedb.
[08/Oct/2014:14:47:55 -0400] - WARNING: Import is running with
nsslapd-db-private-import-mem on; No other process is allowed to access the
database
[08/Oct/2014:14:47:55 -0400] - reindex userRoot: Index buffering enabled with
bucket size 100
[08/Oct/2014:14:47:56 -0400] entryrdn-index - entryrdn_lookup_dn: Failed to
position cursor at the key: P2992: DB_PAGE_NOTFOUND: Requested page not
found(-30986)
[08/Oct/2014:14:48:12 -0400] - reindex userRoot: WARNING: Skipping entry
"nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff" which has no parent, ending at
line 119705 of file "id2entry.db4"
[08/Oct/2014:14:48:12 -0400] - reindex userRoot: WARNING: bad entry: ID 119705
[08/Oct/2014:14:48:14 -0400] - reindex userRoot: Workers finished; cleaning
up...
[08/Oct/2014:14:48:14 -0400] - reindex userRoot: Workers cleaned up.
[08/Oct/2014:14:48:14 -0400] - reindex userRoot: Cleaning up producer thread...
[08/Oct/2014:14:48:14 -0400] - reindex userRoot: Indexing complete.
Post-processing...
[08/Oct/2014:14:48:14 -0400] - reindex userRoot: Generating numSubordinates
complete.
[08/Oct/2014:14:48:14 -0400] - reindex userRoot: Flushing caches...
[08/Oct/2014:14:48:14 -0400] - reindex userRoot: Closing files...
[08/Oct/2014:14:48:16 -0400] - All database threads now stopped
[08/Oct/2014:14:48:16 -0400] - reindex userRoot: Reindexing complete.
Processed 126506 entries (1 were skipped) in 21 seconds. (6024.10 entries/sec)
[08/Oct/2014:14:48:16 -0400] - All database threads now stopped

I also noticed the tool stating 'upgrade DB - userRoot: Start upgradedb.', that
next should probably be changed to 'Start indexing'.


Steps to Reproduce:
1. service dirsrv stop
2. /usr/lib64/dirsrv/slapd-*/db2index
3. service dirsrv start

Actual results:

Replication fails with:

[08/Oct/2014:16:09:09 -0400] NSMMReplicationPlugin -
agmt="cn=FQDM" (ldap02:636): Replica has a different
generation ID than the local data. on the master


Expected results:

Replication to resume once dirsrv is started.


Additional info:
Master:
$ ldapsearch -xLLL -h FQDN -D "uid=manager" -W -b dc=test,dc=com
'(&(objectclass=nstombstone)(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff))'
Enter LDAP Password:
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=test,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 51154ed3003d00010000
nsds50ruv: {replica 1 ldap://FQDN:389} 51154ed3003e00010000 56edd86c000000010000
nsds50ruv: {replica 2 ldap://FQDN:389} 51159de3000000020000 56ed65d1000200020000
dc: test
nsruvReplicaLastModified: {replica 1 ldap://FQDN:389} 5435b09b
nsruvReplicaLastModified: {replica 2 ldap://FQDN:389} 00000000

Slave:
$ ldapsearch -xLLL -h FQDN -D "uid=manager" -W -b dc=test,dc=com
'(&(objectclass=nstombstone)(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff))'
Enter LDAP Password:
dn: nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=test,dc=com
objectClass: top
objectClass: nsTombstone
objectClass: extensibleobject
nsds50ruv: {replicageneration} 56ed21150001ffff0000
nsds50ruv: {replica 2 ldap://FQDN:389}
nsds50ruv: {replica 1 ldap://FQDN:389}
dc: test
nsruvReplicaLastModified: {replica 2 ldap://FQDN:389} 00000000
nsruvReplicaLastModified: {replica 1 ldap://FQDN:389} 00000000

Reinitialzing the consume restored replication.
}}}

Comment 3 Sankar Ramalingam 2016-08-11 07:54:16 UTC
1).
[root@ratangad MMR_WINSYNC]# rpm -qa |grep -i 389-ds-base
389-ds-base-debuginfo-1.3.5.10-6.el7.x86_64
389-ds-base-1.3.5.10-7.el7.x86_64
389-ds-base-devel-1.3.5.10-7.el7.x86_64
389-ds-base-libs-1.3.5.10-7.el7.x86_64

2).
[root@ratangad MMR_WINSYNC]# PORT=1389; SUFF="dc=passsync,dc=com"; for PORT in `echo "1189 1289 1389 14089"`; do /usr/bin/ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b $SUFF |grep -i dn: | wc -l ; done
67
67
67
67

3).
[root@ratangad MMR_WINSYNC]# /bin/systemctl status dirsrv.target |grep -i active    Active: active since Thu 2016-08-11 12:08:33 IST; 16s ago
[root@ratangad MMR_WINSYNC]#  /bin/systemctl stop  dirsrv.target

4).
[root@ratangad MMR_WINSYNC]# /usr/lib64/dirsrv/slapd-M1/db2index
[root@ratangad MMR_WINSYNC]# /usr/lib64/dirsrv/slapd-M2/db2index
[root@ratangad MMR_WINSYNC]# /usr/lib64/dirsrv/slapd-C1/db2index
[root@ratangad MMR_WINSYNC]# /usr/lib64/dirsrv/slapd-C2/db2index

5).
[root@ratangad MMR_WINSYNC]#  /bin/systemctl start  dirsrv.target
[root@ratangad MMR_WINSYNC]# ./AddEntry.sh Users 1289 "ou=people,dc=passsync,dc=com" m2usr 9 localhost
[root@ratangad MMR_WINSYNC]# ./AddEntry.sh Users 1189 "ou=people,dc=passsync,dc=com" m1usr 9 localhost

6).
[root@ratangad MMR_WINSYNC]# PORT=1389; SUFF="dc=passsync,dc=com"; for PORT in `echo "1189 1289 1389 14089"`; do /usr/bin/ldapsearch -x -p $PORT -h localhost -D "cn=Directory Manager" -w Secret123 -b $SUFF |grep -i dn: | wc -l ; done
85
85
85
85

7). Replication works fine and no errors observed from error logs.

Hence, marking the bug as Verified.

Comment 5 errata-xmlrpc 2016-11-03 20:42:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2594.html


Note You need to log in before you can comment on or make changes to this bug.