Bug 745259 - Incorrect entryUSN index under high load in replicated environment
Summary: Incorrect entryUSN index under high load in replicated environment
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: 389
Classification: Retired
Component: Database - Indexes/Searches
Version: 1.2.9
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Noriko Hosoi
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks: 690319 389_1.2.10 752607 759301
TreeView+ depends on / blocked
 
Reported: 2011-10-11 19:58 UTC by Andrey Ivanov
Modified: 2015-12-10 18:44 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Under a heavy load in replicated environments, 389 Directory Server did not handle the Entry USN index correctly. Consequently, the index could become out of sync with the main database and search operations on USN entries returned incorrect results. This update modifies the Entry USN plug-in and 389 Directory Server now handles the Entry USN index as expected.
Clone Of:
: 749882 759301 (view as bug list)
Environment:
Last Closed: 2015-12-10 18:44:24 UTC
Embargoed:


Attachments (Terms of Use)
git patch file (master) (49.19 KB, patch)
2011-10-27 00:00 UTC, Noriko Hosoi
nkinder: review+
Details | Diff
scripts to reproduce the problem (10.00 KB, application/x-tar)
2011-10-27 00:18 UTC, Noriko Hosoi
no flags Details
git patch file (master) (8.51 KB, patch)
2011-10-28 18:46 UTC, Noriko Hosoi
nhosoi: review?
rmeggins: review+
Details | Diff
Patch backport to 1.2.9.10 version (5.25 KB, text/plain)
2011-11-03 20:17 UTC, Andrey Ivanov
no flags Details
git patch file (master) (11.70 KB, patch)
2011-11-11 23:19 UTC, Noriko Hosoi
no flags Details | Diff
git patch file (master) (12.43 KB, patch)
2011-11-12 02:23 UTC, Noriko Hosoi
no flags Details | Diff
reproducer (10.00 KB, application/x-tar)
2011-11-12 02:29 UTC, Noriko Hosoi
no flags Details
git patch file (master) (12.48 KB, patch)
2011-11-29 21:52 UTC, Noriko Hosoi
nhosoi: review?
rmeggins: review+
Details | Diff

Description Andrey Ivanov 2011-10-11 19:58:39 UTC
User-Agent:       Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1

The entryusn.db4 index is not maintained correctly when a large number of replicated operations arrive almost simultaneously. As a result, several entryusn numbers in the index may correspond to one entryid and the searches on entryusn return incorrect results.

Manual reindexing on entryusn resolves the problem.

Reproducible: Always

Steps to Reproduce:
I will describe our configuration where it is reproducible every time.

1. Configure three 389ds servers (in our case, it's 1.2.9.8 version with the additional bugfix patch for the bug 735121).

2. Configure multimaster replication from each server to two others. While configuring replicas, exclude entryusn and memberOf from replication.

Important: do not disable the replication of modifyTimestamp or modifiersName. It's thanks to these attributes that a high load may be produced in this configuration.

3. Create a sufficiently large groups with nested subgroups (~200 entries per group is usually sufficient).

4. In a single (it is important!) operation make a replace or add/delete of all or many uniqueMember attribute values of the group. It will generate the following events :

* an avalanche of changes of the memberOf and entryusn attributes in a lot of entries. These changes are not replicated, so it's ok.

* each changed entry has its modifyTimestamp and modifiersName attribute changed at first directly and then by memberOf plugin.

* the changes are replicated to two other replicas. On these replicas memberOf replaces the corresponding attributes and once again change modifyTimestamp and modifiersName.

* As a result for each change we have two "returns to the sender" from two replicas because of modifyTimestamp and modifiersName. Here is the typical ldap audit log example :

time: 20111011210009
dn: cn=groupe dsi restreint,ou=par entite,ou=groupes globaux,ou=groupes,dc=id,dc=polytechnique,dc=edu
changetype: modify
add: uniqueMember
uniqueMember: uid=andrey.ivanov,ou=Personnel,ou=Utilisateurs,dc=id,dc=polytech
 nique,dc=edu
-
replace: modifiersname
modifiersname: cn=X LDAP Root
-
replace: modifytimestamp
modifytimestamp: 20111011190009Z
-
replace: entryusn
entryusn: 69538
-

time: 20111011210011
dn: uid=andrey.ivanov,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011190011Z
-
replace: entryusn
entryusn: 69542
-

time: 20111011210015
dn: uid=andrey.ivanov,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011190016Z
-
replace: entryusn
entryusn: 69545
-



So, to resume, one change of the first replica generates two "returned" replicated operations. With a sufficiently large initial group modification we have a sufficiently large number of changes arriving at the same moment.
Actual Results:  
After this operation the examining of entryusn index shows a large site of duplicates. Here is an example :

dbscan -r -f entryusn.db4|less

=69294                                  
        649
=69357                                  
        649
=69387                                  
        649 
=69490                                  
        649
=69511                                  
        649

The entry with id 649 (in id2entry.db4) contains 69511 (on two other replicas entryusn for this entry is 69891 and 69780, so the replication of entryusn is out of game).

The number of duplicates can easily be found by comparing the number of all entries and unique entries :

dbscan -r -f entryusn.db4| grep -v ^= | sort |wc
   5232   12074   71602

dbscan -r -f entryusn.db4| grep -v ^= | sort |uniq|wc
   5109   11951   70820

The searches of entryUSN=smth and entryUSN>=smth, naturally, do not return the expected results since the index is corrupted.


Expected Results:  
I am unable te reproduce the problem without replication or by changing just one single independent attribute (like 'description').
I think this problem has to do something with heavy internal updates (memberOf plugin) and a lot of replicated entries arriving simultaneously.

Reindexing recreates the index correctly:

db2index.pl -v -D "cn=Directory Manager" -w - -n userRoot -t entryusn
...

dbscan -r -f entryusn.db4| grep -v ^= | sort |wc
   5109   11951   70820

dbscan -r -f entryusn.db4| grep -v ^= | sort |uniq|wc
   5109   11951   70820

Comment 1 Andrey Ivanov 2011-10-11 20:11:52 UTC
The ldap audit log for the entry id 649 producing the index entryusn.db4 file shown above :

time: 20111011205652
dn: uid=gilles.delobel,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011185649Z
-
replace: entryusn
entryusn: 69294
-

<...some other operations...>

time: 20111011205652
dn: uid=gilles.delobel,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011185649Z
-
replace: entryusn
entryusn: 69324
-

<...some other operations...>

time: 20111011205655
dn: uid=gilles.delobel,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011185648Z
-
replace: entryusn
entryusn: 69357
-

<...some other operations...>

time: 20111011205655
dn: uid=gilles.delobel,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011185649Z
-
replace: entryusn
entryusn: 69387
-

<...some other operations...>

time: 20111011205658
dn: uid=gilles.delobel,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011185656Z
-
replace: entryusn
entryusn: 69428
-

<...some other operations...>

time: 20111011205701
dn: uid=gilles.delobel,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011185655Z
-
replace: entryusn
entryusn: 69490
-

<...some other operations...>

time: 20111011205701
dn: uid=gilles.delobel,ou=personnel,ou=utilisateurs,dc=id,dc=polytechnique,dc=edu
changetype: modify
replace: modifiersName
modifiersName: cn=MemberOf Plugin,cn=plugins,cn=config
-
replace: modifyTimestamp
modifyTimestamp: 20111011185655Z
-
replace: entryusn
entryusn: 69511
-

Comment 2 Andrey Ivanov 2011-10-11 20:15:13 UTC
OS - CentOS 5.7 x86_64 with all the latest patches.

Comment 3 Andrey Ivanov 2011-10-12 12:07:33 UTC
Actually, a high load is not necessary. I've just made another test - a single uniquemember modify operation in a group causes the cache to be corrupted on two or three replicas. Here is the ldif for the change :

dn: cn=SomeGroup,ou=Groups,dc=example,dc=com
changetype: modify
add: uniqueMember
uniqueMember: uid=mylogin,ou=Users,dc=example,dc=com


 The duplicates in entryusn are produced for the entry referenced in uniqueMember (the entry memberOf works with). Depending on the replica there may be up to 5 duplicates (entryid 8217 in this example):

dbscan -r -f entryusn.db4

=70067                                  
        10789 
=70069                                  
        8217 
=70070                                  
        10895 
=70071                                  
        8217 
=70072                                  
        8217 
=70074                                  
        8217 
=70075                                  
        8217 
=70079                                  
        11569  

The entryid 11569 corresponds to the nsTombstone with nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff (RUV vectors). 10895 and 10789 are the entryid of the groups for which the entry membership changes.

Comment 4 Rich Megginson 2011-10-12 14:02:44 UTC
Sounds like the entryusn functionality is a good candidate for conversion to a betxnpreop/betxnpostop plugin.

Comment 5 Noriko Hosoi 2011-10-12 17:02:04 UTC
Hi Andrey, 

Could you check if you configure the fractional replication on the memberof attribute?  We highly recommend to set it up.  Please see "IMPORTANT" in "11.1.4. Using the memberOf Attribute to Manage Group Membership Information".
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/8.2/html/Administration_Guide/Advanced_Entry_Management.html#groups-memberof-configuring

Also, could you check if "conflicts" occurred in the replica?

But still the corruption of the entryusn index is bad...  Once it happens, we have to ask to reindex the entryUSN attribute, which would be very painful.

(In reply to comment #4)
> Sounds like the entryusn functionality is a good candidate for conversion to a
> betxnpreop/betxnpostop plugin.

May not be that easy...  W/o replication, the backend is serialized (by default), but the backend allows simultaneous updates to the replicated operations.  If 2 updates on the same entry are replicated almost at the same time, each grabs the same pre-updated entry and do the operations in parallel.  On each thread, entryUSN plugin increments the attribute by one and replaces the old entryUSN with the new one.  For the increments, it uses slapi-counter which is guaranteed atomic.  That's said, 2 entryUSNs would appear in the entryUSN index pointing the same entry.  And the entry contains just one entryUSN, which would be the one overrides the other. (It even may not be the largest value... ):  This problem may not be solved even if we convert entryUSN to betxnpreop/betxnpostop since betxnpreop just reads the data from the not-yet-committed database...  But it's worth trying.

Comment 6 Andrey Ivanov 2011-10-12 17:35:49 UTC
Hi Noriko,

Yes, i have disabled the replication of memberOf and entryUSN (as i described in my initial description of the problem:
----------------
2. Configure multimaster replication from each server to two others. While
configuring replicas, exclude entryusn and memberOf from replication.
--------------
)

Here is a part of the replicated agreement configuration:
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf

You can also see in audit logs that it is the attributes 'modifiersName' and 'modifyTimestamp' that are replicated, not 'memberOf'. I cannot disable the replication of these attributes, especially 'modifiersName'...



I think your description is pretty close to what i observe and what's going on in our system. I have also encountered the part "(It even may not be the
largest value... )" - that's actually how i have found this bug. The entry had entryUSN, say, entryUSN=10 and the ldapsearch returned this entry even with the filter '(entryUSN>=15)'. The entryusn.db4 index file contained in fact (along with the correct '10' value) the USNs larger than 10 for the entry.

I've stumbled upon this bug while i was writing an optimisation for one of our scripts. This optimisation is using entryUSN to work on recently modified sets of entries instead of making each time a complete directory server search.

Comment 7 Noriko Hosoi 2011-10-12 18:02:36 UTC
Thanks for the confirmation, Andrey.  I see.  So, replicating modifiersName and modifyTimestamp could be one of the causes for this problem.  I also double checked with Nathan that these internal modify attributes are not allowed to put into the no-replicate list... :(  

We have to try the betxn conversion first to see how it improves the behaviour...

Comment 8 Andrey Ivanov 2011-10-12 19:30:55 UTC
If the index updates are also integrated into the transactions then betxn conversion may work (the entryusn index will be updated, maintained and non-corrupted in each parallel transaction and commited at the end of each transaction with its integrity conserved, so the last finished transaction will commit its correct index and entry attribute at the same time).

As for modifiersName and modifyTimestamp, they are generated by memberOf plugin. Some time ago i have filed a bug to have an option of disabling the update of internally updated attributes by memberOf (https://bugzilla.redhat.com/show_bug.cgi?id=453756). It would also resolve our current entryusn index problem...

Comment 12 Noriko Hosoi 2011-10-27 00:00:31 UTC
Created attachment 530413 [details]
git patch file (master)

Description:
. Changed the backend entry lock from PR_Lock to PR_Monitor to
  allow the re-entrant locking.  In ldbm_back_delete and ldbm_
  back_modify, backend entry lock was released before calling
  be_preop plugins and re-acquired just after that to avoid
  the deadlock by the be_preop plugins called from the same
  thread. The short no-lock window was the cause of the entry-
  usn corruption.  By replacing PR_Lock with the re-entrant
  PR_Monitor, we could eliminate the no-lock window.
. USN plugin: made add, modify, and modrdn usn plugins
  transaction aware.
  Note: delete plugins are intact. USN delete operation converts
  an entry to a tombstone entry, which is not suitable to move
  to inside of the transaction.

Additional changes
. Introduced SLAPI_PLUGIN_BE_TXNABORT_POST_*_FN.
. In ldbm_back_delete, SLAPI_PLUGIN_BE_TXN_PRE/POST_DELETE_FN are
  changed to apply just to the normal entries (not to the tombstone
  entries).
. Changed the condition to call dblayer_txn_abort from (retry_count
  > 0) to (retry_count >= 0) to cover the error in the first trial.
. Cleaned up compiler warnings.

Comment 13 Noriko Hosoi 2011-10-27 00:18:41 UTC
Created attachment 530415 [details]
scripts to reproduce the problem

Contents of the attachement:
  script/
  script/memofgrpL.sh
  script/memofmod1.sh
  script/memofmod0.sh

How to reproduce/verify the problem:
1. Set up 2 masters - 1 replica.

    M0 <---> M1
      \     /
       v   v
        R0

2. Enable USN and memberof plugins on the 3 servers (M0, M1, R0).

3. Edit memofgrpL.sh to adjust SUFFIX, MAXGRP, and MAXUSR 
   (currently, "dc=example,dc=com", 20, 200, respectively)

4. Run memofgrpL.sh and add the generated ldif to one of the masters (M0).
   $ sh -x memofgrpL.sh > memofgrpL.ldif
   $ ldapmodify -x -h localhost -p <portM0> -D 'cn=directory manager' -w <pw> -af memofgrpL.ldif

5. On 2 windows...
5.1 Edit memofmod[01].sh to adjust PORT, PASSWD, and SUFFIX.
    PORT: memofmod0.sh sets port of M0; memofmod1.sh sets port of M1
    PASSWD: password of Directory Manager for M0 and M1, respectively
    SUFFIX: same suffix in memofgrpL.sh
5.2 Run the scripts.
    window1> sh memofmod0.sh
    window2> sh memofmod1.sh

6. When the scripts finish, run dbscan against the R0's entryusn.db4
   # dbscan -f /var/lib/dirsrv/slapd-R0/db/userRoot/entryusn.db4 -r > out
   # tail -8 out
   =225                                   
       231
   =2353                                  
       12
   =2971                                  
       10
   =2995                                  
       13

   Entry 12, 10, 13 are the ones frequently modified.  
   Scan the output file "out" and check if the 3 entryIDs (12, 10, 13) 
   do not appear twice or more.  If not, the bug was verified.

Comment 14 Rich Megginson 2011-10-28 17:14:19 UTC
Comment on attachment 530413 [details]
git patch file (master)

Would prefer to see these changes as multiple commits - first commit would be to fix the initial problem - that is, convert to PRMonitor, and do not unlock the entry before calling the bepreop plugins.

Then, convert usn to use transactions.

Ideally, if we were using DB_SEQUENCE inside the transaction, we would not need a pre-abort function, because the actual transaction abort would make sure the incremented value was not committed.  But I can see the logic in having it, especially for auto-generated values like uuid, csn that cannot make use of DB_SEQUENCE because they are not simple sequential integers.

The way transaction support has been currently designed is that it allows pre/post betxn plugins to call internal operations (e.g. do an internal modify operation) rather than edit the list of mods for the parent operation (which necessitates moving all of that schema/syntax/etc. checking code into the transaction loop).  If we have to retry the transaction loop (e.g. due to DB_DEADLOCK), we have to restore everything back to its initial state, then reapply the mods again.  Is this handled?  I didn't see this.

Why is it a problem to handle the delete_tombstone_entry?

Comment 15 Noriko Hosoi 2011-10-28 17:44:29 UTC
(In reply to comment #14)
> Comment on attachment 530413 [details]
> git patch file (master)
> 
> Would prefer to see these changes as multiple commits - first commit would be
> to fix the initial problem - that is, convert to PRMonitor, and do not unlock
> the entry before calling the bepreop plugins.
> 
> Then, convert usn to use transactions.

All right.  I'm going to create 2 patches.

> Ideally, if we were using DB_SEQUENCE inside the transaction, we would not need
> a pre-abort function, because the actual transaction abort would make sure the
> incremented value was not committed.  But I can see the logic in having it,
> especially for auto-generated values like uuid, csn that cannot make use of
> DB_SEQUENCE because they are not simple sequential integers.
> 
> The way transaction support has been currently designed is that it allows
> pre/post betxn plugins to call internal operations (e.g. do an internal modify
> operation) rather than edit the list of mods for the parent operation (which
> necessitates moving all of that schema/syntax/etc. checking code into the
> transaction loop).  If we have to retry the transaction loop (e.g. due to
> DB_DEADLOCK), we have to restore everything back to its initial state, then
> reapply the mods again.  Is this handled?  I didn't see this.

For USN, it just "replaces" the entryusn value, repeating it is not efficient, but no harm.  I don't think there is no other plugins do this type of operations.  Do you think entryUSN is not appropriate to make transaction aware?  I haven't tested it yet, but converting PR_Lock with PR_Monitor could solve the original problem.  I can revert the transaction changes and run the test...

> Why is it a problem to handle the delete_tombstone_entry?

Reading ldbm_back_delete, checking & creating a tombstone entry is too complicated to move to the inside of transaction, I thought...

Comment 16 Rich Megginson 2011-10-28 18:02:56 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > Comment on attachment 530413 [details]
> > git patch file (master)
> > 
> > Would prefer to see these changes as multiple commits - first commit would be
> > to fix the initial problem - that is, convert to PRMonitor, and do not unlock
> > the entry before calling the bepreop plugins.
> > 
> > Then, convert usn to use transactions.
> 
> All right.  I'm going to create 2 patches.

Thanks.

> 
> > Ideally, if we were using DB_SEQUENCE inside the transaction, we would not need
> > a pre-abort function, because the actual transaction abort would make sure the
> > incremented value was not committed.  But I can see the logic in having it,
> > especially for auto-generated values like uuid, csn that cannot make use of
> > DB_SEQUENCE because they are not simple sequential integers.
> > 
> > The way transaction support has been currently designed is that it allows
> > pre/post betxn plugins to call internal operations (e.g. do an internal modify
> > operation) rather than edit the list of mods for the parent operation (which
> > necessitates moving all of that schema/syntax/etc. checking code into the
> > transaction loop).  If we have to retry the transaction loop (e.g. due to
> > DB_DEADLOCK), we have to restore everything back to its initial state, then
> > reapply the mods again.  Is this handled?  I didn't see this.
> 
> For USN, it just "replaces" the entryusn value, repeating it is not efficient,
> but no harm.  I don't think there is no other plugins do this type of
> operations.

For now.  What you have will work for this specific entryusn case.  But for the general case, we will have to revisit this.  The way memberof and referint work, they just perform internal operations, they do not modify the parent's mods list or add entry or etc.  If we want to allow any betxn pre/post plugin to make changes to the entry to add, mods list, etc., either those plugins will have to be aware that they can be called multiple times if the transaction needs to be retried, or the add/modify/etc. code will need to reset all entries/mods to the condition prior to calling any of the betxn plugins in the case where a transaction needs to be retried due to DB_DEADLOCK (or other reasons).  The former will make converting plugins difficult, and the latter will be very tricky to get right (or involve lots and lots of copying/freeing).

> Do you think entryUSN is not appropriate to make transaction
> aware?

I think it is.  Any code that generates some value associated with the operation (dna, usn, csn, uuid) should work like this:

start operation
start transaction
generate value (e.g. usn)
make value available for other plugins to use inside the transaction
commit transaction <- makes next value available for use
or
abort transaction <- rolls back value

When using DB_SEQUENCE, the "generate value" and side effects associated with commit and abort are done automatically by the db code.  I think we should think a bit more about how to make entryusn transaction aware, with the goal being to support all of the dna/usn/csn/uuid code.

> I haven't tested it yet, but converting PR_Lock with PR_Monitor could
> solve the original problem.  I can revert the transaction changes and run the
> test...

Yes, please.

> 
> > Why is it a problem to handle the delete_tombstone_entry?
> 
> Reading ldbm_back_delete, checking & creating a tombstone entry is too
> complicated to move to the inside of transaction, I thought...

Ah, ok, I see.  You're just doing the same thing for betxn plugins that is already done for be plugins with respect to the delete_tombstone_entry case.

Comment 17 Noriko Hosoi 2011-10-28 18:46:30 UTC
Created attachment 530722 [details]
git patch file (master)

This patch only contains the PRLock -> PRMonitor changes.

Rerunning the reproducer in Comment 13 against the servers with this patch applied, the corrupted entryusn index files were not observed.

I'm going to open a new bug for the transaction aware entryUSN since it is independent from the original bug...

Comment 18 Andrey Ivanov 2011-10-28 18:50:21 UTC
Hi Noriko and Rich,

i think it's a good idea to make two separate patches. I will then be able to
test the "PR_Lock -> PR_Monitor" patch in our production environment (if
backporting the patch to 1.2.9.10 is going to be simple) where the bug is
reproducible by a single MOD.

Comment 19 Noriko Hosoi 2011-10-28 19:10:48 UTC
Thanks, Andrey!

I've opened a separated bug 749882 "Convert entryUSN plugin to transaction aware type".

Comment 20 Noriko Hosoi 2011-10-28 19:39:59 UTC
Reviewed by Nathan and Rich (Thank you!!)

Pushed to master.

$ git merge work
Updating f639711..9a9f93c
Fast-forward
 ldap/servers/slapd/back-ldbm/back-ldbm.h   |   22 +++++++++++-----------
 ldap/servers/slapd/back-ldbm/backentry.c   |    2 +-
 ldap/servers/slapd/back-ldbm/cache.c       |   19 ++++++++++++++-----
 ldap/servers/slapd/back-ldbm/id2entry.c    |    1 -
 ldap/servers/slapd/back-ldbm/ldbm_delete.c |    6 ++----
 ldap/servers/slapd/back-ldbm/ldbm_modify.c |    6 ++----
 ldap/servers/slapd/modify.c                |    4 ++++
 7 files changed, 34 insertions(+), 26 deletions(-)

$ git push
Counting objects: 25, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (13/13), 1.74 KiB, done.
Total 13 (delta 11), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   f639711..9a9f93c  master -> master

Comment 21 Andrey Ivanov 2011-11-03 20:16:35 UTC
I've tried to backport the changes to the version 1.2.9.10. The patch diff is attached. It's only the "PR_Lock -> PR_Monitor" part of the original patch since other changes were incompatible (there were sufficiently important changes since 1.2.9.10).

The backported patch not fix the problem, at least not in 1.2.9.10. Here is the dbscan of entryusn.db4 file on one of the replicas (two other ones are similar):

=22                                     
        4030 
=32                                     
        4030 
=40                                     
        7755 
=42                                     
        10595 
=47                                     
        7755 
=49                                     
        10595 
=51                                     
        10700 
=52                                     
        4030 
=53                                     
        4030 
=54                                     
        4030 
=55                                     
        4030 
=56                                     
        4030 
=57                                     
        4030 
=58                                     
        4030 
=59                                     
        4030 
=60                                     
        4030 
=61                                     
        4030 
=64                                     
        4030 
=65                                     
        4030 
=66                                     
        4030 
=68                                     
        12263 

I'll try again when 1.2.10 will come into rc or release version from the current alpha state, there were some changes that could help : locking type change, transactions etc.

Comment 22 Andrey Ivanov 2011-11-03 20:17:47 UTC
Created attachment 531656 [details]
Patch backport to 1.2.9.10 version

Comment 23 Noriko Hosoi 2011-11-03 20:51:38 UTC
Thank for testing the patch, Andrey.
I'm reopening this bug...

Comment 24 Noriko Hosoi 2011-11-03 21:30:28 UTC
(In reply to comment #21)
> I'll try again when 1.2.10 will come into rc or release version from the
> current alpha state, there were some changes that could help : locking type
> change, transactions etc.

Andrey, you don't have to try...  I could reproduce the problem on 1.2.10...

Comment 25 Noriko Hosoi 2011-11-11 23:19:15 UTC
Created attachment 533192 [details]
git patch file (master)

Bug Description: When replication conflicts occur in replacing
entryusn, dangling entryusn keys pointing the same entryid are
generated in the entryusn index file.  There is no way to clean
them up unless reindexing the entryusn.

Fix Description: When replication conflicts occur, entryusn
value in the original entry and new entry are identical.  If
that is observed in index_add_mods, it skips updating entryusn
index and inform it to the entryusn plugin by setting LDAP_
MOD_IGNORE in the mod.  Entryusn plugin bepostop_modify uses
the info to determine to increment the entryusn or not.

Comment 26 Noriko Hosoi 2011-11-12 02:23:46 UTC
Created attachment 533205 [details]
git patch file (master)

Bug Description: When replication conflicts occur in replacing
entryusn, dangling entryusn keys pointing the same entryid are
generated in the entryusn index file.  There is no way to clean
them up unless reindexing the entryusn.
  
Fix Description: When replication conflicts occur in the replace
op and new entry still contains the old value, then the old value
won't be removed from the index file.  Also, if the new value is
not added to the entry, the new value won't be added to the index
file. 

The failure is informed to entryusn by by setting LDAP_MOD_IGNORE
in the mod.  Entryusn plugin bepostop_modify uses the info to
determine to increment the entryusn or not.

Comment 27 Noriko Hosoi 2011-11-12 02:29:28 UTC
Created attachment 533206 [details]
reproducer

Use this attachment instead of the above "530415: scripts to reproduce the problem".

Comment 28 Rich Megginson 2011-11-14 18:36:14 UTC
Comment on attachment 533205 [details]
git patch file (master)

https://bugzilla.redhat.com/attachment.cgi?id=533205&action=diff#a/ldap/servers/slapd/slapi-private.h_sec1

The comment is wrong, it should be

 * #define LDAP_MOD_INCREMENT        0x03 -- Openldap extension

https://bugzilla.redhat.com/attachment.cgi?id=533205&action=diff#a/ldap/servers/slapd/back-ldbm/index.c_sec3 - at the new line 626 - is it possible that mods_valueArray will be NULL at this point?  If so, what should rc be set to?

Comment 29 Noriko Hosoi 2011-11-29 21:49:32 UTC
Thanks for your comments, Rich.

(In reply to comment #28)
> Comment on attachment 533205 [details]
> git patch file (master)
> 
> https://bugzilla.redhat.com/attachment.cgi?id=533205&action=diff#a/ldap/servers/slapd/slapi-private.h_sec1
> 
> The comment is wrong, it should be
> 
>  * #define LDAP_MOD_INCREMENT        0x03 -- Openldap extension

Fixed.

> https://bugzilla.redhat.com/attachment.cgi?id=533205&action=diff#a/ldap/servers/slapd/back-ldbm/index.c_sec3
> - at the new line 626 - is it possible that mods_valueArray will be NULL at
> this point?  If so, what should rc be set to?

Our server does not return an error when conflict occurs, but just stores the conflict info in the entries.  So, I think we should not set any error to rc and fail the operation here (one of the nested memberof updates in this test case)...  What do you think?

Comment 30 Noriko Hosoi 2011-11-29 21:52:38 UTC
Created attachment 538249 [details]
git patch file (master)

Modified the previous patch '533205: git patch file (master)' applying the suggestions by Rich.

Comment 31 Noriko Hosoi 2011-12-01 22:59:23 UTC
Reviewed by Rich (Thank you!!!)

Pushed to master.

$ git merge 745259-4
Updating d439e3a..a360ab7
Fast-forward
 ldap/servers/plugins/usn/usn.c                 |   50 +++++++++++++++++++++++-
 ldap/servers/slapd/back-ldbm/index.c           |   35 ++++++++++++++---
 ldap/servers/slapd/back-ldbm/ldbm_modify.c     |    4 +-
 ldap/servers/slapd/back-ldbm/ldbm_modrdn.c     |    6 +-
 ldap/servers/slapd/back-ldbm/proto-back-ldbm.h |    2 +-
 ldap/servers/slapd/entrywsi.c                  |   12 ++++--
 ldap/servers/slapd/slapi-private.h             |    4 +-
 7 files changed, 95 insertions(+), 18 deletions(-)

$ git push
Counting objects: 29, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (15/15), done.
Writing objects: 100% (15/15), 2.55 KiB, done.
Total 15 (delta 12), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   d439e3a..a360ab7  master -> master

Comment 32 Andrey Ivanov 2011-12-02 21:22:11 UTC
After backporting both patches to 1.2.9.10 and testing it turns out that the bug is now fixed :) Thank you, Noriko!

Comment 33 Noriko Hosoi 2011-12-02 21:46:48 UTC
(In reply to comment #32)
> After backporting both patches to 1.2.9.10 and testing it turns out that the
> bug is now fixed :) Thank you, Noriko!

Thank you sooooo much for verifying the bug, Andrey!!
I'm so glad and relieved that I could get rid of your blocker... ;)
Once again, thanks for reporting a crucial bug.

Comment 34 Rich Megginson 2012-01-10 20:18:27 UTC
Upstream ticket:
https://fedorahosted.org/389/ticket/242

Comment 35 Miroslav Svoboda 2012-01-17 11:35:44 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Under a heavy load in replicated environments, 389 Directory Server did not handle the Entry USN index correctly. Consequently, the index could become out of sync with the main database and search operations on USN entries returned incorrect results. This update modifies the Entry USN plug-in and 389 Directory Server now handles the Entry USN index as expected.


Note You need to log in before you can comment on or make changes to this bug.