Bug 183222

Summary: Directory Server hangs when running VLV search and update operations simultaneously.
Product: [Retired] 389
Reporter: reinhard nappert <rnappert>
Component: Directory Server
Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED CURRENTRELEASE
QA Contact: Viktor Ashirov <vashirov>
Severity: high
Priority: medium
Version: 1.0.2
CC: nhosoi, nkinder, rmeggins
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2015-12-07 16:34:43 UTC
Bug Blocks: 152373, 240316, 427409    
Attachments:
  Description                                         Flags
  stacktrace showing the deadlock                     none
  proposed fix ldap/servers/slapd/back-ldbm/vlv.c     none
  cvs commit message                                  none
  problem description                                 none
  cvs diffs                                           none
  cvs diff vlv.c                                      none
  cvs diffs                                           none
  cvs commit message                                  none
  cvs diff vlv.c                                      none

Description reinhard nappert 2006-02-27 16:36:35 UTC
Directory Server hangs when running VLV search and update operations
simultaneously. This bug also existed in the iPlanet/Sun Directory Server
products (5.1 and 5.2), where it has been fixed. See bug-id 4973380 in the
release notes (http://docs.sun.com/source/819-1814-10/relnotes_ds51sp4.html)

This bug exists in 1.0, 1.0.1 and 7.1.

First, you create a vlv-index (through the command line, by adding the
appropriate configuration objects and setting the index vlvindex ...).
Then, you need to have two clients, one performing vlv searches and one
performing adds.

I used the com.sun.jndi.ldap.ctl package for the client. This client performs
the search operation continuously, whereas the second client adds objects.

The server hangs after less than a minute.

Comment 1 reinhard nappert 2006-03-03 20:49:09 UTC
Whenever you remove the browsing index objects (vlvSearch and vlvIndex objects):
dn: cn=vlvSearch,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: vlvSearch
cn: vlvSearch
vlvBase: o=test
vlvScope: 1
vlvFilter: (objectclass=*)

dn: cn=vlvIndex,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
objectClass: vlvindex
objectclass: top
cn: vlvIndex
vlvUses: 0
vlvEnable: 1
vlvSort: -createtimestamp

the server does not hang anymore.

Comment 2 Noriko Hosoi 2006-03-08 00:11:31 UTC
Created attachment 125779 [details]
stacktrace showing the deadlock

I could reproduce the problem on the optimized build.  To make it happen on
the debug build, however, I needed a little trick: the libdb debug build
enables more diagnostic options, which changes the timing, I guess.  I built
libdb with CC="cc -g" and without the --enable-debug configuration option.
Using this "debug" build of libdb, the hang-up could be duplicated.

Cause of the hang-up:
The add thread (thread 13) starts a transaction (which works as a db lock),
then tries to acquire the write lock on vlvSearchList_lock (one per backend),
which is already read locked by the search thread.
The search thread (thread 14) acquires the read lock on vlvSearchList_lock,
then happens to read the page being updated, which is already locked by the
add thread.

The vlv implementation needs the lock on vlvSearchList_lock to protect
vlvSearchList (a linked list per backend) and the vlvIndex lists (linked lists
pointed to by vlvSearch->vlv_index).  To avoid the deadlock, we should prohibit
taking the write lock (vlvSearchList_lock) inside a transaction.  Following is
the list of functions which try to acquire the write lock.  Most of them are
implemented for manipulating vlvSearchList and/or a vlvIndex list.  The
exception is vlv_update_all_indexes, which traverses the linked list and
updates the vlv index in the db.  It does not update the linked list itself.
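The lock-order inversion described above can be sketched in a few lines of C.
This is an illustration only, not code from vlv.c: it uses POSIX threads
instead of NSPR, a plain mutex (page_lock) stands in for the db page lock held
by the add transaction, and list_lock stands in for vlvSearchList_lock.  All
names are made up for the example.  Each thread takes its first lock, waits at
a barrier until both first locks are held, and then only *tries* its second
lock, so the sketch reports the deadlock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the real server objects. */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;   /* db page lock */
static pthread_rwlock_t list_lock = PTHREAD_RWLOCK_INITIALIZER; /* vlvSearchList_lock */
static pthread_barrier_t step;
static bool add_blocked, search_blocked;

/* Models the add: transaction (page lock) first, then the list write lock. */
static void *add_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&page_lock);
    pthread_barrier_wait(&step);               /* both first locks are held */
    int rc = pthread_rwlock_trywrlock(&list_lock);
    add_blocked = (rc != 0);
    pthread_barrier_wait(&step);               /* both second attempts done */
    if (rc == 0)
        pthread_rwlock_unlock(&list_lock);
    pthread_mutex_unlock(&page_lock);
    return NULL;
}

/* Models the search: list read lock first, then the page being updated. */
static void *search_thread(void *arg)
{
    (void)arg;
    pthread_rwlock_rdlock(&list_lock);
    pthread_barrier_wait(&step);
    int rc = pthread_mutex_trylock(&page_lock);
    search_blocked = (rc != 0);
    pthread_barrier_wait(&step);
    if (rc == 0)
        pthread_mutex_unlock(&page_lock);
    pthread_rwlock_unlock(&list_lock);
    return NULL;
}

/* Returns true when each thread was refused its second lock, i.e. when
 * blocking acquisitions in the same order would have deadlocked. */
bool lock_order_inversion_detected(void)
{
    pthread_t add, search;
    int rc = pthread_barrier_init(&step, NULL, 2);
    assert(rc == 0);
    (void)rc;
    add_blocked = search_blocked = false;
    pthread_create(&add, NULL, add_thread, NULL);
    pthread_create(&search, NULL, search_thread, NULL);
    pthread_join(add, NULL);
    pthread_join(search, NULL);
    pthread_barrier_destroy(&step);
    return add_blocked && search_blocked;
}
```

With blocking calls in place of the try-lock calls, neither thread could ever
release its first lock, which matches the hang seen in the stacktrace.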

[Functions which call PR_RWLock_Wlock]
/* Callback to add a new VLV Search specification. Added write lock.*/
vlv_AddSearchEntry 
    ==> vlvSearch_addtolist (no access to db)

/* Callback to add a new VLV Index specification. Added write lock.*/
vlv_AddIndexEntry 
    ==> vlvSearch_addIndex (no access to db)

/* Callback to delete a  VLV Index specification. */
vlv_DeleteSearchEntry 
    ==> vlvSearch_removefromlist (no access to db)

/* Look at a new entry, and the set of VLV searches, and see whether
there are any which have deferred initialization and which can now
be initialized given the new entry. Added write lock. */
vlv_grok_new_import_entry 
    ==> vlvSearch_reinit 
	(no search list nor vlv index list change; initialize search entry)
    --> (called from import_foreman only: 
	 no transaction, no search at the same time)

/*
 * Search for the VLV entries which describe the pre-computed indexes we
 * support.  Register administration DSE callback functions.
 * This is exported to the backend initialisation routine.
 * 'inst' may be NULL for non-slapd initialization...
 */
vlv_init 
    ==> vlvSearch_delete (vlvSearch_delete -> vlvIndex_go_offline -> 
	dblayer_erase_index_file_nolock -> dblayer_close_file)

/* Given an entry modification check if a VLV index needs to be updated.
* This is called for every modifying operation, so it must be very efficient.
*/
vlv_update_all_indexes 
    ==> no vlvSearchList, vlvIndex list update

/* Builds strings from Slapi_DN similar console GUI. Uses those dns to
   delete vlvsearch's if they match. New write lock. */
vlv_delete_search_entry 
    ==> vlvSearch_removefromlist

Comment 3 Noriko Hosoi 2006-03-08 00:15:09 UTC
Created attachment 125780 [details]
proposed fix ldap/servers/slapd/back-ldbm/vlv.c

Change description:
Demote the write lock to the read lock in vlv_update_all_indexes.
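Why the demotion helps can be shown with a small sketch, again using POSIX
read-write locks as a stand-in for the NSPR lock (the function and struct names
are invented for the example).  While a searcher holds the read lock, a second
thread asking for the write lock is refused, but a second thread asking for the
read lock gets it immediately, so the updater no longer has to wait for the
searcher.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical probe describing what the updating thread asks for. */
struct probe {
    pthread_rwlock_t *lock;
    bool want_write;
    bool acquired;
};

/* The "update" side: try to take the list lock in the requested mode. */
static void *update_probe(void *arg)
{
    struct probe *p = arg;
    int rc = p->want_write ? pthread_rwlock_trywrlock(p->lock)
                           : pthread_rwlock_tryrdlock(p->lock);
    p->acquired = (rc == 0);
    if (rc == 0)
        pthread_rwlock_unlock(p->lock);
    return NULL;
}

/* Returns true if the updater can take the lock while a searcher holds
 * the read lock on it. */
bool update_can_proceed(bool want_write)
{
    pthread_rwlock_t lock;
    pthread_t updater;
    struct probe p = { &lock, want_write, false };

    pthread_rwlock_init(&lock, NULL);
    pthread_rwlock_rdlock(&lock);              /* the searcher's read lock */
    int rc = pthread_create(&updater, NULL, update_probe, &p);
    assert(rc == 0);
    (void)rc;
    pthread_join(updater, NULL);
    pthread_rwlock_unlock(&lock);
    pthread_rwlock_destroy(&lock);
    return p.acquired;
}
```

The demotion is only valid because vlv_update_all_indexes traverses the list
without modifying it; functions that change the list still need the write lock.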

Comment 4 Noriko Hosoi 2006-03-08 00:37:10 UTC
Test case:
1) create a vlv index and vlv search index
ldapadd .....
dn: cn=roomNumber <suffix>, cn=userRoot, cn=ldbm database, cn=plugins, cn=config
objectClass: top
objectClass: vlvSearch
cn: roomNumber <suffix>
vlvBase: <suffix>
vlvScope: 2
vlvFilter: (objectclass=*)

dn: cn=by roomNumber <suffix>,cn=roomNumber <suffix>, cn=userRoot, cn=ldbm database, cn=plugins, cn=config
objectClass: top
objectClass: vlvIndex
cn: by roomNumber <suffix>
vlvSort: roomNumber
===========================================================================
2) import entries which include roomnumber (I imported 10,000 entries)
3) ran vlv search, update, and add simultaneously.
3-1) vlv search
$ while true; do
./ldapsearch -p <port> -D <DirectoryManager> -w <password> -b "<suffix>" \
    -G 10:10:<someroomnumber> -x -S roomNumber "(objectClass=*)" dn roomnumber
done
3-2) update
$ while true; do ksh my_mod_script <port>; done
$ cat my_mod_script
#!/usr/bin/ksh
DM=<DirectoryManager>
DMPA=<password>
./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
dn: <one of the entries which were imported in (2)>
changetype: modify
add: roomnumber
roomnumber: 5491
EOD0

./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD1
dn: <one of the entries which were imported in (2)>
changetype: modify
replace: roomnumber
roomnumber: 54910
EOD1

./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD2
dn: <one of the entries which were imported in (2)>
changetype: modify
delete: roomnumber
EOD2
========================================================================
3-3) add
./ldapmodify -p <port> -D <DirectoryManager> -w <password> -a -f 40kdata.ldif

The test caused the deadlock reported in Comment #2.  The server with the
demoted lock ran through without hanging.

If you could provide us the detailed test cases to verify the problem, we'd
appreciate it.

Comment 5 Noriko Hosoi 2006-03-08 01:26:52 UTC
Created attachment 125781 [details]
cvs commit message

Reviewed by Pete and Rich (Thank you!)

Comment 6 Noriko Hosoi 2006-03-08 01:46:56 UTC
*DOCS*
known bug (DS7.1 SP2 and DS6.21 SP3)
Directory Server could hang when running VLV search and update operations
simultaneously.

Comment 7 reinhard nappert 2006-10-05 15:48:17 UTC
The bug seems to be fixed in case you perform modifications and adds, like in
comment #4. However, if you include deletions, it still hangs after a while (see
the following script):

#!/usr/bin/ksh
DM=<Directory Manager>
DMPA=<Password>
./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
dn: o=testorg,<suffix>
changetype: add
objectClass: organization
objectClass: top
o: testorg
EOD0

./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD1
dn: o=testorg,<suffix>
changetype: delete
EOD1

I perform the following searches:
./ldapsearch -D "$DM" -w $DMPA -p $1 -b <suffix> -G 0:0:0:0 -x -S -createTimestamp \
    objectclass=* createTimestamp modifyTimestamp

The attributes createTimestamp modifyTimestamp are "vlv-indexed".

Comment 8 reinhard nappert 2006-11-02 16:55:11 UTC
Is there any chance that this could be fixed?

Comment 9 Noriko Hosoi 2006-11-30 19:13:52 UTC
Created attachment 142507 [details]
problem description

(In reply to comment #7)
> The bug seems to be fixed in case you perform modifications and adds, like in
> comment #4. However, if you include deletions, it still hangs after a while
> (see the following script):

Thank you for the bug report.  Attached is the bug analysis.

Comment 10 Noriko Hosoi 2006-11-30 19:23:41 UTC
Created attachment 142508 [details]
cvs diffs

Files:
 proto-back-ldbm.h
 idl.c
 sort.c
 vlv.c

Changes:
1. promoted idl_delete to global to make it available in
vlv_trim_candidates_byvalue.  In vlv_trim_candidates_byvalue, if any id in the
idlist is found to have no corresponding entry, delete the id from the
idlist and retry the binary search.
2. demoted too noisy error message:
   [...] - compare_entries db err -30990
   [...] - compare_entries db err -30990
   [...]
3. eliminated read-lock from vlv_find_index_by_filter to prevent the deadlock
with the delete operation.
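Change 1 can be sketched as follows.  This is not the code from the attached
diff: the ID type, the lookup callback and demo_lookup are assumptions made for
the example, and integer keys replace the attribute values the server compares.
The idea is only that a binary search over the idlist drops an id whose entry
has disappeared and starts over on the shortened list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int ID;

/* Hypothetical lookup: returns 0 and stores the sort key when the entry
 * exists, nonzero when the id is stale (the entry has been deleted). */
typedef int (*key_lookup_fn)(ID id, int *key_out);

/* Remove ids[pos] by shifting the tail of the list left by one. */
static void idlist_remove_at(ID *ids, size_t *n, size_t pos)
{
    memmove(&ids[pos], &ids[pos + 1], (*n - pos - 1) * sizeof(ID));
    (*n)--;
}

/* ids must be ordered by key.  Returns the index of the first id whose
 * key is >= target, or *n if there is none.  Whenever a probed id turns
 * out to be stale, it is deleted and the search is retried. */
size_t trim_by_value(ID *ids, size_t *n, int target, key_lookup_fn lookup)
{
    assert(ids != NULL && n != NULL && lookup != NULL);
    for (;;) {
        size_t lo = 0, hi = *n;
        int stale = 0;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            int key;
            if (lookup(ids[mid], &key) != 0) {
                idlist_remove_at(ids, n, mid);
                stale = 1;
                break;                  /* retry on the shortened list */
            }
            if (key < target)
                lo = mid + 1;
            else
                hi = mid;
        }
        if (!stale)
            return lo;                  /* also covers an empty list */
    }
}

/* Demo data for illustration: key is id * 10, and id 3 is "deleted". */
static int demo_lookup(ID id, int *key_out)
{
    if (id == 3)
        return -1;
    *key_out = (int)id * 10;
    return 0;
}
```

Each retry shortens the list by one, so the loop terminates.  Only ids that the
search actually probes are checked; stale ids elsewhere in the list stay put.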

Comment 11 Rich Megginson 2006-11-30 20:27:05 UTC
What changes be->vlvSearchList?  What happens if you remove the locks and some
other thread changes be->vlvSearchList in the middle of the loop in
vlv_find_index_by_filter()?

Comment 12 Noriko Hosoi 2006-11-30 21:10:39 UTC
Yeah, that's not safe.  Then please consider it just a proof of concept showing
that the read-lock is the cause of the deadlock...  We have to come up with some
other way to solve the deadlock...

Comment 13 Rich Megginson 2006-11-30 21:21:46 UTC
Ok.  Yes, that looks like it is the source of the deadlock.

Comment 14 Noriko Hosoi 2006-11-30 23:35:58 UTC
Created attachment 142537 [details]
cvs diff vlv.c

I changed the code to put the read-lock back, but avoided including the db
access code (cursor operation) inside of the read-lock.  The db access code is
independent of the vlvSearchList itself.  Thus, a change to the linked list
should not have any impact.

I've been running the test for an hour.  So far, no deadlock has occurred.  I
will keep running this test until tomorrow...

Comment 15 Rich Megginson 2006-12-01 04:42:03 UTC
Ok.  That looks good.  Presumably the regular db locking will handle the case
where updates happen to the records within the cursor scope.

Comment 16 Noriko Hosoi 2006-12-01 19:51:35 UTC
Created attachment 142607 [details]
cvs diffs

Thank you, Rich!

In addition to the changes made to vlv.c, I added more error checking (stop the
binary search if the idlist is empty) and added a ber_bvecfree call before the
"not found" error return in vlv_trim_candidates_byvalue.

Comment 17 Noriko Hosoi 2006-12-01 20:13:12 UTC
[testcase]

0) Setting up vlv index
dn: cn=roomNumber ou=Accounting dc=example dc=com, cn=userRoot, cn=ldbm database, cn=plugins, cn=config
objectClass: top
objectClass: vlvSearch
cn: roomNumber ou=Accounting dc=example dc=com
vlvBase: ou=Accounting, dc=example,dc=com
vlvScope: 2
vlvFilter: (objectclass=*)

dn: cn=by roomNumber ou=Accounting dc=example dc=com,cn=roomNumber ou=Accounting dc=example dc=com, cn=userRoot, cn=ldbm database, cn=plugins, cn=config
objectClass: top
objectClass: vlvIndex
cn: by roomNumber ou=Accounting dc=example dc=com
vlvSort: roomNumber
----------------------------------------------------------

1) import data which contains these roomNumber attribute values, so that the
test scripts below change the sort order.
roomNumber: 198
roomNumber: 1983
roomNumber: 199
roomNumber: 2008
----------------------------------------------------------

Run the following 3 scripts simultaneously, each in an endless loop.
2) add and delete an entry
#!/usr/bin/ksh
DM="cn=Directory Manager"
DMPA="password"
./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
dn: uid=tuser0, ou=Accounting, dc=example,dc=com
changetype: add
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
cn: Test User0
sn: User0
uid: tuser0
givenName: Test
roomNumber: 1999
EOD0

./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD1
dn: uid=tuser0, ou=Accounting, dc=example,dc=com
changetype: delete
EOD1
----------------------------------------------------------

3) replace + delete + add the vlv indexed attribute
#!/usr/bin/ksh
DM="cn=Directory Manager"
DMPA="password"
./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
dn: uid=tuser0, ou=Accounting, dc=example,dc=com
changetype: modify
replace: roomNumber
roomNumber: 1981
EOD0

./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
dn: uid=tuser0, ou=Accounting, dc=example,dc=com
changetype: modify
delete: roomNumber
EOD0

./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
dn: uid=tuser0, ou=Accounting, dc=example,dc=com
changetype: modify
add: roomNumber
roomNumber: 2000
EOD0
----------------------------------------------------------

4) vlv search (using some other existing user)
ldapsearch -p $1 -D 'uid=TVradmin0, ou=Accounting, dc=example,dc=com' \
    -w "TVradmin0" -b "ou=Accounting,dc=example,dc=com" -s one -G 2:2:199 -x \
    -S roomNumber "(objectclass=*)" dn roomNumber
----------------------------------------------------------

Ran this test for more than 12 hours.
Neither a deadlock nor a crash was observed.

Comment 18 Noriko Hosoi 2006-12-01 22:01:52 UTC
Created attachment 142626 [details]
cvs commit message

Reviewed by Rich (Thank you!)

Checked into HEAD.

Comment 19 reinhard nappert 2006-12-06 16:34:02 UTC
(In reply to comment #18)
> Created an attachment (id=142626) [edit]
> cvs commit message
> 
> Reviewed by Rich (Thank you!)
> 
> Checked into HEAD.
Again, I used the scripts from Comment #7 and it still deadlocks. I did not run
the scripts from Comment #17, but I saw that you perform a one-level search in
step 4, while the vlvSearch object was configured with Scope 2 (sub-tree search).



Comment 20 reinhard nappert 2006-12-07 17:31:36 UTC
Does it make a difference that my request does not trim the candidates by value,
but by index (./ldapsearch ... -G 0:0:0:0 ...)? I guess, it does.

Comment 21 Noriko Hosoi 2006-12-07 18:34:21 UTC
(In reply to comment #20)
> Does it make a difference that my request does not trim the candidates by value,
> but by index (./ldapsearch ... -G 0:0:0:0 ...)? I guess, it does.

Thank you for your report.  My bad.  I don't know why my test case had "-s one"...
After removing it, I could reproduce the problem and found the similar source of
deadlock.  I ran the test (2 vlv searches + 1 delete + 1 update) last night and
I did not see the deadlock.  I'm attaching the diff in the next Comment for review.

Comment 22 Noriko Hosoi 2006-12-07 18:47:25 UTC
Created attachment 143078 [details]
cvs diff vlv.c

File:
  back-ldbm/vlv.c

Problem description:
There was another source of deadlock.
vlv_build_candidate_list creates a db cursor.  The current code locks the
vlvSearchList, calls vlv_build_candidate_list, then unlocks it after the
function returns.  Creating the db cursor should not happen inside of the
vlvSearchList lock.

Changes:
Unlock vlvSearchList before creating the db cursor.  This should be safe since
the vlvSearchList is no longer traversed after that point.
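The shape of this change can be sketched as below.  It is a simplified model
under stated assumptions, not the vlv.c diff: the list node, the file name
"demo_index.db" and open_cursor_sketch are invented, and POSIX read-write locks
replace the NSPR lock.  The list is searched under the read lock, the one value
needed afterwards is copied out, and the lock is released before anything that
stands for db access runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical list node; the real structures are richer. */
struct vlv_index_sketch {
    const char *sort_attr;
    const char *db_file;
    struct vlv_index_sketch *next;
};

static struct vlv_index_sketch demo_index = { "roomNumber", "demo_index.db", NULL };
static struct vlv_index_sketch *index_list = &demo_index;
static pthread_rwlock_t index_list_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for creating and using a db cursor.  It succeeds only if
 * nobody holds the list lock, which it checks by briefly taking the
 * write lock itself. */
static bool open_cursor_sketch(const char *db_file)
{
    (void)db_file;
    if (pthread_rwlock_trywrlock(&index_list_lock) != 0)
        return false;
    pthread_rwlock_unlock(&index_list_lock);
    return true;
}

/* Look up the index under the read lock, copy what is needed, unlock,
 * and only then touch the "db". */
bool search_by_attr(const char *sort_attr)
{
    char db_file[64];
    bool found = false;

    assert(sort_attr != NULL);
    pthread_rwlock_rdlock(&index_list_lock);
    for (struct vlv_index_sketch *p = index_list; p != NULL; p = p->next) {
        if (strcmp(p->sort_attr, sort_attr) == 0) {
            snprintf(db_file, sizeof db_file, "%s", p->db_file);
            found = true;
            break;
        }
    }
    pthread_rwlock_unlock(&index_list_lock);   /* release before db access */

    return found && open_cursor_sketch(db_file);
}
```

Copying the value out keeps the later code independent of the list node, which
may be removed by another thread once the lock is released.  How the server
itself keeps the index alive after unlocking is not shown here.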

Comment 23 reinhard nappert 2006-12-07 19:39:33 UTC
This looks much better!!! I let it run for a while, but I am confident that the
bug is fixed.

Thanks

Comment 24 Noriko Hosoi 2006-12-07 21:22:32 UTC
(In reply to comment #23)
> This looks much better!!! I let it run for a while, but I am confident that the
> bug is fixed.
> 
> Thanks

Thank you very much for the report and your testing!

Reviewed by Rich (Thank you, too!)

Checked into HEAD.

Resolves: #183222 Summary: Directory Server hangs when running VLV search and
update operations simultaneously. (Comment#22)
Change: Before creating db cursor, unlock vlvSearchList.
CVS: ----------------------------------------------------------------------
CVS: Modified Files:
CVS:    vlv.c
CVS: ----------------------------------------------------------------------
Checking in vlv.c;
/cvs/dirsec/ldapserver/ldap/servers/slapd/back-ldbm/vlv.c,v  <--  vlv.c
new revision: 1.12; previous revision: 1.11
done