Bug 183222
| Field | Value |
|---|---|
| Summary | Directory Server hangs when running VLV search and update operations simultaneously. |
| Product | [Retired] 389 |
| Component | Directory Server |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | medium |
| Version | 1.0.2 |
| Hardware | All |
| OS | Linux |
| Reporter | reinhard nappert <rnappert> |
| Assignee | Noriko Hosoi <nhosoi> |
| QA Contact | Viktor Ashirov <vashirov> |
| CC | nhosoi, nkinder, rmeggins |
| Doc Type | Bug Fix |
| Last Closed | 2015-12-07 16:34:43 UTC |
| Bug Blocks | 152373, 240316, 427409 |

Description
reinhard nappert
2006-02-27 16:36:35 UTC
Whenever you remove the browsing index objects (vlvSearch and vlvIndex objects):

    dn: cn=vlvSearch,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
    objectClass: top
    objectClass: vlvSearch
    cn: vlvSearcht
    vlvBase: o=test
    vlvScope: 1
    vlvFilter: (objectclass=*)

    dn: cn=vlvIndex,cn=userRoot,cn=ldbm database,cn=plugins,cn=config
    objectClass: vlvindex
    objectclass: top
    cn: vlvIndex
    vlvUses: 0
    vlvEnable: 1
    vlvSort: -createtimestamp

the server does not hang anymore.

Created attachment 125779: stacktrace showing the deadlock

I could reproduce the problem on the optimized build. But to make it happen on
the debug build, I needed a little trick: the libdb debug build enables more
diagnostic options, which changes the timing, I guess. I built libdb with
CC="cc -g" and without the --enable-debug configure option. Using this "debug"
libdb build, the hang-up could be reproduced.
Cause of the hang-up:
The add thread (thread 13) starts a transaction (which works as a db lock), then
tries to acquire the write lock on vlvSearchList_lock (one per backend), which
is already read-locked by the search thread.
The search thread (thread 14) acquires the read lock on vlvSearchList_lock, then
happens to read the page being updated, which is already locked by the add
thread.
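
The lock-order inversion described above can be modeled in a few lines. This is only an illustrative sketch, not the server code: `page_lock` and `list_lock` are hypothetical stand-ins for the db page lock and vlvSearchList_lock, plain mutexes replace the real transaction and reader-writer lock, and timeouts are used so the model reports the stall instead of hanging.

```python
import threading


def run_inverted_order(timeout=0.2):
    """Model two threads that take the same two locks in opposite order."""
    page_lock = threading.Lock()  # stand-in for the db page lock (assumption)
    list_lock = threading.Lock()  # stand-in for vlvSearchList_lock (assumption)
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            # Make sure each thread already holds its first lock.
            both_hold_first.wait()
            # Try the other thread's lock; in the real server this blocks forever.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            # Keep holding the first lock until both attempts have finished.
            both_tried_second.wait()
            if acquired:
                second.release()

    add = threading.Thread(target=worker, args=("add", page_lock, list_lock))
    search = threading.Thread(target=worker, args=("search", list_lock, page_lock))
    add.start()
    search.start()
    add.join()
    search.join()
    return got_second


if __name__ == "__main__":
    print(run_inverted_order())
```

Neither thread can obtain its second lock while the other still holds it, which is the same shape as the add/search hang in the stack trace.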
The vlv implementation needs the lock on vlvSearchList_lock to protect
vlvSearchList (linked list per backend) and the vlvIndex lists (linked list
pointed to by vlvSearch->vlv_index). To avoid the deadlock, we should prohibit
taking the write lock (vlvSearchList_lock) inside a transaction. Following is
the list of functions which try to acquire the write lock. Most functions are
implemented for manipulating vlvSearchList and/or the vlvIndex list, except
vlv_update_all_indexes, which traverses the linked list and updates the vlv
index in the db. It does not update the linked list itself.
[Functions which call PR_RWLock_Wlock]
/* Callback to add a new VLV Search specification. Added write lock.*/
vlv_AddSearchEntry
==> vlvSearch_addtolist (no access to db)
/* Callback to add a new VLV Index specification. Added write lock.*/
vlv_AddIndexEntry
==> vlvSearch_addIndex (no access to db)
/* Callback to delete a VLV Index specification. */
vlv_DeleteSearchEntry
==> vlvSearch_removefromlist (no access to db)
/* Look at a new entry, and the set of VLV searches, and see whether
there are any which have deferred initialization and which can now
be initialized given the new entry. Added write lock. */
vlv_grok_new_import_entry
==> vlvSearch_reinit
(no search list nor vlv index list change; initialize search entry)
--> (called from import_foreman only:
no transaction, no search at the same time)
/*
* Search for the VLV entries which describe the pre-computed indexes we
* support. Register administration DSE callback functions.
* This is exported to the backend initialisation routine.
* 'inst' may be NULL for non-slapd initialization...
*/
vlv_init
==> vlvSearch_delete (vlvSearch_delete -> vlvIndex_go_offline ->
dblayer_erase_index_file_nolock -> dblayer_close_file)
/* Given an entry modification check if a VLV index needs to be updated.
* This is called for every modifying operation, so it must be very efficient.
*/
vlv_update_all_indexes
==> no vlvSearchList, vlvIndex list update
/* Builds strings from Slapi_DN similar console GUI. Uses those dns to
delete vlvsearch's if they match. New write lock. */
vlv_delete_search_entry
==> vlvSearch_removefromlist
Created attachment 125780: proposed fix ldap/servers/slapd/back-ldbm/vlv.c
Change description:
Demote the write lock to the read lock in vlv_update_all_indexes.
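
Why demoting the lock helps can be seen with a toy reader-writer lock. The sketch below is an assumption-laden model: `RWLock` is a hypothetical minimal class written for this illustration (it is not NSPR's PR_RWLock and has no fairness policy), and `demo` only shows that a second reader is admitted while a writer is not.

```python
import threading


class RWLock:
    """Minimal reader-writer lock for illustration (hypothetical, no fairness)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self, timeout=None):
        with self._cond:
            ok = self._cond.wait_for(lambda: not self._writer, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout=None):
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout
            )
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def demo(timeout=0.1):
    lock = RWLock()
    lock.acquire_read()                        # the search holds the read lock
    write_ok = lock.acquire_write(timeout)     # before the fix: update wants the write lock
    read_ok = lock.acquire_read(timeout)       # after the fix: update takes a read lock
    if read_ok:
        lock.release_read()
    lock.release_read()
    return write_ok, read_ok
```

With a reader already present, the write request times out while the read request succeeds, so an update that only traverses the list no longer waits on a concurrent search.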
Test case:

1) create a vlv index and vlv search index

    ldapadd .....
    dn: cn=roomNumber <suffix>, cn=userRoot, cn=ldbm database, cn=plugins, cn=config
    objectClass: top
    objectClass: vlvSearch
    cn: roomNumber <suffix>
    vlvBase: <suffix>
    vlvScope: 2
    vlvFilter: (objectclass=*)

    dn: cn=by roomNumber <suffix>,cn=roomNumber <suffix>, cn=userRoot, cn=ldbm database, cn=plugins, cn=config
    objectClass: top
    objectClass: vlvIndex
    cn: by roomNumber <suffix>
    vlvSort: roomNumber

===========================================================================
2) import entries which include roomnumber (I imported 10,000 entries)

3) ran vlv search, update, and add simultaneously.

3-1) vlv search

    $ while true; do
        ./ldapsearch -p <port> -D <DirectoryManager> -w <password> -b "<suffix>" -G 10:10:<someroomnumber> -x -S roomNumber "(objectClass=*)" dn roomnumber
      done

3-2) update

    $ while true; do ksh my_mod_script <port>; done
    $ cat my_mod_script
    #!/usr/bin/ksh
    DM=<DirectoryManager>
    DMPA=<password>
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
    dn: <one of the entries which were imported in (2)>
    changetype: modify
    add: roomnumber
    roomnumber: 5491
    EOD0
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD1
    dn: <one of the entries which were imported in (2)>
    changetype: modify
    replace: roomnumber
    roomnumber: 54910
    EOD1
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD2
    dn: <one of the entries which were imported in (2)>
    changetype: modify
    delete: roomnumber
    EOD2

========================================================================
3-3) add

    ./ldapmodify -p <port> -D <DirectoryManager> -w <password> -a -f 40kdata.ldif

The test caused the deadlock reported in Comment #2. The server with the demoted lock ran through without hang-up.

If you could provide us the detailed test cases to verify the problem, we'd appreciate it.

Created attachment 125781: cvs commit message

Reviewed by Pete and Rich (Thank you!)

*DOCS* known bug (DS7.1 SP2 and DS6.21 SP3)
Directory Server could hang when running VLV search and update operations simultaneously.

The bug seems to be fixed in case you perform modifications and adds, like in comment #4. However, if you include deletions, it still hangs after a while (see the following script):

    #!/usr/bin/ksh
    DM=<Directory Manager>
    DMPA=<Password>
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
    dn: o=testorg,<suffix>
    changetype: add
    objectClass: organization
    objectClass: top
    o: testorg
    EOD0
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD1
    dn: o=testorg,<suffix>
    changetype: delete
    EOD1

I perform the following searches:

    ./ldapsearch -D $DM -w $DMPA -p $1 -b <suffix> -G 0:0:0:0 -x -S -createTimestamp objectclass=* createTimestamp modifyTimestamp

The attributes createTimestamp modifyTimestamp are "vlv-indexed". Is there any chance that this could be fixed?

Created attachment 142507: problem description

(In reply to comment #7)
> The bug seems to be fixed in case you perform modifications and adds, like in
> comment #4. However, if you include deletions, it still hangs after a while (see
> the following script):

Thank you for the bug report. Attached is the bug analysis.

Created attachment 142508: cvs diffs

Files:
proto-back-ldbm.h
idl.c
sort.c
vlv.c
Changes:
1. promoted idl_delete to global to make it available in
vlv_trim_candidates_byvalue. In vlv_trim_candidates_byvalue, if any id in the
idlist is found to have no corresponding entry, delete the id from the
idlist and retry the binary search.
2. demoted too noisy error message:
[...] - compare_entries db err -30990
[...] - compare_entries db err -30990
[...]
3. eliminated read-lock from vlv_find_index_by_filter to prevent the deadlock
with the delete operation.
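
The "delete the stale id and retry" idea from change 1 can be sketched as follows. This is a hedged illustration only: `find_first_at_or_after`, the `entries` dict, and the integer sort keys are hypothetical stand-ins, and the real vlv_trim_candidates_byvalue works on the server's idlist and entry cache rather than on Python lists.

```python
def find_first_at_or_after(idlist, entries, target):
    """Binary search over ids ordered by entries[id].

    If an id has no corresponding entry (it was deleted concurrently),
    remove it from idlist and restart the search. Returns the position of
    the first id whose value is >= target, or None if there is none.
    """
    while True:
        lo, hi = 0, len(idlist)
        stale = None
        while lo < hi:
            mid = (lo + hi) // 2
            value = entries.get(idlist[mid])
            if value is None:
                stale = mid          # dangling id: entry no longer exists
                break
            if value < target:
                lo = mid + 1
            else:
                hi = mid
        if stale is None:
            # An empty idlist falls through here and yields None.
            return lo if lo < len(idlist) else None
        del idlist[stale]            # drop the stale id, then retry


if __name__ == "__main__":
    ids = [1, 2, 3, 4]
    entries = {1: 198, 2: 1983, 4: 2008}  # id 3 was deleted meanwhile
    print(find_first_at_or_after(ids, entries, 199), ids)
```

Restarting from scratch after each removal keeps the sketch simple; it assumes stale ids are rare, so the extra passes are cheap.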

What changes be->vlvSearchList? What happens if you remove the locks and some other thread changes be->vlvSearchList in the middle of the loop in vlv_find_index_by_filter()?

Yeah, that's not safe. Then, please consider it's just a proof of concept that the read-lock is the cause of the dead lock... We have to come up with some other way to solve the dead lock...

Ok. Yes, that looks like it is the source of the deadlock.

Created attachment 142537: cvs diff vlv.c

I changed the code to put the read-lock back, but kept the db access code
(cursor operation) outside of the read-lock. The db access code is
independent of the vlvSearchList itself. Thus, a change to the linked list
should not have any impact on it.
I've been running the test for an hour. So far, no deadlock has occurred. I'll
keep running this test until tomorrow...
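
The pattern of holding the list lock only for the traversal and doing the db access afterwards can be sketched like this. All names here are hypothetical (`list_lock`, `search_list`, `find_index_name`, `vlv_lookup`, and the `db_read` callback); a plain mutex stands in for the read side of vlvSearchList_lock, and the sketch assumes whatever is copied out of the list stays valid after the lock is released.

```python
import threading

# Stand-in for vlvSearchList_lock; a plain mutex is enough for this sketch.
list_lock = threading.Lock()
# Stand-in for be->vlvSearchList: each entry maps a filter to an index name.
search_list = [{"filter": "(objectclass=*)", "index": "by roomNumber"}]


def find_index_name(filter_str):
    """Walk the list under the lock and copy out only what is needed."""
    with list_lock:
        for search in search_list:
            if search["filter"] == filter_str:
                return search["index"]
    return None


def vlv_lookup(filter_str, db_read):
    """Resolve the index under the lock, then touch the db without it."""
    index_name = find_index_name(filter_str)  # lock is released on return
    if index_name is None:
        return None
    # Hypothetical db access (cursor operation), done with no list lock held,
    # so it cannot block a thread that needs the list lock inside a transaction.
    return db_read(index_name)
```

A caller passes its own `db_read` function; because the list lock is never held across that call, the only locks involved in the db access are the database's own.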

Ok. That looks good. Presumably the regular db locking will handle the case where updates happen to the records within the cursor scope.

Created attachment 142607: cvs diffs

Thank you, Rich!
In addition to the changes made to vlv.c, I added more error checking (stop
the binary search if the idlist is empty) and added a ber_bvecfree call before
the "not found" error return in vlv_trim_candidates_byvalue.

[testcase]
0) Setting up vlv index

    dn: cn=roomNumber ou=Accounting dc=example dc=com, cn=userRoot, cn=ldbm database, cn=plugins, cn=config
    objectClass: top
    objectClass: vlvSearch
    cn: roomNumber ou=Accounting dc=example dc=com
    vlvBase: ou=Accounting, dc=example,dc=com
    vlvScope: 2
    vlvFilter: (objectclass=*)

    dn: cn=by roomNumber ou=Accounting dc=example dc=com,cn=roomNumber ou=Accounting dc=example dc=com, cn=userRoot, cn=ldbm database, cn=plugins, cn=config
    objectClass: top
    objectClass: vlvIndex
    cn: by roomNumber ou=Accounting dc=example dc=com
    vlvSort: roomNumber

----------------------------------------------------------
1) import data which contain these roomNumber attribute values for the test scripts below to change the sort order.

    roomNumber: 198
    roomNumber: 1983
    roomNumber: 199
    roomNumber: 2008

----------------------------------------------------------
Run the following 3 scripts in the endless loop simultaneously.

2) add and delete an entry

    #!/usr/bin/ksh
    DM="cn=Directory Manager"
    DMPA="password"
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
    dn: uid=tuser0, ou=Accounting, dc=example,dc=com
    changetype: add
    objectClass: top
    objectClass: person
    objectClass: organizationalPerson
    objectClass: inetOrgPerson
    cn: Test User0
    sn: User0
    uid: tuser0
    givenName: Test
    roomNumber: 1999
    EOD0
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD1
    dn: uid=tuser0, ou=Accounting, dc=example,dc=com
    changetype: delete
    EOD1

----------------------------------------------------------
3) replace + delete + add the vlv indexed attribute

    #!/usr/bin/ksh
    DM="cn=Directory Manager"
    DMPA="password"
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
    dn: uid=tuser0, ou=Accounting, dc=example,dc=com
    changetype: modify
    replace: roomNumber
    roomNumber: 1981
    EOD0
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
    dn: uid=tuser0, ou=Accounting, dc=example,dc=com
    changetype: modify
    delete: roomNumber
    EOD0
    ./ldapmodify -p $1 -D "$DM" -w $DMPA << EOD0
    dn: uid=tuser0, ou=Accounting, dc=example,dc=com
    changetype: modify
    add: roomNumber
    roomNumber: 2000
    EOD0

----------------------------------------------------------
4) vlv search (using some other existing user)

    ldapsearch -p $1 -D 'uid=TVradmin0, ou=Accounting, dc=example,dc=com' -w "TVradmin0" -b "ou=Accounting,dc=example,dc=com" -s one -G 2:2:199 -x -S roomNumber "(objectclass=*)" dn roomNumber

----------------------------------------------------------
Ran this test for more than 12 hours. No deadlock nor crash was observed.

Created attachment 142626: cvs commit message

Reviewed by Rich (Thank you!)
Checked in into HEAD.

(In reply to comment #18)
> Created an attachment (id=142626)
> cvs commit message
>
> Reviewed by Rich (Thank you!)
>
> Checked in into HEAD.

Again, I used the scripts from Comment #7 and it still deadlocks. I did not run the scripts from #17, but I saw that you perform a one-level search in step 4, while the vlvSearch object was configured with Scope 2 (sub-tree search).

Does it make a difference that my request does not trim the candidates by value, but by index (./ldapsearch ... -G 0:0:0:0 ...)? I guess, it does.

(In reply to comment #20)
> Does it make a difference that my request does not trim the candidates by value,
> but by index (./ldapsearch ... -G 0:0:0:0 ...)? I guess, it does.

Thank you for your report. My bad. I don't know why my test case had "-s one"... After removing it, I could reproduce the problem and found a similar source of deadlock. I ran the test (2 vlv searches + 1 delete + 1 update) last night and I did not see the deadlock. I'm attaching the diff in the next Comment for review.

Created attachment 143078: cvs diff vlv.c

File:
back-ldbm/vlv.c
Problem description:
There was another source of deadlock.
vlv_build_candidate_list creates a db cursor. The current code locks the
vlvSearchList, calls vlv_build_candidate_list, then unlocks it after the
function returns. Creating the db cursor should not happen inside of the
vlvSearchList lock.
Changes:
Before creating the db cursor, unlock vlvSearchList. This should be safe since
the vlvSearchList is not traversed after that point.

This looks much better!!! I let it run for a while, but I am confident that the bug is fixed.

Thanks

(In reply to comment #23)
> This looks much better!!! I let it run for a while, but I am confident that the
> bug is fixed.
>
> Thanks

Thank you very much for the report and your testing!

Reviewed by Rich (Thank you, too!)

Checked in into HEAD.

Resolves: #183222
Summary: Directory Server hangs when running VLV search and update operations simultaneously. (Comment#22)
Change: Before creating db cursor, unlock vlvSearchList.
CVS: ----------------------------------------------------------------------
CVS: Modified Files:
CVS: vlv.c
CVS: ----------------------------------------------------------------------
Checking in vlv.c;
/cvs/dirsec/ldapserver/ldap/servers/slapd/back-ldbm/vlv.c,v <-- vlv.c
new revision: 1.12; previous revision: 1.11
done