Bug 220127

Summary:	reindex holds vlv lock which makes searches wait
Product:	[Retired] 389	Reporter:	Noriko Hosoi <nhosoi>
Component:	Database - Indexes/Searches	Assignee:	Noriko Hosoi <nhosoi>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Viktor Ashirov <vashirov>
Severity:	high	Docs Contact:
Priority:	high
Version:	1.0.4	CC:	nkinder, rmeggins
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-12-07 16:44:41 UTC	Type:	---
Bug Blocks:	152373, 240316, 427409

Description Noriko Hosoi 2006-12-19 00:59:26 UTC

Description of problem:
RHN engineering is seeing this problem and I was able to reproduce it as well.
Here are the steps...

* Simple install of RHDS (psycho.sfbay.redhat.com root/netscape)
* Imported 700K entries. Took about 7 mins.
* Did an ldapsearch and got results ok
* Turned on indexing (console and via command line) for custom attribute orgid
(the problem is reproducible for any attribute)
* While the server is indexing the data, ldapsearch (even as Dir Manager) does
not get a response. Works fine after indexing is over.

    RHN engineering is unhappy since the outage can last several minutes even if
they do this type of indexing once in a while. I also had a chat with Noriko on
this issue and she mentioned that the backend that is being indexed is not
available.

    Questions: Why does this happen? Is there a workaround to avoid such a long
outage, even for reads?

-Satish. 

Steps to reproduce the problem:
While running db2index, run ldapsearch
$ ./db2index.pl -D "cn=Directory Manager" -w pw -n userRoot -t givenname
$ ./ldapsearch -p <port> -D "cn=Directory Manager" -w pw -b
"dc=sfbay,dc=redhat,dc=com" "(cn=*)" dn

The search does not return since it waits for the vlv lock:
(gdb) bt
#0  0x006287a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x008a3b26 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/libpthread.so.0
#2  0x00424d20 in PR_WaitCondVar (cvar=0x863bd20, timeout=4294967295)
    at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:405
#3  0x0041104c in PR_RWLock_Rlock (rwlock=0x86bdd98)
    at ../../../../../nsprpub/pr/src/threads/prrwlock.c:246
#4  0x0096de38 in vlv_find_index_by_filter (be=0x870f2b0,
    base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", f=0x88f8660) at vlv.c:1854
#5  0x0095350a in filter_candidates (pb=0x87c42f0, be=0x870f2b0,
    base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", f=0x88f8660, nextf=0x0,
    range=0, err=0x4d33564) at filterindex.c:105
#6  0x009863e5 in subtree_candidates (pb=0x87c42f0, be=0x870f2b0,
    base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", e=0x8799478, filter=0x889acc0,
    managedsait=0, allids_before_scopingp=0x4d33680, err=0x4d33564)
    at ldbm_search.c:862
#7  0x00985eb7 in build_candidate_list (pb=0x87c42f0, be=0x870f2b0,
    e=0x8799478, base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", scope=2,
    lookup_returned_allidsp=0x4d33680, candidates=0x4d3370c)
    at ldbm_search.c:656
#8  0x009856ff in ldbm_back_search (pb=0x87c42f0) at ldbm_search.c:415
#9  0x00e5d2c5 in op_shared_search (pb=0x87c42f0, send_result=1)
    at opshared.c:545
#10 0x0805e4e1 in do_search (pb=0x87c42f0) at search.c:276
#11 0x08055f42 in connection_dispatch_operation (conn=0xb6cc1808,
    op=0x8ba3050, pb=0x87c42f0) at connection.c:521
#12 0x080573a2 in connection_threadmain () at connection.c:2146
#13 0x0042c296 in _pt_root (arg=0x88f0160)
    at ../../../../../nsprpub/pr/src/pthreads/ptthread.c:220
#14 0x008a1371 in start_thread () from /lib/tls/libpthread.so.0
#15 0x00708ffe in clone () from /lib/tls/libc.so.6
(gdb) frame 4
#4  0x0096de38 in vlv_find_index_by_filter (be=0x870f2b0,
    base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", f=0x88f8660) at vlv.c:1854
1854            PR_RWLock_Rlock(be->vlvSearchList_lock);

Please note that this occurs even if the server does not have a vlv index and
the reindexing is not vlv related at all...

Comment 1 Noriko Hosoi 2006-12-19 01:29:05 UTC

While reindexing, it holds a write lock on vlv:
[vlv.c]
    int
    ldbm_back_ldbm2index(Slapi_PBlock *pb)
    {
        [...]
        /* Bug 603120: slapd dumps core while indexing and deleting the db at the
         * same time. Now added the lock for the indexing code too.
         */
        vlv_acquire_lock(be);
        while (1) {

Actually, the lock does not need to be a write lock, because the backend is put
into read-only mode while it is being reindexed, so no other thread has a chance
to update it.  I propose to demote the lock in vlv_acquire_lock to a read lock:
[vlv.c]
    void
    vlv_acquire_lock(backend *be)
    {
        LDAPDebug(LDAP_DEBUG_TRACE, "vlv_acquire_lock => trying to acquire the lock\n", 0, 0, 0);
        PR_RWLock_Wlock(be->vlvSearchList_lock);
    }

And it turned out that this bug is a duplicate of bug 171081, "ldapsearch hung at
browsing index creation".

Comment 2 Noriko Hosoi 2006-12-19 01:50:04 UTC

With the change, search works fine and update is blocked while the backend is
being reindexed:
Index: vlv.c
===================================================================
RCS file: /cvs/dirsec/ldapserver/ldap/servers/slapd/back-ldbm/vlv.c,v
retrieving revision 1.12
diff -t -w -U4 -r1.12 vlv.c
--- vlv.c       7 Dec 2006 21:15:00 -0000       1.12
+++ vlv.c       19 Dec 2006 01:41:22 -0000
@@ -2000,9 +2000,9 @@
 void
 vlv_acquire_lock(backend *be)
 {
         LDAPDebug(LDAP_DEBUG_TRACE, "vlv_acquire_lock => trying to acquire the lock\n", 0, 0, 0);
-        PR_RWLock_Wlock(be->vlvSearchList_lock);
+        PR_RWLock_Rlock(be->vlvSearchList_lock);
 }



$ ./ldapsearch -p <port> -D "cn=Directory Manager" -w pw -b
"dc=sfbay,dc=redhat,dc=com" "(cn=*)" dn
dn: uid=EDommety9996, ou=Product Development, dc=sfbay,dc=redhat,dc=com
dn: uid=JSauck9997, ou=Product Testing, dc=sfbay,dc=redhat,dc=com
[...]

$ ./ldapmodify  -p <port> -D "cn=Directory Manager" -w pw
dn: uid=JSauck9997, ou=Product Testing, dc=sfbay,dc=redhat,dc=com
changetype: modify
replace: mail
mail: js

modifying entry uid=JSauck9997, ou=Product Testing, dc=sfbay,dc=redhat,dc=com
ldap_modify: DSA is unwilling to perform
ldap_modify: additional info: database is read-only

Comment 3 Rich Megginson 2006-12-19 03:01:51 UTC

Is it really a duplicate?  Is it possible that we need to acquire a write lock
under some other circumstance, but not reindex?  If so, should we introduce some
flag that lets the code acquire a write lock if not reindexing?  What happens if
you reindex an attribute that is involved in a vlv operation?

If the change is safe under the above circumstances, then I approve.  The next
step would be to find out from the ops team what is their desired resolution to
this issue.  I'm hoping that it is acceptable to shut down for several minutes to
reindex (actually, offline indexing is much faster than online, so downtime
should be even less) because we don't want to have to release another service pack.

Comment 4 Bryan Kearney 2006-12-19 12:32:33 UTC

We would take a short shutdown to re-index, but it would need to be on the
order of 5-10 minutes to add the index. Also, we need to ensure that if we
have replication, there is a strategy to index both nodes in the cluster.

Comment 5 Rich Megginson 2006-12-21 19:17:03 UTC

(In reply to comment #4)
> We would take a short shutdown to re-index, but it would need to be on the
> order of 5-10 minutes to add the index. Also, we need to ensure that if we
> have replication, there is a strategy to index both nodes in the cluster.

10 minutes might be long enough, but you'll have to run some tests to find out
for sure.

Comment 6 Noriko Hosoi 2007-11-27 23:33:14 UTC

The fix for bug 171081 also solved this problem.

So that the fix can be verified using these steps, I am marking this bug as
MODIFIED instead of closing it as a duplicate.
Steps to reproduce the problem:
While running db2index, run ldapsearch
$ ./db2index.pl -D "cn=Directory Manager" -w pw -n userRoot -t givenname
$ ./ldapsearch -p <port> -D "cn=Directory Manager" -w pw -b
"dc=sfbay,dc=redhat,dc=com" "(cn=*)" dn