Bug 220127
| Summary: | reindex holds vlv lock which makes searches wait | | |
|---|---|---|---|
| Product: | [Retired] 389 | Reporter: | Noriko Hosoi <nhosoi> |
| Component: | Database - Indexes/Searches | Assignee: | Noriko Hosoi <nhosoi> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Viktor Ashirov <vashirov> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 1.0.4 | CC: | nkinder, rmeggins |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-12-07 16:44:41 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 152373, 240316, 427409 | | |
While reindexing, it holds a write lock on vlv:

```c
[vlv.c]
1249 int
1250 ldbm_back_ldbm2index(Slapi_PBlock *pb)
1251 {
[...]
1493     /* Bug 603120: slapd dumps core while indexing and deleting the db at the
1494      * same time. Now added the lock for the indexing code too.
1495      */
1496     vlv_acquire_lock(be);
1497     while (1) {
```
Actually, the lock does not need to be a write lock: while reindexing, the
backend is put into read-only mode, so no other thread has a chance to update
the backend. I propose demoting the lock in vlv_acquire_lock to a read lock:
```c
[vlv.c]
2000 void
2001 vlv_acquire_lock(backend *be)
2002 {
2003     LDAPDebug(LDAP_DEBUG_TRACE, "vlv_acquire_lock => trying to acquire the lock\n", 0, 0, 0);
2004     PR_RWLock_Wlock(be->vlvSearchList_lock);
2005 }
```
And it turned out this bug is a duplicate of bug [171081] ldapsearch hung at
browsing index creation.
With the change, search works fine and update is blocked while the backend is
being reindexed:
```diff
Index: vlv.c
===================================================================
RCS file: /cvs/dirsec/ldapserver/ldap/servers/slapd/back-ldbm/vlv.c,v
retrieving revision 1.12
diff -t -w -U4 -r1.12 vlv.c
--- vlv.c	7 Dec 2006 21:15:00 -0000	1.12
+++ vlv.c	19 Dec 2006 01:41:22 -0000
@@ -2000,9 +2000,9 @@
 void
 vlv_acquire_lock(backend *be)
 {
     LDAPDebug(LDAP_DEBUG_TRACE, "vlv_acquire_lock => trying to acquire the lock\n", 0, 0, 0);
-    PR_RWLock_Wlock(be->vlvSearchList_lock);
+    PR_RWLock_Rlock(be->vlvSearchList_lock);
 }
```
```
$ ./ldapsearch -p <port> -D "cn=Directory Manager" -w pw -b "dc=sfbay,dc=redhat,dc=com" "(cn=*)" dn
dn: uid=EDommety9996, ou=Product Development, dc=sfbay,dc=redhat,dc=com
dn: uid=JSauck9997, ou=Product Testing, dc=sfbay,dc=redhat,dc=com
[...]
$ ./ldapmodify -p <port> -D "cn=Directory Manager" -w pw
dn: uid=JSauck9997, ou=Product Testing, dc=sfbay,dc=redhat,dc=com
changetype: modify
replace: mail
mail: js

modifying entry uid=JSauck9997, ou=Product Testing, dc=sfbay,dc=redhat,dc=com
ldap_modify: DSA is unwilling to perform
ldap_modify: additional info: database is read-only
```
Is it really a duplicate? Is it possible that we need to acquire a write lock under some other circumstance, but not reindex? If so, should we introduce some flag that lets the code acquire a write lock when not reindexing? What happens if you reindex an attribute that is involved in a vlv operation? If the change is safe under the above circumstances, then I approve.

The next step would be to find out from the ops team what their desired resolution to this issue is. I'm hoping that it is acceptable to shut down for several minutes to reindex (offline indexing is actually much faster than online, so downtime should be even less), because we don't want to have to release another service pack.

We would take a short shutdown to re-index, but it would need to be on the order of 5-10 minutes to add the index. Also, we need to ensure that if we have replication, there is a strategy to index both nodes in the cluster.

(In reply to comment #4)
> We would take a short shutdown to re-index, but it would need to be on the
> order of 5-10 minutes to add the index. Also, we need to ensure that if we
> have replication, there is a strategy to index both nodes in the cluster.

10 minutes might be long enough, but you'll have to run some tests to find out for sure.

The fix for bug 171081 also solved this problem. To verify using these steps, marking as MODIFIED instead of duplicate.

Steps to reproduce the problem: while running db2index, run ldapsearch:

```
$ ./db2index.pl -D "cn=Directory Manager" -w pw -n userRoot -t givenname
$ ./ldapsearch -p <port> -D "cn=Directory Manager" -w pw -b "dc=sfbay,dc=redhat,dc=com" "(cn=*)" dn
```
Description of problem: RHN engineering is seeing this problem and I was able to reproduce it as well. Here are the steps...

* Simple install of RHDS (psycho.sfbay.redhat.com root/netscape)
* Imported 700K entries. Took about 7 mins.
* Did an ldapsearch and got results OK.
* Turned on indexing (console and via command line) for the custom attribute orgid (the problem is reproducible for any attribute).
* While the server is indexing the data, ldapsearch (even as Directory Manager) does not get a response. Works fine after indexing is over.

RHN engineering is unhappy since the outage can last several minutes even if they do this type of indexing only once in a while. I also had a chat with Noriko on this issue and she mentioned that the backend that is being indexed is not available.

Questions: Why does this happen? Is there a workaround to avoid such a long outage, even for reads?

-Satish.

Steps to reproduce the problem: while running db2index, run ldapsearch:

```
$ ./db2index.pl -D "cn=Directory Manager" -w pw -n userRoot -t givenname
$ ./ldapsearch -p <port> -D "cn=Directory Manager" -w pw -b "dc=sfbay,dc=redhat,dc=com" "(cn=*)" dn
```

The search does not return since it waits for the vlv lock:

```
(gdb) bt
#0  0x006287a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x008a3b26 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x00424d20 in PR_WaitCondVar (cvar=0x863bd20, timeout=4294967295) at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:405
#3  0x0041104c in PR_RWLock_Rlock (rwlock=0x86bdd98) at ../../../../../nsprpub/pr/src/threads/prrwlock.c:246
#4  0x0096de38 in vlv_find_index_by_filter (be=0x870f2b0, base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", f=0x88f8660) at vlv.c:1854
#5  0x0095350a in filter_candidates (pb=0x87c42f0, be=0x870f2b0, base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", f=0x88f8660, nextf=0x0, range=0, err=0x4d33564) at filterindex.c:105
#6  0x009863e5 in subtree_candidates (pb=0x87c42f0, be=0x870f2b0, base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", e=0x8799478, filter=0x889acc0, managedsait=0, allids_before_scopingp=0x4d33680, err=0x4d33564) at ldbm_search.c:862
#7  0x00985eb7 in build_candidate_list (pb=0x87c42f0, be=0x870f2b0, e=0x8799478, base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", scope=2, lookup_returned_allidsp=0x4d33680, candidates=0x4d3370c) at ldbm_search.c:656
#8  0x009856ff in ldbm_back_search (pb=0x87c42f0) at ldbm_search.c:415
#9  0x00e5d2c5 in op_shared_search (pb=0x87c42f0, send_result=1) at opshared.c:545
#10 0x0805e4e1 in do_search (pb=0x87c42f0) at search.c:276
#11 0x08055f42 in connection_dispatch_operation (conn=0xb6cc1808, op=0x8ba3050, pb=0x87c42f0) at connection.c:521
#12 0x080573a2 in connection_threadmain () at connection.c:2146
#13 0x0042c296 in _pt_root (arg=0x88f0160) at ../../../../../nsprpub/pr/src/pthreads/ptthread.c:220
#14 0x008a1371 in start_thread () from /lib/tls/libpthread.so.0
#15 0x00708ffe in clone () from /lib/tls/libc.so.6
(gdb) frame 4
#4  0x0096de38 in vlv_find_index_by_filter (be=0x870f2b0, base=0x8791dd0 "dc=sfbay,dc=redhat,dc=com", f=0x88f8660) at vlv.c:1854
1854            PR_RWLock_Rlock(be->vlvSearchList_lock);
```

Please note that this occurs even if the server does not have a vlv index and the reindexing is not vlv related at all.