Hide Forgot
Description of problem: A bind operation tries to update the target entry. To do this it triggers an internal modify that tries to lock the entry into the entry cache. The entry lock, is already acquired (so the bind operation hang). But the owner thread is processing an other operation and can not own the lock Version-Release number of selected component (if applicable): master branch (d61a073 Tests to support ticket 48026) + ipa master branch How reproducible: TBD. (so far no clear test case) Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
Hang description: Thread 24 is trying to update an entry ("uid=stageadm,cn=users,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com") during a bind. The bind is successfull. The owner of the entry (stageadm) in the entry cache is thread 15 that is doing a MOD on a completely different entry (fqdn=vm-yyy.xxx,cn=computers,cn=accounts,dc=idm,dc=lab,dc=eng,dc=brq,dc=redhat,dc=com" It is like thread 25 did and completed and operation that did not release the entry lock. (gdb) thread 24 [Switching to thread 24 (Thread 0x7f97a07f8700 (LWP 20310))] #0 0x00007f97c4ae3590 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) where #0 0x00007f97c4ae3590 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f97c51386b3 in PR_EnterMonitor (mon=0x7f97800cfea0) at ../../../nspr/pr/src/pthreads/ptsynch.c:592 #2 0x00007f97baf0ca5a in cache_lock_entry (cache=cache@entry=0x7f97c95a2b38, e=0x7f9780085f70) at ldap/servers/slapd/back-ldbm/cache.c:1548 #3 0x00007f97baf1dd65 in find_entry_internal_dn (flags=0, txn=0x7f97a07f3320, lock=1, sdn=0x7f97800d0450, be=0x7f97c96c1c30, pb=0x7f97800cf9e0) at ldap/servers/slapd/back-ldbm/findentry.c:155 #4 find_entry_internal (pb=pb@entry=0x7f97800cf9e0, be=0x7f97c96c1c30, lock=lock@entry=1, txn=txn@entry=0x7f97a07f3320, flags=0, addr=<optimized out>, addr=<optimized out>) at ldap/servers/slapd/back-ldbm/findentry.c:293 #5 0x00007f97baf1e03c in find_entry2modify (pb=pb@entry=0x7f97800cf9e0, be=<optimized out>, addr=<optimized out>, txn=txn@entry=0x7f97a07f3320) at ldap/servers/slapd/back-ldbm/findentry.c:324 #6 0x00007f97baf51f86 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:576 #7 0x00007f97c6d45035 in op_shared_modify (pb=pb@entry=0x7f97800cf9e0, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1086 #8 0x00007f97c6d45ad7 in modify_internal_pb (pb=pb@entry=0x7f97800cf9e0) at ldap/servers/slapd/modify.c:631 #9 0x00007f97c6d465e3 in slapi_modify_internal_pb (pb=pb@entry=0x7f97800cf9e0) at ldap/servers/slapd/modify.c:486 #10 0x00007f97bc5f7bc3 in ipalockout_postop (pb=0x7f97a07f7b20) at ipa_lockout.c:630 #11 0x00007f97c6d56abf in plugin_call_func (list=0x7f97c950f930, operation=operation@entry=501, pb=pb@entry=0x7f97a07f7b20, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1949 #12 0x00007f97c6d56de3 in plugin_call_list (pb=0x7f97a07f7b20, operation=501, list=<optimized out>) at ldap/servers/slapd/plugin.c:1893 #13 plugin_call_plugins (pb=pb@entry=0x7f97a07f7b20, whichfunction=whichfunction@entry=501) at ldap/servers/slapd/plugin.c:467 #14 0x00007f97c722328d in do_bind (pb=pb@entry=0x7f97a07f7b20) at ldap/servers/slapd/bind.c:424 #15 0x00007f97c72299d4 in connection_dispatch_operation (pb=0x7f97a07f7b20, op=0x7f97c9a44410, conn=0x7f97b404e130) at ldap/servers/slapd/connection.c:635 #16 connection_threadmain () at ldap/servers/slapd/connection.c:2534 #17 0x00007f97c513dcab in _pt_root (arg=0x7f97c9a8d0e0) at ../../../nspr/pr/src/pthreads/ptthread.c:212 #18 0x00007f97c4ade52a in start_thread () from /lib64/libpthread.so.0 #19 0x00007f97c481a79d in clone () from /lib64/libc.so.6 (gdb) print *pb->pb_op->o_params.p.p_modify.modify_mods[0] $10 = {mod_op = 130, mod_type = 0x7f97800526b0 "krbLastSuccessfulAuth", mod_vals = {modv_strvals = 0x7f9780080090, modv_bvals = 0x7f9780080090}} (gdb) print **pb->pb_op->o_params.p.p_modify.modify_mods[0].mod_vals.modv_bvals $16 = {bv_len = 15, bv_val = 0x7f9780086360 "20150422172417Z"} (gdb) frame 2 #2 0x00007f97baf0ca5a in cache_lock_entry (cache=cache@entry=0x7f97c95a2b38, e=0x7f9780085f70) at ldap/servers/slapd/back-ldbm/cache.c:1548 1548 PR_EnterMonitor(e->ep_mutexp); (gdb) print e->ep_mutexp->owner $20 = 140289133967104 -> 0x7f979bfef700 Thread 15 (Thread 0x7f979bfef700 (LWP 20319)): #0 0x00007f97c4ae3590 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f97c51386b3 in PR_EnterMonitor (mon=0x7f97c95a4b80) at ../../../nspr/pr/src/pthreads/ptsynch.c:592 #2 0x00007f97baf11612 in dblayer_lock_backend (be=<optimized out>) at ldap/servers/slapd/back-ldbm/dblayer.c:3978 #3 0x00007f97baf165b6 in dblayer_txn_begin (be=0x7f97c96c1c30, parent_txn=0x0, txn=txn@entry=0x7f979bfea6f0) at ldap/servers/slapd/back-ldbm/dblayer.c:3697 #4 0x00007f97baf51995 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:560 #5 0x00007f97c6d45035 in op_shared_modify (pb=pb@entry=0x7f979bfeeb20, pw_change=pw_change@entry=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1086 #6 0x00007f97c6d46337 in do_modify (pb=pb@entry=0x7f979bfeeb20) at ldap/servers/slapd/modify.c:419 #7 0x00007f97c7229985 in connection_dispatch_operation (pb=0x7f979bfeeb20, op=0x7f97c9b8f6d0, conn=0x7f97b4051310) at ldap/servers/slapd/connection.c:660 #8 connection_threadmain () at ldap/servers/slapd/connection.c:2534 #9 0x00007f97c513dcab in _pt_root (arg=0x7f97c9aaf5e0) at ../../../nspr/pr/src/pthreads/ptthread.c:212 #10 0x00007f97c4ade52a in start_thread () from /lib64/libpthread.so.0 #11 0x00007f97c481a79d in clone () from /lib64/libc.so.6
Created attachment 1017562 [details] error log
Created attachment 1017563 [details] access log
Created attachment 1017564 [details] pstack
Hi Tierry, could you put the version of 389-ds-base? Since you opened it for RHDS10, it is on RHEL-7?
Humm. Hi Noriko, I had a doubt, either open a ticket or a bz. I did my test on F21 platform with a private build (389-ds-base-devel-1.3.4.a1-20150419103346.git1bca52e.fc21.x86_64) done with master branch (HEAD 'd61a073 Tests to support ticket 48026') + 1209573. I understand the product/component/version does not match the test I did. What should I change in this those settings ? thanks theirry
Thanks for the clarification, Theirry. Then, please open an upstream ticket. We could talk about the target for the ticket, but you want to fix it in 1.3.4, don't you? Regarding this bugzilla, I'll turn it into rhel-7.2. So, could you open a ticket ane let me know the ticket number? Thanks!
Hi Thierry, do you think we still have this problem? Thanks, --noriko
Hi Noriko, sorry for the very long delay of my answer. The upstream ticket 48165 is very likely a duplicate of 47978. In both cases, a thread may forget to unlock an entry in the entry cache (if by "chance" the entry gets out of the cache between the betxn_postop and the end of the backend operation. thanks theirry
(In reply to thierry bordaz from comment #10) > Hi Noriko, > > sorry for the very long delay of my answer. > The upstream ticket 48165 is very likely a duplicate of 47978. > In both cases, a thread may forget to unlock an entry in the entry cache (if > by "chance" the entry gets out of the cache between the betxn_postop and the > end of the backend operation. > > thanks > theirry Thank you for the good news, Thierry! One down. ;) *** This bug has been marked as a duplicate of bug 1273550 ***