Bug 1198263

Summary: Limit deadlocks between DS plugin DNA and slapi-nis
Product: Red Hat Enterprise Linux 7 Reporter: Martin Kosek <mkosek>
Component: ipaAssignee: thierry bordaz <tbordaz>
Status: CLOSED ERRATA QA Contact: Namita Soman <nsoman>
Severity: high Docs Contact:
Priority: medium    
Version: 7.0CC: drieden, jcholast, jkurik, rcritten, spoore
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ipa-4.2.0-0.1.alpha1.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1199128 (view as bug list) Environment:
Last Closed: 2015-11-19 12:01:42 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1199128    

Description Martin Kosek 2015-03-03 16:47:21 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/freeipa/ticket/4927

The deadlock was occurs between two threads acquiring locks in the opposite order

There is a deadlock between thread 33 and 30:

{{{
(gdb) thread 33
[Switching to thread 33 (Thread 0x7f505effd700 (LWP 17961))]
#0  0x00007f507d7e606e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
(gdb) where
#0  0x00007f507d7e606e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00007f507256fba3 in backend_shr_add_cb.part.13 () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x00007f507256fd51 in backend_shr_betxn_post_add_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x00007f507fa6ddb0 in plugin_call_func (list=0x7f5080980230, operation=operation@entry=560, pb=pb@entry=0x7f5054109b10, 
    call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1952
#4  0x00007f507fa6e008 in plugin_call_list (pb=0x7f5054109b10, operation=560, list=<optimized out>)
    at ldap/servers/slapd/plugin.c:1886
#5  plugin_call_plugins (pb=pb@entry=0x7f5054109b10, whichfunction=whichfunction@entry=560) at ldap/servers/slapd/plugin.c:459
#6  0x00007f50743140e4 in ldbm_back_add (pb=0x7f5054109b10) at ldap/servers/slapd/back-ldbm/ldbm_add.c:1170
#7  0x00007f507fa14d8a in op_shared_add (pb=pb@entry=0x7f5054109b10) at ldap/servers/slapd/add.c:735
#8  0x00007f507fa155e3 in add_internal_pb (pb=pb@entry=0x7f5054109b10) at ldap/servers/slapd/add.c:434
#9  0x00007f507fa162c3 in slapi_add_internal_pb (pb=pb@entry=0x7f5054109b10) at ldap/servers/slapd/add.c:356
#10 0x00007f5075fe65fa in dna_update_shared_config (config_entry=config_entry@entry=0x7f505410e0f0)
    at ldap/servers/plugins/dna/dna.c:2592
#11 0x00007f5075fe94a3 in dna_update_config_event (event_time=<optimized out>, arg=<optimized out>)
    at ldap/servers/plugins/dna/dna.c:1556
#12 0x00007f507fa360da in eq_call_all () at ldap/servers/slapd/eventq.c:312
#13 eq_loop (arg=<optimized out>) at ldap/servers/slapd/eventq.c:359
#14 0x00007f507de41740 in _pt_root () from /lib64/libnspr4.so
#15 0x00007f507d7e2df3 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f507d5103dd in clone () from /lib64/libc.so.6

(gdb) thread 30
[Switching to thread 30 (Thread 0x7f505d355700 (LWP 17988))]
#0  0x00007f507d7e6705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) where
#0  0x00007f507d7e6705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5078332213 in __db_hybrid_mutex_suspend () from /lib64/libdb-5.3.so
#2  0x00007f50783315f8 in __db_tas_mutex_lock () from /lib64/libdb-5.3.so
#3  0x00007f50783dbc0a in __lock_get_internal () from /lib64/libdb-5.3.so
#4  0x00007f50783dc6f0 in __lock_get () from /lib64/libdb-5.3.so
#5  0x00007f5078408032 in __db_lget () from /lib64/libdb-5.3.so
#6  0x00007f507834f515 in __bam_search () from /lib64/libdb-5.3.so
#7  0x00007f507833a176 in __bamc_search () from /lib64/libdb-5.3.so
#8  0x00007f507833bc2f in __bamc_get () from /lib64/libdb-5.3.so
#9  0x00007f50783f4b76 in __dbc_iget () from /lib64/libdb-5.3.so
#10 0x00007f50784039f2 in __dbc_get_pp () from /lib64/libdb-5.3.so
#11 0x00007f50742ff2e0 in idl_new_fetch (be=0x7f5080af48d0, db=<optimized out>, inkey=0x7f505d343110, txn=0x0, a=0x7f5080b71670, 
    flag_err=0x7f505d349b6c, allidslimit=100000) at ldap/servers/slapd/back-ldbm/idl_new.c:231
#12 0x00007f50742fef65 in idl_fetch_ext (be=be@entry=0x7f5080af48d0, db=<optimized out>, key=key@entry=0x7f505d343110, 
    txn=txn@entry=0x0, a=<optimized out>, err=err@entry=0x7f505d349b6c, allidslimit=allidslimit@entry=100000)
    at ldap/servers/slapd/back-ldbm/idl_shim.c:130
#13 0x00007f507430d746 in index_read_ext_allids (pb=pb@entry=0x7f50600a9be0, be=be@entry=0x7f5080af48d0, 
    type=type@entry=0x7f50600f2a60 "objectClass", indextype=indextype@entry=0x7f5074353e3f "eq", val=<optimized out>, 
    txn=txn@entry=0x7f505d3473e0, err=err@entry=0x7f505d349b6c, unindexed=unindexed@entry=0x7f505d3473d4, 
    allidslimit=allidslimit@entry=100000) at ldap/servers/slapd/back-ldbm/index.c:1068
#14 0x00007f50742f7f34 in keys2idl (pb=pb@entry=0x7f50600a9be0, be=be@entry=0x7f5080af48d0, type=0x7f50600f2a60 "objectClass", 
    indextype=indextype@entry=0x7f5074353e3f "eq", ivals=<optimized out>, err=err@entry=0x7f505d349b6c, 
    unindexed=unindexed@entry=0x7f505d3473d4, txn=txn@entry=0x7f505d3473e0, allidslimit=allidslimit@entry=100000)
    at ldap/servers/slapd/back-ldbm/filterindex.c:1003
#15 0x00007f50742f8693 in ava_candidates (pb=pb@entry=0x7f50600a9be0, be=be@entry=0x7f5080af48d0, f=f@entry=0x7f50600a8b10, 
    ftype=<optimized out>, err=0x7f505d349b6c, allidslimit=100000, range=0, nextf=0x0)
    at ldap/servers/slapd/back-ldbm/filterindex.c:317
#16 0x00007f50742f8c82 in filter_candidates_ext (pb=pb@entry=0x7f50600a9be0, be=be@entry=0x7f5080af48d0, 
    base=base@entry=0x7f5060007780 "cn=users,cn=accounts,dc=testrelm,dc=test", f=f@entry=0x7f50600a8b10, nextf=nextf@entry=0x0, 
    range=range@entry=0, err=err@entry=0x7f505d349b6c, allidslimit=allidslimit@entry=100000)
    at ldap/servers/slapd/back-ldbm/filterindex.c:140
#17 0x00007f50742f9cf6 in list_candidates (pb=pb@entry=0x7f50600a9be0, be=be@entry=0x7f5080af48d0, 
    base=base@entry=0x7f5060007780 "cn=users,cn=accounts,dc=testrelm,dc=test", flist=flist@entry=0x7f50600009d0, 
    ftype=<optimized out>, err=0x7f505d349b6c, allidslimit=100000) at ldap/servers/slapd/back-ldbm/filterindex.c:837
#18 0x00007f50742f8bf0 in filter_candidates_ext (pb=pb@entry=0x7f50600a9be0, be=be@entry=0x7f5080af48d0, 
    base=base@entry=0x7f5060007780 "cn=users,cn=accounts,dc=testrelm,dc=test", f=0x7f50600009d0, nextf=nextf@entry=0x0, 
    range=range@entry=0, err=err@entry=0x7f505d349b6c, allidslimit=allidslimit@entry=100000)
    at ldap/servers/slapd/back-ldbm/filterindex.c:173
#19 0x00007f507433413c in subtree_candidates (pb=pb@entry=0x7f50600a9be0, be=0x7f5080af48d0, 
    base=base@entry=0x7f5060007780 "cn=users,cn=accounts,dc=testrelm,dc=test", e=e@entry=0x7f5080cab650, filter=0x7f50600a8b10, 
    managedsait=0, allids_before_scopingp=allids_before_scopingp@entry=0x7f505d349b5c, err=err@entry=0x7f505d349b6c)
    at ldap/servers/slapd/back-ldbm/ldbm_search.c:1218
#20 0x00007f5074335757 in build_candidate_list (candidates=0x7f505d349b98, lookup_returned_allidsp=0x7f505d349b5c, 
    scope=<optimized out>, base=0x7f5060007780 "cn=users,cn=accounts,dc=testrelm,dc=test", e=<optimized out>, be=<optimized out>, 
    pb=0x7f50600a9be0) at ldap/servers/slapd/back-ldbm/ldbm_search.c:1025
#21 ldbm_back_search (pb=0x7f50600a9be0) at ldap/servers/slapd/back-ldbm/ldbm_search.c:686
#22 0x00007f507fa61b39 in op_shared_search (pb=pb@entry=0x7f50600a9be0, send_result=send_result@entry=1)
    at ldap/servers/slapd/opshared.c:823
#23 0x00007f507fa71d0e in search_internal_callback_pb (pb=0x7f50600a9be0, callback_data=<optimized out>, prc=0x0, 
    psec=0x7f507256ea20 <backend_shr_set_entry_cb>, prec=0x0) at ldap/servers/slapd/plugin_internal_op.c:812
#24 0x00007f5072572d58 in backend_shr_set_config_entry_add () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#25 0x00007f507256decd in backend_set_config_entry_add_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#26 0x00007f50725721d7 in backend_shr_modify_cb.part.19 () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#27 0x00007f5072572381 in backend_shr_betxn_post_modify_cb () from /usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#28 0x00007f507fa6ddb0 in plugin_call_func (list=0x7f5080980230, operation=operation@entry=561, pb=pb@entry=0x7f505d354ae0, 
    call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:1952
#29 0x00007f507fa6e008 in plugin_call_list (pb=0x7f505d354ae0, operation=561, list=<optimized out>)
    at ldap/servers/slapd/plugin.c:1886
#30 plugin_call_plugins (pb=pb@entry=0x7f505d354ae0, whichfunction=whichfunction@entry=561) at ldap/servers/slapd/plugin.c:459
#31 0x00007f507fa2a925 in dse_modify (pb=0x7f505d354ae0) at ldap/servers/slapd/dse.c:2026
#32 0x00007f507fa5af51 in op_shared_modify (pb=pb@entry=0x7f505d354ae0, pw_change=pw_change@entry=0, old_pw=0x0)
    at ldap/servers/slapd/modify.c:1086
#33 0x00007f507fa5c29f in do_modify (pb=pb@entry=0x7f505d354ae0) at ldap/servers/slapd/modify.c:419
#34 0x00007f507ff3b371 in connection_dispatch_operation (pb=0x7f505d354ae0, op=0x7f5080f6bcf0, conn=0x7f505e35d800)
    at ldap/servers/slapd/connection.c:660
#35 connection_threadmain () at ldap/servers/slapd/connection.c:2534
#36 0x00007f507de41740 in _pt_root () from /lib64/libnspr4.so
#37 0x00007f507d7e2df3 in start_thread () from /lib64/libpthread.so.0
#38 0x00007f507d5103dd in clone () from /lib64/libc.so.6
}}}

Thread 33 was updating the DNA shared config entry but trigger a ADD and entry "dnaHostname=vm-idm-002.testrelm.test+dnaPortNum=0,cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=testrelm,dc=test"

{{{
(gdb) print *config_entry
$3 = {list = {next = 0x7f505416e870, prev = 0x7f505416e870}, 
  dn = 0x7f50541113b0 "cn=posix ids,cn=distributed numeric assignment plugin,cn=plugins,cn=config", types = 0x7f5054117780, 
  prefix = 0x0, filter = 0x7f505416be90 "(|(objectClass=posixAccount)(objectClass=posixGroup)(objectClass=ipaIDobject))", 
  slapi_filter = 0x7f505416a860, generate = 0x7f50541177f0 "-1", scope = 0x7f5054117690 "dc=testrelm,dc=test", interval = 1, 
  threshold = 500, shared_cfg_base = 0x7f5054111410 "cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=testrelm,dc=test", 
  shared_cfg_dn = 0x7f505416b280 "dnaHostname=vm-idm-002.testrelm.test+dnaPortNum=0,cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=testrelm,dc=test", remote_binddn = 0x0, remote_bindpw = 0x0, timeout = 600000, lock = 0x0, nextval = 1998600011, maxval = 1998799999, 
  remaining = 199989, next_range_lower = 0, next_range_upper = 0, extend_lock = 0x0, extend_in_progress = 0}
(gdb) print *e
$4 = {e_sdn = {flag = 6 '\006', udn = 0x0, 
    dn = 0x7f505416bf20 "dnaHostname=vm-idm-002.testrelm.test+dnaPortNum=0,cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=testrelm,dc=test", 
    ndn = 0x7f505416a0e0 "dnahostname=vm-idm-002.testrelm.test+dnaportnum=0,cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=testrelm,dc=test", ndn_len = 103}, e_srdn = {flag = 0 '\000', rdn = 0x7f5054116820 "dnaHostname=vm-idm-002.testrelm.test+dnaPortNum=0", rdns = 0x0, 
    butcheredupto = -1, nrdn = 0x7f505410ed20 "dnahostname=vm-idm-002.testrelm.test+dnaportnum=0", all_rdns = 0x7f5054000bb0, 
    all_nrdns = 0x7f505416aa30}, e_uniqueid = 0x7f5054098ec0 "693cb98a-b56111e4-9d638bf8-211e875e", e_dncsnset = 0x7f5054115430, 
  e_maxcsn = 0x7f5054115bf0, e_attrs = 0x7f505416aed0, e_deleted_attrs = 0x0, e_virtual_attrs = 0x0, e_virtual_watermark = 0, 
  e_virtual_lock = 0x7f505416af80, e_extension = 0x7f50540beea0, e_flags = 0 '\000', e_aux_attrs = 0x0}
}}}

Thread 30 was modifying the schema-compat config entry:
{{{
(gdb) print *ecc
$2 = {e_sdn = {flag = 6 '\006', udn = 0x0, dn = 0x7f506016b980 "cn=users,cn=Schema Compatibility,cn=plugins,cn=config", 
    ndn = 0x7f5060172780 "cn=users,cn=schema compatibility,cn=plugins,cn=config", ndn_len = 53}, e_srdn = {flag = 0 '\000', 
    rdn = 0x0, rdns = 0x0, butcheredupto = -1, nrdn = 0x0, all_rdns = 0x0, all_nrdns = 0x0}, e_uniqueid = 0x0, e_dncsnset = 0x0, 
  e_maxcsn = 0x0, e_attrs = 0x7f50600968a0, e_deleted_attrs = 0x0, e_virtual_attrs = 0x0, e_virtual_watermark = 0, 
  e_virtual_lock = 0x7f5060171960, e_extension = 0x7f50601122e0, e_flags = 0 '\000', e_aux_attrs = 0x0}
}}}

Thread 33, holds "dc=testrelm,dc=test" backend lock and some DB page locks, then in txnbe-postop tries to acquire schema-compat lock

Thread 30, holds schema-compat lock (on 'cn=config' update) and tries to acquire the DB page lock

Comment 1 Martin Kosek 2015-03-03 16:50:55 UTC
This deadlock was hit during IPA upgrade, which has a chance of breaking the IPA server - marking with higher priority.

Comment 7 thierry bordaz 2015-03-10 15:58:09 UTC
*** Bug 1193125 has been marked as a duplicate of this bug. ***

Comment 9 Scott Poore 2015-09-14 19:09:23 UTC
Verified.

Version ::

ipa-server-4.2.0-9.el7.x86_64

Results ::

Normal IPA Upgrade tests run many times and no deadlocks seen so far.  

I'm marking this one verified. Sanity only.

Comment 10 errata-xmlrpc 2015-11-19 12:01:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2362.html