Bug 1694263

Summary: ipa-server-install --setup-kra deadlocks while installing the KRA
Product: Red Hat Enterprise Linux 7 Reporter: Florence Blanc-Renaud <frenaud>
Component: slapi-nisAssignee: Alexander Bokovoy <abokovoy>
Status: CLOSED DUPLICATE QA Contact: ipa-qe <ipa-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.7CC: abokovoy, mkosek, ndehadra, tbordaz, tmihinto
Target Milestone: rcKeywords: Regression, TestBlocker
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-24 10:08:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
389-ds access log
none
389-ds stack trace
none
Cumulative patch for 1694263 abokovoy: review-

Description Florence Blanc-Renaud 2019-03-29 23:01:13 UTC
Description of problem:
ipa-server-install --setup-kra deadlocks with slapi-nis-0.56.0-10.el7

Version-Release number of selected component (if applicable):
0.56.0-10.el7

How reproducible:
Everytime

Steps to Reproduce:
1. /usr/sbin/ipa-server-install \
  -n $DOMAIN \
  -r $REALM \
  -p $PASSWORD -a $APASSWORD \
  --setup-dns \
  --forwarder $FORWARDER \
  --auto-reverse \
  --setup-kra \
  -U

The installation blocks in the step configuring KRA server, in pki-spawn:
Configuring KRA server (pki-tomcatd). Estimated time: 2 minutes
  [1/10]: configuring KRA instance


Actual results:
Installation deadlocks in the step configuring the KRA.

Expected results:
Installation should proceed till the end.


Additional info:
If slapi-nis is downgraded to 0.56.0-9, the same ipa-server-install command is successful. Reproduced with ipa-server 4.6.5-1.el7.x86_64

Comment 2 Florence Blanc-Renaud 2019-03-29 23:03:18 UTC
Created attachment 1549603 [details]
389-ds access log

Comment 3 Florence Blanc-Renaud 2019-03-29 23:06:20 UTC
Created attachment 1549604 [details]
389-ds stack trace

Comment 4 Florence Blanc-Renaud 2019-03-29 23:07:12 UTC
Attaching 389-ds access log and stack trace obtained with 
gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof ns-slapd` > stacktrace.`date +%s`.txt 2>&1

Comment 6 thierry bordaz 2019-04-01 14:43:12 UTC
  - it looks that a map lock was acquired but not released by a previous thread
(gdb) where
#0  0x00007f0b92b782ce in pthread_rwlock_wrlock () at /lib64/libpthread.so.0
#1  0x00007f0b9544147a in slapi_rwlock_wrlock (rwlock=<optimized out>) at ldap/servers/slapd/slapi2nspr.c:237
#2  0x00007f0b850a23c8 in wrap_rwlock_wrlock (rwlock=<optimized out>) at wrap.c:328
#3  0x00007f0b850a111c in plugin_wrlock () at map.c:1238
#4  0x00007f0b8508b4dd in backend_be_pre_write_cb (pb=<optimized out>) at back-sch.c:2339
#5  0x00007f0b9541e7d8 in plugin_call_func (list=0x558d0129c840, operation=operation@entry=451, pb=pb@entry=0x558d02b94540, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2028
#6  0x00007f0b9541ea93 in plugin_call_list (pb=0x558d02b94540, operation=451, list=<optimized out>)
    at ldap/servers/slapd/plugin.c:1972
#7  0x00007f0b9541ea93 in plugin_call_plugins (pb=pb@entry=0x558d02b94540, whichfunction=whichfunction@entry=451)
    at ldap/servers/slapd/plugin.c:442
#8  0x00007f0b8706b732 in ldbm_back_modify (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:637
#9  0x00007f0b9540adb6 in op_shared_modify (pb=pb@entry=0x558d02b94540, pw_change=pw_change@entry=0, old_pw=0x0)
    at ldap/servers/slapd/modify.c:1021
#10 0x00007f0b9540c2db in do_modify (pb=pb@entry=0x558d02b94540) at ldap/servers/slapd/modify.c:391
#11 0x0000558cfe8c94f4 in connection_dispatch_operation (pb=0x558d02b94540, op=0x558d018f6fc0, conn=0x558d02b3c380)
    at ldap/servers/slapd/connection.c:625
#12 0x0000558cfe8c94f4 in connection_threadmain () at ldap/servers/slapd/connection.c:1785
#13 0x00007f0b931d4bfb in _pt_root () at /lib64/libnspr4.so
#14 0x00007f0b92b74dd5 in start_thread () at /lib64/libpthread.so.0
#15 0x00007f0b9222109d in clone () at /lib64/libc.so.6
(gdb) print *map_data.plugin_lock
$12 = {rwlock = 0x558d0127df80}
(gdb) x/8w 0x558d0127df80
0x558d0127df80:	0x0	0x0	0x0	0x0
0x558d0127df90:	0x2	0x1	0x2ba1	0x0
(gdb) print 0x2ba1  # this is the lwp of the owner of the write lock  "__cur_writer" (pthread_rwlock_t)
$13 = 11169
(gdb) thread 20
[Switching to thread 20 (Thread 0x7f0b71361700 (LWP 11169))]
#0  0x00007f0b92b78965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) where        # this thread completed the operation that acquired the write lock but did not release it
#0  0x00007f0b92b78965 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007f0b931cf210 in PR_WaitCondVar () at /lib64/libnspr4.so
#2  0x0000558cfe8c717e in [IDLE THREAD] connection_wait_for_new_work (pb=pb@entry=0x558d02b942a0, interval=interval@entry=4294967295) at ldap/servers/slapd/connection.c:966
#3  0x0000558cfe8c8781 in connection_threadmain () at ldap/servers/slapd/connection.c:1532
#4  0x00007f0b931d4bfb in _pt_root () at /lib64/libnspr4.so
#5  0x00007f0b92b74dd5 in start_thread () at /lib64/libpthread.so.0
#6  0x00007f0b9222109d in clone () at /lib64/libc.so.6

  - Not possible to know how it happened. the most probable cause is the invalid initialization
    of the thread private data that returned invalid value.
    because of the invalid value, thread did not release the lock.


  - the good news is that the problem is reproducible, so we can test a fix

Comment 7 thierry bordaz 2019-04-04 07:16:29 UTC
Reworked the original fix but it is still hanging... continue investigations

Comment 16 thierry bordaz 2019-04-11 12:55:31 UTC
Created attachment 1554553 [details]
Cumulative patch for 1694263

This patch contains 3 patches:
 - Fix the private thread variable initialization
 - Cleanup the debug log for better matching pstack/threadId
 - Actual fix for 1694263, that ignores updates on subordinates of cn=config

Comment 17 Alexander Bokovoy 2019-04-15 10:12:11 UTC
Comment on attachment 1554553 [details]
Cumulative patch for 1694263

A biggest comment I have is that there seems to be no freeing ignored_containers_sdn array. It is allocated in init_map_lock() but there is no similar call to remove it in plugin_shutdown().

Comment 22 Alexander Bokovoy 2019-04-24 10:08:20 UTC

*** This bug has been marked as a duplicate of bug 1435663 ***