Description of problem:

During replica-install, the installer hangs in:

[...]
Done configuring directory server (dirsrv).
Configuring certificate server (pki-tomcatd). Estimated time: 3 minutes 30 seconds
  [1/27]: creating certificate server user
  [2/27]: configuring certificate server instance
  [3/27]: stopping certificate server instance to update CS.cfg
  [4/27]: backing up CS.cfg
  [5/27]: disabling nonces
  [6/27]: set up CRL publishing
  [7/27]: enable PKIX certificate path discovery and validation
  [8/27]: starting certificate server instance

The pstack is showing:

Thread 32 (Thread 0x7f3e70ff9700 (LWP 3717)):
#0  0x00007f3e9784e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e97ea5473 in PR_EnterMonitor () from /lib64/libnspr4.so
#2  0x00007f3e8c7e0321 in replica_get_generation () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#3  0x00007f3e8c7d9b90 in copy_operation_parameters () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#4  0x00007f3e8c7db1ea in multimaster_preop_modify () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#5  0x00007f3e99cdcdb8 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f3e99cdd043 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#7  0x00007f3e99cca359 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#8  0x00007f3e99ccb91f in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
#9  0x00007f3e9a1b49d0 in connection_threadmain ()
#10 0x00007f3e97eaa9bb in _pt_root () from /lib64/libnspr4.so
#11 0x00007f3e9784adc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f3e9757973d in clone () from /lib64/libc.so.6

Thread 31 (Thread 0x7f3e6bfff700 (LWP 3718)):
#0  0x00007f3e9784e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e97ea5473 in PR_EnterMonitor () from /lib64/libnspr4.so
#2  0x00007f3e8c7dfe50 in replica_is_updatedn () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#3  0x00007f3e8c7c6e2a in multimaster_extop_StartNSDS50ReplicationRequest () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#4  0x00007f3e9a1bbec4 in do_extended ()
#5  0x00007f3e9a1b4aca in connection_threadmain ()
#6  0x00007f3e97eaa9bb in _pt_root () from /lib64/libnspr4.so
#7  0x00007f3e9784adc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f3e9757973d in clone () from /lib64/libc.so.6

Thread 38 (Thread 0x7f3e80e9b700 (LWP 3710)):
#0  0x00007f3e9784e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e97ea5473 in PR_EnterMonitor () from /lib64/libnspr4.so
#2  0x00007f3e8c7e2e58 in replica_update_state () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#3  0x00007f3e99ca2e9a in eq_loop () from /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007f3e97eaa9bb in _pt_root () from /lib64/libnspr4.so
#5  0x00007f3e9784adc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f3e9757973d in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):

389-ds-base-1.3.5.10-15.el7_3.x86_64

I have seen this at two customers, but for the second one we should confirm with the pstack.
Created attachment 1262854 [details] pstack.
Created attachment 1263342 [details] second pstack
Thanks for the logs. I now understand when this is happening, and I can reproduce it.

The leak comes from managing the min CSN in the RUV, which is set when a replica receives the first change for its replica ID. In the logs we see:

[04/Apr/2017:15:25:15.102326426 +0300] conn=18 op=2 MOD dn="o=ipaca"
[04/Apr/2017:15:25:15.141517718 +0300] conn=18 op=2 RESULT err=20 tag=103 nentries=0 etime=0 csn=58e39de9000104a10000

This operation is failing (err=20). If I set up a replica and apply, as its first change, an operation that fails, I can reproduce the hang.
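To see which replica ID the failing change belongs to, the CSN from the RESULT line can be split into its fields. The following is only a minimal sketch: it assumes the usual 389-ds string layout of 8 hex digits of timestamp, then 4 each for sequence number, replica ID and sub-sequence number. The names csn_fields, parse_hex and csn_parse are hypothetical helpers, not server functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative struct; the field names are assumptions, not the server's own type. */
typedef struct {
    uint32_t tstamp;     /* seconds since the epoch */
    uint16_t seqnum;     /* sequence number within that second */
    uint16_t rid;        /* replica ID that generated the change */
    uint16_t subseqnum;  /* sub-sequence number */
} csn_fields;

/* Parse exactly n hex digits starting at s; return 0 on success, -1 on a bad digit. */
static int parse_hex(const char *s, size_t n, uint32_t *out)
{
    uint32_t v = 0;

    assert(n <= 8); /* more than 8 digits would not fit in 32 bits */
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isxdigit(c)) {
            return -1;
        }
        v = (v << 4) | (uint32_t)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    *out = v;
    return 0;
}

/* Split a 20-character CSN string into its fields (assumed 8+4+4+4 hex layout).
 * Returns 0 on success, -1 on a wrong length or a non-hex character. */
int csn_parse(const char *s, csn_fields *out)
{
    uint32_t t, seq, rid, sub;

    if (s == NULL || out == NULL || strlen(s) != 20) {
        return -1;
    }
    if (parse_hex(s, 8, &t) != 0 || parse_hex(s + 8, 4, &seq) != 0 ||
        parse_hex(s + 12, 4, &rid) != 0 || parse_hex(s + 16, 4, &sub) != 0) {
        return -1;
    }
    out->tstamp = t;
    out->seqnum = (uint16_t)seq;
    out->rid = (uint16_t)rid;
    out->subseqnum = (uint16_t)sub;
    return 0;
}
```

Under that assumed layout, csn=58e39de9000104a10000 decodes to sequence number 1 and replica ID 0x04a1 (1185 decimal), which is consistent with it being the first change generated for that replica ID.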
Yes, it fixes my testcase. But did you adapt the patch to 1.3.5? The error log functions are different.
Hi,

I had to create a new diff because the original one was not applying, but it's the same:

@@ -3670,6 +3670,7 @@ abort_csn_callback(const CSN *csn, void
         int rc = csnplRemove(r->min_csn_pl, csn);
         if (rc) {
             slapi_log_error(SLAPI_LOG_FATAL, repl_plugin_name, "csnplRemove failed");
+            replica_unlock(r->repl_lock);
             return;
         }
     }
(In reply to German Parente from comment #23)
> Hi,
>
> I had to create a new diff because the original one was not applying but
> it's the same:
>
> @@ -3670,6 +3670,7 @@ abort_csn_callback(const CSN *csn, void
>          int rc = csnplRemove(r->min_csn_pl, csn);
>          if (rc) {
>              slapi_log_error(SLAPI_LOG_FATAL, repl_plugin_name, "csnplRemove failed");
> +            replica_unlock(r->repl_lock);
>              return;
>          }
>      }

perfect
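The shape of the bug fixed by this one-line patch can be shown in isolation. The sketch below is not the server code: it swaps in a plain pthread mutex for the replica's NSPR monitor, and toy_replica, toy_remove and the two callbacks are hypothetical stand-ins for the replica object, csnplRemove() and abort_csn_callback(). It only illustrates how an error path that returns without unlocking leaves every later locker waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the replica object; not the real 389-ds struct. */
typedef struct {
    pthread_mutex_t lock; /* plain mutex used here instead of the NSPR monitor */
    int pending;          /* number of pending CSNs, illustrative only */
} toy_replica;

void toy_replica_init(toy_replica *r, int pending)
{
    assert(r != NULL);
    pthread_mutex_init(&r->lock, NULL);
    r->pending = pending;
}

/* Hypothetical stand-in for csnplRemove(): non-zero when there is nothing to remove. */
int toy_remove(toy_replica *r)
{
    if (r->pending == 0) {
        return -1;
    }
    r->pending--;
    return 0;
}

/* Buggy shape: the error path returns while the lock is still held. */
void abort_callback_buggy(toy_replica *r)
{
    pthread_mutex_lock(&r->lock);
    if (toy_remove(r) != 0) {
        return; /* lock is leaked here */
    }
    pthread_mutex_unlock(&r->lock);
}

/* Fixed shape: the error path releases the lock before returning. */
void abort_callback_fixed(toy_replica *r)
{
    pthread_mutex_lock(&r->lock);
    if (toy_remove(r) != 0) {
        pthread_mutex_unlock(&r->lock);
        return;
    }
    pthread_mutex_unlock(&r->lock);
}

/* Returns true if the lock can be taken right now (and releases it again). */
bool lock_is_free(toy_replica *r)
{
    if (pthread_mutex_trylock(&r->lock) != 0) {
        return false;
    }
    pthread_mutex_unlock(&r->lock);
    return true;
}
```

After abort_callback_buggy() runs on a replica with nothing pending, lock_is_free() reports false, which corresponds to the threads in the pstack that never get past PR_EnterMonitor. With abort_callback_fixed() the lock is free again on both paths.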
*** Bug 1442799 has been marked as a duplicate of this bug. ***
*** Bug 1435710 has been marked as a duplicate of this bug. ***
Build tested:
389-ds-base-1.3.6.1-13.el7.x86_64
ipa-server-4.5.0-10.el7.x86_64

Verification steps:
1. Install IPA server on one machine
2. Install IPA replica on another machine. ipa-replica-install should be successful:

[root@vm-idm-003 ~]# ipa-replica-install
WARNING: conflicting time&date synchronization service 'chronyd' will be disabled
in favor of ntpd

Password for admin.ENG.BOS.REDHAT.COM:
Run connection check to master
Connection check OK
[...]
Upgrading IPA:. Estimated time: 1 minute 30 seconds
  [1/9]: stopping directory server
  [2/9]: saving configuration
  [3/9]: disabling listeners
  [4/9]: enabling DS global lock
  [5/9]: starting directory server
  [6/9]: upgrading server
  [7/9]: stopping directory server
  [8/9]: restoring configuration
  [9/9]: starting directory server
Done.
Restarting the KDC

3. Check ipa.service status:

[root@vm-idm-003 ~]# systemctl status ipa
● ipa.service - Identity, Policy, Audit
   Loaded: loaded (/usr/lib/systemd/system/ipa.service; enabled; vendor preset: disabled)
   Active: active (exited) since Tue 2017-05-09 15:44:40 IST; 55min ago
  Process: 1293 ExecStart=/usr/sbin/ipactl start (code=exited, status=0/SUCCESS)
 Main PID: 1293 (code=exited, status=0/SUCCESS)
[...]
May 09 15:44:40 vm-idm-003.lab.eng.pnq.redhat.com systemd[1]: Started Identity, Policy, Audit.
Hint: Some lines were ellipsized, use -l to show in full.

Result:
IPA replica was successfully installed and no deadlock has happened. Marking as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2086
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.