Description of problem:
During replica-install, the installer hangs in:
[...]
Done configuring directory server (dirsrv).
Configuring certificate server (pki-tomcatd). Estimated time: 3 minutes 30 seconds
[1/27]: creating certificate server user
[2/27]: configuring certificate server instance
[3/27]: stopping certificate server instance to update CS.cfg
[4/27]: backing up CS.cfg
[5/27]: disabling nonces
[6/27]: set up CRL publishing
[7/27]: enable PKIX certificate path discovery and validation
[8/27]: starting certificate server instance
The pstack shows:
Thread 32 (Thread 0x7f3e70ff9700 (LWP 3717)):
#0 0x00007f3e9784e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f3e97ea5473 in PR_EnterMonitor () from /lib64/libnspr4.so
#2 0x00007f3e8c7e0321 in replica_get_generation () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#3 0x00007f3e8c7d9b90 in copy_operation_parameters () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#4 0x00007f3e8c7db1ea in multimaster_preop_modify () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#5 0x00007f3e99cdcdb8 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#6 0x00007f3e99cdd043 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#7 0x00007f3e99cca359 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#8 0x00007f3e99ccb91f in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
#9 0x00007f3e9a1b49d0 in connection_threadmain ()
#10 0x00007f3e97eaa9bb in _pt_root () from /lib64/libnspr4.so
#11 0x00007f3e9784adc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f3e9757973d in clone () from /lib64/libc.so.6
Thread 31 (Thread 0x7f3e6bfff700 (LWP 3718)):
#0 0x00007f3e9784e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f3e97ea5473 in PR_EnterMonitor () from /lib64/libnspr4.so
#2 0x00007f3e8c7dfe50 in replica_is_updatedn () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#3 0x00007f3e8c7c6e2a in multimaster_extop_StartNSDS50ReplicationRequest () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#4 0x00007f3e9a1bbec4 in do_extended ()
#5 0x00007f3e9a1b4aca in connection_threadmain ()
#6 0x00007f3e97eaa9bb in _pt_root () from /lib64/libnspr4.so
#7 0x00007f3e9784adc5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f3e9757973d in clone () from /lib64/libc.so.6
Thread 38 (Thread 0x7f3e80e9b700 (LWP 3710)):
#0 0x00007f3e9784e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f3e97ea5473 in PR_EnterMonitor () from /lib64/libnspr4.so
#2 0x00007f3e8c7e2e58 in replica_update_state () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#3 0x00007f3e99ca2e9a in eq_loop () from /usr/lib64/dirsrv/libslapd.so.0
#4 0x00007f3e97eaa9bb in _pt_root () from /lib64/libnspr4.so
#5 0x00007f3e9784adc5 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f3e9757973d in clone () from /lib64/libc.so.6
Version-Release number of selected component (if applicable): 389-ds-base-1.3.5.10-15.el7_3.x86_64
I have seen this at two customers, but for the second one we should confirm with a pstack.
Thanks for the logs. I now understand when this happens and I can reproduce it.
The leak comes from managing the min CSN in the RUV, which is set when a replica receives the first change for its replica ID.
in the logs we see:
[04/Apr/2017:15:25:15.102326426 +0300] conn=18 op=2 MOD dn="o=ipaca"
[04/Apr/2017:15:25:15.141517718 +0300] conn=18 op=2 RESULT err=20 tag=103 nentries=0 etime=0 csn=58e39de9000104a10000
This MOD fails (err=20). If I set up a replica and apply, as its first change, an operation that fails, I can reproduce the hang.
Hi,
I had to create a new diff because the original one did not apply, but the change is the same:
@@ -3670,6 +3670,7 @@ abort_csn_callback(const CSN *csn, void
         int rc = csnplRemove(r->min_csn_pl, csn);
         if (rc) {
             slapi_log_error(SLAPI_LOG_FATAL, repl_plugin_name, "csnplRemove failed");
+            replica_unlock(r->repl_lock);
             return;
         }
     }
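For readers who want to see the pattern in isolation, here is a minimal standalone sketch of what the one-line patch addresses: an error path that returns while a lock is still held. It is not the 389-ds-base code. The names demo_replica, demo_remove, abort_cb_leaky, abort_cb_fixed and lock_is_leaked are hypothetical, and a plain pthread mutex stands in for the replica lock (the real lock is an NSPR monitor, which behaves differently for the owning thread).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the replica object; not the 389-ds-base struct. */
typedef struct {
    pthread_mutex_t lock;
} demo_replica;

/* Hypothetical stand-in for csnplRemove(): nonzero means failure. */
static int demo_remove(bool fail)
{
    return fail ? -1 : 0;
}

/* Leaky shape: the error path returns while the lock is still held. */
static void abort_cb_leaky(demo_replica *r, bool fail)
{
    pthread_mutex_lock(&r->lock);
    if (demo_remove(fail)) {
        return; /* lock is never released */
    }
    pthread_mutex_unlock(&r->lock);
}

/* Fixed shape: the error path releases the lock before returning. */
static void abort_cb_fixed(demo_replica *r, bool fail)
{
    pthread_mutex_lock(&r->lock);
    if (demo_remove(fail)) {
        pthread_mutex_unlock(&r->lock);
        return;
    }
    pthread_mutex_unlock(&r->lock);
}

/* Returns true if the lock is still held after a callback has run. */
static bool lock_is_leaked(demo_replica *r)
{
    int rc = pthread_mutex_trylock(&r->lock);
    if (rc == 0) {
        /* We got the lock, so nobody was holding it; give it back. */
        pthread_mutex_unlock(&r->lock);
        return false;
    }
    return rc == EBUSY;
}
```

Calling abort_cb_leaky(&r, true) on a demo_replica initialised with PTHREAD_MUTEX_INITIALIZER leaves the mutex held, so lock_is_leaked(&r) reports true. Any other thread that then does a blocking lock would wait forever, which is the shape of the pstack above. With abort_cb_fixed the lock is free again on both the success and the failure path.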
(In reply to German Parente from comment #23)
> Hi,
>
> I had to create a new diff because the original one was not applying but
> it's the same:
>
> @@ -3670,6 +3670,7 @@ abort_csn_callback(const CSN *csn, void
> int rc = csnplRemove(r->min_csn_pl, csn);
> if (rc) {
> slapi_log_error(SLAPI_LOG_FATAL, repl_plugin_name, "csnplRemove
> failed");
> + replica_unlock(r->repl_lock);
> return;
> }
> }
perfect
Build tested:
389-ds-base-1.3.6.1-13.el7.x86_64
ipa-server-4.5.0-10.el7.x86_64
Verification steps:
1. Install IPA server on one machine
2. Install IPA replica on another machine.
ipa-replica-install should be successful:
[root@vm-idm-003 ~]# ipa-replica-install
WARNING: conflicting time&date synchronization service 'chronyd' will
be disabled in favor of ntpd
Password for admin.ENG.BOS.REDHAT.COM:
Run connection check to master
Connection check OK
[...]
Upgrading IPA:. Estimated time: 1 minute 30 seconds
[1/9]: stopping directory server
[2/9]: saving configuration
[3/9]: disabling listeners
[4/9]: enabling DS global lock
[5/9]: starting directory server
[6/9]: upgrading server
[7/9]: stopping directory server
[8/9]: restoring configuration
[9/9]: starting directory server
Done.
Restarting the KDC
3. Check ipa.service status:
[root@vm-idm-003 ~]# systemctl status ipa
● ipa.service - Identity, Policy, Audit
Loaded: loaded (/usr/lib/systemd/system/ipa.service; enabled; vendor preset: disabled)
Active: active (exited) since Tue 2017-05-09 15:44:40 IST; 55min ago
Process: 1293 ExecStart=/usr/sbin/ipactl start (code=exited, status=0/SUCCESS)
Main PID: 1293 (code=exited, status=0/SUCCESS)
[...]
May 09 15:44:40 vm-idm-003.lab.eng.pnq.redhat.com systemd[1]: Started Identity, Policy, Audit.
Hint: Some lines were ellipsized, use -l to show in full.
Result: The IPA replica was installed successfully and no deadlock occurred.
Marking as verified.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2086