Bug 1432016 - Possible deadlock while installing an ipa replica.
Summary: Possible deadlock while installing an ipa replica.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.3
Hardware: All
OS: Linux
Severity: urgent
Priority: urgent
Target Milestone: rc
Target Release: ---
Assignee: mreynolds
QA Contact: Viktor Ashirov
URL:
Whiteboard:
Duplicates: 1435710 1442799 (view as bug list)
Depends On:
Blocks: 1440654
 
Reported: 2017-03-14 10:36 UTC by German Parente
Modified: 2020-09-13 21:58 UTC (History)
CC List: 17 users

Fixed In Version: 389-ds-base-1.3.6.1-9.el7
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1440654 (view as bug list)
Environment:
Last Closed: 2017-08-01 21:14:10 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
pstack. (23.54 KB, text/plain)
2017-03-14 10:37 UTC, German Parente
no flags Details
second pstack (24.72 KB, text/plain)
2017-03-15 13:50 UTC, German Parente
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github 389ds 389-ds-base issues 2268 0 None closed Hang due to omitted replica lock release 2020-11-01 10:32:50 UTC
Red Hat Product Errata RHBA-2017:2086 0 normal SHIPPED_LIVE 389-ds-base bug fix and enhancement update 2017-08-01 18:37:38 UTC

Description German Parente 2017-03-14 10:36:05 UTC
Description of problem:

During ipa-replica-install, the installer hangs at:

[...]
Done configuring directory server (dirsrv).
Configuring certificate server (pki-tomcatd). Estimated time: 3 minutes 30 seconds
  [1/27]: creating certificate server user
  [2/27]: configuring certificate server instance
  [3/27]: stopping certificate server instance to update CS.cfg
  [4/27]: backing up CS.cfg
  [5/27]: disabling nonces
  [6/27]: set up CRL publishing
  [7/27]: enable PKIX certificate path discovery and validation
  [8/27]: starting certificate server instance


The pstack shows:



Thread 32 (Thread 0x7f3e70ff9700 (LWP 3717)):
#0  0x00007f3e9784e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e97ea5473 in PR_EnterMonitor () from /lib64/libnspr4.so
#2  0x00007f3e8c7e0321 in replica_get_generation () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#3  0x00007f3e8c7d9b90 in copy_operation_parameters () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#4  0x00007f3e8c7db1ea in multimaster_preop_modify () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#5  0x00007f3e99cdcdb8 in plugin_call_func () from /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f3e99cdd043 in plugin_call_plugins () from /usr/lib64/dirsrv/libslapd.so.0
#7  0x00007f3e99cca359 in op_shared_modify () from /usr/lib64/dirsrv/libslapd.so.0
#8  0x00007f3e99ccb91f in do_modify () from /usr/lib64/dirsrv/libslapd.so.0
#9  0x00007f3e9a1b49d0 in connection_threadmain ()
#10 0x00007f3e97eaa9bb in _pt_root () from /lib64/libnspr4.so
#11 0x00007f3e9784adc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f3e9757973d in clone () from /lib64/libc.so.6
Thread 31 (Thread 0x7f3e6bfff700 (LWP 3718)):
#0  0x00007f3e9784e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e97ea5473 in PR_EnterMonitor () from /lib64/libnspr4.so
#2  0x00007f3e8c7dfe50 in replica_is_updatedn () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#3  0x00007f3e8c7c6e2a in multimaster_extop_StartNSDS50ReplicationRequest () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#4  0x00007f3e9a1bbec4 in do_extended ()
#5  0x00007f3e9a1b4aca in connection_threadmain ()
#6  0x00007f3e97eaa9bb in _pt_root () from /lib64/libnspr4.so
#7  0x00007f3e9784adc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f3e9757973d in clone () from /lib64/libc.so.6

Thread 38 (Thread 0x7f3e80e9b700 (LWP 3710)):
#0  0x00007f3e9784e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e97ea5473 in PR_EnterMonitor () from /lib64/libnspr4.so
#2  0x00007f3e8c7e2e58 in replica_update_state () from /usr/lib64/dirsrv/plugins/libreplication-plugin.so
#3  0x00007f3e99ca2e9a in eq_loop () from /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007f3e97eaa9bb in _pt_root () from /lib64/libnspr4.so
#5  0x00007f3e9784adc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f3e9757973d in clone () from /lib64/libc.so.6


Version-Release number of selected component (if applicable): 389-ds-base-1.3.5.10-15.el7_3.x86_64 



I have seen this at two customers, but for the second one we should confirm with the pstack.

Comment 2 German Parente 2017-03-14 10:37:14 UTC
Created attachment 1262854 [details]
pstack.

Comment 8 German Parente 2017-03-15 13:50:55 UTC
Created attachment 1263342 [details]
second pstack

Comment 20 Ludwig 2017-04-05 09:04:55 UTC
Thanks for the logs. I now understand when this is happening, and I can reproduce it.

The leak comes from managing the min CSN in the RUV, which is set when a replica receives the first change for its replica ID.

In the logs we see:
[04/Apr/2017:15:25:15.102326426 +0300] conn=18 op=2 MOD dn="o=ipaca"
[04/Apr/2017:15:25:15.141517718 +0300] conn=18 op=2 RESULT err=20 tag=103 nentries=0 etime=0 csn=58e39de9000104a10000

which is a failing operation (err=20). If I set up a replica and apply, as the first change, an operation that fails, I can reproduce the hang.

Comment 22 Ludwig 2017-04-05 09:35:48 UTC
Yes, it fixes my test case.

But did you adapt the patch to 1.3.5? The error log functions are different.

Comment 23 German Parente 2017-04-05 09:39:50 UTC
Hi,

I had to create a new diff because the original one did not apply, but it's the same change:

@@ -3670,6 +3670,7 @@ abort_csn_callback(const CSN *csn, void 
         int rc = csnplRemove(r->min_csn_pl, csn);
         if (rc) {
             slapi_log_error(SLAPI_LOG_FATAL, repl_plugin_name, "csnplRemove failed");
+            replica_unlock(r->repl_lock);
             return;
         }
     }

Comment 24 Ludwig 2017-04-05 09:56:14 UTC
(In reply to German Parente from comment #23)
> Hi,
> 
> I had to create a new diff because the original one was not applying but
> it's the same:
> 
> @@ -3670,6 +3670,7 @@ abort_csn_callback(const CSN *csn, void 
>          int rc = csnplRemove(r->min_csn_pl, csn);
>          if (rc) {
>              slapi_log_error(SLAPI_LOG_FATAL, repl_plugin_name, "csnplRemove
> failed");
> +            replica_unlock(r->repl_lock);
>              return;
>          }
>      }

Perfect.

Comment 34 Nathan Kinder 2017-04-25 15:25:00 UTC
*** Bug 1442799 has been marked as a duplicate of this bug. ***

Comment 35 Petr Vobornik 2017-05-02 14:55:40 UTC
*** Bug 1435710 has been marked as a duplicate of this bug. ***

Comment 36 Simon Pichugin 2017-05-09 11:15:38 UTC
Build tested:
389-ds-base-1.3.6.1-13.el7.x86_64
ipa-server-4.5.0-10.el7.x86_64

Verification steps:
1. Install IPA server on one machine
2. Install IPA replica on another machine.
ipa-replica-install should be successful:
[root@vm-idm-003 ~]# ipa-replica-install
WARNING: conflicting time&date synchronization service 'chronyd' will
be disabled in favor of ntpd

Password for admin.ENG.BOS.REDHAT.COM:
Run connection check to master
Connection check OK

[...]

Upgrading IPA:. Estimated time: 1 minute 30 seconds
  [1/9]: stopping directory server
  [2/9]: saving configuration
  [3/9]: disabling listeners
  [4/9]: enabling DS global lock
  [5/9]: starting directory server
  [6/9]: upgrading server
  [7/9]: stopping directory server
  [8/9]: restoring configuration
  [9/9]: starting directory server
Done.
Restarting the KDC

3. Check ipa.service status:
[root@vm-idm-003 ~]# systemctl status ipa
● ipa.service - Identity, Policy, Audit
   Loaded: loaded (/usr/lib/systemd/system/ipa.service; enabled; vendor preset: disabled)
   Active: active (exited) since Tue 2017-05-09 15:44:40 IST; 55min ago
  Process: 1293 ExecStart=/usr/sbin/ipactl start (code=exited, status=0/SUCCESS)
 Main PID: 1293 (code=exited, status=0/SUCCESS)

[...]

May 09 15:44:40 vm-idm-003.lab.eng.pnq.redhat.com systemd[1]: Started Identity, Policy, Audit.
Hint: Some lines were ellipsized, use -l to show in full.


Result: the IPA replica was installed successfully and no deadlock occurred.
Marking as verified.

Comment 37 errata-xmlrpc 2017-08-01 21:14:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2086

Comment 38 Alex McLeod 2020-02-19 12:44:12 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

