Bug 2013524 - RHEL-7.9 ipa-replica-install "hangs" remote IPA LDAP server
Summary: RHEL-7.9 ipa-replica-install "hangs" remote IPA LDAP server
Keywords:
Status: CLOSED DUPLICATE of bug 2018257
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.9
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: thierry bordaz
QA Contact: RHDS QE
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks:
Reported: 2021-10-13 05:47 UTC by Marc Sauton
Modified: 2021-10-28 16:13 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-28 16:13:40 UTC
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Links
System                 ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  IDMDS-1735      0        None      None    None     2021-10-21 14:22:14 UTC
Red Hat Issue Tracker  RHELPLAN-99589  0        None      None    None     2021-10-13 05:50:16 UTC

Description Marc Sauton 2021-10-13 05:47:09 UTC
Description of problem:

Running ipa-replica-install on RHEL-7.9 renders the remote peer IPA LDAP server unresponsive for a long period of time.


Version-Release number of selected component (if applicable):

RHEL-7.9
389-ds-base-1.3.10.2-12.el7_9.x86_64
ipa-server-4.6.8-5.el7_9.7.x86_64
redhat-release-server-7.9-6.el7_9.x86_64


How reproducible:
N/A

Steps to Reproduce:
1. N/A
2.
3.


Actual results:

The LDAP service on the remote/master IPA replica must be killed and restarted.

replica 85 install:

Done configuring the web interface (httpd).
Configuring ipa-otpd
  [1/2]: starting ipa-otpd
  [2/2]: configuring ipa-otpd to start on boot
Done configuring ipa-otpd.
Configuring ipa-custodia
  [1/4]: Generating ipa-custodia config file
  [2/4]: Generating ipa-custodia keys
  [3/4]: starting ipa-custodia
  [4/4]: configuring ipa-custodia to start on boot
Done configuring ipa-custodia.
('SEB:', {'ccache': 'MEMORY:Custodia_fMyDiDoK/iI=', 'client_keytab': '/etc/krb5.keytab'}, Name(host, <OID 1.2.840.113554.1.2.1.4>), None)
Configuring certificate server (pki-tomcatd). Estimated time: 3 minutes
  [1/30]: creating certificate server db
  [2/30]: setting up initial replication
Starting replication, please wait until this has completed.   <--------------------------------- More replicating caused 83/84 to hang again
Update in progress, 21 seconds elapsed
Update succeeded

  [3/30]: creating ACIs for admin
  [4/30]: creating installation admin user
  [5/30]: configuring certificate server instance
  [6/30]: secure AJP connector
  [7/30]: reindex attributes
  [8/30]: exporting Dogtag certificate store pin
  [9/30]: stopping certificate server instance to update CS.cfg
  [10/30]: backing up CS.cfg
  [11/30]: disabling nonces
  [12/30]: set up CRL publishing
  [13/30]: enable PKIX certificate path discovery and validation
  [14/30]: destroying installation admin user
  [15/30]: starting certificate server instance
  [16/30]: Finalize replication settings


and it all completes to the end, except for a last LDAP connection that fails from the replica 85 to the master 84:

2021-10-12T21:16:27Z ERROR cannot connect to 'ldap://84.edited:389':
2021-10-12T21:16:27Z ERROR The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
(END)


this looks more like extremely slow connection processing than a complete hang/deadlock
ldapsearch with a simple BIND would not prompt for credentials


replica 84 ( may have needed more samples )

dn: cn=config
nsslapd-idletimeout: 0
nsslapd-ioblocktimeout: 10000
nsslapd-listen-backlog-size: 128
nsslapd-threadnumber: 192
nsslapd-maxdescriptors: 16384
nsslapd-reservedescriptors: 64

dn: cn=monitor
threads: 194
currentconnections: 3462
totalconnections: 11486
currentconnectionsatmaxthreads: 0
maxthreadsperconnhits: 1911
dtablesize: 16384
readwaiters: 0
opsinitiated: 943235
opscompleted: 943231
currenttime: 20211012213934Z
starttime: 20211012211635Z
nbackends: 3
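
The cn=monitor counters above can be turned into a few derived figures. The following is a minimal sketch in Python, not part of the original report: the helper names `parse_ldif_entry` and `summarize_monitor` are hypothetical, and the parser assumes single-valued, unwrapped "attr: value" lines as in the sample.

```python
from datetime import datetime

# Counters copied from the cn=monitor sample above.
SAMPLE = """
dn: cn=monitor
threads: 194
currentconnections: 3462
totalconnections: 11486
opsinitiated: 943235
opscompleted: 943231
currenttime: 20211012213934Z
starttime: 20211012211635Z
"""

def parse_ldif_entry(text):
    """Parse simple 'attr: value' lines into a dict (single-valued attributes only)."""
    entry = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            entry[key.strip()] = value.strip()
    return entry

def summarize_monitor(entry):
    """Derive uptime, operations in flight, and connection ratios from cn=monitor."""
    fmt = "%Y%m%d%H%M%SZ"
    uptime = (datetime.strptime(entry["currenttime"], fmt)
              - datetime.strptime(entry["starttime"], fmt)).total_seconds()
    return {
        "uptime_seconds": int(uptime),
        "ops_in_flight": int(entry["opsinitiated"]) - int(entry["opscompleted"]),
        "connections_per_second": round(int(entry["totalconnections"]) / uptime, 2),
        "open_connections_per_thread": round(
            int(entry["currentconnections"]) / int(entry["threads"]), 1),
    }

if __name__ == "__main__":
    for name, value in summarize_monitor(parse_ldif_entry(SAMPLE)).items():
        print(f"{name}: {value}")
```

For this sample the server had been up for about 23 minutes and only 4 operations were in flight, which fits the observation that this is slow connection processing rather than a full deadlock.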

ldapsearch -o ldif-wrap=no -LLLxD cn=Directory\ Manager -W -b cn=monitor -s base connection
->
Only 71 of the 3400+ connections showed signs of being blocked, which is not significant in the sample taken at that moment.

already has sysctl
net.core.somaxconn = 65535


The stack trace shows nearly all threads blocked (199 out of 211), for example:

Thread 199 (Thread 0x7f335a91f700 (LWP 85062)):
#0  0x00007f3390db6184 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x00007f339368c80a in slapi_rwlock_rdlock (rwlock=<optimized out>) at ldap/servers/slapd/slapi2nspr.c:246
#2  0x00007f33936a4f7d in vattr_rdlock () at ldap/servers/slapd/vattr.c:188
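
A count such as "199 out of 211" can be reproduced from a saved stack dump. The sketch below is an illustration in Python, assuming gdb-style "Thread N (...)" headers and "#N" frame lines like the excerpt above; the function name `threads_with_frame` is hypothetical.

```python
import re

# Matches thread headers such as "Thread 199 (Thread 0x7f335a91f700 (LWP 85062)):"
THREAD_HEADER = re.compile(r"^Thread (\d+) \(")
# Matches frame lines such as "#0  0x00007f3390db6184 in pthread_rwlock_rdlock () at ..."
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\w+)")

def threads_with_frame(stack_text, function_name):
    """Return (thread ids whose stack contains function_name, total thread count)."""
    threads = {}
    current = None
    for line in stack_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
    hits = sorted(t for t, frames in threads.items() if function_name in frames)
    return hits, len(threads)

# The single thread quoted in this report; a full dump would contain all 211.
EXCERPT = """\
Thread 199 (Thread 0x7f335a91f700 (LWP 85062)):
#0  0x00007f3390db6184 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x00007f339368c80a in slapi_rwlock_rdlock (rwlock=<optimized out>) at ldap/servers/slapd/slapi2nspr.c:246
#2  0x00007f33936a4f7d in vattr_rdlock () at ldap/servers/slapd/vattr.c:188
"""

if __name__ == "__main__":
    hits, total = threads_with_frame(EXCERPT, "vattr_rdlock")
    print(f"{len(hits)} of {total} threads are in vattr_rdlock: {hits}")
```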


netstat output showed hundreds of connections in CLOSE_WAIT state related to the IPA LDAP service.

nsslapd-idletimeout is in use with its default value of 0, which means no idle timeout.

I would set nsslapd-idletimeout to 5 minutes (300 seconds) and raise nsslapd-listen-backlog-size on replicas 84 and 83, for example, from
nsslapd-idletimeout: 0
nsslapd-listen-backlog-size: 128

to
nsslapd-idletimeout: 300
nsslapd-listen-backlog-size: 2048
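
The proposed change can be written as an LDIF modify record against cn=config. This is a minimal sketch in Python that only builds the LDIF text; the helper name `build_config_ldif` is hypothetical, and how the record is applied (for example by feeding it to ldapmodify as Directory Manager) is left as an assumption.

```python
def build_config_ldif(changes, dn="cn=config"):
    """Build an LDIF 'changetype: modify' record replacing each attribute in changes."""
    lines = [f"dn: {dn}", "changetype: modify"]
    for index, (attr, value) in enumerate(changes.items()):
        if index:
            lines.append("-")  # LDIF separator between modifications in one record
        lines.append(f"replace: {attr}")
        lines.append(f"{attr}: {value}")
    return "\n".join(lines) + "\n"

# Values proposed in this report for replicas 84 and 83.
PROPOSED = {
    "nsslapd-idletimeout": 300,
    "nsslapd-listen-backlog-size": 2048,
}

if __name__ == "__main__":
    print(build_config_ldif(PROPOSED), end="")
```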



This may be related to a non-responding LDAP service triggering a cascade of problems: clients fail over to another replica, which then runs into the same issue.




Expected results:
yes


Additional info:

The netstat output from the remote/master IPA LDAP system shows:
- hundreds of KDC connections in TIME_WAIT state
- hundreds of LDAP connections in ESTABLISHED state

a remote session with the customer was showing hundreds of LDAP connections in CLOSE_WAIT state
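
Counting connection states per port from a saved netstat capture makes these observations easy to compare over time. The sketch below is in Python and assumes the usual `netstat -tan` column layout (protocol, queues, local address, foreign address, state). The sample lines are hypothetical, use documentation-range addresses, and are not taken from the customer system.

```python
from collections import Counter

def count_states(netstat_text, port):
    """Count TCP connection states for sockets whose local port equals `port`."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        if local_address.rsplit(":", 1)[-1] == str(port):
            counts[state] += 1
    return counts

# Hypothetical sample input (illustration only, not from this bug).
SAMPLE_NETSTAT = """\
Proto Recv-Q Send-Q Local Address   Foreign Address   State
tcp        0      0 192.0.2.84:389  192.0.2.10:40001  ESTABLISHED
tcp        1      0 192.0.2.84:389  192.0.2.11:40002  CLOSE_WAIT
tcp        0      0 192.0.2.84:88   192.0.2.12:40003  TIME_WAIT
"""

if __name__ == "__main__":
    print("LDAP (389):", dict(count_states(SAMPLE_NETSTAT, 389)))
    print("KDC (88):", dict(count_states(SAMPLE_NETSTAT, 88)))
```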

