Description of problem:
A RHEL-7.9 ipa-replica-install remotely renders its peer IPA LDAP server unresponsive for a long period of time.
Version-Release number of selected component (if applicable):
RHEL-7.9
389-ds-base-1.3.10.2-12.el7_9.x86_64
ipa-server-4.6.8-5.el7_9.7.x86_64
redhat-release-server-7.9-6.el7_9.x86_64
How reproducible:
N/A
Steps to Reproduce:
1. N/A
2.
3.
Actual results:
The remote/master IPA replica LDAP service must be killed and restarted.
replica 85 install:
Done configuring the web interface (httpd).
Configuring ipa-otpd
[1/2]: starting ipa-otpd
[2/2]: configuring ipa-otpd to start on boot
Done configuring ipa-otpd.
Configuring ipa-custodia
[1/4]: Generating ipa-custodia config file
[2/4]: Generating ipa-custodia keys
[3/4]: starting ipa-custodia
[4/4]: configuring ipa-custodia to start on boot
Done configuring ipa-custodia.
('SEB:', {'ccache': 'MEMORY:Custodia_fMyDiDoK/iI=', 'client_keytab': '/etc/krb5.keytab'}, Name(host, <OID 1.2.840.113554.1.2.1.4>), None)
Configuring certificate server (pki-tomcatd). Estimated time: 3 minutes
[1/30]: creating certificate server db
[2/30]: setting up initial replication
Starting replication, please wait until this has completed.   <--- this additional replication caused 83/84 to hang again
Update in progress, 21 seconds elapsed
Update succeeded
[3/30]: creating ACIs for admin
[4/30]: creating installation admin user
[5/30]: configuring certificate server instance
[6/30]: secure AJP connector
[7/30]: reindex attributes
[8/30]: exporting Dogtag certificate store pin
[9/30]: stopping certificate server instance to update CS.cfg
[10/30]: backing up CS.cfg
[11/30]: disabling nonces
[12/30]: set up CRL publishing
[13/30]: enable PKIX certificate path discovery and validation
[14/30]: destroying installation admin user
[15/30]: starting certificate server instance
[16/30]: Finalize replication settings
The install then runs to completion, except for a final LDAP connection from replica 85 to master 84, which fails:
2021-10-12T21:16:27Z ERROR cannot connect to 'ldap://84.edited:389':
2021-10-12T21:16:27Z ERROR The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
(END)
This looks more like extremely slow connection processing than a complete hang or deadlock.
An ldapsearch with a simple BIND would not prompt for credentials.
replica 84 (more samples may have been needed):
dn: cn=config
nsslapd-idletimeout: 0
nsslapd-ioblocktimeout: 10000
nsslapd-listen-backlog-size: 128
nsslapd-threadnumber: 192
nsslapd-maxdescriptors: 16384
nsslapd-reservedescriptors: 64
dn: cn=monitor
threads: 194
currentconnections: 3462
totalconnections: 11486
currentconnectionsatmaxthreads: 0
maxthreadsperconnhits: 1911
dtablesize: 16384
readwaiters: 0
opsinitiated: 943235
opscompleted: 943231
currenttime: 20211012213934Z
starttime: 20211012211635Z
nbackends: 3
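A few derived figures make the cn=monitor sample easier to read. The following is a minimal sketch, assuming the attribute values are copied verbatim from the sample above; the helper names `parse_attrs` and `summarize` are hypothetical and are not part of any 389-ds tooling.

```python
from datetime import datetime

# Values copied from the cn=monitor sample above.
MONITOR_LDIF = """\
threads: 194
currentconnections: 3462
totalconnections: 11486
maxthreadsperconnhits: 1911
opsinitiated: 943235
opscompleted: 943231
currenttime: 20211012213934Z
starttime: 20211012211635Z
"""


def parse_attrs(ldif):
    """Turn 'name: value' lines into a dict, ignoring anything else."""
    attrs = {}
    for line in ldif.splitlines():
        if ": " in line:
            name, value = line.split(": ", 1)
            attrs[name.strip()] = value.strip()
    return attrs


def summarize(attrs):
    """Derive in-flight operations, uptime, and new-connection rate."""
    fmt = "%Y%m%d%H%M%SZ"  # generalized time as printed by cn=monitor (UTC)
    start = datetime.strptime(attrs["starttime"], fmt)
    now = datetime.strptime(attrs["currenttime"], fmt)
    uptime = (now - start).total_seconds()
    return {
        "in_flight_ops": int(attrs["opsinitiated"]) - int(attrs["opscompleted"]),
        "uptime_seconds": uptime,
        "connections_per_second": int(attrs["totalconnections"]) / uptime,
    }


summary = summarize(parse_attrs(MONITOR_LDIF))
print(summary)
```

For this sample the sketch reports 4 operations in flight and an uptime of 1379 seconds, that is, the server had been started roughly 23 minutes before the sample was taken.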
ldapsearch -o ldif-wrap=no -LLLxD cn=Directory\ Manager -W -b cn=monitor -s base connection
->
Only 71 of the 3400-plus connections showed signs of being blocked, which is not significant in the sample taken at that moment.
The system already has the following sysctl setting:
net.core.somaxconn = 65535
The stack trace shows nearly all threads blocked (199 out of 211), for example:
Thread 199 (Thread 0x7f335a91f700 (LWP 85062)):
#0 0x00007f3390db6184 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1 0x00007f339368c80a in slapi_rwlock_rdlock (rwlock=<optimized out>) at ldap/servers/slapd/slapi2nspr.c:246
#2 0x00007f33936a4f7d in vattr_rdlock () at ldap/servers/slapd/vattr.c:188
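Counting how many threads sit in the same lock call can be done mechanically from a saved stack dump. The following is a minimal sketch, assuming a gdb/pstack-style dump in the format shown above; the helper names are hypothetical, and the second thread in the sample (including its addresses) is made up so that the input contains a non-blocked case.

```python
import re

# First thread copied from the trace above. The second thread is
# hypothetical and exists only to give the sample a non-blocked case.
STACK_DUMP = """\
Thread 199 (Thread 0x7f335a91f700 (LWP 85062)):
#0 0x00007f3390db6184 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1 0x00007f339368c80a in slapi_rwlock_rdlock (rwlock=<optimized out>) at ldap/servers/slapd/slapi2nspr.c:246
#2 0x00007f33936a4f7d in vattr_rdlock () at ldap/servers/slapd/vattr.c:188
Thread 1 (Thread 0x7f3393b2a840 (LWP 85000)):
#0 0x00007f3390db0000 in poll () at /lib64/libc.so.6
"""

THREAD_HEADER = re.compile(r"^Thread (\d+) \(")
# Frame line: "#N [0xADDR in ]function (args) ..."
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\w+) \(")


def split_threads(dump):
    """Map each thread number to the list of function names on its stack."""
    threads = {}
    current = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
    return threads


def blocked_in(threads, function):
    """Return the thread numbers whose stack contains the given function."""
    return sorted(tid for tid, frames in threads.items() if function in frames)


threads = split_threads(STACK_DUMP)
blocked = blocked_in(threads, "vattr_rdlock")
print(f"{len(blocked)} of {len(threads)} threads in vattr_rdlock: {blocked}")
```

Run against the full dump, the same function would give the share of threads waiting on the vattr read lock, which is the figure quoted above as 199 out of 211.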
netstat output showed hundreds of connections in CLOSE_WAIT state related to the IPA LDAP service.
nsslapd-idletimeout is in use with its default value of 0, which means no timeout.
I would set nsslapd-idletimeout to 5 minutes (300 seconds) and raise nsslapd-listen-backlog-size on replicas 84 and 83, for example, from
nsslapd-idletimeout: 0
nsslapd-listen-backlog-size: 128
to
nsslapd-idletimeout: 300
nsslapd-listen-backlog-size: 2048
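The proposed change can be written as an LDIF modify record against cn=config. The following is a minimal sketch that only renders the LDIF text; the function name `render_modify_ldif` is hypothetical, and the values are the ones suggested above.

```python
def render_modify_ldif(dn, changes):
    """Render one LDIF modify record that replaces each attribute value."""
    lines = [f"dn: {dn}", "changetype: modify"]
    for index, (attr, value) in enumerate(changes.items()):
        if index:
            lines.append("-")  # separator between modifications
        lines.append(f"replace: {attr}")
        lines.append(f"{attr}: {value}")
    return "\n".join(lines) + "\n"


# Values proposed in this report for replicas 84 and 83.
PROPOSED = {
    "nsslapd-idletimeout": 300,
    "nsslapd-listen-backlog-size": 2048,
}

ldif = render_modify_ldif("cn=config", PROPOSED)
print(ldif)
```

The output could be saved to a file and applied with something like `ldapmodify -x -D "cn=Directory Manager" -W -f <file>`. Whether each attribute takes effect immediately or needs a server restart should be checked against the 389-ds-base documentation for the installed version.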
This can be related to a non-responding LDAP service and a cascade of problems, including failover to another replica that then causes the same issue again.
Expected results:
yes
Additional info:
The netstat output on the remote/master IPA LDAP system shows:
- hundreds of KDC connections in TIME_WAIT state
- hundreds of LDAP connections in ESTABLISHED state
A remote session with the customer showed hundreds of LDAP connections in CLOSE_WAIT state.
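The per-state counts above can be reproduced from saved netstat output. The following is a minimal sketch, assuming `netstat -tan` style lines with six whitespace-separated columns; the sample addresses are made up, and the function name `count_states` is hypothetical. Port 389 is LDAP and port 88 is the Kerberos KDC.

```python
from collections import Counter

# Hypothetical `netstat -tan` style lines; all addresses are made up.
SAMPLE = """\
tcp        0      0 10.0.0.84:389     10.0.0.21:51234   ESTABLISHED
tcp        1      0 10.0.0.84:389     10.0.0.22:40112   CLOSE_WAIT
tcp        1      0 10.0.0.84:389     10.0.0.23:40990   CLOSE_WAIT
tcp        0      0 10.0.0.84:88      10.0.0.24:53100   TIME_WAIT
tcp        0      0 0.0.0.0:389       0.0.0.0:*         LISTEN
"""


def count_states(netstat_output):
    """Count TCP sockets per (local port, state) pair."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        counts[(local_port, fields[5])] += 1
    return counts


counts = count_states(SAMPLE)
for (port, state), n in sorted(counts.items()):
    print(f"port {port:>5} {state:<12} {n}")
```

Sampling this repeatedly would show whether the CLOSE_WAIT count on port 389 keeps growing, which would suggest the server is not closing sockets that clients have already closed.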