Bug 1658423

Summary: openldap and concurrency error connecting to ldaps server
Product: Red Hat Enterprise Linux 7
Reporter: ryan.brothers
Component: openldap
Assignee: Matus Honek <mhonek>
Status: CLOSED WONTFIX
QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: unspecified
Priority: unspecified
Version: 7.6
CC: pkis
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2019-06-06 15:24:09 UTC
Type: Bug

Description ryan.brothers 2018-12-12 02:50:05 UTC
Description of problem:
I hope you can assist with a problem I'm having with openldap and concurrency.  Please let me know if there's a different place to report this.

I am running CentOS 7 on x86_64. Starting with openldap-2.4.44-13, I see errors when multiple concurrent processes connect to an LDAP server over SSL. There appears to be a race condition when several processes try to write to the cache directory (such as /tmp/openldap-tlsmc-certs--CC) at the same time. I confirmed the problem still occurs with the latest openldap-2.4.44-20. After downgrading to openldap-2.4.44-5, I cannot reproduce it.

When the issue happens, all calls fail with the error "Can't contact LDAP server". If I try to connect again, I get the same error because the cache appears to be corrupt.

To fix it, I have to manually delete the cache directory in /tmp, re-run one of the processes by itself, and then the other processes can run successfully as the cache is now rebuilt.
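
The manual recovery described above could be scripted roughly as follows. This is a minimal sketch, not a tested fix: the function name reset_tlsmc_cache is hypothetical, and the glob openldap-tlsmc-* is an assumption generalized from the example directory name in this report.

```shell
#!/bin/sh
# Hypothetical helper: delete stale cache directories, then run one command
# by itself so the cache is rebuilt before anything else runs.
# $1 = parent directory of the cache (/tmp in this report)
# remaining arguments = the single command used to rebuild the cache
reset_tlsmc_cache() {
    cache_parent=$1
    shift
    # Refuse an empty parent so the glob below never expands at "/".
    [ -n "$cache_parent" ] || return 2
    # Assumed naming pattern, based on /tmp/openldap-tlsmc-certs--CC.
    rm -rf "$cache_parent"/openldap-tlsmc-*
    # One serial call; its exit status is returned to the caller.
    "$@"
}

# Example (placeholder server URI from the report):
# reset_tlsmc_cache /tmp ldapsearch -H "ldaps://server"
```

The other processes should only be started again after this call has returned.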

Please let me know if I can supply more information to help with this issue.

Version-Release number of selected component (if applicable):
openldap-2.4.44-20

How reproducible:
If I run the command below several times concurrently, the problem reproduces with the error "Can't contact LDAP server".

Steps to Reproduce:
1. Run the below command in multiple ssh sessions concurrently:

ldapsearch -H "ldaps://server"
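
Instead of opening several ssh sessions by hand, the concurrent runs can be approximated from a single shell. The sketch below is illustrative only: run_concurrent is a hypothetical helper, the run count is arbitrary, and "ldaps://server" is the placeholder URI from this report.

```shell
#!/bin/sh
# Hypothetical helper: start the same command N times in the background,
# wait for every run, and print how many of them exited non-zero.
# $1 = number of parallel runs, remaining arguments = command to run
run_concurrent() {
    count=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that particular run
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}

# Example (prints the number of failed runs):
# run_concurrent 10 ldapsearch -H "ldaps://server"
```

A non-zero count would be consistent with the failure described here, although it does not by itself distinguish this race from other connection errors.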

Actual results:
"Can't contact LDAP server"

Expected results:
Connection is made to LDAP server

Comment 2 Matus Honek 2018-12-17 14:40:46 UTC
Hello,

Thanks for the report. The concurrency handling in this case works only at the level of a single process. When multiple processes try to do the extraction using the same configuration, a collision may occur (which is indeed a bug). We'll look into ways to fix this bug efficiently.

In the meantime, two workarounds come to mind:
- Do not use NSS database configuration, use PEM files (OpenSSL style of configuration) for the TLS_* options.
- Before the troublesome calls, do a single dummy (e.g. ldapwhoami) call with the very same configuration. This will create the /tmp/openldap-tlsmc-* directory structure and all the subsequent calls will only read files from there.
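
The second workaround could be wrapped along these lines. This is a sketch under stated assumptions: run_with_warmup is a hypothetical helper, and for simplicity it reuses the real command as the warm-up call rather than a separate ldapwhoami call.

```shell
#!/bin/sh
# Hypothetical helper: run the command once by itself first, so that it is
# the only process creating the /tmp/openldap-tlsmc-* structure, and only
# then start the parallel runs.
# $1 = number of parallel runs, remaining arguments = command to run
run_with_warmup() {
    count=$1
    shift
    # Serial warm-up call; give up if it fails.
    "$@" >/dev/null 2>&1 || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        i=$((i + 1))
    done
    # Block until every background run has finished.
    wait
}

# Example (placeholder server URI from the report):
# run_with_warmup 10 ldapsearch -H "ldaps://server"
```

The warm-up has to use the very same TLS configuration as the later calls, otherwise they may still need to do their own extraction.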

Regards.

Comment 4 Matus Honek 2019-06-06 15:24:09 UTC
Hello,

Given the support level in this phase of RHEL 7, and given that there is a workaround (using PEM files instead of the NSS DB), I'm closing this bug as WONTFIX. Should there be sufficient justification for developing a fix, please provide it, preferably by contacting our customer support.

Thank you for your understanding.