Description of problem:
I hope you can assist with a problem I'm having with openldap and concurrency. Please let me know if there's a different place to report this.
I am running CentOS 7 on x86_64. Starting with openldap-2.4.44-13, I see errors when multiple processes that connect to an LDAP server over SSL run concurrently. It looks like a race condition between processes trying to write to the same cache directory (such as /tmp/openldap-tlsmc-certs--CC) at the same time. I confirmed that the problem still occurs with the latest openldap-2.4.44-20. If I downgrade to openldap-2.4.44-5, I cannot reproduce the problem.
When the issue happens, all calls fail with the error "Can't contact LDAP server". Connecting again gives the same error, because the cache appears to be corrupt.
To fix it, I have to manually delete the cache directory in /tmp, re-run one of the processes by itself, and then the other processes can run successfully as the cache is now rebuilt.
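The manual fix above could be scripted roughly as follows. This is a minimal bash sketch, not a verified procedure: rebuild_tlsmc_cache and CACHE_GLOB are hypothetical names, and the default pattern /tmp/openldap-tlsmc-* is an assumption about where the cache lives. It should be run as the same user as the failing processes, and only while no other LDAP clients are running.

```shell
#!/usr/bin/env bash
# Recovery sketch: delete the (possibly corrupt) tlsmc cache, then rebuild
# it with a single run before any concurrent runs start.
#   CACHE_GLOB (optional) = cache directories to delete; the default
#                           pattern is an assumption, adjust if needed
#   $@                    = one LDAP command that rebuilds the cache
rebuild_tlsmc_cache() {
  local pattern="${CACHE_GLOB:-/tmp/openldap-tlsmc-*}"
  # $pattern is deliberately unquoted so the shell expands the glob.
  # With default shell options and no match, rm -f receives the literal
  # pattern and silently does nothing.
  rm -rf -- $pattern
  # Single run on its own; its exit status is returned to the caller.
  "$@"
}
```

For example, rebuild_tlsmc_cache ldapsearch -H "ldaps://server" deletes the cache and rebuilds it with one search; the other processes can then be started.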
Please let me know if I can supply more information to help with this issue.
Version-Release number of selected component (if applicable):
openldap-2.4.44-13 through openldap-2.4.44-20 (not reproducible with openldap-2.4.44-5)
How reproducible:
If I run the command below several times concurrently, the problem reproduces with the error "Can't contact LDAP server".
Steps to Reproduce:
1. Run the following command concurrently in multiple SSH sessions:
ldapsearch -H "ldaps://server"
Actual results:
"Can't contact LDAP server"
Expected results:
A connection is made to the LDAP server.
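Instead of opening several SSH sessions by hand, the concurrent runs can be started from a single shell. A minimal bash sketch; run_concurrently is a hypothetical helper, and stderr is left visible so any "Can't contact LDAP server" messages still show up:

```shell
#!/usr/bin/env bash
# Start N copies of a command at (nearly) the same time and count failures.
#   $1    = number of concurrent runs
#   $2... = command to run, e.g. ldapsearch -H "ldaps://server"
run_concurrently() {
  local n=$1; shift
  local i pid failures=0
  local -a pids=()
  for ((i = 0; i < n; i++)); do
    "$@" >/dev/null &   # stdout dropped, stderr kept for error messages
    pids+=("$!")
  done
  # "wait PID" returns that background job's exit status.
  for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures of $n runs failed"
  [ "$failures" -eq 0 ]
}
```

For example, run_concurrently 10 ldapsearch -H "ldaps://server". Since the problem appears while the cache is being created, the /tmp/openldap-tlsmc-* directories presumably need to be removed before each attempt; otherwise the runs may only read an existing cache.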
Thanks for the report. Concurrency is handled here only within a single process. When multiple processes try to do the extraction using the same configuration, a collision may occur (which is indeed a bug). We'll look into how to fix this bug efficiently.
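For illustration only, one common way to make such an extraction safe across processes (not necessarily how OpenLDAP would fix it) is to build the cache in a private staging directory and publish it with a single atomic rename(2). Other processes then see either no cache or a complete one, never a half-written one. A bash sketch of that pattern; publish_cache and the fill command are hypothetical, and mv -T assumes GNU coreutils:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of race-free cache extraction (not OpenLDAP code).
#   $1    = final cache directory
#   $2... = command that fills the directory passed as its last argument
publish_cache() {
  local target=$1; shift
  # A complete cache already exists: just read from it.
  [ -d "$target" ] && return 0
  local staging
  # Stage next to the target so the final rename stays on one filesystem.
  staging=$(mktemp -d "${target}.XXXXXX") || return 1
  if ! "$@" "$staging"; then
    rm -rf -- "$staging"
    return 1
  fi
  # rename(2) is atomic. If another process published first, the target
  # is a non-empty directory, the rename fails, and our copy is discarded.
  if ! mv -T -- "$staging" "$target" 2>/dev/null; then
    rm -rf -- "$staging"
  fi
  [ -d "$target" ]
}
```

If several processes race, each builds its own staging copy, but only the first rename succeeds; the others fail because the target is already a non-empty directory, and they simply discard their copies.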
In the meantime, two workarounds come to mind:
- Do not use an NSS database configuration; use PEM files (OpenSSL-style configuration) for the TLS_* options instead.
- Before the troublesome calls, do a single dummy (e.g. ldapwhoami) call with the very same configuration. This will create the /tmp/openldap-tlsmc-* directory structure and all the subsequent calls will only read files from there.
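The second workaround could be wrapped like this. A minimal bash sketch; warm_up_then_run is a hypothetical helper, and for simplicity the warm-up repeats the real command rather than calling ldapwhoami (either should work as long as the configuration is the same). It assumes the cache either does not exist yet or is intact:

```shell
#!/usr/bin/env bash
# Workaround sketch: make one call on its own so that it creates the
# /tmp/openldap-tlsmc-* cache, then start the concurrent calls, which
# should then only read from the finished cache.
#   $1    = number of concurrent runs
#   $2... = LDAP command; every run must use the same TLS configuration
warm_up_then_run() {
  local n=$1; shift
  local i pid rc=0
  local -a pids=()
  # Single dummy call first; stop if it fails.
  "$@" >/dev/null || return 1
  for ((i = 0; i < n; i++)); do
    "$@" >/dev/null &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

For example, warm_up_then_run 10 ldapsearch -H "ldaps://server".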
Given the support level in this phase of RHEL 7, and given that there is a workaround (using PEM files instead of an NSS DB), I'm closing this bug as WONTFIX. If there is sufficient justification for developing a fix, please provide it, preferably by contacting our customer support.
Thank you for your understanding.