Bug 1833044

Summary: 389-ds-base is bound to two competing SSL libraries (NSS, openssl)
Product: Red Hat Enterprise Linux 9
Reporter: Graham Leggett <minfrin>
Component: 389-ds-base
Assignee: LDAP Maintainers <ldap-maint>
Status: CLOSED WONTFIX
QA Contact: RHDS QE <ds-qe-bugs>
Severity: urgent
Priority: unspecified
Version: unspecified
CC: ldap-maint, pasik, spichugi, tbordaz, vashirov
Target Milestone: rc
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-11-07 07:27:01 UTC
Type: Bug

Description Graham Leggett 2020-05-07 17:30:14 UTC
Description of problem:

389-ds-base is bound to two competing SSL libraries (NSS, OpenSSL), which breaks SSL replication.

This renders 389ds, and in turn RHEL8, useless, and has brought an upgrade project to a halt.

Version-Release number of selected component (if applicable):

389-ds-base-1.4.1.3-7

How reproducible:

Always.

Steps to Reproduce:
1. Configure replication where RHEL8 is the sender.
2. After replication fails, check which SSL libraries ns-slapd is linked against:

ldd /usr/sbin/ns-slapd
	libssl3.so => /lib64/libssl3.so (0x00007f53c63fa000)
	libsmime3.so => /lib64/libsmime3.so (0x00007f53c61d1000)
	libnss3.so => /lib64/libnss3.so (0x00007f53c5ea1000)
	libnssutil3.so => /lib64/libnssutil3.so (0x00007f53c5c70000)
	libnspr4.so => /lib64/libnspr4.so (0x00007f53c5626000)
	libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007f53c4033000)
	libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f53c376d000)

3. Notice that 389-ds is bound to both OpenSSL and NSS.
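
The check in step 2 can be scripted. The sketch below is a minimal example, assuming `ldd` output in the format shown above; the table of sonames per TLS library is our own assumption, not something 389-ds defines. Note that, per Comment 1, linking both NSS and OpenSSL is intentional in 389-ds, so a result containing both only confirms which libraries are loaded; it does not by itself prove a fault.

```python
from __future__ import annotations

# Soname prefixes that identify each TLS library (an assumption for this
# sketch; extend the table for other library versions).
TLS_STACKS = {
    "NSS": ("libssl3.so", "libsmime3.so", "libnss3.so", "libnssutil3.so"),
    "OpenSSL": ("libssl.so.", "libcrypto.so."),
    "GnuTLS": ("libgnutls.so.",),
}


def tls_stacks(ldd_output: str) -> set[str]:
    """Return the names of the TLS libraries that appear in `ldd` output."""
    found: set[str] = set()
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        soname = fields[0]  # e.g. "libssl3.so" in "libssl3.so => /lib64/..."
        for stack, prefixes in TLS_STACKS.items():
            if soname.startswith(prefixes):  # startswith accepts a tuple
                found.add(stack)
    return found


# Excerpt of the ldd output from step 2.
SAMPLE_LDD = """\
\tlibssl3.so => /lib64/libssl3.so (0x00007f53c63fa000)
\tlibnss3.so => /lib64/libnss3.so (0x00007f53c5ea1000)
\tlibcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007f53c4033000)
\tlibssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f53c376d000)
"""

print(sorted(tls_stacks(SAMPLE_LDD)))  # ['NSS', 'OpenSSL']
```

To run it against a live server, pass it the output of `subprocess.run(["ldd", "/usr/sbin/ns-slapd"], capture_output=True, text=True).stdout`.
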
Actual results:

Replication fails.

Expected results:

Replication succeeds as it did in RHEL7.

Additional info:

Log message is as follows:

[07/May/2020:19:17:59.973737716 +0200] - ERR - slapi_ldap_bind - Could not send bind request for id [cn=Replication Manager,cn=config] authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), system error -5987 (Invalid function argument.), network error 0 (Unknown error, host "x.x.x:636")

The key words above are "network error 0": error 0 means "success", so no network-level failure was actually recorded.
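
One way to read the three codes in that log line is sketched below. The regular expression is inferred from this single message, so its format is an assumption. The code names are standard: -1 is OpenLDAP's LDAP_SERVER_DOWN ("Can't contact LDAP server") and -5987 is NSPR's PR_INVALID_ARGUMENT_ERROR ("Invalid function argument."). Together with a network code of 0, this is consistent with the reporter's point that nothing failed on the wire; the bind was rejected because of a bad argument before any socket error occurred.

```python
from __future__ import annotations

import re

# Matches the three error fields of the slapi_ldap_bind message quoted above.
# The format is inferred from that one log line, so treat it as an assumption.
BIND_ERROR = re.compile(
    r"error (?P<ldap>-?\d+) \((?P<ldap_msg>[^)]*)\), "
    r"system error (?P<system>-?\d+) \((?P<system_msg>[^)]*)\), "
    r"network error (?P<network>-?\d+)"
)

# Known names for the codes seen in this report.
LDAP_ERRORS = {-1: "LDAP_SERVER_DOWN"}  # OpenLDAP: "Can't contact LDAP server"
NSPR_ERRORS = {-5987: "PR_INVALID_ARGUMENT_ERROR"}  # "Invalid function argument."


def decode(line: str) -> dict[str, object]:
    """Split a slapi_ldap_bind failure into its LDAP, NSPR and network codes."""
    m = BIND_ERROR.search(line)
    if m is None:
        raise ValueError("not a slapi_ldap_bind error line")
    ldap, system, network = (int(m.group(g)) for g in ("ldap", "system", "network"))
    return {
        "ldap": LDAP_ERRORS.get(ldap, ldap),
        "system": NSPR_ERRORS.get(system, system),
        # 0 conventionally means "no error": nothing failed at the socket level.
        "network_error_recorded": network != 0,
    }


LOG_LINE = (
    "[07/May/2020:19:17:59.973737716 +0200] - ERR - slapi_ldap_bind - "
    "Could not send bind request for id [cn=Replication Manager,cn=config] "
    "authentication mechanism [SIMPLE]: error -1 (Can't contact LDAP server), "
    "system error -5987 (Invalid function argument.), "
    "network error 0 (Unknown error, host \"x.x.x:636\")"
)

print(decode(LOG_LINE))
```
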

Comment 1 Viktor Ashirov 2020-05-22 12:53:10 UTC
The fact that ns-slapd is linked with both OpenSSL and NSS is by design. They are used for different purposes in 389-ds.

Could you please provide more information about your replication topology?
What is the receiver and what is its version?
Do you have tcpdumps from both sender and receiver at the time of the error?
What is your crypto policy set to on RHEL8 (update-crypto-policies --show)?

Thanks.

Comment 2 Graham Leggett 2020-06-18 12:28:06 UTC
To make this easier to follow:

Back when we upgraded Ubuntu from Trusty to Xenial, our master-slave replicating LDAP servers all stopped replicating with the same, or a very similar, error as above.

We put 389ds under the debugger. Long story short, 389ds on Ubuntu Xenial is bound to two competing crypto libraries. The 389ds server uses NSS, where the configuration is passed around as directory paths. The 389ds client uses GnuTLS, where the configuration is passed around as discrete PEM files.

The replication feature of 389ds glues the 389ds server to the 389ds client, which is what gives it the ability to replicate. But 389ds has no concept of two different crypto libraries; it only knows about NSS. NSS configuration pointing at a directory is passed to GnuTLS, which expects a file, and GnuTLS fails, obviously. Add to that some unfinished error handling, and you get the bizarre error "error: success".
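
The mismatch described above can be modelled in a few lines. This is a hypothetical sketch: `classify` and `check` are illustrative helpers, not 389-ds or OpenLDAP code, and the `EXPECTS` table simplifies things to the description in this report (NSS takes a database directory, OpenSSL and GnuTLS take PEM files). Real OpenSSL can also use a hashed CA certificate directory, but that is still not an NSS database. The point it shows is that one configured path is valid for one library and invalid for the other.

```python
from __future__ import annotations

import os

# Marker files of an NSS certificate database: cert9.db/key4.db (SQL format)
# or cert8.db/key3.db (legacy DBM format).
NSS_DB_FILES = ({"cert9.db", "key4.db"}, {"cert8.db", "key3.db"})

# What each library is handed, simplified to the description in this report.
EXPECTS = {"NSS": "nss-db", "OpenSSL": "pem", "GnuTLS": "pem"}


def classify(path: str) -> str:
    """Hypothetical helper: say what kind of TLS configuration `path` points at."""
    if os.path.isdir(path):
        names = set(os.listdir(path))
        if any(markers <= names for markers in NSS_DB_FILES):
            return "nss-db"
        return "directory"
    if os.path.isfile(path):
        with open(path, "rb") as f:
            head = f.read(4096)
        return "pem" if b"-----BEGIN " in head else "file"
    return "missing"


def check(library: str, path: str) -> str | None:
    """Hypothetical helper: return an error if `library` cannot use `path`."""
    kind = classify(path)
    wanted = EXPECTS[library]
    if kind != wanted:
        return f"{library} expects {wanted}, got {kind}: {path}"
    return None
```

Pointed at an instance's NSS database directory (for example `/etc/dirsrv/slapd-INSTANCE`, where `INSTANCE` is a placeholder), `check("NSS", path)` returns `None`, while `check("GnuTLS", path)` or `check("OpenSSL", path)` reports the mismatch.
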

We fixed it back then by dumping Ubuntu and replacing it with RHEL7. Sanity returned, and our directories carried on replicating.

Then, separately, we tried to upgrade to RHEL8. Bang: replication broke with "error: success". A check of the crypto libraries bound to the 389ds server showed that there are now two of them, one NSS, the other OpenSSL.

So, to reproduce this:

- Deploy two RHEL8 389ds directory servers.
- Enable replication between the two (master-master or master-slave, it doesn't matter).
- Turn on encryption.
- Replication breaks.

Comment 3 Viktor Ashirov 2020-06-18 13:37:07 UTC
Could you please provide more information about how exactly the replication breaks? Can you provide replication error logs (please see [1] on how to enable replication logging)?
How do you enable replication between masters? How do you turn on encryption? Are you following the official documentation [2][3]?

What are the versions of 389-ds-base that you're using? (rpm -q 389-ds-base)

Do you have tcpdumps from both sender and receiver at the time of the error?

What is your crypto policy set to on RHEL8 (update-crypto-policies --show)?

Please provide this information so we can help you resolve your issue. Without it, it's hard to tell what exactly the problem is.

[1] https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html-single/administration_guide/index#Managing_Replication-Troubleshooting_Replication_Related_Problems
[2] https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html-single/administration_guide/index#setting_up_multi-master_replication_using_the_command_line
[3] https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html-single/administration_guide/index#enabling_tls

Comment 5 Graham Leggett 2020-12-24 10:14:07 UTC
The original cause of the breakage was that configuration parameters understood by NSS (directories) were passed to OpenSSL (which wants files), causing chaos.

Subsequently, a hack was added in which the certificates and keys in the NSS database are temporarily exported as files so that OpenSSL can see them, but this hack is broken in two ways:

- HSMs/Smartcards can no longer be used.
- OpenLDAP certificate handling with OpenSSL works differently from NSS, so the hack doesn't work: https://bugzilla.redhat.com/show_bug.cgi?id=1771979

To fix this, either teach the 389ds server-side code to use OpenSSL, or revert the 389ds client-side code to using NSS.

Comment 7 RHEL Program Management 2021-11-07 07:27:01 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.