RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 671553 - openldap can't use TLS after a fork()
Summary: openldap can't use TLS after a fork()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: openldap
Version: 6.1
Hardware: All
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Jan Vcelak
QA Contact: Ondrej Moriš
URL:
Whiteboard:
Depends On: 636956
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-01-21 20:31 UTC by Rich Megginson
Modified: 2018-09-07 21:41 UTC (History)
12 users (show)

Fixed In Version: openldap-2.4.23-13.el6
Doc Type: Bug Fix
Doc Text:
- use of any tool which calls fork() and then initiates TLS connection to LDAP server using libldap - TLS negotiation will fail as Mozilla NSS require crypto modules intitialization for each process (even for forked process, because it cannot be detected that the modules were initialized in parent process) - added code which reloads NSS crypto modules before initializing TLS - added code to explicitly disable NSS pkcs11 fork() checking in the software token (note that using a 3rd party pkcs11 library or hardware module will still fail as there is no way to turn off fork() checking there) - TLS negotiation will work as expected in forked processes
Clone Of: 636956
Environment:
Last Closed: 2011-05-19 13:59:52 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0673 0 normal SHIPPED_LIVE openldap bug fix and enhancement update 2011-05-18 18:10:44 UTC

Description Rich Megginson 2011-01-21 20:31:48 UTC
+++ This bug was initially created as a clone of Bug #636956 +++

Description of problem:

After upgrading to nss_ldap-265-6.fc14.x86_64, authenticating with starttls against my openldap server stops working.

Version-Release number of selected component (if applicable):
nss_ldap-265-6.fc14.x86_64
openldap-servers-2.4.19-4.fc12.x86_64

This works fine with nss_ldap-264-9.fc12.x86_64

If I increase the debug level in /etc/nss_ldap.conf, I get the following output:
TLS certificate verification: Error, -12276: Unknown code ___P 12

--- Additional comment from nalin on 2010-09-23 16:07:02 EDT ---

Hmm, this appears to be coming from openldap's recent changeover to use NSS for TLS functionality.  Adding the openldap package maintainer to the CC list.

--- Additional comment from jvcelak on 2010-09-27 04:58:45 EDT ---

The error codes description can be found on [1]. Quoting:

-12276: SSL_ERROR_BAD_CERT_DOMAIN

"Unable to communicate securely with peer: requested domain name does not match the server's certificate." 

This error code should be returned by the certificate authentication callback function when it detects that the Common Name in the remote server's certificate does not match the hostname sought by the local client, according to the matching rules specified for CERT_VerifyCertName.

--

Ruben, please, can you provide more information about your configuration? Especially which certificate do you use (self-signed, CA signed, ...). Does the hostnames match?

[1] http://www.mozilla.org/projects/security/pki/nss/ref/ssl/sslerr.html

--- Additional comment from ruben on 2010-09-27 08:06:39 EDT ---

Hi Jan,

Some more info about our setup:

We use 2 ldap servers, ldap1a.example.org and ldap1b.example.org, behind a virtual ip. So our active ldap server is reachable at ldap.example.org.

To make this work with TLS we use the X509v3 Subject Alternative Name:

X509v3 Subject Alternative Name:
                DNS:ldap, DNS:ldap.example.org, DNS:ldap.priv.example.org, DNS:ldap1a.example.org

The certificates of our ldap servers are signed by our puppet ca. So the ca is self-signed.

This setup works fine on Fedora 11/12/13.

One thing I've noticed, is that we have a required tls_ssf of 256. When authenticating with a F-14 client it shows a ssf of 168, which was 256 on older hosts. I've worked around this by setting the cipher list on the client to TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5, but I think it's a strange change in behavior.

--- Additional comment from ruben on 2010-09-27 08:08:56 EDT ---

in reply to comment #2:

This might be a separate RFE, but it would be nice if the error message showed up in the logs instead of just the error code. That makes debugging a bit easier.

--- Additional comment from jvcelak on 2010-10-12 12:17:48 EDT ---

Ruben, is this a whole error message you get?
Can you please attach your ldap.conf? I'm interested in TLS_CERT* directives.

--- Additional comment from ruben on 2010-10-17 20:58:10 EDT ---

Sure, sorry this took a while, I had to put the machine we saw this on into production, so I've installed a new one with F-14 on it.

Here's my /etc/openldap/ldap.conf (server names changed to protect the innocent ;-)):

URI ldap://ldap.example.org
BASE dc=example,dc=org
TLS_CACERTDIR /etc/openldap/cacerts
TLS_CIPHER_SUITE TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5

The TLS_CIPHER_SUITE line is needed, because we have a tls_ssf configured of 256. Without this line, the cipher suite used by (presumably) mozilla-nss has a ssf of 168.

This is a regression IMHO.

With the configuration above, ldapsearch -xZZ works. Now onto nss_ldap:

/etc/nss_ldap.conf:

uri ldap://example.org
ssl start_tls
tls_cacertfile /etc/openldap/cacerts/ca.pem
tls_checkpeer yes
tls_ciphers TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5
debug 4

# getent passwd ruben:

[root@ldaptest ~]# getent passwd ruben
TLS certificate verification: Error, -12276: Unknown code ___P 12
ldap_build_search_req ATTRS: uid userPassword uidNumber gidNumber cn homeDirectory loginShell gecos description objectClass
ruben:x:10001:10000:Ruben Kerkhof:/home/ruben:/bin/bash

WTF? It fails to validate the ca certificate but still returns my ldap user

By the way, why is it trying to read my userPassword?

Ok, so nsswitch lookups work, but how about logging in.

/etc/pam_ldap.conf:

uri ldap://example.org
ssl start_tls
tls_cacertdir /etc/openldap/cacerts
pam_password exop
debug 4

# ssh ruben@localhost

no luck, this just hangs.

so lets try to switch off tls:
/etc/pam_ldap.conf: ssl off

doesn't seem to matter, it hangs my vm.

now lets switch of tls in /etc/nss_ldap.conf: ssl off

ah, that works.

So now let's say that there is an issue with my CA and server certs. Lets disable validation in both /etc/nss_ldap.conf and /etc/pam_ldap.conf: tls_checkpeer no.

Now my vm hangs again, and I cannot login.


I have no either what's going on, but
- mozilla-nss doesn't like my certificates, which used to work fine in F-13
- the tls_checkpeer setting gets ignored.

Please advise.

--- Additional comment from jvcelak on 2010-10-18 07:44:41 EDT ---

(In reply to comment #6)
> URI ldap://ldap.example.org
> BASE dc=example,dc=org
> TLS_CACERTDIR /etc/openldap/cacerts
> TLS_CIPHER_SUITE TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5
> 
> The TLS_CIPHER_SUITE line is needed, because we have a tls_ssf configured of
> 256. Without this line, the cipher suite used by (presumably) mozilla-nss has a
> ssf of 168.
> 
> This is a regression IMHO.

CC'ing rmeggins. Rich, what do you think? Does OpenSSL and MozNSS have different default cipher suite?

> [root@ldaptest ~]# getent passwd ruben
> TLS certificate verification: Error, -12276: Unknown code ___P 12
> ldap_build_search_req ATTRS: uid userPassword uidNumber gidNumber cn
> homeDirectory loginShell gecos description objectClass
> ruben:x:10001:10000:Ruben Kerkhof:/home/ruben:/bin/bash
> 
> WTF? It fails to validate the ca certificate but still returns my ldap user

The thing is, that moznss assume CA_CERTDIR to be moznss key/cert db. And if this fails, it will produce the error message you see. Then it will try to use CA_CERTDIR as a directory with PEM certificates, as it was done with OpenSSL.

It means, that you certificate can be OK. The error message is not really lucky.

> By the way, why is it trying to read my userPassword?

I think this is all right. Your password can be set in passwd db. It was done so in the past. To increase security, shadow db was "invented".

> Ok, so nsswitch lookups work, but how about logging in.
> 
> /etc/pam_ldap.conf:
> 
> uri ldap://example.org
> ssl start_tls
> tls_cacertdir /etc/openldap/cacerts
> pam_password exop
> debug 4
> 
> # ssh ruben@localhost
> 
> no luck, this just hangs.
> 
> so lets try to switch off tls:
> /etc/pam_ldap.conf: ssl off
> 
> doesn't seem to matter, it hangs my vm.
> 
> now lets switch of tls in /etc/nss_ldap.conf: ssl off
> 
> ah, that works.
> 
> So now let's say that there is an issue with my CA and server certs. Lets
> disable validation in both /etc/nss_ldap.conf and /etc/pam_ldap.conf:
> tls_checkpeer no.
> 
> Now my vm hangs again, and I cannot login.
> 

No idea, more verbose output could help:
ssh -v (-vvv)
sshd -d (-ddd)

> 
> I have no either what's going on, but
> - mozilla-nss doesn't like my certificates, which used to work fine in F-13
> - the tls_checkpeer setting gets ignored.

It likes your certificates. The error message is misleading.

As for tls_checkpeer, I wouldn't say that he setting gets ignored, maybe the problem is elsewhere.

> 
> Please advise.

--- Additional comment from rmeggins on 2010-10-18 11:11:05 EDT ---

> CC'ing rmeggins. Rich, what do you think? Does OpenSSL and MozNSS have
> different default cipher suite?

Using openldap with openssl, and no cipher suites are specified, which cipher suites are enabled?  All of them?

> It means, that you certificate can be OK. The error message is not really
> lucky.

Yes - it is a poor, misleading error message.

--- Additional comment from ruben on 2010-10-18 11:35:48 EDT ---

> Using openldap with openssl, and no cipher suites are specified, which cipher
> suites are enabled?  All of them?

I'm not sure, I'm still trying to get more debugging output out of openldap. It's awfully quiet about the TLS negotiation phase.

--- Additional comment from rmeggins on 2010-10-18 13:08:58 EDT ---

One more thing - what ldap server are you using, and what TLS configuration settings (including cipher suites) are you using?

--- Additional comment from ruben on 2010-10-18 16:16:36 EDT ---

Just the stock OpenLdap in F-13:

ruben@ldap1b ~]$ rpm -q openldap-servers
openldap-servers-2.4.21-10.fc13.x86_64

The relevant part of my slapd.conf:

TLSCACertificateFile /etc/openldap/cacerts/ca.pem
TLSCertificateFile /etc/pki/tls/certs/slapd.pem
TLSCertificateKeyFile /etc/pki/tls/private/slapd.key
TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5

security ssf=256

--- Additional comment from ruben on 2010-10-18 18:37:47 EDT ---

Created attachment 454225 [details]
slapd log showing connection failures

--- Additional comment from ruben on 2010-10-18 18:38:39 EDT ---

Strangely enough, logging in as an ldap user at the console works, over ssh doesn't. Now I'm confused...

I've increased the debug level of sshd. It looks like the pam authentication succeeds, but then my ssh connection just hangs. No prompt nothing.

Oct 18 23:51:53 ldaptest sshd[1223]: debug1: PAM: initializing for "ruben"
Oct 18 23:51:53 ldaptest sshd[1223]: debug1: PAM: setting PAM_RHOST to "5354ad7e.cm-6-5c.dynamic.ziggo.nl"
Oct 18 23:51:53 ldaptest sshd[1223]: debug1: PAM: setting PAM_TTY to "ssh"
Oct 18 23:51:53 ldaptest sshd[1223]: debug2: monitor_read: 50 used once, disabling now
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: monitor_read: checking request 3
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_answer_authserv: service=ssh-connection, style=
Oct 18 23:51:53 ldaptest sshd[1223]: debug2: monitor_read: 3 used once, disabling now
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: monitor_read: checking request 4
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_answer_authrole: role=
Oct 18 23:51:53 ldaptest sshd[1223]: debug2: monitor_read: 4 used once, disabling now
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1224]: debug1: userauth-request for user ruben service ssh-connection method password
Oct 18 23:51:55 ldaptest sshd[1224]: debug1: attempt 1 failures 0
Oct 18 23:51:55 ldaptest sshd[1224]: debug2: input_userauth_request: try method password
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_auth_password entering
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_send entering: type 11
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_auth_password: waiting for MONITOR_ANS_AUTHPASSWORD
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive_expect entering: type 12
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: monitor_read: checking request 11
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: PAM: sshpam_passwd_conv called with 1 messages
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: PAM: password authentication accepted for ruben
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_answer_authpassword: sending result 1
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_send entering: type 12
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_auth_password: user authenticated
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_do_pam_account entering
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_send entering: type 51
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive_expect entering: type 52
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive_expect entering: type 51
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: do_pam_account: called
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: PAM: do_pam_account pam_acct_mgmt = 0 (Success)
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_send entering: type 52
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_do_pam_account returning 1
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: Sending new keys: 0x7fcaa5a54960 0x7fcaa5a49e60
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_newkeys_to_blob: converting 0x7fcaa5a54960
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_newkeys_to_blob: converting 0x7fcaa5a49e60
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: New keys have been sent
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: Sending compression state
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_send entering: type 25
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: Finished sending state
Oct 18 23:51:55 ldaptest sshd[1223]: Accepted password for ruben from 83.84.173.126 port 45153 ssh2
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: monitor_child_preauth: ruben has been authenticated by privileged process
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Waiting for new keys
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive_expect entering: type 25
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_newkeys_from_blob: 0x7fcaa5b40020(122)
Oct 18 23:51:55 ldaptest sshd[1223]: debug2: mac_setup: found hmac-md5
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Waiting for second key
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_newkeys_from_blob: 0x7fcaa5b40020(122)
Oct 18 23:51:55 ldaptest sshd[1223]: debug2: mac_setup: found hmac-md5
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Getting compression state
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Getting Network I/O buffers
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_share_sync: Share sync
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_share_sync: Share sync end
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: temporarily_use_uid: 10001/10000 (e=0/0)
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: ssh_gssapi_storecreds: Not a GSSAPI mechanism
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: restore_uid: 0/0
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: SELinux support disabled
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: PAM: establishing credentials
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: PAM: opening session
Oct 18 23:51:55 ldaptest sshd[1223]: pam_unix(sshd:session): session opened for user ruben by (uid=0)
Oct 18 23:51:55 ldaptest sshd[1223]: User child is on pid 1225
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
Oct 18 23:51:59 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Oct 18 23:52:07 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
Oct 18 23:52:23 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)...
Oct 18 23:52:55 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 64 seconds)...
Oct 18 23:53:59 ldaptest sshd[1225]: nss_ldap: could not search LDAP server - Server is unavailable
Oct 18 23:53:59 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
Oct 18 23:54:03 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Oct 18 23:54:11 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
Oct 18 23:54:27 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)...

Lets set bind_policy to soft in /etc/nsswitch.conf. Wow, that does the trick, the ssh connection succeeds.

Comment #12 is my ldap log which show the ldap connection failures.

--- Additional comment from rmeggins on 2010-10-18 21:45:04 EDT ---

did you mean /etc/ldap.conf instead of /etc/nsswitch.conf?
The ldap log doesn't correspond to the sshd log?

--- Additional comment from ruben on 2010-10-19 06:50:28 EDT ---

Actually, I meant /etc/nss_ldap.conf :-) /etc/ldap.conf doesn't exists anymore in F14.

The ldap log is from half an hour later, but I was seeing the the exact same behavior as before.

--- Additional comment from ruben on 2010-10-19 15:07:36 EDT ---

I've changed my test setup a little, since I can't increase the logging on the production ldap server.

I'm now testing on a clean F-14 vm against openldap-servers-2.4.22-7.fc14.x86_64, with the following configuration:

/etc/nss_ldap.conf
base dc=example,dc=org
nss_initgroups_minimum_uid 500
uri ldap://ldaptest.example.org
ssl start_tls
tls_cacertfile /etc/openldap/cacerts/ca.pem
tls_checkpeer yes
tls_ciphers TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5
nss_base_passwd        ou=People,dc=example,dc=org?one
nss_base_shadow        ou=People,dc=example,dc=org?one
nss_base_group         ou=Groups,dc=example,dc=org?one

/etc/pam_ldap.conf:
base dc=example,dc=org
uri ldap://ldaptest.example.org
ssl start_tls
tls_checkpeer yes
tls_cacertfile /etc/openldap/cacerts/ca.pem
pam_password exop
tls_ciphers TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5

/etc/openldap/ldap.conf:
URI ldap://ldaptest.example.org
BASE dc=example,dc=org
TLS_CACERT /etc/openldap/cacerts/ca.pem
TLS_CIPHER_SUITE TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5
TLS_REQCERT demand

/etc/openldap/slapd.conf (relevant part):
TLSCACertificateFile /etc/openldap/cacerts/ca.pem
TLSCertificateFile /etc/pki/tls/certs/slapd.pem
TLSCertificateKeyFile /etc/pki/tls/private/slapd.key
TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5

security tls=256

I've created a self-signed CA, and used it to create a certificate for CN=ldaptest.example.org

Ok, let's do some tests:
- getent passwd works, returns output immediately
- ldapsearch -xZZ works, returns output immediately
- ssh ruben@localhost works, but takes about 6 to 7 minutes.

I'll attach the slapd and sshd logs. Looking at the slapd log I see 44 ldap connections, of which 36 failed.

This is all with nscd disabled.

--- Additional comment from ruben on 2010-10-19 15:10:13 EDT ---

Created attachment 454415 [details]
slapd log showing connection failures while logging in with ssh (no nscd)

--- Additional comment from ruben on 2010-10-19 15:11:29 EDT ---

Created attachment 454416 [details]
sshd log showing nss_ldap reconnection attempts

--- Additional comment from ruben on 2010-10-19 15:26:12 EDT ---

Created attachment 454421 [details]
slapd log showing 2 successful connections

--- Additional comment from ruben on 2010-10-19 15:27:30 EDT ---

Created attachment 454422 [details]
sshd log showing successful login without nss_ldap reconnecting

--- Additional comment from ruben on 2010-10-19 15:28:31 EDT ---

The attachments in comment #19 and #20 show that I'm able to login with ssh as soon as I start nscd.

--- Additional comment from rmeggins on 2010-10-19 15:35:25 EDT ---

There is some sort of TLS problem:

Oct 19 20:29:05 ldaptest slapd[1436]: connection_read(22): TLS accept failure error=-1 id=1007, closing

Unfortunately, the only way to get more detailed information is to use the trace level logging (-d 1) which is very, very verbose.  But it should tell us what the low level moznss problem is.

--- Additional comment from ruben on 2010-10-19 19:31:55 EDT ---

Created attachment 454471 [details]
slapd -d 1 trace output

--- Additional comment from rmeggins on 2010-10-20 17:46:45 EDT ---

Hmm - there are a couple of different failures:

op tag 0x77, time 1287530783
...
TLS: error: accept - force handshake failure: 0 - error -5938:Encountered end of file

and
op tag 0x77, time 1287530895
...
connection_read(18): unable to get TLS client DN, error=49 id=1023


The first one would seem to indicate that the client just dropped the connection, or gave up on the TLS handshake.  The second seems to indicate that the certificate sent by the TLS peer is NULL.

Are you able to reproduce any of these problems with the ldapsearch command line tool, or is it necessary to configure nss_ldap/pam_ldap/sshd/nscd to reproduce this problem?

--- Additional comment from ruben on 2010-10-20 20:09:00 EDT ---

Hi Rich,

I'm not able to reproduce this with just ldapsearch.

About the second failure, what kind of certificate is it expecting?
I'm not (knowingly) using client side certificates.

--- Additional comment from ruben on 2010-10-20 21:50:35 EDT ---

http://www.zytrax.com/books/ldap/ch12/ says that error 49 is harmless if TLSVerifyClient=demand.

I've been reading throught the openldap source.
and indeed, connection_read in connection.c unconditionally calls dnX509peerNormalize().

--- Additional comment from rmeggins on 2010-10-20 22:26:19 EDT ---

(In reply to comment #26)
> http://www.zytrax.com/books/ldap/ch12/ says that error 49 is harmless if
> TLSVerifyClient=demand.
> 
> I've been reading throught the openldap source.
> and indeed, connection_read in connection.c unconditionally calls
> dnX509peerNormalize().

Yes, so that looks ok.

--- Additional comment from ruben on 2010-10-20 22:33:19 EDT ---

Ok, back to failure no. 1 then.

Just to verify that there's no issue with the ldapserver running on localhost, I installed a second vm with F-13, and pointed /etc/ldap.conf at my first server. Everything works fine on F-13.

I've upgraded the second VM to F-14 and I'm seeing the same failures. So it definitely looks like a client side issue.

Are there any other steps I could take?

--- Additional comment from rmeggins on 2010-10-20 22:53:12 EDT ---

(In reply to comment #28)
> Ok, back to failure no. 1 then.
> 
> Just to verify that there's no issue with the ldapserver running on localhost,
> I installed a second vm with F-13, and pointed /etc/ldap.conf at my first
> server. Everything works fine on F-13.
> 
> I've upgraded the second VM to F-14 and I'm seeing the same failures. So it
> definitely looks like a client side issue.
> 
> Are there any other steps I could take?

I don't know if it would be possible to enable the ldap library trace debug level in the client side software.  When you use e.g. ldapsearch -d 1, this calls ldap_set_option(ld, LDAP_OPT_DEBUG_LEVEL, thelevel)
e.g.
 int debuglevel = 1;
 ldap_set_option(ld, LDAP_OPT_DEBUG_LEVEL, &debuglevel);

It looks as though you can set this in nss_ldap - man nss_ldap
       debug <level>
              Specifies  the  debug  level used for logging by the LDAP client
              library.

You might try

 logdir /var/tmp/ldap
 debug 1

in /etc/ldap.conf to see if that gives us a clue.  I suspect it may have something to do with the way the NSS FIPS softoken deals with fork() calls - since it works with ncsd, it may be fork()ing something when not using ncsd.

--- Additional comment from ruben on 2010-10-20 23:11:35 EDT ---

Ok, that works, but it's a lot of information (which also ends up at stdout).

--- Additional comment from ruben on 2010-10-20 23:13:07 EDT ---

Created attachment 454717 [details]
trace output from nss_ldap while ssh'in

--- Additional comment from rmeggins on 2010-10-20 23:21:02 EDT ---

Crud - still no clue :P

Back to the ssf setting - does it work if you set ssf=168 (or don't set ssf at all), but fail if you set ssf=256?  Or was that a red herring?

Thanks for your assistance with this issue.  Looks like the only recourse left is to set up a similar environment and run gdb.

--- Additional comment from ruben on 2010-10-21 10:13:46 EDT ---

Nope, ssf doesn't influence it at all.

I've tried running su under gdb, but I don't get very far:

[root@ldaptest ~]# gdb --args su - ruben
GNU gdb (GDB) Fedora (7.2-16.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /bin/su...Reading symbols from /usr/lib/debug/bin/su.debug...done.
done.
(gdb) b _nss_ldap_getbyname
Function "_nss_ldap_getbyname" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_nss_ldap_getbyname) pending.
(gdb) r
Starting program: /bin/su - ruben
[Thread debugging using libthread_db enabled]
Cannot find new threads: generic error

I could give you access to the VM if you want?

--- Additional comment from rmeggins on 2010-10-21 10:23:55 EDT ---

(In reply to comment #33)
> Nope, ssf doesn't influence it at all.
> 
> I've tried running su under gdb, but I don't get very far:
> 
> [root@ldaptest ~]# gdb --args su - ruben
> GNU gdb (GDB) Fedora (7.2-16.fc14)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /bin/su...Reading symbols from
> /usr/lib/debug/bin/su.debug...done.
> done.
> (gdb) b _nss_ldap_getbyname
> Function "_nss_ldap_getbyname" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 1 (_nss_ldap_getbyname) pending.
> (gdb) r
> Starting program: /bin/su - ruben
> [Thread debugging using libthread_db enabled]
> Cannot find new threads: generic error
> 
> I could give you access to the VM if you want?

Sure!  That would be very helpful.  Please email me directly with the information.

--- Additional comment from rmeggins on 2010-10-21 16:14:06 EDT ---

Thanks for your help.  I believe this problem is caused by nss_ldap calling fork() after the NSS software crypto token has been initialized.  I'm going to try to reproduce the problem with a simple test program.

--- Additional comment from rmeggins on 2010-10-21 16:27:23 EDT ---

Also try this: in the /etc/sysconfig/sshd add

export NSS_STRICT_NOFORK=DISABLED

then service sshd restart

Then try ssh - I did this on the test system and I was able to ssh in as a regular user without using nscd

Note that this is not the final, recommended solution, but you could use it until we figure out how to solve this fork()ing problem.

--- Additional comment from ruben on 2010-10-22 04:29:02 EDT ---

I can live with the workaround for the moment.

Thanks again,

Ruben

--- Additional comment from rmeggins on 2010-10-22 12:19:12 EDT ---

Created attachment 455133 [details]
python script used on F-14 to reproduce

This python script is used to show the bug and the workaround.  Usage:
python olnsstest.py certdir ldapurl binddn bindpw
e.g.
python olnsstest.py /path/to/cert8.db/dir ldap://localhost:389 "cn=directory manager" secret

certdir - full absolute path of directory containing cert8.db, key3.db, secmod.db
ldapurl - ldap (not ldaps) url - the script uses start_tls to start TLS
binddn, bindpw - bind DN and password to use with simple_bind

Requires: Directory Server with TLS capability - certdir which contains CA cert for directory server - python-ldap and python-nss packages on Fedora 14 and later

To reproduce the problem - comment out the line
#os.environ["NSS_STRICT_NOFORK"] = "DISABLED"

If you remove the os.fork(), or remove the nss_init(), it works

If you do the nss_init(), followed by the fork(), followed by the ldap calls, without the env. var. set, you will get a connection failure.

--- Additional comment from mike on 2011-01-18 20:50:48 EST ---

*** Bug 669900 has been marked as a duplicate of this bug. ***

--- Additional comment from mike on 2011-01-18 20:51:33 EST ---

*** Bug 663485 has been marked as a duplicate of this bug. ***

--- Additional comment from mike on 2011-01-18 21:42:01 EST ---

Any work being done on this bug?

My F14 x86_64 w/ 389-ds+Samba server is also affected. The workaround suffices for now, but it shouldn't be a permenant solution.

--- Additional comment from rmeggins on 2011-01-19 12:35:44 EST ---

https://admin.fedoraproject.org/updates/nss-util-3.12.9-1.fc14,nss-softokn-3.12.9-1.fc14,nss-3.12.9-1.fc14,nspr-4.8.7-1.fc14?_csrf_token=4f505a0830875d09f36168715064980b2733a4bc

nss-3.12.9 is available in Testing - please try that and see if it fixes this problem.

--- Additional comment from bbuesker on 2011-01-19 16:20:39 EST ---

nss-3.12.9 did not appear to resolve the problem for me. If I have nscd running, I can su from root to an LDAP user, or ssh in as that LDAP user. However, if I stop nscd, neither the su nor the ssh succeed. If I set NSS_STRICT_NOFORK=DISABLED, then the su works (I did not try ssh with this setting).

--- Additional comment from rmeggins on 2011-01-21 15:30:53 EST ---

It turns out that some additional changes were needed to be made to openldap to take advantage of the new SECMOD_RestartModules API in NSS 3.12.9.  An ITS has been opened in openldap upstream and a patch submitted:
http://www.openldap.org/its/index.cgi?findid=6802

Comment 4 Jan Vcelak 2011-01-26 09:51:09 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
- use of any tool which calls fork() and then initiates TLS connection to LDAP server using libldap
- TLS negotiation will fail as Mozilla NSS require crypto modules intitialization for each process (even for forked process, because it cannot be detected that the modules were initialized in parent process)
- added code which reloads NSS crypto modules before initializing TLS
- TLS negotiation will work as expected in forked processes

Comment 5 Rich Megginson 2011-01-28 15:39:13 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1,5 @@
 - use of any tool which calls fork() and then initiates TLS connection to LDAP server using libldap
 - TLS negotiation will fail as Mozilla NSS require crypto modules intitialization for each process (even for forked process, because it cannot be detected that the modules were initialized in parent process)
 - added code which reloads NSS crypto modules before initializing TLS
+- added code to explicitly disable NSS pkcs11 fork() checking in the software token (note that using a 3rd party pkcs11 library or hardware module will still fail as there is no way to turn off fork() checking there)
 - TLS negotiation will work as expected in forked processes

Comment 14 errata-xmlrpc 2011-05-19 13:59:52 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0673.html


Note You need to log in before you can comment on or make changes to this bug.