636956 – openldap can't use TLS after a fork()

Bug 636956 - openldap can't use TLS after a fork()

Summary: openldap can't use TLS after a fork()

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	openldap
Sub Component:
Version:	14
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Jan Vcelak
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	663485 669900 (view as bug list)
Depends On:
Blocks:	671553
TreeView+	depends on / blocked

Reported:	2010-09-23 19:03 UTC by Ruben Kerkhof
Modified:	2013-03-04 01:27 UTC (History)
CC List:	9 users (show)
Fixed In Version:	openldap-2.4.23-10.fc14
Clone Of:
Clones:	671553 (view as bug list)
Environment:
Last Closed:	2011-04-21 05:33:10 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
slapd log showing connection failures (389.23 KB, text/plain) 2010-10-18 22:37 UTC, Ruben Kerkhof	no flags	Details
slapd log showing connection failures while logging in with ssh (no nscd) (450.23 KB, text/plain) 2010-10-19 19:10 UTC, Ruben Kerkhof	no flags	Details
sshd log showing nss_ldap reconnection attempts (28.47 KB, text/plain) 2010-10-19 19:11 UTC, Ruben Kerkhof	no flags	Details
slapd log showing 2 successful connections (41.13 KB, text/plain) 2010-10-19 19:26 UTC, Ruben Kerkhof	no flags	Details
sshd log showing successful login without nss_ldap reconnecting (22.06 KB, text/plain) 2010-10-19 19:27 UTC, Ruben Kerkhof	no flags	Details
slapd -d 1 trace output (887.10 KB, text/plain) 2010-10-19 23:31 UTC, Ruben Kerkhof	no flags	Details
trace output from nss_ldap while ssh'in (381.75 KB, text/plain) 2010-10-21 03:13 UTC, Ruben Kerkhof	no flags	Details
python script used on F-14 to reproduce (992 bytes, text/plain) 2010-10-22 16:19 UTC, Rich Megginson	no flags	Details
strace -f -t su - <LDAP username> (1.70 MB, text/plain) 2011-01-26 23:34 UTC, Brian Buesker	no flags	Details
openldap-2.4.24-use-static-for-nofork-envvar-20110310.patch (865 bytes, patch) 2011-03-10 15:21 UTC, Rich Megginson	no flags	Details \| Diff
gdm crash log (6.39 KB, text/plain) 2011-03-10 17:38 UTC, Kapetanakis Giannis	no flags	Details
gdm crash debug (5.31 KB, text/plain) 2011-03-10 17:38 UTC, Kapetanakis Giannis	no flags	Details
openldap-2.4.24-use-strdup-for-nofork-envvar-20110310.patch (864 bytes, patch) 2011-03-10 17:52 UTC, Rich Megginson	no flags	Details \| Diff
Show Obsolete (2) View All

Description Ruben Kerkhof 2010-09-23 19:03:30 UTC

Description of problem:

After upgrading to nss_ldap-265-6.fc14.x86_64, authenticating with starttls against my openldap server stops working.

Version-Release number of selected component (if applicable):
nss_ldap-265-6.fc14.x86_64
openldap-servers-2.4.19-4.fc12.x86_64

This works fine with nss_ldap-264-9.fc12.x86_64

If I increase the debug level in /etc/nss_ldap.conf, I get the following output:
TLS certificate verification: Error, -12276: Unknown code ___P 12

Comment 1 Nalin Dahyabhai 2010-09-23 20:07:02 UTC

Hmm, this appears to be coming from openldap's recent changeover to use NSS for TLS functionality.  Adding the openldap package maintainer to the CC list.

Comment 2 Jan Vcelak 2010-09-27 08:58:45 UTC

The error codes description can be found on [1]. Quoting:

-12276: SSL_ERROR_BAD_CERT_DOMAIN

"Unable to communicate securely with peer: requested domain name does not match the server's certificate." 

This error code should be returned by the certificate authentication callback function when it detects that the Common Name in the remote server's certificate does not match the hostname sought by the local client, according to the matching rules specified for CERT_VerifyCertName.

--

Ruben, please, can you provide more information about your configuration? Especially which certificate do you use (self-signed, CA signed, ...). Does the hostnames match?

[1] http://www.mozilla.org/projects/security/pki/nss/ref/ssl/sslerr.html

Comment 3 Ruben Kerkhof 2010-09-27 12:06:39 UTC

Hi Jan,

Some more info about our setup:

We use 2 ldap servers, ldap1a.example.org and ldap1b.example.org, behind a virtual ip. So our active ldap server is reachable at ldap.example.org.

To make this work with TLS we use the X509v3 Subject Alternative Name:

X509v3 Subject Alternative Name:
                DNS:ldap, DNS:ldap.example.org, DNS:ldap.priv.example.org, DNS:ldap1a.example.org

The certificates of our ldap servers are signed by our puppet ca. So the ca is self-signed.

This setup works fine on Fedora 11/12/13.

One thing I've noticed, is that we have a required tls_ssf of 256. When authenticating with a F-14 client it shows a ssf of 168, which was 256 on older hosts. I've worked around this by setting the cipher list on the client to TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5, but I think it's a strange change in behavior.

Comment 4 Ruben Kerkhof 2010-09-27 12:08:56 UTC

in reply to comment #2:

This might be a separate RFE, but it would be nice if the error message showed up in the logs instead of just the error code. That makes debugging a bit easier.

Comment 5 Jan Vcelak 2010-10-12 16:17:48 UTC

Ruben, is this a whole error message you get?
Can you please attach your ldap.conf? I'm interested in TLS_CERT* directives.

Comment 6 Ruben Kerkhof 2010-10-18 00:58:10 UTC

Sure, sorry this took a while, I had to put the machine we saw this on into production, so I've installed a new one with F-14 on it.

Here's my /etc/openldap/ldap.conf (server names changed to protect the innocent ;-)):

URI ldap://ldap.example.org
BASE dc=example,dc=org
TLS_CACERTDIR /etc/openldap/cacerts
TLS_CIPHER_SUITE TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5

The TLS_CIPHER_SUITE line is needed, because we have a tls_ssf configured of 256. Without this line, the cipher suite used by (presumably) mozilla-nss has a ssf of 168.

This is a regression IMHO.

With the configuration above, ldapsearch -xZZ works. Now onto nss_ldap:

/etc/nss_ldap.conf:

uri ldap://example.org
ssl start_tls
tls_cacertfile /etc/openldap/cacerts/ca.pem
tls_checkpeer yes
tls_ciphers TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5
debug 4

# getent passwd ruben:

[root@ldaptest ~]# getent passwd ruben
TLS certificate verification: Error, -12276: Unknown code ___P 12
ldap_build_search_req ATTRS: uid userPassword uidNumber gidNumber cn homeDirectory loginShell gecos description objectClass
ruben:x:10001:10000:Ruben Kerkhof:/home/ruben:/bin/bash

WTF? It fails to validate the ca certificate but still returns my ldap user

By the way, why is it trying to read my userPassword?

Ok, so nsswitch lookups work, but how about logging in.

/etc/pam_ldap.conf:

uri ldap://example.org
ssl start_tls
tls_cacertdir /etc/openldap/cacerts
pam_password exop
debug 4

# ssh ruben@localhost

no luck, this just hangs.

so lets try to switch off tls:
/etc/pam_ldap.conf: ssl off

doesn't seem to matter, it hangs my vm.

now lets switch of tls in /etc/nss_ldap.conf: ssl off

ah, that works.

So now let's say that there is an issue with my CA and server certs. Lets disable validation in both /etc/nss_ldap.conf and /etc/pam_ldap.conf: tls_checkpeer no.

Now my vm hangs again, and I cannot login.


I have no either what's going on, but
- mozilla-nss doesn't like my certificates, which used to work fine in F-13
- the tls_checkpeer setting gets ignored.

Please advise.

Comment 7 Jan Vcelak 2010-10-18 11:44:41 UTC

(In reply to comment #6)
> URI ldap://ldap.example.org
> BASE dc=example,dc=org
> TLS_CACERTDIR /etc/openldap/cacerts
> TLS_CIPHER_SUITE TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5
> 
> The TLS_CIPHER_SUITE line is needed, because we have a tls_ssf configured of
> 256. Without this line, the cipher suite used by (presumably) mozilla-nss has a
> ssf of 168.
> 
> This is a regression IMHO.

CC'ing rmeggins. Rich, what do you think? Does OpenSSL and MozNSS have different default cipher suite?

> [root@ldaptest ~]# getent passwd ruben
> TLS certificate verification: Error, -12276: Unknown code ___P 12
> ldap_build_search_req ATTRS: uid userPassword uidNumber gidNumber cn
> homeDirectory loginShell gecos description objectClass
> ruben:x:10001:10000:Ruben Kerkhof:/home/ruben:/bin/bash
> 
> WTF? It fails to validate the ca certificate but still returns my ldap user

The thing is, that moznss assume CA_CERTDIR to be moznss key/cert db. And if this fails, it will produce the error message you see. Then it will try to use CA_CERTDIR as a directory with PEM certificates, as it was done with OpenSSL.

It means, that you certificate can be OK. The error message is not really lucky.

> By the way, why is it trying to read my userPassword?

I think this is all right. Your password can be set in passwd db. It was done so in the past. To increase security, shadow db was "invented".

> Ok, so nsswitch lookups work, but how about logging in.
> 
> /etc/pam_ldap.conf:
> 
> uri ldap://example.org
> ssl start_tls
> tls_cacertdir /etc/openldap/cacerts
> pam_password exop
> debug 4
> 
> # ssh ruben@localhost
> 
> no luck, this just hangs.
> 
> so lets try to switch off tls:
> /etc/pam_ldap.conf: ssl off
> 
> doesn't seem to matter, it hangs my vm.
> 
> now lets switch of tls in /etc/nss_ldap.conf: ssl off
> 
> ah, that works.
> 
> So now let's say that there is an issue with my CA and server certs. Lets
> disable validation in both /etc/nss_ldap.conf and /etc/pam_ldap.conf:
> tls_checkpeer no.
> 
> Now my vm hangs again, and I cannot login.
> 

No idea, more verbose output could help:
ssh -v (-vvv)
sshd -d (-ddd)

> 
> I have no either what's going on, but
> - mozilla-nss doesn't like my certificates, which used to work fine in F-13
> - the tls_checkpeer setting gets ignored.

It likes your certificates. The error message is misleading.

As for tls_checkpeer, I wouldn't say that he setting gets ignored, maybe the problem is elsewhere.

> 
> Please advise.

Comment 8 Rich Megginson 2010-10-18 15:11:05 UTC

> CC'ing rmeggins. Rich, what do you think? Does OpenSSL and MozNSS have
> different default cipher suite?

Using openldap with openssl, and no cipher suites are specified, which cipher suites are enabled?  All of them?

> It means, that you certificate can be OK. The error message is not really
> lucky.

Yes - it is a poor, misleading error message.

Comment 9 Ruben Kerkhof 2010-10-18 15:35:48 UTC

> Using openldap with openssl, and no cipher suites are specified, which cipher
> suites are enabled?  All of them?

I'm not sure, I'm still trying to get more debugging output out of openldap. It's awfully quiet about the TLS negotiation phase.

Comment 10 Rich Megginson 2010-10-18 17:08:58 UTC

One more thing - what ldap server are you using, and what TLS configuration settings (including cipher suites) are you using?

Comment 11 Ruben Kerkhof 2010-10-18 20:16:36 UTC

Just the stock OpenLdap in F-13:

ruben@ldap1b ~]$ rpm -q openldap-servers
openldap-servers-2.4.21-10.fc13.x86_64

The relevant part of my slapd.conf:

TLSCACertificateFile /etc/openldap/cacerts/ca.pem
TLSCertificateFile /etc/pki/tls/certs/slapd.pem
TLSCertificateKeyFile /etc/pki/tls/private/slapd.key
TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5

security ssf=256

Comment 12 Ruben Kerkhof 2010-10-18 22:37:47 UTC

Created attachment 454225 [details]
slapd log showing connection failures

Comment 13 Ruben Kerkhof 2010-10-18 22:38:39 UTC

Strangely enough, logging in as an ldap user at the console works, over ssh doesn't. Now I'm confused...

I've increased the debug level of sshd. It looks like the pam authentication succeeds, but then my ssh connection just hangs. No prompt nothing.

Oct 18 23:51:53 ldaptest sshd[1223]: debug1: PAM: initializing for "ruben"
Oct 18 23:51:53 ldaptest sshd[1223]: debug1: PAM: setting PAM_RHOST to "5354ad7e.cm-6-5c.dynamic.ziggo.nl"
Oct 18 23:51:53 ldaptest sshd[1223]: debug1: PAM: setting PAM_TTY to "ssh"
Oct 18 23:51:53 ldaptest sshd[1223]: debug2: monitor_read: 50 used once, disabling now
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: monitor_read: checking request 3
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_answer_authserv: service=ssh-connection, style=
Oct 18 23:51:53 ldaptest sshd[1223]: debug2: monitor_read: 3 used once, disabling now
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: monitor_read: checking request 4
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_answer_authrole: role=
Oct 18 23:51:53 ldaptest sshd[1223]: debug2: monitor_read: 4 used once, disabling now
Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1224]: debug1: userauth-request for user ruben service ssh-connection method password
Oct 18 23:51:55 ldaptest sshd[1224]: debug1: attempt 1 failures 0
Oct 18 23:51:55 ldaptest sshd[1224]: debug2: input_userauth_request: try method password
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_auth_password entering
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_send entering: type 11
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_auth_password: waiting for MONITOR_ANS_AUTHPASSWORD
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive_expect entering: type 12
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: monitor_read: checking request 11
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: PAM: sshpam_passwd_conv called with 1 messages
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: PAM: password authentication accepted for ruben
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_answer_authpassword: sending result 1
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_send entering: type 12
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_auth_password: user authenticated
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_do_pam_account entering
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_send entering: type 51
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive_expect entering: type 52
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive_expect entering: type 51
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: do_pam_account: called
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: PAM: do_pam_account pam_acct_mgmt = 0 (Success)
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_send entering: type 52
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_do_pam_account returning 1
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: Sending new keys: 0x7fcaa5a54960 0x7fcaa5a49e60
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_newkeys_to_blob: converting 0x7fcaa5a54960
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_newkeys_to_blob: converting 0x7fcaa5a49e60
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: New keys have been sent
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: Sending compression state
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_send entering: type 25
Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: Finished sending state
Oct 18 23:51:55 ldaptest sshd[1223]: Accepted password for ruben from 83.84.173.126 port 45153 ssh2
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: monitor_child_preauth: ruben has been authenticated by privileged process
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Waiting for new keys
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive_expect entering: type 25
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_newkeys_from_blob: 0x7fcaa5b40020(122)
Oct 18 23:51:55 ldaptest sshd[1223]: debug2: mac_setup: found hmac-md5
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Waiting for second key
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_newkeys_from_blob: 0x7fcaa5b40020(122)
Oct 18 23:51:55 ldaptest sshd[1223]: debug2: mac_setup: found hmac-md5
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Getting compression state
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Getting Network I/O buffers
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_share_sync: Share sync
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_share_sync: Share sync end
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: temporarily_use_uid: 10001/10000 (e=0/0)
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: ssh_gssapi_storecreds: Not a GSSAPI mechanism
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: restore_uid: 0/0
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: SELinux support disabled
Oct 18 23:51:55 ldaptest sshd[1223]: debug1: PAM: establishing credentials
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: PAM: opening session
Oct 18 23:51:55 ldaptest sshd[1223]: pam_unix(sshd:session): session opened for user ruben by (uid=0)
Oct 18 23:51:55 ldaptest sshd[1223]: User child is on pid 1225
Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive entering
Oct 18 23:51:55 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
Oct 18 23:51:59 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Oct 18 23:52:07 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
Oct 18 23:52:23 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)...
Oct 18 23:52:55 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 64 seconds)...
Oct 18 23:53:59 ldaptest sshd[1225]: nss_ldap: could not search LDAP server - Server is unavailable
Oct 18 23:53:59 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
Oct 18 23:54:03 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Oct 18 23:54:11 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
Oct 18 23:54:27 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)...

Lets set bind_policy to soft in /etc/nsswitch.conf. Wow, that does the trick, the ssh connection succeeds.

Comment #12 is my ldap log which show the ldap connection failures.

Comment 14 Rich Megginson 2010-10-19 01:45:04 UTC

did you mean /etc/ldap.conf instead of /etc/nsswitch.conf?
The ldap log doesn't correspond to the sshd log?

Comment 15 Ruben Kerkhof 2010-10-19 10:50:28 UTC

Actually, I meant /etc/nss_ldap.conf :-) /etc/ldap.conf doesn't exists anymore in F14.

The ldap log is from half an hour later, but I was seeing the the exact same behavior as before.

Comment 16 Ruben Kerkhof 2010-10-19 19:07:36 UTC

I've changed my test setup a little, since I can't increase the logging on the production ldap server.

I'm now testing on a clean F-14 vm against openldap-servers-2.4.22-7.fc14.x86_64, with the following configuration:

/etc/nss_ldap.conf
base dc=example,dc=org
nss_initgroups_minimum_uid 500
uri ldap://ldaptest.example.org
ssl start_tls
tls_cacertfile /etc/openldap/cacerts/ca.pem
tls_checkpeer yes
tls_ciphers TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5
nss_base_passwd        ou=People,dc=example,dc=org?one
nss_base_shadow        ou=People,dc=example,dc=org?one
nss_base_group         ou=Groups,dc=example,dc=org?one

/etc/pam_ldap.conf:
base dc=example,dc=org
uri ldap://ldaptest.example.org
ssl start_tls
tls_checkpeer yes
tls_cacertfile /etc/openldap/cacerts/ca.pem
pam_password exop
tls_ciphers TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5

/etc/openldap/ldap.conf:
URI ldap://ldaptest.example.org
BASE dc=example,dc=org
TLS_CACERT /etc/openldap/cacerts/ca.pem
TLS_CIPHER_SUITE TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5
TLS_REQCERT demand

/etc/openldap/slapd.conf (relevant part):
TLSCACertificateFile /etc/openldap/cacerts/ca.pem
TLSCertificateFile /etc/pki/tls/certs/slapd.pem
TLSCertificateKeyFile /etc/pki/tls/private/slapd.key
TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5

security tls=256

I've created a self-signed CA, and used it to create a certificate for CN=ldaptest.example.org

Ok, let's do some tests:
- getent passwd works, returns output immediately
- ldapsearch -xZZ works, returns output immediately
- ssh ruben@localhost works, but takes about 6 to 7 minutes.

I'll attach the slapd and sshd logs. Looking at the slapd log I see 44 ldap connections, of which 36 failed.

This is all with nscd disabled.

Comment 17 Ruben Kerkhof 2010-10-19 19:10:13 UTC

Created attachment 454415 [details]
slapd log showing connection failures while logging in with ssh (no nscd)

Comment 18 Ruben Kerkhof 2010-10-19 19:11:29 UTC

Created attachment 454416 [details]
sshd log showing nss_ldap reconnection attempts

Comment 19 Ruben Kerkhof 2010-10-19 19:26:12 UTC

Created attachment 454421 [details]
slapd log showing 2 successful connections

Comment 20 Ruben Kerkhof 2010-10-19 19:27:30 UTC

Created attachment 454422 [details]
sshd log showing successful login without nss_ldap reconnecting

Comment 21 Ruben Kerkhof 2010-10-19 19:28:31 UTC

The attachments in comment #19 and #20 show that I'm able to login with ssh as soon as I start nscd.

Comment 22 Rich Megginson 2010-10-19 19:35:25 UTC

There is some sort of TLS problem:

Oct 19 20:29:05 ldaptest slapd[1436]: connection_read(22): TLS accept failure error=-1 id=1007, closing

Unfortunately, the only way to get more detailed information is to use the trace level logging (-d 1) which is very, very verbose.  But it should tell us what the low level moznss problem is.

Comment 23 Ruben Kerkhof 2010-10-19 23:31:55 UTC

Created attachment 454471 [details]
slapd -d 1 trace output

Comment 24 Rich Megginson 2010-10-20 21:46:45 UTC

Hmm - there are a couple of different failures:

op tag 0x77, time 1287530783
...
TLS: error: accept - force handshake failure: 0 - error -5938:Encountered end of file

and
op tag 0x77, time 1287530895
...
connection_read(18): unable to get TLS client DN, error=49 id=1023


The first one would seem to indicate that the client just dropped the connection, or gave up on the TLS handshake.  The second seems to indicate that the certificate sent by the TLS peer is NULL.

Are you able to reproduce any of these problems with the ldapsearch command line tool, or is it necessary to configure nss_ldap/pam_ldap/sshd/nscd to reproduce this problem?

Comment 25 Ruben Kerkhof 2010-10-21 00:09:00 UTC

Hi Rich,

I'm not able to reproduce this with just ldapsearch.

About the second failure, what kind of certificate is it expecting?
I'm not (knowingly) using client side certificates.

Comment 26 Ruben Kerkhof 2010-10-21 01:50:35 UTC

http://www.zytrax.com/books/ldap/ch12/ says that error 49 is harmless if TLSVerifyClient=demand.

I've been reading throught the openldap source.
and indeed, connection_read in connection.c unconditionally calls dnX509peerNormalize().

Comment 27 Rich Megginson 2010-10-21 02:26:19 UTC

(In reply to comment #26)
> http://www.zytrax.com/books/ldap/ch12/ says that error 49 is harmless if
> TLSVerifyClient=demand.
> 
> I've been reading throught the openldap source.
> and indeed, connection_read in connection.c unconditionally calls
> dnX509peerNormalize().

Yes, so that looks ok.

Comment 28 Ruben Kerkhof 2010-10-21 02:33:19 UTC

Ok, back to failure no. 1 then.

Just to verify that there's no issue with the ldapserver running on localhost, I installed a second vm with F-13, and pointed /etc/ldap.conf at my first server. Everything works fine on F-13.

I've upgraded the second VM to F-14 and I'm seeing the same failures. So it definitely looks like a client side issue.

Are there any other steps I could take?

Comment 29 Rich Megginson 2010-10-21 02:53:12 UTC

(In reply to comment #28)
> Ok, back to failure no. 1 then.
> 
> Just to verify that there's no issue with the ldapserver running on localhost,
> I installed a second vm with F-13, and pointed /etc/ldap.conf at my first
> server. Everything works fine on F-13.
> 
> I've upgraded the second VM to F-14 and I'm seeing the same failures. So it
> definitely looks like a client side issue.
> 
> Are there any other steps I could take?

I don't know if it would be possible to enable the ldap library trace debug level in the client side software.  When you use e.g. ldapsearch -d 1, this calls ldap_set_option(ld, LDAP_OPT_DEBUG_LEVEL, thelevel)
e.g.
 int debuglevel = 1;
 ldap_set_option(ld, LDAP_OPT_DEBUG_LEVEL, &debuglevel);

It looks as though you can set this in nss_ldap - man nss_ldap
       debug <level>
              Specifies  the  debug  level used for logging by the LDAP client
              library.

You might try

 logdir /var/tmp/ldap
 debug 1

in /etc/ldap.conf to see if that gives us a clue.  I suspect it may have something to do with the way the NSS FIPS softoken deals with fork() calls - since it works with ncsd, it may be fork()ing something when not using ncsd.

Comment 30 Ruben Kerkhof 2010-10-21 03:11:35 UTC

Ok, that works, but it's a lot of information (which also ends up at stdout).

Comment 31 Ruben Kerkhof 2010-10-21 03:13:07 UTC

Created attachment 454717 [details]
trace output from nss_ldap while ssh'in

Comment 32 Rich Megginson 2010-10-21 03:21:02 UTC

Crud - still no clue :P

Back to the ssf setting - does it work if you set ssf=168 (or don't set ssf at all), but fail if you set ssf=256?  Or was that a red herring?

Thanks for your assistance with this issue.  Looks like the only recourse left is to set up a similar environment and run gdb.

Comment 33 Ruben Kerkhof 2010-10-21 14:13:46 UTC

Nope, ssf doesn't influence it at all.

I've tried running su under gdb, but I don't get very far:

[root@ldaptest ~]# gdb --args su - ruben
GNU gdb (GDB) Fedora (7.2-16.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /bin/su...Reading symbols from /usr/lib/debug/bin/su.debug...done.
done.
(gdb) b _nss_ldap_getbyname
Function "_nss_ldap_getbyname" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_nss_ldap_getbyname) pending.
(gdb) r
Starting program: /bin/su - ruben
[Thread debugging using libthread_db enabled]
Cannot find new threads: generic error

I could give you access to the VM if you want?

Comment 34 Rich Megginson 2010-10-21 14:23:55 UTC

(In reply to comment #33)
> Nope, ssf doesn't influence it at all.
> 
> I've tried running su under gdb, but I don't get very far:
> 
> [root@ldaptest ~]# gdb --args su - ruben
> GNU gdb (GDB) Fedora (7.2-16.fc14)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /bin/su...Reading symbols from
> /usr/lib/debug/bin/su.debug...done.
> done.
> (gdb) b _nss_ldap_getbyname
> Function "_nss_ldap_getbyname" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 1 (_nss_ldap_getbyname) pending.
> (gdb) r
> Starting program: /bin/su - ruben
> [Thread debugging using libthread_db enabled]
> Cannot find new threads: generic error
> 
> I could give you access to the VM if you want?

Sure!  That would be very helpful.  Please email me directly with the information.

Comment 35 Rich Megginson 2010-10-21 20:14:06 UTC

Thanks for your help.  I believe this problem is caused by nss_ldap calling fork() after the NSS software crypto token has been initialized.  I'm going to try to reproduce the problem with a simple test program.

Comment 36 Rich Megginson 2010-10-21 20:27:23 UTC

Also try this: in the /etc/sysconfig/sshd add

export NSS_STRICT_NOFORK=DISABLED

then service sshd restart

Then try ssh - I did this on the test system and I was able to ssh in as a regular user without using nscd

Note that this is not the final, recommended solution, but you could use it until we figure out how to solve this fork()ing problem.

Comment 37 Ruben Kerkhof 2010-10-22 08:29:02 UTC

I can live with the workaround for the moment.

Thanks again,

Ruben

Comment 38 Rich Megginson 2010-10-22 16:19:12 UTC

Created attachment 455133 [details]
python script used on F-14 to reproduce

This python script is used to show the bug and the workaround.  Usage:
python olnsstest.py certdir ldapurl binddn bindpw
e.g.
python olnsstest.py /path/to/cert8.db/dir ldap://localhost:389 "cn=directory manager" secret

certdir - full absolute path of directory containing cert8.db, key3.db, secmod.db
ldapurl - ldap (not ldaps) url - the script uses start_tls to start TLS
binddn, bindpw - bind DN and password to use with simple_bind

Requires: Directory Server with TLS capability - certdir which contains CA cert for directory server - python-ldap and python-nss packages on Fedora 14 and later

To reproduce the problem - comment out the line
#os.environ["NSS_STRICT_NOFORK"] = "DISABLED"

If you remove the os.fork(), or remove the nss_init(), it works

If you do the nss_init(), followed by the fork(), followed by the ldap calls, without the env. var. set, you will get a connection failure.

Comment 39 Michael Cronenworth 2011-01-19 01:50:48 UTC

*** Bug 669900 has been marked as a duplicate of this bug. ***

Comment 40 Michael Cronenworth 2011-01-19 01:51:33 UTC

*** Bug 663485 has been marked as a duplicate of this bug. ***

Comment 41 Michael Cronenworth 2011-01-19 02:42:01 UTC

Any work being done on this bug?

My F14 x86_64 w/ 389-ds+Samba server is also affected. The workaround suffices for now, but it shouldn't be a permenant solution.

Comment 42 Rich Megginson 2011-01-19 17:35:44 UTC

https://admin.fedoraproject.org/updates/nss-util-3.12.9-1.fc14,nss-softokn-3.12.9-1.fc14,nss-3.12.9-1.fc14,nspr-4.8.7-1.fc14?_csrf_token=4f505a0830875d09f36168715064980b2733a4bc

nss-3.12.9 is available in Testing - please try that and see if it fixes this problem.

Comment 43 Brian Buesker 2011-01-19 21:20:39 UTC

nss-3.12.9 did not appear to resolve the problem for me. If I have nscd running, I can su from root to an LDAP user, or ssh in as that LDAP user. However, if I stop nscd, neither the su nor the ssh succeed. If I set NSS_STRICT_NOFORK=DISABLED, then the su works (I did not try ssh with this setting).

Comment 44 Rich Megginson 2011-01-21 20:30:53 UTC

It turns out that some additional changes were needed to be made to openldap to take advantage of the new SECMOD_RestartModules API in NSS 3.12.9.  An ITS has been opened in openldap upstream and a patch submitted:
http://www.openldap.org/its/index.cgi?findid=6802

Comment 45 Jan Vcelak 2011-01-25 13:33:09 UTC

Fixed in openldap-2.4.23-7.fc14

Comment 46 Fedora Update System 2011-01-25 13:35:00 UTC

openldap-2.4.23-7.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/openldap-2.4.23-7.fc14

Comment 47 Fedora Update System 2011-01-25 21:01:55 UTC

openldap-2.4.23-7.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update openldap'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/openldap-2.4.23-7.fc14

Comment 48 Brian Buesker 2011-01-26 18:21:18 UTC

I tried upgrading openldap to 2.4.23-7.fc14 (and had already upgraded nss to 3.12.9), but I still see that with nscd shut down, I am unable to su or ssh into the machine as an LDAP user. With nscd running, I'm able to do both the su and the ssh.

In /var/log/messages, with nscd shut down, I see the following:

Jan 26 10:11:19 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)...
Jan 26 10:11:23 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)...
Jan 26 10:11:31 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)...
Jan 26 10:11:47 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)...
Jan 26 10:12:19 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 64 seconds)...
Jan 26 10:13:23 localhost su: nss_ldap: could not search LDAP server - Server is unavailable

I don't know if this is significant, but looking at wireshark, right after the client sends a Client Hello message, it appears to be closing the connection. 

Looks like my comment #43 was slightly incorrect in that an su eventually succeeds, but it takes over 4 minutes when nscd is disabled (compared to less than a second with it enabled).

Comment 49 Rich Megginson 2011-01-26 19:20:20 UTC

(In reply to comment #48)
> I tried upgrading openldap to 2.4.23-7.fc14 (and had already upgraded nss to
> 3.12.9), but I still see that with nscd shut down, I am unable to su or ssh
> into the machine as an LDAP user. With nscd running, I'm able to do both the su
> and the ssh.

Does it work if you use the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=636956#c36
?

> 
> In /var/log/messages, with nscd shut down, I see the following:
> 
> Jan 26 10:11:19 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 4
> seconds)...
> Jan 26 10:11:23 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 8
> seconds)...
> Jan 26 10:11:31 localhost su: nss_ldap: reconnecting to LDAP server (sleeping
> 16 seconds)...
> Jan 26 10:11:47 localhost su: nss_ldap: reconnecting to LDAP server (sleeping
> 32 seconds)...
> Jan 26 10:12:19 localhost su: nss_ldap: reconnecting to LDAP server (sleeping
> 64 seconds)...
> Jan 26 10:13:23 localhost su: nss_ldap: could not search LDAP server - Server
> is unavailable
> 
> I don't know if this is significant, but looking at wireshark, right after the
> client sends a Client Hello message, it appears to be closing the connection. 
> 
> Looks like my comment #43 was slightly incorrect in that an su eventually
> succeeds, but it takes over 4 minutes when nscd is disabled (compared to less
> than a second with it enabled).

Comment 50 Brian Buesker 2011-01-26 20:35:43 UTC

(In reply to comment #49)
> (In reply to comment #48)
> > I tried upgrading openldap to 2.4.23-7.fc14 (and had already upgraded nss to
> > 3.12.9), but I still see that with nscd shut down, I am unable to su or ssh
> > into the machine as an LDAP user. With nscd running, I'm able to do both the su
> > and the ssh.
> 
> Does it work if you use the workaround from
> https://bugzilla.redhat.com/show_bug.cgi?id=636956#c36
> ?

Yes, that does work (as it did previously). I was hoping that this update would not require this workaround.

Comment 51 Rich Megginson 2011-01-26 21:38:13 UTC

(In reply to comment #50)
> (In reply to comment #49)
> > (In reply to comment #48)
> > > I tried upgrading openldap to 2.4.23-7.fc14 (and had already upgraded nss to
> > > 3.12.9), but I still see that with nscd shut down, I am unable to su or ssh
> > > into the machine as an LDAP user. With nscd running, I'm able to do both the su
> > > and the ssh.
> > 
> > Does it work if you use the workaround from
> > https://bugzilla.redhat.com/show_bug.cgi?id=636956#c36
> > ?
> 
> Yes, that does work (as it did previously). I was hoping that this update would
> not require this workaround.

I was afraid of that.  It may be that sshd opens a crypto context before the fork() and attempts to use that same crypto context after the fork() in a child process.  I'll have to review the code.

Comment 52 Rich Megginson 2011-01-26 21:46:25 UTC

Is it possible for you to do an strace of the sshd process during the login?  I'd like to see where the fork()s happen.  You can attach the output to this bug as a text/plain attachment (after removing/obscuring any sensitive data first).

Comment 53 Brian Buesker 2011-01-26 23:34:03 UTC

Created attachment 475498 [details]
strace -f -t su - <LDAP username>

I have attached strace output from an su - to an LDAP user (the name has been changed in the strace output to dummy). Will this be suitable instead of the sshd strace, since the problem is occurring in both places?

Comment 54 Rich Megginson 2011-01-28 15:36:56 UTC

Thanks Brian.  After a discussion with an NSS developer and another developer affected by this issue, we've decided the best course of action is to have openldap explicitly disable fork checking in the software token.

Openldap ITS: http://www.openldap.org/its/index.cgi?findid=6811
patch submitted

Comment 55 Jan Vcelak 2011-02-02 12:42:09 UTC

Fixed in openldap-2.4.23-8.fc14

Comment 56 Fedora Update System 2011-02-02 12:43:11 UTC

openldap-2.4.23-8.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/openldap-2.4.23-8.fc14

Comment 57 Brian Buesker 2011-02-03 03:14:43 UTC

I tried the 2.4.23-8.fc14 update, and I was able to confirm that it does resolve my problem. Thanks.

Comment 58 Rich Megginson 2011-02-03 15:44:44 UTC

(In reply to comment #57)
> I tried the 2.4.23-8.fc14 update, and I was able to confirm that it does
> resolve my problem. Thanks.

Thanks for your help with this Brian, and sorry it took so long to resolve.

Comment 59 Fedora Update System 2011-03-01 12:41:50 UTC

Package openldap-2.4.21-12.fc13:
* should fix your issue,
* was pushed to the Fedora 13 updates-testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing openldap-2.4.21-12.fc13'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/openldap-2.4.21-12.fc13
then log in and leave karma (feedback).

Comment 60 Fedora Update System 2011-03-01 12:42:40 UTC

Package openldap-2.4.23-9.fc14:
* should fix your issue,
* was pushed to the Fedora 14 updates-testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing openldap-2.4.23-9.fc14'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/openldap-2.4.23-9.fc14
then log in and leave karma (feedback).

Comment 61 Kapetanakis Giannis 2011-03-09 21:00:46 UTC

openldap-2.4.23-9.fc14 indeed fixes SSL problem (nss_ldap: failed to bind to LDAP server ldaps://...)

but it creates a new one:

# su - foo
free(0x173bd2) below bottom of memory. (memtop = 0x9898800 membot = 0x973e000)

Mar  9 22:49:28 client kernel: [  496.021546] type=1100 audit(1299703768.918:22601): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:authentication acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success'
Mar  9 22:49:28 client kernel: [  496.021689] type=1101 audit(1299703768.918:22602): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:accounting acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success'
Mar  9 22:49:30 client su: pam_unix(su-l:session): session opened for user foo by admin(uid=0)
Mar  9 22:49:30 client kernel: [  497.562678] type=1105 audit(1299703770.459:22603): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:session_open acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success'
Mar  9 22:49:30 client kernel: [  497.562750] type=1103 audit(1299703770.459:22604): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:setcred acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success'
Mar  9 22:49:31 client kernel: [  498.374379] type=1701 audit(1299703771.272:22605): auid=500 uid=2498 gid=202 ses=4 pid=25246 comm="tcsh" sig=6
Mar  9 22:49:31 client abrt[25247]: saved core dump of pid 25246 (/bin/tcsh) to /var/spool/abrt/ccpp-1299703771-25246.new/coredump (2191360 bytes)
Mar  9 22:49:31 client abrtd: Directory 'ccpp-1299703771-25246' creation detected
Mar  9 22:49:31 client su: pam_unix(su-l:session): session closed for user foo
Mar  9 22:49:31 client kernel: [  498.991478] type=1104 audit(1299703771.888:22606): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:setcred acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success'
Mar  9 22:49:31 client kernel: [  498.991799] type=1106 audit(1299703771.888:22607): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:session_close acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success'
Mar  9 22:49:32 client abrtd: Crash is in database already (dup of /var/spool/abrt/ccpp-1299701675-27295)
Mar  9 22:49:32 client abrtd: Deleting crash ccpp-1299703771-25246 (dup of ccpp-1299701675-27295), sending dbus signal

# groups foo
foo : valid_group

# getent group foo

getent passwd and shadow works

# grep ldap /etc/nsswitch.conf
passwd:     files ldap
shadow:     files ldap
group:      files ldap

after removing ldap from group

#su - foo
id: cannot find name for group ID 202
[foo@client ~]$ 

# gdb /bin/tcsh coredump
(gdb) bt
#0  0x00ed9416 in __kernel_vsyscall ()
#1  0x006e02f1 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x006e1d5e in abort () at abort.c:92
#3  0x08084e6a in free (cp=0x8f2bd2) at tc.alloc.c:313
#4  0x08063d83 in blkfree (av0=0x9e7d988) at sh.misc.c:158
#5  0x0805af6f in tsetenv (name=0x80a3420 L"GROUP", val=0x9e32608 L"valid_group") at sh.func.c:1725
#6  0x0804d910 in main (argc=<value optimized out>, argv=0xbfa7e4f4) at sh.c:561

# ldapsearch -x '(cn=valid_group)' 

# 202, Groups, example.com
dn: gidnumber=202,ou=Groups,dc=example,dc=com
objectClass: top
objectClass: posixGroup
gidNumber: 202
cn: valid_group

/etc/pam_ldap.conf:
nss_base_group  ou=Groups,dc=example,dc=com

/etc/nss_ldap.conf:
nss_base_group  ou=Groups,dc=example,dc=com

regards,

Giannis

Comment 62 Rich Megginson 2011-03-09 21:18:44 UTC

if you try the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=636956#c36 does it still crash?  Does it crash with bash or some other shell?  Does it crash if you do

# su foo
instead of 
# su - foo
?

Comment 63 Kapetanakis Giannis 2011-03-09 23:44:25 UTC

export NSS_STRICT_NOFORK=DISABLED and bash/tcsh and
openldap-2.4.23-4.fc14.i686 works for ssh and

#su - foo
and 
#su foo 
also ok

however there is a small delay and I also see my iptables logging a packet from the ldap server back to client (not established). So probably the 1st connection is closed somehow badly from the client.

openldap-2.4.23-9.fc14.i686 and /bin/bash also works
WITHOUT export NSS_STRICT_NOFORK=DISABLED

openldap-2.4.23-9.fc14.i686 and /bin/tcsh and 
#su foo
works

#su - foo
free(0xd6dbd2) below bottom of memory. (memtop = 0x83b1000 membot = 0x825c000)
with or without export NSS_STRICT_NOFORK=DISABLED

Comment 64 Kapetanakis Giannis 2011-03-10 13:21:54 UTC

Today also (with openldap-2.4.23-4.fc14.x86_64)
while updating to setroubleshoot-server-3.0.30-1.fc14.x86_64 and
libisofs-1.0.2-1.fc14.x86_64 with yum update

I got again
Mar 10 15:15:01 client dbus-daemon: nss_ldap: failed to bind to LDAP server ldaps://ldap.example.com:636: Can't contact LDAP server

after issuing /etc/init.d/messagebus restart
problem gone away and yum completed (it was hanging)

This might be related to https://bugzilla.redhat.com/show_bug.cgi?id=502072

Comment 65 Rich Megginson 2011-03-10 15:21:37 UTC

Created attachment 483481 [details]
openldap-2.4.24-use-static-for-nofork-envvar-20110310.patch

I don't know what could cause such odd behavior between shells and su invocations.  Perhaps I can't call PR_SetEnv with a string constant?  Here is a patch that uses a static char array instead of a constant string.

Comment 66 Rich Megginson 2011-03-10 16:17:19 UTC

I have a patched version of openldap here: http://rmeggins.fedorapeople.org/openldap/ - could you please give these packages a try?

Comment 67 Kapetanakis Giannis 2011-03-10 17:11:09 UTC

I've tried it. (all these without sssd or nscd)

openldap-2.4.23-10.1.fc14.i686.rpm has no different behavior than openldap-2.4.23-9.fc14.i686.rpm

# su - foo with /bin/tcsh crashes.

Also login from GDM Login screen crashes with /bin/tcsh

Login from GDM with /bin/bash works

All these on new installed f14 virtualbox test box up2date.

Comment 68 Kapetanakis Giannis 2011-03-10 17:19:32 UTC

I also found a quicker way to reproduce it (openldap-2.4.23-10.1.fc14.i686)

ssh root@client

# tcsh
# cd ~foo
free(2f2ec0) below bottom of memory. (memtop = 0x9f17000 membot = 0x9d3d000)
Aborted (core dumped)

The following works (strange)
# cd ~foo
# cd
# tcsh
# cd ~foo

~foo exists and owned by foo

All works (GDM LOGIN also) if nscd is running.

Comment 69 Kapetanakis Giannis 2011-03-10 17:38:27 UTC

Created attachment 483522 [details]
gdm crash log

Comment 70 Kapetanakis Giannis 2011-03-10 17:38:54 UTC

Created attachment 483523 [details]
gdm crash debug

Comment 71 Rich Megginson 2011-03-10 17:52:08 UTC

Created attachment 483525 [details]
openldap-2.4.24-use-strdup-for-nofork-envvar-20110310.patch

I am baffled because these crashes don't appear to have anything to do with openldap.   The only think I can think of is that this is causing problems - libraries/libldap/tls_m.c:
-               PR_SetEnv( "NSS_STRICT_NOFORK=DISABLED" );
which seems perfectly valid.  Using a static variable here seemed to have no effect.  I'm now working on another variation which uses a strdup.

Comment 72 Rich Megginson 2011-03-10 17:58:18 UTC

New patched version of openldap here:
http://rmeggins.fedorapeople.org/openldap/ - could you please give these
packages a try?

If these fail - can you try running with valgrind?  e.g. try this

# valgrind -q --tool=memcheck su - root

Comment 73 Kapetanakis Giannis 2011-03-10 18:49:53 UTC

This version worked for 
su - foo
and
ssh foo@client

env NSS_STRICT_NOFORK=DISABLED
is only available on bash though. I cannot find in tcsh. Don't know if this a problem

Comment 74 Rich Megginson 2011-03-10 20:06:49 UTC

Jan, not sure what the procedure is for this bug.  Do we change the status to ASSIGNED (i.e. reopen the bug) or file a new bug?  I have submitted an upstream ITS: http://www.openldap.org/its/index.cgi?findid=6862 which also contains the patch.

Comment 75 Jan Vcelak 2011-03-11 10:39:36 UTC

Thank you, Rich. I changed the status to ASSIGNED.

New update will be pushed soon.

Comment 76 Jan Vcelak 2011-03-19 00:15:40 UTC

Fixed in openldap-2.4.23-10.fc14

Comment 77 Fedora Update System 2011-03-19 00:20:46 UTC

openldap-2.4.23-10.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/openldap-2.4.23-10.fc14

Comment 78 Fedora Update System 2011-03-19 00:23:41 UTC

openldap-2.4.24-2.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/openldap-2.4.24-2.fc15

Comment 79 Fedora Update System 2011-04-21 05:32:39 UTC

openldap-2.4.24-2.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 80 Fedora Update System 2011-09-25 03:46:09 UTC

openldap-2.4.23-10.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.