Description of problem: After upgrading to nss_ldap-265-6.fc14.x86_64, authenticating with starttls against my openldap server stops working. Version-Release number of selected component (if applicable): nss_ldap-265-6.fc14.x86_64 openldap-servers-2.4.19-4.fc12.x86_64 This works fine with nss_ldap-264-9.fc12.x86_64 If I increase the debug level in /etc/nss_ldap.conf, I get the following output: TLS certificate verification: Error, -12276: Unknown code ___P 12
Hmm, this appears to be coming from openldap's recent changeover to use NSS for TLS functionality. Adding the openldap package maintainer to the CC list.
The error codes description can be found on [1]. Quoting: -12276: SSL_ERROR_BAD_CERT_DOMAIN "Unable to communicate securely with peer: requested domain name does not match the server's certificate." This error code should be returned by the certificate authentication callback function when it detects that the Common Name in the remote server's certificate does not match the hostname sought by the local client, according to the matching rules specified for CERT_VerifyCertName. -- Ruben, please, can you provide more information about your configuration? Especially which certificate do you use (self-signed, CA signed, ...). Does the hostnames match? [1] http://www.mozilla.org/projects/security/pki/nss/ref/ssl/sslerr.html
Hi Jan, Some more info about our setup: We use 2 ldap servers, ldap1a.example.org and ldap1b.example.org, behind a virtual ip. So our active ldap server is reachable at ldap.example.org. To make this work with TLS we use the X509v3 Subject Alternative Name: X509v3 Subject Alternative Name: DNS:ldap, DNS:ldap.example.org, DNS:ldap.priv.example.org, DNS:ldap1a.example.org The certificates of our ldap servers are signed by our puppet ca. So the ca is self-signed. This setup works fine on Fedora 11/12/13. One thing I've noticed, is that we have a required tls_ssf of 256. When authenticating with a F-14 client it shows a ssf of 168, which was 256 on older hosts. I've worked around this by setting the cipher list on the client to TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5, but I think it's a strange change in behavior.
in reply to comment #2: This might be a separate RFE, but it would be nice if the error message showed up in the logs instead of just the error code. That makes debugging a bit easier.
Ruben, is this a whole error message you get? Can you please attach your ldap.conf? I'm interested in TLS_CERT* directives.
Sure, sorry this took a while, I had to put the machine we saw this on into production, so I've installed a new one with F-14 on it. Here's my /etc/openldap/ldap.conf (server names changed to protect the innocent ;-)): URI ldap://ldap.example.org BASE dc=example,dc=org TLS_CACERTDIR /etc/openldap/cacerts TLS_CIPHER_SUITE TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5 The TLS_CIPHER_SUITE line is needed, because we have a tls_ssf configured of 256. Without this line, the cipher suite used by (presumably) mozilla-nss has a ssf of 168. This is a regression IMHO. With the configuration above, ldapsearch -xZZ works. Now onto nss_ldap: /etc/nss_ldap.conf: uri ldap://example.org ssl start_tls tls_cacertfile /etc/openldap/cacerts/ca.pem tls_checkpeer yes tls_ciphers TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5 debug 4 # getent passwd ruben: [root@ldaptest ~]# getent passwd ruben TLS certificate verification: Error, -12276: Unknown code ___P 12 ldap_build_search_req ATTRS: uid userPassword uidNumber gidNumber cn homeDirectory loginShell gecos description objectClass ruben:x:10001:10000:Ruben Kerkhof:/home/ruben:/bin/bash WTF? It fails to validate the ca certificate but still returns my ldap user By the way, why is it trying to read my userPassword? Ok, so nsswitch lookups work, but how about logging in. /etc/pam_ldap.conf: uri ldap://example.org ssl start_tls tls_cacertdir /etc/openldap/cacerts pam_password exop debug 4 # ssh ruben@localhost no luck, this just hangs. so lets try to switch off tls: /etc/pam_ldap.conf: ssl off doesn't seem to matter, it hangs my vm. now lets switch of tls in /etc/nss_ldap.conf: ssl off ah, that works. So now let's say that there is an issue with my CA and server certs. Lets disable validation in both /etc/nss_ldap.conf and /etc/pam_ldap.conf: tls_checkpeer no. Now my vm hangs again, and I cannot login. I have no either what's going on, but - mozilla-nss doesn't like my certificates, which used to work fine in F-13 - the tls_checkpeer setting gets ignored. Please advise.
(In reply to comment #6) > URI ldap://ldap.example.org > BASE dc=example,dc=org > TLS_CACERTDIR /etc/openldap/cacerts > TLS_CIPHER_SUITE TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5 > > The TLS_CIPHER_SUITE line is needed, because we have a tls_ssf configured of > 256. Without this line, the cipher suite used by (presumably) mozilla-nss has a > ssf of 168. > > This is a regression IMHO. CC'ing rmeggins. Rich, what do you think? Does OpenSSL and MozNSS have different default cipher suite? > [root@ldaptest ~]# getent passwd ruben > TLS certificate verification: Error, -12276: Unknown code ___P 12 > ldap_build_search_req ATTRS: uid userPassword uidNumber gidNumber cn > homeDirectory loginShell gecos description objectClass > ruben:x:10001:10000:Ruben Kerkhof:/home/ruben:/bin/bash > > WTF? It fails to validate the ca certificate but still returns my ldap user The thing is, that moznss assume CA_CERTDIR to be moznss key/cert db. And if this fails, it will produce the error message you see. Then it will try to use CA_CERTDIR as a directory with PEM certificates, as it was done with OpenSSL. It means, that you certificate can be OK. The error message is not really lucky. > By the way, why is it trying to read my userPassword? I think this is all right. Your password can be set in passwd db. It was done so in the past. To increase security, shadow db was "invented". > Ok, so nsswitch lookups work, but how about logging in. > > /etc/pam_ldap.conf: > > uri ldap://example.org > ssl start_tls > tls_cacertdir /etc/openldap/cacerts > pam_password exop > debug 4 > > # ssh ruben@localhost > > no luck, this just hangs. > > so lets try to switch off tls: > /etc/pam_ldap.conf: ssl off > > doesn't seem to matter, it hangs my vm. > > now lets switch of tls in /etc/nss_ldap.conf: ssl off > > ah, that works. > > So now let's say that there is an issue with my CA and server certs. Lets > disable validation in both /etc/nss_ldap.conf and /etc/pam_ldap.conf: > tls_checkpeer no. > > Now my vm hangs again, and I cannot login. > No idea, more verbose output could help: ssh -v (-vvv) sshd -d (-ddd) > > I have no either what's going on, but > - mozilla-nss doesn't like my certificates, which used to work fine in F-13 > - the tls_checkpeer setting gets ignored. It likes your certificates. The error message is misleading. As for tls_checkpeer, I wouldn't say that he setting gets ignored, maybe the problem is elsewhere. > > Please advise.
> CC'ing rmeggins. Rich, what do you think? Does OpenSSL and MozNSS have > different default cipher suite? Using openldap with openssl, and no cipher suites are specified, which cipher suites are enabled? All of them? > It means, that you certificate can be OK. The error message is not really > lucky. Yes - it is a poor, misleading error message.
> Using openldap with openssl, and no cipher suites are specified, which cipher > suites are enabled? All of them? I'm not sure, I'm still trying to get more debugging output out of openldap. It's awfully quiet about the TLS negotiation phase.
One more thing - what ldap server are you using, and what TLS configuration settings (including cipher suites) are you using?
Just the stock OpenLdap in F-13: ruben@ldap1b ~]$ rpm -q openldap-servers openldap-servers-2.4.21-10.fc13.x86_64 The relevant part of my slapd.conf: TLSCACertificateFile /etc/openldap/cacerts/ca.pem TLSCertificateFile /etc/pki/tls/certs/slapd.pem TLSCertificateKeyFile /etc/pki/tls/private/slapd.key TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5 security ssf=256
Created attachment 454225 [details] slapd log showing connection failures
Strangely enough, logging in as an ldap user at the console works, over ssh doesn't. Now I'm confused... I've increased the debug level of sshd. It looks like the pam authentication succeeds, but then my ssh connection just hangs. No prompt nothing. Oct 18 23:51:53 ldaptest sshd[1223]: debug1: PAM: initializing for "ruben" Oct 18 23:51:53 ldaptest sshd[1223]: debug1: PAM: setting PAM_RHOST to "5354ad7e.cm-6-5c.dynamic.ziggo.nl" Oct 18 23:51:53 ldaptest sshd[1223]: debug1: PAM: setting PAM_TTY to "ssh" Oct 18 23:51:53 ldaptest sshd[1223]: debug2: monitor_read: 50 used once, disabling now Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_request_receive entering Oct 18 23:51:53 ldaptest sshd[1223]: debug3: monitor_read: checking request 3 Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_answer_authserv: service=ssh-connection, style= Oct 18 23:51:53 ldaptest sshd[1223]: debug2: monitor_read: 3 used once, disabling now Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_request_receive entering Oct 18 23:51:53 ldaptest sshd[1223]: debug3: monitor_read: checking request 4 Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_answer_authrole: role= Oct 18 23:51:53 ldaptest sshd[1223]: debug2: monitor_read: 4 used once, disabling now Oct 18 23:51:53 ldaptest sshd[1223]: debug3: mm_request_receive entering Oct 18 23:51:55 ldaptest sshd[1224]: debug1: userauth-request for user ruben service ssh-connection method password Oct 18 23:51:55 ldaptest sshd[1224]: debug1: attempt 1 failures 0 Oct 18 23:51:55 ldaptest sshd[1224]: debug2: input_userauth_request: try method password Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_auth_password entering Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_send entering: type 11 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_auth_password: waiting for MONITOR_ANS_AUTHPASSWORD Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive_expect entering: type 12 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive entering Oct 18 23:51:55 ldaptest sshd[1223]: debug3: monitor_read: checking request 11 Oct 18 23:51:55 ldaptest sshd[1223]: debug3: PAM: sshpam_passwd_conv called with 1 messages Oct 18 23:51:55 ldaptest sshd[1223]: debug1: PAM: password authentication accepted for ruben Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_answer_authpassword: sending result 1 Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_send entering: type 12 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_auth_password: user authenticated Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_do_pam_account entering Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_send entering: type 51 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive_expect entering: type 52 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_receive entering Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive_expect entering: type 51 Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive entering Oct 18 23:51:55 ldaptest sshd[1223]: debug1: do_pam_account: called Oct 18 23:51:55 ldaptest sshd[1223]: debug3: PAM: do_pam_account pam_acct_mgmt = 0 (Success) Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_send entering: type 52 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_do_pam_account returning 1 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: Sending new keys: 0x7fcaa5a54960 0x7fcaa5a49e60 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_newkeys_to_blob: converting 0x7fcaa5a54960 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_newkeys_to_blob: converting 0x7fcaa5a49e60 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: New keys have been sent Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: Sending compression state Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_request_send entering: type 25 Oct 18 23:51:55 ldaptest sshd[1224]: debug3: mm_send_keystate: Finished sending state Oct 18 23:51:55 ldaptest sshd[1223]: Accepted password for ruben from 83.84.173.126 port 45153 ssh2 Oct 18 23:51:55 ldaptest sshd[1223]: debug1: monitor_child_preauth: ruben has been authenticated by privileged process Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Waiting for new keys Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive_expect entering: type 25 Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive entering Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_newkeys_from_blob: 0x7fcaa5b40020(122) Oct 18 23:51:55 ldaptest sshd[1223]: debug2: mac_setup: found hmac-md5 Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Waiting for second key Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_newkeys_from_blob: 0x7fcaa5b40020(122) Oct 18 23:51:55 ldaptest sshd[1223]: debug2: mac_setup: found hmac-md5 Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Getting compression state Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_get_keystate: Getting Network I/O buffers Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_share_sync: Share sync Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_share_sync: Share sync end Oct 18 23:51:55 ldaptest sshd[1223]: debug1: temporarily_use_uid: 10001/10000 (e=0/0) Oct 18 23:51:55 ldaptest sshd[1223]: debug1: ssh_gssapi_storecreds: Not a GSSAPI mechanism Oct 18 23:51:55 ldaptest sshd[1223]: debug1: restore_uid: 0/0 Oct 18 23:51:55 ldaptest sshd[1223]: debug1: SELinux support disabled Oct 18 23:51:55 ldaptest sshd[1223]: debug1: PAM: establishing credentials Oct 18 23:51:55 ldaptest sshd[1223]: debug3: PAM: opening session Oct 18 23:51:55 ldaptest sshd[1223]: pam_unix(sshd:session): session opened for user ruben by (uid=0) Oct 18 23:51:55 ldaptest sshd[1223]: User child is on pid 1225 Oct 18 23:51:55 ldaptest sshd[1223]: debug3: mm_request_receive entering Oct 18 23:51:55 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)... Oct 18 23:51:59 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)... Oct 18 23:52:07 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)... Oct 18 23:52:23 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)... Oct 18 23:52:55 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 64 seconds)... Oct 18 23:53:59 ldaptest sshd[1225]: nss_ldap: could not search LDAP server - Server is unavailable Oct 18 23:53:59 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)... Oct 18 23:54:03 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)... Oct 18 23:54:11 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)... Oct 18 23:54:27 ldaptest sshd[1225]: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)... Lets set bind_policy to soft in /etc/nsswitch.conf. Wow, that does the trick, the ssh connection succeeds. Comment #12 is my ldap log which show the ldap connection failures.
did you mean /etc/ldap.conf instead of /etc/nsswitch.conf? The ldap log doesn't correspond to the sshd log?
Actually, I meant /etc/nss_ldap.conf :-) /etc/ldap.conf doesn't exists anymore in F14. The ldap log is from half an hour later, but I was seeing the the exact same behavior as before.
I've changed my test setup a little, since I can't increase the logging on the production ldap server. I'm now testing on a clean F-14 vm against openldap-servers-2.4.22-7.fc14.x86_64, with the following configuration: /etc/nss_ldap.conf base dc=example,dc=org nss_initgroups_minimum_uid 500 uri ldap://ldaptest.example.org ssl start_tls tls_cacertfile /etc/openldap/cacerts/ca.pem tls_checkpeer yes tls_ciphers TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5 nss_base_passwd ou=People,dc=example,dc=org?one nss_base_shadow ou=People,dc=example,dc=org?one nss_base_group ou=Groups,dc=example,dc=org?one /etc/pam_ldap.conf: base dc=example,dc=org uri ldap://ldaptest.example.org ssl start_tls tls_checkpeer yes tls_cacertfile /etc/openldap/cacerts/ca.pem pam_password exop tls_ciphers TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5 /etc/openldap/ldap.conf: URI ldap://ldaptest.example.org BASE dc=example,dc=org TLS_CACERT /etc/openldap/cacerts/ca.pem TLS_CIPHER_SUITE TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5 TLS_REQCERT demand /etc/openldap/slapd.conf (relevant part): TLSCACertificateFile /etc/openldap/cacerts/ca.pem TLSCertificateFile /etc/pki/tls/certs/slapd.pem TLSCertificateKeyFile /etc/pki/tls/private/slapd.key TLSCipherSuite TLSv1+RSA:!EXPORT:!NULL:!RC4:!MD5 security tls=256 I've created a self-signed CA, and used it to create a certificate for CN=ldaptest.example.org Ok, let's do some tests: - getent passwd works, returns output immediately - ldapsearch -xZZ works, returns output immediately - ssh ruben@localhost works, but takes about 6 to 7 minutes. I'll attach the slapd and sshd logs. Looking at the slapd log I see 44 ldap connections, of which 36 failed. This is all with nscd disabled.
Created attachment 454415 [details] slapd log showing connection failures while logging in with ssh (no nscd)
Created attachment 454416 [details] sshd log showing nss_ldap reconnection attempts
Created attachment 454421 [details] slapd log showing 2 successful connections
Created attachment 454422 [details] sshd log showing successful login without nss_ldap reconnecting
The attachments in comment #19 and #20 show that I'm able to login with ssh as soon as I start nscd.
There is some sort of TLS problem: Oct 19 20:29:05 ldaptest slapd[1436]: connection_read(22): TLS accept failure error=-1 id=1007, closing Unfortunately, the only way to get more detailed information is to use the trace level logging (-d 1) which is very, very verbose. But it should tell us what the low level moznss problem is.
Created attachment 454471 [details] slapd -d 1 trace output
Hmm - there are a couple of different failures: op tag 0x77, time 1287530783 ... TLS: error: accept - force handshake failure: 0 - error -5938:Encountered end of file and op tag 0x77, time 1287530895 ... connection_read(18): unable to get TLS client DN, error=49 id=1023 The first one would seem to indicate that the client just dropped the connection, or gave up on the TLS handshake. The second seems to indicate that the certificate sent by the TLS peer is NULL. Are you able to reproduce any of these problems with the ldapsearch command line tool, or is it necessary to configure nss_ldap/pam_ldap/sshd/nscd to reproduce this problem?
Hi Rich, I'm not able to reproduce this with just ldapsearch. About the second failure, what kind of certificate is it expecting? I'm not (knowingly) using client side certificates.
http://www.zytrax.com/books/ldap/ch12/ says that error 49 is harmless if TLSVerifyClient=demand. I've been reading throught the openldap source. and indeed, connection_read in connection.c unconditionally calls dnX509peerNormalize().
(In reply to comment #26) > http://www.zytrax.com/books/ldap/ch12/ says that error 49 is harmless if > TLSVerifyClient=demand. > > I've been reading throught the openldap source. > and indeed, connection_read in connection.c unconditionally calls > dnX509peerNormalize(). Yes, so that looks ok.
Ok, back to failure no. 1 then. Just to verify that there's no issue with the ldapserver running on localhost, I installed a second vm with F-13, and pointed /etc/ldap.conf at my first server. Everything works fine on F-13. I've upgraded the second VM to F-14 and I'm seeing the same failures. So it definitely looks like a client side issue. Are there any other steps I could take?
(In reply to comment #28) > Ok, back to failure no. 1 then. > > Just to verify that there's no issue with the ldapserver running on localhost, > I installed a second vm with F-13, and pointed /etc/ldap.conf at my first > server. Everything works fine on F-13. > > I've upgraded the second VM to F-14 and I'm seeing the same failures. So it > definitely looks like a client side issue. > > Are there any other steps I could take? I don't know if it would be possible to enable the ldap library trace debug level in the client side software. When you use e.g. ldapsearch -d 1, this calls ldap_set_option(ld, LDAP_OPT_DEBUG_LEVEL, thelevel) e.g. int debuglevel = 1; ldap_set_option(ld, LDAP_OPT_DEBUG_LEVEL, &debuglevel); It looks as though you can set this in nss_ldap - man nss_ldap debug <level> Specifies the debug level used for logging by the LDAP client library. You might try logdir /var/tmp/ldap debug 1 in /etc/ldap.conf to see if that gives us a clue. I suspect it may have something to do with the way the NSS FIPS softoken deals with fork() calls - since it works with ncsd, it may be fork()ing something when not using ncsd.
Ok, that works, but it's a lot of information (which also ends up at stdout).
Created attachment 454717 [details] trace output from nss_ldap while ssh'in
Crud - still no clue :P Back to the ssf setting - does it work if you set ssf=168 (or don't set ssf at all), but fail if you set ssf=256? Or was that a red herring? Thanks for your assistance with this issue. Looks like the only recourse left is to set up a similar environment and run gdb.
Nope, ssf doesn't influence it at all. I've tried running su under gdb, but I don't get very far: [root@ldaptest ~]# gdb --args su - ruben GNU gdb (GDB) Fedora (7.2-16.fc14) Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /bin/su...Reading symbols from /usr/lib/debug/bin/su.debug...done. done. (gdb) b _nss_ldap_getbyname Function "_nss_ldap_getbyname" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Breakpoint 1 (_nss_ldap_getbyname) pending. (gdb) r Starting program: /bin/su - ruben [Thread debugging using libthread_db enabled] Cannot find new threads: generic error I could give you access to the VM if you want?
(In reply to comment #33) > Nope, ssf doesn't influence it at all. > > I've tried running su under gdb, but I don't get very far: > > [root@ldaptest ~]# gdb --args su - ruben > GNU gdb (GDB) Fedora (7.2-16.fc14) > Copyright (C) 2010 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-redhat-linux-gnu". > For bug reporting instructions, please see: > <http://www.gnu.org/software/gdb/bugs/>... > Reading symbols from /bin/su...Reading symbols from > /usr/lib/debug/bin/su.debug...done. > done. > (gdb) b _nss_ldap_getbyname > Function "_nss_ldap_getbyname" not defined. > Make breakpoint pending on future shared library load? (y or [n]) y > Breakpoint 1 (_nss_ldap_getbyname) pending. > (gdb) r > Starting program: /bin/su - ruben > [Thread debugging using libthread_db enabled] > Cannot find new threads: generic error > > I could give you access to the VM if you want? Sure! That would be very helpful. Please email me directly with the information.
Thanks for your help. I believe this problem is caused by nss_ldap calling fork() after the NSS software crypto token has been initialized. I'm going to try to reproduce the problem with a simple test program.
Also try this: in the /etc/sysconfig/sshd add export NSS_STRICT_NOFORK=DISABLED then service sshd restart Then try ssh - I did this on the test system and I was able to ssh in as a regular user without using nscd Note that this is not the final, recommended solution, but you could use it until we figure out how to solve this fork()ing problem.
I can live with the workaround for the moment. Thanks again, Ruben
Created attachment 455133 [details] python script used on F-14 to reproduce This python script is used to show the bug and the workaround. Usage: python olnsstest.py certdir ldapurl binddn bindpw e.g. python olnsstest.py /path/to/cert8.db/dir ldap://localhost:389 "cn=directory manager" secret certdir - full absolute path of directory containing cert8.db, key3.db, secmod.db ldapurl - ldap (not ldaps) url - the script uses start_tls to start TLS binddn, bindpw - bind DN and password to use with simple_bind Requires: Directory Server with TLS capability - certdir which contains CA cert for directory server - python-ldap and python-nss packages on Fedora 14 and later To reproduce the problem - comment out the line #os.environ["NSS_STRICT_NOFORK"] = "DISABLED" If you remove the os.fork(), or remove the nss_init(), it works If you do the nss_init(), followed by the fork(), followed by the ldap calls, without the env. var. set, you will get a connection failure.
*** Bug 669900 has been marked as a duplicate of this bug. ***
*** Bug 663485 has been marked as a duplicate of this bug. ***
Any work being done on this bug? My F14 x86_64 w/ 389-ds+Samba server is also affected. The workaround suffices for now, but it shouldn't be a permenant solution.
https://admin.fedoraproject.org/updates/nss-util-3.12.9-1.fc14,nss-softokn-3.12.9-1.fc14,nss-3.12.9-1.fc14,nspr-4.8.7-1.fc14?_csrf_token=4f505a0830875d09f36168715064980b2733a4bc nss-3.12.9 is available in Testing - please try that and see if it fixes this problem.
nss-3.12.9 did not appear to resolve the problem for me. If I have nscd running, I can su from root to an LDAP user, or ssh in as that LDAP user. However, if I stop nscd, neither the su nor the ssh succeed. If I set NSS_STRICT_NOFORK=DISABLED, then the su works (I did not try ssh with this setting).
It turns out that some additional changes were needed to be made to openldap to take advantage of the new SECMOD_RestartModules API in NSS 3.12.9. An ITS has been opened in openldap upstream and a patch submitted: http://www.openldap.org/its/index.cgi?findid=6802
Fixed in openldap-2.4.23-7.fc14
openldap-2.4.23-7.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/openldap-2.4.23-7.fc14
openldap-2.4.23-7.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update openldap'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/openldap-2.4.23-7.fc14
I tried upgrading openldap to 2.4.23-7.fc14 (and had already upgraded nss to 3.12.9), but I still see that with nscd shut down, I am unable to su or ssh into the machine as an LDAP user. With nscd running, I'm able to do both the su and the ssh. In /var/log/messages, with nscd shut down, I see the following: Jan 26 10:11:19 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 4 seconds)... Jan 26 10:11:23 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 8 seconds)... Jan 26 10:11:31 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 16 seconds)... Jan 26 10:11:47 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 32 seconds)... Jan 26 10:12:19 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 64 seconds)... Jan 26 10:13:23 localhost su: nss_ldap: could not search LDAP server - Server is unavailable I don't know if this is significant, but looking at wireshark, right after the client sends a Client Hello message, it appears to be closing the connection. Looks like my comment #43 was slightly incorrect in that an su eventually succeeds, but it takes over 4 minutes when nscd is disabled (compared to less than a second with it enabled).
(In reply to comment #48) > I tried upgrading openldap to 2.4.23-7.fc14 (and had already upgraded nss to > 3.12.9), but I still see that with nscd shut down, I am unable to su or ssh > into the machine as an LDAP user. With nscd running, I'm able to do both the su > and the ssh. Does it work if you use the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=636956#c36 ? > > In /var/log/messages, with nscd shut down, I see the following: > > Jan 26 10:11:19 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 4 > seconds)... > Jan 26 10:11:23 localhost su: nss_ldap: reconnecting to LDAP server (sleeping 8 > seconds)... > Jan 26 10:11:31 localhost su: nss_ldap: reconnecting to LDAP server (sleeping > 16 seconds)... > Jan 26 10:11:47 localhost su: nss_ldap: reconnecting to LDAP server (sleeping > 32 seconds)... > Jan 26 10:12:19 localhost su: nss_ldap: reconnecting to LDAP server (sleeping > 64 seconds)... > Jan 26 10:13:23 localhost su: nss_ldap: could not search LDAP server - Server > is unavailable > > I don't know if this is significant, but looking at wireshark, right after the > client sends a Client Hello message, it appears to be closing the connection. > > Looks like my comment #43 was slightly incorrect in that an su eventually > succeeds, but it takes over 4 minutes when nscd is disabled (compared to less > than a second with it enabled).
(In reply to comment #49) > (In reply to comment #48) > > I tried upgrading openldap to 2.4.23-7.fc14 (and had already upgraded nss to > > 3.12.9), but I still see that with nscd shut down, I am unable to su or ssh > > into the machine as an LDAP user. With nscd running, I'm able to do both the su > > and the ssh. > > Does it work if you use the workaround from > https://bugzilla.redhat.com/show_bug.cgi?id=636956#c36 > ? Yes, that does work (as it did previously). I was hoping that this update would not require this workaround.
(In reply to comment #50) > (In reply to comment #49) > > (In reply to comment #48) > > > I tried upgrading openldap to 2.4.23-7.fc14 (and had already upgraded nss to > > > 3.12.9), but I still see that with nscd shut down, I am unable to su or ssh > > > into the machine as an LDAP user. With nscd running, I'm able to do both the su > > > and the ssh. > > > > Does it work if you use the workaround from > > https://bugzilla.redhat.com/show_bug.cgi?id=636956#c36 > > ? > > Yes, that does work (as it did previously). I was hoping that this update would > not require this workaround. I was afraid of that. It may be that sshd opens a crypto context before the fork() and attempts to use that same crypto context after the fork() in a child process. I'll have to review the code.
Is it possible for you to do an strace of the sshd process during the login? I'd like to see where the fork()s happen. You can attach the output to this bug as a text/plain attachment (after removing/obscuring any sensitive data first).
Created attachment 475498 [details] strace -f -t su - <LDAP username> I have attached strace output from an su - to an LDAP user (the name has been changed in the strace output to dummy). Will this be suitable instead of the sshd strace, since the problem is occurring in both places?
Thanks Brian. After a discussion with an NSS developer and another developer affected by this issue, we've decided the best course of action is to have openldap explicitly disable fork checking in the software token. Openldap ITS: http://www.openldap.org/its/index.cgi?findid=6811 patch submitted
Fixed in openldap-2.4.23-8.fc14
openldap-2.4.23-8.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/openldap-2.4.23-8.fc14
I tried the 2.4.23-8.fc14 update, and I was able to confirm that it does resolve my problem. Thanks.
(In reply to comment #57) > I tried the 2.4.23-8.fc14 update, and I was able to confirm that it does > resolve my problem. Thanks. Thanks for your help with this Brian, and sorry it took so long to resolve.
Package openldap-2.4.21-12.fc13: * should fix your issue, * was pushed to the Fedora 13 updates-testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing openldap-2.4.21-12.fc13' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/openldap-2.4.21-12.fc13 then log in and leave karma (feedback).
Package openldap-2.4.23-9.fc14: * should fix your issue, * was pushed to the Fedora 14 updates-testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing openldap-2.4.23-9.fc14' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/openldap-2.4.23-9.fc14 then log in and leave karma (feedback).
openldap-2.4.23-9.fc14 indeed fixes SSL problem (nss_ldap: failed to bind to LDAP server ldaps://...) but it creates a new one: # su - foo free(0x173bd2) below bottom of memory. (memtop = 0x9898800 membot = 0x973e000) Mar 9 22:49:28 client kernel: [ 496.021546] type=1100 audit(1299703768.918:22601): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:authentication acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success' Mar 9 22:49:28 client kernel: [ 496.021689] type=1101 audit(1299703768.918:22602): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:accounting acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success' Mar 9 22:49:30 client su: pam_unix(su-l:session): session opened for user foo by admin(uid=0) Mar 9 22:49:30 client kernel: [ 497.562678] type=1105 audit(1299703770.459:22603): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:session_open acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success' Mar 9 22:49:30 client kernel: [ 497.562750] type=1103 audit(1299703770.459:22604): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:setcred acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success' Mar 9 22:49:31 client kernel: [ 498.374379] type=1701 audit(1299703771.272:22605): auid=500 uid=2498 gid=202 ses=4 pid=25246 comm="tcsh" sig=6 Mar 9 22:49:31 client abrt[25247]: saved core dump of pid 25246 (/bin/tcsh) to /var/spool/abrt/ccpp-1299703771-25246.new/coredump (2191360 bytes) Mar 9 22:49:31 client abrtd: Directory 'ccpp-1299703771-25246' creation detected Mar 9 22:49:31 client su: pam_unix(su-l:session): session closed for user foo Mar 9 22:49:31 client kernel: [ 498.991478] type=1104 audit(1299703771.888:22606): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:setcred acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success' Mar 9 22:49:31 client kernel: [ 498.991799] type=1106 audit(1299703771.888:22607): user pid=25133 uid=0 auid=500 ses=4 msg='op=PAM:session_close acct="foo" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success' Mar 9 22:49:32 client abrtd: Crash is in database already (dup of /var/spool/abrt/ccpp-1299701675-27295) Mar 9 22:49:32 client abrtd: Deleting crash ccpp-1299703771-25246 (dup of ccpp-1299701675-27295), sending dbus signal # groups foo foo : valid_group # getent group foo getent passwd and shadow works # grep ldap /etc/nsswitch.conf passwd: files ldap shadow: files ldap group: files ldap after removing ldap from group #su - foo id: cannot find name for group ID 202 [foo@client ~]$ # gdb /bin/tcsh coredump (gdb) bt #0 0x00ed9416 in __kernel_vsyscall () #1 0x006e02f1 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #2 0x006e1d5e in abort () at abort.c:92 #3 0x08084e6a in free (cp=0x8f2bd2) at tc.alloc.c:313 #4 0x08063d83 in blkfree (av0=0x9e7d988) at sh.misc.c:158 #5 0x0805af6f in tsetenv (name=0x80a3420 L"GROUP", val=0x9e32608 L"valid_group") at sh.func.c:1725 #6 0x0804d910 in main (argc=<value optimized out>, argv=0xbfa7e4f4) at sh.c:561 # ldapsearch -x '(cn=valid_group)' # 202, Groups, example.com dn: gidnumber=202,ou=Groups,dc=example,dc=com objectClass: top objectClass: posixGroup gidNumber: 202 cn: valid_group /etc/pam_ldap.conf: nss_base_group ou=Groups,dc=example,dc=com /etc/nss_ldap.conf: nss_base_group ou=Groups,dc=example,dc=com regards, Giannis
if you try the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=636956#c36 does it still crash? Does it crash with bash or some other shell? Does it crash if you do # su foo instead of # su - foo ?
export NSS_STRICT_NOFORK=DISABLED and bash/tcsh and openldap-2.4.23-4.fc14.i686 works for ssh and #su - foo and #su foo also ok however there is a small delay and I also see my iptables logging a packet from the ldap server back to client (not established). So probably the 1st connection is closed somehow badly from the client. openldap-2.4.23-9.fc14.i686 and /bin/bash also works WITHOUT export NSS_STRICT_NOFORK=DISABLED openldap-2.4.23-9.fc14.i686 and /bin/tcsh and #su foo works #su - foo free(0xd6dbd2) below bottom of memory. (memtop = 0x83b1000 membot = 0x825c000) with or without export NSS_STRICT_NOFORK=DISABLED
Today also (with openldap-2.4.23-4.fc14.x86_64) while updating to setroubleshoot-server-3.0.30-1.fc14.x86_64 and libisofs-1.0.2-1.fc14.x86_64 with yum update I got again Mar 10 15:15:01 client dbus-daemon: nss_ldap: failed to bind to LDAP server ldaps://ldap.example.com:636: Can't contact LDAP server after issuing /etc/init.d/messagebus restart problem gone away and yum completed (it was hanging) This might be related to https://bugzilla.redhat.com/show_bug.cgi?id=502072
Created attachment 483481 [details] openldap-2.4.24-use-static-for-nofork-envvar-20110310.patch I don't know what could cause such odd behavior between shells and su invocations. Perhaps I can't call PR_SetEnv with a string constant? Here is a patch that uses a static char array instead of a constant string.
I have a patched version of openldap here: http://rmeggins.fedorapeople.org/openldap/ - could you please give these packages a try?
I've tried it. (all these without sssd or nscd) openldap-2.4.23-10.1.fc14.i686.rpm has no different behavior than openldap-2.4.23-9.fc14.i686.rpm # su - foo with /bin/tcsh crashes. Also login from GDM Login screen crashes with /bin/tcsh Login from GDM with /bin/bash works All these on new installed f14 virtualbox test box up2date.
I also found a quicker way to reproduce it (openldap-2.4.23-10.1.fc14.i686) ssh root@client # tcsh # cd ~foo free(2f2ec0) below bottom of memory. (memtop = 0x9f17000 membot = 0x9d3d000) Aborted (core dumped) The following works (strange) # cd ~foo # cd # tcsh # cd ~foo ~foo exists and owned by foo All works (GDM LOGIN also) if nscd is running.
Created attachment 483522 [details] gdm crash log
Created attachment 483523 [details] gdm crash debug
Created attachment 483525 [details] openldap-2.4.24-use-strdup-for-nofork-envvar-20110310.patch I am baffled because these crashes don't appear to have anything to do with openldap. The only think I can think of is that this is causing problems - libraries/libldap/tls_m.c: - PR_SetEnv( "NSS_STRICT_NOFORK=DISABLED" ); which seems perfectly valid. Using a static variable here seemed to have no effect. I'm now working on another variation which uses a strdup.
New patched version of openldap here: http://rmeggins.fedorapeople.org/openldap/ - could you please give these packages a try? If these fail - can you try running with valgrind? e.g. try this # valgrind -q --tool=memcheck su - root
This version worked for su - foo and ssh foo@client env NSS_STRICT_NOFORK=DISABLED is only available on bash though. I cannot find in tcsh. Don't know if this a problem
Jan, not sure what the procedure is for this bug. Do we change the status to ASSIGNED (i.e. reopen the bug) or file a new bug? I have submitted an upstream ITS: http://www.openldap.org/its/index.cgi?findid=6862 which also contains the patch.
Thank you, Rich. I changed the status to ASSIGNED. New update will be pushed soon.
Fixed in openldap-2.4.23-10.fc14
openldap-2.4.23-10.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/openldap-2.4.23-10.fc14
openldap-2.4.24-2.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/openldap-2.4.24-2.fc15
openldap-2.4.24-2.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
openldap-2.4.23-10.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.