Bug 747387

Summary: Unable to contact LDAP Server during winsync.
Product: Red Hat Enterprise Linux 6
Reporter: Gowrishankar Rajaiyan <grajaiya>
Component: nss
Assignee: Elio Maldonado Batiz <emaldona>
Status: CLOSED ERRATA
QA Contact: Aleš Mareček <amarecek>
Severity: high
Priority: urgent
Version: 6.2
CC: abokovoy, amarecek, ddumas, emaldona, jgalipea, mkosek, rmeggins, rrelyea, shaines
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: nss-3.12.10-15.el6
Doc Type: Bug Fix
Clone Of:
: 782783 800674
Last Closed: 2011-12-06 12:11:18 UTC
Bug Blocks: 800674
Attachments:
Description [Flags]
patch for multiinit.c to allow standalone compilation [none]
my full multiinit.c [none]
skip any load module calls if we are already initted and we aren't adding any new databases [none]
bob's patch expanded to deal with thread blocking issues [rrelyea: review-]
Skip any modules calls if we are initted and aren't adding databases. [rrelyea: review+]

Description Gowrishankar Rajaiyan 2011-10-19 17:01:09 UTC
Description of problem:


Version-Release number of selected component (if applicable):
ipa-server-2.1.3-2.el6.x86_64

How reproducible:


Steps to Reproduce:
Make sure IPA server and ADS server are resolvable via DNS.
Make sure /etc/hosts has both IPA and AD entries.

ACTIVE DIRECTORY

1. Set up the Active Directory server to use the SSL server certificate.
2. Transfer IPA CA Cert to your ADS server.
3. Import the CA certificate from Directory Server(host1).
4. Install passsync on ADS:
/**
hostname :  ipaserver.jgalipea.redhat.com
port : 636
binddn : uid=passsync,cn=sysaccounts,cn=etc,dc=lab,dc=eng,dc=pnq,dc=redhat,dc=com
password :  password
Security Device Password :  Secret123
user search :  cn=users,cn=accounts,dc=lab,dc=eng,dc=pnq,dc=redhat,dc=com
**/
5. Trust the IPA Server's CA Certificate
6. Reboot.

IPA SERVER
1. Install ipa-server on host1
2. host1: ipa-replica-prepare host2.fqdn
3. host2: ipa-replica-install /var/lib/ipa/replica-info-host2.fqdn.gpg
4. Copy the Windows certificate to host1.
5. date && ipa-replica-manage connect --winsync --passsync=password --cacert=/root/WinCert2.cer dhcp201-112.englab.pnq.redhat.com --binddn "cn=Administrator,cn=Users,dc=dhcp201-112,dc=englab,dc=pnq,dc=redhat,dc=com" --bindpw Secret123 -v -p Secret123

 
Actual results:
[root@decepticons ~]# date && ipa-replica-manage connect --winsync --passsync=password --cacert=/root/WinCert2.cer dhcp201-112.englab.pnq.redhat.com --binddn "cn=Administrator,cn=Users,dc=dhcp201-112,dc=englab,dc=pnq,dc=redhat,dc=com" --bindpw Secret123 -v -p Secret123
Wed Oct 19 21:03:27 IST 2011
Added CA certificate /root/WinCert2.cer to certificate database for decepticons.lab.eng.pnq.redhat.com
Failed to get data from 'decepticons.lab.eng.pnq.redhat.com': {'desc': "Can't contact LDAP server"}
[root@decepticons ~]# 

Expected results:
The sync agreement is set up, and users and passwords are successfully synced from AD. Users and passwords are also synced to IPA host2.


Additional info:

/var/log/dirsrv/slapd-LAB-ENG-PNQ-REDHAT-COM/errors:
[19/Oct/2011:21:03:28 +051800] - slapd shutting down - signaling operation threads
[19/Oct/2011:21:03:28 +051800] - slapd shutting down - closing down internal subsystems and plugins
[19/Oct/2011:21:03:28 +051800] - Waiting for 4 database threads to stop
[19/Oct/2011:21:03:29 +051800] - All database threads now stopped
[19/Oct/2011:21:03:29 +051800] - slapd stopped.
[19/Oct/2011:21:03:32 +051800] - 389-Directory/1.2.9.13 B2011.281.321 starting up
[19/Oct/2011:21:03:32 +051800] schema-compat-plugin - warning: no entries set up under cn=ng, cn=compat, dc=lab,dc=eng,dc=pnq,dc=redhat,dc=com
[19/Oct/2011:21:03:32 +051800] schema-compat-plugin - warning: no entries set up under ou=SUDOers, dc=lab,dc=eng,dc=pnq,dc=redhat,dc=com
[19/Oct/2011:21:03:32 +051800] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=lab,dc=eng,dc=pnq,dc=redhat,dc=com--no CoS Templates found, which should be added before the CoS Definition.
[19/Oct/2011:21:03:32 +051800] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=lab,dc=eng,dc=pnq,dc=redhat,dc=com--no CoS Templates found, which should be added before the CoS Definition.
[19/Oct/2011:21:03:32 +051800] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[19/Oct/2011:21:03:32 +051800] - Listening on All Interfaces port 636 for LDAPS requests
[19/Oct/2011:21:03:32 +051800] - Listening on /var/run/slapd-LAB-ENG-PNQ-REDHAT-COM.socket for LDAPI requests

====================

> /usr/lib/python2.6/site-packages/ipaserver/install/replication.py(120)__init__()
-> self.conn.do_simple_bind(bindpw=dirman_passwd)
(Pdb) n
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP decepticons.lab.eng.pnq.redhat.com:636
ldap_new_socket: 5
ldap_prepare_socket: 5
ldap_connect_to_host: Trying 10.65.201.77:636
ldap_pvt_connect: fd: 5 tm: -1 async: 0
TLS: could not initialize moznss - error -8192:Unknown code ___f 0.
TLS: could perform TLS system initialization.
TLS: error: could not initialize moznss security context - error -8192:Unknown code ___f 0
TLS: can't create ssl handle.
ldap_err2string
SERVER_DOWN: SERVER_D...erver"},)
> /usr/lib/python2.6/site-packages/ipaserver/install/replication.py(120)__init__()
-> self.conn.do_simple_bind(bindpw=dirman_passwd)

====================


[root@decepticons ~]# LDAPTLS_CACERT=/etc/ipa/ca.crt ldapsearch -ZZ -x -h decepticons.lab.eng.pnq.redhat.com -s base  -b '' namingContexts
# extended LDIF
#
# LDAPv3
# base <> with scope baseObject
# filter: (objectclass=*)
# requesting: namingContexts 
#

#
dn:
namingContexts: dc=lab,dc=eng,dc=pnq,dc=redhat,dc=com

# search result
search: 3
result: 0 Success

# numResponses: 2
# numEntries: 1
[root@decepticons ~]#

Comment 2 Rich Megginson 2011-10-19 19:53:38 UTC
I can easily reproduce this problem with the multiinit.c test program that comes with the NSS source code. I made the following modification so that I can pass in the same NSS init arguments that openldap uses with the nsspem module:

-    ctxt = NSS_InitContext(db->arg, "", "", "", initStringPtr,
-		NSS_INIT_NOROOTINIT|(readonly?NSS_INIT_READONLY:0));
+    if (!db->arg || !db->arg[0]) {
+        flags |= NSS_INIT_NOCERTDB|NSS_INIT_NOMODDB;
+    }
+    ctxt = NSS_InitContext(db->arg, "", "", "", initStringPtr, flags);

When using nsspem, I do not have key/cert db files, just PEM files.
The full patch I used is attached. I wanted to avoid the secutil code so that the test is a standalone program that can be built outside of the nss source tree.

I built it like this:
gcc -g -o multiinit multiinit.c -I/usr/include/nss3 -I/usr/include/nspr4 -lnss3 -lnspr4

I ran it like this:
./multiinit --main_db ~/save --lib1_db "" --main_readonly --lib1_readonly --lib2_db "" --lib2_readonly -v --main_command list_slots --lib1_command list_slots --lib2_command list_slots --order M12miz

Where ~/save contains a real NSS key/cert db with real certs.

I get this output:
* initializing with order "M12miz"*
*NSS_Init for the main program*
*Executing nss command "list_slots" for main*
* Name=NSS Internal Cryptographic Services Token_Name=NSS Generic Crypto Services present=true, ro=true *
* Name=NSS User Private Key and Certificate Services Token_Name=NSS Certificate DB present=true, ro=true *
*NSS_Init for lib1*
>> Unknown code ___f 0

The "Unknown code" is really error -8192, or SEC_ERROR_IO. I've traced through the NSS code in the debugger, but I don't really understand what it's doing. It seems that the module is already loaded, so NSS attempts to load it into a new slot (slotID 4), but the module only has two slots available, so it fails.
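
The mapping from -8192 to SEC_ERROR_IO follows from how NSS numbers its errors: each code is an offset from SEC_ERROR_BASE, which is -0x2000 in NSS's secerr.h. A minimal Python sketch of that arithmetic follows. The helper name `describe_sec_error` is hypothetical, and only the first three codes are listed, so this is an illustration rather than a full error table.

```python
# NSS numbers its security-library errors upward from a fixed base
# (SEC_ERROR_BASE is -0x2000 in NSS's secerr.h).
SEC_ERROR_BASE = -0x2000

# Only the first few codes are listed here; this is not the full table.
SEC_ERROR_NAMES = {
    0: "SEC_ERROR_IO",
    1: "SEC_ERROR_LIBRARY_FAILURE",
    2: "SEC_ERROR_BAD_DATA",
}


def describe_sec_error(code):
    """Map a numeric NSS error code to its symbolic name, if listed.

    Hypothetical helper for illustration only.
    """
    offset = code - SEC_ERROR_BASE
    return SEC_ERROR_NAMES.get(offset, "unknown (SEC_ERROR_BASE + %d)" % offset)
```

Calling `describe_sec_error(-8192)` gives an offset of 0 from the base, which is SEC_ERROR_IO.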

I'm going to need some help from Bob or Elio on this one.

Comment 3 Rich Megginson 2011-10-19 19:54:35 UTC
Created attachment 529091 [details]
patch for multiinit.c to allow standalone compilation

Comment 4 Rich Megginson 2011-10-19 19:58:19 UTC
Created attachment 529092 [details]
my full multiinit.c

Comment 5 Alexander Bokovoy 2011-10-19 20:40:16 UTC
I was seeing similar errors that went away once I added both the IPA CA and ADS server certificates to the LDAP client config, as I described in bug 739241:
-----------------------------------------
  2.1. Copy AD certificate to /etc/openldap/cacerts/
  2.2. Copy IPA CA certificate to /etc/openldap/cacerts/
  2.3. Run cacertdir_rehash /etc/openldap/cacerts/
  2.4. Modify /etc/openldap/ldap.conf, and add if they do not exist:
TLS_CACERTDIR /etc/openldap/cacerts/
TLS_REQCERT allow
-----------------------------------------

This will force all LDAP clients that do not otherwise configure themselves to use specific certificates to use the ones available in the cacertdir.
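
The "add if they do not exist" part of step 2.4 can be sketched as a small idempotent helper. This is only an illustration: the name `ensure_directives` is hypothetical, the helper works on the file's text rather than editing /etc/openldap/ldap.conf itself, and it assumes a directive counts as present when any non-comment line starts with the same keyword (compared case-insensitively).

```python
def ensure_directives(conf_text, directives):
    """Return conf_text with each (keyword, value) pair appended unless
    a non-comment line already sets that keyword.

    Hypothetical helper for illustration; an existing setting is left
    untouched even if its value differs from the wanted one.
    """
    lines = conf_text.splitlines()

    # Collect the keywords that are already configured.
    present = set()
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            present.add(stripped.split()[0].upper())

    # Append only the directives that are missing.
    for keyword, value in directives:
        if keyword.upper() not in present:
            lines.append("%s %s" % (keyword, value))
            present.add(keyword.upper())
    return "\n".join(lines) + "\n"


# The two settings from step 2.4.
WANTED = [
    ("TLS_CACERTDIR", "/etc/openldap/cacerts/"),
    ("TLS_REQCERT", "allow"),
]
```

Running the helper a second time on its own output changes nothing, so it is safe to apply repeatedly. Note that it does not override an existing `TLS_REQCERT` line with a different value; that choice is deliberate in this sketch and may not suit every setup.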

Comment 6 Bob Relyea 2011-10-19 21:11:09 UTC
1$!@#!@#$!@# bugzilla ate my response....

OK, it looks like the issue is calling NSS_InitContext() with nodb on a module that has already been initialized. This should basically be a no-op for NSS, as there are no new databases that need to be opened.

What's likely happening is softoken is returning an error when you ask it to open a new slot which has no new databases. Perfectly reasonable thing for softoken to do.

pk11_loadmodule is treating this failure to open a new slot as fatal.

The simplest thing is to have nss_init not try to load any modules if it has the noCertDB and noModDB flags set, since there isn't anything for it to do. 

As a longer term fix, we need to make pk11_loadmodule smarter (the above fix won't handle the case if you only set noCertDB, which I don't think is very common -- usually you are trying to not touch the disk at all when you open with no DB's).
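
The short-term fix described above amounts to a guard in front of the module-loading step. Here is a rough model of that decision, written in Python rather than NSS's C for brevity. The function name and the `already_initialized` parameter are hypothetical, and the flag values are assumed to match nss.h, so check them against your headers. This is not the attached patch.

```python
# Flag bits for NSS_Initialize/NSS_InitContext as defined in nss.h
# (values assumed here; check them against your NSS headers).
NSS_INIT_READONLY = 0x1
NSS_INIT_NOCERTDB = 0x2
NSS_INIT_NOMODDB = 0x4

NO_DB_FLAGS = NSS_INIT_NOCERTDB | NSS_INIT_NOMODDB


def should_skip_module_load(already_initialized, flags):
    """True when an init call has no new databases to open, so loading
    modules again would only ask softoken for an empty extra slot.

    Hypothetical model of the guard, not NSS code.
    """
    return already_initialized and (flags & NO_DB_FLAGS) == NO_DB_FLAGS
```

With this guard, a second init that sets both flags is skipped, while a call that sets only NSS_INIT_NOCERTDB still goes through module loading. That is the case the longer-term pk11_loadmodule change would need to handle.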

Elio, can you give the team a test build with the patch I'm about to attach?

bob

Comment 7 Bob Relyea 2011-10-19 21:13:32 UTC
Created attachment 529109 [details]
skip any load module calls if we are already initted and we aren't adding any new databases

Elio, this patch is in mozilla/security/nss/lib/nss

Comment 8 Scott Haines 2011-10-19 21:22:27 UTC
Moving to component: nss. Setting necessary flags.

Comment 9 Elio Maldonado Batiz 2011-10-20 03:12:55 UTC
Created attachment 529165 [details]
bob's patch expanded to deal with thread blocking issues

Comment 10 Elio Maldonado Batiz 2011-10-20 03:21:41 UTC
The patch attached worked for me with Rich's reproducer and an empty database.
A scratch build is at https://brewweb.devel.redhat.com/taskinfo?taskID=37246

Comment 11 Gowrishankar Rajaiyan 2011-10-20 13:48:21 UTC
The scratch build link in comment #10 does not have the build. I downloaded the scratch build from https://brewweb.devel.redhat.com/taskinfo?taskID=3724625 and tested on my system. 

[root@decepticons ~]# date && ipa-replica-manage connect --winsync --passsync=password --cacert=/root/WinCert2.cer dhcp201-112.englab.pnq.redhat.com --binddn "cn=Administrator,cn=Users,dc=englab,dc=pnq,dc=redhat,dc=com" --bindpw Secret123 -v -p Secret123
Thu Oct 20 19:13:49 IST 2011
Added CA certificate /root/WinCert2.cer to certificate database for decepticons.lab.eng.pnq.redhat.com
INFO:root:AD Suffix is: DC=englab,DC=pnq,DC=redhat,DC=com
The user for the Windows PassSync service is uid=passsync,cn=sysaccounts,cn=etc,dc=lab,dc=eng,dc=pnq,dc=redhat,dc=com
Windows PassSync entry exists, not resetting password
INFO:root:Added new sync agreement, waiting for it to become ready . . .
INFO:root:Replication Update in progress: FALSE: status: 0 Replica acquired successfully: Incremental update succeeded: start: 20111020134353Z: end: 20111020134353Z
INFO:root:Agreement is ready, starting replication . . .
INFO:root:Failed to create public entry for winsync replica
Starting replication, please wait until this has completed.
Update succeeded
Connected 'decepticons.lab.eng.pnq.redhat.com' to 'dhcp201-112.englab.pnq.redhat.com'
[root@decepticons ~]# 


works as expected.

Comment 12 Bob Relyea 2011-10-20 18:00:58 UTC
Comment on attachment 529165 [details]
bob's patch expanded to deal with thread blocking issues

This was a misunderstanding. I thought this patch was already in RHEL 6.2. If it's not, we should not put it in now (too high a risk).

This patch does not affect the issue of this bug.

Comment 13 Elio Maldonado Batiz 2011-10-20 18:27:22 UTC
Created attachment 529360 [details]
Skip any modules calls if we are initted and aren't adding databases.

This is what we need. Adapted for nss-3.12.10 in RHEL 6.2.

I did have some doubts, so I made several versions and scratch builds.
The one in https://brewweb.devel.redhat.com/taskinfo?taskID=3724547
is based on this patch if anyone wants to retest.

Comment 14 Bob Relyea 2011-10-20 21:26:58 UTC
Comment on attachment 529360 [details]
Skip any modules calls if we are initted and aren't adding databases.

r+ ;)

Comment 15 Bob Relyea 2011-10-20 21:28:28 UTC
> I did have some doubts

Just to be clear, you mean the massive patch that I r-, not the one I r+'ed;).

bob

Comment 16 Elio Maldonado Batiz 2011-10-20 21:30:33 UTC
(In reply to comment #15)
Yes, that's what I meant.

Comment 24 errata-xmlrpc 2011-12-06 12:11:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1584.html