Bug 469261

Summary: Support server-to-server SASL
Product: [Retired] 389
Component: Security - SASL
Version: 1.1.3
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Rich Megginson <rmeggins>
Assignee: Rich Megginson <rmeggins>
QA Contact: Chandrasekar Kannan <ckannan>
CC: benl, jgalipea, nhosoi, nkinder, ssorce
Target Milestone: ---
Target Release: ---
Fixed In Version: 8.1
Doc Type: Bug Fix
Last Closed: 2009-04-29 23:07:24 UTC
Bug Blocks: 249650, 467277, 493682
Attachments:
diffs - part 1
new file m4/kerberos.m4
new kerberos.m4
cvs commit log - part 1
diffs - part 2
cvs commit log - part 2
kerberos improvements
new kerberos improvements
diffs - part 3 - dna plugin
cvs commit log - part 3 - dna
diffs - part 4 - pta, winsync
cvs commit log - part 4
cvs commit log - kerberos improvements
diffs - console replication
cvs commit log - console repl, winsync
diffs - console chaining/database link
diffs - server chaining, ssl client without ssl server, code cleanup
cvs commit log - console chaining/database link/server cleanup

Description Rich Megginson 2008-10-30 21:17:11 UTC
Need to be able to support using SASL mechanisms (especially GSSAPI kerberos) when doing server to server communications - replication, chaining, pass through auth.  For kerberos, the server must be able to authenticate using the server's keytab, similar to how client cert auth works with the server using its server certificate.

Comment 1 Rich Megginson 2008-10-31 01:58:14 UTC
Created attachment 322014 [details]
diffs - part 1

Comment 2 Rich Megginson 2008-10-31 01:58:43 UTC
Created attachment 322015 [details]
new file m4/kerberos.m4

Comment 3 Nathan Kinder 2008-11-03 21:13:33 UTC
In the new kerberos.m4 file, what happens if I specify the "--with-kerberos-lib" option, but no "--with-kerberos-inc" option?

It looks like it will end up using krb5-config to find both the libdir and the incdir.  This may make it use a different libdir to build than the one you specified at configure time.

Comment 4 Rich Megginson 2008-11-04 15:50:39 UTC
Created attachment 322439 [details]
new kerberos.m4

Comment 5 Rich Megginson 2008-11-04 18:24:00 UTC
Created attachment 322463 [details]
cvs commit log - part 1

Reviewed by: nkinder, nhosoi, ssorce (Thanks!)
Fix Description: I've created two new functions to handle the client side of LDAP in the server - slapi_ldap_init_ext and slapi_ldap_bind.  These two functions are designed to work with any connection type (ldap, ldaps, ldap+starttls, and eventually ldapi) and bind type (plain, sasl, client cert).  The secure flag has been extended to use a value of 2 to mean use startTLS.  One tricky part is that there is no place to store the startTLS flag in init to pass to bind, so we store that in the clientcontrols field which is currently unused.  We do that because the semantics of ldap_init are not to do any network traffic, but defer that until the bind operation (or whatever the first actual operation is e.g. start_tls).  I plan to replace all of the places in the code that do ldap init and bind with these functions.
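As a rough illustration of the extended secure flag, the following C sketch maps the flag to a transport name. Only the value 2 (startTLS) is stated above; treating 0 as plain LDAP and 1 as LDAPS is an assumption, and the enum and function names are hypothetical, not the names used in the patch.

```c
#include <stddef.h>

/* Hypothetical names. Only "2 means startTLS" comes from the description;
 * 0 = plain and 1 = LDAPS are assumed from the flag's earlier boolean use. */
enum conn_secure {
    CONN_PLAIN = 0,    /* ldap:// with no encryption */
    CONN_LDAPS = 1,    /* ldaps://, SSL from the first byte */
    CONN_STARTTLS = 2  /* ldap:// upgraded with the startTLS operation */
};

/* Return a readable transport name, or NULL for an unknown flag value. */
const char *
conn_secure_name(int secure)
{
    switch (secure) {
    case CONN_PLAIN:    return "ldap";
    case CONN_LDAPS:    return "ldaps";
    case CONN_STARTTLS: return "ldap+starttls";
    default:            return NULL;
    }
}

/* Init does no network traffic, so startTLS has to be remembered
 * and carried out later, when the bind step runs. */
int
conn_needs_starttls_at_bind(int secure)
{
    return secure == CONN_STARTTLS;
}
```

The second helper reflects why the flag has to be stashed somewhere between init and bind: the startTLS exchange is network traffic, so it cannot happen during init.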
I started with replication.  I extended the transport to add tls for startTLS and the bind method to add sasl/gssapi and sasl/digest-md5.  I removed a lot of code from repl5_connection that is now done with just slapi_ldap_init_ext and slapi_ldap_bind.  One tricky part of the replication code is that it polls the connection for write available, using some ldap sdk internals.  I had to fix that code to work within the public ldap api since nspr and sasl muck with the internals in different incompatible ways.
Finally, there is a lot of new kerberos code in the server.  The way the server does sasl/gssapi auth with its keytab is similar to the way it does client cert auth with its ssl server cert.  One big difference is that the server cannot pass the kerberos identity and credentials through the ldap/sasl/gssapi layers directly.  Instead, we have to create a memory credentials cache and set the environment variable to point to it.  This allows the sasl/gssapi layer to grab the credentials for use with kerberos.  The way the code is written, it should also allow "external" kerberos auth e.g. if someone really wants to do some script which does a periodic kinit to refresh the file based cache, that should also work.
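A minimal sketch of the environment-variable step only, assuming the variable in question is KRB5CCNAME and that the cache uses the standard "MEMORY:" ccache type. The function name and the cache name passed to it are hypothetical, and the actual creation of the cache through the kerberos API is not shown.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* setenv() is POSIX, not ISO C */
#endif
#include <stdio.h>
#include <stdlib.h>

/*
 * Point KRB5CCNAME at an in-memory credentials cache so that the
 * SASL/GSSAPI layer can find the credentials. The function name and
 * the cache name are hypothetical. Returns 0 on success, -1 on error.
 */
int
export_memory_ccache(const char *cache_name)
{
    char value[256];
    int n;

    if (cache_name == NULL || *cache_name == '\0') {
        return -1;
    }
    n = snprintf(value, sizeof(value), "MEMORY:%s", cache_name);
    if (n < 0 || (size_t)n >= sizeof(value)) {
        return -1; /* name does not fit in the buffer */
    }
    /* Third argument 1: overwrite any value inherited from the environment. */
    return setenv("KRB5CCNAME", value, 1);
}
```

Because the variable is process-wide, a real server would need to set it before any thread reaches the SASL/GSSAPI code.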
I added some kerberos configure options.  configure tries to first use krb5-config to get the compiler and linker information.  If that fails, it just looks for some standard system libraries.  Note that Solaris does not allow direct use of the kerberos api until Solaris 11, so most likely Solaris builds will have to use --without-kerberos (--with-kerberos is on by default).
Fixed a bug in kerberos.m4 found by nkinder.
ssorce has pointed out a few problems with my kerberos usage that will be addressed in the next patch.
Changed the log level in ldap_sasl_get_val - pointed out by nkinder
Platforms tested: Fedora 9, Fedora 8
Flag Day: yes
Doc impact: oh yes

Comment 6 Rich Megginson 2008-11-05 17:03:26 UTC
Created attachment 322613 [details]
diffs - part 2

Comment 7 Rich Megginson 2008-11-05 18:22:00 UTC
Created attachment 322625 [details]
cvs commit log - part 2

Reviewed by: nhosoi (Thanks!)
Fix Description: This part focuses on chaining backend - allowing the mux server to use SASL to connect to the farm server, and allowing SASL authentication to chain.  I had to add two new config parameters for chaining:
nsUseStartTLS - on or off - tell connection to use startTLS - default is off
nsBindMechanism - if absent, will just use simple auth.  If present, this must be one of the supported mechanisms (EXTERNAL, GSSAPI, DIGEST-MD5) - default is absent (simple bind)
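The nsBindMechanism rule can be sketched as a small check. This is an illustration only: the function name is hypothetical, and exact case-sensitive matching is an assumption, not something the description states.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the nsBindMechanism value is acceptable, 0 otherwise.
 * NULL stands for "attribute absent", which means simple bind.
 * Case-sensitive comparison is an assumption of this sketch.
 */
int
bind_mech_is_valid(const char *mech)
{
    static const char *supported[] = { "EXTERNAL", "GSSAPI", "DIGEST-MD5" };
    size_t i;

    if (mech == NULL) {
        return 1; /* absent: fall back to simple auth */
    }
    for (i = 0; i < sizeof(supported) / sizeof(supported[0]); i++) {
        if (strcmp(mech, supported[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```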
The chaining code uses a timeout, so I had to add a timeout to slapi_ldap_bind, and correct the replication code to pass in a NULL for the timeout parameter.
Fixed a bug in the starttls code in slapi_ldap_init_ext.
The sasl code uses an internal search to find the entry corresponding to the sasl user id.  This search could not be chained due to the way it was coded.  So I added a new chainable component called cn=sasl and changed the sasl internal search code to use this component ID.  This allows the sasl code to work with a chained backend.  In order to use chaining with sasl, this component must be set in the chaining configuration nsActiveChainingComponents.  I also discovered that password policy must be configured too, in order for the sasl code to determine if the account is locked out.
I fixed a bug in the sasl mapping debug trace code.
Still to come - sasl mappings to work with all of this new code - kerberos code improvements - changes to pta and dna
Platforms tested: Fedora 8, Fedora 9
Flag Day: yes
Doc impact: yes

Comment 8 Rich Megginson 2008-11-06 21:14:57 UTC
Created attachment 322788 [details]
kerberos improvements

Comment 9 Rich Megginson 2008-11-07 23:27:18 UTC
Created attachment 322914 [details]
new kerberos improvements

Comment 10 Rich Megginson 2008-11-07 23:43:50 UTC
Created attachment 322915 [details]
diffs - part 3 - dna plugin

Comment 11 Rich Megginson 2008-11-10 16:02:50 UTC
Created attachment 323087 [details]
cvs commit log - part 3 - dna

Reviewed by: nkinder (Thanks!)
Fix Description: Changed the DNA code to use the new slapi_ldap_init/slapi_ldap_bind code.  Also changed the code to get the port number to use from the replication agreement.  Added some more replication internal code knowledge to the DNA code (unfortunately).
Platforms tested: Fedora 9
Flag Day: no
Doc impact: yes

Comment 12 Rich Megginson 2008-11-10 20:21:18 UTC
Created attachment 323115 [details]
diffs - part 4 - pta, winsync

Comment 13 Rich Megginson 2008-11-11 00:01:14 UTC
Created attachment 323136 [details]
cvs commit log - part 4

Reviewed by: nhosoi (Thanks!)
Fix Description: Allow pass through auth (PTA) to use starttls.  PTA uses the old style argv config params, so I just added an optional starttls (0, 1) to the end of the list, since there is currently no way to encode the startTLS extop in the LDAP URL.  NOTE: adding support for true pass through auth for sasl or external cert auth will require a lot of work - not sure it's worth it - anyone other than console users can use chaining backend instead.
For windows sync, I just ported the same slapi_ldap_init/slapi_ldap_bind changes made to regular replication to the windows specific code.  The Windows code still needs the do_simple_bind function to check the windows password, but it is not used for server to server bind anymore.  NOTE: Windows does support startTLS, but I did not test the SASL mechanisms with Windows.
Platforms tested: Fedora 9
Flag Day: no
Doc impact: yes

Comment 14 Rich Megginson 2008-11-11 00:06:01 UTC
Simo, can you review the patch "new kerberos improvements" - https://bugzilla.redhat.com/attachment.cgi?id=322914&action=diff - when you get a chance?  Thanks.

Comment 15 Simo Sorce 2008-11-11 05:14:59 UTC
Looks ok to me.
I am a bit unclear on why you look in the ccache for a principal instead of building it right away. But assuming the ccache principal will be ok anyway it does no harm.

Also in the function that checks if the string "looks" like a dn why don't you just parse and validate the dn to make sure? After all this operation happens during a bind and validating a dn will not cost that much.
While '=' is uncommon in a username or principal name it is technically possible to have one so maybe we should do a bit more proper checking there to avoid false positives.

Comment 16 Rich Megginson 2008-11-11 15:05:36 UTC
(In reply to comment #15)
> Looks ok to me.

Thanks.

> I am a bit unclear on why you look in the ccache for a principal instead of
> building it right away. But assuming the ccache principal will be ok anyway it
> does no harm.

One reason is to verify the ccache actually exists.  If ccache is file based, there is no way to verify the ccache exists and is readable (no direct API for that) so the only way to do it is to actually do something that will read from the ccache.  Since I need the principal anyway to check the credentials for expiration, I read the principal from the ccache.

One thing I want to be able to do is support an openldap-like "external" authentication e.g. the credentials are stored in a file based ccache that is refreshed periodically by a script running kinit.  If I detect that case, I log an error message (but only the first time, not every time to avoid cluttering the logs) and continue, assuming the user Knows What He/She Is Doing.  This means one pre-condition to doing server to server SASL/GSSAPI using the server keytab is that there must be no valid KRB5CCNAME file based ccache.  Since the dirsrv user is not a real user, and you cannot login and do a kinit in a shell as this user, it will be almost impossible for this to happen by accident.

> Also in the function that checks if the string "looks" like a dn why don't you
> just parse and validate the dn to make sure?

There is not an API that will parse and validate a DN.  We have APIs that will normalize a DN so that you can compare and collate DN values, but it does not tell you if the string is a valid DN.

> After all this operation happens
> during a bind and validating a dn will not cost that much.
> While '=' is uncommon in a username or principal name it is technically
> possible to have one so maybe we should do a bit more proper checking there to
> avoid false positives.

How likely is it that a kerberos principal will contain the '=' character?  And note that this check is only done for a user supplied principal name.  A principal name from a ccache or keytab is used directly.

I'm open to suggestions about how to parse and validate a DN.  Of course, if there is a kerberos principal named "cn=Directory Manager" or "cn=ldapadmin" that we must support, then I'm not sure how to handle that.

Comment 17 Simo Sorce 2008-11-12 15:31:36 UTC
(In reply to comment #16)
> (In reply to comment #15)

> One thing I want to be able to do is support an openldap-like "external"
> authentication e.g. the credentials are stored in a file based ccache that is
> refreshed periodically by a script running kinit.  If I detect that case, I log
> an error message (but only the first time, not every time to avoid cluttering
> the logs) and continue, assuming the user Knows What He/She Is Doing.  This
> means one pre-condition to doing server to server SASL/GSSAPI using the server
> keytab is that there must be no valid KRB5CCNAME file based ccache.  Since the
> dirsrv user is not a real user, and you cannot login and do a kinit in a shell
> as this user, it will be almost impossible for this to happen by accident.

You must test the case where an admin with a valid ccache does a /etc/init.d/dirsrv restart, to make sure this works properly.
(Or even just runs ns-slapd manually for debugging or other purposes.)

> There is not an API that will parse and validate a DN.  We have APIs that will
> normalize a DN so that you can compare and collate DN values, but it does not
> tell you if the string is a valid DN.

Yes, I meant the function to normalize the DN; if we have something like foo=bar=baz it should probably fail.

> How likely is it that a kerberos principal will contain the '=' character? And

I hope not too likely

> note that this check is only done for a user supplied principal name.  A
> principal name from a ccache or keytab is used directly.

ah ok.
 
> I'm open to suggestions about how to parse and validate a DN.  Of course, if
> there is a kerberos principal named "cn=Directory Manager" or "cn=ldapadmin"
> that we must support, then I'm not sure how to handle that.

Dunno, maybe it is not that important; I think one or more '=' in a user name is rare after all.

Comment 18 Rich Megginson 2008-11-12 15:48:59 UTC
> Yes, I meant the function to normalize the DN; if we have something like
> foo=bar=baz it should probably fail.

Unfortunately, the function that normalizes DNs will just return the given string if not a DN.  We do not have a function that will definitively return a true/false given a string.  I think looking for '=' is ok in this context:
1) there is no function that will return a definitive true/false
2) it's highly likely a string with a '=' in this context is a DN
3) it's highly likely a string without a '=' in this context is a kerberos principal name
4) we will have an error message logged that username [X] could not authenticate, so the admin will be able to see if GSSAPI was attempted with a DN, or simple bind with a principal name
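
The '=' heuristic argued for above amounts to something like the following sketch. The function name is hypothetical and this is not the code from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Heuristic only: treat any string containing '=' as a DN and anything
 * else as a kerberos principal name. It does not validate DN syntax,
 * so a string like "foo=bar=baz" is still reported as a DN.
 */
int
looks_like_dn(const char *s)
{
    return s != NULL && strchr(s, '=') != NULL;
}
```

With this check, "cn=Directory Manager" is classified as a DN and "ldap/host.example.com@EXAMPLE.COM" (a made-up principal) is not; a principal that really contains '=' would be misclassified, which is the false positive discussed above.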

> you must test the case where an admin with a valid ccache do a
> /etc/init.d/dirsrv restart to make sure this works properly.
> (Or even just runs ns-slapd manually for debugging or other purposes)

I have tested it with the IPA admin user.  I have not actually done all the scripting to do the kinit renewal periodically.  I do not think we will officially test this configuration for RHDS QA.  If we must support server to server SASL/GSSAPI on Solaris, we will have to do something like this, since Solaris (9) does not expose the kerberos API.  I'm not sure if we will. In general, we will only support using the server keytab for server to server GSSAPI auth, unless we find that it is necessary for us to support the external ccache case, for whatever reason.

Other than these, does the actual kerberos code look ok?

Comment 19 Simo Sorce 2008-11-12 16:02:22 UTC
(In reply to comment #18)

> Other than these, does the actual kerberos code look ok?

yep

Comment 20 Rich Megginson 2008-11-12 17:44:09 UTC
Created attachment 323360 [details]
cvs commit log - kerberos improvements

Reviewed by: ssorce (Thanks!)
Fix Description: I made several improvements to the kerberos code at
Simo's suggestion.
First look for the principal in the ccache.  If not found, use the
username if it does not look like a DN.  If still not found, construct a
principal using the krb5_sname_to_principal() function to construct
"ldap/fqdn@REALM".
Next, see if the credentials for this principal are still valid.  In
order to grab the credentials from the ccache, I needed to construct the
server principal, which in this case is the TGS service principal (e.g.
krbtgt/REALM@REALM).  If the credentials are present and not expired,
then the code assumes they are ok and does not acquire new credentials.
If the credentials are expired or not found, the code will then use the
keytab to authenticate.
Based on more feedback from Simo, I made some additional changes:
* Go ahead and reacquire the creds if they have expired or will expire in 30 seconds - this is not configurable but could be made to be - 30 seconds should be long enough so that the credentials will not expire by the time they are actually used deep in the ldap/sasl/gssapi/krb code, and short enough so that this won't cause unnecessary credential churn
* Retry the bind in the case of Ticket expired.  There is no way that I can see to get the actual error code - fortunately the extended ldap error message has this information
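The two additional changes can be sketched roughly as follows. The 30 second margin and the "Ticket expired" text come from the description; the function and macro names are hypothetical, and matching the message by substring is an assumption about how the extended error text would be inspected.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define CRED_EXPIRY_MARGIN_SECS 30 /* fixed value, not configurable */

/*
 * Return 1 if credentials that end at `endtime` have expired or will
 * expire within the margin, measured from `now`; otherwise return 0.
 */
int
creds_need_refresh(time_t endtime, time_t now)
{
    return difftime(endtime, now) <= CRED_EXPIRY_MARGIN_SECS;
}

/*
 * Return 1 if the extended LDAP error message indicates an expired
 * ticket, in which case the bind is worth retrying with fresh creds.
 */
int
bind_should_retry(const char *ldap_errmsg)
{
    return ldap_errmsg != NULL &&
           strstr(ldap_errmsg, "Ticket expired") != NULL;
}
```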
Platforms tested: Fedora 8, Fedora 9
Flag Day: no
Doc impact: oh yes

Comment 21 Rich Megginson 2008-11-21 21:32:37 UTC
Created attachment 324349 [details]
diffs - console replication

Comment 22 Nathan Kinder 2008-11-21 23:32:06 UTC
The Console changes look good.  One minor thing.  The replication-destination-sslEncrypt-ttip property mentions that the LDAPS port number is 636, but that may not be the case if one is using a custom port.  Perhaps we should just say it uses the LDAPS port and not mention a specific port number?

Comment 23 Rich Megginson 2008-11-24 16:10:39 UTC
Created attachment 324495 [details]
cvs commit log - console repl, winsync

Reviewed by: nkinder (Thanks!)
Fix Description: This adds support for starttls, gssapi, and digest to the console for setting up replication agreements.
1) Instead of a checkbox for use ssl, I added 3 radio buttons - no ssl, regular ldaps, starttls - note: active directory supports starttls
2) To the ssl auth and simple auth radio buttons, I added gssapi and digest.  The way the logic works is that gssapi is only allowed when using regular ldap, digest and simple bind are allowed always, ssl auth is only allowed with one of the ssl options.  gssapi allows an empty bind dn and password, but digest and simple require a bind dn and password.  NOTE: we do not support anything other than simple bind with active directory in the GUI
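The allowed combinations can be written down as a small truth-table check. This is a sketch of the rules as described, in C for illustration, not the console's actual code; all type and function names are hypothetical.

```c
#include <stddef.h>

enum transport { T_LDAP, T_LDAPS, T_STARTTLS };
enum bind_method { B_SIMPLE, B_SSL_CLIENT_AUTH, B_GSSAPI, B_DIGEST };

static int
nonempty(const char *s)
{
    return s != NULL && *s != '\0';
}

/* Return 1 if the combination is allowed by the rules above, else 0. */
int
agreement_settings_ok(enum transport t, enum bind_method m,
                      const char *bind_dn, const char *password)
{
    switch (m) {
    case B_GSSAPI:
        /* only over regular LDAP; bind DN and password may be empty */
        return t == T_LDAP;
    case B_SSL_CLIENT_AUTH:
        /* needs one of the SSL options */
        return t == T_LDAPS || t == T_STARTTLS;
    case B_SIMPLE:
    case B_DIGEST:
        /* allowed on any transport, but credentials are required */
        return nonempty(bind_dn) && nonempty(password);
    }
    return 0;
}
```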
I also changed the wording a little bit, and added tool tips (which will hopefully not be too annoying)
I did not add additional checking e.g. the console cannot verify that kerberos is set up properly
Platforms tested: RHEL5
Flag Day: no
Doc impact: oh yes

Comment 24 Rich Megginson 2008-12-01 20:15:49 UTC
Created attachment 325289 [details]
diffs - console chaining/database link

Comment 25 Rich Megginson 2008-12-01 20:28:38 UTC
Created attachment 325290 [details]
diffs - server chaining, ssl client without ssl server, code cleanup

Comment 26 Rich Megginson 2008-12-02 15:31:10 UTC
Created attachment 325381 [details]
cvs commit log - console chaining/database link/server cleanup

Reviewed by: nkinder (Thanks!)
Fix Description: There are two sets of diffs here.  The first set adds tls, gssapi, and digest to the chaining database (aka database link) panels in the console.  I had to add support for revert to some of the code to make the Reset button work without having to retrieve the values from the server each time.  We already store the original values locally in the _origModel - I added code to allow the use of that in the Reset button.
The second set of diffs is for the server.
1) I had to add support for "SIMPLE" for bindMechanism - this translates to LDAP_SASL_SIMPLE for the actual mechanism.  This value is NULL, so I had to add handling for NULL values in the cb config code (slapi_ch_* work fine with NULL values).
2) Added some more debugging/tracing code
3) The server to server SSL code would only work if the server were configured to be an SSL server.  But for the server to be an SSL client, it only needs NSS initialized and to have the CA cert.  It also needs to configure some of the SSL settings and install the correct policy.  I changed the server code to do this.
Platforms tested: RHEL5
Flag Day: no
Doc impact: Yes

Comment 27 Jenny Severance 2009-03-16 16:33:47 UTC
server to server SASL functionality exists and is being tested with automated acceptance testing - GUI is available and looks good.

Comment 28 Chandrasekar Kannan 2009-04-29 23:07:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0455.html