Bug 1059730

Summary: Kerberos does not handle incorrect Active Directory DNS SRV entries correctly
Product: Red Hat Enterprise Linux 6 Reporter: Ivo van Geel <ivo>
Component: krb5 Assignee: Nalin Dahyabhai <nalin>
Status: CLOSED ERRATA QA Contact: Patrik Kis <pkis>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.5 CC: dpal, ivo, jplans, ksrot, nalin, pkis, rmainz
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: krb5-1.10.3-24.el6 Doc Type: Bug Fix
Doc Text:
Cause: When attempting to locate Kerberos servers using DNS service location, the Kerberos client library did not recognize some of the result codes that could be returned by the resolver libraries.
Consequence: In many instances, the library treated some of these non-fatal result codes as fatal errors and failed to locate any servers.
Fix: Patches were added to help ensure that these specific result codes are interpreted properly.
Result: These errors no longer occur.
Story Points: ---
Clone Of:
: 1109102 Environment:
Last Closed: 2014-10-14 08:10:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1061410, 1109102    
Attachments:
Description | Flags
Preprocessed section of sendto_kdc.c, containing translate_ai_error(), without defining _GNU_SOURCE | none
Preprocessed section of sendto_kdc.c, containing translate_ai_error(), with _GNU_SOURCE defined | none
Patch file for krb5.spec, with -D_GNU_SOURCE added to CFLAGS | none

Description Ivo van Geel 2014-01-30 13:47:53 UTC
Created attachment 857485
Preprocessed section of sendto_kdc.c, containing translate_ai_error(), without defining _GNU_SOURCE

Description of problem:

I have been trying to connect Samba to an Active Directory forest (A), which has a trust to another forest (B).
The connection to AD forest A works, but the connection to forest B does not.

Forest B consists of a domain with three AD servers listed in SRV records in DNS (external DNS provided by Infoblox).
However, one of these three AD servers no longer exists.

After debugging the connection between Samba and the cross-forest AD trust, I traced the failure to the Kerberos library (krb5-libs).

The AD servers are contacted through the function k5_sendto(), which uses the function resolve_server() to find an IP address for each AD (Kerberos) server.
resolve_server() calls the C library function getaddrinfo(), which on my Linux server returns EAI_NODATA (-5) for the nonexistent AD server.

To determine whether getaddrinfo() encountered a critical error, the function translate_ai_error() is called; it contains a switch statement with a case for each possible getaddrinfo() return code. When it finds a critical error, translate_ai_error() returns a nonzero system error code. Since EAI_NODATA is not a critical error, it should return 0, so that k5_sendto() can try to contact the next AD/Kerberos server found.
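The classification described above can be sketched in C as follows. This is a hypothetical, simplified function: sketch_translate_ai_error() is not the krb5 source, and the sketch assumes glibc, where EAI_NODATA is only declared when _GNU_SOURCE is defined before the first system header. The #if guard also avoids a duplicate case label on any platform where EAI_NODATA and EAI_NONAME share a value.

```c
#define _GNU_SOURCE   /* glibc only declares EAI_NODATA when this is set */
#include <assert.h>
#include <errno.h>
#include <netdb.h>

/* Hypothetical, simplified counterpart of translate_ai_error(): maps a
 * getaddrinfo() result to 0 ("not fatal, try the next server") or to an
 * errno value ("fatal, stop"). Not the krb5 source. */
static int sketch_translate_ai_error(int err)
{
    switch (err) {
    case 0:                       /* success */
    case EAI_NONAME:              /* the name does not exist */
#if defined(EAI_NODATA) && EAI_NODATA != EAI_NONAME
    case EAI_NODATA:              /* the host exists but has no addresses */
#endif
        return 0;
    case EAI_AGAIN:
        return EAGAIN;            /* temporary resolver failure */
    case EAI_MEMORY:
        return ENOMEM;
    case EAI_BADFLAGS:
    case EAI_FAMILY:
    case EAI_SOCKTYPE:
    case EAI_SERVICE:
        return EINVAL;            /* bad arguments passed to getaddrinfo() */
    default:
        return EINVAL;            /* unrecognized code: treated as fatal */
    }
}

/* Usage: which codes let the caller move on to the next server. */
static void sketch_translate_examples(void)
{
    assert(sketch_translate_ai_error(EAI_NONAME) == 0);
    assert(sketch_translate_ai_error(EAI_MEMORY) == ENOMEM);
#ifdef EAI_NODATA
    assert(sketch_translate_ai_error(EAI_NODATA) == 0);
#endif
}
```

The part of this design that matters for the bug is the default: label. Any code the switch does not recognize at compile time is treated as a hard error at run time.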

However, since the Kerberos library (or at least sendto_kdc.c) is compiled without _GNU_SOURCE defined, EAI_NODATA is not defined. The switch statement in translate_ai_error() therefore hits its default: label and returns EINVAL instead of 0. This in turn causes k5_sendto() to stop trying any other AD/Kerberos servers and fail with an error code.
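The effect of the missing feature-test macro can be reproduced in isolation. The sketch below assumes glibc and deliberately requests only POSIX.1-2001 (_POSIX_C_SOURCE 200112L) without _GNU_SOURCE. classify() and eai_nodata_visible() are hypothetical names, and -5 is glibc's value for EAI_NODATA, as quoted in this report.

```c
/* Deliberately request only POSIX.1-2001 and NOT _GNU_SOURCE, mirroring
 * the build described in the report. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <netdb.h>

/* 1 if this translation unit can see EAI_NODATA, else 0. */
static int eai_nodata_visible(void)
{
#ifdef EAI_NODATA
    return 1;
#else
    return 0;
#endif
}

/* Same shape as the lookup described above: the EAI_NODATA case only
 * exists if the macro is visible at compile time. */
static int classify(int err)
{
    switch (err) {
    case 0:
    case EAI_NONAME:
#ifdef EAI_NODATA
    case EAI_NODATA:
#endif
        return 0;
    default:
        return EINVAL;            /* where -5 ends up without _GNU_SOURCE */
    }
}

static void visibility_examples(void)
{
#ifdef __GLIBC__
    /* glibc hides EAI_NODATA (-5) here, but getaddrinfo() can still
     * return -5 at run time, and the switch then treats it as fatal. */
    assert(eai_nodata_visible() == 0);
    assert(classify(-5) == EINVAL);
#endif
    assert(classify(EAI_NONAME) == 0);
}
```

Because the case label is guarded by #ifdef, the code compiles cleanly either way. This is why the problem only shows up at run time, when a resolver actually returns -5.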

When sendto_kdc.c is compiled with _GNU_SOURCE defined, EAI_NODATA is defined and the switch statement correctly returns 0. This lets k5_sendto() continue with the next AD/Kerberos server, which can be contacted successfully. Samba is then able to contact both AD forests, and everything works.


Version-Release number of selected component (if applicable):

CentOS 6.5:
krb5-libs-1.10.3-10.el6_4.6

How reproducible:

Create an AD forest with a number of AD domain servers, where one (or more) of the AD servers has no A record in DNS but does have the correct SRV records. Then connect Samba to this domain (security=ads) (a cross-forest trust to another domain is probably not necessary) and try to look up a user from the domain using:

$ id 'DOMAIN\user'

Steps to Reproduce:
1. Set up an AD domain with a number of AD domain servers
2. Install a Red Hat Enterprise Linux server with Samba
3. Connect Samba to this domain (security=ads)
4. Configure the Linux server to get user information from winbind and authenticate through Kerberos (using authconfig-tui)
5. Remove the DNS A record for one or more of the AD domain servers, leaving their SRV records in place
6. Try to fetch the information for a domain user, using: id 'DOMAIN\user'

Actual results:

id: DOMAIN\user: No such user

Expected results:

uid=10000000(DOMAIN\user) gid=10000000(DOMAIN\domain users) groups=10000000(DOMAIN\domain users)

Additional info:

I have attached two preprocessed listings of the translate_ai_error() function: one with _GNU_SOURCE defined and one without.

I have also attached a diff for the krb5.spec file that produces a Kerberos library, which I have tested and which works in my environment.

Comment 1 Ivo van Geel 2014-01-30 13:48:26 UTC
Created attachment 857486
Preprocessed section of sendto_kdc.c, containing translate_ai_error(), with _GNU_SOURCE defined

Comment 2 Ivo van Geel 2014-01-30 13:49:21 UTC
Created attachment 857487
Patch file for krb5.spec, with -D_GNU_SOURCE added to CFLAGS

Comment 4 Patrik Kis 2014-03-27 10:56:28 UTC
I'm trying to figure out how to reproduce this issue in a simpler way than with several AD servers. If I understand correctly that this is all about missing DNS records, it might be enough to set up a local DNS server with certain records and just one AD server (or maybe even no AD server). Would it be possible to capture the communication between the host and the server while the issue happens, so that I could reproduce it easily? I do not have the possibility to use more AD servers.
Alternatively, could you test the fix once it is ready?

Comment 6 Ivo van Geel 2014-03-28 09:53:11 UTC
To properly test a fix, you will need two AD servers (a cross-domain trust is not necessary). The first AD server returned from DNS (SRV entry) should not be resolvable (no A record); the second one should work.

The unpatched Kerberos library should fail in this case, because it does not try the other AD servers after the first one.

The patched Kerberos library will continue with the other AD servers and contact the second one, which will return a result.

I will set up a testing environment, where I am able to test the fix. I'll send you the details on how to set it up properly.

Comment 7 Karel Srot 2014-03-28 09:56:36 UTC
Thank you very much.

Comment 11 Ivo van Geel 2014-08-07 08:51:01 UTC
First off: Sorry for the late response, I have been busy on other projects.

For the last few weeks, I have been trying to replicate the bug in a test environment, but have so far been unsuccessful.

I found the bug in our production environment, which contains an Infoblox DNS server. This DNS server responds differently from standard Windows or BIND DNS servers. Most DNS servers reply with "No such name", which getaddrinfo() reports as error code -2 (EAI_NONAME).

The Infoblox DNS server in production, however, responds with "No error", which getaddrinfo() reports as error code -5 (EAI_NODATA). A newly installed Infoblox DNS server (which uses BIND) does not produce EAI_NODATA but EAI_NONAME, which does not trigger the bug I reported here.

I am now trying to configure BIND to respond with "No error", but I don't know how long this will take me.
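To check which code a given resolver produces for the removed server, a small diagnostic like the one below may help. It is a hypothetical helper: probe_host() and the host name dc1.example.com are placeholders, and its hints are not necessarily the ones krb5 uses. The two built-in checks pass AI_NUMERICHOST so that they run without any DNS traffic.

```c
#define _GNU_SOURCE               /* glibc: also declare EAI_NODATA */
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical diagnostic: resolve `host` and print the getaddrinfo()
 * code and message, to see whether a missing domain controller yields
 * EAI_NONAME (-2 on glibc) or EAI_NODATA (-5 on glibc). */
static int probe_host(const char *host, int ai_flags)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */
    hints.ai_flags = ai_flags;

    int err = getaddrinfo(host, NULL, &hints, &res);
    if (err == 0) {
        printf("%s: resolved\n", host);
        freeaddrinfo(res);
    } else {
        printf("%s: %d (%s)\n", host, err, gai_strerror(err));
    }
    return err;
}

static void probe_examples(void)
{
    /* AI_NUMERICHOST keeps these two checks offline. */
    assert(probe_host("127.0.0.1", AI_NUMERICHOST) == 0);
    assert(probe_host("dc1.example.com", AI_NUMERICHOST) == EAI_NONAME);
}
```

Against the production resolver, calling probe_host() with the removed server's name and ai_flags set to 0 should show whether that resolver yields -2 (EAI_NONAME) or -5 (EAI_NODATA) on glibc.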

Comment 12 Patrik Kis 2014-08-08 11:02:15 UTC
(In reply to Ivo van Geel from comment #11)

Thank you for trying to reproduce the issue.
We were able to verify the fix by simulating various return values of getaddrinfo() and checking that Kerberos does not treat EAI_NODATA as a critical error. This simplifies the test scenario, but should be sufficient to verify the bug fix.
Therefore, we no longer need to test the full scenario, although it would still be beneficial, especially for you, to confirm that the problem is solved in your environment. On the other hand, if the problem has disappeared at your site, I'm not sure it is worth investing a lot of time in searching for alternative scenarios.
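The verification approach described above, which simulates getaddrinfo() results instead of building a real multi-server setup, can be sketched as follows. Everything here is hypothetical. classify_gai() is a compact stand-in for translate_ai_error(). fake_resolve() replaces getaddrinfo() with a preset result for the first SRV target. first_reachable() only models the retry loop that the report attributes to k5_sendto().

```c
#define _GNU_SOURCE               /* glibc: declare EAI_NODATA in <netdb.h> */
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <stddef.h>

/* Compact stand-in for translate_ai_error(): 0 means "not fatal, try the
 * next server"; any other value is an errno code and is fatal. */
static int classify_gai(int err)
{
    switch (err) {
    case 0:
    case EAI_NONAME:
#if defined(EAI_NODATA) && EAI_NODATA != EAI_NONAME
    case EAI_NODATA:
#endif
        return 0;
    case EAI_AGAIN:
        return EAGAIN;
    case EAI_MEMORY:
        return ENOMEM;
    default:
        return EINVAL;
    }
}

/* Preset getaddrinfo() result for the first SRV target; every later
 * target resolves successfully. */
static int simulated_first_result;

static int fake_resolve(size_t index)
{
    return index == 0 ? simulated_first_result : 0;
}

/* Models the retry loop: returns the index of the first target that
 * resolves, or -1 if a fatal error ends the search (or none resolve). */
static int first_reachable(size_t ntargets)
{
    for (size_t i = 0; i < ntargets; i++) {
        int err = fake_resolve(i);
        if (err == 0)
            return (int)i;
        if (classify_gai(err) != 0)
            return -1;            /* fatal: no further servers are tried */
    }
    return -1;
}

/* 1 if a failure with `code` on the first target still lets the search
 * reach the second target (two targets, as in comment 6). */
static int survives(int code)
{
    simulated_first_result = code;
    return first_reachable(2) == 1;
}

static void simulation_examples(void)
{
    assert(survives(EAI_NONAME));
#ifdef EAI_NODATA
    assert(survives(EAI_NODATA));   /* the code this report is about */
#endif
    assert(!survives(EAI_MEMORY));  /* genuinely fatal errors still stop */
}
```

Driving the loop with preset codes keeps the check deterministic and independent of how any particular DNS server answers, which was the obstacle in reproducing the production setup.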

Comment 13 errata-xmlrpc 2014-10-14 08:10:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1389.html