Cause: When attempting to locate Kerberos servers using DNS service location, the Kerberos client library did not recognize some of the result codes which could be returned by the resolver libraries.
Consequence: Instead of treating some non-fatal result codes as non-fatal errors, in many instances the library would treat them as fatal errors, and fail to locate any servers.
Fix: Patches were added to help ensure that these specific result codes could be interpreted properly.
Result: These errors no longer occur.
Created attachment 857485 [details]
Preprocessed section of sendto_kdc.c, containing translate_ai_error(), without defining _GNU_SOURCE
Description of problem:
I have been trying to connect Samba to an Active Directory forest (A), which has a trust to another forest (B).
The connection to AD forest A works, but the connection to forest B did not.
Forest B consists of a domain with 3 AD servers present in SRV records in DNS (external DNS provided by Infoblox).
However, one of these three AD servers does not exist anymore.
After debugging the connection between Samba and the cross-forest AD trust, I found the reason for the connection to fail in the Kerberos library (krb5-libs).
The AD servers are contacted through the function k5_sendto(), which calls resolve_server() to find an IP address for each AD (Kerberos) server.
resolve_server() calls the system function getaddrinfo(), which returns EAI_NODATA (-5) for the nonexistent AD server on my Linux server.
To determine whether getaddrinfo() encountered a critical error, the function translate_ai_error() is called, which contains a case statement for each possible return code of getaddrinfo(). When a critical error is found, translate_ai_error() returns a system error code not equal to 0. Since the return code EAI_NODATA is not a critical error, it should return 0, so that k5_sendto() can try to contact the next AD/Kerberos server found.
However, since the Kerberos library (or at least sendto_kdc.c) is compiled without _GNU_SOURCE being defined, EAI_NODATA is not defined, causing the case statement in translate_ai_error() to hit the default: option and return EINVAL instead of 0. This in turn causes k5_sendto() to stop trying to contact any other AD/Kerberos server and fail with an error code.
When sendto_kdc.c is compiled with _GNU_SOURCE defined, EAI_NODATA is defined and the case statement correctly returns 0, which lets k5_sendto() continue with the next AD/Kerberos server, which can be contacted successfully. Samba is then able to contact both AD forests and everything works.
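For illustration, the effect of the feature macro can be sketched as follows. This is a minimal sketch and not the krb5 source: the name sketch_translate_ai_error() is hypothetical, the list of cases is reduced to the ones discussed here, and it assumes glibc, where EAI_NODATA is only visible when _GNU_SOURCE is defined before the first #include.

```c
/* Assumption: glibc. Must precede every #include, otherwise <netdb.h>
 * does not expose EAI_NODATA. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <netdb.h>

/* Hypothetical helper modelled on the description above.
 * Returns 0 for non-fatal lookup results, an errno value otherwise. */
static int
sketch_translate_ai_error(int err)
{
    switch (err) {
    case 0:
        return 0;
#if defined(EAI_NODATA) && EAI_NODATA != EAI_NONAME
    case EAI_NODATA:    /* name exists, but has no address record */
#endif
    case EAI_NONAME:    /* name not known */
        return 0;       /* non-fatal: the caller may try the next server */
    case EAI_AGAIN:
        return EAGAIN;
    case EAI_MEMORY:
        return ENOMEM;
    default:
        return EINVAL;  /* anything unrecognised is treated as fatal */
    }
}
```

Without the #define at the top, the EAI_NODATA case is compiled out on glibc and a return value of -5 falls through to default:, which is the behaviour described in this report.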
Version-Release number of selected component (if applicable):
CentOS 6.5:
krb5-libs-1.10.3-10.el6_4.6
How reproducible:
Create an AD forest with a number of AD domain servers, where 1 (or more) of the AD servers does not have an A record in DNS, but does have the correct SRV records present in DNS. Then connect Samba to this domain (security=ads) (a cross-forest trust to another domain is probably not necessary) and try to list users from the domain using:
$ id 'DOMAIN\user'
Steps to Reproduce:
1. Set up an AD domain with a number of AD domain servers
2. Install a Red Hat Linux server with Samba
3. Connect Samba to this domain (security=ads)
4. Configure the Linux server to get user information from winbind and authenticate through Kerberos (using authconfig-tui)
5. Remove the DNS A record for one or more of the AD domain servers
6. Try to fetch the information for this user, using: id 'DOMAIN\user'
Actual results:
id: DOMAIN\user: No such user
Expected results:
uid=10000000(DOMAIN\user) gid=10000000(DOMAIN\domain users) groups=10000000(DOMAIN\domain users)
Additional info:
I attached two preprocessed listings of the translate_ai_error() function: one with _GNU_SOURCE defined and one without.
I also attached a diff file for the krb5.spec file, which results in a Kerberos library which I tested and works in my environment.
I'm trying to figure out how to reproduce this issue in a simpler way than having several AD servers. If my understanding is correct that this is all about missing DNS records, I think it could work to set up a local DNS server with certain records and just one AD server (or maybe even no AD server). Would it be possible to capture the communication between the host and the server while the issue happens, so that I can reproduce it easily? I do not have the possibility to use more AD servers.
The other option would be for you to test the fix once it is ready.
To properly test a fix, you will need to have 2 AD servers (a cross-domain trust is not necessary). The first AD server, returned from DNS (SRV entry), should not be resolvable (no A record), the second one should work.
The unpatched Kerberos library should fail in this case, because it does not try the other AD servers after the first one fails.
The patched Kerberos library will continue with the other AD servers and contact the second one, which will return a result.
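The expected behaviour of the patched library in this setup can be sketched without any real DNS or AD servers. This is only an illustration: first_resolvable(), fake_resolve() and the host names are made up, and the fallback value -5 for EAI_NODATA is the glibc value quoted in this report.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* assumption: glibc hides EAI_NODATA otherwise */
#endif
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>

#ifndef EAI_NODATA
#define EAI_NODATA (-5)      /* fallback: the glibc value quoted in this report */
#endif

/* Hypothetical resolver signature: returns a getaddrinfo()-style result code. */
typedef int (*resolve_fn)(const char *host);

/* Stand-in resolver for the test setup: the first server has an SRV record
 * but no A record, every other host resolves. */
static int
fake_resolve(const char *host)
{
    if (strcmp(host, "dc1.example.test") == 0)
        return EAI_NODATA;
    return 0;
}

/* Returns the index of the first host that resolves, -1 if none of them
 * resolve, or -2 if the resolver reports anything other than
 * "no name" / "no data". */
static int
first_resolvable(const char *const hosts[], size_t n, resolve_fn resolve)
{
    for (size_t i = 0; i < n; i++) {
        int err = resolve(hosts[i]);
        if (err == 0)
            return (int)i;
        if (err != EAI_NONAME && err != EAI_NODATA)
            return -2;       /* fatal: stop trying */
        /* non-fatal: move on to the next candidate */
    }
    return -1;
}
```

With the two servers from the scenario above, the loop skips the first entry and settles on the second. Treating EAI_NODATA as fatal instead would make it stop at the first entry, which matches the unpatched behaviour.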
I will set up a testing environment, where I am able to test the fix. I'll send you the details on how to set it up properly.
First off: Sorry for the late response, I have been busy on other projects.
For the last few weeks, I have been trying to replicate the bug in a test environment, but have so far been unsuccessful.
I found the bug in our production environment, which contains an Infoblox DNS server. This DNS server responds differently from standard Windows or BIND DNS servers. Most DNS servers reply with "No such name", which getaddrinfo() reports as error code -2 (EAI_NONAME).
The Infoblox DNS server in production, however, responds with "No error", which getaddrinfo() reports as error code -5 (EAI_NODATA). A newly installed Infoblox DNS server (which uses BIND) does not produce EAI_NODATA but EAI_NONAME, which does not trigger the bug I reported here.
I am now trying to configure Bind to respond with "No error", but I don't know how long this will take me.
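To check which code a given DNS setup produces, something like the following could be used. It is a rough sketch: lookup_code() and report_lookup() are hypothetical helper names, and only standard getaddrinfo(), freeaddrinfo() and gai_strerror() calls are used.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `host` and return the raw getaddrinfo()
 * result code (0 on success). `flags` is passed through as ai_flags. */
int
lookup_code(const char *host, int flags)
{
    struct addrinfo hints, *res = NULL;
    int err;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;

    err = getaddrinfo(host, NULL, &hints, &res);
    if (err == 0)
        freeaddrinfo(res);
    return err;
}

/* Print the numeric code together with the resolver's message for it. */
void
report_lookup(const char *host)
{
    int err = lookup_code(host, 0);

    printf("%s: %d (%s)\n", host, err, err ? gai_strerror(err) : "ok");
}
```

Calling report_lookup() with the name of the server that has an SRV record but no A record shows which code the local resolver returns for a particular DNS server.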
(In reply to Ivo van Geel from comment #11)
> First off: Sorry for the late response, I have been busy on other projects.
>
> Last few weeks, I have been trying to replicate the bug on a test
> environment, but have so far been unsuccessful.
>
> I found the bug in our production environment, containing an Infoblox DNS
> server. This DNS server responds differently from standard Windows or Bind
> DNS servers. Most DNS servers reply with "No such name"; error code -2
> (EAI_NONAME).
>
> The Infoblox DNS server in production however, responds with "No error";
> error code -5 (EAI_NODATA). A newly installed Infoblox DNS server (which
> uses Bind) will not respond with EAI_NODATA, but with EAI_NONAME, which does
> not trigger the bug I reported here.
>
> I am now trying to configure Bind to respond with "No error", but I don't
> know how long this will take me.
Thank you for trying to reproduce the issue.
We were able to verify the fix by simulating various return values of getaddrinfo() and checking that Kerberos does not treat EAI_NODATA as a critical error. This simplifies the test scenario, but should be sufficient to verify the bug fix.
Therefore, we no longer really need to test the full scenario, although it would be beneficial, especially for you, to see whether the problem is solved in your environment. On the other hand, if the problem has disappeared at your site, I'm not sure it is worth investing much time in searching for alternative scenarios.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2014-1389.html