Bug 576801

Summary: glibc: Inconsistency of IPv4/IPv6 caching with getaddrinfo / nscd
Product: Red Hat Enterprise Linux 8
Reporter: Ted Rule <ejtr>
Component: glibc
Assignee: glibc team <glibc-bugzilla>
Status: CLOSED UPSTREAM
QA Contact: qe-baseos-tools-bugs
Severity: low
Priority: low
Version: 8.2
CC: anrussel, ashankar, codonell, cww, dj, dkochuka, frimik, fweimer, jan.iven, jaroslaw.polok, law, mnewsome, pandrade, pfrankli, sbroz, thomas.oulevey
Target Milestone: rc
Target Release: 8.2
Hardware: All
OS: Linux
Last Closed: 2019-07-02 15:42:27 UTC
Bug Blocks: 1594286

Description Ted Rule 2010-03-25 09:14:37 UTC
Description of problem:

Whilst trying to improve the caching of some DNS A records, we found that nscd doesn't appear to cache DNS responses containing multiple A records. I eventually found a reference to this behaviour in the source code, but it was not made explicit in the documentation as far as I could see.

Being curious, I wondered if the same lack of caching applied when a DNS response included one A record and potentially multiple AAAA records - which might be a more likely scenario in a future dual-stack universe.

After running several tests, I came to the conclusion that it all depends on how you ask the question.

If one queries for both A and AAAA records at the same time using getaddrinfo(AF_UNSPEC), then all the answers are apparently always cached, even if multiple A or AAAA records exist.

If one queries for JUST A records using getaddrinfo(AF_INET), or JUST AAAA records using getaddrinfo(AF_INET6), then the answers are not cached if multiple records are returned.

In the source code, I think I can see that nscd's getaddrinfo() hook actually calls gethostbyname() or gethostbyname2() instead when called with AF_INET or AF_INET6 as the preferred address family, and the gethostbyname/gethostbyname2 code tree employs a different caching policy.
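The three ways of asking the question can be sketched as follows. This is only an illustration, not part of the original report: it assumes Python's socket.getaddrinfo(), which wraps the C library call, and the helper name resolve() is hypothetical. Whether a given lookup is actually served from nscd depends on the local setup.

```python
import socket

# The three address-family hints discussed above.
FAMILIES = {
    "AF_UNSPEC": socket.AF_UNSPEC,  # ask for A and AAAA records together
    "AF_INET": socket.AF_INET,      # ask for A records only
    "AF_INET6": socket.AF_INET6,    # ask for AAAA records only
}


def resolve(host, family):
    """Return the sorted, de-duplicated addresses getaddrinfo() yields for one family."""
    infos = socket.getaddrinfo(host, None, family=family, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Calling resolve(name, family) once per entry in FAMILIES while watching DNS traffic from the client should show which of the three forms causes a fresh query each time.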

In the ideal world, I feel, nscd should be able to cache any A and AAAA records irrespective of how the question is made, AND it ought to emulate/preserve the DNS Round Robin feature so that a balancing functionality can be maintained.
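To make the round-robin idea concrete, here is a toy sketch of a cache that keeps every address for a name and rotates the list on each hit. It is not how nscd works; the class RoundRobinCache and its methods are hypothetical, and TTL handling is deliberately left out.

```python
from collections import deque


class RoundRobinCache:
    """Toy cache: stores all addresses for a name and rotates them on each hit."""

    def __init__(self):
        self._entries = {}

    def store(self, name, addresses):
        # Keep the full record set, in the order the resolver returned it.
        self._entries[name] = deque(addresses)

    def lookup(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # cache miss: the caller would query DNS and call store()
        answer = list(entry)
        entry.rotate(-1)  # the next hit starts with the following address
        return answer
```

Successive lookups then hand out the same record set with a different first address each time, which is the property clients relying on DNS round robin depend on.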

If DNSRR capability is difficult to preserve, a sysadmin might prefer to use lwresd, but I don't believe one can hook into the lwres libraries via nsswitch.conf, which is an awful shame. Can this be fixed?

If a proper cache-always+DNSRR fix is too tricky for the present, it might be more consistent in the meantime to get nscd to NOT cache a getaddrinfo(AF_UNSPEC) response if there are either multiple A records OR multiple AAAA records (but DO cache if there is one A record and one AAAA record).
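The interim policy suggested above can be written as a small predicate. This is a sketch of the proposal only, not existing nscd behaviour; the function name should_cache() is hypothetical, and it assumes the response has already been reduced to a list of address strings.

```python
import ipaddress


def should_cache(addresses):
    """Interim policy: cache only if there is at most one A and at most one AAAA record."""
    parsed = [ipaddress.ip_address(addr) for addr in addresses]
    ipv4_count = sum(1 for addr in parsed if addr.version == 4)
    ipv6_count = sum(1 for addr in parsed if addr.version == 6)
    return ipv4_count <= 1 and ipv6_count <= 1
```

With this rule a name with one A and one AAAA record is cached, while a name with several records of either type is always passed through to DNS, so round-robin ordering is preserved regardless of the address family requested.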


Version-Release number of selected component (if applicable):

nscd-2.5-42.el5_4.3


How reproducible:

Use ssh against a DNS name which has multiple A and AAAA records, but where none of the hosts referenced are running sshd.

At the same time, monitor DNS traffic from the client host, with nscd running.

"ssh hostname" will use getaddrinfo(AF_UNSPEC) to ask for all A and AAAA records, "ssh -4 hostname" will use getaddrinfo(AF_INET) to ask for just A records, and "ssh -6 hostname" will use getaddrinfo(AF_INET6) to ask for just AAAA records.

Probing nscd/DNS with the different ssh requests will show how some DNS responses are cached, and some not.

The additional test is to use the same "ssh" calls against hostnames corresponding to one A record and one AAAA record. This should result in the answers always being cached.

Comment 3 Jeff Law 2013-03-27 18:59:52 UTC
We're not going to try to address this in Red Hat Enterprise Linux 5; however, Red Hat Enterprise Linux 6 still needs to be evaluated to see if it suffers from this problem.

Comment 5 Mikael Fridh 2013-08-14 11:20:47 UTC
Present in EL6 (.4) as far as I can see. 

Fix coming in 6.5?

Comment 6 Mikael Fridh 2013-08-20 14:55:39 UTC
Code comment discussing this "special case":

http://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/hstcache.c;h=0d421fcbbb5e8823b660973e08b73e15e0dac3c8;hb=HEAD#l240

This is hardly a special case anymore... Upstream bug filed:

http://sourceware.org/bugzilla/show_bug.cgi?id=15862

Comment 7 Carlos O'Donell 2013-08-23 18:20:08 UTC
(In reply to Mikael Fridh from comment #5)
> Present in EL6 (.4) as far as I can see. 
> 
> Fix coming in 6.5?

No fix is planned for rhel-6.5.

I will keep the status updated on this ticket as we make progress.

Comment 11 Carlos O'Donell 2015-01-07 03:08:35 UTC
While we have made progress on the upstream issue, and Alex has done some good work there, we still need to run this kind of change through significant upstream testing. Putting it into RHEL 6 is not a good plan in my opinion. Therefore I'm moving this bug directly to RHEL 7.2 for analysis and possible inclusion there.

Comment 19 Florian Weimer 2018-04-24 15:16:23 UTC
*** Bug 1568149 has been marked as a duplicate of this bug. ***

Comment 22 Thomas Oulevey 2019-06-17 11:04:29 UTC
Is there any news for RHEL 7.7? 7.2 is long past ;-)

Comment 23 Carlos O'Donell 2019-06-25 16:50:57 UTC
Red Hat Enterprise Linux 7 is entering Maintenance Phase Support 1 this year and only Urgent priority bug fixes will be considered. This issue is not urgent and we are moving this to Red Hat Enterprise Linux 8 for further consideration.

Comment 24 Florian Weimer 2019-07-02 15:42:27 UTC
This needs to be fixed upstream first.

Our long-standing recommendation is to use a local caching resolver on the host in addition to nscd, and not cache hosts in nscd at all.
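As a rough aid for following that recommendation, the sketch below checks whether host caching is switched off in an nscd configuration. It assumes the usual "enable-cache <service> <yes|no>" line format of /etc/nscd.conf; the function name hosts_cache_enabled() is hypothetical.

```python
def hosts_cache_enabled(conf_text):
    """Return True/False for the last 'enable-cache hosts' line, or None if absent."""
    result = None
    for line in conf_text.splitlines():
        # Drop trailing comments, then split the directive into fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] == "enable-cache" and fields[1] == "hosts":
            result = fields[2] == "yes"
    return result
```

For example, hosts_cache_enabled(open("/etc/nscd.conf").read()) should return False on a host configured with "enable-cache hosts no".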