Description of problem:

Whilst trying to improve the caching of some DNS A records, we found that nscd doesn't appear to cache DNS responses containing multiple A records. I eventually found a reference to this in the source code, but as far as I could see it is not made explicit in the documentation.

Being curious, I wondered whether the same lack of caching applies when a DNS response includes one A record and potentially multiple AAAA records, which might be a more likely scenario in a future dual-stack universe. After running several tests, I came to the conclusion that it all depends on how you ask the question:

- If one queries for both A and AAAA records at the same time using getaddrinfo(AF_UNSPEC), then all the answers are apparently always cached, even if multiple A or AAAA records exist.
- If one queries for JUST A records using getaddrinfo(AF_INET), or JUST AAAA records using getaddrinfo(AF_INET6), then the answers are not cached if multiple records are returned.

In the source code, I think I can see that nscd's getaddrinfo() hook actually calls gethostbyname() or gethostbyname2() instead when called with AF_INET or AF_INET6 as the preferred address family, and the gethostbyname/gethostbyname2 code tree employs a different caching policy.

In an ideal world, I feel, nscd should be able to cache any A and AAAA records irrespective of how the question is asked, AND it ought to emulate/preserve the DNS Round Robin (DNSRR) feature so that load balancing can be maintained. If DNSRR capability is difficult to preserve, a sysadmin might prefer to use lwresd, but I don't believe one can hook into the lwres libraries via nsswitch.conf, which is an awful shame.

Can this be fixed? If a proper cache-always+DNSRR fix is too tricky for the present, it might be more consistent in the meantime to get nscd to NOT cache a getaddrinfo(AF_UNSPEC) response if there are either multiple A records OR multiple AAAA records (but DO cache if there is one A record and one AAAA record).

Version-Release number of selected component (if applicable):

nscd-2.5-42.el5_4.3

How reproducible:

Use ssh against a DNS name which has multiple A and AAAA records, but where none of the hosts referenced are running sshd. At the same time, monitor DNS traffic from the client host, with nscd running.

- "ssh hostname" will use getaddrinfo(AF_UNSPEC) to ask for all A and AAAA records.
- "ssh -4 hostname" will use getaddrinfo(AF_INET) to ask for just A records.
- "ssh -6 hostname" will use getaddrinfo(AF_INET6) to ask for just AAAA records.

Probing nscd/DNS with the different ssh requests will show that some DNS responses are cached and some are not.

The additional test is to use the same "ssh" calls against hostnames corresponding to one A record and one AAAA record. This should result in the answers always being cached.
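The three ssh invocations above differ only in the address family passed to getaddrinfo(), so the same probing can be sketched without ssh. The following is a minimal sketch, not part of the original report: it assumes a Python interpreter whose socket.getaddrinfo() goes through the C library's getaddrinfo() (and therefore through nscd when it is running), and the helper names lookup and compare are made up for illustration.

```python
import socket

# Address families corresponding to "ssh", "ssh -4" and "ssh -6" in the report.
FAMILIES = {
    "AF_UNSPEC": socket.AF_UNSPEC,
    "AF_INET": socket.AF_INET,
    "AF_INET6": socket.AF_INET6,
}

def lookup(host, family=socket.AF_UNSPEC):
    """Return the sorted, de-duplicated addresses getaddrinfo() yields for host."""
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # No usable record for this family, or the name did not resolve.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})

def compare(host):
    """Resolve host once per address family and return the results by family name."""
    return {name: lookup(host, family) for name, family in FAMILIES.items()}
```

Calling compare("hostname") twice in a row while watching DNS traffic from the client (for example with a packet capture) should show which of the three query styles is answered from the nscd cache the second time and which one goes back out to the DNS server.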
We're not going to try to address this in Red Hat Enterprise Linux 5; however, Red Hat Enterprise Linux 6 still needs to be evaluated to see if it suffers from this problem.
Present in EL6 (.4) as far as I can see. Fix coming in 6.5?
Code comment discussing this "special case":
http://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/hstcache.c;h=0d421fcbbb5e8823b660973e08b73e15e0dac3c8;hb=HEAD#l240

This is hardly a special case anymore...

Upstream bug filed:
http://sourceware.org/bugzilla/show_bug.cgi?id=15862
(In reply to Mikael Fridh from comment #5)
> Present in EL6 (.4) as far as I can see.
>
> Fix coming in 6.5?

No fix is planned for rhel-6.5. I will keep the status updated on this ticket as we make progress.
While we have made progress on the upstream issue, and Alex has done some good work there, we still need to run this kind of change through significant upstream testing. Putting it into RHEL 6 is not a good plan in my opinion. Therefore I'm moving this bug directly to RHEL 7.2 for analysis and possible inclusion there.
*** Bug 1568149 has been marked as a duplicate of this bug. ***
Is there any news for RHEL 7.7? 7.2 is long past ;-)
Red Hat Enterprise Linux 7 is entering Maintenance Phase Support 1 this year and only Urgent priority bug fixes will be considered. This issue is not urgent and we are moving this to Red Hat Enterprise Linux 8 for further consideration.
This needs to be fixed upstream first. Our long-standing recommendation is to use a local caching resolver on the host in addition to nscd, and not cache hosts in nscd at all.
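For the nscd half of that recommendation, a minimal sketch of the relevant setting, assuming the stock /etc/nscd.conf layout. The choice and configuration of the local caching resolver is left open here, since the comment above does not name one.

```
# /etc/nscd.conf
# Stop nscd from caching host lookups; a local caching resolver handles them instead.
enable-cache    hosts    no
```

nscd has to be restarted for the change to take effect.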