Description of problem:

This isn't really a bug in sssd, just an implication (or a limitation, if you like) of using sssd: sssd is not supposed to be used with nscd (doing so generates a warning in the sssd logs).

When using NetworkManager (especially in a corporate setting with Spanning Tree and no portfast), it takes a while for the interface to come up. In the meantime services will start. These services read an incorrect or empty resolv.conf, which they never seem to reread.

Formerly (on older Fedora and RHEL), starting nscd would make name resolution in those applications correct again, as it sits between the app and the base libc resolver functions. Any network changes can be handled by nscd dynamically for all running programs.

The services that this really breaks for us here are the NFS ones:

Jul 9 15:21:33 localhost rpc.statd[1067]: No canonical hostname found for 10.110.45.10
Jul 9 15:21:33 localhost rpc.statd[1067]: STAT_FAIL to navar for SM_MON of 10.110.45.10
Jul 9 15:21:33 localhost kernel: lockd: cannot monitor tay
Jul 9 15:21:33 localhost rpc.statd[1067]: No canonical hostname found for 10.110.45.10
Jul 9 15:21:33 localhost rpc.statd[1067]: STAT_FAIL to navar for SM_MON of 10.110.45.10
etc.

So NFS locking is broken on this machine (unless the service is manually started).

The workaround I'm trying is to only enable caching for hosts in /etc/nscd.conf.

I guess this may be fixed once ticket #357 "SSSD should replace NSCD" is resolved upstream: https://fedorahosted.org/sssd/ticket/357
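For anyone wanting to try the same workaround, a hosts-only setup might look roughly like the fragment below. This is a sketch, not the reporter's actual file: it assumes the stock /etc/nscd.conf layout and shows only the passwd, group and hosts maps. Any other maps your nscd version supports would need the same treatment.

```
# /etc/nscd.conf (fragment, illustrative only)
# Leave user and group lookups to sssd, which keeps its own cache.
enable-cache    passwd  no
enable-cache    group   no
# Keep nscd in front of the libc resolver for host lookups.
enable-cache    hosts   yes
```

nscd has to be restarted after the edit for the change to take effect.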
I am not sure that it will eliminate all the cases where nscd would have to be considered. We will still look into replacing it in the long run, as mentioned in https://fedorahosted.org/sssd/ticket/357 . Thanks for the workaround.
Thanks. I look forward to the day when we have no more nscd.
(In reply to comment #2)
> Thanks.
>
> I look forward to the day when we have no more nscd

External contributions are always welcome :-)
This is not a bug in SSSD. If there are services that do not reread /etc/resolv.conf when it changes, that is a bug in the service, or in libc.

While I personally believe that this is a bug in glibc, the official word is that individual applications and services are required to call the glibc res_init() function when /etc/resolv.conf changes, in order to reread it.

The mitigation that nscd provides for this problem is actually not reliable. nscd is merely holding on to the cache it had when the system was shut down, and there is no guarantee that this matches the correct resolv.conf anyway (if, for example, a laptop has been moved onto a new network).
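To illustrate the "reread when it changes" pattern described above, here is a minimal Python sketch. It is only an illustration: a real C service would call res_init() at the point where this code reparses the file, and the names ResolvConfWatcher and read_nameservers are hypothetical helpers, not part of any library.

```python
import os


def read_nameservers(path="/etc/resolv.conf"):
    """Return the nameserver addresses listed in a resolv.conf-style file."""
    servers = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # resolv.conf comments start with '#' or ';'.
            if not line or line[0] in "#;":
                continue
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "nameserver":
                servers.append(fields[1])
    return servers


class ResolvConfWatcher:
    """Reload resolver settings only when the file has changed on disk."""

    def __init__(self, path="/etc/resolv.conf"):
        self.path = path
        self._stamp = None      # (mtime, size, inode) seen at last reload
        self.nameservers = []

    def _current_stamp(self):
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return None
        return (st.st_mtime_ns, st.st_size, st.st_ino)

    def refresh(self):
        """Reparse the file if it changed; return True when a reload happened."""
        stamp = self._current_stamp()
        if stamp == self._stamp:
            return False
        self._stamp = stamp
        # A C daemon would call res_init() here instead of parsing by hand.
        self.nameservers = read_nameservers(self.path) if stamp else []
        return True
```

A long-running service would call refresh() before each lookup (or on a timer), so that a resolv.conf written after the service started is eventually picked up.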
I agree; however, it would seem reasonable for something to sit between the applications and the bare resolver because:

A/ It's the only thing that would need to be informed when the network setup changes. Getting all the apps to monitor resolv.conf seems unlikely and unscalable.
B/ Different processes are likely to be resolving the same hostnames (e.g. the same servers).
C/ The standard resolver is very poor at handling unresponsive DNS servers, for example when the first one goes offline or when a laptop is unplugged from the network.

Given these, nscd is the only option that gets close to this just now ("A" especially), and it makes RHEL/Fedora usable on a corporate laptop (by getting NetworkManager to make nscd reread resolv.conf and flush the hosts cache).
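One way the NetworkManager hook mentioned above could be wired up is a dispatcher script that invalidates nscd's hosts cache on network events. The following is a rough sketch under assumptions, not the setup actually used here: it assumes the script is installed as an executable under /etc/NetworkManager/dispatcher.d/, that NetworkManager passes the interface and the action as the first two arguments, and that the chosen set of actions (FLUSH_ACTIONS, a hypothetical name) is the right one for your site.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Events after which cached host entries may be stale (an assumed selection).
FLUSH_ACTIONS = {"up", "down", "vpn-up", "vpn-down", "dhcp4-change", "dhcp6-change"}


def flush_command(action):
    """Return the nscd command to run for this action, or None to do nothing."""
    if action in FLUSH_ACTIONS:
        # 'nscd -i hosts' invalidates the hosts cache of the running daemon.
        return ["nscd", "-i", "hosts"]
    return None


def main(argv):
    # NetworkManager invokes dispatcher scripts as: script <interface> <action>
    if len(argv) < 3:
        return 0
    cmd = flush_command(argv[2])
    if cmd is None:
        return 0
    return subprocess.call(cmd)


if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```

Keeping the decision in flush_command() separate from the subprocess call makes it easy to adjust which events trigger a flush without touching the rest of the script.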
There are plans for SSSD to eventually support the host map in nsswitch (https://fedorahosted.org/sssd/ticket/359) but it's not high on our priority list. Another alternative you might consider would be installing dnsmasq on these systems.