Bug 613085

Summary: Having no nscd can break early starting services
Product: [Fedora] Fedora
Reporter: Colin.Simpson
Component: sssd
Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 13
CC: dpal, jhrozek, sbose, sgallagh, ssorce
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 613096
Last Closed: 2010-11-09 16:49:45 UTC
Bug Blocks: 613096    

Description Colin.Simpson 2010-07-09 16:31:43 UTC
Description of problem:

This isn't really a bug in sssd, just an implication of using sssd (or a limitation, if you like).

sssd is not supposed to be used with nscd (doing so generates a warning in the sssd logs).

When NetworkManager is in use (especially in a corporate setting where Spanning Tree is enabled without PortFast), the interface takes a while to come up. In the meantime, services start and read an incorrect or empty resolv.conf, which they never seem to reread.

Formerly (on older Fedora and RHEL), starting nscd would make name resolution in these applications correct again, because nscd sits between the application and the base libc resolver functions. nscd can then handle any network changes dynamically for all running programs.

The services that this really breaks for us here are the nfs ones:

Jul  9 15:21:33 localhost rpc.statd[1067]: No canonical hostname found for 10.110.45.10
Jul  9 15:21:33 localhost rpc.statd[1067]: STAT_FAIL to navar for SM_MON of 10.110.45.10
Jul  9 15:21:33 localhost kernel: lockd: cannot monitor tay
Jul  9 15:21:33 localhost rpc.statd[1067]: No canonical hostname found for 10.110.45.10
Jul  9 15:21:33 localhost rpc.statd[1067]: STAT_FAIL to navar for SM_MON of 10.110.45.10
etc

So NFS locking is broken on this machine (unless the service is manually started).

The workaround I'm trying is to enable caching only for hosts in /etc/nscd.conf.
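
As a rough illustration of that workaround, the relevant lines in /etc/nscd.conf might look like the fragment below. This is a sketch, not the reporter's actual file: it assumes the stock enable-cache directive and shows only the passwd, group and hosts databases; any other databases listed in the local file would need the same treatment.

```
# /etc/nscd.conf (fragment, illustrative only)
# Leave user and group lookups to sssd, which has its own cache.
enable-cache    passwd    no
enable-cache    group     no
# Keep nscd in the path for host lookups only.
enable-cache    hosts     yes
```

nscd has to be restarted after the file is edited for the change to take effect.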

I guess this may be fixed when ticket #357, "SSSD should replace NSCD", is implemented upstream:

https://fedorahosted.org/sssd/ticket/357

Comment 1 Dmitri Pal 2010-07-09 17:06:28 UTC
I am not sure that it will eliminate all the cases in which nscd would have to be considered. We will still look into replacing it in the long run, as mentioned in https://fedorahosted.org/sssd/ticket/357 .

Thanks for the workaround.

Comment 2 Colin.Simpson 2010-07-09 17:25:11 UTC
Thanks. 

I look forward to the day when we have no more nscd

Comment 3 Dmitri Pal 2010-07-09 17:33:39 UTC
(In reply to comment #2)
> Thanks. 
> 
> I look forward to the day when we have no more nscd    

External contributions are always welcome :-)

Comment 4 Stephen Gallagher 2010-11-09 16:49:45 UTC
This is not a bug in SSSD. If there are services that are not rereading /etc/resolv.conf when it changes, this is a bug in that service, or libc.

While I personally believe that this is a bug in glibc, the official word is that individual applications and services are required to call the glibc res_init() function whenever /etc/resolv.conf changes, in order to reread it.

The fact that nscd mitigates this problem is actually not reliable. NSCD is merely holding on to the cache it had when the system was shut down, but there is in fact no guarantee that this is the correct resolv.conf anyway (if, for example, a laptop has been moved onto a new network).
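
For readers unfamiliar with the res_init() approach described in this comment, here is a minimal sketch in C of how a long-running service might apply it. It is not taken from any of the services discussed in this bug: the helper name reload_resolver_if_changed and the mtime-polling strategy are assumptions made for illustration, and a real daemon might prefer a file-change notification mechanism instead.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <time.h>

/*
 * Hypothetical helper: re-run res_init() if the watched file's
 * modification time differs from the one recorded in *last_mtime.
 *
 * Returns 1 if a change was seen and res_init() succeeded, 0 if
 * nothing changed, and -1 if stat() or res_init() failed.
 *
 * Note: res_init() always rereads the system resolver configuration
 * (normally /etc/resolv.conf); watch_path only selects which file is
 * checked for changes.
 */
int reload_resolver_if_changed(const char *watch_path, time_t *last_mtime)
{
    struct stat st;

    assert(watch_path != NULL && last_mtime != NULL);

    if (stat(watch_path, &st) != 0)
        return -1;              /* file missing or not accessible */

    if (st.st_mtime == *last_mtime)
        return 0;               /* unchanged since the last check */

    if (res_init() != 0)
        return -1;              /* resolver could not be reinitialised */

    *last_mtime = st.st_mtime;  /* remember the version now loaded */
    return 1;
}
```

A service would keep one time_t per process, initialise it to a value such as (time_t)-1, and call reload_resolver_if_changed("/etc/resolv.conf", &last) before each batch of lookups. Checking only the modification time in whole seconds is a simplification: two rewrites within the same second would be missed.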

Comment 5 Colin.Simpson 2010-11-11 14:57:20 UTC
I agree. However, it would seem reasonable for something to sit between the applications and the bare resolver because:

A/ It's the only thing that would need to be informed when the network setup changes. Getting every application to monitor resolv.conf seems unlikely and unscalable.

B/ Different processes are likely to be resolving the same hostnames (e.g. the same servers).

C/ The standard resolver is very poor at handling unresponsive DNS servers, for example when the first server goes offline or when a laptop is unplugged from the network.

Given these points, nscd is the only option that gets close to this just now ("A" especially), and it makes RHEL/Fedora usable on a corporate laptop (by having NetworkManager get nscd to reread resolv.conf and flush the hosts cache).

Comment 6 Stephen Gallagher 2010-11-11 15:02:02 UTC
There are plans for SSSD to eventually support the hosts map in nsswitch (https://fedorahosted.org/sssd/ticket/359), but it's not high on our priority list.

Another alternative you might consider would be installing dnsmasq on these systems.