Bug 1101782

Summary: autofs configured with sssd is not finding any maps
Product: Red Hat Enterprise Linux 7
Component: autofs
Version: 7.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Jacob Hunt <jhunt>
Assignee: Ian Kent <ikent>
QA Contact: xiaoli feng <xifeng>
Docs Contact: Milan Navratil <mnavrati>
CC: arusso, dpal, dwysocha, eguan, grajaiya, ikent, jgalipea, jhrozek, jhunt, lslebodn, miturria, mkosek, pbrezina, rvdwees, sbose, yoguma
Whiteboard: sync-to-jira
Fixed In Version: autofs-5.0.7-63.el7
Doc Type: Bug Fix
Doc Text:
Setting the retry timeout can now prevent *autofs* from starting without mounts from SSSD

When starting the *autofs* utility, the `sss` map source was previously sometimes not ready to provide map information, but `sss` did not return an appropriate error to distinguish between the `map does not exist` and `not available` conditions. As a consequence, automounting did not work correctly, and *autofs* started without mounts from SSSD. To fix this bug, *autofs* retries asking SSSD for the master map when the `map does not exist` error occurs, for a configurable amount of time. Now, you can set the retry timeout to a suitable value so that the master map is read and *autofs* starts as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 12:43:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1113639, 1382602, 1892184    
Bug Blocks: 1044717, 1113520, 1203710, 1295577, 1313485, 1385242    
Attachments:
  Patch - work around sss startup delay (flags: none)
  Patch - add sss master map wait config option (flags: none)
  Patch - fix work around sss startup delay (flags: none)

Description Jacob Hunt 2014-05-27 22:42:40 UTC
Description of problem:

autofs is configured with sssd. We are having trouble with autofs not mounting filesystems after the first reboot when the system is built: the automounter is not finding any maps from sssd.

Version-Release number of selected component (if applicable):

sssd-1.11.2-65.el7.x86_64
sssd-ad-1.11.2-65.el7.x86_64
sssd-client-1.11.2-65.el7.x86_64
sssd-common-1.11.2-65.el7.x86_64
sssd-common-pac-1.11.2-65.el7.x86_64
sssd-ipa-1.11.2-65.el7.x86_64
sssd-krb5-1.11.2-65.el7.x86_64
sssd-krb5-common-1.11.2-65.el7.x86_64
sssd-ldap-1.11.2-65.el7.x86_64
sssd-proxy-1.11.2-65.el7.x86_64


Additional info:

It appears that the autofs and sssd services are coming up before NetworkManager has finished bringing up the interface. If you look at the messages file, you will see that the link doesn't come up until about 10-15 seconds after sssd and autofs start. The sssd service seems to retry, but autofs doesn't.
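
One way to confirm that ordering (assuming systemd's journalctl, as on RHEL 7) is to interleave the messages of the three units for the current boot:

  # show this boot's log for the three services, ordered by time
  journalctl -b -u NetworkManager -u sssd -u autofs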

Switching the line in /usr/lib/systemd/system/sssd.service to:

After=syslog.target network.target NetworkManager-wait-online.service

Seems to have resolved the issue.
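
A cleaner way to apply the same workaround, instead of editing the packaged unit under /usr/lib, is a systemd drop-in (a sketch; the drop-in file name is arbitrary, and After=/Wants= lines in drop-ins add to the unit's existing dependencies):

  # /etc/systemd/system/sssd.service.d/wait-online.conf
  [Unit]
  Wants=NetworkManager-wait-online.service
  After=NetworkManager-wait-online.service

After creating the file, run `systemctl daemon-reload` and make sure NetworkManager-wait-online.service is enabled so there is something to wait for.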

Comment 1 Jakub Hrozek 2014-05-29 10:04:57 UTC
From the logs it appears that even if SSSD is able to detect configuration has changed with its netlink integration, automounter only asks for the maps once after startup.

Startup is only part of the problem, the same would happen for a machine that would change networks, flaky connection etc.

Ian, what about a dispatcher script for NM that would SIGHUP automounter when networking conditions change?
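
(Something like the sketch below, say; the path and name are hypothetical, and NetworkManager invokes dispatcher scripts with the interface as $1 and the action as $2.)

  #!/bin/sh
  # hypothetical /etc/NetworkManager/dispatcher.d/50-autofs-reload
  # When an interface comes up, ask the automounter to re-read its maps.
  if [ "$2" = "up" ]; then
      systemctl kill -s HUP autofs.service
  fi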

Comment 2 Ian Kent 2014-05-30 03:13:34 UTC
(In reply to Jakub Hrozek from comment #1)
> From the logs it appears that even if SSSD is able to detect configuration
> has changed with its netlink integration, automounter only asks for the maps
> once after startup.
> 
> Startup is only part of the problem, the same would happen for a machine
> that would change networks, flaky connection etc.
> 
> Ian, what about a dispatcher script for NM that would SIGHUP automounter
> when networking conditions change?

Maybe but ...

At one point I was working on some patches to make autofs wait
until the master map was available at start up. Map re-loads
are supposed to ignore fails and use the existing map.

I shelved that because I had what I thought worked but the user
claimed it didn't and I was unable to get sufficient info to
continue.

What are your thoughts on me continuing with that?

Ian

Comment 3 Jakub Hrozek 2014-06-04 07:42:36 UTC
(In reply to Ian Kent from comment #2)
> (In reply to Jakub Hrozek from comment #1)
> > From the logs it appears that even if SSSD is able to detect configuration
> > has changed with its netlink integration, automounter only asks for the maps
> > once after startup.
> > 
> > Startup is only part of the problem, the same would happen for a machine
> > that would change networks, flaky connection etc.
> > 
> > Ian, what about a dispatcher script for NM that would SIGHUP automounter
> > when networking conditions change?
> 
> Maybe but ...
> 
> At one point I was working on some patches to make autofs wait
> until the master map was available at start up. Map re-loads
> are supposed to ignore fails and use the existing map.
> 
> I shelved that because I had what I thought worked but the user
> claimed it didn't and I was unable to get sufficient info to
> continue.
> 
> What are your thoughts on me continuing with that?
> 
> Ian

Interesting idea!

Would autofs periodically poll until it gets some form of authoritative response?

How would you deal with the situation where SSSD is configured to serve automounter maps but no maps are actually present on the LDAP side? Could we simply differentiate between 'search completed but 0 results' and 'could not complete search'?
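
One way to inspect what the automounter actually receives from its configured map sources (assuming autofs 5.x, where automount can dump its maps) is:

  automount --dumpmaps

This makes the 'search completed but 0 results' case directly visible.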

Please note I will be away until Jun 15th, so my answers will be delayed.

Comment 4 Ian Kent 2014-06-05 09:52:57 UTC
(In reply to Jakub Hrozek from comment #3)
> (In reply to Ian Kent from comment #2)
> > (In reply to Jakub Hrozek from comment #1)
> > > From the logs it appears that even if SSSD is able to detect configuration
> > > has changed with its netlink integration, automounter only asks for the maps
> > > once after startup.
> > > 
> > > Startup is only part of the problem, the same would happen for a machine
> > > that would change networks, flaky connection etc.
> > > 
> > > Ian, what about a dispatcher script for NM that would SIGHUP automounter
> > > when networking conditions change?
> > 
> > Maybe but ...
> > 
> > At one point I was working on some patches to make autofs wait
> > until the master map was available at start up. Map re-loads
> > are supposed to ignore fails and use the existing map.
> > 
> > I shelved that because I had what I thought worked but the user
> > claimed it didn't and I was unable to get sufficient info to
> > continue.
> > 
> > What are your thoughts on me continuing with that?
> > 
> > Ian
> 
> Interesting idea!
> 
> Would autofs periodically poll until it gets some form of authoritative
> response?

At startup the master map is a must have, so waiting until
it gets one is sensible enough.

> 
> How would you deal with the situation where SSSD is configured to serve
> automounter maps but no maps are actually present on the LDAP side? Could
> we simply differentiate between 'search completed but 0 results' and 'could
> not complete search' ?

Again it's only at startup and it continues to wait if the
connection fails. That is different from 'search completed
but 0 results' so yes, that's what I was trying to do.

I hope I still have the patches ....

Comment 5 Jakub Hrozek 2014-06-18 09:45:09 UTC
Sorry for the late response, I was on vacation for the last 10 days.

Is it OK to reassign this bugzilla to autofs, then?

Comment 6 Jakub Hrozek 2014-06-25 12:16:12 UTC
Based on comment #4, I'm moving the component to autofs as it doesn't seem any changes from SSSD are needed. Please correct me if I'm wrong.

Comment 7 Ian Kent 2014-06-26 02:57:06 UTC
(In reply to Jakub Hrozek from comment #6)
> Based on comment #4, I'm moving the component to autofs as it doesn't seem
> any changes from SSSD are needed. Please correct me if I'm wrong.

Yeah, I've been able to locate a reasonably up to date
version of the patches I spoke about.

It was difficult due to me doing an upgrade to F20 and my
hard disk failing not too long afterward. So I've been a
bit distracted.

I've had a look at them and given it some thought.

The patches assume that a map source will return a connect
failure for the case where the map can't be read at start
up. That might not be the best approach but an empty
master map is a valid state so we probably can't use that
to identify this case.

Ideally sss would return a connection failure until it has
successfully connected and read the maps at startup. Clearly,
once the maps have been read at start up they should continue
to be used when a server becomes unavailable.

Perhaps we need a second bug that this bug depends on?

Comment 8 Jakub Hrozek 2014-06-26 14:13:40 UTC
(In reply to Ian Kent from comment #7)
> (In reply to Jakub Hrozek from comment #6)
> > Based on comment #4, I'm moving the component to autofs as it doesn't seem
> > any changes from SSSD are needed. Please correct me if I'm wrong.
> 
> Yeah, I've been able to locate a reasonably up to date
> version of the patches I spoke about.
> 
> It was difficult due to me doing an upgrade to F20 and my
> hard disk failing not too long afterward. So I've been a
> bit distracted.
> 

No problem, thank you very much for digging them up!

> I've had a look at them and given it some thought.
> 
> The patches assume that a map source will return a connect
> failure for the case where the map can't be read at start
> up. That might not be the best approach but an empty
> master map is a valid state so we probably can't use that
> to identify this case.
> 
> Ideally sss would return a connection failure until it has
> successfully connected and read the maps at startup. Clearly,
> once the maps have been read at start up they should continue
> to be used when a server becomes unavailable.
> 
> Perhaps we need a second bug that this bug depends on?

Yes, that sounds reasonable, I will file one. Do you have any particular return code in mind (maybe something that is used by other modules in your patches)?

Comment 9 Jakub Hrozek 2014-06-26 14:18:42 UTC
(In reply to Ian Kent from comment #7)
> (In reply to Jakub Hrozek from comment #6)
> The patches assume that a map source will return a connect
> failure for the case where the map can't be read at start
> up. That might not be the best approach but an empty
> master map is a valid state so we probably can't use that
> to identify this case.

By the way, I think this connection error should only be returned in case the sssd cache is empty. If there are any maps, we should fall back to the cached maps.

Comment 10 Jakub Hrozek 2014-06-26 14:23:47 UTC
The new bugzilla is:
https://bugzilla.redhat.com/show_bug.cgi?id=1113639

Comment 11 Ian Kent 2014-06-27 01:05:18 UTC
(In reply to Jakub Hrozek from comment #9)
> (In reply to Ian Kent from comment #7)
> > (In reply to Jakub Hrozek from comment #6)
> > The patches assume that a map source will return a connect
> > failure for the case where the map can't be read at start
> > up. That might not be the best approach but an empty
> > master map is a valid state so we probably can't use that
> > to identify this case.
> 
> By the way, I think this connection error should only be returned in case
> the sssd cache is empty. If there are any maps, we should fall back to the
> cached maps.

I'm pretty much at the mercy of other subsystems with that.
I'll have a look around and see what would be best; a connection
failure or refusal is what I'm after, the same as we'd see if the
network or server was down.

Comment 12 Ian Kent 2014-06-27 01:15:34 UTC
(In reply to Jakub Hrozek from comment #9)
> (In reply to Ian Kent from comment #7)
> > (In reply to Jakub Hrozek from comment #6)
> > The patches assume that a map source will return a connect
> > failure for the case where the map can't be read at start
> > up. That might not be the best approach but an empty
> > master map is a valid state so we probably can't use that
> > to identify this case.
> 
> By the way, I think this connection error should only be returned in case
> the sssd cache is empty. If there are any maps, we should fall back to the
> cached maps.

That's not quite what I was saying.

An empty map is valid but not knowing if it's empty or not at
start up is the case we need to return a connection failure for.

Once sss has been able to read the map then it's been cached
and should continue to be used even if the server becomes
unreachable.

Comment 14 Jakub Hrozek 2014-10-09 15:38:55 UTC
Users of other distributions are running into this problem. I'm marking the BZ as public. We've marked sensitive comments as private anyway.

Comment 18 Ian Kent 2015-07-09 02:00:24 UTC
(In reply to Jakub Hrozek from comment #10)
> The new bugzilla is:
> https://bugzilla.redhat.com/show_bug.cgi?id=1113639

I see that bug is rhel-7.3? I still believe that I'll
need it to make autofs function properly.

So I'll need to defer this bug until 7.3 too.

Comment 32 Ian Kent 2017-03-20 07:31:33 UTC
Created attachment 1264704 [details]
Patch - work around sss startup delay

Comment 33 Ian Kent 2017-03-20 07:32:46 UTC
Created attachment 1264705 [details]
Patch - add sss master map wait config option

Comment 34 Lukas Slebodnik 2017-03-20 09:14:23 UTC
(In reply to Ian Kent from comment #32)
> Created attachment 1264704 [details]
> Patch - work around sss startup delay

FYI, the bug was in sssd
https://pagure.io/SSSD/sssd/issue/3140
and a little bit related https://pagure.io/SSSD/sssd/issue/3080
Both are already fixed in sssd-1.15.0+ (rhel7.4)

Comment 35 Ian Kent 2017-03-20 10:12:00 UTC
(In reply to Lukas Slebodnik from comment #34)
> (In reply to Ian Kent from comment #32)
> > Created attachment 1264704 [details]
> > Patch - work around sss startup delay
> 
> FYI, the bug was in sssd
> https://pagure.io/SSSD/sssd/issue/3140
> and a little bit related https://pagure.io/SSSD/sssd/issue/3080
> Both are already fixed in sssd-1.15.0+ (rhel7.4)

I know and that's why the default setting is a timeout that
disables it.

I included the change because it's 1 part of 2 changes from
RHEL-6 and I'd like to keep RHEL-6 and 7 in sync.

The change is pretty straight forward and if there is some
other unexpected problem that comes up where this can help
then it'll be useful.
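
For reference, the option added by the earlier patch is configured in /etc/autofs.conf; assuming the shipped option name sss_master_map_wait (default 0, which disables the wait, as described above):

  # /etc/autofs.conf (excerpt)
  [ autofs ]
  # seconds to wait for the sss source to become available when reading
  # the master map at startup; the default of 0 disables the wait
  sss_master_map_wait = 10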

Comment 36 Lukas Slebodnik 2017-03-20 10:49:10 UTC
(In reply to Ian Kent from comment #35)
> (In reply to Lukas Slebodnik from comment #34)
> > (In reply to Ian Kent from comment #32)
> > > Created attachment 1264704 [details]
> > > Patch - work around sss startup delay
> > 
> > FYI, the bug was in sssd
> > https://pagure.io/SSSD/sssd/issue/3140
> > and a little bit related https://pagure.io/SSSD/sssd/issue/3080
> > Both are already fixed in sssd-1.15.0+ (rhel7.4)
> 
> I know and that's why the default setting is a timeout that
> disables it.
> 
> I included the change because it's 1 part of 2 changes from
> RHEL-6 and I'd like to keep RHEL-6 and 7 in sync.
> 
Sure

> The change is pretty straight forward and if there is some
> other unexpected problem that comes up where this can help
> then it'll be useful.

Agree

It was just a FYI :-)

Comment 43 Ian Kent 2017-03-30 08:19:15 UTC
Created attachment 1267441 [details]
Patch - fix work around sss startup delay

Comment 47 errata-xmlrpc 2017-08-01 12:43:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2213