Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1802251

Summary: Autofs will only mount share once if sss is first in nsswitch.conf
Product: Red Hat Enterprise Linux 8 Reporter: Oscar Santos <ossantos>
Component: autofsAssignee: Ian Kent <ikent>
Status: CLOSED ERRATA QA Contact: Kun Wang <kunwan>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 8.1CC: dwysocha, ikent, jhrozek, pbrezina, rbergant, xzhou
Target Milestone: rcKeywords: Regression
Target Release: 8.0Flags: pm-rhel: mirror+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-28 16:57:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
test debug patch
none
Patch - fix a regression with map instance lookup
none
Patch - fix a regression with map instance lookup none

Description Oscar Santos 2020-02-12 17:50:07 UTC
Description of problem:
Autofs will only mount a share once if sss is first in nsswitch.conf.

The first time the call is made to mount the share, there is no issues. But once unmounted any other calls to mount that same share fail. From the pcap, we are not seeing nfs being called to mount the share again.

Version-Release number of selected component (if applicable):
autofs-5.1.4-35.el8.x86_64

How reproducible:
Very

Steps to Reproduce:
1.setup autofs-5.1.4-35.el8.x86_64 on rhel 8.1 (this will automount nfs share)
2. setup nsswitch.conf file to show sss first.
cat etc/nsswitch.conf 

passwd:     sss files systemd
group:      sss files systemd
netgroup:   sss files
automount:  sss files <-----------
services:   sss files
shadow:     files sss
hosts:      files dns myhostname
bootparams: files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
publickey:  files
aliases:    files

3. Use wild cards to mount home dir (* options server:/share/path/&) seem this behavior does not happen when hard coded. (/home/user options server:/share/path)

4. Mount share using autofs

5. Wait for time out and unmount of share

6. Try to mount share again.

Actual results:
Share will only mount the first time, and no other times after it has been unmounted.

Expected results:
Share should automount every time it is call. (after it has been unmounted)

Additional info:
It seems to come down to the wild cards being used (*&). This worked fine in rhel 6 and rhel 7, was not until rhel 8 when the client seen the issue.

Comment 1 Ian Kent 2020-02-13 02:11:44 UTC
I see you've identified where the problem is occurring here.

Feb 12 11:25:08 rhel80 automount[17324]: handle_packet_missing_indirect: token 26, name testdir, request pid 13539
Feb 12 11:25:08 rhel80 automount[17324]: attempting to mount entry /nfs/home/testdir
Feb 12 11:25:08 rhel80 automount[17324]: lookup_mount: lookup(sss): looking up testdir
Feb 12 11:25:08 rhel80 automount[17324]: setautomntent: lookup(sss): setautomntent: No such file or directory
Feb 12 11:25:08 rhel80 automount[17324]: lookup_mount: lookup(file): looking up testdir
Feb 12 11:25:08 rhel80 automount[17324]: dev_ioctl_send_fail: token = 26
Feb 12 11:25:08 rhel80 automount[17324]: failed to mount /nfs/home/testdir

The way this works is autofs does a lookup using the source (sss) first on
every lookup.

It uses the result to try and guess if the map has changed.
If the lookup fails it should consult the cache but that's not happening.
That's partly to do with the return values from the sss library not having
enough granularity. That's being prioritized upstream in sss, not sure when
changes will be available.

The point is a connection failure returns the same error for map not found
as it does for service not available or not configured. A map not found =>
don't consult the cache but service not available => do consult the cache.

I don't know why the sss library call is failing when trying to establish a
connection with sss, that's the way it's been done from the beginning.

I'll have a look around and compare RHEL-7 to RHEL-8 but I suspect this
are hasn't changed much, if at all.

I am wondering, due to that return limitation, maybe I can do use cached
values regardless since the map entry must have been seen at some point.

But that has all sorts of implications for map changes, and cache stale
entry cleaning, and negative cache entries that prevent heavy usage of
non-existent map entry lookup.

Let me look around further and get back to you.
Ian

Comment 2 Ian Kent 2020-02-13 02:40:39 UTC
Mmm ... the only changes to the sss code in autofs between RHEL-7.8 and
RHEL-8.1 are the monotonic time changes that weren't backported from
upstream.

They surely won't make any difference to the behaviour of the sss
library function _sss_setautomntent(), which is what's being used
here, or the autofs code that calls it.

Comment 3 Ian Kent 2020-02-13 02:44:51 UTC
Jakub,

Are you aware of any changes to the sss autofs library between RHEL-7.8
and RHEL-8.1 that I'm not up with?

Comment 4 Jakub Hrozek 2020-02-13 11:27:50 UTC
(In reply to Ian Kent from comment #3)
> Jakub,
> 
> Are you aware of any changes to the sss autofs library between RHEL-7.8
> and RHEL-8.1 that I'm not up with?

I'm not on the SSSD team anymore, so only from the high level from which I'm observing the project -- I know that Pavel Brezina was implementing some speedups for looking up large maps. But honestly I am even not sure if those changes made it to 8.1 or not.

I'll redirect the needinfo to Pavel.

Comment 5 Roberto Bergantinos 2020-02-13 16:18:49 UTC
hi Ian !

I've been playing a bit with reproducer, added some debug on lookup_mount(file module) and cache_lookup
looks we are returning NSS_STATUS_TRYAGAIN on lookup_mount (file) 
since ap->entry->current does not match with mapent->source found on the case :

Feb 13 16:52:22 rhel80 automount[28365]: handle_packet: type = 3
Feb 13 16:52:22 rhel80 automount[28365]: handle_packet_missing_indirect: token 305, name testdir, request pid 13539
Feb 13 16:52:22 rhel80 automount[28365]: attempting to mount entry /nfs/home/testdir
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(sss): looking up testdir
Feb 13 16:52:22 rhel80 automount[28365]: setautomntent: lookup(sss): setautomntent: No such file or directory
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): looking up testdir
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): current master source auto.home
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key *, on mapent 0x7f10840030e0, mkey: *
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key *, on cache, mapent 0x7f10840030e0, mkey: *
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): LOOKUP KEY: testdir
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key testdir, on mapent 0x7f10700031c0, mkey: testdir
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key testdir, on cache, mapent 0x7f10700031c0, mkey: testdir
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key testdir, on mapent 0x7f10700031c0, mkey: testdir
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key testdir, on cache, mapent 0x7f10700031c0, mkey: testdir
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): ERROR: TRY AGAIN: source : auto.home, 0x7f1070003320 ME source : auto.home, 0x7f1084001ed0

code :

int lookup_mount(struct autofs_point *ap, const char *name, int name_len, void *context)
{
         struct lookup_context *ctxt = (struct lookup_context *) context;
         struct map_source *source;
...
         struct mapent *me;
...
         source = ap->entry->current;
...
         me = match_cached_key(ap, MODPREFIX, source, lkp_key);
         if (me && me->mapent) {
                 if (me && (me->source == source || *me->key == '/')) {
                         strcpy(mapent_buf, me->mapent);
                         mapent = mapent_buf; <<<-- [**]
                 }
         }
...
         if (!mapent) {
                 free(lkp_key);
                 return NSS_STATUS_TRYAGAIN;  <<----
         }

current does not match mapent->source found on the cache, despite name is the same.

Appears first lookup on sss source messes with the cache in a way that affect later lookups
for file source.

Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): ERROR: TRY AGAIN: source : auto.home, 0x7f1070003320 ME source : auto.home, 0x7f1084001ed0

i'll attach the debug patch if you want to take a look

Comment 6 Roberto Bergantinos 2020-02-13 16:22:30 UTC
Created attachment 1662985 [details]
test debug patch

Comment 7 Ian Kent 2020-02-13 23:15:29 UTC
(In reply to Roberto Bergantinos from comment #5)
> hi Ian !

Thanks for this.

> 
> I've been playing a bit with reproducer, added some debug on
> lookup_mount(file module) and cache_lookup
> looks we are returning NSS_STATUS_TRYAGAIN on lookup_mount (file) 
> since ap->entry->current does not match with mapent->source found on the
> case :

Yes, sorry about that, I hadn't looked far enough in the case notes.

While I'm not sure your analysis about the source is a problem the
case notes do imply the problem could be in a different location and
not in the sss module.

> 
> Feb 13 16:52:22 rhel80 automount[28365]: handle_packet: type = 3
> Feb 13 16:52:22 rhel80 automount[28365]: handle_packet_missing_indirect:
> token 305, name testdir, request pid 13539
> Feb 13 16:52:22 rhel80 automount[28365]: attempting to mount entry
> /nfs/home/testdir
> Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(sss): looking
> up testdir
> Feb 13 16:52:22 rhel80 automount[28365]: setautomntent: lookup(sss):
> setautomntent: No such file or directory
> Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): looking
> up testdir
> Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): current
> master source auto.home
> Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key *, on
> mapent 0x7f10840030e0, mkey: *
> Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key *, on
> cache, mapent 0x7f10840030e0, mkey: *
> Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): LOOKUP
> KEY: testdir
> Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key
> testdir, on mapent 0x7f10700031c0, mkey: testdir
> Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key testdir, on
> cache, mapent 0x7f10700031c0, mkey: testdir
> Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key
> testdir, on mapent 0x7f10700031c0, mkey: testdir
> Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key testdir, on
> cache, mapent 0x7f10700031c0, mkey: testdir
> Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): ERROR:
> TRY AGAIN: source : auto.home, 0x7f1070003320 ME source : auto.home,
> 0x7f1084001ed0
> 
> code :
> 
> int lookup_mount(struct autofs_point *ap, const char *name, int name_len,
> void *context)
> {
>          struct lookup_context *ctxt = (struct lookup_context *) context;
>          struct map_source *source;
> ...
>          struct mapent *me;
> ...
>          source = ap->entry->current;
> ...
>          me = match_cached_key(ap, MODPREFIX, source, lkp_key);
>          if (me && me->mapent) {
>                  if (me && (me->source == source || *me->key == '/')) {
>                          strcpy(mapent_buf, me->mapent);
>                          mapent = mapent_buf; <<<-- [**]
>                  }
>          }
> ...
>          if (!mapent) {
>                  free(lkp_key);
>                  return NSS_STATUS_TRYAGAIN;  <<----
>          }
> 
> current does not match mapent->source found on the cache, despite name is
> the same.
> 
> Appears first lookup on sss source messes with the cache in a way that
> affect later lookups
> for file source.

Could be but there are multiple map sources.

Assuming the key was found in the files source the first time around the
resulting map entry should have been added to the map entry cache ... but
it appears it isn't being found the second time around.

So the implication is that there's a regression in the current RHEL-8
and possibly also introduced into the current RHEL-7 (possibley not yet
seen by the customer).

> 
> Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): ERROR:
> TRY AGAIN: source : auto.home, 0x7f1070003320 ME source : auto.home,
> 0x7f1084001ed0
> 
> i'll attach the debug patch if you want to take a look

Thanks, I'll have to broaden my investigation.

Ian

Comment 8 Ian Kent 2020-02-14 10:36:37 UTC
(In reply to Roberto Bergantinos from comment #5)
> 
> Appears first lookup on sss source messes with the cache in a way that
> affect later lookups
> for file source.
> 

And your absolutely right.

This could have been much harder to spot if it wasn't for the excellent
work of everyone involved, including the customer, thanks everyone!

I have a patch for this now which I'll post here but I'll wait a while
and review it again to see if I can spot anything I've missed before
posting a test build.

Ian

Comment 9 Ian Kent 2020-02-14 10:39:43 UTC
Created attachment 1663076 [details]
Patch - fix a regression with map instance lookup

The patch is against the current upstream development source,
back port will follow once I've checked the patch again.

Comment 10 Pavel Březina 2020-02-14 11:23:29 UTC
Clearing need info since the issue was found in autofs.

Comment 11 Ian Kent 2020-02-17 01:28:46 UTC
Created attachment 1663422 [details]
Patch - fix a regression with map instance lookup

Patch has been updated and back ported to RHEL-8 autofs.

Comment 13 Ian Kent 2020-02-17 05:58:06 UTC
I have built a test package that includes the patch here.

It's located at:
http://people.redhat.com/~ikent/autofs-5.1.4-39.el8/

Please have this tested to confirm the change fixes the problem.

The RHEL-8.2 freeze is coming in a few days and I'd like to get this
into RHEL-8.2 becuase it's a regression so we will need to have it
verified by the customer as soon as possible.

Although I'm tempted to commit the change to dist-git anyway as I
was able to duplicate the problem, understand what caused it and
verify the change fixes the described problem.

But lets see how we go.

Ian

Comment 20 Ian Kent 2020-02-17 09:48:19 UTC
(In reply to Roberto Bergantinos from comment #16)
> hmmm... issue only reproduces using a /etc/ relative UNIX pathname
> 
> /home home.map -t 10      ---> reproduced
> /home /etc/home.map -t 10 ---> not reproduced
> 
> customer is not following strictly the format then, auto.master(5) specifies
> an absolute pathname :
> 
> =========================
>        map
>               Name of the map to use.  This is an absolute UNIX pathname for
> maps of types file, dir, or program, and the name of a database in the case
> for maps of type yp, nisplus, or hesiod or the dn of an LDAP  entry  for
>               maps of type ldap.
> 
> =========================
> 
> Then i guess we should avoid using relative pathname as it can break the
> cache on a concrete setup like this ?

This is not a problem because those types, "file", "dir" or "program" need
absolute paths (and "file" or "program" maps should be detected automatically)
and the type only needs to be specified for "dir" since it's not an individual
map. It's not mentioned but the database (I often use the term map source)
name can also be "files" and also now "sss" is also a valid nsswitch name.

Mmm ... that man page needs to be updated ...

Comment 21 Roberto Bergantinos 2020-02-17 09:51:26 UTC
(In reply to Ian Kent from comment #20)
> (In reply to Roberto Bergantinos from comment #16)
> > hmmm... issue only reproduces using a /etc/ relative UNIX pathname
>
> 
> This is not a problem because those types, "file", "dir" or "program" need
> absolute paths (and "file" or "program" maps should be detected
> automatically)
> and the type only needs to be specified for "dir" since it's not an
> individual
> map. It's not mentioned but the database (I often use the term map source)
> name can also be "files" and also now "sss" is also a valid nsswitch name.
> 
> Mmm ... that man page needs to be updated ...

possibly, yes. 

Got it.

Btw, Ian, tested http://people.redhat.com/~ikent/autofs-5.1.4-39.el8/ 
and i cannot reproduce the problem

rgds
roberto

Comment 28 errata-xmlrpc 2020-04-28 16:57:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1866