Bug 1802251
| Summary: | Autofs will only mount share once if sss is first in nsswitch.conf | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Oscar Santos <ossantos> | ||||||||
| Component: | autofs | Assignee: | Ian Kent <ikent> | ||||||||
| Status: | CLOSED ERRATA | QA Contact: | Kun Wang <kunwan> | ||||||||
| Severity: | medium | Docs Contact: | |||||||||
| Priority: | unspecified | ||||||||||
| Version: | 8.1 | CC: | dwysocha, ikent, jhrozek, pbrezina, rbergant, xzhou | ||||||||
| Target Milestone: | rc | Keywords: | Regression | ||||||||
| Target Release: | 8.0 | Flags: | pm-rhel:
mirror+
|
||||||||
| Hardware: | All | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2020-04-28 16:57:33 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: |
|
||||||||||
|
Description
Oscar Santos
2020-02-12 17:50:07 UTC
I see you've identified where the problem is occurring here. Feb 12 11:25:08 rhel80 automount[17324]: handle_packet_missing_indirect: token 26, name testdir, request pid 13539 Feb 12 11:25:08 rhel80 automount[17324]: attempting to mount entry /nfs/home/testdir Feb 12 11:25:08 rhel80 automount[17324]: lookup_mount: lookup(sss): looking up testdir Feb 12 11:25:08 rhel80 automount[17324]: setautomntent: lookup(sss): setautomntent: No such file or directory Feb 12 11:25:08 rhel80 automount[17324]: lookup_mount: lookup(file): looking up testdir Feb 12 11:25:08 rhel80 automount[17324]: dev_ioctl_send_fail: token = 26 Feb 12 11:25:08 rhel80 automount[17324]: failed to mount /nfs/home/testdir The way this works is autofs does a lookup using the source (sss) first on every lookup. It uses the result to try and guess if the map has changed. If the lookup fails it should consult the cache but that's not happening. That's partly to do with the return values from the sss library not having enough granularity. That's being prioritized upstream in sss, not sure when changes will be available. The point is a connection failure returns the same error for map not found as it does for service not available or not configured. A map not found => don't consult the cache but service not available => do consult the cache. I don't know why the sss library call is failing when trying to establish a connection with sss, that's the way it's been done from the beginning. I'll have a look around and compare RHEL-7 to RHEL-8 but I suspect this are hasn't changed much, if at all. I am wondering, due to that return limitation, maybe I can do use cached values regardless since the map entry must have been seen at some point. But that has all sorts of implications for map changes, and cache stale entry cleaning, and negative cache entries that prevent heavy usage of non-existent map entry lookup. Let me look around further and get back to you. Ian Mmm ... the only changes to the sss code in autofs between RHEL-7.8 and RHEL-8.1 are the monotonic time changes that weren't backported from upstream. They surely won't make any difference to the behaviour of the sss library function _sss_setautomntent(), which is what's being used here, or the autofs code that calls it. Jakub, Are you aware of any changes to the sss autofs library between RHEL-7.8 and RHEL-8.1 that I'm not up with? (In reply to Ian Kent from comment #3) > Jakub, > > Are you aware of any changes to the sss autofs library between RHEL-7.8 > and RHEL-8.1 that I'm not up with? I'm not on the SSSD team anymore, so only from the high level from which I'm observing the project -- I know that Pavel Brezina was implementing some speedups for looking up large maps. But honestly I am even not sure if those changes made it to 8.1 or not. I'll redirect the needinfo to Pavel. hi Ian !
I've been playing a bit with reproducer, added some debug on lookup_mount(file module) and cache_lookup
looks we are returning NSS_STATUS_TRYAGAIN on lookup_mount (file)
since ap->entry->current does not match with mapent->source found on the case :
Feb 13 16:52:22 rhel80 automount[28365]: handle_packet: type = 3
Feb 13 16:52:22 rhel80 automount[28365]: handle_packet_missing_indirect: token 305, name testdir, request pid 13539
Feb 13 16:52:22 rhel80 automount[28365]: attempting to mount entry /nfs/home/testdir
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(sss): looking up testdir
Feb 13 16:52:22 rhel80 automount[28365]: setautomntent: lookup(sss): setautomntent: No such file or directory
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): looking up testdir
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): current master source auto.home
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key *, on mapent 0x7f10840030e0, mkey: *
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key *, on cache, mapent 0x7f10840030e0, mkey: *
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): LOOKUP KEY: testdir
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key testdir, on mapent 0x7f10700031c0, mkey: testdir
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key testdir, on cache, mapent 0x7f10700031c0, mkey: testdir
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key testdir, on mapent 0x7f10700031c0, mkey: testdir
Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key testdir, on cache, mapent 0x7f10700031c0, mkey: testdir
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): ERROR: TRY AGAIN: source : auto.home, 0x7f1070003320 ME source : auto.home, 0x7f1084001ed0
code :
int lookup_mount(struct autofs_point *ap, const char *name, int name_len, void *context)
{
struct lookup_context *ctxt = (struct lookup_context *) context;
struct map_source *source;
...
struct mapent *me;
...
source = ap->entry->current;
...
me = match_cached_key(ap, MODPREFIX, source, lkp_key);
if (me && me->mapent) {
if (me && (me->source == source || *me->key == '/')) {
strcpy(mapent_buf, me->mapent);
mapent = mapent_buf; <<<-- [**]
}
}
...
if (!mapent) {
free(lkp_key);
return NSS_STATUS_TRYAGAIN; <<----
}
current does not match mapent->source found on the cache, despite name is the same.
Appears first lookup on sss source messes with the cache in a way that affect later lookups
for file source.
Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): ERROR: TRY AGAIN: source : auto.home, 0x7f1070003320 ME source : auto.home, 0x7f1084001ed0
i'll attach the debug patch if you want to take a look
Created attachment 1662985 [details]
test debug patch
(In reply to Roberto Bergantinos from comment #5) > hi Ian ! Thanks for this. > > I've been playing a bit with reproducer, added some debug on > lookup_mount(file module) and cache_lookup > looks we are returning NSS_STATUS_TRYAGAIN on lookup_mount (file) > since ap->entry->current does not match with mapent->source found on the > case : Yes, sorry about that, I hadn't looked far enough in the case notes. While I'm not sure your analysis about the source is a problem the case notes do imply the problem could be in a different location and not in the sss module. > > Feb 13 16:52:22 rhel80 automount[28365]: handle_packet: type = 3 > Feb 13 16:52:22 rhel80 automount[28365]: handle_packet_missing_indirect: > token 305, name testdir, request pid 13539 > Feb 13 16:52:22 rhel80 automount[28365]: attempting to mount entry > /nfs/home/testdir > Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(sss): looking > up testdir > Feb 13 16:52:22 rhel80 automount[28365]: setautomntent: lookup(sss): > setautomntent: No such file or directory > Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): looking > up testdir > Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): current > master source auto.home > Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key *, on > mapent 0x7f10840030e0, mkey: * > Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key *, on > cache, mapent 0x7f10840030e0, mkey: * > Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): LOOKUP > KEY: testdir > Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key > testdir, on mapent 0x7f10700031c0, mkey: testdir > Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key testdir, on > cache, mapent 0x7f10700031c0, mkey: testdir > Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: looking for key > testdir, on mapent 0x7f10700031c0, mkey: testdir > Feb 13 16:52:22 rhel80 automount[28365]: cache_lookup: found key testdir, on > cache, mapent 0x7f10700031c0, mkey: testdir > Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): ERROR: > TRY AGAIN: source : auto.home, 0x7f1070003320 ME source : auto.home, > 0x7f1084001ed0 > > code : > > int lookup_mount(struct autofs_point *ap, const char *name, int name_len, > void *context) > { > struct lookup_context *ctxt = (struct lookup_context *) context; > struct map_source *source; > ... > struct mapent *me; > ... > source = ap->entry->current; > ... > me = match_cached_key(ap, MODPREFIX, source, lkp_key); > if (me && me->mapent) { > if (me && (me->source == source || *me->key == '/')) { > strcpy(mapent_buf, me->mapent); > mapent = mapent_buf; <<<-- [**] > } > } > ... > if (!mapent) { > free(lkp_key); > return NSS_STATUS_TRYAGAIN; <<---- > } > > current does not match mapent->source found on the cache, despite name is > the same. > > Appears first lookup on sss source messes with the cache in a way that > affect later lookups > for file source. Could be but there are multiple map sources. Assuming the key was found in the files source the first time around the resulting map entry should have been added to the map entry cache ... but it appears it isn't being found the second time around. So the implication is that there's a regression in the current RHEL-8 and possibly also introduced into the current RHEL-7 (possibley not yet seen by the customer). > > Feb 13 16:52:22 rhel80 automount[28365]: lookup_mount: lookup(file): ERROR: > TRY AGAIN: source : auto.home, 0x7f1070003320 ME source : auto.home, > 0x7f1084001ed0 > > i'll attach the debug patch if you want to take a look Thanks, I'll have to broaden my investigation. Ian (In reply to Roberto Bergantinos from comment #5) > > Appears first lookup on sss source messes with the cache in a way that > affect later lookups > for file source. > And your absolutely right. This could have been much harder to spot if it wasn't for the excellent work of everyone involved, including the customer, thanks everyone! I have a patch for this now which I'll post here but I'll wait a while and review it again to see if I can spot anything I've missed before posting a test build. Ian Created attachment 1663076 [details]
Patch - fix a regression with map instance lookup
The patch is against the current upstream development source,
back port will follow once I've checked the patch again.
Clearing need info since the issue was found in autofs. Created attachment 1663422 [details]
Patch - fix a regression with map instance lookup
Patch has been updated and back ported to RHEL-8 autofs.
I have built a test package that includes the patch here. It's located at: http://people.redhat.com/~ikent/autofs-5.1.4-39.el8/ Please have this tested to confirm the change fixes the problem. The RHEL-8.2 freeze is coming in a few days and I'd like to get this into RHEL-8.2 becuase it's a regression so we will need to have it verified by the customer as soon as possible. Although I'm tempted to commit the change to dist-git anyway as I was able to duplicate the problem, understand what caused it and verify the change fixes the described problem. But lets see how we go. Ian (In reply to Roberto Bergantinos from comment #16) > hmmm... issue only reproduces using a /etc/ relative UNIX pathname > > /home home.map -t 10 ---> reproduced > /home /etc/home.map -t 10 ---> not reproduced > > customer is not following strictly the format then, auto.master(5) specifies > an absolute pathname : > > ========================= > map > Name of the map to use. This is an absolute UNIX pathname for > maps of types file, dir, or program, and the name of a database in the case > for maps of type yp, nisplus, or hesiod or the dn of an LDAP entry for > maps of type ldap. > > ========================= > > Then i guess we should avoid using relative pathname as it can break the > cache on a concrete setup like this ? This is not a problem because those types, "file", "dir" or "program" need absolute paths (and "file" or "program" maps should be detected automatically) and the type only needs to be specified for "dir" since it's not an individual map. It's not mentioned but the database (I often use the term map source) name can also be "files" and also now "sss" is also a valid nsswitch name. Mmm ... that man page needs to be updated ... (In reply to Ian Kent from comment #20) > (In reply to Roberto Bergantinos from comment #16) > > hmmm... issue only reproduces using a /etc/ relative UNIX pathname > > > This is not a problem because those types, "file", "dir" or "program" need > absolute paths (and "file" or "program" maps should be detected > automatically) > and the type only needs to be specified for "dir" since it's not an > individual > map. It's not mentioned but the database (I often use the term map source) > name can also be "files" and also now "sss" is also a valid nsswitch name. > > Mmm ... that man page needs to be updated ... possibly, yes. Got it. Btw, Ian, tested http://people.redhat.com/~ikent/autofs-5.1.4-39.el8/ and i cannot reproduce the problem rgds roberto Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1866 |