Bug 2013218
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | autofs lookups for unknown mounts are delayed for 50s | | |
| Product: | Red Hat Enterprise Linux 8 | Reporter: | u.sibiller <u.sibiller> |
| Component: | sssd | Assignee: | Pavel Březina <pbrezina> |
| Status: | CLOSED ERRATA | QA Contact: | shridhar <sgadekar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.4 | CC: | aboscatt, atikhono, dlavu, grajaiya, ikent, jhrozek, knweiss, lslebodn, mzidek, pbrezina, sgoveas, tscherf |
| Target Milestone: | rc | Keywords: | Regression, Triaged, ZStream |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Unspecified | | |
| Whiteboard: | sync-to-jira | | |
| Fixed In Version: | sssd-2.6.1-1.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 2016922 2016923 (view as bug list) | Environment: | |
| Last Closed: | 2022-05-10 15:26:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2016922, 2016923 | | |
| Attachments: | | | |
Description (u.sibiller@atos.net, 2021-10-12 12:09:38 UTC)
Created attachment 1832193 [details]
our sssd.conf
Created attachment 1832194 [details]
logfile sssd_autofs.log
Here you can see sssd/autofs doing 51 CRs (Cache requests, I guess).
Looks like SSSD is offline, which makes autofs retry again and again. Can you please also share your domain logs to see why SSSD is offline? Or is it expected that SSSD is offline?

Well, I do not get what "offline" means in that context. The sssd service is up and running:

    ● sssd.service - System Security Services Daemon
       Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)
       Active: active (running) since Fri 2021-10-15 16:26:51 CEST; 7min ago
     Main PID: 118292 (sssd)
        Tasks: 8 (limit: 11388)
       Memory: 61.8M
       CGroup: /system.slice/sssd.service
               ├─118292 /usr/sbin/sssd -i --logger=files
               ├─118293 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
               ├─118294 /usr/libexec/sssd/sssd_be --domain DOMAIN1.OURNET --uid 0 --gid 0 --logger=files
               ├─118295 /usr/libexec/sssd/sssd_be --domain DOMAIN2.OURNET --uid 0 --gid 0 --logger=files
               ├─118296 /usr/libexec/sssd/sssd_be --domain unmatched --uid 0 --gid 0 --logger=files
               ├─118297 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
               ├─118298 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
               └─118299 /usr/libexec/sssd/sssd_autofs --uid 0 --gid 0 --logger=files

    Oct 15 16:26:51 vumca615 systemd[1]: Starting System Security Services Daemon...
    Oct 15 16:26:51 vumca615 sssd[sssd][118292]: Starting up
    Oct 15 16:26:51 vumca615 sssd[be[implicit_files]][118293]: Starting up
    Oct 15 16:26:51 vumca615 sssd[be[DOMAIN2.OURNET]][118295]: Starting up
    Oct 15 16:26:51 vumca615 sssd[be[DOMAIN1.OURNET]][118294]: Starting up
    Oct 15 16:26:51 vumca615 sssd[be[unmatched]][118296]: Starting up
    Oct 15 16:26:51 vumca615 sssd[autofs][118299]: Starting up
    Oct 15 16:26:51 vumca615 sssd[nss][118297]: Starting up
    Oct 15 16:26:51 vumca615 sssd[pam][118298]: Starting up
    Oct 15 16:26:51 vumca615 systemd[1]: Started System Security Services Daemon.

I'll attach a new log.

Created attachment 1833464 [details]
sssdlogs with debug_level 9 everywhere (names and IPs changed with sed)
Created attachment 1833465 [details]
sssd logs, names and IPs mangled
This sssd status is the one that matches the attached log tarball; the one above was the wrong one (copy and paste mistake).

    systemctl status sssd
    ● sssd.service - System Security Services Daemon
       Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)
       Active: active (running) since Fri 2021-10-15 16:37:09 CEST; 35min ago
     Main PID: 118621 (sssd)
        Tasks: 8 (limit: 11388)
       Memory: 62.5M
       CGroup: /system.slice/sssd.service
               ├─118621 /usr/sbin/sssd -i --logger=files
               ├─118623 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
               ├─118624 /usr/libexec/sssd/sssd_be --domain DOMAIN1.OURNET --uid 0 --gid 0 --logger=files
               ├─118625 /usr/libexec/sssd/sssd_be --domain DOMAIN2.OURNET --uid 0 --gid 0 --logger=files
               ├─118626 /usr/libexec/sssd/sssd_be --domain unmatched --uid 0 --gid 0 --logger=files
               ├─118627 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
               ├─118628 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
               └─118629 /usr/libexec/sssd/sssd_autofs --uid 0 --gid 0 --logger=files

    Oct 15 16:37:09 vumca615 systemd[1]: Starting System Security Services Daemon...
    Oct 15 16:37:09 vumca615 sssd[sssd][118621]: Starting up
    Oct 15 16:37:09 vumca615 sssd[be[DOMAIN2.OURNET]][118625]: Starting up
    Oct 15 16:37:09 vumca615 sssd[be[implicit_files]][118623]: Starting up
    Oct 15 16:37:09 vumca615 sssd[be[DOMAIN1.OURNET]][118624]: Starting up
    Oct 15 16:37:09 vumca615 sssd[be[unmatched]][118626]: Starting up
    Oct 15 16:37:09 vumca615 sssd[pam][118628]: Starting up
    Oct 15 16:37:09 vumca615 sssd[autofs][118629]: Starting up
    Oct 15 16:37:09 vumca615 sssd[nss][118627]: Starting up
    Oct 15 16:37:09 vumca615 systemd[1]: Started System Security Services Daemon.

(In reply to Pavel Březina from comment #3)
> Looks like SSSD is offline, which makes autofs retry again and again. Can
> you please also share your domain logs to see why SSSD is offline? Or is it
> expected that SSSD is offline?

I had a similar situation with an autofs report a while ago. From the logs it looked like the reporter had an incorrect configuration, so I couldn't get further.

This might be a case where there is no "correct" return. In this case the map key doesn't exist, so an ENOENT return is needed. But we might not be able to do that. For example: the cache isn't populated, the backend isn't responding for some reason, sss returns server down to autofs (need to verify that) and autofs waits ... so, in theory, both sss and autofs are doing the right thing, but the user experience isn't ok.

The bit that gets confusing is that the server shouldn't be down in this case, so something isn't quite right.

(In reply to u.sibiller from comment #2)
> Created attachment 1832194 [details]
> logfile sssd_autofs.log
>
> Here you can see sssd/autofs doing 51 CRs (Cache requests, I guess).

The log appears to indicate there are multiple cache lookups returning key not found: first from the cache, then from the provider, and then from the cache again. For some reason (and there's no indication sss thinks the provider is down) ENOENT isn't being returned.

Perhaps the logic

    cache lookup <- not found -> lookup from provider <- not found -> lookup cache again <- not found

isn't quite right, in that the second cache lookup shouldn't be happening since the provider lookup has been done and appears to have succeeded at returning not found. But maybe the provider "looks" like it's down, so the second cache lookup is done. Or the provider lookup is returning not found but ENOENT isn't being returned as it should.
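To make the lookup sequence described above easier to follow, here is a minimal C sketch of that flow. This is not the real sssd cache_req code: lookup_cache(), lookup_provider() and struct autofs_req are invented names used purely for illustration, and the stub return values simply model the "key does not exist" case from this report.

```c
/*
 * Hypothetical sketch of the lookup sequence discussed above:
 * cache miss -> ask the provider -> re-check the cache.
 * All names and return values are invented for illustration only.
 */
#include <errno.h>
#include <stdio.h>

struct autofs_req {
    const char *map;
    const char *key;
};

/* Pretend cache lookup: 0 on hit, ENOENT on miss. */
static int lookup_cache(const struct autofs_req *req) { (void)req; return ENOENT; }

/* Pretend provider lookup: 0 if the key was found and cached,
 * ENOENT if the provider confirmed the key does not exist,
 * EAGAIN if the provider could not be reached ("offline"). */
static int lookup_provider(const struct autofs_req *req) { (void)req; return ENOENT; }

static int resolve_autofs_key(const struct autofs_req *req)
{
    int ret = lookup_cache(req);
    if (ret == 0) {
        return 0;                 /* answered from cache */
    }

    ret = lookup_provider(req);
    if (ret == EAGAIN) {
        /* Only here is a "server down" style answer justified; the
         * client (autofs) may then legitimately wait and retry. */
        return EAGAIN;
    }

    if (ret == ENOENT) {
        /* The provider answered and positively said "not found", so the
         * request should finish with ENOENT here; going back to the cache
         * and treating the miss as an error is what would produce the
         * repeated cache requests seen in the attached log. */
        return ENOENT;
    }

    return lookup_cache(req);     /* provider refreshed the cache */
}

int main(void)
{
    struct autofs_req req = { .map = "auto.home", .key = "does-not-exist" };
    int ret = resolve_autofs_key(&req);
    printf("lookup result: %d (%s)\n", ret, ret == ENOENT ? "ENOENT" : "other");
    return 0;
}
```

The design point is simply that a confirmed "not found" from the provider is a definitive answer, distinct from "provider unreachable", and should terminate the request rather than trigger further lookups.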
Not knowing the sss code I don't know ... It could be autofs not handling the return correctly, causing those retries seen in the sss log; probably a server down return ... since that's pretty much the only return that will cause autofs to continue the wait in this case (getautomntbyname_r()).

I might have found the issue in SSSD, specifically this:
https://github.com/pbrezina/sssd/commit/9a977270d802ed9506b9acc82011f04a06ead7ec

Please try this scratch build and check whether the issue still persists:
https://pbrezina.fedorapeople.org/scratch/2013218/

Yes, this reduces the number of cache requests to 3, see attached log. Thanks for the quick fix!

Created attachment 1834208 [details]
log with fix included in sssd-2.4.0.el8-4.2.1
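For context on the autofs side of the delay discussed above, here is a hedged sketch of how a caller of a getautomntbyname_r()-style interface might react to the different results. The function name sss_getautomntbyname(), the retry constants and the loop shape are assumptions made for illustration; they do not reproduce the real autofs lookup_sss module.

```c
/*
 * Illustration only: a definitive answer (found or ENOENT) ends the lookup
 * immediately, while a "service unavailable" style return is the only case
 * that keeps the caller waiting and retrying. All names and constants here
 * are invented; the real autofs code differs in detail.
 */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_RETRIES     10   /* invented value */
#define RETRY_DELAY_SEC 5    /* invented value */

/* Stand-in for the sssd autofs client call: 0 = found, ENOENT = no such key,
 * EAGAIN = service unavailable / try again later. */
static int sss_getautomntbyname(const char *map, const char *key,
                                char *value, size_t valsz)
{
    (void)map; (void)key; (void)value; (void)valsz;
    return ENOENT;
}

static int lookup_mount(const char *map, const char *key,
                        char *value, size_t valsz)
{
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        int ret = sss_getautomntbyname(map, key, value, valsz);

        if (ret == 0 || ret == ENOENT) {
            /* Found or definitively not found: stop here, no delay. */
            return ret;
        }

        /* Anything that looks like "server down" is retried, and each
         * retry adds to the delay the user sees for an unknown mount. */
        sleep(RETRY_DELAY_SEC);
    }
    return EAGAIN;
}

int main(void)
{
    char value[4096];
    int ret = lookup_mount("auto.home", "no-such-key", value, sizeof(value));
    printf("mount lookup finished with %d\n", ret);
    return 0;
}
```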
Please note that these packages depend on samba-client-libs 4.13.3-5.el8_4, which RHN does not seem to know. Also note the packages point to https://pagure.io/SSSD/sssd/ which is no longer the current home of the sssd project.

(In reply to u.sibiller from comment #13)
> Yes, this reduces the number of cache requests to 3, see attached log.

That's interesting, it sounds right. If the lookup key isn't found, autofs will check for wildcard matches: for historic reasons it will look for '/' and, if that's not found, it will look for '*' (the usual wildcard key).

Upstream ticket: https://github.com/SSSD/sssd/issues/5832

Upstream PR: https://github.com/SSSD/sssd/pull/5833

Thank you for testing the scratch build. I have opened a pull request against upstream.

Hmm, "internal target release" 8.6.0 is quite far away. So is there a chance to get fixed packages for 8.4 (as written before, the above packages depend on samba-client-libs 4.13.3-5.el8_4, which is not available (yet?))?

(In reply to u.sibiller from comment #21)
> Hmm, "internal target release" 8.6.0 is quite far away. So is there a chance
> to get fixed packages for 8.4 (as written before, the above packages depend on
> samba-client-libs 4.13.3-5.el8_4, which is not available (yet?))?

The package from comment 12 is merely a "test" build (i.e. "for testing purposes only"); it is not "supported". The reported bug has to be fixed in the release currently under development first, i.e. 8.6/9.0. A backport of the fix to previously released minor RHEL versions will be considered as well. So "there is a chance", but no promise.

(In reply to u.sibiller from comment #21)
> Hmm, "internal target release" 8.6.0 is quite far away. So is there a chance
> to get fixed packages for 8.4 (as written before, the above packages depend on
> samba-client-libs 4.13.3-5.el8_4, which is not available (yet?))?

The fix for 8.4 will be tracked via bz 2016922.

Pushed PR: https://github.com/SSSD/sssd/pull/5833

* `master`
    * bb94a18f0f0cba1e9fb5abf78b995d69e5f3c559 - cache_req: return success for autofs when ENOENT is returned from provider

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (sssd bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2070
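As an editorial illustration of the direction named in the pushed commit above ("cache_req: return success for autofs when ENOENT is returned from provider"), the following sketch shows the idea under stated assumptions. It is not the actual patch: handle_provider_result(), its types and the request-type enum are invented names.

```c
/*
 * Sketch of the fix idea only, not the real sssd change: for autofs map
 * requests, a provider answer of ENOENT is treated as a successful request
 * whose result is "entry not found", rather than as an error the client
 * would interpret as a reason to keep retrying.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

enum req_type { REQ_USER, REQ_GROUP, REQ_AUTOFS_ENTRY };

struct req_result {
    bool found;   /* definitive answer: entry exists or not */
    int  error;   /* non-zero only for real errors worth retrying */
};

static struct req_result handle_provider_result(enum req_type type, int provider_ret)
{
    struct req_result res = { .found = false, .error = 0 };

    if (provider_ret == 0) {
        res.found = true;                 /* entry exists */
        return res;
    }

    if (provider_ret == ENOENT && type == REQ_AUTOFS_ENTRY) {
        /* The provider positively confirmed the map key does not exist:
         * report success with "not found" so the autofs client gets an
         * immediate negative answer instead of a long retry delay. */
        return res;
    }

    res.error = provider_ret;             /* genuine failure: propagate */
    return res;
}

int main(void)
{
    struct req_result r = handle_provider_result(REQ_AUTOFS_ENTRY, ENOENT);
    printf("found=%d error=%d\n", r.found, r.error);
    return 0;
}
```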