Description of problem: After upgrading to sssd-2.0.0-4.fc29.x86_64, autofs-5.1.4-21.fc29.x86_64 seems to have trouble finding its mappings as quickly as it did. Immediately after logging in, if I do $ ls /nas/home/james (autofs NFSv4 mount) I get a 'no such file or directory' error. However if I wait a few minutes and try again, it works. The only clue seems to be an error message about "getautomntent_r: Invalid argument" in the logs for autofs: Nov 10 09:13:48 kowalski.cb.ettle systemd[1]: Starting Automounts filesystems on demand... Nov 10 09:15:18 kowalski.cb.ettle automount[1192]: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument Nov 10 09:15:18 kowalski.cb.ettle systemd[1]: Started Automounts filesystems on demand. From journalctl -u sssd in this boot: Nov 10 09:13:43 kowalski.cb.ettle systemd[1]: Starting System Security Services Daemon... Nov 10 09:13:43 kowalski.cb.ettle sssd[682]: Starting up Nov 10 09:13:43 kowalski.cb.ettle sssd[be[implicit_files]][751]: Starting up Nov 10 09:13:43 kowalski.cb.ettle sssd[be[cb.ettle]][752]: Starting up Nov 10 09:13:44 kowalski.cb.ettle sssd[nss][753]: Starting up Nov 10 09:13:44 kowalski.cb.ettle sssd[autofs][757]: Starting up Nov 10 09:13:44 kowalski.cb.ettle sssd[pac][758]: Starting up Nov 10 09:13:44 kowalski.cb.ettle sssd[sudo][754]: Starting up Nov 10 09:13:44 kowalski.cb.ettle sssd[ssh][756]: Starting up Nov 10 09:13:44 kowalski.cb.ettle sssd[pam][755]: Starting up Nov 10 09:13:45 kowalski.cb.ettle systemd[1]: Started System Security Services Daemon. Nov 10 09:13:46 kowalski.cb.ettle sssd[be[cb.ettle]][752]: Backend is offline Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1 Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1 Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1 Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 2 Nov 10 09:13:49 kowalski.cb.ettle sssd[be[cb.ettle]][752]: Backend is online Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1 Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1 Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1 Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 2 Works OK with no delay if I downgrade to sssd-2.0.0-3.fc29.x86_64.
Some delays are reported also in BZ1644919. Jakub promised to prepare a scratch builds but I cannot see it in comments. James. it's just FYI and bugs can be unrelated. But might be good to know anbout other bug. And I am sorry for troubles.
Note that this is not consistent. I have one workstation that does it repeatedly, but two others that do not. No idea what the difference is.
Anything further on this? Or debug tips so I can gather more information?
*** Bug 1560290 has been marked as a duplicate of this bug. ***
Any progress on this? This issue is preventing autofs from working. I downgraded autofs as described in comment 0 and confirmed that in the previous version things were working. sss failures still are reported but they are reported much faster and therefore autofs eventually starts. For now, I have removed sss from nsswitch.conf's automount option as a solution as I am not using any enterprise auto mounts (that I know of).
Still seen in sssd-2.1.0-2.fc30.x86_64, autofs-5.1.5-3.fc30.x86_64. Currently using workaround described above -- i.e., don't configure FreeIPA automounts and write the relevant /etc/auto.whatever bits by hand.
I also use autofs with maps defined in LDAP. If I select sss as the NSS provider, then I see the following in my logs: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument autofs.service: Start operation timed out. Terminating. Running "sss_cache -E" allows autofs to start properly. However, things revert back to the error condition above after a reboot. In order to allow things to work in a persistent way that survives reboots, I have replaced "sss" with "ldap" as my NSS provider for autofs in /etc/nsswitch.conf. I would prefer to consistently use sss.
FWIW, I've started seeing this on an Oracle Linux 8 server after updating to 8.1: $ rpm -qa |grep -E "(autofs|sssd)" |sort autofs-5.1.4-35.0.1.el8.x86_64 libsss_autofs-2.2.0-19.el8.x86_64 sssd-ad-2.2.0-19.el8.x86_64 sssd-client-2.2.0-19.el8.x86_64 sssd-common-2.2.0-19.el8.x86_64 sssd-common-pac-2.2.0-19.el8.x86_64 sssd-kcm-2.2.0-19.el8.x86_64 sssd-krb5-2.2.0-19.el8.x86_64 sssd-krb5-common-2.2.0-19.el8.x86_64 Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Paths. Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Timers. Nov 15 23:19:02 paaitstest01 systemd[1801]: Listening on D-Bus User Message Bus Socket. Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Sockets. Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Basic System. Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Default. Nov 15 23:19:02 paaitstest01 systemd[1801]: Startup finished in 46ms. Nov 15 23:19:02 paaitstest01 systemd[1]: Started User Manager for UID 48517. Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument Nov 15 23:20:22 paaitstest01 automount[1333]: lookup(file): failed to read included master map auto.master Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master files auto.master Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null) Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry /misc Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry /net Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry +dir:/etc/auto.master.d Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master dir /etc/auto.master.d Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(dir): scandir: /etc/auto.master.d Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry +auto.master Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master files auto.master Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null) Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master sss auto.master Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null) Nov 15 23:20:22 paaitstest01 automount[1333]: lookup(file): failed to read included master map auto.master Nov 15 23:20:22 paaitstest01 automount[1333]: master_do_mount: mounting /misc Nov 15 23:20:22 paaitstest01 automount[1333]: automount_path_to_fifo: fifo name /run/autofs.fifo-misc Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map file /etc/auto.misc Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null) Nov 15 23:20:22 paaitstest01 automount[1333]: mounted indirect on /misc with timeout 300, freq 75 seconds Nov 15 23:20:22 paaitstest01 automount[1333]: st_ready: st_ready(): state = 0 path /misc Nov 15 23:20:22 paaitstest01 automount[1333]: master_do_mount: mounting /net Nov 15 23:20:22 paaitstest01 automount[1333]: automount_path_to_fifo: fifo name /run/autofs.fifo-net Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map hosts (null) Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null) Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_map: lookup(hosts): read hosts map Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_map: lookup(hosts): map not browsable, update existing host entries only Nov 15 23:20:22 paaitstest01 automount[1333]: mounted indirect on /net with timeout 300, freq 75 seconds Nov 15 23:20:22 paaitstest01 automount[1333]: st_ready: st_ready(): state = 0 path /net Nov 15 23:20:22 paaitstest01 automount[1333]: master_do_mount: mounting /nethome Nov 15 23:20:22 paaitstest01 automount[1333]: automount_path_to_fifo: fifo name /run/autofs.fifo-nethome Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map files auto.home Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map sss auto.home Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null) Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_map: map read not needed, so not done Nov 15 23:20:22 paaitstest01 automount[1333]: mounted indirect on /nethome with timeout 300, freq 75 seconds Nov 15 23:20:22 paaitstest01 automount[1333]: st_ready: st_ready(): state = 0 path /nethome A 1m20s delay in starting autofs, and therefore completing multi-user.target
As suggested by James, it doesn't happen for all the servers with the same configuration consistently. Just after boot I can see this: $ sudo systemctl list-jobs JOB UNIT TYPE STATE 258 systemd-update-utmp-runlevel.service start waiting 239 autofs.service start running 120 multi-user.target start waiting
By comparison RHEL/OL 8 has the following packages, and I've not seen the problem there: $ rpm -qa |grep -E "(autofs|sssd)"|sort autofs-5.1.4-29.0.1.el8.x86_64 libsss_autofs-2.0.0-43.el8_0.3.x86_64 sssd-ad-2.0.0-43.el8_0.3.x86_64 sssd-client-2.0.0-43.el8_0.3.x86_64 sssd-common-2.0.0-43.el8_0.3.x86_64 sssd-common-pac-2.0.0-43.el8_0.3.x86_64 sssd-kcm-2.0.0-43.el8_0.3.x86_64 sssd-krb5-2.0.0-43.el8_0.3.x86_64 sssd-krb5-common-2.0.0-43.el8_0.3.x86_64
8.0 that is!
This problem seems to remain in Fedora 31.
I can confirm this problem on Fedora 31. In my case autofs service always takes 90 seconds to start (seems to be a timeout), only then automounts work: Jan 11 07:16:39 x.y.z systemd[1]: Starting Automounts filesystems on demand... Jan 11 07:18:09 x.y.z automount[1532]: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument Jan 11 07:18:09 x.y.z automount[1532]: setautomntent: lookup(sss): setautomntent: No such file or directory Jan 11 07:18:09 x.y.z systemd[1]: Started Automounts filesystems on demand.
We are also experiencing this issue on Fedora 31 and CentOS 8.1. The issue seems to be that sssd is not yet online when autofs starts, and this causes a delay. This can easily be reproduced: systemctl stop autofs.service pkill -USR1 sssd ; sleep 5 ; pkill -USR2 sssd In that 5s sleep, start autofs. You will notice that autofs takes +90s to start. If you run the automount command manually in strace, there's a read call that takes 90s. As a workaround I've set the following parameter in the [autofs] section of sssd.conf: client_idle_timeout = 10 This reduces the autofs startup time from 90s to about 15s. There seems to be something wrong with sssd autofs if sssd is in offline mode. Regards, Rik
Related upstream ticket https://pagure.io/SSSD/sssd/issue/4113
This package has changed maintainer in the Fedora. Reassigning to the new maintainer of this component.
Created attachment 1697980 [details] vagrant script for reproducer
Hello all, I've got weird autofs/sssd errors as well. I still have them with a centos8.2 ipa server using a fedora32 ipaclient. I've build a reproducer in vagrant. To use it, install vagrant on a fedora32 host. Configure the host for kvm virtualization. create an empty directory, - mkdir vagrant put the Vagrant script in that dir cd to that dir check the script if the DOMAIN_URI var is ok give the command : vagrant up wait for a long time for the script to finish when the script is finished: log in to the fedoraclient with vagrant: vagrant ssh fedoraclient become root: sudo -i check to see if the home directory is mounted: mount it will claim to be active however a directory listing will show the home folder of the user test is missing: ls -als /home restart autofs: systemctl restart autofs and now the home dir is mounted, list the home dir: ls -als /home and you will see the test user home dir. if for some reason the mount is working the first time, run the reproducer script as root /root/reproducer.sh and try again after the reboot. Cheers Rob
I found this fixed with Fedora 32. I use Kerberos for authentication and LDAP/NFS/autofs for home directories. Has anyone else had luck? I configured my host with `authselect select sssd --force`. The result, which seems to work fine, follows: # Generated by authselect on Fri Jun 19 16:36:27 2020 # Do not modify this file manually. # If you want to make changes to nsswitch.conf please modify # /etc/authselect/user-nsswitch.conf and run 'authselect apply-changes'. # # Note that your changes may not be applied as they may be # overwritten by selected profile. Maps set in the authselect # profile takes always precedence and overwrites the same maps # set in the user file. Only maps that are not set by the profile # are applied from the user file. # # For example, if the profile sets: # passwd: sss files # and /etc/authselect/user-nsswitch.conf contains: # passwd: files # hosts: files dns # the resulting generated nsswitch.conf will be: # passwd: sss files # from profile # hosts: files dns # from user file passwd: sss files systemd group: sss files systemd netgroup: sss files automount: sss files services: sss files # Included from /etc/authselect/user-nsswitch.conf # # /etc/nsswitch.conf # # Name Service Switch config file. This file should be # sorted with the most-used services at the beginning. # # Valid databases are: aliases, ethers, group, gshadow, hosts, # initgroups, netgroup, networks, passwd, protocols, publickey, # rpc, services, and shadow. # # Valid service provider entries include (in alphabetical order): # # compat Use /etc files plus *_compat pseudo-db # db Use the pre-processed /var/db files # dns Use DNS (Domain Name Service) # files Use the local files in /etc # hesiod Use Hesiod (DNS) for user lookups # nis Use NIS (NIS version 2), also called YP # nisplus Use NIS+ (NIS version 3) # # See `info libc 'NSS Basics'` for more information. # # Commonly used alternative service providers (may need installation): # # ldap Use LDAP directory server # myhostname Use systemd host names # mymachines Use systemd machine names # mdns*, mdns*_minimal Use Avahi mDNS/DNS-SD # resolve Use systemd resolved resolver # sss Use System Security Services Daemon (sssd) # systemd Use systemd for dynamic user option # winbind Use Samba winbind support # wins Use Samba wins support # wrapper Use wrapper module for testing # # Notes: # # 'sssd' performs its own 'files'-based caching, so it should generally # come before 'files'. # # WARNING: Running nscd with a secondary caching service like sssd may # lead to unexpected behaviour, especially with how long # entries are cached. # # Installation instructions: # # To use 'db', install the appropriate package(s) (provide 'makedb' and # libnss_db.so.*), and place the 'db' in front of 'files' for entries # you want to be looked up first in the databases, like this: # # passwd: db files # shadow: db files # group: db files # In order of likelihood of use to accelerate lookup. shadow: files sss hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname aliases: files ethers: files gshadow: files # Allow initgroups to default to the setting for group. # initgroups: files networks: files dns protocols: files publickey: files rpc: files
Hi everyone. I'm sorry for the delay. We have changed the autofs responder in sssd-2.2.3 quite a lot (available since F31 from March this year) that should improve the situation. We are also working with autofs maintainer to improve the communication between autofs and sssd further so this no longer happens. In the mean time, you can try setting sss_master_map_wait in autofs.conf to some non-zero value. Please, let me know if this helps and if there are further issues.
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Pushed PR: https://github.com/SSSD/sssd/pull/5343 * `master` * 075519bceca7a8f4fa28a0b7c538f2f50d552d13 - configure: check for stdatomic.h * 8a22d4ad45f5fc8e888be693539495093c2b3c35 - autofs: correlate errors for different protocol versions * 34c519a4851194164befc150df8e768431e66405 - autofs: disable fast reply * 9098108a7142513fa04afdf92a2c1b3ac002c56e - autofs: translate ERR_OFFLINE to EHOSTDOWN * e50258da70b67ff1b0f928e2e7875bc2fa32dfde - autofs: return ERR_OFFLINE if we fail to get information from backend and cache is empty * 3f0ba4c2dcf9126b0f94bca4a056b516759d25c1 - cache_req: allow cache_req to return ERR_OFFLINE if all dp request failed These patches will help autofs to distinguish between "sssd is offline" and "sssd did not found any autofs maps". Note that it also requires update in autofs which will land separately. Until then, please use "sss_master_map_wait" autofs option.
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle. Changing version to 34.
Fixed since sssd-2.4.1