Bug 1648521 - Regression: sssd sssd-2.0.0-4 slows down autofs - 'getautomntent_r: Invalid argument'
Summary: Regression: sssd sssd-2.0.0-4 slows down autofs - 'getautomntent_r: Invalid a...
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: sssd
Version: 31
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Michal Zidek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 1560290 (view as bug list)
Depends On:
Blocks: 1803075
TreeView+ depends on / blocked
 
Reported: 2018-11-10 09:23 UTC by James Ettle
Modified: 2020-03-13 14:14 UTC (History)
18 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1803075 (view as bug list)
Environment:
Last Closed:
Type: Bug


Attachments (Terms of Use)

Description James Ettle 2018-11-10 09:23:45 UTC
Description of problem:
After upgrading to sssd-2.0.0-4.fc29.x86_64, autofs-5.1.4-21.fc29.x86_64 seems to have trouble finding its mappings as quickly as it did. Immediately after logging in, if I do
 
  $ ls /nas/home/james  (autofs NFSv4 mount)

I get a 'no such file or directory' error. However if I wait a few minutes and try again, it works. The only clue seems to be an error message about "getautomntent_r: Invalid argument" in the logs for autofs:


Nov 10 09:13:48 kowalski.cb.ettle systemd[1]: Starting Automounts filesystems on demand...
Nov 10 09:15:18 kowalski.cb.ettle automount[1192]: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument
Nov 10 09:15:18 kowalski.cb.ettle systemd[1]: Started Automounts filesystems on demand.


From journalctl -u sssd in this boot:


Nov 10 09:13:43 kowalski.cb.ettle systemd[1]: Starting System Security Services Daemon...
Nov 10 09:13:43 kowalski.cb.ettle sssd[682]: Starting up
Nov 10 09:13:43 kowalski.cb.ettle sssd[be[implicit_files]][751]: Starting up
Nov 10 09:13:43 kowalski.cb.ettle sssd[be[cb.ettle]][752]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[nss][753]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[autofs][757]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[pac][758]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[sudo][754]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[ssh][756]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[pam][755]: Starting up
Nov 10 09:13:45 kowalski.cb.ettle systemd[1]: Started System Security Services Daemon.
Nov 10 09:13:46 kowalski.cb.ettle sssd[be[cb.ettle]][752]: Backend is offline
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 2
Nov 10 09:13:49 kowalski.cb.ettle sssd[be[cb.ettle]][752]: Backend is online
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 2


Works OK with no delay if I downgrade to sssd-2.0.0-3.fc29.x86_64.

Comment 1 Lukas Slebodnik 2018-11-10 15:39:25 UTC
Some delays are reported also in BZ1644919. Jakub promised to prepare a scratch builds but I cannot see it in comments.

James. it's just FYI and bugs can be unrelated. But might be good to know anbout other bug.

And I am sorry for troubles.

Comment 2 James Ettle 2018-11-10 17:32:15 UTC
Note that this is not consistent. I have one workstation that does it repeatedly, but two others that do not. No idea what the difference is.

Comment 3 James Ettle 2018-12-18 23:32:20 UTC
Anything further on this? Or debug tips so I can gather more information?

Comment 4 James Ettle 2018-12-18 23:33:18 UTC
*** Bug 1560290 has been marked as a duplicate of this bug. ***

Comment 5 Larry O'Leary 2019-05-12 21:16:01 UTC
Any progress on this? 

This issue is preventing autofs from working. I downgraded autofs as described in comment 0 and confirmed that in the previous version things were working. sss failures still are reported but they are reported much faster and therefore autofs eventually starts.

For now, I have removed sss from nsswitch.conf's automount option as a solution as I am not using any enterprise auto mounts (that I know of).

Comment 6 James Ettle 2019-05-12 21:30:52 UTC
Still seen in sssd-2.1.0-2.fc30.x86_64, autofs-5.1.5-3.fc30.x86_64.

Currently using workaround described above -- i.e., don't configure FreeIPA automounts and write the relevant /etc/auto.whatever bits by hand.

Comment 7 W. Michael Petullo 2019-08-28 00:56:19 UTC
I also use autofs with maps defined in LDAP.

If I select sss as the NSS provider, then I see the following in my logs:

lookup_read_master: lookup(sss): getautomntent_r: Invalid argument
autofs.service: Start operation timed out. Terminating.

Running "sss_cache -E" allows autofs to start properly. However, things revert back to the error condition above after a reboot.

In order to allow things to work in a persistent way that survives reboots, I have replaced "sss" with "ldap" as my NSS provider for autofs in /etc/nsswitch.conf.

I would prefer to consistently use sss.

Comment 8 John Beranek 2019-11-15 23:32:15 UTC
FWIW, I've started seeing this on an Oracle Linux 8 server after updating to 8.1:

$ rpm -qa |grep -E "(autofs|sssd)" |sort
autofs-5.1.4-35.0.1.el8.x86_64
libsss_autofs-2.2.0-19.el8.x86_64
sssd-ad-2.2.0-19.el8.x86_64
sssd-client-2.2.0-19.el8.x86_64
sssd-common-2.2.0-19.el8.x86_64
sssd-common-pac-2.2.0-19.el8.x86_64
sssd-kcm-2.2.0-19.el8.x86_64
sssd-krb5-2.2.0-19.el8.x86_64
sssd-krb5-common-2.2.0-19.el8.x86_64

Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Paths.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Timers.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Listening on D-Bus User Message Bus Socket.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Sockets.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Basic System.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Default.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Startup finished in 46ms.
Nov 15 23:19:02 paaitstest01 systemd[1]: Started User Manager for UID 48517.
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup(file): failed to read included master map auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master files auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry /misc
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry /net
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry +dir:/etc/auto.master.d
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master dir /etc/auto.master.d
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(dir): scandir: /etc/auto.master.d
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry +auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master files auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master sss auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup(file): failed to read included master map auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: master_do_mount: mounting /misc
Nov 15 23:20:22 paaitstest01 automount[1333]: automount_path_to_fifo: fifo name /run/autofs.fifo-misc
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map file /etc/auto.misc
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: mounted indirect on /misc with timeout 300, freq 75 seconds
Nov 15 23:20:22 paaitstest01 automount[1333]: st_ready: st_ready(): state = 0 path /misc
Nov 15 23:20:22 paaitstest01 automount[1333]: master_do_mount: mounting /net
Nov 15 23:20:22 paaitstest01 automount[1333]: automount_path_to_fifo: fifo name /run/autofs.fifo-net
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map hosts (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_map: lookup(hosts): read hosts map
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_map: lookup(hosts): map not browsable, update existing host entries only
Nov 15 23:20:22 paaitstest01 automount[1333]: mounted indirect on /net with timeout 300, freq 75 seconds
Nov 15 23:20:22 paaitstest01 automount[1333]: st_ready: st_ready(): state = 0 path /net
Nov 15 23:20:22 paaitstest01 automount[1333]: master_do_mount: mounting /nethome
Nov 15 23:20:22 paaitstest01 automount[1333]: automount_path_to_fifo: fifo name /run/autofs.fifo-nethome
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map files auto.home
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map sss auto.home
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_map: map read not needed, so not done
Nov 15 23:20:22 paaitstest01 automount[1333]: mounted indirect on /nethome with timeout 300, freq 75 seconds
Nov 15 23:20:22 paaitstest01 automount[1333]: st_ready: st_ready(): state = 0 path /nethome

A 1m20s delay in starting autofs, and therefore completing multi-user.target

Comment 9 John Beranek 2019-11-15 23:45:36 UTC
As suggested by James, it doesn't happen for all the servers with the same configuration consistently.

Just after boot I can see this:

$ sudo systemctl list-jobs
JOB UNIT                                 TYPE  STATE
258 systemd-update-utmp-runlevel.service start waiting
239 autofs.service                       start running
120 multi-user.target                    start waiting

Comment 10 John Beranek 2019-11-15 23:47:45 UTC
By comparison RHEL/OL 8 has the following packages, and I've not seen the problem there:

$ rpm -qa |grep -E "(autofs|sssd)"|sort
autofs-5.1.4-29.0.1.el8.x86_64
libsss_autofs-2.0.0-43.el8_0.3.x86_64
sssd-ad-2.0.0-43.el8_0.3.x86_64
sssd-client-2.0.0-43.el8_0.3.x86_64
sssd-common-2.0.0-43.el8_0.3.x86_64
sssd-common-pac-2.0.0-43.el8_0.3.x86_64
sssd-kcm-2.0.0-43.el8_0.3.x86_64
sssd-krb5-2.0.0-43.el8_0.3.x86_64
sssd-krb5-common-2.0.0-43.el8_0.3.x86_64

Comment 11 John Beranek 2019-11-15 23:51:36 UTC
8.0 that is!

Comment 12 W. Michael Petullo 2019-12-20 00:08:57 UTC
This problem seems to remain in Fedora 31.

Comment 13 Johannes Beichter 2020-01-11 06:55:16 UTC
I can confirm this problem on Fedora 31. In my case autofs service always takes 90 seconds to start (seems to be a timeout), only then automounts work:

Jan 11 07:16:39 x.y.z systemd[1]: Starting Automounts filesystems on demand...
Jan 11 07:18:09 x.y.z automount[1532]: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument
Jan 11 07:18:09 x.y.z automount[1532]: setautomntent: lookup(sss): setautomntent: No such file or directory
Jan 11 07:18:09 x.y.z systemd[1]: Started Automounts filesystems on demand.

Comment 24 Rik Theys 2020-02-18 08:19:17 UTC
We are also experiencing this issue on Fedora 31 and CentOS 8.1.

The issue seems to be that sssd is not yet online when autofs starts, and this causes a delay.

This can easily be reproduced:

systemctl stop autofs.service
pkill -USR1 sssd ; sleep 5 ; pkill -USR2 sssd

In that 5s sleep, start autofs.

You will notice that autofs takes +90s to start. If you run the automount command manually in strace, there's a read call that takes 90s.

As a workaround I've set the following parameter in the [autofs] section of sssd.conf:

client_idle_timeout = 10

This reduces the autofs startup time from 90s to about 15s.

There seems to be something wrong with sssd autofs if sssd is in offline mode.

Regards,
Rik

Comment 25 Tomas Halman 2020-03-13 14:14:43 UTC
Related upstream ticket https://pagure.io/SSSD/sssd/issue/4113


Note You need to log in before you can comment on or make changes to this bug.