Bug 1648521 - Regression: sssd sssd-2.0.0-4 slows down autofs - 'getautomntent_r: Invalid argument'
Summary: Regression: sssd sssd-2.0.0-4 slows down autofs - 'getautomntent_r: Invalid a...
Keywords:
Status: POST
Alias: None
Product: Fedora
Classification: Fedora
Component: sssd
Version: 34
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Pavel Březina
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 1560290 (view as bug list)
Depends On:
Blocks: 1803075
TreeView+ depends on / blocked
 
Reported: 2018-11-10 09:23 UTC by James
Modified: 2021-02-09 15:07 UTC (History)
20 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1803075 (view as bug list)
Environment:
Last Closed:
Type: Bug


Attachments (Terms of Use)
vagrant script for reproducer (14.99 KB, text/plain)
2020-06-18 15:43 UTC, rob.verduijn
no flags Details

Description James 2018-11-10 09:23:45 UTC
Description of problem:
After upgrading to sssd-2.0.0-4.fc29.x86_64, autofs-5.1.4-21.fc29.x86_64 seems to have trouble finding its mappings as quickly as it did. Immediately after logging in, if I do
 
  $ ls /nas/home/james  (autofs NFSv4 mount)

I get a 'no such file or directory' error. However if I wait a few minutes and try again, it works. The only clue seems to be an error message about "getautomntent_r: Invalid argument" in the logs for autofs:


Nov 10 09:13:48 kowalski.cb.ettle systemd[1]: Starting Automounts filesystems on demand...
Nov 10 09:15:18 kowalski.cb.ettle automount[1192]: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument
Nov 10 09:15:18 kowalski.cb.ettle systemd[1]: Started Automounts filesystems on demand.


From journalctl -u sssd in this boot:


Nov 10 09:13:43 kowalski.cb.ettle systemd[1]: Starting System Security Services Daemon...
Nov 10 09:13:43 kowalski.cb.ettle sssd[682]: Starting up
Nov 10 09:13:43 kowalski.cb.ettle sssd[be[implicit_files]][751]: Starting up
Nov 10 09:13:43 kowalski.cb.ettle sssd[be[cb.ettle]][752]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[nss][753]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[autofs][757]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[pac][758]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[sudo][754]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[ssh][756]: Starting up
Nov 10 09:13:44 kowalski.cb.ettle sssd[pam][755]: Starting up
Nov 10 09:13:45 kowalski.cb.ettle systemd[1]: Started System Security Services Daemon.
Nov 10 09:13:46 kowalski.cb.ettle sssd[be[cb.ettle]][752]: Backend is offline
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 2
Nov 10 09:13:49 kowalski.cb.ettle sssd[be[cb.ettle]][752]: Backend is online
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 1
Nov 10 09:13:49 kowalski.cb.ettle sssd_be[752]: GSSAPI client step 2


Works OK with no delay if I downgrade to sssd-2.0.0-3.fc29.x86_64.

Comment 1 Lukas Slebodnik 2018-11-10 15:39:25 UTC
Some delays are reported also in BZ1644919. Jakub promised to prepare a scratch builds but I cannot see it in comments.

James. it's just FYI and bugs can be unrelated. But might be good to know anbout other bug.

And I am sorry for troubles.

Comment 2 James 2018-11-10 17:32:15 UTC
Note that this is not consistent. I have one workstation that does it repeatedly, but two others that do not. No idea what the difference is.

Comment 3 James 2018-12-18 23:32:20 UTC
Anything further on this? Or debug tips so I can gather more information?

Comment 4 James 2018-12-18 23:33:18 UTC
*** Bug 1560290 has been marked as a duplicate of this bug. ***

Comment 5 Larry O'Leary 2019-05-12 21:16:01 UTC
Any progress on this? 

This issue is preventing autofs from working. I downgraded autofs as described in comment 0 and confirmed that in the previous version things were working. sss failures still are reported but they are reported much faster and therefore autofs eventually starts.

For now, I have removed sss from nsswitch.conf's automount option as a solution as I am not using any enterprise auto mounts (that I know of).

Comment 6 James 2019-05-12 21:30:52 UTC
Still seen in sssd-2.1.0-2.fc30.x86_64, autofs-5.1.5-3.fc30.x86_64.

Currently using workaround described above -- i.e., don't configure FreeIPA automounts and write the relevant /etc/auto.whatever bits by hand.

Comment 7 W. Michael Petullo 2019-08-28 00:56:19 UTC
I also use autofs with maps defined in LDAP.

If I select sss as the NSS provider, then I see the following in my logs:

lookup_read_master: lookup(sss): getautomntent_r: Invalid argument
autofs.service: Start operation timed out. Terminating.

Running "sss_cache -E" allows autofs to start properly. However, things revert back to the error condition above after a reboot.

In order to allow things to work in a persistent way that survives reboots, I have replaced "sss" with "ldap" as my NSS provider for autofs in /etc/nsswitch.conf.

I would prefer to consistently use sss.

Comment 8 John Beranek 2019-11-15 23:32:15 UTC
FWIW, I've started seeing this on an Oracle Linux 8 server after updating to 8.1:

$ rpm -qa |grep -E "(autofs|sssd)" |sort
autofs-5.1.4-35.0.1.el8.x86_64
libsss_autofs-2.2.0-19.el8.x86_64
sssd-ad-2.2.0-19.el8.x86_64
sssd-client-2.2.0-19.el8.x86_64
sssd-common-2.2.0-19.el8.x86_64
sssd-common-pac-2.2.0-19.el8.x86_64
sssd-kcm-2.2.0-19.el8.x86_64
sssd-krb5-2.2.0-19.el8.x86_64
sssd-krb5-common-2.2.0-19.el8.x86_64

Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Paths.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Timers.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Listening on D-Bus User Message Bus Socket.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Sockets.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Basic System.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Reached target Default.
Nov 15 23:19:02 paaitstest01 systemd[1801]: Startup finished in 46ms.
Nov 15 23:19:02 paaitstest01 systemd[1]: Started User Manager for UID 48517.
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup(file): failed to read included master map auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master files auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry /misc
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry /net
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry +dir:/etc/auto.master.d
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master dir /etc/auto.master.d
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(dir): scandir: /etc/auto.master.d
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_master: lookup(file): read entry +auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master files auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_master: reading master sss auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup(file): failed to read included master map auto.master
Nov 15 23:20:22 paaitstest01 automount[1333]: master_do_mount: mounting /misc
Nov 15 23:20:22 paaitstest01 automount[1333]: automount_path_to_fifo: fifo name /run/autofs.fifo-misc
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map file /etc/auto.misc
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: mounted indirect on /misc with timeout 300, freq 75 seconds
Nov 15 23:20:22 paaitstest01 automount[1333]: st_ready: st_ready(): state = 0 path /misc
Nov 15 23:20:22 paaitstest01 automount[1333]: master_do_mount: mounting /net
Nov 15 23:20:22 paaitstest01 automount[1333]: automount_path_to_fifo: fifo name /run/autofs.fifo-net
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map hosts (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_map: lookup(hosts): read hosts map
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_map: lookup(hosts): map not browsable, update existing host entries only
Nov 15 23:20:22 paaitstest01 automount[1333]: mounted indirect on /net with timeout 300, freq 75 seconds
Nov 15 23:20:22 paaitstest01 automount[1333]: st_ready: st_ready(): state = 0 path /net
Nov 15 23:20:22 paaitstest01 automount[1333]: master_do_mount: mounting /nethome
Nov 15 23:20:22 paaitstest01 automount[1333]: automount_path_to_fifo: fifo name /run/autofs.fifo-nethome
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map files auto.home
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_nss_read_map: reading map sss auto.home
Nov 15 23:20:22 paaitstest01 automount[1333]: do_init: parse(sun): init gathered global options: (null)
Nov 15 23:20:22 paaitstest01 automount[1333]: lookup_read_map: map read not needed, so not done
Nov 15 23:20:22 paaitstest01 automount[1333]: mounted indirect on /nethome with timeout 300, freq 75 seconds
Nov 15 23:20:22 paaitstest01 automount[1333]: st_ready: st_ready(): state = 0 path /nethome

A 1m20s delay in starting autofs, and therefore completing multi-user.target

Comment 9 John Beranek 2019-11-15 23:45:36 UTC
As suggested by James, it doesn't happen for all the servers with the same configuration consistently.

Just after boot I can see this:

$ sudo systemctl list-jobs
JOB UNIT                                 TYPE  STATE
258 systemd-update-utmp-runlevel.service start waiting
239 autofs.service                       start running
120 multi-user.target                    start waiting

Comment 10 John Beranek 2019-11-15 23:47:45 UTC
By comparison RHEL/OL 8 has the following packages, and I've not seen the problem there:

$ rpm -qa |grep -E "(autofs|sssd)"|sort
autofs-5.1.4-29.0.1.el8.x86_64
libsss_autofs-2.0.0-43.el8_0.3.x86_64
sssd-ad-2.0.0-43.el8_0.3.x86_64
sssd-client-2.0.0-43.el8_0.3.x86_64
sssd-common-2.0.0-43.el8_0.3.x86_64
sssd-common-pac-2.0.0-43.el8_0.3.x86_64
sssd-kcm-2.0.0-43.el8_0.3.x86_64
sssd-krb5-2.0.0-43.el8_0.3.x86_64
sssd-krb5-common-2.0.0-43.el8_0.3.x86_64

Comment 11 John Beranek 2019-11-15 23:51:36 UTC
8.0 that is!

Comment 12 W. Michael Petullo 2019-12-20 00:08:57 UTC
This problem seems to remain in Fedora 31.

Comment 13 Johannes Beichter 2020-01-11 06:55:16 UTC
I can confirm this problem on Fedora 31. In my case autofs service always takes 90 seconds to start (seems to be a timeout), only then automounts work:

Jan 11 07:16:39 x.y.z systemd[1]: Starting Automounts filesystems on demand...
Jan 11 07:18:09 x.y.z automount[1532]: lookup_read_master: lookup(sss): getautomntent_r: Invalid argument
Jan 11 07:18:09 x.y.z automount[1532]: setautomntent: lookup(sss): setautomntent: No such file or directory
Jan 11 07:18:09 x.y.z systemd[1]: Started Automounts filesystems on demand.

Comment 24 Rik Theys 2020-02-18 08:19:17 UTC
We are also experiencing this issue on Fedora 31 and CentOS 8.1.

The issue seems to be that sssd is not yet online when autofs starts, and this causes a delay.

This can easily be reproduced:

systemctl stop autofs.service
pkill -USR1 sssd ; sleep 5 ; pkill -USR2 sssd

In that 5s sleep, start autofs.

You will notice that autofs takes +90s to start. If you run the automount command manually in strace, there's a read call that takes 90s.

As a workaround I've set the following parameter in the [autofs] section of sssd.conf:

client_idle_timeout = 10

This reduces the autofs startup time from 90s to about 15s.

There seems to be something wrong with sssd autofs if sssd is in offline mode.

Regards,
Rik

Comment 25 Tomas Halman 2020-03-13 14:14:43 UTC
Related upstream ticket https://pagure.io/SSSD/sssd/issue/4113

Comment 26 Fedora Admin user for bugzilla script actions 2020-06-18 14:59:18 UTC
This package has changed maintainer in the Fedora.
Reassigning to the new maintainer of this component.

Comment 27 rob.verduijn 2020-06-18 15:43:12 UTC
Created attachment 1697980 [details]
vagrant script for reproducer

Comment 28 rob.verduijn 2020-06-18 15:46:20 UTC
Hello all,

I've got weird autofs/sssd errors as well.
I still have them with a centos8.2 ipa server using a fedora32 ipaclient.

I've build a reproducer in vagrant.

To use it, install vagrant on a fedora32 host.
Configure the host for kvm virtualization.

create an empty directory,
- mkdir vagrant

put the Vagrant script in that dir
cd to that dir

check the script if the DOMAIN_URI var is ok

give the command : vagrant up

wait for a long time for the script to finish

when the script is finished:
log in to the fedoraclient with vagrant:
vagrant ssh fedoraclient
become root:
sudo -i

check to see if the home directory is mounted:
mount
it will claim to be active
however a directory listing will show the home folder of the user test is missing:
ls -als /home

restart autofs:
systemctl restart autofs

and now the home dir is mounted, list the home dir:
ls -als /home 
and you will see the test user home dir.

if for some reason the mount is working the first time, run the reproducer script as root
/root/reproducer.sh
and try again after the reboot.

Cheers
Rob

Comment 29 W. Michael Petullo 2020-06-19 22:11:08 UTC
I found this fixed with Fedora 32. I use Kerberos for authentication and LDAP/NFS/autofs for home directories. Has anyone else had luck?

I configured my host with `authselect select sssd --force`.

The result, which seems to work fine, follows:

# Generated by authselect on Fri Jun 19 16:36:27 2020
# Do not modify this file manually.

# If you want to make changes to nsswitch.conf please modify
# /etc/authselect/user-nsswitch.conf and run 'authselect apply-changes'.
#
# Note that your changes may not be applied as they may be
# overwritten by selected profile. Maps set in the authselect
# profile takes always precedence and overwrites the same maps
# set in the user file. Only maps that are not set by the profile
# are applied from the user file.
#
# For example, if the profile sets:
#     passwd: sss files
# and /etc/authselect/user-nsswitch.conf contains:
#     passwd: files
#     hosts: files dns
# the resulting generated nsswitch.conf will be:
#     passwd: sss files # from profile
#     hosts: files dns  # from user file

passwd:     sss files systemd
group:      sss files systemd
netgroup:   sss files
automount:  sss files
services:   sss files

# Included from /etc/authselect/user-nsswitch.conf

#
# /etc/nsswitch.conf
#
# Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# Valid databases are: aliases, ethers, group, gshadow, hosts,
# initgroups, netgroup, networks, passwd, protocols, publickey,
# rpc, services, and shadow.
#
# Valid service provider entries include (in alphabetical order):
#
#	compat			Use /etc files plus *_compat pseudo-db
#	db			Use the pre-processed /var/db files
#	dns			Use DNS (Domain Name Service)
#	files			Use the local files in /etc
#	hesiod			Use Hesiod (DNS) for user lookups
#	nis			Use NIS (NIS version 2), also called YP
#	nisplus			Use NIS+ (NIS version 3)
#
# See `info libc 'NSS Basics'` for more information.
#
# Commonly used alternative service providers (may need installation):
#
#	ldap			Use LDAP directory server
#	myhostname		Use systemd host names
#	mymachines		Use systemd machine names
#	mdns*, mdns*_minimal	Use Avahi mDNS/DNS-SD
#	resolve			Use systemd resolved resolver
#	sss			Use System Security Services Daemon (sssd)
#	systemd			Use systemd for dynamic user option
#	winbind			Use Samba winbind support
#	wins			Use Samba wins support
#	wrapper			Use wrapper module for testing
#
# Notes:
#
# 'sssd' performs its own 'files'-based caching, so it should generally
# come before 'files'.
#
# WARNING: Running nscd with a secondary caching service like sssd may
# 	   lead to unexpected behaviour, especially with how long
# 	   entries are cached.
#
# Installation instructions:
#
# To use 'db', install the appropriate package(s) (provide 'makedb' and
# libnss_db.so.*), and place the 'db' in front of 'files' for entries
# you want to be looked up first in the databases, like this:
#
# passwd:    db files
# shadow:    db files
# group:     db files

# In order of likelihood of use to accelerate lookup.
shadow:     files sss
hosts:      files mdns4_minimal [NOTFOUND=return] dns myhostname

aliases:    files
ethers:     files
gshadow:    files
# Allow initgroups to default to the setting for group.
# initgroups: files
networks:   files dns
protocols:  files
publickey:  files
rpc:        files

Comment 30 Pavel Březina 2020-07-28 11:49:54 UTC
Hi everyone. I'm sorry for the delay. We have changed the autofs responder in sssd-2.2.3 quite a lot (available since F31 from March this year) that should improve the situation. We are also working with autofs maintainer to improve the communication between autofs and sssd further so this no longer happens.

In the mean time, you can try setting sss_master_map_wait in autofs.conf to some non-zero value. Please, let me know if this helps and if there are further issues.

Comment 32 Ben Cotton 2020-11-03 15:05:05 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 33 Pavel Březina 2020-12-04 11:00:32 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5343

* `master`
    * 075519bceca7a8f4fa28a0b7c538f2f50d552d13 - configure: check for stdatomic.h
    * 8a22d4ad45f5fc8e888be693539495093c2b3c35 - autofs: correlate errors for different protocol versions
    * 34c519a4851194164befc150df8e768431e66405 - autofs: disable fast reply
    * 9098108a7142513fa04afdf92a2c1b3ac002c56e - autofs: translate ERR_OFFLINE to EHOSTDOWN
    * e50258da70b67ff1b0f928e2e7875bc2fa32dfde - autofs: return ERR_OFFLINE if we fail to get information from backend and cache is empty
    * 3f0ba4c2dcf9126b0f94bca4a056b516759d25c1 - cache_req: allow cache_req to return ERR_OFFLINE if all dp request failed

These patches will help autofs to distinguish between "sssd is offline" and "sssd did not found any autofs maps". Note that it also requires update in autofs which will land separately. Until then, please use "sss_master_map_wait" autofs option.

Comment 34 Ben Cotton 2021-02-09 15:07:02 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.


Note You need to log in before you can comment on or make changes to this bug.