RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2165143 - Autofs reports can't connect to sssd, retry for 10 seconds when real problem is empty LDAP object
Summary: Autofs reports can't connect to sssd, retry for 10 seconds when real problem ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: autofs
Version: 8.7
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: Ian Kent
QA Contact: Kun Wang
URL:
Whiteboard:
Depends On:
Blocks: 2170287
TreeView+ depends on / blocked
 
Reported: 2023-01-27 21:20 UTC by Bill Marshall
Modified: 2023-05-16 11:06 UTC (History)
3 users (show)

Fixed In Version: autofs-5.1.4-93.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2170287 (view as bug list)
Environment:
Last Closed: 2023-05-16 09:05:44 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-146785 0 None None None 2023-01-27 21:21:22 UTC
Red Hat Product Errata RHBA-2023:2970 0 None None None 2023-05-16 09:05:59 UTC

Description Bill Marshall 2023-01-27 21:20:42 UTC
Description of problem:
Customer is using autofs LDAP mappings stored in AD LDAP They have multiple sites, multiple environments (prod,preprod,dev,etc.) and we believed each environment had its own place in LDAP.

automount -m doesn't report any mappings for /home

Only thing visible in logs:
`automount[2400]: can't connect to sssd, retry for 10 seconds`
 
In reality, sss was fine, connection to LDAP was fine, but there was no data returned from LDAP. sss<>autofs didn’t communicate the actual details or the problem.

Version-Release number of selected component (if applicable):
autofs-5.1.4-83.el8.x86_64
sssd-2.7.3-4.el8_7.1.x86_64

How reproducible:
Once the environment is setup, easy to reproduce. We could fix/break/fix by changing the base DN in sssd.conf

Steps to Reproduce:
1. configure sssd against AD
2. set the ldap_autofs_search_base in sssd.conf to point at an empty space
3. restart sssd, autofs as needed

Actual results:
automount[2400]: can't connect to sssd, retry for 10 seconds


Expected results:
A useful error message that sssd is fine, but no data is available at the ldap_autofs_search_base 

Additional info:
strace of autofs during failure:


[pid 17986] socket(AF_UNIX, SOCK_STREAM, 0) = 4
[pid 17986] fcntl(4, F_GETFL)           = 0x2 (flags O_RDWR)
[pid 17986] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 17986] fcntl(4, F_GETFD)           = 0
[pid 17986] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 17986] connect(4, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/autofs"}, 110) = 0
[pid 17986] fstat(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
[pid 17986] poll([{fd=4, events=POLLOUT}], 1, 300000) = 1 ([{fd=4, revents=POLLOUT}])
[pid 17986] sendto(4, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16
[pid 17986] poll([{fd=4, events=POLLOUT}], 1, 300000) = 1 ([{fd=4, revents=POLLOUT}])
[pid 17986] sendto(4, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
[pid 17986] poll([{fd=4, events=POLLIN}], 1, 300000) = 1 ([{fd=4, revents=POLLIN}])
[pid 17986] read(4, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
[pid 17986] poll([{fd=4, events=POLLIN}], 1, 300000) = 1 ([{fd=4, revents=POLLIN}])
[pid 17986] read(4, "\1\0\0\0", 4)      = 4
[pid 17986] poll([{fd=4, events=POLLOUT}], 1, 300000) = 1 ([{fd=4, revents=POLLOUT}])
[pid 17986] sendto(4, "\34\0\0\0\321\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16
[pid 17986] poll([{fd=4, events=POLLOUT}], 1, 300000) = 1 ([{fd=4, revents=POLLOUT}])
[pid 17986] sendto(4, "auto.master\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
[pid 17986] poll([{fd=4, events=POLLIN}], 1, 300000) = 1 ([{fd=4, revents=POLLIN}])
[pid 17986] read(4, "\30\0\0\0\321\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
[pid 17986] poll([{fd=4, events=POLLIN}], 1, 300000) = 1 ([{fd=4, revents=POLLIN}])
[pid 17986] read(4, "\0\0\0\0\0\0\0\0", 8) = 8
[pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect to sssd, retry for 10 seconds) = 43
[pid 17986] write(2, "\n", 1

Comment 1 Ian Kent 2023-01-29 01:21:25 UTC
Yes, it does look like the handling of this failure case isn't right.

Could you please provide an autofs debug log including the start up of autofs
and an example of reproducing the problem.

Possibly the easiest way to do that is to set "logging = debug" in /etc/autofs.conf
and use "journalctl -f | tee autofs.log".

Comment 2 Ian Kent 2023-01-30 03:04:08 UTC
(In reply to Bill Marshall from comment #0)
> Description of problem:
> Customer is using autofs LDAP mappings stored in AD LDAP They have multiple
> sites, multiple environments (prod,preprod,dev,etc.) and we believed each
> environment had its own place in LDAP.
> 
> automount -m doesn't report any mappings for /home
> 
> Only thing visible in logs:
> `automount[2400]: can't connect to sssd, retry for 10 seconds`
>  
> In reality, sss was fine, connection to LDAP was fine, but there was no data
> returned from LDAP. sss<>autofs didn’t communicate the actual details or the
> problem.

I may have duplicated this based on your problem description.
I would still like the debug log if you can get it please.

> 
> Version-Release number of selected component (if applicable):
> autofs-5.1.4-83.el8.x86_64
> sssd-2.7.3-4.el8_7.1.x86_64

It looks like we have all the current fixes for sss communication here, that's
good.

> 
> How reproducible:
> Once the environment is setup, easy to reproduce. We could fix/break/fix by
> changing the base DN in sssd.conf
> 
> Steps to Reproduce:
> 1. configure sssd against AD
> 2. set the ldap_autofs_search_base in sssd.conf to point at an empty space
> 3. restart sssd, autofs as needed
> 
> Actual results:
> automount[2400]: can't connect to sssd, retry for 10 seconds
> 
> 
> Expected results:
> A useful error message that sssd is fine, but no data is available at the
> ldap_autofs_search_base 

autofs doesn't know anything about the sss configuration so the best I can
do is try and work out what the actual error condition is.

The difficulty identifying the actual failure cases so that autofs can take
sensible action and issue sensible messages has frequently been a challenge.

If I have duplicated this correctly those messages "can't connect to sssd,
 retry for 10 seconds" don't actually sleep for 10 seconds before continuing,
right?

Comment 4 Bill Marshall 2023-01-30 14:45:48 UTC
Alexander from sss is copied on this bug, so he can help w/ the autofs<>sss communication.

Re: lack of 10 sec sleep.

I do see repeated groups of 3 in messages:
2023-01-23T16:00:32.803144+00:00 host automount[483501]: can't connect to sssd, retry for 10 seconds
2023-01-23T16:00:32.831333+00:00 host automount[483501]: can't connect to sssd, retry for 10 seconds
2023-01-23T16:00:32.857864+00:00 host automount[483501]: can't connect to sssd, retry for 10 seconds
2023-01-23T16:03:57.231849+00:00 host automount[484120]: can't connect to sssd, retry for 10 seconds
2023-01-23T16:03:57.249833+00:00 host automount[484120]: can't connect to sssd, retry for 10 seconds
2023-01-23T16:03:57.275312+00:00 host automount[484120]: can't connect to sssd, retry for 10 seconds
2023-01-23T19:01:47.758438+00:00 host automount[505339]: can't connect to sssd, retry for 10 seconds
2023-01-23T19:01:47.791376+00:00 host automount[505339]: can't connect to sssd, retry for 10 seconds
2023-01-23T19:01:47.813129+00:00 host automount[505339]: can't connect to sssd, retry for 10 seconds
2023-01-23T19:15:16.102579+00:00 host automount[507408]: can't connect to sssd, retry for 10 seconds
2023-01-23T19:15:16.146749+00:00 host automount[507408]: can't connect to sssd, retry for 10 seconds
2023-01-23T19:15:16.164480+00:00 host automount[507408]: can't connect to sssd, retry for 10 seconds

Also in the shorter strace we ran:
% grep retry strace-automount1.txt
read(3, " nfs_retry_interval, nfs_retry_i"..., 4096) = 4096
[pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect to sssd, retry for 10 seconds) = 43
[pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect to sssd, retry for 10 seconds) = 43
[pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect to sssd, retry for 10 seconds) = 43

As for autofs in debug mode. The system we were working on is no longer available. I can see if a test environment can be setup.

Otherwise, I realized I had an strace of a debug session. If this useful? I can likely attach the whole strace if I sanitize it.

% grep write  strace-automount1.txt | grep -v '\\n'
write(2, "Starting automounter version 5.1"..., 65Starting automounter version 5.1.4-83.el8, master map auto.master) = 65
write(2, "using kernel protocol version 5."..., 34using kernel protocol version 5.05) = 34
[pid 17986] write(2, "lookup_nss_read_master: reading "..., 54lookup_nss_read_master: reading master sss auto.master) = 54
[pid 17986] write(2, "do_init: parse(sun): init gather"..., 57do_init: parse(sun): init gathered global options: (null)) = 57
[pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect to sssd, retry for 10 seconds) = 43
[pid 17986] write(2, "lookup_nss_read_master: auto.mas"..., 69lookup_nss_read_master: auto.master not found, replacing '.' with '_') = 69
[pid 17986] write(2, "do_init: parse(sun): init gather"..., 57do_init: parse(sun): init gathered global options: (null)) = 57
[pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect to sssd, retry for 10 seconds) = 43
[pid 17986] write(2, "lookup_nss_read_master: no map -"..., 58lookup_nss_read_master: no map - continuing to next source) = 58
[pid 17986] write(2, "lookup_nss_read_master: reading "..., 56lookup_nss_read_master: reading master files auto.master) = 56
[pid 17986] write(2, "do_init: parse(sun): init gather"..., 57do_init: parse(sun): init gathered global options: (null)) = 57
[pid 17986] write(2, "lookup_read_master: lookup(file)"..., 50lookup_read_master: lookup(file): read entry /misc) = 50
[pid 17986] write(2, "lookup_read_master: lookup(file)"..., 49lookup_read_master: lookup(file): read entry /net) = 49
[pid 17986] write(2, "lookup_read_master: lookup(file)"..., 68lookup_read_master: lookup(file): read entry +dir:/etc/auto.master.d) = 68
[pid 17986] write(2, "lookup_nss_read_master: reading "..., 61lookup_nss_read_master: reading master dir /etc/auto.master.d) = 61
[pid 17986] write(2, "lookup_read_master: lookup(dir):"..., 60lookup_read_master: lookup(dir): scandir: /etc/auto.master.d) = 60
[pid 17986] write(2, "lookup_read_master: lookup(file)"..., 57lookup_read_master: lookup(file): read entry +auto.master) = 57
[pid 17986] write(2, "lookup_nss_read_master: reading "..., 54lookup_nss_read_master: reading master sss auto.master) = 54
[pid 17986] write(2, "do_init: parse(sun): init gather"..., 57do_init: parse(sun): init gathered global options: (null)) = 57
[pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect to sssd, retry for 10 seconds) = 43
[pid 17986] write(2, "lookup_nss_read_master: no map -"..., 58lookup_nss_read_master: no map - continuing to next source) = 58
[pid 17986] write(2, "lookup_nss_read_master: reading "..., 56lookup_nss_read_master: reading master files auto.master) = 56
[pid 17986] write(2, "do_init: parse(sun): init gather"..., 57do_init: parse(sun): init gathered global options: (null)) = 57
[pid 17986] write(2, "master_do_mount: mounting /misc", 31master_do_mount: mounting /misc) = 31
[pid 17996] write(2, "automount_path_to_fifo: fifo nam"..., 55automount_path_to_fifo: fifo name /run/autofs.fifo-misc) = 55
[pid 17996] write(2, "lookup_nss_read_map: reading map"..., 52lookup_nss_read_map: reading map file /etc/auto.misc) = 52
[pid 17996] write(2, "do_init: parse(sun): init gather"..., 57do_init: parse(sun): init gathered global options: (null)) = 57
[pid 17996] write(2, "remount_active_mount: trying to "..., 57remount_active_mount: trying to re-connect to mount /misc) = 57
[pid 17996] write(2, "mounted indirect on /misc with t"..., 57mounted indirect on /misc with timeout 30, freq 8 seconds) = 57
[pid 17996] write(2, "remount_active_mount: re-connect"..., 49remount_active_mount: re-connected to mount /misc) = 49
[pid 17996] write(2, "st_ready: st_ready(): state = 0 "..., 42st_ready: st_ready(): state = 0 path /misc) = 42
[pid 17986] write(2, "master_do_mount: mounting /net", 30master_do_mount: mounting /net) = 30
[pid 17999] write(2, "automount_path_to_fifo: fifo nam"..., 54automount_path_to_fifo: fifo name /run/autofs.fifo-net) = 54
[pid 17999] write(2, "lookup_nss_read_map: reading map"..., 45lookup_nss_read_map: reading map hosts (null)) = 45
[pid 17999] write(2, "do_init: parse(sun): init gather"..., 57do_init: parse(sun): init gathered global options: (null)) = 57
[pid 17999] write(2, "lookup_read_map: lookup(hosts): "..., 46lookup_read_map: lookup(hosts): read hosts map) = 46
[pid 17999] write(2, "lookup_read_map: lookup(hosts): "..., 84lookup_read_map: lookup(hosts): map not browsable, update existing host entries only) = 84
[pid 17999] write(2, "remount_active_mount: trying to "..., 56remount_active_mount: trying to re-connect to mount /net) = 56
[pid 17999] write(2, "mounted indirect on /net with ti"..., 56mounted indirect on /net with timeout 30, freq 8 seconds) = 56
[pid 17999] write(2, "remount_active_mount: re-connect"..., 48remount_active_mount: re-connected to mount /net) = 48
[pid 17999] write(2, "st_ready: st_ready(): state = 0 "..., 41st_ready: st_ready(): state = 0 path /net) = 41

Comment 5 Alexander Bokovoy 2023-01-30 16:06:55 UTC
Add Alexey to give a help hand from SSSD side.

Comment 6 Ian Kent 2023-01-31 00:13:36 UTC
(In reply to Bill Marshall from comment #4)
> Alexander from sss is copied on this bug, so he can help w/ the autofs<>sss
> communication.
> 
> Re: lack of 10 sec sleep.
> 
> I do see repeated groups of 3 in messages:
> 2023-01-23T16:00:32.803144+00:00 host automount[483501]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T16:00:32.831333+00:00 host automount[483501]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T16:00:32.857864+00:00 host automount[483501]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T16:03:57.231849+00:00 host automount[484120]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T16:03:57.249833+00:00 host automount[484120]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T16:03:57.275312+00:00 host automount[484120]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T19:01:47.758438+00:00 host automount[505339]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T19:01:47.791376+00:00 host automount[505339]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T19:01:47.813129+00:00 host automount[505339]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T19:15:16.102579+00:00 host automount[507408]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T19:15:16.146749+00:00 host automount[507408]: can't connect to
> sssd, retry for 10 seconds
> 2023-01-23T19:15:16.164480+00:00 host automount[507408]: can't connect to
> sssd, retry for 10 seconds
> 
> Also in the shorter strace we ran:
> % grep retry strace-automount1.txt
> read(3, " nfs_retry_interval, nfs_retry_i"..., 4096) = 4096
> [pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect
> to sssd, retry for 10 seconds) = 43
> [pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect
> to sssd, retry for 10 seconds) = 43
> [pid 17986] write(2, "can't connect to sssd, retry for"..., 43can't connect
> to sssd, retry for 10 seconds) = 43
> 
> As for autofs in debug mode. The system we were working on is no longer
> available. I can see if a test environment can be setup.

Ok, probably not a problem now since I'm pretty sure I am reproducing what
your seeing.

> 
> Otherwise, I realized I had an strace of a debug session. If this useful? I
> can likely attach the whole strace if I sanitize it.

Unfortunately I use strace only for specific things and this isn't one
of those cases so it's not that useful.

I'll continue looking at what I have, at the very least there looks to
be an ordering problem that adds an additional call as well as the error
handling problem.


Ian

Comment 7 Alexey Tikhonov 2023-01-31 14:17:54 UTC
(In reply to Alexander Bokovoy from comment #5)
> Add Alexey to give a help hand from SSSD side.

It looks like Ian already nailed an issue. But please let me know if anything required from SSSD side.

Comment 15 errata-xmlrpc 2023-05-16 09:05:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (autofs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2970


Note You need to log in before you can comment on or make changes to this bug.