Bug 2084108 - autofs is slow to mount when doing lookups returns multiple entries
Summary: autofs is slow to mount when doing lookups returns multiple entries
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: autofs
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ian Kent
QA Contact: Kun Wang
URL:
Whiteboard:
Depends On: 2009953
Blocks:
 
Reported: 2022-05-11 12:29 UTC by Ian Kent
Modified: 2022-11-15 13:03 UTC (History)
3 users (show)

Fixed In Version: autofs-5.1.7-30.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2009953
Environment:
Last Closed: 2022-11-15 11:16:17 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-121718 0 None None None 2022-05-11 12:38:41 UTC
Red Hat Product Errata RHBA-2022:8311 0 None None None 2022-11-15 11:16:20 UTC

Description Ian Kent 2022-05-11 12:29:59 UTC
+++ This bug was initially created as a clone of Bug #2009953 +++

Description of problem:
autofs takes 5 seconds or more to mount filesystems when a host lookup returns multiple addresses (e.g. both IPv4 & IPv6).

When the host name is added as a single entry in /etc/hosts it is almost instantaneous.  If two entries are added it slows.

Host is local and other programs (ssh, etc) respond almost immediately.

Version-Release number of selected component (if applicable):
autofs-5.1.7-17.fc34.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up automount
2. Add single entry in /etc/hosts and try mount - note time
3. Add multiple entries in /etc/hosts and try mount - note time

Actual results:
Single entry mounts almost instantly, multiple entries take some time to complete mount.

Expected results:
Single and multiple entries should mount in approximately the same time, almost instantly.

Additional info:
The only option that seems related, "use_hostname_for_mounts", seems to have no effect on the issue.
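The trigger condition is simply that one hostname maps to more than one address. A minimal sketch of that condition (a hypothetical helper that parses hosts-file-style lines; not part of autofs), using the two entries reported later in this bug:

```python
# Sketch only: count the addresses a hosts-file-style text maps to one name.
def addresses_for(hosts_text, name):
    addrs = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments, as /etc/hosts does
        if not line:
            continue
        fields = line.split()
        if name in fields[1:]:                # names follow the address field
            addrs.append(fields[0])
    return addrs

hosts = """\
fdd2:7aad:d478:1::cb10:cc01  bits.crawford.emu.id.au bits
203.16.204.1                 bits.crawford.emu.id.au bits
#10.0.0.1                    commented.example commented
"""
print(addresses_for(hosts, "bits"))  # two addresses -> the slow path
```

With only one of the two lines active, `addresses_for` returns a single address and the problem does not occur.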

--- Additional comment from Ian Kent on 2021-10-04 00:00:25 UTC ---

Provide a debug log that demonstrates the difference.

I could take all the entries in my host file out and check but I doubt
it would make a difference.

You don't make it clear if it's purely the presence of entries in the
hosts file or the lookup of a different host to mount a different mount?

--- Additional comment from Frank Crawford on 2021-10-04 01:07:23 UTC ---

@ikent Originally I noticed it from DNS lookups which returned 2 entries, IPv4 and IPv6, but have duplicated just with /etc/hosts entries.

I have the following two lines in my /etc/hosts

fdd2:7aad:d478:1::cb10:cc01	bits.crawford.emu.id.au bits
#203.16.204.1	bits.crawford.emu.id.au bits

If I uncomment the second one, so both are available, then the problem occurs.  If I have either of them commented out then there is no issue.

--- Additional comment from Frank Crawford on 2021-10-04 01:08:55 UTC ---

Also, how do you want me to go about generating a debug log?

--- Additional comment from Ian Kent on 2021-10-04 12:24:10 UTC ---

(In reply to Frank Crawford from comment #3)
> Also, how do you want me to go about generating a debug log?

In /etc/autofs.conf ensure that you have "logging = debug" without a
comment marker next to it. Reload or restart autofs after the change.

You can use "journalctl -f | tee full.log" to capture to a log file.
You may want to filter the log after capture so you only retain autofs
log entries with something like "grep automount full.log > autofs.log".

The point of this will be to identify where the delay is occurring or
at least narrow down the search.
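For reference, the relevant fragment of /etc/autofs.conf looks like this (a sketch based on the shipped default, where the option sits in the "[ autofs ]" section and defaults to "none"):

```
[ autofs ]
# logging = none
logging = debug
```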

--- Additional comment from Frank Crawford on 2021-10-05 02:24:34 UTC ---



--- Additional comment from Frank Crawford on 2021-10-05 02:25:11 UTC ---



--- Additional comment from Frank Crawford on 2021-10-05 02:33:51 UTC ---

Thanks for your prompt response.

Okay, I can see what is going on now.  The additional item you need is the output from rpcinfo:

   program version(s) netid(s)                         service     owner
    100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
    100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
    100024  1         tcp6,udp6,tcp,udp                status      29
    100003  4,3       tcp6,tcp                         nfs         superuser
    100227  3         tcp6,tcp                         nfs_acl     superuser
    100021  4,3,1     tcp6,udp6,tcp,udp                nlockmgr    superuser

and you can see that nfs is disabled over UDP for this configuration.

So, with a single entry it doesn't do a performance test and just takes the first entry, defaulting to NFSv4 and TCP; but with multiple entries it does a test and has to time out on the UDP probe (2 x 3 sec or so).

Now the only thing is that both the NFS server config and the autofs config are pretty much standard, yet there is a bit of a configuration mismatch between the two.

For reference the NFS server config comes from:
nfs-utils-2.5.4-2.rc3.fc34.x86_64

While I now know what to fix, who should be making sure these configs aren't in conflict?
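The "2 x 3 sec" arithmetic above can be modelled with a small sketch. This is an illustration only, not autofs's actual algorithm (which lives in its replicated-server selection code); the probe timeout and the two-address case are taken from this comment:

```python
# Simplified model: a single address skips availability probing entirely;
# multiple addresses are each probed, and every probe of a protocol the
# server doesn't register (here UDP) burns the full timeout.
PROBE_TIMEOUT = 3.0  # seconds per failed UDP probe (value from this report)

def probe_cost(addresses, tcp_ok=True, udp_ok=False):
    """Return the simulated total probe time for one hostname."""
    if len(addresses) == 1:
        return 0.0                      # first entry used, no probing
    total = 0.0
    for _addr in addresses:
        total += 0.0 if tcp_ok else PROBE_TIMEOUT  # TCP ping answers fast
        total += 0.0 if udp_ok else PROBE_TIMEOUT  # UDP probe times out
    return total

print(probe_cost(["203.16.204.1"]))                               # 0.0
print(probe_cost(["203.16.204.1", "fdd2:7aad:d478:1::cb10:cc01"]))  # 6.0
```

Enabling NFS over UDP on the server (udp_ok=True in the model) makes the multi-address cost collapse back to near zero, which matches the observed fix.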

--- Additional comment from Ian Kent on 2021-10-05 03:06:35 UTC ---

(In reply to Frank Crawford from comment #7)
> Thanks for your prompt response.
> 
> Okay, I can see what is going on now.  The additional item you need is the
> output from rpcinfo:
> 
>    program version(s) netid(s)                         service     owner
>     100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
>     100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
>     100024  1         tcp6,udp6,tcp,udp                status      29
>     100003  4,3       tcp6,tcp                         nfs         superuser
>     100227  3         tcp6,tcp                         nfs_acl     superuser
>     100021  4,3,1     tcp6,udp6,tcp,udp                nlockmgr    superuser
> 
> and you can see that nfs is disabled over UDP for this configuration.

No surprise there, it's been that way for a while now I think.

> 
> So, with a single entry it doesn't do a performance test, so just takes the
> first entry defaulting to NFS 4 and TCP, but for multiple entries it does a
> test and has to time out for the UDP test (2 x 3 sec or so).

Yes, that's the way it's meant to work, autofs is trying to find out if the
server is available because there's more than one choice, again that's normal.

> 
> Now the only thing is that both the NFS server config and the autofs config
> are pretty much standard, so there is a bit of a configuration mismatch
> between the two.
> 
> For reference the NFS server config comes from:
> nfs-utils-2.5.4-2.rc3.fc34.x86_64
> 
> While I now know what to fix, who should be making sure these configs aren't
> in conflict?

There's no NFS config in autofs except an entry that tells autofs what the
default NFS version is for mounts, usually NFSv4.

So I don't think that's a problem.

I think what you're saying, intentionally or implicitly, is that autofs should
check rpcbind as part of the selection process before probing anything.

Not a bad thought and in fact quite sensible but I'll need to think about
what I would need to do for that for a bit ...

Ian
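The "check rpcbind before probing" idea can be sketched by inspecting the rpcinfo table shown earlier (a hypothetical text-parsing helper; a real implementation in autofs would use RPC library calls, not parse command output):

```python
# Sketch: which transports (netids) does rpcbind say a service registered?
def service_netids(rpcinfo_text, service):
    netids = set()
    for line in rpcinfo_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5 and fields[3] == service:
            netids.update(fields[2].split(","))
    return netids

RPCINFO = """\
   program version(s) netid(s)                         service     owner
    100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
    100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
    100003  4,3       tcp6,tcp                         nfs         superuser
"""
nfs = service_netids(RPCINFO, "nfs")
# nfs is registered for TCP only, so a UDP probe is pointless:
print("udp6" in nfs, "tcp6" in nfs)  # False True
```

Skipping the probe whenever the wanted netid is absent from rpcbind's answer would avoid the timeout without any per-protocol ping at all.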

--- Additional comment from Frank Crawford on 2021-10-05 09:34:09 UTC ---

Yes, the only configuration I can find is on the NFS server end to enable UDP (which I haven't yet tried to turn on).

The only other point is for the issue to be documented somewhere (which is one reason I originally opened this ticket), since I saw a few comments about it in my search but no real reason for what the issue is.

As for a fix in autofs, that is up to you, and while I think checking rpcbind is good, I don't know how difficult it would be and what other issues might occur.  Certainly, I expect the issue to become more and more common as IPv6 appears more often.

--- Additional comment from Ian Kent on 2021-10-06 00:55:49 UTC ---

(In reply to Frank Crawford from comment #9)
> Yes, the only configuration I can find is on the NFS server end to enable
> UDP (which I haven't yet tried to turn on).

That's right, and you won't even be able to enable it at some point
in the future if/when UDP is disabled in kernel.

These sort of things are out of my control and are things that I
need to adapt to as they change.

> 
> The only other point is for the issue to be documented somewhere (which is
> one reason I originally opened this ticket), since I saw a few comments
> about it in my search but no real reason for what the issue is.

So far it doesn't appear to be IPv6 specific to me, machines can have
more than one IP address and there's no guarantee they will be running
a server on all or any of their IP addresses.

The fact is probing is fairly expensive, so not doing it when rpcbind
tells us it isn't needed is a fairly cheap way to avoid it.

I'm not sure what needs to be documented, overall this is a pretty
normal variant of target server assessment which autofs needs to do
automatically and the fact that you don't see other problems means
it works fairly well. 

> 
> As for a fix in autofs, that is up to you, and while I think checking
> rpcbind is good, I don't know how difficult it would be and what other
> issues might occur.  Certainly, I expect the issue to become more and more
> common as IPv6 appears more often.

Yes, that could happen.

I have a bunch of fairly low level rpc code already so it might not
be too much hassle.

--- Additional comment from Frank Crawford on 2021-10-06 04:39:59 UTC ---

Thanks Ian.

From my point of view you may as well close this ticket, unless you want to leave it open for any reason.  I would like to know if you ever do add the check of rpcbind, or any other fix, but I assume it will be in the changelog or something, won't it?

--- Additional comment from Ian Kent on 2021-10-06 04:52:34 UTC ---

(In reply to Frank Crawford from comment #11)
> Thanks Ian.
> 
> From my point of view you may as well close this ticket, unless you want to
> leave it open for any reason.  I would like to know if you ever do add the
> check of rpcbind, or any other fix, but I assume it will be in the changelog
> or something, won't it?

It would except that it turns out what's happening is a bit more subtle
than we thought.

From the output you provided and the logs:
> 
>    program version(s) netid(s)                         service     owner
>     100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
>     100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
>     100024  1         tcp6,udp6,tcp,udp                status      29
>     100003  4,3       tcp6,tcp                         nfs         superuser
>     100227  3         tcp6,tcp                         nfs_acl     superuser
>     100021  4,3,1     tcp6,udp6,tcp,udp                nlockmgr    superuser
> 
> and you can see that nfs is disabled over UDP for this configuration.

And from the log.

> automount[319499]: get_nfs_info: called with host bits(203.16.204.1) proto 6 version 0x60
> automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000262
> automount[319499]: get_nfs_info: nfs v3 rpc ping time: 0.000185
> automount[319499]: get_nfs_info: host bits cost 223 weight 0
> automount[319499]: get_nfs_info: called with host bits(203.16.204.1) proto 17 version 0x60

Here autofs is checking NFS info for IPv4 udp and the NFS NULL proc ping is failing
quickly. 

> automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 6 version 0x60
> automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000147
> automount[319499]: get_nfs_info: nfs v3 rpc ping time: 0.000159
> automount[319499]: get_nfs_info: host bits cost 152 weight 0
> automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 17 version 0x60
> automount[319499]: get_nfs_info: host NFS ping timed out

But here the NFS NULL proc ping for IPv6 udp doesn't fail quickly and it isn't
obeying the set timeout (10 seconds) either.

That's unexpected.

I can probably work around it in the existing rpc code where this is done
but I'll need to get IPv6 working locally first.

And yes, there will be a changelog entry that says something like what you
expect.

I will leave the bug open though, so I can use it in the update submission.

Ian

--- Additional comment from Ian Kent on 2021-10-06 05:01:59 UTC ---

(In reply to Ian Kent from comment #12)
> 
> > automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 6 version 0x60
> > automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000147
> > automount[319499]: get_nfs_info: nfs v3 rpc ping time: 0.000159
> > automount[319499]: get_nfs_info: host bits cost 152 weight 0
> > automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 17 version 0x60
> > automount[319499]: get_nfs_info: host NFS ping timed out
> 
> But here the NFS NULL proc ping for IPv6 udp doesn't fail quickly and it
> isn't
> obeying the set timeout (10 seconds) either.

Oh, wait, my mistake.
In this case I set the timeout to 3 seconds, so it is obeying the timeout.

So I'm probably not catching an error so I can bail out earlier if the
server doesn't provide the service for the given protocol ...

Anyway, I'll need to work on it.
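The behaviour described above, where a UDP probe of an unregistered service burns the whole timeout rather than failing fast, can be demonstrated with plain sockets (a sketch, not autofs's RPC code; it relies on Linux not delivering ICMP errors to unconnected UDP sockets):

```python
import socket
import time

TIMEOUT = 0.3  # seconds; autofs was using 3 s per probe

# Find a loopback UDP port with no listener by binding and releasing it.
tmp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tmp.bind(("127.0.0.1", 0))
port = tmp.getsockname()[1]
tmp.close()  # nothing listens on `port` now

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.settimeout(TIMEOUT)
start = time.monotonic()
try:
    # Unconnected UDP: the ICMP port-unreachable reply is not surfaced
    # as an error, so recvfrom() just waits for the full timeout.
    s.sendto(b"ping", ("127.0.0.1", port))
    s.recvfrom(64)
except socket.timeout:
    pass
elapsed = time.monotonic() - start
s.close()
print(f"waited {elapsed:.1f}s")  # roughly the full timeout
```

A TCP connect to the same closed port would instead fail almost immediately with a refusal, which is why only the UDP leg of the probe is slow.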

--- Additional comment from Ian Kent on 2021-10-11 08:28:22 UTC ---

Can you try this autofs build:
https://koji.fedoraproject.org/koji/taskinfo?taskID=77058439

--- Additional comment from Frank Crawford on 2021-10-11 08:58:34 UTC ---

This version seems to fix the issue.  It is now as fast with two entries as a single entry.

--- Additional comment from Ian Kent on 2021-10-12 00:32:06 UTC ---

(In reply to Frank Crawford from comment #15)
> Created attachment 1831779 [details]
> autofs debug with new autofs build
> 
> This version seems to fix the issue.  It is now as fast with two entries as
> a single entry.

That's what I observed too but I also saw an error on connect(2).
The address parameters looked ok so I'm not sure why that happened.
It's probably because I'm using link local IPv6 addresses.

If you could provide a debug log so I can see if those connect(2)
errors are present in your environment too that would be great.

If all goes well I'll submit an update.

--- Additional comment from Frank Crawford on 2021-10-12 11:09:44 UTC ---

Did the additional attachment (autofs debug with new autofs build) not have the details you wanted?

Are there additional options I should turn on?

--- Additional comment from Ian Kent on 2021-10-13 01:47:45 UTC ---

(In reply to Frank Crawford from comment #17)
> Did the additional attachment (autofs debug with new autofs build) not have
> the details in you wanted?
> 
> Are there additional options I should turn on?

Oops!
I was tired, I didn't see the attachment.
I didn't see any errors calling connect(2) on the IPv6 address so I think
we are good.

I'll get on with an update.
Thanks for your help.

--- Additional comment from Fedora Update System on 2021-10-14 02:42:06 UTC ---

FEDORA-2021-ed153b4aeb has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-ed153b4aeb

--- Additional comment from Fedora Update System on 2021-10-14 15:57:47 UTC ---

FEDORA-2021-ed153b4aeb has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-ed153b4aeb`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-ed153b4aeb

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

--- Additional comment from Fedora Update System on 2021-10-17 21:58:35 UTC ---

FEDORA-2021-ed153b4aeb has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 1 Ian Kent 2022-05-11 12:38:27 UTC
The changes made to autofs to resolve this bug in Fedora resulted in a
regression: rpcbind was consulted even in the specific autofs case
where an NFS v4 mount must be able to proceed without consulting
rpcbind.

While working on this case originally I observed TCP connect errors that
slowed the mount connection time. I'm seeing those same errors in the
autofs-5.1.7 based RHEL-9 package testing.

So I need to add the patches for this bug and also a couple of patches
from autofs-5.1.8 that should resolve this.

Comment 7 errata-xmlrpc 2022-11-15 11:16:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (autofs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8311

