Bug 2009953

Summary: autofs is slow to mount when lookups return multiple entries
Product: Fedora
Component: autofs
Version: 34
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Frank Crawford <frank>
Assignee: Ian Kent <ikent>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: frank, ikent
Fixed In Version: autofs-5.1.7-18.fc34
Type: Bug
Bug Blocks: 2084108
Last Closed: 2021-10-17 21:58:35 UTC

Attachments:
- autofs debug when 2 entries in /etc/hosts
- autofs debug when single entry in /etc/hosts
- autofs debug with new autofs build

Description Frank Crawford 2021-10-02 10:19:28 UTC
Description of problem:
autofs takes 5 sec or more to mount filesystems when a lookup returns multiple addresses (e.g. both IPv4 and IPv6).

When the host name is added as a single entry in /etc/hosts it is almost instantaneous.  If two entries are added it slows.

Host is local and other programs (ssh, etc) respond almost immediately.

Version-Release number of selected component (if applicable):
autofs-5.1.7-17.fc34.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up automount
2. Add single entry in /etc/hosts and try mount - note time
3. Add multiple entries in /etc/hosts and try mount - note time

Actual results:
Single entry mounts almost instantly, multiple entries take some time to complete mount.

Expected results:
Single and multiple entries should mount in approximately the same time, almost instantly.

Additional info:
The only option that seems related, "use_hostname_for_mounts", has no effect on the issue.

Comment 1 Ian Kent 2021-10-04 00:00:25 UTC
Provide a debug log that demonstrates the difference.

I could take all the entries in my host file out and check but I doubt
it would make a difference.

You don't make it clear whether it's purely the presence of entries in the
hosts file, or the lookup of a different host for a different mount?

Comment 2 Frank Crawford 2021-10-04 01:07:23 UTC
@ikent Originally I noticed it from DNS lookups which returned 2 entries, IPv4 and IPv6, but I have reproduced it with just /etc/hosts entries.

I have the following two lines in my /etc/hosts

fdd2:7aad:d478:1::cb10:cc01	bits.crawford.emu.id.au bits
#203.16.204.1	bits.crawford.emu.id.au bits

If I uncomment the second one, so both are available, then the problem occurs.  If I have either of them commented out then there is no issue.

Comment 3 Frank Crawford 2021-10-04 01:08:55 UTC
Also, how do you want me to go about generating a debug log?

Comment 4 Ian Kent 2021-10-04 12:24:10 UTC
(In reply to Frank Crawford from comment #3)
> Also, how do you want me to go about generating a debug log?

In /etc/autofs.conf, ensure that you have "logging = debug" without a
comment marker next to it. Reload or restart autofs after the change.

You can use "journalctl -f | tee full.log" to capture to a log file.
You may want to filter the log after capture so you only retain autofs
log entries, with something like "grep automount full.log > autofs.log".

The point of this will be to identify where the delay is occurring or
at least narrow down the search.
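The steps above can be sketched as a short workflow. The journalctl and systemctl lines need a live system, so they are shown as comments; the filter step below runs against a stand-in log line taken from this bug (the PID and message are from the attached debug log):

```shell
# 1) In /etc/autofs.conf set:   logging = debug
# 2) sudo systemctl reload autofs
# 3) journalctl -f | tee full.log    # reproduce the slow mount meanwhile
# 4) Keep only the automount lines for analysis:
printf '%s\n' \
  'kernel: unrelated noise' \
  'automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000262' \
  > full.log
grep automount full.log > autofs.log
cat autofs.log
```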

Comment 5 Frank Crawford 2021-10-05 02:24:34 UTC
Created attachment 1829243 [details]
autofs debug when 2 entries in /etc/hosts

Comment 6 Frank Crawford 2021-10-05 02:25:11 UTC
Created attachment 1829244 [details]
autofs debug when single entry in /etc/hosts

Comment 7 Frank Crawford 2021-10-05 02:33:51 UTC
Thanks for your prompt response.

Okay, I can see what is going on now.  The additional item you need is the output from rpcinfo:

   program version(s) netid(s)                         service     owner
    100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
    100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
    100024  1         tcp6,udp6,tcp,udp                status      29
    100003  4,3       tcp6,tcp                         nfs         superuser
    100227  3         tcp6,tcp                         nfs_acl     superuser
    100021  4,3,1     tcp6,udp6,tcp,udp                nlockmgr    superuser

and you can see that nfs is disabled over UDP for this configuration.

So, with a single entry it doesn't do a performance probe and just takes the first entry, defaulting to NFSv4 and TCP; but with multiple entries it probes each one and has to time out on the UDP test (2 x 3 sec or so).
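The rpcinfo table above can be checked mechanically; a minimal sketch (the sample data is the registration list from this comment — against a live server you would feed it the output of `rpcinfo <host>` instead):

```shell
# Columns in plain `rpcinfo` output: program, version(s), netid(s), service, owner.
# We ask: is the nfs service registered on any UDP netid?
rpcinfo_output='
    100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
    100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
    100003  4,3       tcp6,tcp                         nfs         superuser
'
result=$(printf '%s\n' "$rpcinfo_output" | awk '
    $4 == "nfs" && $3 ~ /udp/ { found = 1 }
    END { print (found ? "nfs registered over UDP" : "nfs NOT registered over UDP") }')
echo "$result"
```

For this server it reports that nfs is not registered over UDP, which is exactly why the UDP probe can only end in a timeout.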

Now the only thing is that both the NFS server config and the autofs config are pretty much standard, so there is a bit of a configuration mismatch between the two.

For reference the NFS server config comes from:
nfs-utils-2.5.4-2.rc3.fc34.x86_64

While I now know what to fix, who should be making sure these configs aren't in conflict?

Comment 8 Ian Kent 2021-10-05 03:06:35 UTC
(In reply to Frank Crawford from comment #7)
> Thanks for your prompt response.
> 
> Okay, I can see what is going on now.  The additional item you need is the
> output from rpcinfo:
> 
>    program version(s) netid(s)                         service     owner
>     100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
>     100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
>     100024  1         tcp6,udp6,tcp,udp                status      29
>     100003  4,3       tcp6,tcp                         nfs         superuser
>     100227  3         tcp6,tcp                         nfs_acl     superuser
>     100021  4,3,1     tcp6,udp6,tcp,udp                nlockmgr    superuser
> 
> and you can see that nfs is disabled over UDP for this configuration.

No surprise there, it's been that way for a while now I think.

> 
> So, with a single entry it doesn't do a performance test, so just takes the
> first entry defaulting to NFS 4 and TCP, but for multiple entries it does a
> test and has to time out for the UDP test (2 x 3 sec or so).

Yes, that's the way it's meant to work, autofs is trying to find out if the
server is available because there's more than one choice, again that's normal.

> 
> Now the only thing is that both the NFS server config and the autofs config
> are pretty much standard, so there is a bit of a configuration mismatch
> between the two.
> 
> For reference the NFS server config comes from:
> nfs-utils-2.5.4-2.rc3.fc34.x86_64
> 
> While I now know what to fix, who should be making sure these configs aren't
> in conflict?

There's no NFS config in autofs except an entry that tells autofs what the
default NFS version is for mounts, usually NFSv4.

So I don't think that's a problem.

I think what you're saying, intentionally or implicitly, is that autofs should
check rpcbind as part of the selection process before probing anything.

Not a bad thought and in fact quite sensible but I'll need to think about
what I would need to do for that for a bit ...

Ian
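The idea floated above could look roughly like this: consult the server's rpcbind registration list first, and only probe transports the service is actually registered on. This is a hypothetical sketch, not autofs code; the netid list is taken from the rpcinfo output in comment 7 (live, it would come from `rpcinfo "$server"`):

```shell
# netids rpcbind reports for the nfs service on this server (TCP-only)
nfs_netids='tcp6,tcp'

# Decide which transports are worth an NFS NULL-proc probe.
plan=$(for proto in tcp udp; do
    case ",$nfs_netids," in
        *",$proto,"*|*",${proto}6,"*) echo "probe nfs over $proto" ;;
        *)                            echo "skip $proto (not registered)" ;;
    esac
done)
echo "$plan"
```

With this pre-check the UDP probe is skipped entirely, so its timeout is never paid.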

Comment 9 Frank Crawford 2021-10-05 09:34:09 UTC
Yes, the only configuration I can find is on the NFS server end to enable UDP (which I haven't yet tried to turn on).

The only other point is for the issue to be documented somewhere (which is one reason I originally opened this ticket), since I saw a few comments about it in my search but no real reason for what the issue is.

As for a fix in autofs, that is up to you, and while I think checking rpcbind is good, I don't know how difficult it would be and what other issues might occur.  Certainly, I expect the issue to become more and more common as IPv6 appears more often.

Comment 10 Ian Kent 2021-10-06 00:55:49 UTC
(In reply to Frank Crawford from comment #9)
> Yes, the only configuration I can find is on the NFS server end to enable
> UDP (which I haven't yet tried to turn on).

That's right, and you won't even be able to enable it at some point
in the future if/when UDP is disabled in kernel.

These sort of things are out of my control and are things that I
need to adapt to as they change.

> 
> The only other point is for the issue to be documented somewhere (which is
> one reason I originally opened this ticket), since I saw a few comments
> about it in my search but no real reason for what the issue is.

So far it doesn't appear to be IPv6 specific to me; machines can have
more than one IP address and there's no guarantee they will be running
a server on all, or any, of their IP addresses.

The fact is probing is fairly expensive, so not doing it when rpcbind
tells us it isn't needed is a fairly cheap way to avoid it.

I'm not sure what needs to be documented; overall this is a pretty
normal variant of target server assessment which autofs needs to do
automatically, and the fact that you don't see other problems means
it works fairly well.

> 
> As for a fix in autofs, that is up to you, and while I think checking
> rpcbind is good, I don't know how difficult it would be and what other
> issues might occur.  Certainly, I expect the issue to become more and more
> common as IPv6 appears more often.

Yes, that could happen.

I have a bunch of fairly low level rpc code already so it might not
be too much hassle.

Comment 11 Frank Crawford 2021-10-06 04:39:59 UTC
Thanks Ian.

From my point of view you may as well close this ticket, unless you want to leave it open for any reason.  I would like to know if you ever do add the check of rpcbind, or any other fix, but I assume it will be in the changelog or something, won't it?

Comment 12 Ian Kent 2021-10-06 04:52:34 UTC
(In reply to Frank Crawford from comment #11)
> Thanks Ian.
> 
> From my point of view you may as well close this ticket, unless you want to
> leave it open for any reason.  I would like to know if you ever do add the
> check of rpcbind, or any other fix, but I assume it will be in the changlog
> or something, won't it?

It would except that it turns out what's happening is a bit more subtle
than we thought.

From the output you provided and the logs:
> 
>    program version(s) netid(s)                         service     owner
>     100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
>     100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
>     100024  1         tcp6,udp6,tcp,udp                status      29
>     100003  4,3       tcp6,tcp                         nfs         superuser
>     100227  3         tcp6,tcp                         nfs_acl     superuser
>     100021  4,3,1     tcp6,udp6,tcp,udp                nlockmgr    superuser
> 
> and you can see that nfs is disabled over UDP for this configuration.

And from the log.

> automount[319499]: get_nfs_info: called with host bits(203.16.204.1) proto 6 version 0x60
> automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000262
> automount[319499]: get_nfs_info: nfs v3 rpc ping time: 0.000185
> automount[319499]: get_nfs_info: host bits cost 223 weight 0
> automount[319499]: get_nfs_info: called with host bits(203.16.204.1) proto 17 version 0x60

Here autofs is checking NFS info for IPv4 udp and the NFS NULL proc ping is failing
quickly. 

> automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 6 version 0x60
> automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000147
> automount[319499]: get_nfs_info: nfs v3 rpc ping time: 0.000159
> automount[319499]: get_nfs_info: host bits cost 152 weight 0
> automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 17 version 0x60
> automount[319499]: get_nfs_info: host NFS ping timed out

But here the NFS NULL proc ping for IPv6 udp doesn't fail quickly and it isn't
obeying the set timeout (10 seconds) either.

That's unexpected.

I can probably work around it in the existing rpc code where this is done
but I'll need to get IPv6 working locally first.

And yes, there will be a changelog entry that says something like what you
expect.

I will leave the bug open though, so I can use it in the update submission.

Ian

Comment 13 Ian Kent 2021-10-06 05:01:59 UTC
(In reply to Ian Kent from comment #12)
> 
> > automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 6 version 0x60
> > automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000147
> > automount[319499]: get_nfs_info: nfs v3 rpc ping time: 0.000159
> > automount[319499]: get_nfs_info: host bits cost 152 weight 0
> > automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 17 version 0x60
> > automount[319499]: get_nfs_info: host NFS ping timed out
> 
> But here the NFS NULL proc ping for IPv6 udp doesn't fail quickly and it
> isn't
> obeying the set timeout (10 seconds) either.

Oh, wait, my mistake.
In this case I set the timeout to 3 seconds, so it is obeying the timeout.

So I'm probably not catching an error so I can bail out earlier if the
server doesn't provide the service for the given protocol ...

Anyway, I'll need to work on it.
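The numbers in this thread line up with the delay in the original report. A back-of-envelope check (assuming one UDP NULL-proc probe per address, each paying the full timeout because the server never answers NFS over UDP, with the sub-millisecond TCP pings being negligible):

```shell
udp_timeout=3   # seconds, the timeout set for the UDP probe
addresses=2     # one IPv4 and one IPv6 entry for the host
delay=$(( addresses * udp_timeout ))
echo "about $delay seconds of UDP probe timeouts"
```

That comes to about 6 seconds, roughly matching the "5 sec or more" in the description.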

Comment 14 Ian Kent 2021-10-11 08:28:22 UTC
Can you try this autofs build:
https://koji.fedoraproject.org/koji/taskinfo?taskID=77058439

Comment 15 Frank Crawford 2021-10-11 08:58:34 UTC
Created attachment 1831779 [details]
autofs debug with new autofs build

This version seems to fix the issue.  It is now as fast with two entries as a single entry.

Comment 16 Ian Kent 2021-10-12 00:32:06 UTC
(In reply to Frank Crawford from comment #15)
> Created attachment 1831779 [details]
> autofs debug with new autofs build
> 
> This version seems to fix the issue.  It is now as fast with two entries as
> a single entry.

That's what I observed too but I also saw an error on connect(2).
The address parameters looked ok so I'm not sure why that happened.
It's probably because I'm using link local IPv6 addresses.

If you could provide a debug log so I can see if those connect(2)
errors are present in your environment too that would be great.

If all goes well I'll submit an update.

Comment 17 Frank Crawford 2021-10-12 11:09:44 UTC
Did the additional attachment (autofs debug with new autofs build) not have the details you wanted in it?

Are there additional options I should turn on?

Comment 18 Ian Kent 2021-10-13 01:47:45 UTC
(In reply to Frank Crawford from comment #17)
> Did the additional attachment (autofs debug with new autofs build) not have
> the details in you wanted?
> 
> Are there additional options I should turn on?

Oops!
I was tired, I didn't see the attachment.
I didn't see any errors calling connect(2) on the IPv6 address, so I think
we are good.

I'll get on with an update.
Thanks for your help.

Comment 19 Fedora Update System 2021-10-14 02:42:06 UTC
FEDORA-2021-ed153b4aeb has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-ed153b4aeb

Comment 20 Fedora Update System 2021-10-14 15:57:47 UTC
FEDORA-2021-ed153b4aeb has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-ed153b4aeb`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-ed153b4aeb

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 21 Fedora Update System 2021-10-17 21:58:35 UTC
FEDORA-2021-ed153b4aeb has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make a note of it in this bug report.