Bug 2009953
Summary: | autofs is slow to mount when doing lookups returns multiple entries | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Frank Crawford <frank> | ||||||||
Component: | autofs | Assignee: | Ian Kent <ikent> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 34 | CC: | frank, ikent | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | x86_64 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | autofs-5.1.7-18.fc34 | Doc Type: | If docs needed, set a value | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | |||||||||||
: | 2084108 (view as bug list) | Environment: | |||||||||
Last Closed: | 2021-10-17 21:58:35 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 2084108 | ||||||||||
Attachments: |
|
Description
Frank Crawford
2021-10-02 10:19:28 UTC
Provide a debug log that demonstrates the difference. I could take all the entries in my host file out and check but I doubt it would make a difference. You don't make it clear if it's purely the presence of entries in the hosts file or the lookup of a different host to mount a differnt mount? @ikent Originally I noticed it from DNS lookups which returned 2 entries, IPv4 and IPv6, but have duplicated just with /etc/hosts entries. I have the following two lines in my /etc/hosts fdd2:7aad:d478:1::cb10:cc01 bits.crawford.emu.id.au bits #203.16.204.1 bits.crawford.emu.id.au bits If I uncomment the second one, so both are available, then the problem occurs. If I have either of them commented out then there is no issue. Also, how do you want me to go about generating a debug log? (In reply to Frank Crawford from comment #3) > Also, how do you want me to go about generating a debug log? In /etc/autofs.conf ensure that you have "logging = debug" without a comment marker next to it. Reload or restart autofs afterthe change. You can use "journalctl -f | tee full.log" to capture to a log file. You may want to filter the log after capture so you only retain autofs log entries with somthing like "grep automount full.log > autofs.log". The point of this will be to identify where the delay is occurring or at least narrow down the search. Created attachment 1829243 [details]
autofs debug when 2 entries in /etc/hosts
Created attachment 1829244 [details]
autofs debug when single entry in /etc/hosts
Thanks for your prompt response. Okay, I can see what is going on now. The additional item you need is the output from rpcinfo: program version(s) netid(s) service owner 100000 2,3,4 local,udp,tcp,udp6,tcp6 portmapper superuser 100005 3,2,1 tcp6,udp6,tcp,udp mountd superuser 100024 1 tcp6,udp6,tcp,udp status 29 100003 4,3 tcp6,tcp nfs superuser 100227 3 tcp6,tcp nfs_acl superuser 100021 4,3,1 tcp6,udp6,tcp,udp nlockmgr superuser and you can see that nfs is disabled over UDP for this configuration. So, with a single entry it doesn't do a performance test, so just takes the first entry defaulting to NFS 4 and TCP, but for multiple entries it does a test and has to time out for the UDP test (2 x 3 sec or so). Now the only thing is that both the NFS server config and the autofs config are pretty much standard, so there is a bit of a configuration mismatch between the two. For reference the NFS server config comes from: nfs-utils-2.5.4-2.rc3.fc34.x86_64 While I now know what to fix, who should be making sure these configs aren't in conflict? (In reply to Frank Crawford from comment #7) > Thanks for your prompt response. > > Okay, I can see what is going on now. The additional item you need is the > output from rpcinfo: > > program version(s) netid(s) service owner > 100000 2,3,4 local,udp,tcp,udp6,tcp6 portmapper superuser > 100005 3,2,1 tcp6,udp6,tcp,udp mountd superuser > 100024 1 tcp6,udp6,tcp,udp status 29 > 100003 4,3 tcp6,tcp nfs superuser > 100227 3 tcp6,tcp nfs_acl superuser > 100021 4,3,1 tcp6,udp6,tcp,udp nlockmgr superuser > > and you can see that nfs is disabled over UDP for this configuration. No surprise there, it's been that way for a while now I think. > > So, with a single entry it doesn't do a performance test, so just takes the > first entry defaulting to NFS 4 and TCP, but for multiple entries it does a > test and has to time out for the UDP test (2 x 3 sec or so). Yes, that's the way it's meant to work, autofs is trying to find out if the server is available because there's more than one choice, again that's normal. > > Now the only thing is that both the NFS server config and the autofs config > are pretty much standard, so there is a bit of a configuration mismatch > between the two. > > For reference the NFS server config comes from: > nfs-utils-2.5.4-2.rc3.fc34.x86_64 > > While I now know what to fix, who should be making sure these configs aren't > in conflict? There's no NFS config is autofs except an entry that tells autofs what the default NFS version is for mounts, usually NFSv4. So I don't think that's a problem. I think what your saying, intentionally or implicitly, is that autofs should check rpcbind as part of the selection process before probing anything. Not a bad thought and in fact quite sensible but I'll need to think about what I would need to do for that for a bit ... Ian Yes, the only configuration I can find is on the NFS server end to enable UDP (which I haven't yet tried to turn on). The only other point is for the issue to be documented somewhere (which is one reason I originally opened this ticket), since I saw a few comments about it in my search but no real reason for what the issue is. As for a fix in autofs, that is up to you, and while I think checking rpcbind is good, I don't know how difficult it would be and what other issues might occur. Certainly, I expect the issue to become more and more common as IPv6 appears more often. (In reply to Frank Crawford from comment #9) > Yes, the only configuration I can find is on the NFS server end to enable > UDP (which I haven't yet tried to turn on). That's right, and you won't even be able to enable it at some point in the future if/when UDP is disabled in kernel. These sort of things are out of my control and are things that I need to adapt to as they change. > > The only other point is for the issue to be documented somewhere (which is > one reason I originally opened this ticket), since I saw a few comments > about it in my search but no real reason for what the issue is. So far it doesn't appear to be IPv6 specific to me, machines can have more than one IP address and there's no guarantee they will be running a server on all or any of there IP addresses. The fact is probing is fairly expensive so not doing it when rpcbind tells up that is a fairy cheap way to avoid it. I'm not sure what needs to be documented, overall this is a pretty normal variant of target server assessment which autofs needs to do automatically and the fact that you don't see other problems means it works fairly well. > > As for a fix in autofs, that is up to you, and while I think checking > rpcbind is good, I don't know how difficult it would be and what other > issues might occur. Certainly, I expect the issue to become more and more > common as IPv6 appears more often. Yes, that could happen. I have a bunch of fairly low level rpc code already so it might not be too much hassle. Thanks Ian. From my point of view you may as well close this ticket, unless you want to leave it open for any reason. I would like to know if you ever do add the check of rpcbind, or any other fix, but I assume it will be in the changlog or something, won't it? (In reply to Frank Crawford from comment #11) > Thanks Ian. > > From my point of view you may as well close this ticket, unless you want to > leave it open for any reason. I would like to know if you ever do add the > check of rpcbind, or any other fix, but I assume it will be in the changlog > or something, won't it? It would except that it turns out what's happening is a bit more subtle than we thought. From the output you provided and the logs: > > program version(s) netid(s) service owner > 100000 2,3,4 local,udp,tcp,udp6,tcp6 portmapper superuser > 100005 3,2,1 tcp6,udp6,tcp,udp mountd superuser > 100024 1 tcp6,udp6,tcp,udp status 29 > 100003 4,3 tcp6,tcp nfs superuser > 100227 3 tcp6,tcp nfs_acl superuser > 100021 4,3,1 tcp6,udp6,tcp,udp nlockmgr superuser > > and you can see that nfs is disabled over UDP for this configuration. And from the log. > automount[319499]: get_nfs_info: called with host bits(203.16.204.1) proto 6 version 0x60 > automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000262 > automount[319499]: get_nfs_info: nfs v3 rpc ping time: 0.000185 > automount[319499]: get_nfs_info: host bits cost 223 weight 0 > automount[319499]: get_nfs_info: called with host bits(203.16.204.1) proto 17 version 0x60 Here autofs is checking NFS info for IPv4 udp and the NFS NULL proc ping is failing quickly. > automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 6 version 0x60 > automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000147 > automount[319499]: get_nfs_info: nfs v3 rpc ping time: 0.000159 > automount[319499]: get_nfs_info: host bits cost 152 weight 0 > automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 17 version 0x60 > automount[319499]: get_nfs_info: host NFS ping timed out But here the NFS NULL proc ping for IPv6 udp doesn't fail quickly and it isn't obeying the set timeout (10 seconds) either. That's unexpected. I can probably work around it in the existing rpc code where this is done but I'll need to get IPv6 working locally first. And yes, there will be a changelog entry that says something like what you expect. I will leave the bug open though, so I can use it in the update submission. Ian (In reply to Ian Kent from comment #12) > > > automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 6 version 0x60 > > automount[319499]: get_nfs_info: nfs v4 rpc ping time: 0.000147 > > automount[319499]: get_nfs_info: nfs v3 rpc ping time: 0.000159 > > automount[319499]: get_nfs_info: host bits cost 152 weight 0 > > automount[319499]: get_nfs_info: called with host bits(fdd2:7aad:d478:1::cb10:cc01) proto 17 version 0x60 > > automount[319499]: get_nfs_info: host NFS ping timed out > > But here the NFS NULL proc ping for IPv6 udp doesn't fail quickly and it > isn't > obeying the set timeout (10 seconds) either. Oh, wait, my mistake. In this case I set the timeout to 3 seconds, it is obeying the time out. So I'm probably not catching an error so I can bail out earlier if the server doesn't provide the service for the given protocol ... Anyway, I'll need to work on it. Can you try this autofs build: https://koji.fedoraproject.org/koji/taskinfo?taskID=77058439 Created attachment 1831779 [details]
autofs debug with new autofs build
This version seems to fix the issue. It is now as fast with two entries as a single entry.
(In reply to Frank Crawford from comment #15) > Created attachment 1831779 [details] > autofs debug with new autofs build > > This version seems to fix the issue. It is now as fast with two entries as > a single entry. That's what I observed too but I also saw an error on connect(2). The address parameters looked ok so I'm not sure why that happened. It's probably because I'm using link local IPv6 addresses. If you could provide a debug log so I can see if those connect(2) errors are present in your environment too that would be great. If all goes well I'll submit an update. Did the additional attachment (autofs debug with new autofs build) not have the details in you wanted? Are there additional options I should turn on? (In reply to Frank Crawford from comment #17) > Did the additional attachment (autofs debug with new autofs build) not have > the details in you wanted? > > Are there additional options I should turn on? Oops! I was tried, I didn't see the attachment. I didn't see any errors calling connect(2) on the IPv6 address so I think we are good. I'll get on with an update. Thanks for your help. FEDORA-2021-ed153b4aeb has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-ed153b4aeb FEDORA-2021-ed153b4aeb has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-ed153b4aeb` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-ed153b4aeb See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates. FEDORA-2021-ed153b4aeb has been pushed to the Fedora 34 stable repository. If problem still persists, please make note of it in this bug report. |