Bug 217665
Summary: RFE: Timeout of non-existing networks should be quicker
Product: [Fedora] Fedora
Component: nfs-utils
Version: 5
Hardware: All
OS: Linux
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Reporter: Adam Huffman <bloch>
Assignee: Steve Dickson <steved>
QA Contact: Ben Levenson <benl>
CC: ikent, jmoyer, kzak, triage
Whiteboard: bzcl34nup
Doc Type: Bug Fix
Last Closed: 2008-04-04 12:54:57 UTC

Description
Adam Huffman
2006-11-29 09:22:04 UTC
(In reply to comment #0)
> Description of problem:
> As described in 217664, after a change to my home network Nautilus and GNOME
> Terminal are hanging at session startup while they wait for autofs to mount an
> NFS server that no longer exists at that address. Could autofs be altered so as
> to timeout more quickly in the case of a non-existent network?
>

I'm not sure if it is actually autofs that is causing this delay or in fact mount(8) or both.

We need to establish the timeline of request->autofs->mount. It's easy to tell if it is mount(8) that is waiting: log into a console, switch back to gnome and start the login, then switch back to the console and check whether you have a mount sub-process. Keep monitoring until you can give some information about 1) how long it is before autofs spawns mount(8) and 2) how long it is until mount gives up.

Ian

In case it's relevant, it looks as though the problem is caused initially by gnome-terminal trying to open obsolete tabs.

Anyway, I've been logged in for 3-4 minutes now and here are the relevant processes:

19605 ? S 0:00 /usr/sbin/automount --timeout=60 /net program /etc/auto.net
19606 ? S 0:00 /bin/bash /etc/auto.net 192.168.0.10
19607 ? S 0:00 /usr/sbin/showmount --no-headers -e 192.168.0.10
19608 ? S 0:00 sort -k 1
19609 ? S 0:00 awk -v key=192.168.0.10 -v opts=-fstype=nfs,hard,intr,nodev,nosuid -- ??BEGIN?{ ORS=""; first=1 }???{ if (first) { print opts; first=0 }; print " \\\n\t" $1, key ":" $1 }??END?{ if (!first) print "\n"; else exit 1 }??
19610 ? S 0:00 sed s/#/\\#/g

192.168.0. is the network that no longer exists. No actual invocation of mount(8) that I can see.

(In reply to comment #2)
> In case it's relevant, it looks as though the problem is caused initially by
> gnome-terminal trying to open obsolete tabs.

Yep. I think this is relevant.

>
> Anyway, I've been logged in for 3-4 minutes now and here are the relevant processes:
>
> 19605 ? S 0:00 /usr/sbin/automount --timeout=60 /net program
> /etc/auto.net
> 19606 ? S 0:00 /bin/bash /etc/auto.net 192.168.0.10
> 19607 ? S 0:00 /usr/sbin/showmount --no-headers -e 192.168.0.10

Same issue I suspect.

Ian

If I can put together a patch to check this out can you apply it to the nfs-utils package and try it out?

Ian

Happy to try the patch, yes.

Adam

(In reply to comment #5)
> Happy to try the patch, yes.

But before we go down this path, what version of nfs-utils are you using? When I do this I get very different results:

[raven@raven lib]$ time /usr/sbin/showmount -e budgie
mount clntudp_create: RPC: Port mapper failure - RPC: Unable to receive

real 0m9.004s
user 0m0.000s
sys 0m0.002s

[raven@raven lib]$ time /usr/sbin/showmount --no-headers -e 192.168.0.10
mount clntudp_create: RPC: Port mapper failure - RPC: Timed out

real 1m0.475s
user 0m0.000s
sys 0m0.002s

Ian

This is the version installed: nfs-utils-1.0.8-3.fc5

(In reply to comment #7)
> This is the version installed: nfs-utils-1.0.8-3.fc5

Yes. I think that was the same version I used for the test above.

The showmount command is somewhat simpler than mount so it's good to work with to demonstrate the issue. I expect that once showmount is patched the problem will move to mount itself, but I'm not sure yet.

From what I can see showmount itself tries to use a short timeout, but the RPC clnttcp_create and clntudp_create calls will call portmap internally to get the port to use if it is set to 0 in the passed address structure. This uses the internal timeouts, basically 60 seconds, which slows things down. Also, for tcp a blocking connect is used, which can take quite a while to time out. showmount tries tcp then udp, so a failure can go through several lengthy waits before it is reported.

Another thing I noticed in showmount is that RPC procedure calls don't use a call to the service null procedure before performing the call itself. This is not required but can sometimes return a failure somewhat more quickly than just going ahead and calling the procedure entry point straight away.

I've grabbed some of the autofs code and put together a patch for you to try out.

This may be good for showmount, because a timeout on something that would succeed is not too serious. It's a completely different proposition for mount and umount, because using short timeouts for them could cause mount failures on mounts that should succeed. This can happen if someone is using a VPN that has high latency but acceptable throughput.

Ian

Created attachment 142929
Attempt to demonstrate some methods to control RPC timeouts for showmount
Please give this a try and we'll see where it takes us.
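
For readers following the thread: the blocking TCP connect mentioned above is one of the waits that a change of this kind has to bound. The following is only a minimal sketch of the general technique (non-blocking connect plus poll), assuming a POSIX system. It is not the code from attachment 142929, and the names connect_with_timeout() and probe_tcp_port() are hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect `sock` to `addr`, waiting at most
 * `timeout_ms` milliseconds for the connection to complete.
 * Returns 0 on success or a negative errno value on failure.
 */
int connect_with_timeout(int sock, const struct sockaddr *addr,
                         socklen_t len, int timeout_ms)
{
    struct pollfd pfd;
    socklen_t errlen;
    int flags, ret, err;

    /* Switch the socket to non-blocking mode for the connect. */
    flags = fcntl(sock, F_GETFL, 0);
    if (flags < 0)
        return -errno;
    if (fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0)
        return -errno;

    ret = connect(sock, addr, len);
    if (ret < 0) {
        if (errno != EINPROGRESS) {
            ret = -errno;               /* immediate failure */
            goto done;
        }

        /* Connection in progress: wait until writable or timed out. */
        pfd.fd = sock;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        do {
            /* Simplification: a retry after EINTR restarts the timeout. */
            ret = poll(&pfd, 1, timeout_ms);
        } while (ret < 0 && errno == EINTR);

        if (ret == 0) {
            ret = -ETIMEDOUT;           /* our own, shorter timeout */
        } else if (ret < 0) {
            ret = -errno;
        } else {
            /* The socket is ready; fetch the result of the connect. */
            err = 0;
            errlen = sizeof(err);
            if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
                ret = -errno;
            else
                ret = -err;             /* 0 when the connect succeeded */
        }
    }
done:
    fcntl(sock, F_SETFL, flags);        /* restore the original flags */
    return ret;
}

/*
 * Hypothetical usage helper: try a TCP connect to a dotted-quad IPv4
 * address and port, then close the socket again.
 * Returns 0 if the port accepted the connection, else a negative errno.
 */
int probe_tcp_port(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in sin;
    int sock, ret;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
        return -EINVAL;

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -errno;

    ret = connect_with_timeout(sock, (struct sockaddr *)&sin,
                               sizeof(sin), timeout_ms);
    close(sock);
    return ret;
}
```

With something like this, probe_tcp_port("192.168.0.10", 111, 2000) should give up after roughly two seconds rather than waiting for the kernel's own connect timeout (111 is the usual portmapper port). Because failures come back as negative errno values, callers need to test for ret < 0 rather than ret == -1.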
(In reply to comment #9)
> Created an attachment (id=142929)
> Attempt to demonstrate some methods to control RPC timeouts for showmount
>
> Please give this a try and we'll see where it takes us.

Have you had a chance to check this out?

Ian

I've been away from that machine over the past week. I'll post a report here as soon as I've had the time to try the patch.

I have just applied your patch to 1.0.8-4, the version yumdownloader obtained for me. It has made a dramatic difference - I only had to wait 1-2 minutes before gnome-terminal and nautilus started working properly (in case I hadn't pointed this out previously, the long timeout appeared to be caused by both gnome-terminal and nautilus). Previously it was more like 10 minutes.

Actually, now I'm finding that autofs mounts no longer work, while manual NFS mounts do work. I'm seeing this sort of error:

Dec 15 11:00:16 asus automount[5148]: attempting to mount entry /net/bloch
Dec 15 11:00:16 asus automount[5772]: >> rpc mount null: RPC: Unable to receive; errno = Connection refused
Dec 15 11:00:16 asus automount[5772]: lookup(program): lookup for bloch failed
Dec 15 11:00:16 asus automount[5772]: failed to mount /net/bloch
Dec 15 11:00:16 asus automount[5772]: umount_multi: no mounts found under /net/bloch
Dec 15 11:00:16 asus automount[5148]: attempting to mount entry /net/bloch
Dec 15 11:00:17 asus automount[5778]: >> rpc mount null: RPC: Unable to receive; errno = Connection refused
Dec 15 11:00:17 asus automount[5778]: lookup(program): lookup for bloch failed
Dec 15 11:00:17 asus automount[5778]: failed to mount /net/bloch
Dec 15 11:00:17 asus automount[5778]: umount_multi: no mounts found under /net/bloch
Dec 15 11:00:17 asus automount[5148]: attempting to mount entry /net/bloch
Dec 15 11:00:17 asus automount[5784]: >> rpc mount null: RPC: Unable to receive; errno = Connection refused
Dec 15 11:00:17 asus automount[5784]: lookup(program): lookup for bloch failed
Dec 15 11:00:17 asus automount[5784]: failed to mount /net/bloch
Dec 15 11:00:17 asus automount[5784]: umount_multi: no mounts found under /net/bloch

(In reply to comment #13)
> Actually, now I'm finding that autofs mounts no longer work, while manual NFS
> mounts do work. I'm seeing this sort of error:
>
> Dec 15 11:00:16 asus automount[5148]: attempting to mount entry /net/bloch
> Dec 15 11:00:16 asus automount[5772]: >> rpc mount null: RPC: Unable to receive;
> errno = Connection refused

Dramatically reduces timeouts, but doesn't actually work!
I wish I could solve all my problems that way. Haha.

I'll build this on an FC5 install and see what happens, sorry.

Ian

Created attachment 143759
Attempt to demonstrate some methods to control RPC timeouts for showmount (take 2)
Oops, silly mistake.
Please try this one.
Yes, automounting works again with your latest patch.

Hi Karel,

Can you check this bug out please?
I posted the patch here upstream but am not sure of the status at the moment. It may need to be considered for inclusion in FC nfs-utils.

Ian

Re-assigning to SteveD, who is the nfs-utils maintainer.

Note, I reviewed this patch a few weeks ago and I think there should be:

  ret = connect_nb(sock, &saddr, &tout);
- if (ret == -1) {
+ if (ret < 0) {

because connect_nb() returns -errno on error. So -1 is not enough.

(In reply to comment #18)
> Re-assigning to SteveD, who is the nfs-utils maintainer.
>
> Note, I reviewed this patch a few weeks ago and I think there should be:
>
> ret = connect_nb(sock, &saddr, &tout);
> - if (ret == -1) {
> + if (ret < 0) {
>
> because connect_nb() returns -errno on error. So -1 is not enough.

Thanks Karel, I'll fix this up and repost.

Ian

Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers
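
A footnote on the review comment about connect_nb() earlier in this thread: the difference between the two checks can be shown with a small C sketch. The function fake_connect_nb() below is a hypothetical stand-in that only mimics the "0 or -errno" return convention described in that comment; it is not the real connect_nb() from the patch. On Linux EPERM is 1, so a test for -1 only catches a failure that happens to be -EPERM.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a helper that follows the convention
 * described in the review: 0 on success, -errno on failure.
 */
int fake_connect_nb(int simulated_errno)
{
    return simulated_errno ? -simulated_errno : 0;
}

/* The check as first written: true only when ret is exactly -1. */
int failed_eq_minus_one(int ret)
{
    return ret == -1;
}

/* The check suggested in review: true for any negative errno value. */
int failed_lt_zero(int ret)
{
    return ret < 0;
}
```

A refused connection, for instance, comes back as -ECONNREFUSED, which the first check treats as success and the second correctly treats as failure.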