Bug 217665

Summary: RFE: Timeout of non-existing networks should be quicker
Product: [Fedora] Fedora
Reporter: Adam Huffman <bloch>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED RAWHIDE
QA Contact: Ben Levenson <benl>
Severity: medium
Priority: medium
CC: ikent, jmoyer, kzak, triage
Hardware: All
OS: Linux
Whiteboard: bzcl34nup
Doc Type: Bug Fix
Last Closed: 2008-04-04 12:54:57 UTC

Attachments:

Description	Flags
Attempt to demonstrate some methods to control RPC timeouts for showmount	none
Attempt to demonstrate some methods to control RPC timeouts for showmount (take 2)	none

Description Adam Huffman 2006-11-29 09:22:04 UTC

Description of problem:
As described in 217664, after a change to my home network Nautilus and GNOME
Terminal are hanging at session startup while they wait for autofs to mount an
NFS server that no longer exists at that address.  Could autofs be altered so as
to timeout more quickly in the case of a non-existent network?

Version-Release number of selected component (if applicable):
4.1.4-33

How reproducible:
Every time

Comment 1 Ian Kent 2006-11-29 12:55:49 UTC

(In reply to comment #0)
> Description of problem:
> As described in 217664, after a change to my home network Nautilus and GNOME
> Terminal are hanging at session startup while they wait for autofs to mount an
> NFS server that no longer exists at that address.  Could autofs be altered so as
> to timeout more quickly in the case of a non-existent network?
> 

I'm not sure if it is actually autofs that is causing this
delay or in fact mount(8) or both. We need to establish
the time line of request->autofs->mount.

It's easy to tell whether mount(8) is the one waiting: log
in on a console, switch back to GNOME and start the login,
then switch to the console again and check whether there is
a mount sub-process. Keep monitoring until you can give
some information about 1) how long it is before autofs
spawns mount(8) and 2) how long it is until mount gives up.

Ian

Comment 2 Adam Huffman 2006-11-30 01:13:02 UTC

In case it's relevant, it looks as though the problem is caused initially by
gnome-terminal trying to open obsolete tabs.

Anyway, I've been logged in for 3-4 minutes now and here are the relevant processes:

19605 ?        S      0:00 /usr/sbin/automount --timeout=60 /net program /etc/auto.net
19606 ?        S      0:00 /bin/bash /etc/auto.net 192.168.0.10
19607 ?        S      0:00 /usr/sbin/showmount --no-headers -e 192.168.0.10
19608 ?        S      0:00 sort -k 1
19609 ?        S      0:00 awk -v key=192.168.0.10 -v opts=-fstype=nfs,hard,intr,nodev,nosuid -- ??BEGIN?{ ORS=""; first=1 }???{ if (first) { print opts; first=0 }; print " \\\n\t" $1, key ":" $1 }??END?{ if (!first) print "\n"; else exit 1 }??
19610 ?        S      0:00 sed s/#/\\#/g

192.168.0. is the network that no longer exists.

No actual invocation of mount(8) I can see.

Comment 3 Ian Kent 2006-11-30 05:52:54 UTC

(In reply to comment #2)
> In case it's relevant, it looks as though the problem is caused initially by
> gnome-terminal trying to open obsolete tabs.

Yep. I think this is relevant.

> 
> Anyway, I've been logged in for 3-4 minutes now and here are the relevant
> processes:
> 
> 19605 ?        S      0:00 /usr/sbin/automount --timeout=60 /net program /etc/auto.net
> 19606 ?        S      0:00 /bin/bash /etc/auto.net 192.168.0.10
> 19607 ?        S      0:00 /usr/sbin/showmount --no-headers -e 192.168.0.10

Same issue I suspect.

Ian

Comment 4 Ian Kent 2006-11-30 06:14:45 UTC

If I can put together a patch to check this out can you
apply it to the nfs-utils package and try it out?

Ian

Comment 5 Adam Huffman 2006-11-30 09:36:38 UTC

Happy to try the patch, yes.

Adam

Comment 6 Ian Kent 2006-11-30 11:05:57 UTC

(In reply to comment #5)
> Happy to try the patch, yes.

But before we go down this path what version of nfs-utils
are you using?

When I do this I get very different results:
[raven@raven lib]$ time /usr/sbin/showmount -e budgie
mount clntudp_create: RPC: Port mapper failure - RPC: Unable to receive

real    0m9.004s
user    0m0.000s
sys     0m0.002s

[raven@raven lib]$ time /usr/sbin/showmount --no-headers -e 192.168.0.10
mount clntudp_create: RPC: Port mapper failure - RPC: Timed out

real    1m0.475s
user    0m0.000s
sys     0m0.002s

Ian

Comment 7 Adam Huffman 2006-12-01 16:32:12 UTC

This is the version installed:  nfs-utils-1.0.8-3.fc5

Comment 8 Ian Kent 2006-12-06 08:17:10 UTC

(In reply to comment #7)
> This is the version installed:  nfs-utils-1.0.8-3.fc5

Yes. I think that was the same version I used for the
test above.

The showmount command is somewhat simpler than mount so it's
good to work with to demonstrate the issue. I expect that
once showmount is patched the problem will move to mount
itself, but I'm not sure yet.

From what I can see showmount itself tries to use a short
timeout but the RPC clnttcp_create and clntudp_create
calls will call portmap internally to get the port to
use if it is set to 0 in the passed address structure.
This uses the internal timeouts, basically 60 seconds,
which slows things down. Also, for tcp a blocking
connect is used, which can take quite a while to time out.

showmount tries tcp then udp so a fail can go through
several lengthy waits before failing.
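
To illustrate the blocking-connect point above, here is a minimal sketch of a connect that gives up after a caller-supplied timeout instead of waiting for the kernel default. It is not the code from the attached patch: the name connect_nb_sketch, its signature and the "return 0 or -errno" convention are assumptions made for the illustration only.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not taken from the patch): connect fd to addr,
 * waiting at most *tout. Returns 0 on success or -errno on failure.
 */
int connect_nb_sketch(int fd, const struct sockaddr_in *addr,
                      const struct timeval *tout)
{
    struct pollfd pfd;
    socklen_t len;
    int flags, ret, err, ms;

    /* Switch the socket to non-blocking mode for the connect. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -errno;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -errno;

    ret = connect(fd, (const struct sockaddr *)addr, sizeof(*addr));
    if (ret == 0)
        goto done;              /* connected immediately */
    if (errno != EINPROGRESS) {
        ret = -errno;           /* immediate failure */
        goto done;
    }

    /* Wait for the socket to become writable, but no longer than tout. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    ms = (int)(tout->tv_sec * 1000 + tout->tv_usec / 1000);
    ret = poll(&pfd, 1, ms);
    if (ret < 0) {
        ret = -errno;
        goto done;
    }
    if (ret == 0) {
        ret = -ETIMEDOUT;       /* nothing answered within the timeout */
        goto done;
    }

    /* The attempt has finished: fetch its outcome from the socket. */
    err = 0;
    len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        ret = -errno;
    else
        ret = err ? -err : 0;

done:
    /* Restore the original file status flags. */
    fcntl(fd, F_SETFL, flags);
    return ret;
}
```

A caller would create a TCP socket, fill in the server address and port, and pass a short struct timeval (a few seconds, say); any negative return value means the connect failed or timed out.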

Another thing I noticed in showmount is that RPC
procedure calls don't use a call to the service null
procedure before performing the call itself. This is
not required but can sometimes return a fail somewhat
more quickly than just going ahead and calling the
procedure entry point straight away.

I've grabbed some of the autofs code and put together
a patch for you to try out. This may be good for
showmount, because a timeout on something that would
have succeeded is not too serious. It's a completely
different proposition for mount and umount, because using
short timeouts for them could cause mount failures on mounts
that should succeed. This can happen if someone is using
a VPN that has high latency but acceptable throughput.

Ian

Comment 9 Ian Kent 2006-12-06 08:22:25 UTC

Created attachment 142929
Attempt to demonstrate some methods to control RPC timeouts for showmount

Please give this a try and we'll see where it takes us.

Comment 10 Ian Kent 2006-12-12 05:55:05 UTC

(In reply to comment #9)
> Created an attachment (id=142929)
> Attempt to demonstrate some methods to control RPC timeouts for showmount
> 
> Please give this a try and we'll see where it takes us.

Have you had a chance to check this out?

Ian

Comment 11 Adam Huffman 2006-12-14 01:45:00 UTC

I've been away from that machine over the past week.  I'll post a report here as
soon as I've had the time to try the patch.

Comment 12 Adam Huffman 2006-12-15 10:02:26 UTC

I have just applied your patch to 1.0.8-4, the version yumdownloader obtained
for me.  It has made a dramatic difference - I only had to wait 1-2 minutes
before gnome-terminal and nautilus started working properly (in case I hadn't
pointed this out previously, the long timeout appeared to be caused by both
gnome-terminal and nautilus).  Previously it was more like 10 minutes.

Comment 13 Adam Huffman 2006-12-15 11:02:03 UTC

Actually, now I'm finding that autofs mounts no longer work, while manual NFS
mounts do work.  I'm seeing this sort of error:

Dec 15 11:00:16 asus automount[5148]: attempting to mount entry /net/bloch
Dec 15 11:00:16 asus automount[5772]: >> rpc mount null: RPC: Unable to receive; errno = Connection refused
Dec 15 11:00:16 asus automount[5772]: lookup(program): lookup for bloch failed
Dec 15 11:00:16 asus automount[5772]: failed to mount /net/bloch
Dec 15 11:00:16 asus automount[5772]: umount_multi: no mounts found under /net/bloch
Dec 15 11:00:16 asus automount[5148]: attempting to mount entry /net/bloch
Dec 15 11:00:17 asus automount[5778]: >> rpc mount null: RPC: Unable to receive; errno = Connection refused
Dec 15 11:00:17 asus automount[5778]: lookup(program): lookup for bloch failed
Dec 15 11:00:17 asus automount[5778]: failed to mount /net/bloch
Dec 15 11:00:17 asus automount[5778]: umount_multi: no mounts found under /net/bloch
Dec 15 11:00:17 asus automount[5148]: attempting to mount entry /net/bloch
Dec 15 11:00:17 asus automount[5784]: >> rpc mount null: RPC: Unable to receive; errno = Connection refused
Dec 15 11:00:17 asus automount[5784]: lookup(program): lookup for bloch failed
Dec 15 11:00:17 asus automount[5784]: failed to mount /net/bloch
Dec 15 11:00:17 asus automount[5784]: umount_multi: no mounts found under /net/bloch

Comment 14 Ian Kent 2006-12-15 12:48:18 UTC

(In reply to comment #13)
> Actually, now I'm finding that autofs mounts no longer work, while manual NFS
> mounts do work.  I'm seeing this sort of error:
> 
> Dec 15 11:00:16 asus automount[5148]: attempting to mount entry /net/bloch
> Dec 15 11:00:16 asus automount[5772]: >> rpc mount null: RPC: Unable to receive; errno = Connection refused

Dramatically reduces timeouts, but doesn't actually work!
I wish I could solve all my problems that way.

Haha.

I'll build this on an FC5 install and see what happens,
sorry.

Ian

Comment 15 Ian Kent 2006-12-15 13:50:03 UTC

Created attachment 143759
Attempt to demonstrate some methods to control RPC timeouts for showmount (take 2)

Oops, silly mistake.
Please try this one.

Comment 16 Adam Huffman 2006-12-15 19:07:41 UTC

Yes, automounting works again with your latest patch.

Comment 17 Ian Kent 2007-02-14 06:24:19 UTC

Hi Karel,

Can you check this bug out please?

I posted the patch from this bug upstream but am not sure
of its status at the moment. It may need to be considered
for inclusion in FC nfs-utils.

Ian

Comment 18 Karel Zak 2007-03-01 22:15:04 UTC

Re-assigning to SteveD, who is the nfs-utils maintainer.

Note: I reviewed this patch a few weeks ago and I think there should be:

  ret = connect_nb(sock, &saddr, &tout);
- if (ret == -1) {
+ if (ret < 0) {

because connect_nb() returns -errno on error. So -1 is not enough.
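
The difference between the two checks can be shown with a small self-contained sketch. The function fake_connect_nb below is a hypothetical stand-in that only mimics the "return -errno on error" convention described above; it is not the patch's connect_nb(), and the two failed_* helpers are likewise made up for the illustration.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a helper that returns 0 on success and
 * -errno on failure, as connect_nb() is described as doing.
 */
int fake_connect_nb(int simulated_errno)
{
    return simulated_errno ? -simulated_errno : 0;
}

/* The original check: only a return value of exactly -1 counts as failure. */
int failed_eq_minus_one(int ret)
{
    return ret == -1;
}

/* The suggested check: every negative return value counts as failure. */
int failed_lt_zero(int ret)
{
    return ret < 0;
}
```

On Linux, where ETIMEDOUT is 110, fake_connect_nb(ETIMEDOUT) returns -110: the "== -1" check would treat that as success and carry on with an unconnected socket, while the "< 0" check reports the failure.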

Comment 19 Ian Kent 2007-03-02 01:44:26 UTC

(In reply to comment #18)
> Re-assigning to SteveD, who is the nfs-utils maintainer.
> 
> Note: I reviewed this patch a few weeks ago and I think there should be:
> 
>   ret = connect_nb(sock, &saddr, &tout);
> - if (ret == -1) {
> + if (ret < 0) {
> 
> because connect_nb() returns -errno on error. So -1 is not enough.

Thanks Karel, I'll fix this up and repost.

Ian

Comment 20 Bug Zapper 2008-04-04 05:00:17 UTC

Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers