Bug 155566

Summary:	mount sometimes uses insecure ports for nfs
Product:	Red Hat Enterprise Linux 4	Reporter:	Thomas J. Baker <tjb>
Component:	nfs-utils	Assignee:	Steve Dickson <steved>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Ben Levenson <benl>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.0	CC:	andy, joey, mhansen
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:	RHEL4U4	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2006-12-21 10:16:07 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Thomas J. Baker 2005-04-21 14:21:28 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.7) Gecko/20050416 Fedora/1.0.3-1.3.1 Firefox/1.0.3

Description of problem:
Since upgrading to RHEL4, we've had problems with automounts not working because the mountd uses insecure ports when requesting the mount. This happens at 4:02am every night when logwatches from 65+ systems come in to a central server and are then sent to an exploded alias list. Some recipients use .forward files in the auomounted home directories. The problem only seems to happen at that time. The rest of the time, the mounts work fine. I'm not sure if this is a bug or a feature but it's not addressed in the release notes and it seems to be contrary to what RHEL3 did. Do tcp mounts work differently than udp mounts? Actually, I tried changing the problematic mounts to udp but it didn't help. What has changed?

Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-46

How reproducible:
Sometimes

Steps to Reproduce:
1. have many automounts occur at the same time
2. 
3.
  

Actual Results:  some fail due to requests for mounts going out on insecure ports

Expected Results:  mount requests go out on secure ports and the mounts work

Additional info:

Comment 1 Steve Dickson 2005-04-22 14:09:11 UTC

What's the version of the util-linux rpm your using?
Also what error are you seeing that makes you believe
that rpc.mountd is not using secure ports?

Comment 2 Thomas J. Baker 2005-04-22 14:26:15 UTC

util-linux-2.12a-16.EL4.6

and the clients say this:

pr 20 04:02:05 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58479
Apr 20 04:02:05 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58547
Apr 20 04:02:06 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58612
Apr 20 04:02:06 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58672
Apr 20 04:02:06 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58711
Apr 20 04:02:06 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58738
Apr 20 04:02:08 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59000
Apr 20 04:02:09 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59040
Apr 20 04:02:09 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59115
Apr 20 04:02:09 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59157
Apr 20 04:02:10 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59201
Apr 20 04:02:10 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59218
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59597
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59604
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59611
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59620
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59627
Apr 20 04:02:24 hobbes rpc.mountd: refused mount request from
blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59634

Comment 3 Thomas J. Baker 2005-04-22 14:29:39 UTC

Check that, the servers are logging illegal port. Client blackstar running rhel4
is mounting various other systems.

Comment 4 Thomas J. Baker 2005-04-22 15:14:21 UTC

I guess I'm mixing my terms. I say insecure when I mean unprivileged. (insecure
is the export option). RHEL4 is using unprivileged ports when requesting an nfs
mount some of the time.

Comment 5 Joe Pruett 2005-08-12 21:13:32 UTC

we see very similar things but the remote server is a netapp.  there is a bug
under rhel3 (154678) that looks to be the same thing.  too many tcp ports get
left in time_wait, so a non-privileged port gets used.  and it happens at night
for us as well, probably backups or something like that.

setting the server to allow insecure mounts will work around it, but the fix
listed for 154678 can hopefully be put into rhel4.

Comment 6 Mikael M. Hansen 2005-08-17 10:16:16 UTC

Hi

We also have it. Exactly the same symptoms. We'll try to use the insecure
options and hopefully this will introduce some continuity in the mounts. What
the status of this from RedHat's side. Is it something we can expect to be fixed
in an update?

Comment 7 Mikael M. Hansen 2005-08-19 11:00:42 UTC

Well. Now we've tested the "solution" with insecure ports. But that only gives
us another type of error on the client (from /var/log/messages):

Aug 17 14:02:43 elm automount[4703]: >> nfs bindresvport: Address already in use
Aug 17 14:02:43 elm automount[4703]: mount(nfs): nfs: mount failure
lfs1.cs.aau.dk:/q/lfs1_10/gauss on /user/gauss
Aug 17 14:02:43 elm automount[4703]: failed to mount /user/gauss
Aug 17 14:02:43 elm automount[4705]: failed to mount /user/gauss/.public_html
Aug 17 14:02:43 elm automount[4706]: failed to mount /user/gauss/.htaccess

Path and all are correct. So why this happened we do not know. But it must be a
bug somewhere. I have not found any existing bugs that resembles. Does anyone
have any idea?

As the problem occured on a production machine (apparently some load is needed
to trigger the problems) we decided to make the setup of the new host (RHEL4
Upd1) the same as the old server (RHEL3 Upd 2). So we changed to udp based
mounts (/etc/auto.master):
/user           auto_user       udp,intr,soft,bg,rsize=32768,wsize=32768
/coll           xauto_coll8     udp,intr,soft,bg,rsize=32768,wsize=32768
/pack           xauto_pack8     udp,intr,soft,bg,rsize=32768,wsize=32768
/project        auto_project    udp,intr,soft,bg,rsize=32768,wsize=32768
/n              /etc/auto.net


We still have insecure in the exports of the server. That should be removed, but
doing so just created the Permision denied problem due to the many ports left in
the TIME_WAIT state. That should naturally be fixed.

Now this setup is the best we have seen so far. But we see a few messages like
this in /var/log/messages:

Aug 19 10:49:15 elm automount[9731]: >> mount: backgrounding
"lfs1.cs.aau.dk:/q/lfs1_8/schroll"


In the debug of the autofs for the same pid we see:

Aug 19 10:49:15 elm automount[9731]: lookup(yp): looking up schroll
Aug 19 10:49:15 elm automount[9731]: ret = 1
Aug 19 10:49:15 elm automount[9731]: lookup(yp): schroll ->
lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[9731]: parse(sun): expanded entry:
lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[9731]: parse(sun): gathered options:
udp,intr,soft,bg,rsize=32768,wsize=32768
Aug 19 10:49:15 elm automount[2206]: sig 1 switching from 1 to 4
Aug 19 10:49:15 elm automount[9728]: failed to mount /user/.htaccess
Aug 19 10:49:15 elm automount[9731]: parse(sun):
dequote("lfs1.cs.aau.dk:/q/lfs1_8/schroll") -> lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[2206]: get_pkt: state 1, next 4
Aug 19 10:49:15 elm automount[9728]: umount_multi: path=/user/.htaccess incl=1 
Aug 19 10:49:15 elm automount[9731]: parse(sun): core of entry:
options=udp,intr,soft,bg,rsize=32768,wsize=32768, loc=lfs1.cs.aau.dk:/q/lfs1_8/
schroll
Aug 19 10:49:15 elm automount[9731]: parse(sun): mounting root /user, mountpoint
schroll, what lfs1.cs.aau.dk:/q/lfs1_8/schroll, fstype nfs, op
tions udp,intr,soft,bg,rsize=32768,wsize=32768 
Aug 19 10:49:15 elm automount[9731]: mount(nfs):  root=/user name=schroll
what=lfs1.cs.aau.dk:/q/lfs1_8/schroll, fstype=nfs, options=udp,intr,s
oft,bg,rsize=32768,wsize=32768
Aug 19 10:49:15 elm automount[9731]: mount(nfs): nfs
options="udp,intr,soft,bg,rsize=32768,wsize=32768", nosymlink=0
Aug 19 10:49:15 elm automount[9731]: mount(nfs): is_local_mount:
lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[9731]: mount(nfs): from
lfs1.cs.aau.dk:/q/lfs1_8/schroll elected lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[9731]: mount(nfs): calling mkdir_path /user/schroll
Aug 19 10:49:15 elm automount[9731]: mount(nfs): calling mount -t nfs -s  -o
udp,intr,soft,bg,rsize=32768,wsize=32768 lfs1.cs.aau.dk:/q/lfs1_8/
schroll /user/schroll
Aug 19 10:49:15 elm automount[2206]: st_readmap: status 2 
Aug 19 10:49:15 elm automount[9731]: >> mount: backgrounding
"lfs1.cs.aau.dk:/q/lfs1_8/schroll"


Now this is interesting. It's backgrounding the mount process for some reason.
we can not see why. Included above is also the parent process  (pid 2206) to the
automount process that was spawned to mount the scroll homedirectory. Why the
parent gives the following line and wether it has any importance I do not know:

Aug 19 10:49:15 elm automount[2206]: sig 1 switching from 1 to 4
 
We can also see that the automount process has spawned a child with a ps -ef |
grep automount:
root      2206     1  0 09:54 ?        00:00:15 /usr/sbin/automount --timeout=60
--debug /user yp auto_user udp,intr,soft,bg,rsize=32768,wsize=32768
root      2257     1  0 09:54 ?        00:00:00 /usr/sbin/automount --timeout=60
--debug /coll yp xauto_coll8 udp,intr,soft,bg,rsize=32768,wsize=32768
root      2324     1  0 09:54 ?        00:00:00 /usr/sbin/automount --timeout=60
--debug /pack yp xauto_pack8 udp,intr,soft,bg,rsize=32768,wsize=32768
root      2395     1  0 09:54 ?        00:00:00 /usr/sbin/automount --timeout=60
--debug /project yp auto_project udp,intr,soft,bg,rsize=32768,wsize=32768
root      2465     1  0 09:54 ?        00:00:00 /usr/sbin/automount --timeout=60
--debug /n program /etc/auto.net
root      9731  2206  0 10:49 ?        00:00:00 /usr/sbin/automount --timeout=60
--debug /user yp auto_user udp,intr,soft,bg,rsize=32768,wsize=32768

Now the problems is that while the process is backgrounded the directory is,
naturally, not mounted. But hte backgrounded process does not appear to ever
return or exit in any way.

Killing the process usually results in kernel panic (seen two times). Or it just
becomes defunct and the automounter stops mounting for that particular map (/user)

Any hints on what to do with this?

Now perhaps the following is unrelated or perhaps it is related, I'm not sure.
But a netstat -a sometimes result in one or more:

warning, got duplicate tcp line.

on stderr. This can not be a good sign?


Additional info:

Kernel: 2.6.9-11.ELsmp
OS: Red Hat Enterprise Linux WS release 4 (Nahant Update 1)
nfs-utils-1.0.6-46
autofs-4.1.3-131

/etc/sysconfig/autofs is standard except for a -d in the DAEMONOPTIONS line.

Any help is greatly apprectiated as we strugle quite a great deal with this. We
have set a deadline for monday 22/8-2005 before returning to RHEL3 on at least 2
server (nfs clients) that might have similar characteristics as the client
described here. The description might be a bit unstructured, but looking
isolated on the last situation it should be clear enough, I hope :)

Comment 8 Joe Pruett 2005-08-20 18:15:39 UTC

i am also seeing the 'address already in use' now.  so my nightly cron jobs
still don't run successfully.

Comment 9 Steve Dickson 2005-08-22 13:04:31 UTC

It appears your running out of privilege ports (i.e. ports that
are between 1 and 1023. Although in reality this port ranged
is a bit smaller due to system network daemons).

When privilege ports become exhausted, non-privilege
(or insecure) ports are tried.  Since servers, by default,  are
configured not to accept connection on non-privilege ports,
those connect requests are failed .

The main problem is that it take two TCP connections for every
1 mount (one connection to rpc.mountd to get the file handle, and
the other for that actual mount)  and the first connection ends
up in TIME_WAIT, basically making a port unusable for a
minute or so.

A couple work-around could be, use udp or allow your
NFS server to accept connection from non-privilege ports.
The true solution is to make one connection per server instead
of one connection per filesystem... something we need to
push upstream on....

Comment 11 Andy Ross 2005-09-02 15:52:48 UTC

What is the purpose behind this feature anyway?  It has been a *long* time since
software has been able to assume that ports <1024 are "secure" in any meaningful
way.  And it disallows lots of useful applications, e.g. NFS-mounting accross a
NAT router.

I looked for a configuration option somewhere (in /etc/exports, exportfs, or nfsd)
that would turn this behavior off, but couldn't fine one...

Comment 12 Steve Dickson 2005-09-02 20:20:42 UTC

exports(5) states:

       secure This  option  requires  that  requests originate on an internet
              port less than IPPORT_RESERVED (1024). This  option  is  on  by
              default. To turn it off, specify insecure.

Which means exports porting filesystem is 'insecure' will not require
clients to used secure ports.

Comment 13 Mikael M. Hansen 2005-09-08 14:24:26 UTC

Is there any prognosis on the resolution of this bug? 

We have several other servers that are less likely to ran into the "not enough
available ports" problems, but still the chances are there.

Also I'm not convinced that the problems is only related to mounts. We have many
TIME_WAIT connections against NIS servers. A number like 20 is not uncommon on
our workstations (Only one user). So there must be something wrong inside the
kernel, perhaps connections in general are not closed properly. Atleast the
problem also exists in the NIS code.

Comment 14 Steve Dickson 2005-09-09 09:51:21 UTC

Your correct, this is a port exhaustion issue than a mount issue.
In RHEL4 U2 (in which the beta release is now available) we've
made some changes to the glibc RPC client routines that should
address is this issue. Please up2date to the beta release to see
if this helps...

Comment 15 Mikael M. Hansen 2006-01-02 11:11:24 UTC

First, sorry for the late reply. 

We have no available server to test it on, they are all back to running RHEL3.
We will no be able to test earlier than summer 2006. So for my part let us say
it is fixed.