Bug 155566
Summary: | mount sometimes uses insecure ports for nfs | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Thomas J. Baker <tjb> |
Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Ben Levenson <benl> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.0 | CC: | andy, joey, mhansen |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHEL4U4 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2006-12-21 10:16:07 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Thomas J. Baker
2005-04-21 14:21:28 UTC
What's the version of the util-linux rpm you're using? Also, what error are you seeing that makes you believe that rpc.mountd is not using secure ports?

util-linux-2.12a-16.EL4.6, and the clients say this:

Apr 20 04:02:05 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58479
Apr 20 04:02:05 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58547
Apr 20 04:02:06 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58612
Apr 20 04:02:06 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58672
Apr 20 04:02:06 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58711
Apr 20 04:02:06 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 58738
Apr 20 04:02:08 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59000
Apr 20 04:02:09 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59040
Apr 20 04:02:09 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59115
Apr 20 04:02:09 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59157
Apr 20 04:02:10 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59201
Apr 20 04:02:10 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59218
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59597
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59604
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59611
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59620
Apr 20 04:02:23 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59627
Apr 20 04:02:24 hobbes rpc.mountd: refused mount request from blackstar.sr.unh.edu for /data/home/rcc/rfsl (/data/home): illegal port 59634

Check that, the servers are logging illegal port. Client blackstar running RHEL4 is mounting various other systems.

I guess I'm mixing my terms. I say insecure when I mean unprivileged (insecure is the export option). RHEL4 is using unprivileged ports when requesting an NFS mount some of the time.

We see very similar things, but the remote server is a NetApp. There is a bug under RHEL3 (154678) that looks to be the same thing: too many TCP ports get left in TIME_WAIT, so a non-privileged port gets used. It happens at night for us as well, probably backups or something like that. Setting the server to allow insecure mounts will work around it, but the fix listed for 154678 can hopefully be put into RHEL4.

Hi. We also have it, with exactly the same symptoms. We'll try the insecure option and hopefully this will introduce some continuity in the mounts. What is the status of this from Red Hat's side? Is it something we can expect to be fixed in an update?

Well. Now we've tested the "solution" with insecure ports.
But that only gives us another type of error on the client (from /var/log/messages):

Aug 17 14:02:43 elm automount[4703]: >> nfs bindresvport: Address already in use
Aug 17 14:02:43 elm automount[4703]: mount(nfs): nfs: mount failure lfs1.cs.aau.dk:/q/lfs1_10/gauss on /user/gauss
Aug 17 14:02:43 elm automount[4703]: failed to mount /user/gauss
Aug 17 14:02:43 elm automount[4705]: failed to mount /user/gauss/.public_html
Aug 17 14:02:43 elm automount[4706]: failed to mount /user/gauss/.htaccess

The path and everything else are correct, so we do not know why this happened. But it must be a bug somewhere. I have not found any existing bug that resembles it. Does anyone have any idea?

As the problem occurred on a production machine (apparently some load is needed to trigger it), we decided to set up the new host (RHEL4 Upd 1) the same way as the old server (RHEL3 Upd 2). So we changed to UDP-based mounts (/etc/auto.master):

/user auto_user udp,intr,soft,bg,rsize=32768,wsize=32768
/coll xauto_coll8 udp,intr,soft,bg,rsize=32768,wsize=32768
/pack xauto_pack8 udp,intr,soft,bg,rsize=32768,wsize=32768
/project auto_project udp,intr,soft,bg,rsize=32768,wsize=32768
/n /etc/auto.net

We still have insecure in the exports of the server. That should be removed, but doing so just created the "Permission denied" problem again, due to the many ports left in the TIME_WAIT state. That should naturally be fixed.

Now this setup is the best we have seen so far.
But we see a few messages like this in /var/log/messages:

Aug 19 10:49:15 elm automount[9731]: >> mount: backgrounding "lfs1.cs.aau.dk:/q/lfs1_8/schroll"

In the autofs debug output for the same pid we see:

Aug 19 10:49:15 elm automount[9731]: lookup(yp): looking up schroll
Aug 19 10:49:15 elm automount[9731]: ret = 1
Aug 19 10:49:15 elm automount[9731]: lookup(yp): schroll -> lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[9731]: parse(sun): expanded entry: lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[9731]: parse(sun): gathered options: udp,intr,soft,bg,rsize=32768,wsize=32768
Aug 19 10:49:15 elm automount[2206]: sig 1 switching from 1 to 4
Aug 19 10:49:15 elm automount[9728]: failed to mount /user/.htaccess
Aug 19 10:49:15 elm automount[9731]: parse(sun): dequote("lfs1.cs.aau.dk:/q/lfs1_8/schroll") -> lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[2206]: get_pkt: state 1, next 4
Aug 19 10:49:15 elm automount[9728]: umount_multi: path=/user/.htaccess incl=1
Aug 19 10:49:15 elm automount[9731]: parse(sun): core of entry: options=udp,intr,soft,bg,rsize=32768,wsize=32768, loc=lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[9731]: parse(sun): mounting root /user, mountpoint schroll, what lfs1.cs.aau.dk:/q/lfs1_8/schroll, fstype nfs, options udp,intr,soft,bg,rsize=32768,wsize=32768
Aug 19 10:49:15 elm automount[9731]: mount(nfs): root=/user name=schroll what=lfs1.cs.aau.dk:/q/lfs1_8/schroll, fstype=nfs, options=udp,intr,soft,bg,rsize=32768,wsize=32768
Aug 19 10:49:15 elm automount[9731]: mount(nfs): nfs options="udp,intr,soft,bg,rsize=32768,wsize=32768", nosymlink=0
Aug 19 10:49:15 elm automount[9731]: mount(nfs): is_local_mount: lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[9731]: mount(nfs): from lfs1.cs.aau.dk:/q/lfs1_8/schroll elected lfs1.cs.aau.dk:/q/lfs1_8/schroll
Aug 19 10:49:15 elm automount[9731]: mount(nfs): calling mkdir_path /user/schroll
Aug 19 10:49:15 elm automount[9731]: mount(nfs): calling mount -t nfs -s -o udp,intr,soft,bg,rsize=32768,wsize=32768 lfs1.cs.aau.dk:/q/lfs1_8/schroll /user/schroll
Aug 19 10:49:15 elm automount[2206]: st_readmap: status 2
Aug 19 10:49:15 elm automount[9731]: >> mount: backgrounding "lfs1.cs.aau.dk:/q/lfs1_8/schroll"

Now this is interesting. It's backgrounding the mount process for some reason; we cannot see why. Included above is also the parent process (pid 2206) of the automount process that was spawned to mount the schroll home directory. Why the parent gives the following line, and whether it has any importance, I do not know:

Aug 19 10:49:15 elm automount[2206]: sig 1 switching from 1 to 4

We can also see that the automount process has spawned a child, with ps -ef | grep automount:

root 2206 1 0 09:54 ? 00:00:15 /usr/sbin/automount --timeout=60 --debug /user yp auto_user udp,intr,soft,bg,rsize=32768,wsize=32768
root 2257 1 0 09:54 ? 00:00:00 /usr/sbin/automount --timeout=60 --debug /coll yp xauto_coll8 udp,intr,soft,bg,rsize=32768,wsize=32768
root 2324 1 0 09:54 ? 00:00:00 /usr/sbin/automount --timeout=60 --debug /pack yp xauto_pack8 udp,intr,soft,bg,rsize=32768,wsize=32768
root 2395 1 0 09:54 ? 00:00:00 /usr/sbin/automount --timeout=60 --debug /project yp auto_project udp,intr,soft,bg,rsize=32768,wsize=32768
root 2465 1 0 09:54 ? 00:00:00 /usr/sbin/automount --timeout=60 --debug /n program /etc/auto.net
root 9731 2206 0 10:49 ? 00:00:00 /usr/sbin/automount --timeout=60 --debug /user yp auto_user udp,intr,soft,bg,rsize=32768,wsize=32768

Now the problem is that while the process is backgrounded the directory is, naturally, not mounted. But the backgrounded process does not appear to ever return or exit in any way. Killing the process usually results in a kernel panic (seen two times), or it just becomes defunct and the automounter stops mounting for that particular map (/user).

Any hints on what to do with this?
Now perhaps the following is unrelated, or perhaps it is related; I'm not sure. But a netstat -a sometimes results in one or more of these on stderr:

warning, got duplicate tcp line.

This cannot be a good sign?

Additional info:
Kernel: 2.6.9-11.ELsmp
OS: Red Hat Enterprise Linux WS release 4 (Nahant Update 1)
nfs-utils-1.0.6-46
autofs-4.1.3-131
/etc/sysconfig/autofs is standard except for a -d in the DAEMONOPTIONS line.

Any help is greatly appreciated, as we struggle quite a great deal with this. We have set a deadline of Monday 22/8-2005 before returning to RHEL3 on at least 2 servers (NFS clients) that might have similar characteristics to the client described here. The description might be a bit unstructured, but looking at the last situation in isolation it should be clear enough, I hope :)

I am also seeing the 'address already in use' now, so my nightly cron jobs still don't run successfully.

It appears you're running out of privileged ports (i.e. ports between 1 and 1023, although in reality this port range is a bit smaller due to system network daemons). When privileged ports become exhausted, non-privileged (or insecure) ports are tried. Since servers, by default, are configured not to accept connections on non-privileged ports, those connection requests fail. The main problem is that it takes two TCP connections for every mount (one connection to rpc.mountd to get the file handle, and the other for the actual mount), and the first connection ends up in TIME_WAIT, basically making a port unusable for a minute or so. A couple of workarounds would be to use UDP or to allow your NFS server to accept connections from non-privileged ports. The true solution is to make one connection per server instead of one connection per filesystem... something we need to push upstream on....

What is the purpose behind this feature anyway? It has been a *long* time since software has been able to assume that ports <1024 are "secure" in any meaningful way.
And it disallows lots of useful applications, e.g. NFS-mounting across a NAT router. I looked for a configuration option somewhere (in /etc/exports, exportfs, or nfsd) that would turn this behavior off, but couldn't find one...

exports(5) states:

secure This option requires that requests originate on an internet port less than IPPORT_RESERVED (1024). This option is on by default. To turn it off, specify insecure.

Which means exporting a filesystem as 'insecure' will not require clients to use secure ports.

Is there any prognosis on the resolution of this bug? We have several other servers that are less likely to run into the "not enough available ports" problem, but still the chances are there. Also I'm not convinced that the problem is only related to mounts. We have many TIME_WAIT connections against NIS servers. A number like 20 is not uncommon on our workstations (only one user). So there must be something wrong inside the kernel; perhaps connections in general are not closed properly. At least the problem also exists in the NIS code.

You're correct, this is more a port exhaustion issue than a mount issue. In RHEL4 U2 (for which the beta release is now available) we've made some changes to the glibc RPC client routines that should address this issue. Please up2date to the beta release to see if this helps...

First, sorry for the late reply. We have no available server to test it on; they are all back to running RHEL3. We will not be able to test earlier than summer 2006. So for my part let us say it is fixed.
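
For anyone who wants to check the port-exhaustion explanation given earlier in this thread on their own client, here is a minimal Python sketch that lists the privileged local ports currently sitting in TIME_WAIT. It was not part of the original report. It assumes the usual Linux /proc/net/tcp text layout (hex address:port in the second column, hex socket state in the fourth, with 06 meaning TIME_WAIT) and only looks at IPv4 sockets. The function name privileged_time_wait_ports is made up for this illustration.

```python
# Sketch: find privileged local TCP ports that are stuck in TIME_WAIT.
# Assumption: input text follows the Linux /proc/net/tcp layout, i.e. one
# header row, then rows whose 2nd field is hex "ADDR:PORT" (local side)
# and whose 4th field is the hex socket state (06 == TIME_WAIT).

IPPORT_RESERVED = 1024   # ports below this count as privileged, per exports(5)
TCP_TIME_WAIT = 0x06     # kernel state code for TIME_WAIT


def privileged_time_wait_ports(proc_net_tcp_text):
    """Return the sorted privileged local ports that are in TIME_WAIT."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        state = int(fields[3], 16)
        if state == TCP_TIME_WAIT and local_port < IPPORT_RESERVED:
            ports.add(local_port)
    return sorted(ports)


def read_proc_net_tcp(path="/proc/net/tcp"):
    """Read the kernel's IPv4 TCP socket table as text (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a Linux client, calling privileged_time_wait_ports(read_proc_net_tcp()) while the nightly jobs run should show whether the list grows toward the size of the reserved range at the times the "illegal port" or "Address already in use" messages appear.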