Bug 818759

Summary: autofs RPC UDP bind() but never releases
Product: Red Hat Enterprise Linux 5
Reporter: David <phoned>
Component: autofs
Assignee: Ian Kent <ikent>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 5.7
CC: ikent, phoned
Target Milestone: rc
Flags: phoned: needinfo-
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-02 13:16:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David 2012-05-03 21:00:26 UTC
Description of problem:
autofs bind()s a UDP socket, apparently to receive the result of an RPC request. The socket is never closed once the reply has been received.

Version-Release number of selected component (if applicable):
At least autofs-5.0.1-0.rc2.156.el5_7.1.x86_64 and possibly more.

How reproducible:
- Mount a data location
- Run pfiles2 on automount process and notice ports in use.
- Try to bind to one of these ports

- pfiles2 is a systemtap script obtainable from: https://access.redhat.com/knowledge/solutions/45294

Steps to Reproduce:
1. Mount a data location using automount
2. Run pfiles2 against automount process and observe extra port in use.
3. Try to bind() to the port, for example with nc -l (in our case it was ISV software with a hard-coded port)

(You could also tcpdump port sunrpc to the nfs server, note the source port, and try to nc -l that source port)
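
Step 3 can also be scripted. The following is a minimal Python sketch, not part of autofs or pfiles2; the helper name probe_port is hypothetical. It tries to bind the given port on all local addresses over both TCP and UDP and reports which attempts fail with EADDRINUSE. nc -l binds a TCP socket by default, so checking both protocols shows which kind of socket is holding the port.

```python
import errno
import socket


def probe_port(port, host=""):
    """Return {"tcp": bool, "udp": bool}; True means the port is already in use."""
    in_use = {}
    for name, kind in (("tcp", socket.SOCK_STREAM), ("udp", socket.SOCK_DGRAM)):
        sock = socket.socket(socket.AF_INET, kind)
        try:
            # A plain bind() without SO_REUSEADDR, like the failing application.
            sock.bind((host, port))
            in_use[name] = False
        except OSError as exc:
            if exc.errno != errno.EADDRINUSE:
                raise  # e.g. EACCES for ports below 1024 when not root
            in_use[name] = True
        finally:
            sock.close()  # always release the probe socket itself
    return in_use
```

For example, probe_port(33943) on the affected system would show whether the TCP bind, the UDP bind, or both are refused.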

Actual results:
The port is still in use by automount.

Expected results:
- Port should not be in use and should be available to system.
- We have LOTS of automount activity and eventually this consumes ports that our users try to use.

Additional info:
I have a support case # that hit this problem. I can open another case for this if that would help.

Comment 1 Ian Kent 2012-05-04 03:22:25 UTC
What does the map entry used look like?

Comment 2 Ian Kent 2012-05-04 05:10:47 UTC
I can't reproduce this. How about more information about the
environment and an actual example, with the mount map entry,
showing how you verified the issue?

Comment 3 David 2012-05-04 15:58:06 UTC
Hi Ian,

I will admit it is not reproducible 100% of the time (which could make this difficult). We have lots of RHEL 5.7 systems which have these sockets in use though, so it's not just a one-off issue.

Here is a system that is currently exhibiting the problem.

pfiles2 shows automount having network sockets bound, but they are not listening or accepting connections (which is why they don't show up in lsof or netstat) -- this was uncovered during my original support case.

As you can see, nothing (like nc) can bind to these ports until autofs is restarted, at which time they are properly released.

Since we do *lots* of backups to automount locations, we had some systems eventually use ports that new software needed to use.

Most of these are mounting netapp filers which are at version 8.x.

I had to obscure the log file a bit, as this is a work system.

Thanks,

# uname -a
Linux XXXXXX 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
# rpm -q autofs
autofs-5.0.1-0.rc2.156.el5_7.1
# /var/tmp/pfiles2 `pgrep automount` | grep port 
        sockname: AF_INET SERVERIP  port: 60313
        sockname: AF_INET SERVERIP  port: 42008
        sockname: AF_INET SERVERIP  port: 43515
        sockname: AF_INET SERVERIP  port: 36265
        sockname: AF_INET SERVERIP  port: 54974
        sockname: AF_INET SERVERIP  port: 54913
        sockname: AF_INET SERVERIP  port: 37698
        sockname: AF_INET SERVERIP  port: 35656
        sockname: AF_INET SERVERIP  port: 53627
        sockname: AF_INET SERVERIP  port: 49729
        sockname: AF_INET SERVERIP  port: 49947
        sockname: AF_INET SERVERIP  port: 39910
        sockname: AF_INET SERVERIP  port: 36949
        sockname: AF_INET SERVERIP  port: 56495
        sockname: AF_INET SERVERIP  port: 51626
        sockname: AF_INET SERVERIP  port: 39208
        sockname: AF_INET SERVERIP  port: 43320
        sockname: AF_INET SERVERIP  port: 51062
        sockname: AF_INET SERVERIP  port: 43946
        sockname: AF_INET SERVERIP  port: 60131
        sockname: AF_INET SERVERIP  port: 45480
        sockname: AF_INET SERVERIP  port: 42266
        sockname: AF_INET SERVERIP  port: 54112
        sockname: AF_INET SERVERIP  port: 54256
        sockname: AF_INET SERVERIP  port: 33705
        sockname: AF_INET SERVERIP  port: 35375
        sockname: AF_INET SERVERIP  port: 37729
        sockname: AF_INET SERVERIP  port: 33943
        sockname: AF_INET SERVERIP  port: 46559
        sockname: AF_INET SERVERIP  port: 54881
        sockname: AF_INET SERVERIP  port: 59169
        sockname: AF_INET SERVERIP  port: 48312
        sockname: AF_INET SERVERIP  port: 34267
        sockname: AF_INET SERVERIP  port: 44124
        sockname: AF_INET SERVERIP  port: 43373
        sockname: AF_INET SERVERIP  port: 45528
        sockname: AF_INET SERVERIP  port: 50535
        sockname: AF_INET SERVERIP  port: 38191
        sockname: AF_INET SERVERIP  port: 55374
# nc -l 33943
nc: Address already in use
# nc -l 50535
nc: Address already in use
# netstat -anp | egrep '33943|50535'
# service autofs restart
Stopping automount:                                        [  OK  ]
Starting automount:                                        [  OK  ]
# /var/tmp/pfiles2 `pgrep automount` | grep port 
# nc -l 33943

# nc -l 50535


#
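
The behaviour in the transcript above can be illustrated outside autofs. The sketch below is Python and makes no claim about where the leak actually is; the names try_bind and leaked are hypothetical. It only shows that a socket which is bound but never put into the listening state still makes a second bind() to the same port fail with "Address already in use" until the descriptor is closed. It uses TCP because nc -l binds a TCP socket by default; whether the sockets held by automount are TCP or UDP is not established here.

```python
import errno
import socket


def try_bind(port):
    """Attempt a fresh TCP bind on `port`; return None on success, else the errno."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind(("", port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        probe.close()


# Stand-in for the leaked descriptor: bound to a kernel-chosen port,
# never passed to listen() or connect().
leaked = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
leaked.bind(("", 0))
port = leaked.getsockname()[1]

while_held = try_bind(port)   # expected to be errno.EADDRINUSE
leaked.close()                # a service restart closes every descriptor like this
after_close = try_bind(port)  # expected to be None, the port is free again

print(port, while_held == errno.EADDRINUSE, after_close is None)
```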

Comment 4 Ian Kent 2012-05-05 10:54:10 UTC
(In reply to comment #3)
> Hi Ian,
> 
> I will admit it is not reproducible 100% of the time (which could make this
> difficult). We have lots of RHEL 5.7 systems which have these sockets in use
> though, so it's not just a one-off issue.
> 
> Here is a system that is currently exhibiting the problem.
> 
> pfiles2 shows automount having network sockets bound, but they are not
> listening or accepting connections (which is why they don't show up in lsof or
> netstat) -- this was uncovered during my original support case.
> 
> As you can see, nothing (like nc) can bind to these ports until autofs is
> restarted, at which time they are properly released.
> 
> Since we do *lots* of backups to automount locations, we had some systems
> eventually use ports that new software needed to use.
> 
> Most of these are mounting netapp filers which are at version 8.x.
> 
> I had to obscure the log file a bit, as this is a work system.

Sure, I get that there's a problem but we still need to work
out if it is in fact the automount code, some other code, or
a fault in some other mechanism, like file handle closes at exit
of forked processes.

For example, if you're using simple indirect or direct mounts
automount doesn't bind to any UDP port, so we know we're looking
for something more obscure. That's why I asked for example
maps, but be sure to include all the different constructs so we
know if we need to look at the automount rpc code too.

I really can't see how that code could do this since all the
rpc clients are contained in structures local to the functions
that call that code and my initial analysis shows matching
close(2) calls for all bind(2) calls (for map entries that will
cause them).
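
The bind(2)/close(2) pairing described above can be sketched generically. This is a Python illustration only, not the autofs RPC code; the names one_shot_udp_request and echo_once, and the loopback echo peer, are hypothetical. The socket lives only inside the function and is closed on every exit path, including a timeout, so the bound port is always released.

```python
import socket
import threading


def one_shot_udp_request(server, payload, timeout=2.0):
    """Send one datagram from a freshly bound port; return (reply, local_port)."""
    # The with block closes the socket even if sendto()/recvfrom() raises.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", 0))   # explicit bind, kernel picks the port
        sock.settimeout(timeout)
        sock.sendto(payload, server)
        reply, _ = sock.recvfrom(4096)
        return reply, sock.getsockname()[1]


def echo_once(peer):
    """Hypothetical local peer: echo a single datagram back to its sender."""
    data, addr = peer.recvfrom(4096)
    peer.sendto(data, addr)


peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer.bind(("127.0.0.1", 0))
peer.settimeout(5.0)
worker = threading.Thread(target=echo_once, args=(peer,))
worker.start()

reply, used_port = one_shot_udp_request(peer.getsockname(), b"ping")
worker.join()
peer.close()

# The port used for the request can be bound again straight away.
check = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
check.bind(("127.0.0.1", used_port))
check.close()
```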

A full debug log might also give us a lead.
We can make the bug private to Red Hat associates (and those we
include in the bug cc list) if that is needed.

Ian

Comment 5 David 2012-05-07 16:08:24 UTC
Hi Ian,

I suspect that's definitely true. So far, I've been unable to strace any user-land process which is using port numbers in the range I've seen (automount uses < 1024, mount.nfs by itself shows more than one rpc connection, but I can only see one of them in strace).

In either case, our maps on the comment above are very simple with an ldap backend.

$ egrep -v '^#' /etc/auto.master 
/home   ldap:automountMapName=auto_home,ou=XXYY,ou=Unix,o=ZZ --ghost
/net /etc/auto.net

All entries are of the form:
dn: automountKey=unixtest,automountMapName=auto_home,ou=XXYY,ou=unix,o=ZZ
automountKey: unixtest
automountInformation: WWWW:/vol/vol01/home/&
cn: unixtest
nisMapEntry: WWWW:/vol/vol01/home/&
nisMapName: auto_home
objectClass: automount
objectClass: nisObject
objectClass: top

With just a few of them having the absolute path to the nfs location.

I will try to capture more information when it triggers this issue.

Comment 6 Ian Kent 2012-05-15 01:50:03 UTC
(In reply to comment #5)
> Hi Ian,
> 
> I suspect that's definitely true. So far, I've been unable to strace any
> user-land process which is using port numbers in the range I've seen (automount
> uses < 1024, mount.nfs by itself shows more than one rpc connection, but I can
> only see one of them in strace).

That doesn't sound good.
I went to a lot of trouble (it's the whole point of the autofs
RPC code) to get autofs to use mostly higher-numbered ports.

> 
> In either case, our maps on the comment above are very simple with an ldap
> backend.
> 
> $ egrep -v '^#' /etc/auto.master 
> /home   ldap:automountMapName=auto_home,ou=XXYY,ou=Unix,o=ZZ --ghost
> /net /etc/auto.net
> 
> All entries are of the form:
> dn: automountKey=unixtest,automountMapName=auto_home,ou=XXYY,ou=unix,o=ZZ
> automountKey: unixtest
> automountInformation: WWWW:/vol/vol01/home/&
> cn: unixtest
> nisMapEntry: WWWW:/vol/vol01/home/&
> nisMapName: auto_home
> objectClass: automount
> objectClass: nisObject
> objectClass: top
> 
> With just a few of them having the absolute path to the nfs location.

Presumably you mean some don't use the & substitution.

These are simple indirect mounts so in RHEL-5 automount
shouldn't be calling the autofs RPC code at all.

There are other sources of UDP usage, such as getaddrinfo(3)
and the like, and if you're using the internal hosts map, that
will call into the autofs RPC code.

Ian

Comment 7 RHEL Program Management 2014-03-07 12:46:34 UTC
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 8 RHEL Program Management 2014-06-02 13:16:24 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).