Bug 103401

Summary: RPC (on boot) port selection collisions between various applications
Product: Red Hat Enterprise Linux 6
Reporter: Steve Bonneville <sbonnevi>
Component: glibc
Assignee: Carlos O'Donell <codonell>
Status: NEW ---
QA Contact: qe-baseos-tools
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.0
CC: aleksey, areichow, ashankar, berend.de.schouwer, borisv, bruce_friedman, btotty, bugs, bugzilla, casmith, chris, cmc, crn1, cseraphi, csimpson, ddumas, dpal, dyocum, ejtr, erco, fhirtz, fweimer, giardina, hamiller, iannis, igeorgex, jakub, jay, jeremy, jkeck, jorton, jplans, jss, k.georgiou, ksrot, lhecking, linux_support, marcobillpeter, mgalgoci, mkarg, mnewsome, morioka, ohudlick, okapi, orion, ovasik, pfrankli, pyaduvan, raines, rc040203, rdieter, riek, rollercow, rrussell, saa_ch11, smayhew, smooge, sputhenp, tao, tjb, trondeg, tru, twaugh, zing
Target Milestone: rc
Keywords: FutureFeature, Reopened, Tracking
Target Release: ---
Hardware: All
OS: Linux
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=223937
https://bugzilla.redhat.com/show_bug.cgi?id=455859
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-11-11 09:59:17 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Bug Depends On: 723228, 786076, 790682, 790684, 790685, 790686, 790687, 790690, 802240, 913509
Bug Blocks: 1269194, 168996, 170445, 756082

Description Steve Bonneville 2003-08-29 16:13:45 EDT
Description of problem:

On boot, ypbind occasionally grabs port 631/udp, blocking CUPS from binding to
the port.  This is a glibc problem because ypbind is an RPC service that has its
port assigned dynamically through bindresvport().

The code in libc/sunrpc/bindrsvprt.c shows the port number is assigned purely
based on the PID of the ypbind process, something like

  port = (PID % 424) + 600

The PID seems to vary slightly from reboot to reboot, but generally is in the
870s on the machine in question, resulting in ports assigned in the vicinity of
630.  CUPS (actually, IPP) has ports 631/tcp and 631/udp reserved.  NIS starts
first, so it wins, and since CUPS has a reserved Well Known Port, it can't
relocate and loses.
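
To make the arithmetic concrete, here is a minimal Python sketch of the formula
quoted above.  It models only the first port that the formula would try (600 plus
the PID modulo 424); the function name first_resv_port is made up for
illustration, and this is not the glibc source.

```python
# Illustrative model of the formula quoted in this report, not the glibc code.
START_PORT = 600   # lowest port in the range described in the report
NUM_PORTS = 424    # ports 600..1023 inclusive


def first_resv_port(pid: int) -> int:
    """Return the first port the quoted formula would try for a given PID."""
    return (pid % NUM_PORTS) + START_PORT


if __name__ == "__main__":
    # PIDs "in the 870s", as observed on the machine in question.
    for pid in range(870, 880):
        print(pid, first_resv_port(pid))
```

For PIDs 870 through 879 the result runs from 622 to 631, so a PID of 879 lands
exactly on the IPP port.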

Version-Release number of selected component (if applicable):
glibc-2.3.2-27.9

How reproducible:
Depends entirely on the PID handed to ypbind on boot, and the exact set of
services configured affects this.  

Suggested Fix:
The glibc algorithm already blacklists all reserved ports below 600, presumably
to avoid this exact problem.  Consider altering the code to blacklist 5 to 8
additional ports in the 600-1023 range that are or may be in common use:

  631     (IPP == CUPS)
  636     (LDAPS)
  749     (Kerberos V kadmin)
  873     (rsyncd)
  992-995 (SSL-enabled telnet, IMAP, IRC, and POP3)

The ports lost could be recovered, if desired, by allowing ports in the 590-600
range to be assigned by bindresvport().
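
The suggested fix could look roughly like the Python sketch below.  It is a
model under stated assumptions, not a patch: the avoid-set is the list from this
report with 992-995 expanded, the name candidate_ports is hypothetical, and the
wrap-around probing order (start at the PID-derived port, walk upward, wrap at
1023) is assumed for illustration.

```python
START_PORT = 600
END_PORT = 1023
NUM_PORTS = END_PORT - START_PORT + 1  # 424

# Ports proposed for exclusion in the report above (992-995 expanded).
AVOID = frozenset({631, 636, 749, 873, 992, 993, 994, 995})


def candidate_ports(pid, avoid=AVOID):
    """Yield ports in probe order, skipping every port in `avoid`.

    Assumed order: start at the PID-derived port, walk upward, and wrap
    around from END_PORT back to START_PORT.
    """
    first = pid % NUM_PORTS
    for i in range(NUM_PORTS):
        port = START_PORT + (first + i) % NUM_PORTS
        if port not in avoid:
            yield port


if __name__ == "__main__":
    # PID 879 would start at 631; with the avoid-set it starts at 632.
    print(next(candidate_ports(879)))
```

The cost is eight fewer usable ports out of 424, which is the loss the last
paragraph of the report proposes to recover.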
Comment 1 Jakub Jelinek 2003-09-01 17:43:29 EDT
According to Ulrich Drepper, daemons requiring specific ports in the 600-1023
range need to be started before any daemons using bindresvport.
Comment 2 Tim Waugh 2003-09-02 12:29:53 EDT
I think that portmap ought to be taught how to avoid a list of ports. 
Otherwise, once I've stopped cupsd (for whatever reason) and restarted ypbind, I
can no longer be sure that cupsd will ever start again.
Comment 3 Tim Waugh 2003-09-02 17:06:25 EDT
(see above comment)
Comment 4 Steve Bonneville 2003-09-03 02:04:59 EDT
If you accept Ulrich's argument that this is a service bug, then this is NOT
just a CUPS bug.  It's also a bug against xinetd-2.3.12-0.3E, openldap-2.0.27-9,
and krb5-server-1.2.7-15.  ALL of the services in the distribution that I know
about that use Well Known Ports in the 600-1023 range start up AFTER ypbind, not
before!
Comment 5 Tim Waugh 2003-09-03 11:04:29 EDT
This problem should be solved in portmap.  Here is a proof-of-concept, which I
think should be merged into portmap: http://cyberelk.net/tim/portreserve.  In
summary: portmap should read a directory of configuration files, one per daemon,
each containing the name of the service that is out of bounds.

(I know that bindresvport is the culprit, but what other users of that are there
really?)
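
As a rough illustration of the directory-of-files idea in this comment, the
Python sketch below collects the ports named by such files.  The file format is
an assumption: one service name per line, with plain port numbers and '#'
comments accepted as a convenience.  The function name reserved_ports and the
injectable `lookup` parameter are hypothetical and are not taken from the
portreserve proof-of-concept.

```python
import os
import socket


def reserved_ports(conf_dir, lookup=socket.getservbyname):
    """Collect the ports named by the files in conf_dir.

    Assumed format: each file holds service names (or plain port numbers),
    one per line; blank lines and lines starting with '#' are ignored.
    """
    ports = set()
    for name in sorted(os.listdir(conf_dir)):
        path = os.path.join(conf_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as fh:
            for line in fh:
                entry = line.strip()
                if not entry or entry.startswith("#"):
                    continue
                if entry.isdigit():
                    ports.add(int(entry))
                    continue
                try:
                    ports.add(lookup(entry))
                except OSError:
                    pass  # unknown service name: skip it
    return ports
```

Passing the lookup function in makes it easy to resolve names from a source
other than the local services database.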
Comment 6 Steve Bonneville 2003-11-14 19:51:46 EST
I'm moving this to RHEL 3 because we're still seeing this in operation
on servers with that version installed.

I don't know what else uses bindresvport, if anything, I just know
that if you have ypserv and CUPS running, there's a pretty decent
chance of a collision on boot which breaks CUPS.

Comment 7 Joe Orton 2003-12-24 05:32:08 EST
spamassassin's spamd is another possible collision (though not in
RHEL), it uses port 783 by default.
Comment 8 Aleksey Nogin 2004-01-06 23:50:06 EST
*** Bug 83985 has been marked as a duplicate of this bug. ***
Comment 9 Tim Waugh 2004-01-15 11:40:41 EST
*** Bug 113586 has been marked as a duplicate of this bug. ***
Comment 10 Jeff Minelli 2004-01-16 15:09:27 EST
NFS also causes port collisions, more specifically rpc.rquotad.
Comment 11 Orion Poplawski 2004-05-12 12:35:31 EDT
I just had a collision between rpc.mountd and cups.
Comment 13 Alexandre Oliva 2004-07-20 15:42:40 EDT
Regarding comment #1 (daemons requiring specific ports in the 600-1023
range need to be started before any daemons using bindresvport): if
you consider that ypbind (that uses bindresvport) is often necessary
to obtain user information, host information, service port numbers,
etc, and that starting services that require such information before
it's available won't work, we've got a catch 22 situation.  We really
need a better way to reserve ports such that portmap doesn't take them
over.

portreserve looks like a perfect solution for the problem, except that
it can't rely on the ypbind services map, so there's a slight risk that
it might reserve a port based on /etc/services that turns out to be
different in the NIS map, or even that the port number can't be
identified because it's only defined in the services map.  I suppose
it's a reasonable requirement to have any ports that are to be reserved
by portreserve defined in /etc/services or some other database that's
available early enough in the boot.
Comment 14 Bill Nottingham 2004-08-05 17:07:37 EDT
*** Bug 125962 has been marked as a duplicate of this bug. ***
Comment 15 Steve Dickson 2004-08-12 14:22:48 EDT
Reassigning to Jakub, the glibc maintainer.
Comment 16 Ulrich Drepper 2004-09-28 01:54:01 EDT
This is no glibc issue.  Lacking a better idea, I'll assign it to
initscripts since a solution à la portreserve would be part of
initscripts.  Changing bindresvport() is no option since there are no
universally available ports (look at the IANA list).  So an external
solution like portreserve is needed.  I think it can work nicely and
should not be hard to integrate.
Comment 21 Bruce Friedman 2005-02-04 11:34:14 EST
I am seeing a similar issue now on Fedora Core 3 where rpc.mountd grabs port 783
before spamassassin has a chance to grab it.

Need to move nfs init sequence from 60 to 81 to place it after spamassassin to
clear the problem.

Should the port be reserved, or is rearrangement of init order more appropriate?
Comment 26 Steve Bonneville 2005-03-15 09:47:32 EST
As an update, and not that it should be a surprise since nothing has been
changed, but we are seeing the exact issue I originally reported (on Red Hat
Linux 9, as I recall) on Red Hat Enterprise Linux 4 as well.
Comment 30 Paul Raines 2005-07-16 17:25:39 EDT
Just want to add that I have seen this on several RHEL4 clients too.
Also, having ypbind start AFTER other services is not an option as
ypbind is needed to see all accounts and many services need to
see those accounts when starting (e.g. imapd for port 993).
Comment 31 Jeff Minelli 2005-07-28 21:12:58 EDT
Is this ever going to be fixed? This is a widespread problem that will affect almost everyone at some point....
Comment 33 John Dennis 2005-07-29 14:03:48 EDT
*** Bug 154800 has been marked as a duplicate of this bug. ***
Comment 34 Ralf Corsepius 2005-09-15 15:09:52 EDT
I would propose to make this issue a release blocker for FC5.
Comment 35 Bill Nottingham 2005-09-21 17:33:54 EDT
This problem is being considered for a future major release of Red Hat
Enterprise Linux. Red Hat does not currently plan to provide a resolution for
this in a Red Hat Enterprise Linux update for currently deployed systems.

With the goal of minimizing risk of change for deployed systems, and in response
to customer and partner requirements, Red Hat takes a conservative approach when
evaluating changes for inclusion in maintenance updates for currently deployed
products. The primary objectives of update releases are to enable new hardware
platform support and to resolve critical defects. 

Comment 36 Bill Nottingham 2005-09-29 15:07:54 EDT
*** Bug 51904 has been marked as a duplicate of this bug. ***
Comment 37 Jeremy Sanders 2005-10-17 09:02:42 EDT
I reported this some time ago in
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1014 

glibc seems the obvious place to fix this problem.
Comment 38 Jakub Jelinek 2005-10-17 09:06:08 EDT
And you were also told there it is a bad idea and glibc is not going to change
in this regard.
Comment 39 Jeremy Sanders 2005-10-17 09:16:53 EDT
This will be my only comment to prevent bugspam, but it says nowhere in the bug
that this is a bad idea, or why it is a bad idea. Please can someone elaborate
on this? All these issues are caused by a single point of failure - glibc. It
seems the obvious place to solve the issue. It seems pretty harebrained to have
a special daemon to handle this. All the apps, and any 3rd party ones too, need
to be rewritten to use the daemon. Why couldn't there be a configuration file
stating the ports for bindresvport() to avoid?
Comment 41 Mike Jang 2006-01-12 15:45:11 EST
(In reply to comment #34)
> I would propose to make this issue a release blocker for FC5.

FYI, I see this problem in FC5 test 1, as reported in 154800 (which was closed
as a duplicate of this bug).
Comment 42 Orion Poplawski 2006-04-24 18:17:40 EDT
Perhaps we can reopen this and actually work on a fix? Target FC6?
Comment 43 Tim Waugh 2006-05-16 11:50:49 EDT
*** Bug 191950 has been marked as a duplicate of this bug. ***
Comment 44 Tim Waugh 2006-05-16 11:51:32 EDT
*** Bug 189144 has been marked as a duplicate of this bug. ***
Comment 45 chris 2006-12-10 10:30:29 EST
*** Bug 218216 has been marked as a duplicate of this bug. ***
Comment 46 Jan Engelhardt 2007-05-30 04:12:53 EDT
Also see https://bugzilla.novell.com/show_bug.cgi?id=262341
Comment 47 Bill Nottingham 2007-10-04 11:36:39 EDT
*** Bug 318461 has been marked as a duplicate of this bug. ***
Comment 48 Orion Poplawski 2007-10-29 18:19:14 EDT
Seems like leaving this in "Closed Deferred" means it will never get worked on.
Is there ever a hope of a fix?
Comment 49 Stephen John Smoogen 2007-10-29 20:19:35 EDT
Any bug in RHEL-3 is dead as far as I can tell unless it is a security problem.
Comment 50 Matthew Galgoci 2007-10-29 20:33:32 EDT
This bug is still present in RHEL4U5 as I had a system hit this just a few weeks
ago. It may well be present in rhel5 also, though I haven't verified.
Comment 51 Steve Bonneville 2007-10-29 21:09:32 EDT
It is present in 5 and 5.1; at least some of the bugs marked as duplicates above
reflect that.  #218216 is the same bug in Fedora Core 6 filed against spamd in
spamassassin (port 783), for example, which is effectively RHEL 5 for this purpose.

Comment 53 Orion Poplawski 2007-10-29 22:24:56 EDT
Also present in Fedora 7, which is what got me digging this old thing up again.
Now it is the "rpcbind" service that acquires the ports.
Comment 54 Jeremy Sanders 2007-10-30 05:02:21 EDT
I still haven't seen any technical reasons why portmap couldn't check in a
special directory for files containing reserved ports and not use these. See
comment #5. These files could be installed by RPM.
Comment 66 Boris Vinarsky 2008-03-10 20:15:36 EDT
On RHEL-4 we had practically the same problem. The program that listened on
TCP port 631 was rpc.statd. As a result, cups died on startup with the message:
startListening: Unable to bind socket for address 7f000001:631 - Address already in use.
While Red Hat decides on the best way to fix this permanently, can Red Hat
offer a workaround?
Comment 67 Jay Rishel 2008-05-12 11:13:58 EDT
I just got hit with this on rhel4, rpc.statd took port 631, preventing cups from
starting.   
Comment 68 Kazutoshi Morioka 2008-10-03 22:46:09 EDT
I got the same problem on Fedora 9: krb5kdc, port 750. Please fix this.
Comment 72 Tim Waugh 2008-10-14 11:46:13 EDT
The workaround is: use portreserve (see comment #5).

This is already included in Fedora 10, and CUPS in Fedora 10 makes use of it in its initscript.

spamassassin has been mentioned as another service that could make use of portreserve.  Changing component and reassigning.
Comment 73 Stephen John Smoogen 2008-10-14 12:02:44 EDT
Can portreserve work with RHEL-4/5? If it can, I would be happy to get it into EPEL as a help.
Comment 74 Tim Waugh 2008-10-14 12:24:05 EDT
Yes, I don't see any reason it wouldn't work.  However, services that want to use it need to modify their initscript (to call 'portrelease').

Packaging it for EPEL might help for third-party applications though, and other EPEL packages.
Comment 75 Jeremy Sanders 2008-10-14 12:31:48 EDT
Can someone enlighten me how portreserve is not a race condition waiting to happen? As far as I can tell, you get portreserve to release the port before starting the service. How do you make sure nothing else gets in there first before the portreserve command and the service-starting command? I suspect it's unlikely that something could slip in between these commands, but is this good programming?
Comment 76 Tim Waugh 2008-10-14 12:47:13 EDT
(In reply to comment #75)
> Can someone enlighten me how portreserve is not a race condition waiting to
> happen? As far as I can tell, you get portreserve to release the port before
> starting the service. How do you make sure nothing else gets in there first
> before the portreserve command and the service-starting command? I suspect it's
> unlikely that something could slip in between these commands, but is this good
> programming?

Without support in glibc/portmap there is no generic way around this.
Comment 77 Jay Rishel 2009-01-21 10:15:10 EST
Have any of the portreserve changes in Fedora 10 made it into the RHEL5.3 release? Or would this wait until RHEL6?
Comment 78 Tim Waugh 2009-01-21 10:22:41 EST
It's not in RHEL-5.3.
Comment 79 Chris Schanzle 2009-06-26 12:20:13 EDT
Confirming 5.3 has issues.  After updating and rebooting 105 clients with a recent kernel patch + others, one had this issue logged in /var/log/cups/error_log:

E [25/Jun/2009:21:11:11 -0400] Unable to bind broadcast socket - Address already in use.

Cups didn't get broadcast print queues until after restarting cups.

The curious thing is that I hard-bind rpc services; status, nlockmgr, ypbind are listening on *my* chosen ports, none of them conflicting with cups's 631.

We've also had past (but none this time) random issues with rsyncd not being able to bind to its listening port; I'm extrapolating that the root cause is the same.
Comment 80 Warren Togami 2009-12-03 00:10:52 EST
I'm adding the portreserve hack to spamassassin-3.3.0 for RHEL-6.  But this is not a complete solution.  Arbitrary other services can still cause this failure.
Comment 81 Tim Waugh 2009-12-03 04:30:54 EST
Warren: how so?

The only way I know of that it can fail is after the protected service is stopped.  There's no way to close that hole without kernel support.
Comment 86 Mark Hittinger 2010-02-25 12:46:45 EST
Some ethernet NICs (broadcom, intel gigabit) have something called
ASF enabled.  This causes the NIC to gobble stuff sent to port 623
and port 664!  So there are some other ports that we need to exclude.

RPC mounts avoid these ports by using sunrpc.min_resvport which is
set to 665 these days.  So the code for mounts won't use ports below
665.

Maybe bindrsvprt.c could be modified to also use sunrpc.min_resvport
and sunrpc.max_resvport.

In the 623/664 cases there is no collision with another daemon.  The
affected NICs lose the packets.

ypmatch and ypcat will hang with do_ypcall: clnt_call: RPC: Timed out
when they pick 623 or 664 on the affected hardware.

sshd (particularly when AllowGroups is used) will also hang on the 
affected hardware when it selects 623 or 664 to do an NIS lookup.

A kludge is to use xinetd to grab the 623/664 ports for a workaround.
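
The suggestion above can be sketched in Python as follows.  This is only a
model: the names read_limit and resv_port_in_range are hypothetical, the /proc
paths are assumed to be where the sunrpc sysctls appear when the module is
loaded, and the PID-modulo scheme is simply carried over from the formula in the
original report.

```python
def read_limit(path, default):
    """Read an integer sysctl value from `path`, falling back to `default`."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return default


def resv_port_in_range(pid, min_port=665, max_port=1023):
    """PID-derived starting port confined to [min_port, max_port]."""
    if not 0 < min_port <= max_port <= 1023:
        raise ValueError("limits must satisfy 0 < min <= max <= 1023")
    span = max_port - min_port + 1
    return min_port + pid % span


if __name__ == "__main__":
    import os
    # Assumed sysctl locations; the defaults apply if they are absent.
    lo = read_limit("/proc/sys/sunrpc/min_resvport", 665)
    hi = read_limit("/proc/sys/sunrpc/max_resvport", 1023)
    print(resv_port_in_range(os.getpid(), lo, hi))
```

With a minimum of 665, ports 623 and 664 can never be selected, whatever the PID.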
Comment 88 Colin Simpson 2010-07-21 07:00:34 EDT
Just been hit with this again:

# netstat -anp | grep 631
tcp        0      0 0.0.0.0:631                 0.0.0.0:*                   LISTEN      25591/cupsd         
udp        0      0 0.0.0.0:631                 0.0.0.0:*                               24623/rpc.statd     

Whilst portreserve looks like a fix, from my understanding is it not a bit of a hack around this?  Shouldn't portmapper just be given a list of ports to never use, from a file that gets appended to as services are added from packages? Or am I missing something?
Comment 89 Tim Waugh 2010-07-21 10:20:08 EDT
See comment #16.
Comment 90 Colin Simpson 2010-07-21 10:52:43 EDT
Comment #16 doesn't answer why we need portreserve; it just says that there are no free unallocated low ports in the official allocation list. That's true, but shouldn't portmapper have a notion of the services THIS system has or needs reserved low, and not allocate them?

Or is there just no easy way of allocating a low port in a system call that can exclude ports we want to hold back? Sounds like a new call is called for eventually?
Comment 91 releng-rhel@redhat.com 2010-11-11 09:59:17 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
Comment 95 RHEL Product and Program Management 2011-07-05 20:03:48 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.
Comment 102 Chris Schanzle 2011-12-08 10:44:58 EST
(In reply to comment #91)
> Red Hat Enterprise Linux 6.0 is now available and should resolve
> the problem described in this bug report.

It won't since there is no portrelease support in the 6.x ypbind scripts.
Comment 103 Orion Poplawski 2011-12-08 10:48:17 EST
It's not ypbind that needs to use portreserve/release, it's everything else that uses ports in the rpc port range.
Comment 104 Jeremy Sanders 2011-12-08 10:58:28 EST
Dovecot is missing support for portreserve/release (at least in Scientific Linux 6.1).
Comment 105 Chris Schanzle 2011-12-08 12:57:51 EST
(In reply to comment #103)
> It's not ypbind that needs to use portreserve/release, it's everything else
> that uses ports in the rpc port range.

What I discovered via portreserve(1) and testing is that when portreserve starts, it listens on all the reserved ports that have been configured.  Just before the real daemon starts, it needs to run portrelease so the *real* daemon can listen on it.  Look at:

grep portrelease /etc/init.d/*
cat /etc/portreserve/cups
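
The hold-then-release behaviour described above can be modelled with a
placeholder socket, as in the Python sketch below.  This is an illustration of
the idea only, not portreserve's actual implementation; the helper names
reserve, release and can_bind are hypothetical, and UDP on the loopback address
is chosen just to keep the example small.

```python
import socket


def reserve(port, host="127.0.0.1"):
    """Hold a UDP port by binding a socket to it and keeping it open."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def release(sock):
    """Give the port back so that the real daemon can bind to it."""
    sock.close()


def can_bind(port, host="127.0.0.1"):
    """Return True if a new UDP socket can currently bind to the port."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        probe.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        probe.close()


if __name__ == "__main__":
    holder = reserve(0)                 # port 0: let the kernel pick one
    port = holder.getsockname()[1]
    print(port, can_bind(port))         # held, so binding is refused
    release(holder)
    print(port, can_bind(port))         # free until the next bind
```

The gap between release and the daemon's own bind is the window that comment #75
asks about.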

In the case of ypbind, it can listen either on a random port (unfixable) or on a fixed port (via setting OTHER_YPBIND_OPTS).  I hard-code it to port 900 by setting 

/etc/sysconfig/network:OTHER_YPBIND_OPTS="-p 900"

To fix ypbind, I see two potential ways - (1) its init script needs to parse out the -p parameter and, if it finds one, call portrelease on that port.  Something like:

ypport=`echo "$OTHER_YPBIND_OPTS" | sed -n 's/.*-p[[:space:]]*\([0-9][0-9]*\).*/\1/p'`
[ -n "$ypport" -a -x /sbin/portrelease ] && /sbin/portrelease $ypport &>/dev/null || :

Or (2), since the user needs to reserve a port by dropping a file into /etc/portreserve/ anyway, force them to use the service named 'ypbind' and just portrelease that:
[ -x /sbin/portrelease ] && /sbin/portrelease ypbind &>/dev/null || :

Hopefully I'll get time to test this today.

Should I open a new bug on RHEL6 for this issue?  I searched and could not find one...
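
For comparison, the option parsing from approach (1) can be written as a small
Python function.  This is a sketch under assumptions: ypbind_port is a
hypothetical name, and the option string is assumed to carry the port as a
decimal number after -p, with or without a space, as in the "-p 900" example
above.

```python
import re


def ypbind_port(opts):
    """Return the numeric argument of -p in an option string, or None."""
    match = re.search(r"(?:^|\s)-p\s*(\d+)(?=\s|$)", opts)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(ypbind_port("-p 900"))       # the setting quoted above
    print(ypbind_port("-broadcast"))   # no -p option present
```

Returning None when no -p is present keeps the caller from passing unrelated
options on to portrelease.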
Comment 110 Karel Srot 2012-01-31 06:34:18 EST
I have changed the component to "distribution". Please file a new bug for each affected component and use this one as a tracker.

rsync bug filed as Bug 786076.
Comment 120 Dan Yocum 2012-05-03 15:33:21 EDT
This bug exists in RHELv5.x, too.  I noticed it when trying to use heartbeat (port 694).  

Can we get portreserve backported to RHELv5.x?
Comment 121 Ondrej Vasik 2012-06-27 04:49:16 EDT
Just having portreserve in RHEL-5.X will not solve the issue, you need to update all the packages which already use it in RHEL-6 to achieve same results. And this is IMHO very unlikely.
Comment 122 Dan Yocum 2012-06-27 13:00:38 EDT
I wasn't suggesting updating all the RHEL5 packages that would need portreserve, but having the package itself as part of the distro (instead of having to go to some third party repo, which is what I ended up doing) would go a long way to allow a sysadmin to solve the problem, IMO.  

Even if it made it into EPEL for RHEL5 I think I'd be happy...
Comment 125 Harold Miller 2012-10-23 16:44:00 EDT
I attached this bug to case 00721174, as the new RHS product can (and will) assign a port for every storage brick. In huge systems we have seen this consume all available ports under 1024.
Comment 128 Ted Rule 2014-04-27 08:48:49 EDT
Having just been caught out by this bug in CentOS6 with heartbeat (UDP port 694), and looking at the ASF/IPMI port 623/664 bug mentioned in previous comments, it doesn't seem unreasonable to me to make portmap/rpcbind honour sunrpc.min_resvport/sunrpc.max_resvport as per comment 86.

I presume rpc.mountd honours these proc settings, and the original setting of 600 minimum was modified specifically so as to avoid IPMI hardware clashes. If that's the case, then it would seem rpcbind is STILL vulnerable to a potential clash with IPMI hardware as long as a suitably configured portreserve is not running, even if no actual "real" daemons use those ports. Since there's willingness to avoid NFS mounts clashing, why not "protect" rpcbind with the same tweak?

If this were done, then sysadmins who only have a few clashing services in the 6xx range, but only a modicum of simultaneous RPC services/ports, could avoid the whole mess by simply raising sunrpc.min_resvport to 700, for instance.

In more complex cases, I admit that it might well be necessary to employ portreserve, but the /proc setting might well be sufficient in many cases.
Comment 129 Lars Hecking 2015-02-04 11:01:08 EST
I would just like to add that I have this problem on CentOS6 with ypserv *every* time I reboot a particular server - 6.0 through 6.6. And occasionally on CentOS 5.11 with ypbind/cups.
Comment 130 Greg Ercolano 2015-03-16 18:42:07 EDT
I haven't seen this solution mentioned in this thread:
/proc/sys/net/ipv4/ip_local_reserved_ports

Supposedly port numbers added to this file will be skipped over by the
glibc functions that auto-assign reserved ports as used by tools like
nfs/nis/etc. 

Some interesting details here:
http://www.mjmwired.net/kernel/Documentation/networking/ip-sysctl.txt#716
https://lkml.org/lkml/2012/3/10/187
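
For anyone scripting around that file, here is a minimal Python sketch that
parses its contents.  It assumes the format is a comma-separated list of single
ports and low-high ranges; the function name parse_reserved_ports is
hypothetical.  It only reads the list and says nothing about which allocation
paths actually honour it.

```python
def parse_reserved_ports(text):
    """Expand a list such as '631,992-995' into a set of port numbers."""
    ports = set()
    for item in text.strip().split(","):
        item = item.strip()
        if not item:
            continue
        low, _, high = item.partition("-")
        start = int(low)
        end = int(high) if high else start
        ports.update(range(start, end + 1))
    return ports


if __name__ == "__main__":
    path = "/proc/sys/net/ipv4/ip_local_reserved_ports"
    try:
        with open(path) as fh:
            print(sorted(parse_reserved_ports(fh.read())))
    except OSError:
        print("could not read", path)
```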
Comment 132 Aaron Reichow 2016-03-08 14:54:37 EST
Any chance we'll see a fix? Having this issue on RHEL 6.7 with rpc.statd. No port collisions, but in our environment we are required to have specific ports/port ranges defined for a particular service. There doesn't seem to be any way to force rpc.statd to use a particular port rather than follow the pid % 424 + 600 formula.