Bug 103401
Summary: | glibc: [RFE] RPC (on boot) port selection collisions between various applications | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Steve Bonneville <sbonnevi> |
Component: | glibc | Assignee: | glibc team <glibc-bugzilla> |
Status: | CLOSED UPSTREAM | QA Contact: | qe-baseos-tools-bugs |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 8.2 | CC: | aleksey, areichow, ashankar, berend.de.schouwer, borisv, bruce_friedman, bugzilla, casmith, chris, cmc, codonell, Colin.Simpson, crn1, cseraphi, cww, ddumas, dj, dpal, dyocum, ejtr, erco, fhirtz, fweimer, giardina, hamiller, igeorgex, jakub, jay, jeremy, jkeck, jorton, jplans, jss, k.georgiou, ksrot, lhecking, linux_support, marcobillpeter, maurizio.antillon, mkarg, mnewsome, okapi, orion, ovasik, pasteur, pfrankli, raines, rc040203, rdieter, riek, rkudyba, rollercow, rrussell, saa_ch11, smayhew, smooge, sputhenp, tao, tjb, trondeg, twaugh, zing |
Target Milestone: | rc | Keywords: | FutureFeature, Triaged |
Target Release: | 8.2 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Enhancement |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-07-06 14:37:26 UTC | Type: | Enhancement |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 723228, 786076, 790682, 790684, 790685, 790686, 790687, 790690, 802240, 913509 | ||
Bug Blocks: | 168996, 170445, 756082, 1594286 |
Description
Steve Bonneville
2003-08-29 20:13:45 UTC
According to Ulrich Drepper, daemons requiring specific ports in the 600-1023 range need to be started before any daemons using bindresvport. I think that portmap ought to be taught how to avoid a list of ports. Otherwise, once I've stopped cupsd (for whatever reason) and restarted ypbind, I can no longer be sure that cupsd will ever start again. (see above comment)
If you accept Ulrich's argument that this is a service bug, then this is NOT just a CUPS bug. It's also a bug against xinetd-2.3.12-0.3E, openldap-2.0.27-9, and krb5-server-1.2.7-15. ALL of the services in the distribution that I know about that use Well Known Ports in the 600-1023 range start up AFTER ypbind, not before!
This problem should be solved in portmap. Here is a proof-of-concept, which I think should be merged into portmap: http://cyberelk.net/tim/portreserve. In summary: portmap should read a directory of configuration files, one per daemon, each containing the name of the service that is out of bounds.
(I know that bindresvport is the culprit, but what other users of that are there really?)
I'm moving this to RHEL 3 because we're still seeing this in operation on servers with that version installed. I don't know what else uses bindresvport, if anything; I just know that if you have ypserv and CUPS running, there's a pretty decent chance of a collision on boot which breaks CUPS.
spamassassin's spamd is another possible collision (though not in RHEL); it uses port 783 by default.
*** Bug 83985 has been marked as a duplicate of this bug. ***
*** Bug 113586 has been marked as a duplicate of this bug. ***
NFS also causes port collisions, more specifically rpc.rquotad.
I just had a collision between rpc.mountd and cups.
Regarding comment #1 (daemons requiring specific ports in the 600-1023 range need to be started before any daemons using bindresvport): if you consider that ypbind (which uses bindresvport) is often necessary to obtain user information, host information, service port numbers, etc., and that starting services that require such information before it's available won't work, we've got a catch-22 situation. We really need a better way to reserve ports such that portmap doesn't take them over. portreserve looks like a perfect solution for the problem, except that it can't rely on the ypbind services map, so there's a slight risk that it might reserve a port based on /etc/services that turns out to be different in the NIS map, or even that the port number can't be identified because it's only defined in the NIS services map. I suppose it's a reasonable requirement to have any ports that are to be reserved by portreserve defined in /etc/services or some other database that's available early enough in the boot.
*** Bug 125962 has been marked as a duplicate of this bug. ***
Reassigning to Jakub, the glibc maintainer.
This is no glibc issue. Lacking a better idea, I'll assign it to initscripts since a solution à la portreserve would be part of initscripts. Changing bindresvport() is no option since there are no universally available ports (look at the IANA list). So an external solution like portreserve is needed. I think it can work nicely and should not be hard to integrate.
I am seeing a similar issue now on Fedora Core 3 where rpc.mountd grabs port 783 before spamassassin has a chance to grab it. The nfs init sequence needs to move from 60 to 81 to place it after spamassassin and clear the problem. Should the port be reserved, or is rearrangement of init order more appropriate?
As an update (not that it should be a surprise, since nothing has changed), we are seeing the exact issue I originally reported (on Red Hat Linux 9, as I recall) on Red Hat Enterprise Linux 4 as well.
Just want to add that I have seen this on several RHEL4 clients too. Also, having ypbind start AFTER other services is not an option, as ypbind is needed to see all accounts and many services need to see those accounts when starting (e.g. imapd for port 993).
Is this ever going to be fixed? This is a widespread problem that will affect almost everyone at some point....
*** Bug 154800 has been marked as a duplicate of this bug. ***
I would propose to make this issue a release blocker for FC5.
This problem is being considered for a future major release of Red Hat Enterprise Linux. Red Hat does not currently plan to provide a resolution for this in a Red Hat Enterprise Linux update for currently deployed systems. With the goal of minimizing risk of change for deployed systems, and in response to customer and partner requirements, Red Hat takes a conservative approach when evaluating changes for inclusion in maintenance updates for currently deployed products. The primary objectives of update releases are to enable new hardware platform support and to resolve critical defects.
*** Bug 51904 has been marked as a duplicate of this bug. ***
I reported this some time ago in http://sources.redhat.com/bugzilla/show_bug.cgi?id=1014; glibc seems the obvious place to fix this problem.
And you were also told there that it is a bad idea and glibc is not going to change in this regard.
This will be my only comment, to prevent bugspam, but nowhere in that bug does it say that this is a bad idea, or why it is a bad idea. Please can someone elaborate on this? All these issues are caused by a single point of failure: glibc. It seems the obvious place to solve the issue. It seems pretty harebrained to have a special daemon to handle this. All the apps, and any 3rd-party ones too, need to be rewritten to use the daemon.
Why couldn't there be a configuration file stating the ports for bindresvport() to avoid?
(In reply to comment #34)
> I would propose to make this issue a release blocker for FC5.
FYI, I see this problem in FC5 test 1, as reported in bug 154800 (which was closed as a duplicate of this bug). Perhaps we can reopen this and actually work on a fix? Target FC6?
*** Bug 191950 has been marked as a duplicate of this bug. ***
*** Bug 189144 has been marked as a duplicate of this bug. ***
*** Bug 218216 has been marked as a duplicate of this bug. ***
*** Bug 318461 has been marked as a duplicate of this bug. ***
Seems like leaving this in "Closed Deferred" means it will never get worked on. Is there ever a hope of a fix?
Any bug in RHEL-3 is dead as far as I can tell unless it is a security problem. This bug is still present in RHEL4U5, as I had a system hit this just a few weeks ago. It may well be present in RHEL 5 also, though I haven't verified.
It is present in 5 and 5.1; at least some of the bugs marked as duplicates above reflect that. #218216 is the same bug in Fedora Core 6 filed against spamd in spamassassin (port 783), for example, which is effectively RHEL 5 for this purpose.
Also present in Fedora 7, which is what got me digging this old thing up again. Now the service that acquires the ports is rpcbind.
I still haven't seen any technical reason why portmap couldn't check a special directory for files containing reserved ports and not use these. See comment #5. These files could be installed by RPM.
On RHEL-4 we had practically the same problem. The program that listened on TCP port 631 was rpc.statd. As a result, cups died on startup with the message: startListening: Unable to bind socket for address 7f000001:631 - Address already in use. While Red Hat decides on the best way to fix this permanently, can Red Hat offer a workaround?
I just got hit with this on RHEL 4: rpc.statd took port 631, preventing cups from starting.
I got the same problem on Fedora 9: krb5kdc, port 750. Please fix this.
The workaround is to use portreserve (see comment #5). This is already included in Fedora 10, and CUPS in Fedora 10 makes use of it in its initscript. spamassassin has been mentioned as another service that could make use of portreserve. Changing component and reassigning.
Can portreserve work with RHEL-4/5? If it can, I would be happy to get it into EPEL as a help.
Yes, I don't see any reason it wouldn't work. However, services that want to use it need to modify their initscript (to call 'portrelease'). Packaging it for EPEL might help for third-party applications though, and other EPEL packages.
Can someone enlighten me how portreserve is not a race condition waiting to happen? As far as I can tell, you get portreserve to release the port before starting the service. How do you make sure nothing else gets in there first, between the portreserve command and the service-starting command? I suspect it's unlikely that something could slip in between these commands, but is this good programming?
(In reply to comment #75)
> Can someone enlighten me how portreserve is not a race condition waiting to happen? As far as I can tell, you get portreserve to release the port before starting the service. How do you make sure nothing else gets in there first, between the portreserve command and the service-starting command? I suspect it's unlikely that something could slip in between these commands, but is this good programming?
Without support in glibc/portmap there is no generic way around this.
Have any of the portreserve changes in Fedora 10 made it into the RHEL 5.3 release? Or would this wait until RHEL 6?
It's not in RHEL-5.3.
Confirming 5.3 has issues.
After updating and rebooting 105 clients with a recent kernel patch + others, one had this issue logged in /var/log/cups/error_log:
E [25/Jun/2009:21:11:11 -0400] Unable to bind broadcast socket - Address already in use.
CUPS didn't get broadcast print queues until after I restarted cups. The curious thing is that I hard-bind RPC services: status, nlockmgr, and ypbind are listening on *my* chosen ports, none of them conflicting with CUPS's 631. We've also had random issues in the past (but none this time) with rsyncd not being able to bind to its listening port; I'm extrapolating that the root cause is the same.
I'm adding the portreserve hack to spamassassin-3.3.0 for RHEL-6. But this is not a complete solution. Arbitrary other services can still cause this failure.
Warren: how so? The only way I know of that it can fail is after the protected service is stopped. There's no way to close that hole without kernel support.
Some Ethernet NICs (Broadcom, Intel gigabit) have something called ASF enabled. This causes the NIC to gobble traffic sent to port 623 and port 664! So there are some other ports that we need to exclude. RPC mounts avoid these ports by using sunrpc.min_resvport, which is set to 665 these days, so the code for mounts won't use ports below 665. Maybe bindrsvprt.c could be modified to also use sunrpc.min_resvport and sunrpc.max_resvport.
In the 623/664 cases there is no collision with another daemon; the affected NICs simply lose the packets. ypmatch and ypcat will hang with "do_ypcall: clnt_call: RPC: Timed out" when they pick 623 or 664 on the affected hardware. sshd (particularly when AllowGroups is used) will also hang on the affected hardware when it selects 623 or 664 to do an NIS lookup. A workaround (admittedly a kludge) is to use xinetd to grab ports 623 and 664.
Just been hit with this again:
# netstat -anp | grep 631
tcp 0 0 0.0.0.0:631 0.0.0.0:* LISTEN 25591/cupsd
udp 0 0 0.0.0.0:631 0.0.0.0:* 24623/rpc.statd
Whilst portreserve looks like a fix, isn't it, from my understanding, a bit of a hack around this? Shouldn't portmapper just be given a list of ports never to use, from a file that gets appended to as services are added from packages? Or am I missing something?
See comment #16.
Comment #16 doesn't answer why we need portreserve; it just says that there are no free unallocated low ports in the official allocation list. That's true, but shouldn't portmapper have a notion of the services THIS system has or needs reserved low, and not allocate them? Or is there just no easy way of allocating a low port in a system call that can exclude ports we want to hold back? Sounds like a new call is called for eventually?
Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
(In reply to comment #91)
> Red Hat Enterprise Linux 6.0 is now available and should resolve
> the problem described in this bug report.
It won't, since there is no portrelease support in the 6.x ypbind scripts.
It's not ypbind that needs to use portreserve/release; it's everything else that uses ports in the rpc port range.
Dovecot is missing support for portreserve/release (at least in Scientific Linux 6.1).
(In reply to comment #103)
> It's not ypbind that needs to use portreserve/release, it's everything else
> that uses ports in the rpc port range.
What I discovered via portreserve(1) and testing is that when portreserve starts, it listens on all the reserved ports that have been configured. Just before the real daemon starts, it needs to run portrelease so the *real* daemon can listen on it. Look at:
grep portrelease /etc/init.d/*
cat /etc/portreserve/cups
In the case of ypbind, it can listen either on a random port (unfixable) or on a fixed port (via setting OTHER_YPBIND_OPTS). I hard-code it to port 900 by setting OTHER_YPBIND_OPTS="-p 900" in /etc/sysconfig/network.
To fix ypbind, I see two potential ways. (1) Its init script needs to parse out the -p parameter and, if it finds one, call portrelease on that port. Something like (sed -n with the p flag prints only when a numeric -p argument is found, so ypport stays empty otherwise):
ypport=$(echo "$OTHER_YPBIND_OPTS" | sed -n 's/.*-p[[:space:]]*\([0-9][0-9]*\).*/\1/p')
[ -n "$ypport" -a -x /sbin/portrelease ] && /sbin/portrelease $ypport &>/dev/null || :
Or (2), since the user needs to reserve a port by dropping a file into /etc/portreserve/ anyway, force them to use the service named 'ypbind' and just portrelease that:
[ -x /sbin/portrelease ] && /sbin/portrelease ypbind &>/dev/null || :
Hopefully I'll get time to test this today.
Should I open a new bug on RHEL6 for this issue? I searched and could not find one...
I have changed the component to "distribution". Please file a new bug for each affected component and use this one as a tracker.
rsync bug filed as Bug 786076.
This bug exists in RHEL 5.x, too. I noticed it when trying to use heartbeat (port 694). Can we get portreserve backported to RHEL 5.x?
Just having portreserve in RHEL-5.X will not solve the issue; you need to update all the packages which already use it in RHEL-6 to achieve the same results. And this is IMHO very unlikely.
I wasn't suggesting updating all the RHEL5 packages that would need portreserve, but having the package itself as part of the distro (instead of having to go to some third-party repo, which is what I ended up doing) would go a long way toward allowing a sysadmin to solve the problem, IMO. Even if it made it into EPEL for RHEL5 I think I'd be happy...
I attached this bug to case 00721174, as the new RHS product can (and will) assign a port for every storage brick. In huge systems we have seen this consume all available ports under 1024.
Having just been caught out by this bug in CentOS6 with heartbeat (UDP port 694), and looking at the ASF/IPMI port 623/664 bug mentioned in previous comments, it doesn't seem unreasonable to me to make portmap/rpcbind honour sunrpc.min_resvport/sunrpc.max_resvport as per comment 86. I presume rpc.mountd honours these proc settings, and the original minimum of 600 was raised specifically to avoid IPMI hardware clashes. If that's the case, then it would seem rpcbind is STILL vulnerable to a potential clash with IPMI hardware as long as a suitably configured portreserve is not running, even if no actual "real" daemons use those ports. Since there's willingness to avoid NFS mounts clashing, why not "protect" rpcbind with the same tweak? If this were done, then sysadmins who have only a few clashing services in the 6xx range, and only a modicum of simultaneous RPC services/ports, could avoid the whole mess by simply raising sunrpc.min_resvport to 700, for instance. In more complex cases, I admit that it might well be necessary to employ portreserve, but the /proc setting might well be sufficient in many cases.
I would just like to add that I have this problem on CentOS6 with ypserv *every* time I reboot a particular server (6.0 through 6.6), and occasionally on CentOS 5.11 with ypbind/cups.
I haven't seen this solution mentioned in this thread: /proc/sys/net/ipv4/ip_local_reserved_ports. Supposedly port numbers added to this file will be skipped over by the glibc functions that auto-assign reserved ports, as used by tools like nfs/nis/etc. Some interesting details here: http://www.mjmwired.net/kernel/Documentation/networking/ip-sysctl.txt#716 https://lkml.org/lkml/2012/3/10/187
Any chance we'll see a fix? Having this issue on RHEL 6.7 with rpc.statd.
No port collisions, but in our environment we are required to have specific ports/port ranges defined for a particular service. There doesn't seem to be any way to force rpc.statd to use a particular port rather than follow the pid % 424 + 600 formula.
Seeing this with Dovecot on Fedora 28:
Jul 24 11:12:24 ourdomain dovecot[1838]: Error: service(imap-login): listen(::, 993) failed: Address already in use
Jul 24 11:12:24 ourdomain dovecot[1838]: master: Error: service(pop3-login): listen(*, 995) failed: Address already in use
Jul 24 11:12:24 ourdomain dovecot[1838]: master: Error: service(pop3-login): listen(::, 995) failed: Address already in use
Jul 24 11:12:24 ourdomain dovecot[1838]: Fatal: Failed to start listeners
Jul 24 11:12:24 ourdomain dovecot[1838]: master: Error: service(imap-login): listen(*, 993) failed: Address already in use
Jul 24 11:12:24 ourdomain dovecot[1838]: master: Error: service(imap-login): listen(::, 993) failed: Address already in use
Jul 24 11:12:24 ourdomain dovecot[1838]: master: Fatal: Failed to start listeners
Jul 24 11:12:24 ourdomain systemd[1]: dovecot.service: Control process exited, code=exited status=89
Jul 24 11:12:24 ourdomain systemd[1]: dovecot.service: Failed with result 'exit-code'.
Jul 24 11:12:24 ourdomain systemd[1]: Failed to start Dovecot IMAP/POP3 email server.
netstat -lnp | grep 993
unix 2 [ ACC ] STREAM LISTENING 39993 1075/httpd /run/httpd/cgisock.972
(In reply to RobbieTheK from comment #137)
> Seeing this with Dovecot on Fedora 28:
> Jul 24 11:12:24 ourdomain dovecot[1838]: Error: service(imap-login):
> listen(::, 993) failed: Address already in use
> netstat -lnp | grep 993
> unix 2 [ ACC ] STREAM LISTENING 39993 1075/httpd
> /run/httpd/cgisock.972
That is something else; the grep doesn't show a TCP socket on port 993.
This is still happening after a reboot, Fedora 28:
Nov 8 12:21:41 kopernik dovecot[1386]: Error: service(pop3-login): listen(*, 995) failed: Address already in use
Nov 8 12:21:41 kopernik dovecot[1386]: Error: service(pop3-login): listen(::, 995) failed: Address already in use
Nov 8 12:21:41 kopernik dovecot[1386]: Error: service(imap-login): listen(*, 993) failed: Address already in use
Nov 8 12:21:41 kopernik dovecot[1386]: Error: service(imap-login): listen(::, 993) failed: Address already in use
After restarting dovecot it works:
netstat -lnp | grep 993
tcp 0 0 0.0.0.0:993 0.0.0.0:* LISTEN 4622/dovecot
tcp6 0 0 :::993 :::* LISTEN 4622/dovecot
We are tracking this upstream. It is unclear whether bindresvport will remain as a public API in glibc. So far, it has not been subject to the Sun RPC transition in glibc, although libtirpc implements the symbol as well. It may therefore be possible to turn the glibc symbol into a compat symbol, effectively removing it from the glibc API. In that case, it is unlikely that upstream will implement a denylist similar to what is available in libtirpc today. Applications built on Red Hat Enterprise Linux 8 can already use the libtirpc denylist functionality for bindresvport by linking against libtirpc. I filed bug 1854147 against libtirpc with the denylist additions suggested in comment 0.