Bug 1179820

Summary: Kerberos KDC connection limit too low
Product: Red Hat Enterprise Linux 6
Reporter: Andrew Dingman <adingman>
Component: krb5
Assignee: Robbie Harwood <rharwood>
Status: CLOSED NOTABUG
QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: low
Docs Contact:
Priority: unspecified
Version: 6.8
CC: dpal, jplans, nalin, pkis, rharwood
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-07 17:25:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andrew Dingman 2015-01-07 15:39:27 UTC
Description of problem:

The Kerberos KDC has a low, hard-coded limit on concurrent TCP connections. This results in large numbers of INFO-level log entries about dropped connections when many clients are trying to contact the KDC. It could lead to additional problems with more clients.

My current scenario is not one that should be followed in real life, but similar situations could occur with larger numbers of clients that are not misconfigured. I anticipate regularly having many hundreds, potentially thousands, of connected clients booting nearly simultaneously within the year.

Version-Release number of selected component (if applicable):

krb5-server-1.10.3-33.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Configure 62 clients to use a KDC
2. Prevent replies from the KDC from reaching the clients
3. Observe KDC logs as clients attempt to contact the KDC

Actual results:

Logs fill up with dropped connection messages.

Expected results:

Affected clients fail to authenticate, but do not exhaust all available open connections before they time out.

Additional info:

Comment 2 Roland Mainz 2015-01-16 20:44:01 UTC
(In reply to Andrew Dingman from comment #0)
> Description of problem:
> 
> The Kerberos KDC has a low, hard-coded limit on concurrent tcp connections.

Which limit do you mean?

Comment 3 Roland Mainz 2015-01-16 20:58:46 UTC
AFAIK we have the following default limits:
- 1 process (by default)
- max. 45 connections as defined via |max_tcp_or_rpc_data_connections| in ./krb5/src/lib/apputils/net-server.c (per process?)
- 1024 sockets/files max. as defined via Linux default resource limit for file descriptors (see $ ulimit -n #)
- [optionally] a fd limit imposed when |select()| is used, see |FD_SETSIZE|
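
To check which of the fd-related limits above applies on a given machine, something like the following rough sketch can help. It is not taken from krb5; |soft_fd_limit| and |print_fd_limits| are made-up helper names, and it only reports the soft RLIMIT_NOFILE value (what $ ulimit -n # shows) next to FD_SETSIZE:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper (not part of krb5): soft RLIMIT_NOFILE value,
 * i.e. what "ulimit -n" reports. Returns -1 if it cannot be read. */
long soft_fd_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    /* Clamp "unlimited" or very large values so they fit in a long. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)LONG_MAX)
        return LONG_MAX;
    return (long)rl.rlim_cur;
}

/* Hypothetical helper: print the two fd-related limits from the list above. */
void print_fd_limits(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "RLIMIT_NOFILE (soft): %ld\n", soft_fd_limit());
    fprintf(out, "FD_SETSIZE:           %d\n", (int)FD_SETSIZE);
}
```

Calling |print_fd_limits(stdout)| from a small test program run as the same user as the KDC should show whether either value is anywhere near 45; if both are much larger, the connection cap in net-server.c is the more likely culprit.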

Comment 4 Roland Mainz 2015-01-16 21:07:33 UTC
The 45-connection limit defined via |max_tcp_or_rpc_data_connections| in ./krb5/src/lib/apputils/net-server.c may (in theory) be a result of the old 64 fd resource limit for user processes in SystemV+BSD4.3 (which still exists in modern Solaris/Illumos and maybe *BSD, too) ... maybe 45 was picked so there are at least 19 fds available for other purposes like config files and plugins.

AFAIK a possible fix may be to query the resource limit for file descriptors ($ ulimit -n #) and then do a |MAX(result/2, 45)| ... but that is likely a question for the upstream Kerberos5 list (and we need confirmation that it's really this 45 connection limit which is bothering you...) ...
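
The |MAX(result/2, 45)| idea could look roughly like the sketch below. This is only an illustration of the arithmetic, not a patch against net-server.c: |suggested_tcp_limit| and |LEGACY_TCP_LIMIT| are made-up names, and the fd limit is passed in as a parameter (in practice it would presumably come from getrlimit() with RLIMIT_NOFILE).

```c
#include <assert.h>

/* The existing hard-coded cap discussed in this bug. */
#define LEGACY_TCP_LIMIT 45

/* Hypothetical helper (not krb5 code): MAX(fd_limit / 2, 45).
 * fd_limit is the process's file descriptor limit. */
long suggested_tcp_limit(long fd_limit)
{
    long half;

    assert(fd_limit >= 0);
    half = fd_limit / 2;
    return half > LEGACY_TCP_LIMIT ? half : LEGACY_TCP_LIMIT;
}
```

With the Linux default of 1024 fds this gives 512 connections; with the old 64 fd limit it stays at 45, which still leaves 19 fds for other purposes. If |select()| is in use, the result would presumably also have to stay below |FD_SETSIZE|, which this sketch does not check.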

Comment 5 Dmitri Pal 2015-01-16 23:58:25 UTC
I think this is an invalid test case.
Blocking replies makes clients think that the connection is broken, and they will start reconnecting after a timeout, which is in fact hard-coded.
But the test claims that it is the server that is at fault. This seems to be wrong. It is the client that is at fault, but so far this is by design and has nothing to do with the scalability of the server.

Comment 6 Andrew Dingman 2015-01-18 20:39:34 UTC
I only just realized that comment 3 was not a reply to comment 2. Yes, the 45 connection limit in net-server.c is the one I was referring to. Neither ulimits nor file handle limits were coming into play. I was expecting to need to tweak those, but found this instead.

I agree that the test case is not something that should ever happen. Our cloud provider thoughtlessly applied a firewall rule that more or less turned all outbound Kerberos traffic into half-open attacks. It is just the first thing that happened in my infrastructure to bring the limit to my attention.

Comment 9 Robbie Harwood 2015-10-07 17:25:10 UTC
This sounds like a misconfiguration.  Additionally, if there's an actual problem with a new use case here, it would be better suited for RHEL7, not RHEL6.