Bug 441243

Summary: kernel keyring quotas exceeded
Product: Red Hat Enterprise Linux 5 Reporter: Berthold Cogel <cogel>
Component: kernel Assignee: Cong Wang <amwang>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: high    
Version: 5.1 CC: dhowells, duck, jlayton, jtluka, jwest, lwang, rkhan, rprice, stephan.wiesand, tao, torkel
Target Milestone: rc Keywords: FutureFeature, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-13 20:41:38 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 533192, 554476, 557291    

Description Berthold Cogel 2008-04-07 12:05:58 UTC
Description of problem:
We're using OpenAFS on our systems and most of our webpages are stored in AFS.
We have a lot of small projects for which a separate server would be a waste of
'metal'. Even in a virtual environment. So we're hosting a lot of apache
instances on a single machine. 

Because suexec doesn't work in an AFS environment, each instance is started by
root with its own IP (to be able to talk HTTPS) and in a PAG with a separate
token for a service user (to isolate the projects). Although each apache
switches over to the service user, the initial tokens are acquired by root. 

On RHEL 3 with the old 2.4 kernel this was never a problem. But now, with RHEL
5, the kernel keyring quotas are too restrictive for our environment.

As a result of this problem, processes are running unauthenticated and are unable
to deliver the requested data. As an additional problem, login for users with
home directories in AFS (our webmasters get limited access) becomes impossible on
these systems.
 
We just hit this wall while migrating from RHEL 3 to RHEL 5 with some of our
webservers.  
 
[root@lvr11 ~]# cat /proc/key-users
    0:    99 98/98 96/100 1681/10000
   32:     2 2/2 2/100 56/10000
   38:     2 2/2 2/100 56/10000
   43:     2 2/2 2/100 56/10000
   51:     2 2/2 2/100 56/10000
   68:     2 2/2 2/100 56/10000
   81:     2 2/2 2/100 56/10000
   99:     2 2/2 2/100 56/10000
  348:     2 2/2 2/100 58/10000
42216:     2 2/2 2/100 62/10000
55188:     3 3/3 3/100 72/10000
56537:     2 2/2 2/100 62/10000
63743:     2 2/2 2/100 62/10000
68054:     2 2/2 2/100 62/10000
 
....
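
For readers unfamiliar with /proc/key-users: per the kernel's key-retention
documentation, each line holds the UID, the structure usage count, total keys /
instantiated keys, key count / key-count quota, and key bytes / byte quota. The
following is a minimal Python sketch that parses that layout and flags users
close to the key-count quota. It uses three lines from the listing above as
sample input; the names parse_key_users and near_quota and the 90% threshold
are illustrative assumptions, not part of any existing tool.

```python
from typing import List, NamedTuple


class KeyUser(NamedTuple):
    uid: int
    usage: int      # structure refcount
    nkeys: int      # total keys
    nikeys: int     # instantiated keys
    qnkeys: int     # keys counted against the quota
    maxkeys: int    # key-count quota
    qnbytes: int    # bytes counted against the quota
    maxbytes: int   # byte quota


def parse_key_users(text: str) -> List[KeyUser]:
    """Parse /proc/key-users style text, skipping lines that do not match."""
    users = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        uid_part, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) != 4:
            continue
        try:
            uid = int(uid_part.strip())
            usage = int(fields[0])
            nkeys, nikeys = (int(x) for x in fields[1].split("/"))
            qnkeys, maxkeys = (int(x) for x in fields[2].split("/"))
            qnbytes, maxbytes = (int(x) for x in fields[3].split("/"))
        except ValueError:
            continue  # malformed line, e.g. a shell prompt
        users.append(KeyUser(uid, usage, nkeys, nikeys,
                             qnkeys, maxkeys, qnbytes, maxbytes))
    return users


def near_quota(users: List[KeyUser], threshold: float = 0.9) -> List[KeyUser]:
    """Return users whose key count is at or above threshold * quota."""
    return [u for u in users
            if u.maxkeys and u.qnkeys / u.maxkeys >= threshold]


# Sample lines copied from the listing in this report.
SAMPLE = """\
    0:    99 98/98 96/100 1681/10000
   32:     2 2/2 2/100 56/10000
55188:     3 3/3 3/100 72/10000
"""

if __name__ == "__main__":
    for u in near_quota(parse_key_users(SAMPLE)):
        print(f"uid {u.uid}: {u.qnkeys}/{u.maxkeys} keys in use")
```

On the sample, only uid 0 (root) is flagged, at 96 of 100 keys, which matches
the situation described in this report.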
 
Btw.: We have some machines (RHEL 3) with about a hundred (!) different projects
which need tokens.
 
For us, these limitations are a real showstopper for our migration from RHEL 3 to
RHEL 5. On our webservers RHEL 5 is nearly useless at the moment.
 
There is a patch available from David Howells which makes these limits
configurable via /proc/sys: http://lkml.org/lkml/2008/3/28/225
 
We request a backport of this patch to the RHEL 5 kernel as soon as possible.
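
For reference, the upstream version of that patch exposes the limits as files
under /proc/sys/kernel/keys/ named maxkeys, maxbytes, root_maxkeys and
root_maxbytes. Whether a RHEL 5 backport would use the same names is an
assumption here. A small Python sketch that reads whichever of those files
exist on the running kernel (read_key_limits is a made-up helper name):

```python
from pathlib import Path

# Upstream sysctl file names; assumed, not confirmed, for a RHEL 5 backport.
KEY_SYSCTLS = ("maxkeys", "maxbytes", "root_maxkeys", "root_maxbytes")


def read_key_limits(base="/proc/sys/kernel/keys"):
    """Return {name: value} for each keyring quota sysctl that is readable."""
    limits = {}
    for name in KEY_SYSCTLS:
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            continue  # file missing (unpatched kernel) or not an integer
    return limits


if __name__ == "__main__":
    limits = read_key_limits()
    if not limits:
        print("no keyring quota sysctls found; limits are compiled in")
    for name, value in limits.items():
        print(f"kernel.keys.{name} = {value}")
```

On a kernel without the patch the directory does not exist and the function
returns an empty dictionary.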
 
Version-Release number of selected component (if applicable):
2.6.18-53.1.13.el5

How reproducible:
Each time 

Steps to Reproduce:
1. In an AFS environment, call pagsh in many different terminal windows
2. Call 'klog <user>' to get a token (or kinit in krb5 environment)
3. Try to read data only available for <user> in AFS 
  
Actual results:
Some processes are unauthenticated in AFS. 

Expected results:
Each process is authenticated in AFS.

Additional info:

Comment 1 Issue Tracker 2008-09-08 17:46:08 UTC
Correct Template
----------------

SEG RFE Template

[Customer/Frontline driven section] -- This section should be completed by
the front-line engineer with the assistance of the customer.

1.) Who is the customer?

University of Cologne | Universität zu Köln
http://www.pressoffice.uni-koeln.de/


2.) What is the exact nature of the problem trying to be solved with this
request?

Customer is requesting a kernel modification: keyring quotas controllable
through /proc/sys.
---
The problem is caused by the value for KEYQUOTA_MAX_KEYS (100) in the
kernel source.
Kernel keyring quotas are too restricted for the customer's environment.
The limit of 100 keyrings for a single user is too small for their use case;
OpenAFS is the solution that hits this limit, but every other application
that does so would have the very same problem.  
We are not the only people to have been bitten by this limit, hence the
patch from David Howells (a Red Hat employee).
As a result of this problem, processes are running unauthenticated and are
unable to deliver the requested data. As an additional problem, login
for users with home directories in AFS is impossible.
We just hit this wall while migrating from RHEL 3 to RHEL 5 with some of
our webservers.
---


3.) What, if any, business requirements are satisfied by this request?
(What is the use case context?)

They're using OpenAFS on their systems and most of their webpages are
stored in AFS. They have a lot of small projects for which a separate
server would be a waste of 'metal'. Even in a virtual environment. So
several apache instances are hosted on a single machine. Because suexec
doesn't work in an AFS environment, each instance is started by root with
its own IP (to be able to talk HTTPS) and in a PAG with a separate token
for a service user (to isolate the projects). Although each apache
switches over to the service user, the initial tokens are acquired by
root.

On RHEL 3 with the old 2.4 kernel this was never a problem.
They have some machines (RHEL 3) with about a hundred (!) different projects
which need tokens. 

For the customer, these limitations are a real showstopper for their
migration from RHEL 3 to RHEL 5.
On all their webservers RHEL 5 is nearly useless at the moment.


4.) List the functional requirement(s) for performing the action(s) that
are not presently possible. Please focus on describing the problem related
requirements without projecting any specific solution.

The only functional requirement is to be able to change the
KEYQUOTA_MAX_KEYS kernel parameter via /proc/sys and /etc/sysctl.conf.
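
As a sketch of how the two interfaces relate: a file under /proc/sys maps to an
/etc/sysctl.conf key by dropping the /proc/sys/ prefix and replacing slashes
with dots. The path below uses the upstream name root_maxkeys and the value
1000 purely as illustrative assumptions; the helper names are hypothetical.

```python
PROC_SYS = "/proc/sys/"


def proc_path_to_sysctl_key(path):
    """Convert a /proc/sys/... path into the dotted sysctl key name."""
    if not path.startswith(PROC_SYS):
        raise ValueError(f"not a /proc/sys path: {path!r}")
    return path[len(PROC_SYS):].strip("/").replace("/", ".")


def sysctl_conf_line(path, value):
    """Render one 'key = value' line suitable for /etc/sysctl.conf."""
    return f"{proc_path_to_sysctl_key(path)} = {value}"


if __name__ == "__main__":
    # Path and value are assumptions chosen for illustration only.
    print(sysctl_conf_line("/proc/sys/kernel/keys/root_maxkeys", 1000))
    # prints "kernel.keys.root_maxkeys = 1000"
```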

5.) Each functional requirement must have clear acceptance criteria so Red
Hat understands what success looks like. If test cases can be provided this
would be even more ideal (bonus points for RHTS test cases).

A patch was written by a Red Hat engineer (David Howells) and apparently
there are no technical reasons that this patch can't be used in RHEL-5.
Keys are now also used by CIFS, possibly in RHEL-5 as well, and the patch
does not break kABI in any way.


6.) What is the desired release vehicle to satisfy these requirements?
Major or Minor release?

Hotfix, Minor. In the next kernel update if possible.


7.) Please justify with reference to the release vehicle policy described
in the RHEL Inclusion Criteria wiki page

Since the customer is still stuck on RHEL3 because of that missing
functionality, I think he wouldn't mind using the latest kernel.


8.) What package(s) are affected by this RFE? (List "new" if new
technology is likely to be required)

kernel.


[Red Hat Sales/Frontline] -- This section should be completed by the front
line engineer with the assistance of the account manager/sales rep.

9.) Who is the sales sponsor?

Daniel Stiff

10.) What is the Red Hat business opportunity with this customer?

They currently have over 170 RHEL subscriptions, which we have been trying
to co-term for 6 months now. They also participate in a statewide program of
several North Rhine-Westphalian universities evaluating the Satellite server
for its multi-org functionality.

11.) What is the status and risk to the contract if this RFE is not
satisfied?

We would lose a possible sales opportunity of up to 80K Euros!


[Red Hat Engineering] -- this section will be completed by development
engineering
12.) What is the scope of this request for work required and risk?

*** Answer 12 ***
13.) What technology (specific list of packages) is affected by this RFE
if not fully captured above?

*** Answer 13 ***


Internal Status set to 'Waiting on SEG'
Severity set to: High
Priority set to: 2

This event sent from IssueTracker by jwest 
 issue 173344

Comment 2 RHEL Program Management 2009-02-16 15:43:02 UTC
Updating PM score.

Comment 10 Björn Torkelsson 2009-11-30 11:59:05 UTC
Is anything happening with this? 

We are hit by this as well: we have somewhere between 50 and 75 users concurrently using OpenAFS on the same machine, and after a while users end up in the uid_session keyring instead of getting a new session keyring when logging in.

Comment 20 Issue Tracker 2010-04-15 13:51:04 UTC
Event posted on 04-14-2010 06:31pm EDT by dmosby

Justification appears below. I am told that they will have
to move from Red Hat if they cannot get this increased.
I have no way to know if that is true. Initially they asked
for the patch which was first mentioned in this Issue Tracker.
Only when we told them that they would not be getting it did
they further explain the problem and we found that they did
not actually require all the features of the patch. All they
require is expanding the hard-coded limit of 100. It seems a
pretty safe fix to alter that to 1000 and recompile.
-----------------------------------------------------

We are using large machines (48 way systems) using a grid based job
queuing system, where each job requires 4 keyrings and the systems can
handle 96 jobs. So, with the number of jobs that these systems can run,
we are exceeding the hard-coded limit of 100 keyrings (for root). This
means that we are only utilizing a fraction of the system. This seems to
be a fairly wide-spread issue that I'm sure other companies and users
are running into especially since systems are getting more and more
cores/capacity. Based on this capacity lost in the grid, many of our
hardware design team's  schedules will dramatically slip. 10+ highly
visible projects will be impacted if this is not fixed.
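
The arithmetic behind that statement can be checked with a short Python sketch.
The figures (96 jobs, 4 keyrings per job, quota of 100) come from the paragraph
above; the function names are illustrative, and the calculation ignores any
keyrings root already holds for other purposes, so the real ceiling is
somewhat lower.

```python
def keyrings_needed(jobs, keyrings_per_job):
    """Keyrings required to run the given number of jobs at once."""
    return jobs * keyrings_per_job


def max_jobs_under_quota(quota, keyrings_per_job):
    """Largest number of concurrent jobs that fits within the keyring quota."""
    return quota // keyrings_per_job


JOBS = 96        # jobs the system can handle
PER_JOB = 4      # keyrings required per job
QUOTA = 100      # hard-coded per-user limit

if __name__ == "__main__":
    needed = keyrings_needed(JOBS, PER_JOB)
    runnable = max_jobs_under_quota(QUOTA, PER_JOB)
    print(f"keyrings needed at full load: {needed}")          # 384
    print(f"jobs runnable under the quota: {runnable}")       # 25
    print(f"share of capacity usable: {runnable / JOBS:.0%}")  # 26%
```

At full load the system would need 384 keyrings, so a quota of 100 caps it at
25 concurrent jobs, roughly a quarter of its capacity.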

Please let me know as soon as possible if you find out one way or
another if there is hope in getting an official hotfix. If not, we need
to make other plans as soon as possible.



This event sent from IssueTracker by jkachuck 
 issue 500943

Comment 23 Jarod Wilson 2010-05-03 16:52:36 UTC
in kernel-2.6.18-198.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.

Comment 32 errata-xmlrpc 2011-01-13 20:41:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html