Bug 163395 - Hard limits on NR_OPEN, NR_FILE, INR_OPEN and FD_SETSIZE cause problems when a large number of files are open.
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Peter Staubach
QA Contact: Brian Brock
 
Reported: 2005-07-15 20:16 UTC by Ryan Woodsmall
Modified: 2007-11-30 22:07 UTC

Doc Type: Enhancement
Last Closed: 2005-08-30 14:34:34 UTC


Description Ryan Woodsmall 2005-07-15 20:16:43 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
I help run a small server cluster for the University of Missouri, Columbia.  We host numerous Apache virtual hosts on each of the cluster nodes.  After we added a few new vhosts yesterday, some of our existing customers started seeing the PHP error "failed to open stream: Too many open files."

After digging around with lsof, ulimit and sysctl and playing with values for the maximum number of open files, we thought we had the problem solved.  However, this morning our users were receiving the same message.  After digging a bit further, we realized we were hitting a hard limit somewhere in the kernel.  I manually patched the kernel and glibc-kernheaders packages to reflect larger values for NR_OPEN, NR_FILE, INR_OPEN and FD_SETSIZE (in fs.h, limits.h and posix_types.h).  I rebuilt custom RPMs and installed them on our four cluster nodes.  This appears to have helped with the file descriptor problem.

The machines all have 2GB of RAM and their primary function is web and SMB serving, though they are used for interactive logins as well.  There are >250 virtual hosts on each of the cluster nodes, and each writes to a number of logs, causing more files to be opened.  The problem is a bit tough to reproduce, as it only shows up as a direct result of a loaded box: the more active HTTP connections we have, the higher the chance of the problem showing up.  So far, after the changes, our problem users have not reported the issue again.

Is there a reason that FD_SETSIZE, NR_OPEN, NR_FILE, INR_OPEN, etc., are still set to low-ish default values?  We're having trouble scaling with the stock SMP kernel - is there any way to up the values in question without a patch and recompile?  Will increasing those values break any software?  Is there any way to make this a dynamic value that Apache/PHP will take into account using a stock kernel?
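
For reference, here is a small sanity-check program (my own quick sketch, nothing
system-specific) that shows what limits a process actually sees at runtime,
using only standard calls - FD_SETSIZE from the headers, sysconf(_SC_OPEN_MAX)
and getrlimit(RLIMIT_NOFILE):

#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* FD_SETSIZE is baked in at compile time by the C library headers. */
    printf("FD_SETSIZE (compile time): %d\n", FD_SETSIZE);

    /* sysconf() reports the per-process open-file limit at runtime. */
    printf("sysconf(_SC_OPEN_MAX):     %ld\n", sysconf(_SC_OPEN_MAX));

    /* getrlimit() shows both the soft limit and the hard ceiling. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        printf("RLIMIT_NOFILE soft:        %lu\n", (unsigned long)rl.rlim_cur);
        printf("RLIMIT_NOFILE hard:        %lu\n", (unsigned long)rl.rlim_max);
    }
    return 0;
}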

Thanks...

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-32.0.1.EL, glibc-kernheaders-2.4-8.34.1

How reproducible:
Sometimes

Steps to Reproduce:
1. Load up PHP web app.
2. Repeat.
3. Eventually, you'll run out of file descriptors for the httpd process.
  

Actual Results:  PHP error: "failed to open stream: Too many open files."

Expected Results:  We shouldn't be running out of file descriptors.  These are large, fast systems with lots of processor power and physical memory.

Additional info:

Why is FD_SETSIZE set to 1024?  That makes no sense to me...

Comment 1 Peter Staubach 2005-07-22 18:25:31 UTC
Still looking, but I might suggest checking out /proc/sys/fs instead of
building a completely custom kernel.  Some of the tunables appear to be
in there.

Comment 2 Ryan Woodsmall 2005-07-22 19:17:02 UTC
Tried that suggestion before filing the bug report - it simply didn't work
as expected.

Comment 3 Peter Staubach 2005-07-22 20:18:14 UTC
Could you elaborate on what didn't work as expected, please?

--

The symbol, NR_FILE, is used to initialize the maximum number of file
structs which will be allocated in the system.  It is set so that the
file table uses about 10% of memory.  It can be adjusted via
/proc/sys/fs/file-max.
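
As a quick illustration, the current usage and ceiling can be read back from
/proc/sys/fs/file-nr and file-max.  A minimal sketch, assuming the usual 2.4
layout where file-nr reports "allocated  free  maximum":

#include <stdio.h>

int main(void)
{
    unsigned long allocated, unused, max;
    FILE *fp = fopen("/proc/sys/fs/file-nr", "r");

    if (fp == NULL) {
        perror("fopen /proc/sys/fs/file-nr");
        return 1;
    }
    /* On 2.4 kernels, file-nr reports: allocated, free, maximum. */
    if (fscanf(fp, "%lu %lu %lu", &allocated, &unused, &max) == 3)
        printf("file structs: %lu allocated (%lu in use), %lu free, %lu max\n",
               allocated, allocated - unused, unused, max);
    fclose(fp);
    return 0;
}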

The symbol, NR_OPEN, appears to control the maximum number of open
files per process.  It appears to be set to 1024*1024 or 1048576.

The symbol, INR_OPEN, is used to initialize the default for the
maximum number of open files per process.  The maximum number of
open files can be changed via the setrlimit(2) system call, up to
the limit as specified by NR_OPEN.  For shells such as bash(1),
the built-in ulimit command can be used to adjust the number of
open files per process.
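
As an illustration of the setrlimit(2) route, here is a minimal sketch that
bumps the calling process's soft open-file limit up to its hard limit; raising
the hard limit itself requires privilege, and nothing can go past the NR_OPEN
ceiling:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("before: soft=%lu hard=%lu\n",
           (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

    /* Raise the soft limit to the hard limit; the shell's built-in
       ulimit adjusts the same RLIMIT_NOFILE value. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    printf("after:  soft=%lu hard=%lu\n",
           (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);
    return 0;
}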

FD_SETSIZE is a POSIX standards thing.  It describes the size of
the file descriptor bit array which is defined by fd_set.
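
To make the consequence concrete, here is a minimal sketch (the watch_fd helper
is just for illustration) showing why a descriptor numbered at or above
FD_SETSIZE cannot be tracked with select(2) and fd_set:

#include <stdio.h>
#include <sys/select.h>

/* fd_set is a fixed-size bit array of FD_SETSIZE bits, so calling FD_SET()
   on a descriptor >= FD_SETSIZE is undefined behavior.  Code which might
   hold more descriptors than that has to check first (or use poll(2),
   which has no such compile-time cap). */
int watch_fd(int fd, fd_set *readfds)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        fprintf(stderr, "fd %d cannot be tracked by select(): FD_SETSIZE is %d\n",
                fd, FD_SETSIZE);
        return -1;
    }
    FD_SET(fd, readfds);
    return 0;
}

int main(void)
{
    fd_set readfds;
    FD_ZERO(&readfds);

    watch_fd(0, &readfds);          /* stdin: fine */
    watch_fd(FD_SETSIZE, &readfds); /* one past the end: rejected */
    return 0;
}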

--

I suspect that this information was already well known.  Some things
like FD_SETSIZE are just not going to be changeable.  The others
appear to be changeable, but it appears that they should already be
dynamic enough.

Comment 4 Ryan Woodsmall 2005-07-22 21:47:03 UTC
Changing the /proc settings and playing with ulimit appeared to do nothing
for the limits on server processes; the changes only helped processes that
were interactive.  Manually increasing INR_OPEN and FD_SETSIZE appeared to
raise the default limit on open files, which is what we wanted.

Here's the real problem: We were using Apache's piped logging facility with
httplog to do dynamic, separate logging for all virtual hosts on all cluster
nodes.  There are >260 vhosts per cluster node, with separate custom error and
access logs.  Apache's client limits were increased to account for the volume of
traffic we get, so there were around 100 active httpd processes per node at any
moment, with a max client limit of 512 per node.  The piped logging facility
uses two pipes per log.  At two pipes per log, two logs per virtual host,
260-odd virtual hosts, and ~100 httpd processes, you're looking at >100,000 open
files (pipes, really, but lsof and the OS don't seem to discriminate) per node
for JUST Apache.  We increased /proc/sys/fs/file-max to 300,000 and set
ulimits in the httpd startup script, all to no avail.  That's when I started
investigating kernel settings, etc.
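
Spelling that estimate out with the numbers above: 2 pipe descriptors per log
x 2 logs per vhost x ~260 vhosts x ~100 httpd processes comes to roughly
104,000 open descriptors per node, which is where the >100,000 figure comes from.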

PHP's ldap_bind function broke yesterday as it apparently opens a new
pipe/socket/descriptor on connect.  We were already pushing the limits even with
the changes I made to the kernel, and the addition of a few new vhost logs
caused us to break.  Perusing the PHP LDAP code, it looks like ldap_bind does
this somewhere deep in OpenLDAP's libraries, but I didn't have the time or the
patience to figure out exactly where it was breaking.  We reverted to a few
single, monolithic logs per machine in order to get our sites back up and
running; we can deal with the log issues at a later date, as we're keeping the
vhost names with their respective log entries.  More work, but it actually DOES work.

We will probably revert to a stock kernel and stock glibc-kernheaders in
the near future, as the changes I made are no longer needed.  Our open file
usage for Apache has dropped from over 100,000 to under 15,000.  The log parsing
should be a cinch, though it won't be quite as effortless as the previous setup,
where piping to httplog meant no work at all.  The easy solution is rarely
the best, though...

What I'm hoping is that Red Hat 4's 2.6 kernel will scale better and
actually OBEY the dynamic limits we tried to set.  This shouldn't have broken at
all, especially since all of the attempts we made to fix the problem through the
accepted channels (/proc/sys/fs/file-{nr,max} and ulimit) simply didn't work;
worse, the system didn't give us any usable information about what was broken.
We had to dig into system internals with tools like strace, lsof and fuser and
by browsing the kernel and other package source.  Any comments on how RH4 would
deal with this open-file load?  We have Apache and PHP tweaked to work and
perform exactly how they should and how we need them to - we're just running out
of system resources that we shouldn't be running out of.

Thanks for the help on this one.

Comment 5 Peter Staubach 2005-07-25 21:03:44 UTC
I don't know that RHEL-4 would deal with this sort of workload any differently,
but I could be very wrong.  It is a completely different kernel, much newer.  I
would hope that it could deal with the load in a much better fashion, but I
could not find a reason why RHEL-3 would not work as expected.  I guess
that it would have helped if I could have seen the setup, but oh well.

I don't think that I will be able to push through changes like this for
RHEL-3.  It is just too late in the RHEL-3 lifecycle and these changes would
be considered to be very risky.  Unless there is a strong objection, I will
close this BZ as WONTFIX.  If problems develop on RHEL-4 that I can help with,
please feel free to open another BZ and we will take a look then.

