Bug 519192 - rsyslog server cannot handle more than 1000 open files
rsyslog server cannot handle more than 1000 open files
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rsyslog (Show other bugs)
5.3
All Linux
medium Severity urgent
: rc
: ---
Assigned To: Tomas Heinrich
BaseOS QE
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-08-25 11:29 EDT by Dag Wieers
Modified: 2010-03-30 04:17 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-03-30 04:17:16 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)

  None (edit)
Description Dag Wieers 2009-08-25 11:29:55 EDT
Description of problem:
We have a central syslog server setup using rsyslog (for 444 servers) that bailed out on us with the following error messages in the log file:

----
Aug  1 19:10:30 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files
Aug  1 19:10:34 lg2log01 rsyslogd:last message repeated 49 times
Aug  1 19:10:33 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files
----

this quickly turned (after a logrotate) into:

----
Aug  2 04:03:38 lg2log01 rsyslogd: [origin software="rsyslogd" swVersion="2.0.6" x-pid="31473" x-info="http://www.rsyslo
g.com"][x-configInfo udpReception="Yes" udpPort="514" tcpReception="Yes" tcpPort="514"] restart
Aug  2 04:03:38 lg2log01 rsyslogd:too many tcp sessions - dropping incoming request
Aug  2 04:03:38 lg2log01 rsyslogd:too many tcp sessions - dropping incoming request
Aug  2 04:03:58 lg2log01 rsyslogd:last message repeated 10 times
Aug  2 04:03:39 lg2log01 rsyslogd:too many tcp sessions - dropping incoming request
Aug  2 04:03:58 lg2log01 rsyslogd:last message repeated 3 times
Aug  2 04:03:46 lg2log01 rsyslogd:too many tcp sessions - dropping incoming request
-snip-
Aug  2 04:03:51 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files
Aug  2 04:03:51 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files
Aug  2 04:03:51 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files
-snip-
Aug  2 04:03:55 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lg0app42/secure' - discarding message: Too many open files
Aug  2 04:03:55 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lg0app42/secure' - discarding message: Too many open files
Aug  2 04:03:55 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lg0app52/secure' - discarding message: Too many open files
----

Which after almost 2 weeks abruptly ends (in the logfile) with:

----
Aug 15 07:55:27 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files
Aug 15 07:55:27 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files
Aug 15 07:55:27 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lc1pol71/secure' - discarding message: Too manyopen files
Aug 15 07:55:27 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lc1pol71/messages' - discarding message: Too many open files
Aug 15 07:55:27 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lc1pol71/messages' - discarding message: Too many open files
---

After a restart of the rsyslog service the problem was gone.

----
Aug 17 14:56:24 lg2log01 rsyslogd: [origin software="rsyslogd" swVersion="2.0.6" x-pid="13344" x-info="http://www.rsyslo
g.com"][x-configInfo udpReception="Yes" udpPort="514" tcpReception="Yes" tcpPort="514"] restart
Aug 17 14:56:24 lg2log01 rsyslog: rsyslogd startup succeeded
Aug 17 14:56:24 lg2log01 rsyslog: rklogd startup succeeded
Aug 17 14:56:25 lg2log01 kernel: rklogd 2.0.6, log source = /proc/kmsg started.
----

This caused a few secondary problems, for which I will open new bug entries. A few days after (started on Aug 21):

 - local syslog processes fails to log to file (likely when remote messages are discarded or tcp connections not accepted)

 - local syslog stops accepting messages, causing everything that is logging to block on this (no remote logins possible, process that log stop working)

After a restart of the central syslog server, everything started to work again.

The urgency of this bug is related to the fact that systems are no longer accessible when this chain of events has happened and that a reboot of a server will drop the queued messages of the local syslog. Complicating debugging the cause of the login problems because the syslog has no recent logging.

A statement of the author for similar reported incidents on the rsyslog mailinglist seem to indicate that the global select used is limited to 1000 file descriptors and that is a hard limit of select on Linux.

    http://kb.monitorware.com/rsyslog-rhel5-stuck-1000-fds-t8942.html 


Version-Release number of selected component (if applicable):
rsyslog-2.0.6-1.el5

How reproducible:
See description.
Comment 1 Dag Wieers 2009-08-25 11:37:33 EDT
Configuration of our syslog server is quite simple:

/etc/sysconfig/rsyslog
----
# Options to syslogd
# -m 0 disables 'MARK' messages.
# -rPortNumber Enables logging from remote machines. The listener will listen to the specified port.
# -x disables DNS lookups on messages recieved with -r
# See syslogd(8) for more details
SYSLOGD_OPTIONS="-m 0 -r514 -t514,1000"
# Options to klogd
# -2 prints all kernel oops messages twice; once for klogd to decode, and
#    once for processing with 'ksymoops'
# -x disables all klogd processing of oops messages entirely
# See klogd(8) for more details
KLOGD_OPTIONS="-x"
----

/etc/rsyslog.conf
----
# --- All remote files ---
$template remote-messages,"/var/log/%HOSTNAME%/messages"
$template remote-secure,"/var/log/%HOSTNAME%/secure"
$template remote-maillog,"/var/log/%HOSTNAME%/maillog"
$template remote-cron,"/var/log/%HOSTNAME%/cron"
$template remote-spooler,"/var/log/%HOSTNAME%/spooler"
$template remote-bootlog,"/var/log/%HOSTNAME%/boot.log"

*.info;mail.none;authpriv.none;cron.none                ?remote-messages
authpriv.*                                              ?remote-secure
mail.*                                                  -?remote-maillog
cron.*                                                  ?remote-cron
uucp,news.crit                                          ?remote-spooler
local7.*                                                ?remote-bootlog
----
Comment 2 Dag Wieers 2009-08-25 12:36:43 EDT
We opened a problem ticket with our TAM for this as Issue 334499.
Comment 3 Dag Wieers 2009-09-03 11:54:27 EDT
Just for the public record.

I wanted to add the following error messages we got from rsyslogd on our central rsyslog server today. We didn't see this error before, but now we have ulimit set to allow 5000 open files so maybe we hit the ulimit open files limit before and now we are hitting the select() limit instead.

The system slowed down completely, trying to log on provided me with a login-prompt, but I got disconnected before a login via SSH happened. Colleagues tried to log in on the console, but failed as well. A forced reboot brought the system back, this was in the logfile on disk:

----
Sep  3 14:15:10 lg2log01 rsyslogd:select: Bad file descriptor
Sep  3 14:15:59 lg2log01 rsyslogd:last message repeated 153422 times
Sep  3 14:17:00 lg2log01 rsyslogd:last message repeated 294806 times
Sep  3 14:18:01 lg2log01 rsyslogd:last message repeated 309974 times
Sep  3 14:19:02 lg2log01 rsyslogd:last message repeated 297799 times
Sep  3 14:20:03 lg2log01 rsyslogd:last message repeated 266527 times
Sep  3 14:21:04 lg2log01 rsyslogd:last message repeated 273352 times
Sep  3 14:22:05 lg2log01 rsyslogd:last message repeated 263318 times
Sep  3 14:23:06 lg2log01 rsyslogd:last message repeated 241717 times
...
Sep  3 17:22:02 lg2log01 rsyslogd:last message repeated 274251 times
Sep  3 17:23:03 lg2log01 rsyslogd:last message repeated 284283 times
Sep  3 17:39:06 lg2log01 rsyslogd: [origin software="rsyslogd" swVersion="2.0.6" x-pid="12291" x-info="http://www.rsyslog.com"][x-configInfo udpReception="Yes" udpPort="514" tcpReception="Yes" tcpPort="514"] restart
Sep  3 17:39:06 lg2log01 rsyslog: rsyslogd startup succeeded
----

If this is happening with only 400 systems logging to one server, I wonder why there aren't more people reporting problems like this one, and the two others.
Comment 9 errata-xmlrpc 2010-03-30 04:17:16 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0213.html

Note You need to log in before you can comment on or make changes to this bug.