Description of problem: We have a central syslog server setup using rsyslog (for 444 servers) that bailed out on us with the following error messages in the log file: ---- Aug 1 19:10:30 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files Aug 1 19:10:34 lg2log01 rsyslogd:last message repeated 49 times Aug 1 19:10:33 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files ---- this quickly turned (after a logrotate) into: ---- Aug 2 04:03:38 lg2log01 rsyslogd: [origin software="rsyslogd" swVersion="2.0.6" x-pid="31473" x-info="http://www.rsyslo g.com"][x-configInfo udpReception="Yes" udpPort="514" tcpReception="Yes" tcpPort="514"] restart Aug 2 04:03:38 lg2log01 rsyslogd:too many tcp sessions - dropping incoming request Aug 2 04:03:38 lg2log01 rsyslogd:too many tcp sessions - dropping incoming request Aug 2 04:03:58 lg2log01 rsyslogd:last message repeated 10 times Aug 2 04:03:39 lg2log01 rsyslogd:too many tcp sessions - dropping incoming request Aug 2 04:03:58 lg2log01 rsyslogd:last message repeated 3 times Aug 2 04:03:46 lg2log01 rsyslogd:too many tcp sessions - dropping incoming request -snip- Aug 2 04:03:51 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files Aug 2 04:03:51 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files Aug 2 04:03:51 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files -snip- Aug 2 04:03:55 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lg0app42/secure' - discarding message: Too many open files Aug 2 04:03:55 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lg0app42/secure' - discarding message: Too many open files Aug 2 04:03:55 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lg0app52/secure' - discarding message: Too many open files ---- Which after almost 2 weeks abruptly ends (in the logfile) with: ---- Aug 15 07:55:27 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files Aug 15 07:55:27 lg2log01 rsyslogd:tcp accept, ignoring error and connection request: Too many open files Aug 15 07:55:27 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lc1pol71/secure' - discarding message: Too manyopen files Aug 15 07:55:27 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lc1pol71/messages' - discarding message: Too many open files Aug 15 07:55:27 lg2log01 rsyslogd:Could not open dynamic file '/var/log/lc1pol71/messages' - discarding message: Too many open files --- After a restart of the rsyslog service the problem was gone. ---- Aug 17 14:56:24 lg2log01 rsyslogd: [origin software="rsyslogd" swVersion="2.0.6" x-pid="13344" x-info="http://www.rsyslo g.com"][x-configInfo udpReception="Yes" udpPort="514" tcpReception="Yes" tcpPort="514"] restart Aug 17 14:56:24 lg2log01 rsyslog: rsyslogd startup succeeded Aug 17 14:56:24 lg2log01 rsyslog: rklogd startup succeeded Aug 17 14:56:25 lg2log01 kernel: rklogd 2.0.6, log source = /proc/kmsg started. ---- This caused a few secondary problems, for which I will open new bug entries. A few days after (started on Aug 21): - local syslog processes fails to log to file (likely when remote messages are discarded or tcp connections not accepted) - local syslog stops accepting messages, causing everything that is logging to block on this (no remote logins possible, process that log stop working) After a restart of the central syslog server, everything started to work again. The urgency of this bug is related to the fact that systems are no longer accessible when this chain of events has happened and that a reboot of a server will drop the queued messages of the local syslog. Complicating debugging the cause of the login problems because the syslog has no recent logging. A statement of the author for similar reported incidents on the rsyslog mailinglist seem to indicate that the global select used is limited to 1000 file descriptors and that is a hard limit of select on Linux. http://kb.monitorware.com/rsyslog-rhel5-stuck-1000-fds-t8942.html Version-Release number of selected component (if applicable): rsyslog-2.0.6-1.el5 How reproducible: See description.
Configuration of our syslog server is quite simple: /etc/sysconfig/rsyslog ---- # Options to syslogd # -m 0 disables 'MARK' messages. # -rPortNumber Enables logging from remote machines. The listener will listen to the specified port. # -x disables DNS lookups on messages recieved with -r # See syslogd(8) for more details SYSLOGD_OPTIONS="-m 0 -r514 -t514,1000" # Options to klogd # -2 prints all kernel oops messages twice; once for klogd to decode, and # once for processing with 'ksymoops' # -x disables all klogd processing of oops messages entirely # See klogd(8) for more details KLOGD_OPTIONS="-x" ---- /etc/rsyslog.conf ---- # --- All remote files --- $template remote-messages,"/var/log/%HOSTNAME%/messages" $template remote-secure,"/var/log/%HOSTNAME%/secure" $template remote-maillog,"/var/log/%HOSTNAME%/maillog" $template remote-cron,"/var/log/%HOSTNAME%/cron" $template remote-spooler,"/var/log/%HOSTNAME%/spooler" $template remote-bootlog,"/var/log/%HOSTNAME%/boot.log" *.info;mail.none;authpriv.none;cron.none ?remote-messages authpriv.* ?remote-secure mail.* -?remote-maillog cron.* ?remote-cron uucp,news.crit ?remote-spooler local7.* ?remote-bootlog ----
We opened a problem ticket with our TAM for this as Issue 334499.
Just for the public record. I wanted to add the following error messages we got from rsyslogd on our central rsyslog server today. We didn't see this error before, but now we have ulimit set to allow 5000 open files so maybe we hit the ulimit open files limit before and now we are hitting the select() limit instead. The system slowed down completely, trying to log on provided me with a login-prompt, but I got disconnected before a login via SSH happened. Colleagues tried to log in on the console, but failed as well. A forced reboot brought the system back, this was in the logfile on disk: ---- Sep 3 14:15:10 lg2log01 rsyslogd:select: Bad file descriptor Sep 3 14:15:59 lg2log01 rsyslogd:last message repeated 153422 times Sep 3 14:17:00 lg2log01 rsyslogd:last message repeated 294806 times Sep 3 14:18:01 lg2log01 rsyslogd:last message repeated 309974 times Sep 3 14:19:02 lg2log01 rsyslogd:last message repeated 297799 times Sep 3 14:20:03 lg2log01 rsyslogd:last message repeated 266527 times Sep 3 14:21:04 lg2log01 rsyslogd:last message repeated 273352 times Sep 3 14:22:05 lg2log01 rsyslogd:last message repeated 263318 times Sep 3 14:23:06 lg2log01 rsyslogd:last message repeated 241717 times ... Sep 3 17:22:02 lg2log01 rsyslogd:last message repeated 274251 times Sep 3 17:23:03 lg2log01 rsyslogd:last message repeated 284283 times Sep 3 17:39:06 lg2log01 rsyslogd: [origin software="rsyslogd" swVersion="2.0.6" x-pid="12291" x-info="http://www.rsyslog.com"][x-configInfo udpReception="Yes" udpPort="514" tcpReception="Yes" tcpPort="514"] restart Sep 3 17:39:06 lg2log01 rsyslog: rsyslogd startup succeeded ---- If this is happening with only 400 systems logging to one server, I wonder why there aren't more people reporting problems like this one, and the two others.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2010-0213.html