From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Description of problem:
When receiving messages from other systems is enabled (by setting SYSLOGD_OPTIONS="-r -x -m 0" in /etc/sysconfig/syslog), syslogd stops accepting messages after some time, and the system then no longer allows logins. When syslogd is killed with a hard kill (not with "service syslog stop"), a stream of messages flows to the console and logins become possible again.
Note the presence of the -x flag, which the manual page suggests is for preventing deadlocks on machines that also run named (which is indeed the case on my machine). The problem appears whether or not -x is used. Logins block exactly when they try to log the login (i.e. after a correct password has been entered).
Version-Release number of selected component (if applicable):
sysklogd-1.4.1-22
How reproducible:
Always
Steps to Reproduce:
1. Put SYSLOGD_OPTIONS="-r -x -m 0" in /etc/sysconfig/syslog
2. Restart syslogd: service syslog restart
3. Logins will block after some time -> denial of service
Actual Results:
After some normal activity, any message to syslog blocks the sending application and is not written to the relevant log file in /var/log (mostly messages/secure).
Expected Results:
Messages end up in the appropriate file in /var/log, and services can send messages to syslogd normally.
Additional info:
This worked OK in RH9. The bug is annoying and (for me) security related, because my ADSL router will only log messages through syslog to a remote host. It also leads to a denial of service, since login and su no longer function until syslogd is killed. No special messages are seen in /var/log/messages, and when I strace syslogd, it is happily waiting in select(). Note that once syslogd is blocked, many services no longer work: login, su, ...
I had syslogd running with the "-r -x -m 0" options on an FC3 machine that also runs named, with log messages arriving from a remote machine once every 10 seconds, all day yesterday, without reproducing this problem. Please supply some more information:
1. Are you sure you have enough space on the /var partition?
   # df -k /var
2. How soon after reboot does the problem occur?
3. Have you made any modifications to the default /etc/syslog.conf? If so, please append your syslog.conf to this bug report.
4. Do you actually see messages from the ADSL router in the log before the problem occurs, or does it seem to occur when the first message from the router arrives?
5. Did you upgrade from RH9 or do a clean install?
6. Is SELinux enabled in enforcing mode?
   # getenforce
   If so, does the command
   # /sbin/restorecon -Rn /var
   report any policy issues? If so, run
   # /sbin/restorecon -R /var
   to correct them and see if this fixes the problem.
7. If the problem occurs soon after restart, and you have space on the /var partition, please do the following:
   a. Install the syslogd debuginfo package, available from:
      http://people.redhat.com/~jvdias/sysklogd-debuginfo-1.4.1-22.i386.rpm
      (let me know if you need an RPM for a different architecture)
   b. Stop the sysklogd service and restart syslogd in debug mode:
      # service syslog stop
      # cd /
      # klogd -x &
      # syslogd -r -x -m0 -d >/tmp/syslogd.debug.log 2>&1 &
   c. Wait for the problem to occur. When it does:
      # pid=`pgrep syslogd`
      # gcore -o /tmp/syslogd.core $pid
   d. Restore the syslogd service:
      # pkill klogd
      # service syslog restart
   e. Then append a compressed tar of the resulting /tmp/syslogd.* files to this bug:
      # tar -cf - /tmp/syslogd.* | gzip > /tmp/syslogd_bug.tar.gz
Please either upload the resulting /tmp/syslogd_bug.tar.gz file into Bugzilla or send it by email to: jvdias
Thank you.
1) Enough room in /var:
# df -k /var
Filesystem              1K-blocks   Used Available Use% Mounted on
/dev/mapper/vg0-varvol   10321208 293584   9503336   3% /var
2) The problem occurs roughly 15 minutes after reboot, but not after a fixed amount of time.
3) Local change to syslog.conf: added
local6.*        /var/log/zyxel.log
4) Messages do actually appear in the log, but not many before syslogd blocks.
5) This was a clean install of FC3 x86_64 on new hardware.
6) SELinux is *not* enabled:
# getenforce
Disabled
7) I will answer question 7 once I have a syslogd that starts OK (this version won't start).
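To reproduce this kind of report without the ADSL router, a small test sender can emit datagrams to a remote syslogd started with -r. The sketch below is illustrative only. It assumes the classic BSD syslog format of RFC 3164, where the <PRI> prefix is facility * 8 + severity (local6 is facility 22 and info is severity 6, giving <182>, which would match the local6.* rule added above). It also assumes that the receiving syslogd accepts a datagram without a timestamp or hostname and listens on UDP port 514. The function names and the "zyxel" tag are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* RFC 3164 numeric codes (assumed); <syslog.h> has pre-shifted equivalents. */
enum { FAC_LOCAL6 = 22, SEV_INFO = 6 };

/* Hypothetical helper: build "<PRI>tag: msg" into out; returns length or -1. */
int format_syslog_datagram(char *out, size_t cap, int facility, int severity,
                           const char *tag, const char *msg)
{
    int pri = facility * 8 + severity;          /* e.g. 22 * 8 + 6 = 182 */
    int n = snprintf(out, cap, "<%d>%s: %s", pri, tag, msg);
    return (n < 0 || (size_t)n >= cap) ? -1 : n; /* -1 if truncated */
}

/* Hypothetical helper: send one datagram to ipv4:port over UDP; 0 on success. */
int send_syslog_udp(const char *ipv4, unsigned short port, const char *datagram)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &dst.sin_addr) != 1)
        return -1;                              /* not a dotted-quad address */

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    ssize_t sent = sendto(fd, datagram, strlen(datagram), 0,
                          (struct sockaddr *)&dst, sizeof dst);
    close(fd);
    return sent < 0 ? -1 : 0;
}
```

For example, calling format_syslog_datagram(buf, sizeof buf, FAC_LOCAL6, SEV_INFO, "zyxel", "test") and then send_syslog_udp("192.0.2.10", 514, buf) in a loop (the address is a placeholder) would, if the assumptions hold, exercise the remote-receive path the same way the router does. Raising the send rate increases the load on the receiving syslogd.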
Thank you for the debug output you sent. I think I may have resolved the problem now: when the problem occurs, the syslogd process is stuck in the 'recvfrom' system call, which it expects either to return data or to be interrupted by an alarm signal. In FC3, system calls are no longer interrupted by alarm signals by default, whereas they were in RH9; in FC3, an additional call is necessary to explicitly request that signals interrupt system calls. I have produced a test version of syslogd that makes the recvfrom system call interruptible by signals. Please download it from:
http://people.redhat.com/~jvdias/sysklogd/FC3-test/x86_64/sysklogd-1.4.1-24.x86_64.rpm
and install it with:
# rpm -e sysklogd-debuginfo
# rpm -Uvh sysklogd-1.4.1-24.x86_64.rpm
This should fix your problem; please test it and let me know.
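The mechanism described above can be illustrated with a small sketch. It is not the actual sysklogd patch, which is not shown in this report. It assumes a SIGALRM handler installed with sigaction() and without SA_RESTART, so that when the alarm fires, a blocked recvfrom() fails with EINTR instead of being restarted. The older siginterrupt(SIGALRM, 1) call has a similar effect for handlers installed with signal(). The names recv_with_alarm() and open_loopback_udp() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: its only job is to make a blocked recvfrom() return. */
static void on_alarm(int sig) { (void)sig; }

/* Hypothetical helper: UDP socket bound to 127.0.0.1 on a kernel-chosen port. */
int open_loopback_udp(void)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: wait at most `seconds` for one datagram on fd.
 * Returns the byte count, or -1 with errno == EINTR if SIGALRM fired first.
 */
ssize_t recv_with_alarm(int fd, char *buf, size_t len, unsigned seconds)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;               /* no SA_RESTART: SIGALRM interrupts recvfrom() */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(seconds);
    ssize_t n = recvfrom(fd, buf, len, 0, NULL, NULL);
    int saved_errno = errno;
    alarm(0);                      /* cancel the alarm if data arrived first */
    sigaction(SIGALRM, &old, NULL);
    errno = saved_errno;
    return n;
}
```

With sa.sa_flags = SA_RESTART instead, the kernel would restart recvfrom() after the handler returns. With no incoming traffic, the call would then keep waiting indefinitely, which resembles the hang described in this report.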
I have installed your rpm, and have had the new syslogd running for 4 hours now and the original problem still has not occurred. I will keep an eye on things, but I think the problem is solved. Thank you!
The new syslogd has been running for a week now without a problem. Issue solved. Thanks!
Just a note that I experienced exactly the same problem (deadlocks on remote syslogging), but with Fedora Core 2's sysklogd package (1.4.1-16). Sadly, this FC2 package has *never* been updated since the original OS was released! Note that if you turn on remote syslogging and have a lot of logging (several messages a second) going to syslogd, then not only will syslogd deadlock, but this will effectively cripple the vast majority of services on the local machine (NFS, ssh, etc.), because an awful lot of them just hang waiting to log to the local syslog and don't seem to time out. I would argue that this is actually quite a serious bug, resulting in a mostly unusable system until you can kill or restart syslogd (which may be impossible, because you can't even fully log in from the console!). I'd like it to be fixed in FC2 if at all possible, since that distro is still getting updates, if only just. It took us a week to track this problem down after many "random hangs". We had turned on remote syslogging just before the hangs started, and realising that this change had triggered the bug finally gave us a clue as to what was wrong. Because of its severity, I think the "normal" and "impact=low" keywords very much understate the importance of fixing this bug. Please fix it and release an FC2 sysklogd RPM before FC2 disappears into the Fedora Legacy project with this nasty bug still present.
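The point above about services hanging while they wait to log can be illustrated by what happens when a datagram receiver stops reading. The sketch below assumes, as is typical on Linux, that the local log socket /dev/log is an AF_UNIX datagram socket. Instead of the real /dev/log, it uses a socketpair() whose receiving end is never read, standing in for a stuck syslogd. A sender that passes MSG_DONTWAIT gets an error once the kernel will queue no more, whereas an ordinary blocking send would simply wait until the reader drains the queue. fill_until_blocked() is a hypothetical name, and how many datagrams fit depends on kernel buffer settings.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep sending small datagrams on a socketpair whose
 * other end is never read. Returns how many were queued before the kernel
 * refused more (or -1 on setup failure); *last_errno receives the errno of
 * the refused send (EAGAIN on Linux).
 */
long fill_until_blocked(int *last_errno)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
        *last_errno = errno;
        return -1;
    }

    const char msg[] = "<13>test: hello";   /* user.notice: 1 * 8 + 5 = 13 */
    long queued = 0;
    for (;;) {
        /* MSG_DONTWAIT: fail instead of blocking once the queue is full. */
        ssize_t n = send(sv[0], msg, sizeof msg - 1, MSG_DONTWAIT);
        if (n < 0) {
            *last_errno = errno;
            break;
        }
        queued++;
    }
    close(sv[0]);
    close(sv[1]);
    return queued;
}
```

Dropping MSG_DONTWAIT from the send() call would make the loop block forever at the point where this version stops. That is the same situation as a process stuck writing to a syslogd that no longer reads /dev/log.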
The sysklogd-1.4.1-26 package that fixes this problem, compiled for FC2, is available for download from:
http://people.redhat.com/~jvdias/sysklogd/FC2
It has been compiled for FC2 on the i386 platform, in the i386/ directory. If you need other platforms, download the src.rpm and rebuild it:
# rpmbuild --rebuild sysklogd-1.4.1-26.src.rpm
then install the resulting packages from /usr/src/redhat/RPMS/${ARCH}. I will try to get this package into FC2 updates. But why not update to FC3 / FC4?
Thanks for the pointer to the fixed FC2 package. As I said, though, there appears to have been an oversight in not feeding this fix back into the FC2 updates, where the vast majority of FC2 users would pick it up, as opposed to the few who might be reading this bug. We've been running FC2 fine (without the remote syslogging :-) ) on several Dell PowerEdge production servers, and we tend to stick to a single OS [with user/kernel updates] for the lifetime of our production machines. Hence, upgrading to FC3 or FC4 to fix the syslogd problem is a somewhat drastic measure (especially since we're already on a 2.6 kernel with FC2). I simply disabled remote syslogging on the affected FC2 server, and it has been stable since. BTW, I checked one of our servers running RHEL 3 Update 4, and it seems to have quite an old sysklogd package (1.4.1-12.3 from June 2004). Does this problem affect syslogging on 2.4 kernels, or only on 2.6? Oh, I should state that we're running x86 systems throughout, not the x86_64 platform the original reporter was running.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-087.html
Just a closing note that I'm pleased that this quite serious problem of remote syslogging causing major system disruption has now been fixed. I must temper that, however, with my disappointment at how long it took to get fixed (almost 6 months after it was reported!) and at the fact that the keyword "impact=low" was clearly way off base. It took so long, in fact, that Fedora Core 2 is now in Legacy, and this quite serious bug may never actually be fixed for FC2 (though the Legacy project might pick it up, as it is now tagged with severity "security" rather than the original, dubious "normal" severity).