140983 – deadlocks when accepting remote messages


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	sysklogd
Sub Component:
Version:	3
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	low
Target Milestone:	---
Assignee:	Jason Vas Dias
QA Contact:
Docs Contact:
URL:
Whiteboard:	impact=low,public=20041127
Depends On:
Blocks:

Reported:	2004-11-27 13:44 UTC by Jan Christiaan van Winkel
Modified:	2007-11-30 22:10 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-05-19 23:19:11 UTC
Type:	---
Embargoed:
Dependent Products:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2005:087	0	low	SHIPPED_LIVE	sysklogd bug fix update	2005-05-19 04:00:00 UTC

Description Jan Christiaan van Winkel 2004-11-27 13:44:42 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0

Description of problem:
When receiving messages from other systems is enabled (by having:
SYSLOGD_OPTIONS="-r -x -m 0"
in /etc/sysconfig/syslog), syslogd will stop accepting messages after
some time, leaving a system that no longer allows logins.
After killing syslogd (not with service syslog stop, but with a hard
kill), a stream of messages flows to the console, and logins are
possible again.  Note the presence of the -x flag, which the manual page
suggests is for preventing deadlocks on machines that also run named
(which is indeed the case on my machine).  The problem appears whether
or not -x is used.  Logins block right at the point where they want to
log the login (i.e. after entering a correct password).


Version-Release number of selected component (if applicable):
sysklogd-1.4.1-22

How reproducible:
Always

Steps to Reproduce:
1. put SYSLOGD_OPTIONS="-r -x -m 0" in /etc/sysconfig/syslog
2. restart syslogd: service syslog restart
3. logins will block after some time -> denial of service
    

Actual Results:  After some normal activity, any message to syslog
will block the sending application, and will not log into the relevant
logfile in /var/log (mostly messages/secure).

Expected Results:  messages ending up in /var/log/whatever and
services that can normally send messages to syslogd

Additional info:

this worked ok in RH9.  The bug is annoying and (for me) security
related because my ADSL router will only log messages through syslog
to a remote host.  It will also lead to a denial of service since
login and su will no longer function (until syslogd is killed).
No special messages are seen in /var/log/messages, and when stracing
syslogd, it is happily waiting in select().  Note that once blocked,
many services will no longer work: login, su, ...

Comment 1 Jason Vas Dias 2004-11-30 15:45:45 UTC

 I had syslogd running with the "-r -x -m 0" options on an FC3 
 machine, which also runs named, and with log messages being
 generated once every 10 seconds from a remote machine, all day
 yesterday without duplicating this problem. 
 Please supply some more information:
 1. Are you sure you have enough space on the /var partition ?
   # df -k /var

 2. How soon after reboot does the problem occur ?

 3. Have you made any modifications to the default /etc/syslog.conf?
    If so, please append your syslog.conf to this bug report. 

 4. Do you actually see messages in the log from the ADSL router
    before the problem occurs, or does it seem to occur when the
    first message from the router arrives?

 5. Did you upgrade from RH9 or do a clean install? 

 6. Is SELinux enabled in enforcing mode ? 
    # getenforce
    If so, does the command:
    # /sbin/restorecon -Rn /var
    report any policy issues ?
    If so, run:
    # /sbin/restorecon -R /var
    to correct them and see if this fixes the problem.

 7. If the problem occurs soon after restart, and you have 
    space on the /var/ partition, please do the following:
  
  a. Install the syslogd debuginfo package, available from:
http://people.redhat.com/~jvdias/sysklogd-debuginfo-1.4.1-22.i386.rpm
     (let me know if you need an RPM for a different architecture)

  b. Stop the sysklogd service and restart syslogd in debug mode:
     # service syslog stop
     # cd /
     # klogd -x &
     # syslogd -r -x -m0 -d >/tmp/syslogd.debug.log 2>&1 &

  c. Wait for the problem to occur. When it does:
     # pid=`pgrep syslogd`
     # gcore -o /tmp/syslogd.core $pid
    
  d. Restore syslogd service:
     # pkill klogd
     # service syslog restart 
    
  e. Then append a compressed tar of the resulting /tmp/syslogd.*
     files to this bug:
    
     # tar -cf - /tmp/syslogd.* | gzip > /tmp/syslogd_bug.tar.gz
      
     Please either upload the resulting /tmp/syslogd_bug.tar.gz file
     into bugzilla or send it by email to: jvdias

Thank you.

Comment 2 Jan Christiaan van Winkel 2004-11-30 20:16:07 UTC

1) Room enough in /var:
df -k /var:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg0-varvol
                      10321208    293584   9503336   3% /var

2) the problem occurs some 15 minutes after reboot.  Not a fixed
amount of time.

3) local change to syslog.conf:
added:
local6.*                                          /var/log/zyxel.log

4) Messages do actually occur in syslog.  But not many - then syslog
blocks

5) This was a clean install of FC3 x86_64 on new hardware

6) Selinux is *not* enabled:
# getenforce
Disabled

7) Will answer question 7 when I get a syslogd that
starts OK (this version won't start).

Comment 3 Jason Vas Dias 2004-12-01 18:52:20 UTC

Thank you for the debug output you sent - I think I may have resolved
the problem now:

When the problem occurs, the syslogd process is stuck in the
'recvfrom' system call, which it expects to either return data
or be interrupted by an alarm signal. In FC3, system calls are
no longer interrupted by alarm signals by default, whereas they
were in RH9; in FC3, another call is necessary to explicitly
require that signals interrupt system calls.

I have produced a test version of syslogd which makes the recvfrom
system call interruptible by signals.
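
For readers unfamiliar with the mechanism, here is a minimal C sketch of the behaviour described above. It is not the sysklogd patch; the function names (on_alarm, install_alarm_handler, recv_with_alarm) are made up for illustration. It assumes a POSIX system: a SIGALRM handler installed through sigaction() without SA_RESTART makes a blocked recvfrom() fail with EINTR, whereas with SA_RESTART the call is silently resumed. siginterrupt(3) is another standard way to get the same effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Empty handler: its only purpose is to have SIGALRM delivered to a handler. */
static void on_alarm(int signo) { (void)signo; }

/*
 * Install the SIGALRM handler (hypothetical helper).
 * restart != 0: SA_RESTART is set, so an interrupted recvfrom() is resumed.
 * restart == 0: SA_RESTART is clear, so recvfrom() fails with EINTR.
 */
static int install_alarm_handler(int restart)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/*
 * Block in recvfrom() on an idle UDP socket with an alarm armed.
 * Returns the errno of the failed recvfrom(), 0 if a datagram arrived,
 * or -1 if the socket or handler could not be set up.
 */
int recv_with_alarm(unsigned seconds)
{
    assert(seconds > 0);            /* alarm(0) would cancel, not arm */

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* any free port; nobody sends to it */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        install_alarm_handler(0) < 0) {
        close(fd);
        return -1;
    }

    char buf[1024];
    alarm(seconds);
    ssize_t n = recvfrom(fd, buf, sizeof buf, 0, NULL, NULL);
    int err = (n < 0) ? errno : 0;  /* save errno before other calls */
    alarm(0);
    close(fd);
    return err;
}
```

Calling recv_with_alarm(1) returns EINTR after about one second. If install_alarm_handler(1) were used instead, the recvfrom() on the idle socket would be restarted after the handler ran and would keep blocking, which matches the stuck state described in this comment.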

Please download it from:
  
http://people.redhat.com/~jvdias/sysklogd/FC3-test/x86_64/sysklogd-1.4.1-24.x86_64.rpm
 
and install it with:
  # rpm -e sysklogd-debuginfo
  # rpm -Uvh  sysklogd-1.4.1-24.x86_64.rpm

This should fix your problem - please test and let me know.

Comment 4 Jan Christiaan van Winkel 2004-12-01 20:51:59 UTC

I have installed your rpm, and have had the new syslogd running for 4
hours  now and the original problem still has not occurred.  I will
keep an eye on things, but I think the problem is solved.  Thank you!

Comment 5 Jan Christiaan van Winkel 2004-12-08 10:54:05 UTC

The new syslogd has been running for a week now without a problem.
Issue solved.  Thanks!

Comment 6 Richard Lloyd 2005-03-01 01:28:01 UTC

Just a note that I experienced exactly the same problem (deadlocks on remote
syslogging), but with Fedora Core 2's sysklogd package (1.4.1-16). Sadly, this
FC2 package has *never* been updated since the original OS was released! Note
that if you turn on remote syslogging and have a lot of logging (several
messages a second) going to syslogd, then not only will syslogd deadlock, but
this will effectively cripple the vast majority of services on the local machine
(NFS, ssh, etc.) because an awful lot of them just hang waiting to log to the
local syslog and don't seem to time out.
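
As a rough illustration of why clients hang, here is a small C sketch. It assumes Linux, where local syslog messages travel over a Unix-domain socket (/dev/log); the function name fill_unread_log_socket and the sample message are invented for the example. A socketpair stands in for the client and a syslogd that never reads. The sender is made non-blocking so the example terminates: once the receiver's queue is full, send() fails with EAGAIN. A blocking sender would simply sleep at that point, with no timeout.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send datagrams to a peer that never reads (a stand-in for a stuck syslogd).
 * Returns how many datagrams were accepted before send() refused, and stores
 * the errno of the refused send() in *err_out. Returns -1 on setup failure.
 */
long fill_unread_log_socket(int *err_out)
{
    int sv[2];
    assert(err_out != NULL);
    *err_out = 0;

    /* sv[0] plays the logging client, sv[1] the daemon that stopped reading. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Non-blocking only so this demo ends; ordinary clients block here. */
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    const char msg[] = "<38>login: session opened";   /* illustrative text */
    long sent = 0;
    while (sent < 1000000) {        /* safety bound, not expected to be hit */
        if (send(sv[0], msg, sizeof msg - 1, 0) < 0) {
            *err_out = errno;       /* queue to the unread peer is full */
            break;
        }
        sent++;
    }

    close(sv[0]);
    close(sv[1]);
    return sent;
}
```

The exact count depends on kernel settings such as the Unix datagram queue length, so the sketch only relies on the count being positive and the final error being EAGAIN.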

I would argue that this is actually quite a serious bug, resulting in a mostly
unusable system until you can kill or restart syslogd (which may be impossible,
because you can't even fully log in from the console!). I'd like it to be fixed
in FC2 if at all possible (since that distro is still - just - getting
updates). It took us a week to track this problem down after many "random hangs"
(we'd turned on remote syslogging just before the hangs started, which finally
gave us a clue as to what was wrong once we realised that this change had
triggered the bug).

Because of its severity, I think "normal" and "impact=low" keywords are very
much understating the importance of fixing this bug - please fix and release an
FC2 sysklogd RPM (before FC2 disappears into the Fedora Legacy project with this
nasty bug still present).

Comment 7 Jason Vas Dias 2005-03-01 16:02:15 UTC

The sysklogd-1.4.1-26 package that fixes this problem, compiled for
FC2, is available for download from:
 http://people.redhat.com/~jvdias/sysklogd/FC2 
It has been compiled for FC2 on the i386 platform in the i386/ 
directory.
Should you require alternate platforms, download the src.rpm and
rebuild:
  # rpmbuild --rebuild sysklogd-1.4.1-26.src.rpm 
and install the resulting packages from /usr/src/redhat/RPMS/${ARCH}.
I will try to get this package into FC2 updates.
But why not update to FC3 / FC4 ?

Comment 8 Richard Lloyd 2005-03-01 22:20:24 UTC

Thanks for the pointer to the fixed FC2 package, although as I said,
there appears to have been an oversight in not feeding this fix back
into the FC2 updates, where the vast majority of FC2 users would pick
up the fix, as opposed to the few who might be reading this bug.

We've been running FC2 fine (without the remote syslogging :-) ) on
several Dell PowerEdge production servers and we tend to stick to
a single OS [with user/kernel updates] during the lifetime of our
production machines. Hence, upgrading to FC3 or FC4 to fix the syslogd
problem is somewhat of a drastic measure (especially since we're
already on a 2.6 kernel with FC2) - I simply disabled remote
syslogging on the affected FC2 server and it's been stable since I did
that.

BTW, I checked one of our servers running RHEL 3 Update 4 and that
seems to have quite an old sysklogd package (1.4.1-12.3 from June
2004) - does this problem affect 2.4 kernel syslogging or is it just
for 2.6? Oh, I should state that we're running x86 systems throughout,
not the x86_64 platform that the original reporter was running.

Comment 9 Tim Powers 2005-05-19 23:19:11 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-087.html

Comment 10 Richard Lloyd 2005-05-20 09:05:36 UTC

Just a closing note that I'm pleased that this quite serious problem of remote
syslogging causing major system disruption has now been fixed. I must temper
that, however, with my disappointment at how long this took to get fixed -
almost 6 months after it was reported! - and the fact that the keyword
"impact=low" was clearly way off base.

It took so long, in fact, that Fedora Core 2 is now in Legacy and this quite
serious bug may actually never be fixed for FC2 (though maybe they might pick it
up as it is tagged with severity "security" rather than the original dubious
"normal" severity).
