Bug 847207

Summary: journald should say when it drops messages when syslog can't keep up with the log stream
Product: [Fedora] Fedora Reporter: Jan Friesse <jfriesse>
Component: systemdAssignee: systemd-maint
Status: CLOSED NEXTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 17CC: abeekhof, asalkeld, fdinitto, johannbg, lnykryn, lpoetter, metherid, msekleta, notting, plautrba, rhel, systemd-maint, theinric, vpavlin
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-10-07 22:44:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
node4-messages.xz - /var/log/messages
none
node4-corosynclog.xz - /var/log/cluster/corosync.log
none
diagnostic patch for libqb
none
/var/log/cluster/corosync.log - with diagnostic patch
none
/var/log/messages - with diagnostic logs
none
Quiet reliable reproducer for syslog problem none

Description Jan Friesse 2012-08-10 05:36:48 UTC
Description of problem:
Some log messages can get lost when logging to syslog. It's very hard to reproduce, and I'm really unsure if problem is libqb, syslog, kernel or even corosync (very small probability).

Take a look to node4-messages.xz (xzipped content of /var/log/messages), at time 14:36:09 (line 13399). As you can see, next line is NOT "Completed service synchronization, ready to provide service".

Now look to file node4-corosynclog.xz (xzipped content of /var/log/cluster/corosync.log) again at time and you can see line 5074 IS "Completed service synchronization, ready to provide service".

So it looks like some messages are LOST.

Rsyslog is set to not apply rate limiting by "$SystemLogRateLimitInterval 0" in rsyslog.conf

Version-Release number of selected component (if applicable):
Corosync git, libqb git, other things are fc17 + updates

How reproducible:
0.1%

Steps to Reproduce:
1. In description
  
Actual results:
Not all messages are logged to syslog.

Expected results:
ALL messages are logged to syslog.

Comment 1 Jan Friesse 2012-08-10 05:38:46 UTC
Created attachment 603422 [details]
node4-messages.xz - /var/log/messages

Comment 2 Jan Friesse 2012-08-10 05:39:22 UTC
Created attachment 603423 [details]
node4-corosynclog.xz - /var/log/cluster/corosync.log

Comment 3 Angus Salkeld 2012-08-13 02:14:52 UTC
libqb just goes through the targets and calls it's write method. So
if you are getting the file message then libqb is calling syslog().
Unfortunately syslog() has a void return so I can't check for failed
log message. I suspect the kernel buffer is full (/dev/log) and it's just
getting dropped. Is this happening with threaded or non-threaded logging?

Comment 4 Angus Salkeld 2012-08-13 03:37:20 UTC
Created attachment 603845 [details]
diagnostic patch for libqb

Hi Honza

Can you apply the attached patch to libqb and re-run?

It should log to syslog on shutdown the number of logs written
by each target.

Like this:
Aug 13 13:26:15 elf corosync[19489]: corosync:[0] wrote 35 messages
Aug 13 13:26:15 elf corosync[19489]: corosync:stderr[1] wrote 2 messages
Aug 13 13:26:15 elf corosync[19489]: corosync:/var/log/cluster/corosync.log[4] wrote 35 messages

Note: "corosync:[0]" is syslog.

So if libqb counts the number of messages sent to syslog == number sent
to file then the problem is syslog else it is probably a libqb bug.

-Angus

Comment 5 Jan Friesse 2012-08-13 06:48:58 UTC
(In reply to comment #4)
> Created attachment 603845 [details]
> diagnostic patch for libqb
> 
> Hi Honza
> 
> Can you apply the attached patch to libqb and re-run?
> 
> It should log to syslog on shutdown the number of logs written
> by each target.
> 
> Like this:
> Aug 13 13:26:15 elf corosync[19489]: corosync:[0] wrote 35 messages
> Aug 13 13:26:15 elf corosync[19489]: corosync:stderr[1] wrote 2 messages
> Aug 13 13:26:15 elf corosync[19489]:
> corosync:/var/log/cluster/corosync.log[4] wrote 35 messages
> 
> Note: "corosync:[0]" is syslog.
> 
> So if libqb counts the number of messages sent to syslog == number sent
> to file then the problem is syslog else it is probably a libqb bug.
> 
> -Angus

Angus,
I will try patch and let you know when CTS will hit bug (I'm not able to reproduce it locally, and it happens really from time to time - like 0.5 times per whole CTS -> at least 103 starts + shut downs multiply 4 nodes).

And this is happening WITHOUT threaded mode enabled (as current master is, I didn't yet applied patch to make some targets threaded).

Comment 6 Jan Friesse 2012-08-17 07:49:31 UTC
Created attachment 605107 [details]
/var/log/cluster/corosync.log - with diagnostic patch

Comment 7 Jan Friesse 2012-08-17 07:50:08 UTC
Created attachment 605108 [details]
/var/log/messages - with diagnostic logs

Comment 8 Jan Friesse 2012-08-17 07:53:58 UTC
Finally able to reproduce again. As can be seen, 22:14:13 at corosync.log contains "main.c:273 Completed service synchronization, ready to provide s
ervice." but /var/log/messages doesn't. On exit, libqb says:

corosync:[0] wrote 91 messages
corosync:/var/log/cluster/corosync.log[4] wrote 91 messages

So probably it's not libqb problem (and also not corosync one). I will try busybox syslogd (which seems to be simple/stupid enough to simply work) and post results.

Comment 9 Jan Friesse 2012-08-20 08:01:28 UTC
Ok,
so I've tried busybox syslogd and result was the same, some messages was missing. It's for sure not problem in libqb. So changing component to rsyslog (even it's probably NOT rsyslogd bug, but somewhere bellow -> glibc, kernel, ..., but I'm pretty sure rsyslogd maintainer is more competent to find out good component then I'm).

Comment 10 Jan Friesse 2012-08-20 08:14:11 UTC
Created attachment 605595 [details]
Quiet reliable reproducer for syslog problem

This is quiet reliable reproducer for syslog problem. It DOESN'T use libqb, but only pure openlog/syslog/closelog. It creates 20 childs and each of child write 10000 messages in form

session=SESSION_ID, int_pid=INTERNAL_PID, msg_no=MESSAGE_NUMBER, rand string=RAND_STRING

where SESSION_ID is random number generated at start of program and written to stdout at start (so it's easy to grep). INTERNAL_PID is internal child pid and can be <0, 19>. MESSAGE_NUMBER is self explaining and can be <0, 9999> (numbers are increasing and ordered). RAND_STRING is random string of characters from ASCII table starting from character '0'

So usage is following:
1.) gcc test.c
2.) ./a.out
result is "session=SESSION_ID started with 20 forks and 10000 msgs = 200000"
3.) when program finishes, it's time to find out HOW many messages was written, so:
cat /var/log/messages  | grep 'syslogbug.*session=SESSION_ID,' | wc -l

so concrete example:
./a.out
session=1175375171 started with 20 forks and 10000 msgs = 200000
[root@cts-node1 test]# cat /var/log/messages  | grep 'syslogbug.*session=1175375171,' | wc -l
199952

There should be 200000 lines!

Also it's important to say that test.c is really only reproducer. SAME behavior happens even when less messages are logged (also with bigger pauses between messages) and also in single threaded/single process apps.

And just for sure, please note that $SystemLogRateLimitInterval 0 is set in rsyslog.conf. It's fc17 + updates from Fri.

Comment 11 Jan Friesse 2012-08-20 14:30:45 UTC
Just a note, fc16 + updates is NOT affected.

Comment 12 Tomas Heinrich 2012-08-20 15:03:53 UTC
I've tried to reproduce this with
rsyslog-5.8.10-2.fc17.x86_64
Unsuccessfully at first. I had to decrease the delay between messages from 10 ms to 1 ms.
This alone didn't help either, but something came up when running the test in a loop:

# while sleep 1; do date && ./a.out |& tee /tmp/sess-id && \
grep -c "$(sed -e 's/\(session=[^ ]*\) .*/\1/' /tmp/sess-id)" /var/log/messages; done

First several runs completed with 200002 messages but then some losses started to appear.
I've tried tweaking some directives and this one had the most impact:

$MainMsgQueueSize 200000

My guess is that this is purely a performance issue: the main queue gradually fills up and messages that can't be queued are dropped. There's not much to do about this besides making the processes block or enlarging the queues.

Please bear in mind that on f17, there stands journald inbetween rsyslog and the process attempting to log a message:
/etc/rsyslog.d/listen.conf: $SystemLogSocketName /run/systemd/journal/syslog

Comment 13 Jan Friesse 2012-08-20 15:29:24 UTC
One weird note. FC17 + ext2 is NOT affected. FC17 + ext4 is affected.

Comment 14 Jan Friesse 2012-08-20 15:34:19 UTC
(In reply to comment #12)
> I've tried to reproduce this with
> rsyslog-5.8.10-2.fc17.x86_64
> Unsuccessfully at first. I had to decrease the delay between messages from
> 10 ms to 1 ms.
> This alone didn't help either, but something came up when running the test
> in a loop:
> 
> # while sleep 1; do date && ./a.out |& tee /tmp/sess-id && \
> grep -c "$(sed -e 's/\(session=[^ ]*\) .*/\1/' /tmp/sess-id)"
> /var/log/messages; done
> 
> First several runs completed with 200002 messages but then some losses
> started to appear.
> I've tried tweaking some directives and this one had the most impact:
> 
> $MainMsgQueueSize 200000
> 
> My guess is that this is purely a performance issue: the main queue
> gradually fills up and messages that can't be queued are dropped. There's
> not much to do about this besides making the processes block or enlarging
> the queues.
> 

I believe this is expected. I mean, syslog is api which returns void. In other words, there is NO way how to find out if log message was logger correctly or not. In such case I would expect such API cannot ever fall. If it falls (like no space on disk, ...) there should be log ether on console, dmesg, ...

Also why RHEL 6 isn't affected?

> Please bear in mind that on f17, there stands journald inbetween rsyslog and
> the process attempting to log a message:
> /etc/rsyslog.d/listen.conf: $SystemLogSocketName /run/systemd/journal/syslog

Ya, if you feel it's problem in systemd, please reassign to journald ... (as I noted in comment, busybox syslogd is ALSO affected, so it's probably not problem of rsyslog, but you should be able to identify correct component better then I'm)

Comment 15 Jan Friesse 2012-08-20 15:37:27 UTC
> I've tried tweaking some directives and this one had the most impact:
> 
> $MainMsgQueueSize 200000

This really doesn't solve problem for me. Only thing which really does (WEIRD) is ext2.

Comment 16 Jan Friesse 2012-08-21 08:49:09 UTC
So I've spent a little more time testing issue, and found following:
Command:

for i in `seq 1 5`;do ses_id=`./a.out 2>&1 | cut -d ' ' -f 1 | cut -d  '=' -f 2`; cat /var/log/messages  | grep 'syslogbug.*session='$ses_id',' | wc -l;done

systemd-journald.conf unchanged

ext4:
199996
199998
199997
199997
199999

ext3:
200000
200000
200000
200000
200000

ext2:
200000
200000
199996
200000
200000

After "tweaking" systemd-journald.conf:
RuntimeMaxUse=10
RuntimeMaxFileSize=100K

ext4:
200000
200000
195150
197479
199973

In other words, problem seems to be not fs specific and really looks like systemd-journald problem.

Comment 17 Tomas Heinrich 2012-08-21 12:25:11 UTC
> I believe this is expected. I mean, syslog is api which returns void. In
> other words, there is NO way how to find out if log message was logger
> correctly or not. In such case I would expect such API cannot ever fall.

I don't believe this is the case. There are many ways in which it can fail.

When applications are logging directly to rsyslog, the daemon can be configured to either make the applications block when attempting to write to the socket or drop the excessive messages. The first approach ensures messages get at least to the daemon, but this can also cause system hangs if the daemon gets overrun or stalls for some other reason.
Things get even more complicated with journald involved.

> If it falls (like no space on disk, ...) there should be log ether on console,
> dmesg, ...

If the socket itself fills and messages on the sending side are dropped because of that, rsyslog has no way of detecting it.

You can try to compare logs from rsyslog and journal [1] to se if there are some discrepancies.

> Ya, if you feel it's problem in systemd, please reassign to journald

Feel free to do so.

[1] http://www.freedesktop.org/software/systemd/man/systemd-journald.service.html

Comment 18 Jan Friesse 2012-08-21 12:53:39 UTC
(In reply to comment #17)
> > I believe this is expected. I mean, syslog is api which returns void. In
> > other words, there is NO way how to find out if log message was logger
> > correctly or not. In such case I would expect such API cannot ever fall.
> 
> I don't believe this is the case. There are many ways in which it can fail.
> 
> When applications are logging directly to rsyslog, the daemon can be
> configured to either make the applications block when attempting to write to
> the socket or drop the excessive messages.

What is the configuration option for that? I would like to see how it's implemented, because I've didn't know that it's possible to make socket non-blocking on both side when one side wants blocking.

 The first approach ensures
> messages get at least to the daemon, but this can also cause system hangs if
> the daemon gets overrun or stalls for some other reason.

I hope blocking behavior is default.

> Things get even more complicated with journald involved.
> 
> > If it falls (like no space on disk, ...) there should be log ether on console,
> > dmesg, ...
> 
> If the socket itself fills and messages on the sending side are dropped
> because of that, rsyslog has no way of detecting it.
> 

But I'm not using any non-blocking API. So if socket (/dev/log) buffer is full, application simply blocks. So whoever created /dev/log don't need to detect that.

> You can try to compare logs from rsyslog and journal [1] to se if there are
> some discrepancies.
> 
> > Ya, if you feel it's problem in systemd, please reassign to journald
> 
> Feel free to do so.

Will do.

> 
> [1]
> http://www.freedesktop.org/software/systemd/man/systemd-journald.service.html

Comment 19 Lennart Poettering 2012-08-23 02:22:22 UTC
When journald forwards messages to syslog and syslog is stuck it will not wait for the syslog daemon.

Comment 20 Jan Friesse 2012-08-23 05:58:31 UTC
(In reply to comment #19)
> When journald forwards messages to syslog and syslog is stuck it will not
> wait for the syslog daemon.

So what you wanted to say is, that there is no way how to reliably log message in non-failure environment?

Comment 21 Andrew Beekhof 2012-08-29 07:22:25 UTC
(Different part of the cluster stack here) 

Personally I'm fine with messages being lost/discarded[1], as long as it doesn't happen silently.

Basically, I'd like to see the existing rsyslog behaviour preserved by journald.
The pattern we currently look for is:
  "rsyslogd.* imuxsock lost .* messages from pid .* due to rate-limiting",

I think its important that journald log something similar.


[1] Pacemaker no longer sends info/debug/trace logs to syslog anyway, so we rarely get rate-limited.  If I /really/ want every single log, I write directly to a logfile.

Comment 22 Lennart Poettering 2012-09-14 13:43:25 UTC
So, here's what I propose:

a) we should give rsyslog more room to catchup with log messages without having to freeze journald for that too. I filed https://bugzilla.redhat.com/show_bug.cgi?id=857385 requesting a new sockopt so that we can queue much more messages in the syslog socket before we start droppping things.

b) I'll change journald to count how many messages we drop and write a message about it as first thing into syslog when syslog takes messages again.

Both together should make delivery much more reliable while still making things robust and introspectable.

Comment 23 Lennart Poettering 2012-09-17 22:01:30 UTC
b) is now fixed in git, and soon in F18.

Comment 24 Fedora Update System 2012-09-20 19:54:52 UTC
systemd-190-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/systemd-190-1.fc18

Comment 25 Fedora Update System 2012-09-22 06:36:01 UTC
Package systemd-191-2.fc18, rtkit-0.11-3.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-191-2.fc18 rtkit-0.11-3.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-14581/rtkit-0.11-3.fc18,systemd-191-2.fc18
then log in and leave karma (feedback).

Comment 26 Fedora Update System 2012-09-28 00:16:28 UTC
Package glibc-2.16-17.fc18, systemd-192-1.fc18, selinux-policy-3.11.1-23.fc18, rtkit-0.11-3.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing glibc-2.16-17.fc18 systemd-192-1.fc18 selinux-policy-3.11.1-23.fc18 rtkit-0.11-3.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-14581/selinux-policy-3.11.1-23.fc18,rtkit-0.11-3.fc18,systemd-192-1.fc18,glibc-2.16-17.fc18
then log in and leave karma (feedback).

Comment 27 Fedora Update System 2012-10-01 20:08:22 UTC
Package glibc-2.16-17.fc18, rtkit-0.11-3.fc18, systemd-193-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing glibc-2.16-17.fc18 rtkit-0.11-3.fc18 systemd-193-1.fc18'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-14581/rtkit-0.11-3.fc18,systemd-193-1.fc18,glibc-2.16-17.fc18
then log in and leave karma (feedback).