Bug 1216957

Summary: rsyslog restart pulls lots of older log entries again, runs into rate-limiting
Product: Red Hat Enterprise Linux 7 Reporter: Gerd v. Egidy <gerd>
Component: rsyslogAssignee: Peter Vrabec <pvrabec>
Status: CLOSED ERRATA QA Contact: Stefan Dordevic <sdordevi>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.1CC: aladd, asavkov, bugzilla, chorn, fj-lsoft-oss, jburke, jhunt, jstancek, jvymazal, mmatsuya, pvrabec, rsroka, santony, sdordevi
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-04 05:35:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1201318, 1203710, 1295396, 1296594, 1313485, 1364088    

Description Gerd v. Egidy 2015-04-29 09:56:19 UTC
Description of problem:

I use rsyslog to have logs written to the classic /var/log/messages and .../maillog etc. files.

Sometimes when I restart rsyslog ("systemctl restart rsyslog.service"), the file /var/lib/rsyslog/imjournal.state is not updated. 

When the new rsyslog process starts, it then reads lots of older log entries from the systemd-journal and thinks these are new and should be logged. Because there are so many entries, rate limiting is activated and I get something like this in the rsyslog:

rsyslogd-2177: imjournal: begin to drop messages due to rate-limiting
rsyslogd-2177: imjournal: 82074 messages lost due to rate-limiting

Version-Release number of selected component (if applicable):

rsyslog-7.4.7-7

How reproducible:

seldom. I haven't found a way to trigger it yet. I've seen this on two servers of many. When the problem is occuring, I have to restart rsyslog two or three times till it works again.

Steps to Reproduce:
1. stat /var/lib/rsyslog/imjournal.state
2. systemctl restart rsyslog.service
3. stat /var/lib/rsyslog/imjournal.state

Actual results:

stat shows the same old modify time as in the first run

Expected results:

stat shows the current time as modify time

Comment 2 Gerd v. Egidy 2015-05-20 08:32:41 UTC
Since these logging problems kept appearing again and again, I recompiled the rsyslog-8.8.0-4 rpm from Fedora and upgraded the machines to that. I had to recompile some dependencies too and adapt the logrotate-config.

That was about 2 weeks ago and the problem is gone since then.

So to me it seems like there is a bug in the rsyslog version used in RHEL 7 that is fixed upstream.

Comment 3 Tomas Heinrich 2015-06-03 13:59:44 UTC
Hello Gerd,

I don't recall any upstream fixes dealing with the state file that could affect this.
The state file is written in intervals and during shutdown. Is it possible that the daemon crashed or some else unexpected event happened that prevented it from writing the state file and that's what was fixed in the v8 version?
Or could it have been a sheer luck that the issue disappeared?

Comment 4 Gerd v. Egidy 2015-06-03 14:18:39 UTC
Hello Tomas,

thanks for your answer.

When the problem was occuring, the rsyslogd was running fine before, but the state file wasn't written for a longer time (don't remember the exact time though). On shutdown the state file wasn't updated too.

Seems like a bug in the writeout of the state file to me.

The problem didn't appear once since I updated to 8.8.0 while it appeared at least once a week before. So I don't think it is sheer luck.

Comment 5 Aaron Howell 2015-08-28 01:04:06 UTC
We are experiencing this issue exactly as described - however we are not in a position to grab packages from outside of Redhat's channels. Making a note here as the workaround the OP uses is not workable for us and a solution actually needs to be applied to the standard RHEL7 packages

Comment 6 Gerd v. Egidy 2015-08-28 07:11:23 UTC
Thanks for your report Aaron, seems like I was not chasing ghosts. I'd very much prefer a solution with the stock RHEL packages too. Using the Fedora packages was the lesser evil than having unreliable logging to me.

Comment 9 Jan Stancek 2016-01-05 14:58:15 UTC
I'm seeing this problem too occasionally with syslog01 LTP testcase. There appear to be 2 parts to problem I see:
1. imjournal.state file is sporadically not updated on rsyslog restart
2. imjournal.state file is not updated every "PersistStateInterval" messages while rsyslog is running (by default this is 10) [1]

Problem 2) could be fixed by following commit (not tested yet):
https://github.com/rsyslog/rsyslog/commit/64af99a04c36c629f32e11305ddffd7797c5b6ba

[1] http://www.rsyslog.com/doc/v8-stable/configuration/modules/imjournal.html

Comment 10 Jan Stancek 2016-01-06 09:51:04 UTC
(In reply to Jan Stancek from comment #9)
> I'm seeing this problem too occasionally with syslog01 LTP testcase. There
> appear to be 2 parts to problem I see:
> 1. imjournal.state file is sporadically not updated on rsyslog restart
> 2. imjournal.state file is not updated every "PersistStateInterval" messages
> while rsyslog is running (by default this is 10) [1]
> 
> Problem 2) could be fixed by following commit (not tested yet):
> https://github.com/rsyslog/rsyslog/commit/
> 64af99a04c36c629f32e11305ddffd7797c5b6ba

After applying patch above, I see that change time of imjournal.state is updated, I can also trigger update by at any time by:
  for i in `seq 1 15`; do echo $i | logger; done

Comment 11 Tomas Heinrich 2016-01-21 14:26:41 UTC
(In reply to Jan Stancek from comment #10)
> (In reply to Jan Stancek from comment #9)
> > I'm seeing this problem too occasionally with syslog01 LTP testcase. There
> > appear to be 2 parts to problem I see:
> > 1. imjournal.state file is sporadically not updated on rsyslog restart
> > 2. imjournal.state file is not updated every "PersistStateInterval" messages
> > while rsyslog is running (by default this is 10) [1]
> > 
> > Problem 2) could be fixed by following commit (not tested yet):
> > https://github.com/rsyslog/rsyslog/commit/
> > 64af99a04c36c629f32e11305ddffd7797c5b6ba
> 
> After applying patch above, I see that change time of imjournal.state is
> updated, I can also trigger update by at any time by:
>   for i in `seq 1 15`; do echo $i | logger; done

Wow, what a stupid bug.
Let's fix the second issue and see how much things improve.

The first issue might be caused by systemd killing rsyslog before it shuts down for whatever reason.
Could you perhaps enable persistent logs for journald and look for lines such as

> rsyslog.service stop-sigterm timed out. Killing.
> rsyslog.service: main process exited, code=killed, status=9/KILL
> Job rsyslog.service/stop finished, result=done
> Stopped System Logging Service.
> Unit rsyslog.service entered failed state.

in the output of journalctl?

Comment 12 Jan Stancek 2016-01-22 09:43:27 UTC
(In reply to Tomas Heinrich from comment #11)
> The first issue might be caused by systemd killing rsyslog before it shuts
> down for whatever reason.
> Could you perhaps enable persistent logs for journald and look for lines
> such as
> 
> > rsyslog.service stop-sigterm timed out. Killing.
> > rsyslog.service: main process exited, code=killed, status=9/KILL
> > Job rsyslog.service/stop finished, result=done
> > Stopped System Logging Service.
> > Unit rsyslog.service entered failed state.
> 
> in the output of journalctl?

I don't have a way to reproduce this on-demand. Traces from test however suggest that it is not a timeout. systemctl restart took ~0.2 secs in bad run:
(this is with 7.2GA distro)

+ date +%s.%N
1453398421.659443787

+ stat /var/lib/rsyslog/imjournal.state
  File: '/var/lib/rsyslog/imjournal.state'
  Size: 120       	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 102588397   Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:syslogd_var_lib_t:s0
Access: 2016-01-21 15:44:04.029662259 -0500
Modify: 2016-01-21 10:43:56.008417388 -0500
Change: 2016-01-21 10:43:56.008417388 -0500
 Birth: -

+ ls -la /var/lib/rsyslog/
total 8
drwx------.  2 root root   28 Jan 21 10:43 .
drwxr-xr-x. 35 root root 4096 Jan 21 11:10 ..
-rw-------.  1 root root  120 Jan 21 10:43 imjournal.state

+ restart_syslog_daemon
+ '[' 0 -eq 0 ']'
+ cleanup_command='cleanup 1'
+ tst_resm TINFO 'restarting syslog daemon'
syslog01    0  TINFO  :  restarting syslog daemon
+ '[' -n rsyslog ']'
+ restart_daemon rsyslog
+ '[' 1 -eq 1 ']'

+ systemctl restart rsyslog.service
+ '[' 0 -eq 0 ']'
+ sleep 2

+ date +%s.%N
1453398423.873026284

+ stat /var/lib/rsyslog/imjournal.state
  File: '/var/lib/rsyslog/imjournal.state'
  Size: 120       	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 102588397   Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:syslogd_var_lib_t:s0
Access: 2016-01-21 15:44:04.029662259 -0500
Modify: 2016-01-21 10:43:56.008417388 -0500
Change: 2016-01-21 10:43:56.008417388 -0500


2016-01-21T12:47:02.300094-05:00 dhcp180-13 rsyslogd-2177: imjournal: begin to drop messages due to rate-limiting
+ systemctl -l -n 100 status rsyslog
* rsyslog.service - System Logging Service
   Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2016-01-21 12:47:01 EST; 9s ago
 Main PID: 17790 (rsyslogd)
   CGroup: /system.slice/rsyslog.service
           `-17790 /usr/sbin/rsyslogd -n

Jan 21 12:47:01 dhcp180-13.ml3.eng.bos.redhat.com systemd[1]: Starting System Logging Service...
Jan 21 12:47:01 dhcp180-13.ml3.eng.bos.redhat.com systemd[1]: Started System Logging Service.

Comment 14 Jan Stancek 2016-03-02 17:51:22 UTC
This looks like race between imjournal thread exiting on its own and main thread trying to terminate it. If imjournal thread manages to exit before main thread runs thrdDestruct, then persistJournalState() never runs. imjournal thread completes its while loop and code continues at thrdStarter(), which sets bIsActive to 0. Now comes thrdDestruct, which however will not run thrdTerminate() and therefore pAfterRun() of imjournal also never runs.

-> SIGTERM
  doDie()
    bFinished = sig

  main thread                              imjournal thread
----------------------------------------+--------------------------------------
realMain()                              |
  die(bFinished)                        |
    glbl.SetGlobalInputTermination()    |     CODESTARTrunInput
  thrdExit                              |       thrdStarter()
    llDestroy(&llThrds)                 |         pThis->bIsActive = 0;
      thrdDestruct()                    |
        if(pThis->bIsActive == 1)       |
          thrdTerminate(pThis);         |
            pThis->pAfterRun(pThis);    |
              persistJournalState()     |


This can be easily reproduced if you add small delay to thrdDestruct, to let imjournal thread exit on its own:

--- a/threads.c
+++ b/threads.c
@@ -76,8 +76,12 @@ static rsRetVal thrdDestruct(thrdInfo_t *pThis)
        DEFiRet;
        assert(pThis != NULL);

+       sleep(1);
+       printf("thrdDestruct: %p %s\n", pThis, pThis->name);
        if(pThis->bIsActive == 1) {
                thrdTerminate(pThis);
+       } else {
+               printf("thrdDestruct: %p %s is not active\n", pThis, pThis->name);
        }
        pthread_mutex_destroy(&pThis->mutThrd);
        pthread_cond_destroy(&pThis->condThrdTerm);


# ls -la /var/lib/rsyslog/imjournal.state
-rw-------. 1 root root 123 Mar  2 12:33 /var/lib/rsyslog/imjournal.state
# date
Wed Mar  2 12:42:05 EST 2016
# systemctl restart rsyslog
# ls -la /var/lib/rsyslog/imjournal.state
-rw-------. 1 root root 123 Mar  2 12:33 /var/lib/rsyslog/imjournal.state


Mar 02 12:42:11 vm systemd[1]: Stopping System Logging Service...
Mar 02 12:42:13 vm rsyslogd[19954]: thrdDestruct: 0x7f0cbb3c9ff0 imuxsock
Mar 02 12:42:13 vm rsyslogd[19954]: thrdDestruct: 0x7f0cbb3c9ff0 imuxsock is not active
Mar 02 12:42:13 vm rsyslogd[19954]: thrdDestruct: 0x7f0cbb3ca350 imjournal
Mar 02 12:42:13 vm rsyslogd[19954]: thrdDestruct: 0x7f0cbb3ca350 imjournal is not active
Mar 02 12:42:13 vm systemd[1]: Starting System Logging Service...
Mar 02 12:42:13 vm systemd[1]: Started System Logging Service.

Possible fix could be to either call persistJournalState() after while loop, or turn while loop into "while (1)" and rely on thrdTerminate() to always run and call pAfterRun().

Comment 15 Tomas Heinrich 2016-03-15 15:44:01 UTC
This issue affects multiple plugins. So far it seems to manifest only in imjournal.

Filed upstream: https://github.com/rsyslog/rsyslog/issues/882

Comment 16 Jan Stancek 2016-04-25 07:05:39 UTC
(In reply to Tomas Heinrich from comment #15)
> This issue affects multiple plugins. So far it seems to manifest only in
> imjournal.
> 
> Filed upstream: https://github.com/rsyslog/rsyslog/issues/882

https://github.com/rsyslog/rsyslog/commit/8a10940b5db901ec74e2d3ea7dd796076c1ae05d

Comment 19 Tomas Heinrich 2016-07-11 00:42:54 UTC
*** Bug 1204004 has been marked as a duplicate of this bug. ***

Comment 29 errata-xmlrpc 2016-11-04 05:35:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2401.html