Bug 1216957
Summary: | rsyslog restart pulls lots of older log entries again, runs into rate-limiting | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Gerd v. Egidy <gerd> |
Component: | rsyslog | Assignee: | Peter Vrabec <pvrabec> |
Status: | CLOSED ERRATA | QA Contact: | Stefan Dordevic <sdordevi> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.1 | CC: | aladd, asavkov, bugzilla, chorn, fj-lsoft-oss, jburke, jhunt, jstancek, jvymazal, mmatsuya, pvrabec, rsroka, santony, sdordevi |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-11-04 05:35:19 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1201318, 1203710, 1295396, 1296594, 1313485, 1364088 |
Description
Gerd v. Egidy
2015-04-29 09:56:19 UTC
Since these logging problems kept appearing again and again, I recompiled the rsyslog-8.8.0-4 rpm from Fedora and upgraded the machines to that. I had to recompile some dependencies too and adapt the logrotate-config. That was about 2 weeks ago and the problem is gone since then. So to me it seems like there is a bug in the rsyslog version used in RHEL 7 that is fixed upstream. Hello Gerd, I don't recall any upstream fixes dealing with the state file that could affect this. The state file is written in intervals and during shutdown. Is it possible that the daemon crashed or some else unexpected event happened that prevented it from writing the state file and that's what was fixed in the v8 version? Or could it have been a sheer luck that the issue disappeared? Hello Tomas, thanks for your answer. When the problem was occuring, the rsyslogd was running fine before, but the state file wasn't written for a longer time (don't remember the exact time though). On shutdown the state file wasn't updated too. Seems like a bug in the writeout of the state file to me. The problem didn't appear once since I updated to 8.8.0 while it appeared at least once a week before. So I don't think it is sheer luck. We are experiencing this issue exactly as described - however we are not in a position to grab packages from outside of Redhat's channels. Making a note here as the workaround the OP uses is not workable for us and a solution actually needs to be applied to the standard RHEL7 packages Thanks for your report Aaron, seems like I was not chasing ghosts. I'd very much prefer a solution with the stock RHEL packages too. Using the Fedora packages was the lesser evil than having unreliable logging to me. I'm seeing this problem too occasionally with syslog01 LTP testcase. There appear to be 2 parts to problem I see: 1. imjournal.state file is sporadically not updated on rsyslog restart 2. imjournal.state file is not updated every "PersistStateInterval" messages while rsyslog is running (by default this is 10) [1] Problem 2) could be fixed by following commit (not tested yet): https://github.com/rsyslog/rsyslog/commit/64af99a04c36c629f32e11305ddffd7797c5b6ba [1] http://www.rsyslog.com/doc/v8-stable/configuration/modules/imjournal.html (In reply to Jan Stancek from comment #9) > I'm seeing this problem too occasionally with syslog01 LTP testcase. There > appear to be 2 parts to problem I see: > 1. imjournal.state file is sporadically not updated on rsyslog restart > 2. imjournal.state file is not updated every "PersistStateInterval" messages > while rsyslog is running (by default this is 10) [1] > > Problem 2) could be fixed by following commit (not tested yet): > https://github.com/rsyslog/rsyslog/commit/ > 64af99a04c36c629f32e11305ddffd7797c5b6ba After applying patch above, I see that change time of imjournal.state is updated, I can also trigger update by at any time by: for i in `seq 1 15`; do echo $i | logger; done (In reply to Jan Stancek from comment #10) > (In reply to Jan Stancek from comment #9) > > I'm seeing this problem too occasionally with syslog01 LTP testcase. There > > appear to be 2 parts to problem I see: > > 1. imjournal.state file is sporadically not updated on rsyslog restart > > 2. imjournal.state file is not updated every "PersistStateInterval" messages > > while rsyslog is running (by default this is 10) [1] > > > > Problem 2) could be fixed by following commit (not tested yet): > > https://github.com/rsyslog/rsyslog/commit/ > > 64af99a04c36c629f32e11305ddffd7797c5b6ba > > After applying patch above, I see that change time of imjournal.state is > updated, I can also trigger update by at any time by: > for i in `seq 1 15`; do echo $i | logger; done Wow, what a stupid bug. Let's fix the second issue and see how much things improve. The first issue might be caused by systemd killing rsyslog before it shuts down for whatever reason. Could you perhaps enable persistent logs for journald and look for lines such as > rsyslog.service stop-sigterm timed out. Killing. > rsyslog.service: main process exited, code=killed, status=9/KILL > Job rsyslog.service/stop finished, result=done > Stopped System Logging Service. > Unit rsyslog.service entered failed state. in the output of journalctl? (In reply to Tomas Heinrich from comment #11) > The first issue might be caused by systemd killing rsyslog before it shuts > down for whatever reason. > Could you perhaps enable persistent logs for journald and look for lines > such as > > > rsyslog.service stop-sigterm timed out. Killing. > > rsyslog.service: main process exited, code=killed, status=9/KILL > > Job rsyslog.service/stop finished, result=done > > Stopped System Logging Service. > > Unit rsyslog.service entered failed state. > > in the output of journalctl? I don't have a way to reproduce this on-demand. Traces from test however suggest that it is not a timeout. systemctl restart took ~0.2 secs in bad run: (this is with 7.2GA distro) + date +%s.%N 1453398421.659443787 + stat /var/lib/rsyslog/imjournal.state File: '/var/lib/rsyslog/imjournal.state' Size: 120 Blocks: 8 IO Block: 4096 regular file Device: fd00h/64768d Inode: 102588397 Links: 1 Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root) Context: system_u:object_r:syslogd_var_lib_t:s0 Access: 2016-01-21 15:44:04.029662259 -0500 Modify: 2016-01-21 10:43:56.008417388 -0500 Change: 2016-01-21 10:43:56.008417388 -0500 Birth: - + ls -la /var/lib/rsyslog/ total 8 drwx------. 2 root root 28 Jan 21 10:43 . drwxr-xr-x. 35 root root 4096 Jan 21 11:10 .. -rw-------. 1 root root 120 Jan 21 10:43 imjournal.state + restart_syslog_daemon + '[' 0 -eq 0 ']' + cleanup_command='cleanup 1' + tst_resm TINFO 'restarting syslog daemon' syslog01 0 TINFO : restarting syslog daemon + '[' -n rsyslog ']' + restart_daemon rsyslog + '[' 1 -eq 1 ']' + systemctl restart rsyslog.service + '[' 0 -eq 0 ']' + sleep 2 + date +%s.%N 1453398423.873026284 + stat /var/lib/rsyslog/imjournal.state File: '/var/lib/rsyslog/imjournal.state' Size: 120 Blocks: 8 IO Block: 4096 regular file Device: fd00h/64768d Inode: 102588397 Links: 1 Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root) Context: system_u:object_r:syslogd_var_lib_t:s0 Access: 2016-01-21 15:44:04.029662259 -0500 Modify: 2016-01-21 10:43:56.008417388 -0500 Change: 2016-01-21 10:43:56.008417388 -0500 2016-01-21T12:47:02.300094-05:00 dhcp180-13 rsyslogd-2177: imjournal: begin to drop messages due to rate-limiting + systemctl -l -n 100 status rsyslog * rsyslog.service - System Logging Service Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled) Active: active (running) since Thu 2016-01-21 12:47:01 EST; 9s ago Main PID: 17790 (rsyslogd) CGroup: /system.slice/rsyslog.service `-17790 /usr/sbin/rsyslogd -n Jan 21 12:47:01 dhcp180-13.ml3.eng.bos.redhat.com systemd[1]: Starting System Logging Service... Jan 21 12:47:01 dhcp180-13.ml3.eng.bos.redhat.com systemd[1]: Started System Logging Service. This looks like race between imjournal thread exiting on its own and main thread trying to terminate it. If imjournal thread manages to exit before main thread runs thrdDestruct, then persistJournalState() never runs. imjournal thread completes its while loop and code continues at thrdStarter(), which sets bIsActive to 0. Now comes thrdDestruct, which however will not run thrdTerminate() and therefore pAfterRun() of imjournal also never runs. -> SIGTERM doDie() bFinished = sig main thread imjournal thread ----------------------------------------+-------------------------------------- realMain() | die(bFinished) | glbl.SetGlobalInputTermination() | CODESTARTrunInput thrdExit | thrdStarter() llDestroy(&llThrds) | pThis->bIsActive = 0; thrdDestruct() | if(pThis->bIsActive == 1) | thrdTerminate(pThis); | pThis->pAfterRun(pThis); | persistJournalState() | This can be easily reproduced if you add small delay to thrdDestruct, to let imjournal thread exit on its own: --- a/threads.c +++ b/threads.c @@ -76,8 +76,12 @@ static rsRetVal thrdDestruct(thrdInfo_t *pThis) DEFiRet; assert(pThis != NULL); + sleep(1); + printf("thrdDestruct: %p %s\n", pThis, pThis->name); if(pThis->bIsActive == 1) { thrdTerminate(pThis); + } else { + printf("thrdDestruct: %p %s is not active\n", pThis, pThis->name); } pthread_mutex_destroy(&pThis->mutThrd); pthread_cond_destroy(&pThis->condThrdTerm); # ls -la /var/lib/rsyslog/imjournal.state -rw-------. 1 root root 123 Mar 2 12:33 /var/lib/rsyslog/imjournal.state # date Wed Mar 2 12:42:05 EST 2016 # systemctl restart rsyslog # ls -la /var/lib/rsyslog/imjournal.state -rw-------. 1 root root 123 Mar 2 12:33 /var/lib/rsyslog/imjournal.state Mar 02 12:42:11 vm systemd[1]: Stopping System Logging Service... Mar 02 12:42:13 vm rsyslogd[19954]: thrdDestruct: 0x7f0cbb3c9ff0 imuxsock Mar 02 12:42:13 vm rsyslogd[19954]: thrdDestruct: 0x7f0cbb3c9ff0 imuxsock is not active Mar 02 12:42:13 vm rsyslogd[19954]: thrdDestruct: 0x7f0cbb3ca350 imjournal Mar 02 12:42:13 vm rsyslogd[19954]: thrdDestruct: 0x7f0cbb3ca350 imjournal is not active Mar 02 12:42:13 vm systemd[1]: Starting System Logging Service... Mar 02 12:42:13 vm systemd[1]: Started System Logging Service. Possible fix could be to either call persistJournalState() after while loop, or turn while loop into "while (1)" and rely on thrdTerminate() to always run and call pAfterRun(). This issue affects multiple plugins. So far it seems to manifest only in imjournal. Filed upstream: https://github.com/rsyslog/rsyslog/issues/882 (In reply to Tomas Heinrich from comment #15) > This issue affects multiple plugins. So far it seems to manifest only in > imjournal. > > Filed upstream: https://github.com/rsyslog/rsyslog/issues/882 https://github.com/rsyslog/rsyslog/commit/8a10940b5db901ec74e2d3ea7dd796076c1ae05d *** Bug 1204004 has been marked as a duplicate of this bug. *** Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2401.html |