Bug 1088021

Summary: Changing a VM host's time disables rsyslog file logging
Product: Red Hat Enterprise Linux 7 Reporter: John Margaritsanakis <john>
Component: rsyslogAssignee: Radovan Sroka <rsroka>
Status: CLOSED ERRATA QA Contact: Stefan Dordevic <sdordevi>
Severity: high Docs Contact: Vladimír Slávik <vslavik>
Priority: high    
Version: 7.1CC: bmarson, bschmaus, bugs.management, dahorak, dapospis, david, dkopecek, dshaw, gulikoza, hartsjc, heinrich.tomas, igeorgex, john, jscalf, jscotka, jvymazal, ksnider, lnykryn, lxue, mgrepl, mhaicman, moshiro, mwoehlke.floss, nstraz, ochalups, pablo.iranzo, pasik, pb, pportant, pvrabec, rahranja, redhat-bugzilla, redhat-bugzilla, redhat, rhel, riehecky, rsroka, sdordevi, sfroemer, systemd-maint-list, systemd-maint, twright
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: rsyslog-8.24.0-6.el7 Doc Type: Bug Fix
Doc Text:
*rsyslog* no longer stops receiving messages after system time change When a system's time was set back to an earlier time, after a reboot the *imjournal* component no longer had a valid cursor to the last journal message. As a consequence, *imjournal* completely stopped logging to the runtime journal and the *rsyslog* server did not receive any messages. This behaviour was also triggered by a virtual machine guest getting its time from its host, if the host time was earlier. The *imjournal* behaviour has been changed to validate the cursor and jump to head of journal if the cursor is invalid.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 09:53:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1409527    
Bug Blocks: 1203710, 1289485, 1313485, 1377248    
Attachments:
Description Flags
patch(7.4) none

Description John Margaritsanakis 2014-04-15 20:49:58 UTC
Description of problem:

This is an odd issue.

I installed a fresh RHEL7 VM (and recreated this issue later through a second VM). I was testing ntpd/ntpdate and chronyd, and over those tests I disabled both ntpd and chronyd, changed the time to two months in the future, then rebooted.

I noticed the time was still being corrected, and after examining this further realized it had to do with the host's time. During its reboot the VM was setting its time to that of the host machine.

Sure enough when I set the host's own time the VM had the same wrong time as the host. That's when I noticed /var/log/messages was no longer being updated.

It turns out nothing other than restarting the rsyslog service itself logs to files at that point. If I do so this gets logged:

---
Apr 15 16:39:39 rhel7time-dev rsyslogd-3000: sd_journal_get_cursor() failed: 'Cannot assign requested address'

Apr 15 16:39:39 rhel7time-dev rsyslogd: [origin software="rsyslogd" swVersion="7.4.2" x-pid="574" x-info="http://www.rsyslog.com"] exiting on signal 15.
Apr 15 16:39:39 rhel7time-dev rsyslogd: [origin software="rsyslogd" swVersion="7.4.2" x-pid="2117" x-info="http://www.rsyslog.com"] start
---

Otherwise nothing is logged to file, including logger.

If I comment out $OmitLocalLogging on in rsyslog.conf then file logging resumes (notice that until that point I hadn't changed rsyslog.conf).

Logging through journal is unaffected by all this. journalctl -b shows logging, including anything sent by logger.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up a fresh RHEL7 VM installation
2. Change the time on its host machine to a future one
3. Reboot the VM.

Actual results:

rsyslog logging to log files (including /var/log/messages) is now disabled.

Expected results:

rsyslog continues logging to log files. (it was logging fine before this)

Additional info:

These are the uncommented settings in my rsyslog.conf file (I took the comments out for brevity's sake)

-----
$ModLoad imuxsock 
$ModLoad imjournal 
$ModLoad imklog  
$WorkDirectory /var/lib/rsyslog

$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

$IncludeConfig /etc/rsyslog.d/*.conf

$OmitLocalLogging on

$IMJournalStateFile imjournal.state

*.info;mail.none;authpriv.none;cron.none;local0.none    /var/log/messages

authpriv.*                                              /var/log/secure

mail.*                                                  -/var/log/maillog

cron.*                                                  /var/log/cron

*.emerg                                                 :omusrmsg:*

uucp,news.crit                                          /var/log/spooler

local7.*                                                /var/log/boot.log

local0.*                                                /var/log/pound.log
-----

Comment 2 Tomas Heinrich 2014-04-18 14:51:52 UTC
When this issue occurs, you can delete /var/lib/rsyslog/imjournal.state and restart the daemon as a workaround.

rsyslog doesn't handle the date directly but only through the systemd API.
I've checked the code in imjournal a while ago and this looks like an issue in systemd.

For reference, see:
https://github.com/rsyslog/rsyslog/issues/43

Comment 3 RHEL Program Management 2014-04-26 05:47:27 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 4 Tomas Heinrich 2014-05-08 18:56:29 UTC
*** Bug 1095784 has been marked as a duplicate of this bug. ***

Comment 5 Ken Snider 2014-07-03 23:11:30 UTC
Note: this does NOT just happen on VM's. I've had this happen on a bare-metal box as well.

Comment 6 Peter Bieringer 2014-12-30 08:35:35 UTC
I have this issue now in a VM running on Parallels Virtuozzo (Strato Germany Linux vServer).

# rpm -q rsyslog systemd
rsyslog-7.4.7-7.el7_0.x86_64
systemd-208-11.el7_0.5.x86_64

Issue can be reproduced on every reboot:

rm -f /var/lib/rsyslog/imjournal.state
service rsyslog restart

=> rsyslog is logging lines to /var/log/messages

reboot

/var/log/messages:

Dec 30 09:27:38 st2 systemd-logind: New session 276901 of user root.
Dec 30 09:27:59 st2 systemd-logind: New session 276902 of user root.
Dec 30 09:28:06 st2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="156" x-info="http://www.rsyslog.com"] exiting on signal 15.
Dec 30 09:29:02 st2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="156" x-info="http://www.rsyslog.com"] start

but nothing more

Also reboot related log messages are no longer logged

For me this looks very ugly and imho the priority must be increased to get this fixed somehow that rsyslog logging is reliable again.

Comment 7 Peter Bieringer 2014-12-30 08:48:30 UTC
Workaround:

cat <<END >>/etc/rc.d/rc.local
# BZ 1088021 workaround: remove imjournal.state and restart rsyslog
rm -f /var/lib/rsyslog/imjournal.state
/sbin/service rsyslog restart
END

Comment 8 Peter Bieringer 2015-01-03 16:35:00 UTC
(In reply to Peter Bieringer from comment #7)
> Workaround:
> 
> cat <<END >>/etc/rc.d/rc.local
> # BZ 1088021 workaround: remove imjournal.state and restart rsyslog
> rm -f /var/lib/rsyslog/imjournal.state
> /sbin/service rsyslog restart
> END

Even that workaround isn't reliable :-(

Looks like better workaround is to disable this imjournal input by editing /etc/rsyslog.conf and disabling "$OmitLocalLogging on":

#$OmitLocalLogging on

See also:
https://unix.stackexchange.com/questions/124942/rsyslog-not-logging

and looks like also that rsyslog developers do not recommend this input module at all:

http://www.rsyslog.com/doc/master/configuration/modules/imjournal.html

Comment 9 Barry Marson 2015-04-01 15:08:12 UTC
I just ran into this problem with a 7.1 GFS2 cluster config after having to force reboot each node.

Workaround was to remove the imjournal.state file and restart service

The question is ... What can be done to prevent this from ever happening.  It can be easily unnoticed until you have to look at certain log files.

thanks,
Barry

Comment 10 Matthew Woehlke 2015-06-19 19:04:10 UTC
I'm having apparently similar problems on a *brand new* F22 install. The only entries in /var/log/messages look like:

  Jun 19 14:30:51 gryphon rsyslogd-2027: imjournal: fscanf on state file
`/var/lib/rsyslog/imjournal.state' failed  [v8.8.0 try
http://www.rsyslog.com/e/2027 ]

The aforementioned file is empty (0 bytes).

Comment 11 Matthew Woehlke 2015-06-19 21:04:22 UTC
To the extent that it's interesting...

I'm having *SEVERE* issues with a brand new Dell Precision M6800. No login of any kind, as any user, works *at all*, and moreover, any such attempts tend to cause the machine to quit responding to some extent. Moreover, as has been the case with my other Precision M-series, Fedora is utterly incapable of rebooting or powering off the machine.

I'd open a bug for the first issue if I had the slightest clue what to file it against, but I'm not trying to hijack this report.

The point is that, as a result of those two issues, I'm turning the machine off the hard way *all the time*. This seems to result in the aforementioned file being truncated / corrupted / whatever - at any rate, ending up empty - quite frequently. Which means that the next time I boot, I effectively have no syslog.

This, IMHO, is a serious bug that needs to be fixed; the system needs to be robust against this file being corrupt and needs to recover from said condition in a way that the syslog still functions.

Comment 13 Tomas Heinrich 2015-07-16 15:40:58 UTC
(In reply to Matthew Woehlke from comment #11)
> To the extent that it's interesting...
> 
> I'm having *SEVERE* issues with a brand new Dell Precision M6800. No login
> of any kind, as any user, works *at all*, and moreover, any such attempts
> tend to cause the machine to quit responding to some extent. Moreover, as
> has been the case with my other Precision M-series, Fedora is utterly
> incapable of rebooting or powering off the machine.

That's a rather horrible experience.

> I'd open a bug for the first issue if I had the slightest clue what to file
> it against, but I'm not trying to hijack this report.

What you may give a shot is disabling rsyslog via systemctl and seeing whether the symptoms change.

> The point is that, as a result of those two issues, I'm turning the machine
> off the hard way *all the time*. This seems to result in the aforementioned
> file being truncated / corrupted / whatever - at any rate, ending up empty -
> quite frequently. Which means that the next time I boot, I effectively have
> no syslog.
> 
> This, IMHO, is a serious bug that needs to be fixed; the system needs to be
> robust against this file being corrupt and needs to recover from said
> condition in a way that the syslog still functions.

Yes, this area needs some improvements.
There are some changes in the newer upstream versions that enable ignoring all older messages if there is no (usable) state file.

Could you file a new bug for this? It's really unrelated to the systemd time-jump issue.

Comment 15 Matthew Woehlke 2015-08-03 17:34:26 UTC
(In reply to Tomas Heinrich from comment #13)
> (In reply to Matthew Woehlke from comment #11)
> > I'm having *SEVERE* issues with a brand new Dell Precision M6800. [...] 
> > I'd open a bug for the first issue if I had the slightest clue what to file
> > it against, but I'm not trying to hijack this report.
> 
> What you may give a shot is disabling rsyslog via systemctl and seeing
> whether the symptoms change.

On further debugging it appears to be a KMS problem.

> > The point is that, as a result of those two issues, I'm turning the machine
> > off the hard way *all the time*. This seems to result in the aforementioned
> > file being truncated / corrupted / whatever - at any rate, ending up empty -
> > quite frequently. Which means that the next time I boot, I effectively have
> > no syslog.
> 
> Could you file a new bug for this? It's really unrelated to the systemd
> time-jump issue.

Filed bug #1249752.

Comment 18 David Lang 2016-01-29 20:56:17 UTC
A question about the systemd time-jump bug. By any chance does it clear up when you catch up with the time that you were at before?

so if you set the clock back 10 min, does it start seeing data again after 10 min?

Also, can anyone confirm that retrieving log data via journalctl has the same problem?

Comment 25 Lukáš Nykrýn 2016-04-19 06:23:35 UTC
This is more a question for rsyslog guys, but if you don't use imjournal then you can't hit this bug.

Comment 26 David Lang 2016-04-19 07:00:22 UTC
I don't currently use any RedHat derived distro, or (so far) anything that uses systemd, so I don't have the easy ability to duplicate the problem.

The last big problem I know of between systemd and rsyslog ended up being something that also impacted the journalctl tool when reading logs. That's why I'm asking if the problem happens with that as well.

If this is the case, then it's a problem inside the systemd journal code.

Given that there were protections put in place against loops with the last problem, I could very easily see these protections detecting the 'impossible' condition of time going backwards and deciding to ignore the message. That's why I was asking if things start working again once time is moving forwards again.

If this is the case, then it's a big arrow pointing at the loop detection code.

Can someone who can duplicate the problem test these two conditions?

Comment 28 Tomas Heinrich 2016-04-19 10:16:27 UTC
(In reply to David Lang from comment #26)

David, I suspect Lukáš was actually replying to someone else - there are some private comments...

> I don't currently use any RedHat derived distro, or (so far) anything that
> uses systemd, so I don't have the easy ability to duplicate the problem.

I don't have a reproducer at hand either.

> Can someone who can duplicate the problem test these two conditions?

I'd be interested to know the outcome of this too.

Comment 29 David Lang 2016-04-20 17:37:05 UTC
hmm, we have a bug reported to rsyslog that could be related to this

https://github.com/rsyslog/rsyslog/issues/862

There are apparently ways that the rsyslog imjournal module is missing some file updates, but then it clears when the file rotates.

I could easily see the time going backwards on the system confusing this sort of thing.

Does the problem clear if the file is rotated?

Comment 31 Lukáš Nykrýn 2016-08-16 11:48:35 UTC
I think this is caused by https://github.com/systemd/systemd/issues/1752 Unfortunately we don't have a solution for that problem. For now, I would suggest to use rsylog without the imjournal plugin (not sure if there is a official kb article that describes that.).

Comment 32 Lukáš Nykrýn 2016-08-18 11:35:28 UTC
So we were able to find the problem.
The cursor for journal consist internally from some sections, but for now the most important one are bootid+monotonic time and real time. If we don't find precise log entry in journal that matches the cursor, we are trying to find the nearest location in journal file. First we try to find the boot with the matching id and then jump to the nearest monotonic time. This approach has the benefit that it is not affected by changes of a realtime clock. But if we don't find the matching boot id, that we need to use the realtime stamp and we just to the matching time.

And now lets imagine the following scenario:
1) During shutdown rsyslog stores the last know cursor
2) Machine reboots and sicne we have volatile jounral by default the journal files are deleted
3) Time jumps backwards
4) Rsyslog starts and is using the seek with the cursor from last boot
5) Journal is not able to find the boot id in logs since they were deleted, so it uses the realtime timestamp, but that time is set in future.
6) So now we have a cursor that aims "behind" the current log
7) And every call of the next functions ends with 0 since we have reached the end of the journal.

I am afraid that we can't do anything from the journal side here. I would suggest that we add some additional logic to im-journal.

Besides the cursor also store the current boot id and if it differs during the next start of rsyslog and sd_journal_has_persistent_files return 0. Then just skip to the beginning of journal.

What do you think?

Comment 33 Peter Portante 2016-08-18 13:32:41 UTC
Nice!

Sounds like a good fix.

I would post a github issue on rsyslog for this as well.

Comment 40 Radovan Sroka 2016-12-19 13:46:51 UTC
There were discussed many problems around rsyslog imjournal statefile. Problem with fscanf, invalid or corrupted statefile was resolved in 7.3 in rhbz1245194.
I tried to reproduce this bug as was described in first comment, set time to future one but, it works. Problem is when time jump backwards as lukas described in comment #32.

This is how I understand this bug, am I wrong ?

Comment 41 Jiří Vymazal 2017-01-02 11:25:08 UTC
linked upstream GitHub issue, from that problem appears to be on systemd side, not rsyslog/journalctl

Comment 44 Radovan Sroka 2017-02-03 13:26:19 UTC
Created attachment 1247439 [details]
patch(7.4)

This is resolving the situation when system is after reboot and boot_id doesn't match so cursor pointing into "future". Usually sd_journal_next jump to head of journal due to journal aproximation, but when system time goes backwards and cursor is still invalid, rsyslog stops logging. We use sd_journal_get_cursor to validate our cursor. When cursor is invalid we are trying to jump to the head of journal. This problem with time is not affecting persistent journal.

This patch is applicable on rsyslog v8.24.0.

Comment 54 errata-xmlrpc 2017-08-01 09:53:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2306