Description of problem:
This is an odd issue.
I installed a fresh RHEL7 VM (and recreated this issue later through a second VM). I was testing ntpd/ntpdate and chronyd, and over those tests I disabled both ntpd and chronyd, changed the time to two months in the future, then rebooted.
I noticed the time was still being corrected, and after examining this further realized it had to do with the host's time. During its reboot the VM was setting its time to that of the host machine.
Sure enough when I set the host's own time the VM had the same wrong time as the host. That's when I noticed /var/log/messages was no longer being updated.
It turns out nothing other than restarting the rsyslog service itself logs to files at that point. If I do so this gets logged:
Apr 15 16:39:39 rhel7time-dev rsyslogd-3000: sd_journal_get_cursor() failed: 'Cannot assign requested address'
Apr 15 16:39:39 rhel7time-dev rsyslogd: [origin software="rsyslogd" swVersion="7.4.2" x-pid="574" x-info="http://www.rsyslog.com"] exiting on signal 15.
Apr 15 16:39:39 rhel7time-dev rsyslogd: [origin software="rsyslogd" swVersion="7.4.2" x-pid="2117" x-info="http://www.rsyslog.com"] start
Otherwise nothing is logged to file, including logger.
If I comment out $OmitLocalLogging on in rsyslog.conf then file logging resumes (notice that until that point I hadn't changed rsyslog.conf).
Logging through journal is unaffected by all this. journalctl -b shows logging, including anything sent by logger.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Set up a fresh RHEL7 VM installation
2. Change the time on its host machine to a future one
3. Reboot the VM.
rsyslog logging to log files (including /var/log/messages) is now disabled.
rsyslog continues logging to log files. (it was logging fine before this)
These are the uncommented settings in my rsyslog.conf file (I took the comments out for brevity's sake)
When this issue occurs, you can delete /var/lib/rsyslog/imjournal.state and restart the daemon as a workaround.
rsyslog doesn't handle the date directly but only through the systemd API.
I've checked the code in imjournal a while ago and this looks like an issue in systemd.
For reference, see:
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
*** Bug 1095784 has been marked as a duplicate of this bug. ***
Note: this does NOT just happen on VM's. I've had this happen on a bare-metal box as well.
I have this issue now in a VM running on Parallels Virtuozzo (Strato Germany Linux vServer).
# rpm -q rsyslog systemd
Issue can be reproduced on every reboot:
rm -f /var/lib/rsyslog/imjournal.state
service rsyslog restart
=> rsyslog is logging lines to /var/log/messages
Dec 30 09:27:38 st2 systemd-logind: New session 276901 of user root.
Dec 30 09:27:59 st2 systemd-logind: New session 276902 of user root.
Dec 30 09:28:06 st2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="156" x-info="http://www.rsyslog.com"] exiting on signal 15.
Dec 30 09:29:02 st2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="156" x-info="http://www.rsyslog.com"] start
but nothing more
Also reboot related log messages are no longer logged
For me this looks very ugly and imho the priority must be increased to get this fixed somehow that rsyslog logging is reliable again.
cat <<END >>/etc/rc.d/rc.local
# BZ 1088021 workaround: remove imjournal.state and restart rsyslog
rm -f /var/lib/rsyslog/imjournal.state
/sbin/service rsyslog restart
(In reply to Peter Bieringer from comment #7)
> cat <<END >>/etc/rc.d/rc.local
> # BZ 1088021 workaround: remove imjournal.state and restart rsyslog
> rm -f /var/lib/rsyslog/imjournal.state
> /sbin/service rsyslog restart
Even that workaround isn't reliable :-(
Looks like better workaround is to disable this imjournal input by editing /etc/rsyslog.conf and disabling "$OmitLocalLogging on":
and looks like also that rsyslog developers do not recommend this input module at all:
I just ran into this problem with a 7.1 GFS2 cluster config after having to force reboot each node.
Workaround was to remove the imjournal.state file and restart service
The question is ... What can be done to prevent this from ever happening. It can be easily unnoticed until you have to look at certain log files.
I'm having apparently similar problems on a *brand new* F22 install. The only entries in /var/log/messages look like:
Jun 19 14:30:51 gryphon rsyslogd-2027: imjournal: fscanf on state file
`/var/lib/rsyslog/imjournal.state' failed [v8.8.0 try
The aforementioned file is empty (0 bytes).
To the extent that it's interesting...
I'm having *SEVERE* issues with a brand new Dell Precision M6800. No login of any kind, as any user, works *at all*, and moreover, any such attempts tend to cause the machine to quit responding to some extent. Moreover, as has been the case with my other Precision M-series, Fedora is utterly incapable of rebooting or powering off the machine.
I'd open a bug for the first issue if I had the slightest clue what to file it against, but I'm not trying to hijack this report.
The point is that, as a result of those two issues, I'm turning the machine off the hard way *all the time*. This seems to result in the aforementioned file being truncated / corrupted / whatever - at any rate, ending up empty - quite frequently. Which means that the next time I boot, I effectively have no syslog.
This, IMHO, is a serious bug that needs to be fixed; the system needs to be robust against this file being corrupt and needs to recover from said condition in a way that the syslog still functions.
(In reply to Matthew Woehlke from comment #11)
> To the extent that it's interesting...
> I'm having *SEVERE* issues with a brand new Dell Precision M6800. No login
> of any kind, as any user, works *at all*, and moreover, any such attempts
> tend to cause the machine to quit responding to some extent. Moreover, as
> has been the case with my other Precision M-series, Fedora is utterly
> incapable of rebooting or powering off the machine.
That's a rather horrible experience.
> I'd open a bug for the first issue if I had the slightest clue what to file
> it against, but I'm not trying to hijack this report.
What you may give a shot is disabling rsyslog via systemctl and seeing whether the symptoms change.
> The point is that, as a result of those two issues, I'm turning the machine
> off the hard way *all the time*. This seems to result in the aforementioned
> file being truncated / corrupted / whatever - at any rate, ending up empty -
> quite frequently. Which means that the next time I boot, I effectively have
> no syslog.
> This, IMHO, is a serious bug that needs to be fixed; the system needs to be
> robust against this file being corrupt and needs to recover from said
> condition in a way that the syslog still functions.
Yes, this area needs some improvements.
There are some changes in the newer upstream versions that enable ignoring all older messages if there is no (usable) state file.
Could you file a new bug for this? It's really unrelated to the systemd time-jump issue.
(In reply to Tomas Heinrich from comment #13)
> (In reply to Matthew Woehlke from comment #11)
> > I'm having *SEVERE* issues with a brand new Dell Precision M6800. [...]
> > I'd open a bug for the first issue if I had the slightest clue what to file
> > it against, but I'm not trying to hijack this report.
> What you may give a shot is disabling rsyslog via systemctl and seeing
> whether the symptoms change.
On further debugging it appears to be a KMS problem.
> > The point is that, as a result of those two issues, I'm turning the machine
> > off the hard way *all the time*. This seems to result in the aforementioned
> > file being truncated / corrupted / whatever - at any rate, ending up empty -
> > quite frequently. Which means that the next time I boot, I effectively have
> > no syslog.
> Could you file a new bug for this? It's really unrelated to the systemd
> time-jump issue.
Filed bug #1249752.
A question about the systemd time-jump bug. By any chance does it clear up when you catch up with the time that you were at before?
so if you set the clock back 10 min, does it start seeing data again after 10 min?
Also, can anyone confirm that retrieving log data via journalctl has the same problem?
This is more a question for rsyslog guys, but if you don't use imjournal then you can't hit this bug.
I don't currently use any RedHat derived distro, or (so far) anything that uses systemd, so I don't have the easy ability to duplicate the problem.
The last big problem I know of between systemd and rsyslog ended up being something that also impacted the journalctl tool when reading logs. That's why I'm asking if the problem happens with that as well.
If this is the case, then it's a problem inside the systemd journal code.
Given that there were protections put in place against loops with the last problem, I could very easily see these protections detecting the 'impossible' condition of time going backwards and deciding to ignore the message. That's why I was asking if things start working again once time is moving forwards again.
If this is the case, then it's a big arrow pointing at the loop detection code.
Can someone who can duplicate the problem test these two conditions?
(In reply to David Lang from comment #26)
David, I suspect Lukáš was actually replying to someone else - there are some private comments...
> I don't currently use any RedHat derived distro, or (so far) anything that
> uses systemd, so I don't have the easy ability to duplicate the problem.
I don't have a reproducer at hand either.
> Can someone who can duplicate the problem test these two conditions?
I'd be interested to know the outcome of this too.
hmm, we have a bug reported to rsyslog that could be related to this
There are apparently ways that the rsyslog imjournal module is missing some file updates, but then it clears when the file rotates.
I could easily see the time going backwards on the system confusing this sort of thing.
Does the problem clear if the file is rotated?
I think this is caused by https://github.com/systemd/systemd/issues/1752 Unfortunately we don't have a solution for that problem. For now, I would suggest to use rsylog without the imjournal plugin (not sure if there is a official kb article that describes that.).
So we were able to find the problem.
The cursor for journal consist internally from some sections, but for now the most important one are bootid+monotonic time and real time. If we don't find precise log entry in journal that matches the cursor, we are trying to find the nearest location in journal file. First we try to find the boot with the matching id and then jump to the nearest monotonic time. This approach has the benefit that it is not affected by changes of a realtime clock. But if we don't find the matching boot id, that we need to use the realtime stamp and we just to the matching time.
And now lets imagine the following scenario:
1) During shutdown rsyslog stores the last know cursor
2) Machine reboots and sicne we have volatile jounral by default the journal files are deleted
3) Time jumps backwards
4) Rsyslog starts and is using the seek with the cursor from last boot
5) Journal is not able to find the boot id in logs since they were deleted, so it uses the realtime timestamp, but that time is set in future.
6) So now we have a cursor that aims "behind" the current log
7) And every call of the next functions ends with 0 since we have reached the end of the journal.
I am afraid that we can't do anything from the journal side here. I would suggest that we add some additional logic to im-journal.
Besides the cursor also store the current boot id and if it differs during the next start of rsyslog and sd_journal_has_persistent_files return 0. Then just skip to the beginning of journal.
What do you think?
Sounds like a good fix.
I would post a github issue on rsyslog for this as well.
There were discussed many problems around rsyslog imjournal statefile. Problem with fscanf, invalid or corrupted statefile was resolved in 7.3 in rhbz1245194.
I tried to reproduce this bug as was described in first comment, set time to future one but, it works. Problem is when time jump backwards as lukas described in comment #32.
This is how I understand this bug, am I wrong ?
linked upstream GitHub issue, from that problem appears to be on systemd side, not rsyslog/journalctl
Created attachment 1247439 [details]
This is resolving the situation when system is after reboot and boot_id doesn't match so cursor pointing into "future". Usually sd_journal_next jump to head of journal due to journal aproximation, but when system time goes backwards and cursor is still invalid, rsyslog stops logging. We use sd_journal_get_cursor to validate our cursor. When cursor is invalid we are trying to jump to the head of journal. This problem with time is not affecting persistent journal.
This patch is applicable on rsyslog v8.24.0.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.