Bug 1120894 - syslog messages very badly out of order in /var/log/messages
Summary: syslog messages very badly out of order in /var/log/messages
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 24
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 1231957 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-07-17 23:09 UTC by Paul DeStefano
Modified: 2018-12-19 16:12 UTC (History)
18 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-08 11:47:33 UTC


Attachments (Terms of Use)
sample out-of-order /var/log/messages (232.03 KB, application/tomato-binary-file)
2014-08-06 21:37 UTC, Paul DeStefano
no flags Details

Description Paul DeStefano 2014-07-17 23:09:26 UTC
Description of problem:
Messages that end up in /var/log/messages are often horribly out-of-order or disordered.  It's not individual messages, but vary large chunks of the log are in the wrong order.  It's so bad that, even after the log has been rotated, messages from the previous log period can appear in the current log.

Version-Release number of selected component (if applicable):
rsyslog-7.4.8-1.fc20.1.x86_64

How reproducible:
Seems to have been this way for a while now, all of F20, I'm pretty sure.

Steps to Reproduce:
1. occurs with out any user action
2.
3.

Actual results:
This is the best way to show you what I'm talking about, I think:

$ grep ^Ju /var/log/messages | awk '{print $1,$2}' | uniq                           
Jul 1
Jun 10
Jul 1
Jun 10
Jun 11
Jun 12
Jun 13
Jul 1
Jul 2
Jul 3
Jul 4
Jul 5
Jul 6
Jul 7
Jul 8
Jul 9
Jul 10
Jul 11
Jul 12
Jul 13
Jul 14
Jul 15
Jul 16
Jul 2
Jul 3
Jul 4
Jul 16
Jul 17
Jul 2
Jul 3
Jul 17
Jul 3
Jul 17
Jul 3
Jul 4
Jul 17
Jul 2
Jul 3
Jul 4
Jul 17


Expected results:
Chronological-ish ordering of messages.

Additional info:
I supposed I should be using journalctl, but I probably will not make time to do that unless /var/log/messages is fully retired.  If the present syslogd/journald duality is causing this problem, I would like to know how to make a clean break and move completely to journald such that I upgrades will not break things.  Perhaps that's not possible, but I'd like to know.

Comment 1 Paul DeStefano 2014-07-17 23:26:27 UTC
Wow, it's even worse than I thought.

$ egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' /var/log/messages | awk '{print $1,$2}' | uniq
Jul 1
Jun 10
Jul 1
Jun 10
Jun 11
Jun 12
Jun 13
Jul 1
Jan 1
Jul 1
Jan 1
May 8
May 9
May 10
May 11
May 12
May 13
May 14
Jan 1
May 14
Jan 1
May 14
May 15
Jan 1
Jul 1
May 19
May 20
May 21
Jul 1
Jan 1
Jul 1
Jan 1
May 8
May 9
May 10
Jan 1
May 10
May 11
May 12
May 13
Jan 1
May 13
May 14
May 15
Jan 1
Jul 1
May 19
May 20
May 21
Jul 1
Jul 2
Jul 3
Jul 4
Jul 5
Jul 6
Jul 7
Jul 8
Jul 9
Jul 10
Jul 11
Jul 12
Jul 13
Jul 14
Jul 15
Jul 16
Jul 2
Jul 3
Jul 4
Jul 16
Jul 17
Jul 2
Jul 3
Jul 17
Jul 3
Jul 17
Jul 3
Jul 4
Jul 17
Jul 2
Jul 3
Jul 4
Jul 17

Comment 2 Tomas Heinrich 2014-07-18 09:21:13 UTC
(In reply to Paul DeStefano from comment #0)
> Description of problem:
> Messages that end up in /var/log/messages are often horribly out-of-order or
> disordered.  It's not individual messages, but vary large chunks of the log
> are in the wrong order.  It's so bad that, even after the log has been
> rotated, messages from the previous log period can appear in the current log.

rsyslog writes messages in the order it receives them and therefore can't enforce chronological order. With flat text files, there's not much you can do about that. In this sense, such behavior is not a bug. But since you actually see large blocks with a big delta, this looks more like one.

On f20, rsyslog is just pulling messages from journald. Whatever gets passed, in whatever order, is logged. 

From sd_journal_next(3):
The journal is strictly ordered by reception time, and hence advancing to the next entry guarantees that the entry then pointing to is later in time than then previous one, or has the same timestamp.

I'll switch this to systemd.

> Additional info:
> I supposed I should be using journalctl, but I probably will not make time
> to do that unless /var/log/messages is fully retired.  If the present
> syslogd/journald duality is causing this problem, I would like to know how
> to make a clean break and move completely to journald such that I upgrades
> will not break things.  Perhaps that's not possible, but I'd like to know.

If you want, you can just uninstall rsyslog and make sure journald is storing the logs in a permanent location, not just tmpfs. Look it up in the systemd docs.

Comment 3 Zbigniew Jędrzejewski-Szmek 2014-07-18 12:29:53 UTC
And if you run the same commands on 'journalctl' output instead of /var/log/messages?

Comment 4 Paul DeStefano 2014-07-18 23:50:19 UTC
Thanks.

Regarding traditional syslog behavior, yes, I understand that.  I've noticed short delays in syslog data or out-of-order entries before under special conditions.  But, as you say, this is very different.

The journal looks good.

$ journalctl | egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' | awk '{print $1,$2}' | uniq
May 24
May 25
May 26
May 27
May 28
May 29
May 30
May 31
Jun 01
Jun 02
Jun 03
Jun 04
Jun 05
Jun 06
Jun 07
Jun 08
Jun 09
Jun 10
Jun 11
Jun 12
Jun 13
Jun 14
Jun 15
Jun 16
Jun 17
Jun 18
Jun 19
Jun 20
Jun 21
Jun 22
Jun 23
Jun 24
Jun 25
Jun 26
Jun 27
Jun 28
Jun 29
Jun 30
Jan 01
Jun 30
Jul 01
Jan 01
Jul 01
Jan 01
Jul 01
Jan 01
Jul 01
Jul 02
Jul 03
Jul 04
Jul 05
Jul 06
Jul 07
Jul 08
Jul 09
Jul 10
Jul 11
Jul 12
Jul 13
Jul 14
Jul 15
Jul 16
Jul 17
Jul 18

Comment 5 Paul DeStefano 2014-07-19 00:02:31 UTC
I was going to see how this bug turned out, but since the journal is on the table, I'll mention it.

It looks to me like the journal, while properly ordered, is missing messages submitted by 'logger'.


$ journalctl --user | egrep 'Jul 18.*Temp:' | sort --key=1M --key=2n --key=3 | awk '{print $1,$2,$3,$6,$7,$8}'|head -n 20                                                                
Jul 18 00:42:01 CPU Temp: +56.8°C
Jul 18 01:24:01 CPU Temp: +54.4°C
Jul 18 02:00:01 MB Temp: +62.0°C
Jul 18 02:12:01 MB Temp: +59.0°C
Jul 18 02:36:01 MB Temp: +59.0°C
Jul 18 02:42:01 MB Temp: +59.0°C
Jul 18 03:54:01 CPU Temp: +47.9°C
Jul 18 04:15:01 MB Temp: +56.0°C
Jul 18 04:30:01 MB Temp: +56.0°C
Jul 18 04:48:01 MB Temp: +55.0°C
Jul 18 04:51:01 MB Temp: +55.0°C
Jul 18 04:57:01 MB Temp: +56.0°C
Jul 18 05:06:01 MB Temp: +56.0°C
Jul 18 05:18:01 MB Temp: +57.0°C
Jul 18 05:27:01 MB Temp: +57.0°C
Jul 18 06:09:01 CPU Temp: +52.1°C
Jul 18 07:51:01 MB Temp: +55.0°C
Jul 18 08:12:01 CPU Temp: +60.9°C
Jul 18 08:15:01 CPU Temp: +49.9°C
Jul 18 08:30:01 MB Temp: +59.0°C

$ grep 'Jul 18.*Temp:' /var/log/messages| sort --key=1M --key=2n --key=3 | awk '{print $1,$2,$3,$6,$7,$8}'|head -n 20                                                                    
Jul 18 00:00:01 CPU Temp: +50.0°C
Jul 18 00:00:01 MB Temp: +59.0°C
Jul 18 00:03:01 CPU Temp: +49.8°C
Jul 18 00:03:01 MB Temp: +60.0°C
Jul 18 00:06:01 CPU Temp: +53.0°C
Jul 18 00:06:01 MB Temp: +59.0°C
Jul 18 00:09:01 CPU Temp: +53.6°C
Jul 18 00:09:01 MB Temp: +59.0°C
Jul 18 00:12:01 CPU Temp: +50.9°C
Jul 18 00:12:01 MB Temp: +59.0°C
Jul 18 00:15:01 CPU Temp: +48.9°C
Jul 18 00:15:01 MB Temp: +58.0°C
Jul 18 00:18:01 CPU Temp: +49.1°C
Jul 18 00:18:01 MB Temp: +59.0°C
Jul 18 00:21:01 CPU Temp: +47.9°C
Jul 18 00:21:01 MB Temp: +59.0°C
Jul 18 00:24:02 CPU Temp: +48.0°C
Jul 18 00:24:02 MB Temp: +59.0°C
Jul 18 00:27:01 CPU Temp: +51.4°C
Jul 18 00:27:01 MB Temp: +59.0°C

What do you think?  Could this be related?  Is this a logger issue?
I'm inclined to think it's not related, but just in case, that's what it looks like and you can decide.

Comment 6 Paul DeStefano 2014-07-19 00:30:36 UTC
(In reply to Paul DeStefano from comment #4)
> $ journalctl | egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' | awk
> '{print $1,$2}' | uniq
> ...
> Jun 26
> Jun 27
> Jun 28
> Jun 29
> Jun 30
> Jan 01   <---
> Jun 30
> Jul 01
> Jan 01   <--- 
> Jul 01
> Jan 01   <---
> Jul 01
> Jan 01   <---
> Jul 01
> Jul 02
> Jul 03
> Jul 04
> Jul 05
> Jul 06
> Jul 07
> ...

Woops!!! I almost missed it.

Damn it.  How embarrassing.  Either my clock bat is dead or these January dates correspond to a BIOS upgrade.  I see these particular messages are from chronyd and Tor saying the system clock time is way off.  So these January dates can be ignored.  I will check my battery.

But, the out-of-order July and June dates are still wrong.

Comment 7 Zbigniew Jędrzejewski-Szmek 2014-07-20 18:02:19 UTC
(In reply to Paul DeStefano from comment #5)
> It looks to me like the journal, while properly ordered, is missing messages
> submitted by 'logger'.
Most likely the messages from logger are not missing, but misplaced. For short lived processes that go away before journald gets to processing them, some metadata is missing and they might not show up in 'journalctl --user' output. It is a known problem. If they are missing from plain'journalctl' output, please file a separate bug.

(In reply to Paul DeStefano from comment #6)
> (In reply to Paul DeStefano from comment #4)
> > $ journalctl | egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' | awk
> > '{print $1,$2}' | uniq
> > ...
> > Jun 26
> > Jun 27
> > Jun 28
> > Jun 29
> > Jun 30
> > Jan 01   <---
> > Jun 30
> > Jul 01
> > Jan 01   <--- 
> > Jul 01
> > Jan 01   <---
> > Jul 01
> > Jan 01   <---
> > Jul 01
> > Jul 02
> > Jul 03
> > Jul 04
> > Jul 05
> > Jul 06
> > Jul 07
...
> I see these particular messages are
> from chronyd and Tor saying the system clock time is way off.  So these
> January dates can be ignored.  I will check my battery.
Yeah, that looks OK.

> But, the out-of-order July and June dates are still wrong.
I had a look at https://github.com/rsyslog/rsyslog/blob/master/plugins/imjournal/imjournal.c. I don't see anything obviously wrong. One iffy thing is that it truncates the state file first, then writes to it. In theory this will cause a loss of state, but the system would have to crash at the right time, so I think this is unlikely to be responsible for your problems.

To test things further, can you 
1. run 'journalctl --verify'?
2. check whether the out-of-order runs in /var/log/messages look like rsyslog simply started pulling in logs from a point that is too old after a reboot, or is the pattern more complicated?

Comment 8 Paul DeStefano 2014-08-06 21:34:48 UTC
Ah, okay.(In reply to Zbigniew Jędrzejewski-Szmek from comment #7)
> Most likely the messages from logger are not missing, but misplaced. For
> short lived processes that go away before journald gets to processing them,
> some metadata is missing and they might not show up in 'journalctl --user'
> output. It is a known problem. If they are missing from plain'journalctl'
> output, please file a separate bug.

Okay, journalctl does not show the same problem as journalctl --user.  Thank you for explaining.  I'm not so worried about that at the moment.


> (In reply to Paul DeStefano from comment #6)
> > (In reply to Paul DeStefano from comment #4)
> > > $ journalctl | egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' | awk
> > > '{print $1,$2}' | uniq
> > > ...
> > > Jun 26
> > > Jun 27
> > > Jun 28
> > > Jun 29
> > > Jun 30
> > > Jan 01   <---
> > > Jun 30
> > > Jul 01
> > > Jan 01   <--- 
> > > Jul 01
> > > Jan 01   <---
> > > Jul 01
> > > Jan 01   <---
> > > Jul 01
> > > Jul 02
> > > Jul 03
> > > Jul 04
> > > Jul 05
> > > Jul 06
> > > Jul 07
> ...
> > I see these particular messages are
> > from chronyd and Tor saying the system clock time is way off.  So these
> > January dates can be ignored.  I will check my battery.
> Yeah, that looks OK.

I shutdown and check the voltage on the battery; it was good.  And, while I forgot to check the time in the BIOS, I don't see any more Tor or chronyd complaints about the time.  Plus, boot messages since then have reasonable times, so, I think it was not a problem with the battery.  Could have been a BIOS upgrade, though.

> > But, the out-of-order July and June dates are still wrong.
> I had a look at
> https://github.com/rsyslog/rsyslog/blob/master/plugins/imjournal/imjournal.c.
> I don't see anything obviously wrong. One iffy thing is that it truncates
> the state file first, then writes to it. In theory this will cause a loss of
> state, but the system would have to crash at the right time, so I think this
> is unlikely to be responsible for your problems.

Okay.  I wouldn't rule that out entirely; I've have some strange crashes since upgrading to F20.  But, as you say, still unlikely.

> To test things further, can you
> 1. run 'journalctl --verify'?
> 2. check whether the out-of-order runs in /var/log/messages look like
> rsyslog simply started pulling in logs from a point that is too old after a
> reboot, or is the pattern more complicated?

1. journalctl --verify
Ooo, --verify output is so fancy!  I like it.

$ journalctl --verify
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990@ff0adb4e735e46e7abb285c59386fd77-000000000000069b-0004fe51ec473518.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013@0004fe51ecfe7122-5f63cb6e0200ebba.journal~
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@9f7f63221f534bcda98b9d5135536b3f-000000000001f9f4-0004fba1766f4d71.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@0004ff864b260b48-05be0066d9cff42f.journal~
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013@4f57bc869bc94cacb91692610f96dbd9-00000000000419ce-0004fced4d54ecc5.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@6786b36a8119429e80695f12d66b4c6b-0000000000000001-0004fd2b27151ea8.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@4b28038ece894bb9a786980ab6fbbe0e-000000000000bc64-0004fed267a2bf84.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013@4f57bc869bc94cacb91692610f96dbd9-000000000001fa70-0004fba1c180b4bd.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-985.journal                
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990@fabcb9e14c2b42d9ad3b9352126687e1-00000000000453f3-0004fd1068063600.journal
Missing entry array░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   2%
File corruption detected at /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990@0004ff864b625163-6a690f66ac5ed76a.journal~:000000 (of 8388608 bytes, 0%).
FAIL: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990@0004ff864b625163-6a690f66ac5ed76a.journal~ (Bad message)
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013@4f57bc869bc94cacb91692610f96dbd9-000000000002aec4-0004fc0fda24a7af.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013.journal              
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@9f7f63221f534bcda98b9d5135536b3f-0000000000000001-0004fb253803887d.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013@c797b756e85447899cb17b0d22f7304a-000000000000bc78-0004fed272503a7d.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@0004fe65e025538d-4248b94e772a22b8.journal~
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990.journal                
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@0004fe51eb9e3898-a053ed20b9d17230.journal~
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@9f7f63221f534bcda98b9d5135536b3f-0000000000036439-0004fc7ebea4be58.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@4b28038ece894bb9a786980ab6fbbe0e-0000000000000001-0004fe65dc984439.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013@4f57bc869bc94cacb91692610f96dbd9-000000000005a79f-0004fd75ed51d64c.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@6786b36a8119429e80695f12d66b4c6b-000000000005a76d-0004fd75cd2b60a4.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@0004fd2b2997f2b8-a8885a80394ebacb.journal~
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system.journal                  
Missing entry array░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   2%
File corruption detected at /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-997.journal:000000 (of 2162688 bytes, 0%).
FAIL: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-997.journal (Bad message)
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013@0004fe65e3891351-62a73b4f06f49bba.journal~
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013@c797b756e85447899cb17b0d22f7304a-0000000000000733-0004fe65e38aa3e7.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system@9f7f63221f534bcda98b9d5135536b3f-000000000002ae6d-0004fc0fa481f26b.journal
PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013@4f57bc869bc94cacb91692610f96dbd9-0000000000036518-0004fc7f3f6d6329.journal

So, that's that.  I cannot interpret these errors for myself.

Regarding /var/log/messages, yes, I would say it looks like rsyslog just pulled the wrong messages after a reboot.  I will attach a large sample from last month's log file.  You can see that it starts out with July 1, which makes sense.  Then, at 17:42, it suddenly switches to Jun 10.  It seems chronological, again, until Jun 13 02:38, when it switches back to Jul 1 17:42 for a few messages and then has boot messages starting at 17:44.  This seems sort of like the "simple" scenario you describe.

Comment 9 Paul DeStefano 2014-08-06 21:37:23 UTC
Created attachment 924620 [details]
sample out-of-order /var/log/messages

Comment 10 long 2015-02-11 14:47:59 UTC
Hi, is this problem getting any attention?  I'm also experiencing it in F21.

Comment 11 Jan Synacek 2015-03-26 07:24:10 UTC
I also have seen this on a colleague's machine...

Notice that the misplaced messages are the kernel ones. Also, there are a lot of ^@ (NUL) characters on line 1575 of the attached log, that might also indicate a problem.

Comment 12 Fedora End Of Life 2015-05-29 12:24:03 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Tomas Heinrich 2015-06-16 08:48:33 UTC
*** Bug 1231957 has been marked as a duplicate of this bug. ***

Comment 14 Tomas Heinrich 2015-06-16 08:50:27 UTC
Bumping per comment #10.

Comment 15 David A. De Graaf 2015-07-28 17:17:42 UTC
I was preparing a "new" BZ and found this one, almost one year old, that mimics my observations.  Except that the problem persists in Fedora 22, too.
I observe that /var/log/messages is mangled at a reboot by appending old stuff
from the journal, instead of properly merely adding new stuff subsequent to the reboot.

So,instead of a new BZ, here's my ME TOO!

The /var/log/messages log file is no longer monotonic.  WTF!!

We are promised (/var/log/README) that "syslog-ng or rsyslog may be
installed side-by-side with the journal and will continue to function
the way they always did."

This is simply untrue.

Instead of being a continuous log of events, the messages file now
contains journal data that is regurgitated and repeated at each reboot.
This makes it very hard to understand.

I have one machine that spontaneously freezes about once a day.  I
suspect a hardware problem.  The only clue so far is the gkrellm
display on a remote machine that marks the freeze with a red
blinker and the time.  Trying to find an event - anything - that
occurred at that moment, I discovered that /var/log/messges is
non-monotonic.

In rc.local I play a sound file which serves as a convenient marker for
each reboot.  Grepping the journal for this marker, I see 14 reboots
since Jul 16, tidily in proper time sequence, with 4 occurring since
the last logrotate on Jul 26.

In /var/log/messages these four reboots are reported as 10 reboots,
with 1 listed 4 times, 1 listed 3 times, 1 listed 2 times, 1 once.

With the extraordinary verbosity now extant, this is makes the log
files mostly useless.  Please fix journal, systemd, rsyslog so that
events are reported once and only once in /var/log/messages, as they
once were.

Incidentally, despite the extremely verbose reporting of every conceivable
event, I have found nothing reported at the time of the system freeze.
I'm still hunting...

Comment 16 Paul DeStefano 2015-07-29 17:57:38 UTC
Hey David,

Thanks for updating this.  I'm glad I'm not the only one disturbed by it.  I hadn't bothered to check into this, lately, but I checked after your report and found the same thing.

If you care about logs, I'm betting you may also "enjoy" these bugs: bug 1227379 and bug 1160046.  So, the situation is really messy, at present.

Comment 17 Ian Donaldson 2015-11-13 05:52:04 UTC
Me too, seen this since upgrading from fc19 to fc21, and now to fc22.
(using yum distro-sync)
My rsyslog.conf is the stock one from each version as there
have been major changes related to journald imports.

This is truly bizarre.  It appears that the log files are overwritten
without truncation as new events appear in the middle until there
are enough events that cause them to be appended.   tail -f initially is useless.

I've done journalctl --verify, removed any corrupt (there were a few, some old, some current) and journalctl --update-catalog and rebooted; still happening.

It _looks_ like rsyslogd isn't issuing an open with O_APPEND; instead
its issuing an O_WRONLY and writing from the top or something.

In addition I've seen after reboot that nothing new appears in some logs
for a long time... presumably rsyslogd is scanning journal logs for
new stuff for a long time.  The journal area is multiple gigabytes in size
on many systems.

Comment 18 Ian Collier 2015-11-14 10:14:24 UTC
General agreement here.  I wanted to check something in /var/log/cron and found this:

$ egrep 'Oct|Nov' /var/log/cron|awk '{print $1,$2}'|uniq 
Nov 8
Nov 9
Nov 10
Nov 11
Nov 12
Nov 13
Nov 14
Oct 29
Oct 30
Oct 31
Nov 1
Nov 2
Nov 14

Significantly, my laptop has just rebooted after I accidentally left the power unplugged, and the Nov 14 messages at the end are the first ones after the reboot.  But a bunch of these messages are also in the log that was rotated on November 8:

$ egrep 'Oct|Nov' /var/log/cron-20151108|awk '{print $1,$2}'|uniq 
Nov 1
Nov 2
Nov 3
Nov 4
Nov 5
Nov 6
Oct 29
Oct 30
Oct 31
Nov 1
Nov 2
Nov 7
Nov 8

And again November 7 was the last reboot.  So it does look like the flat-file logs are pulling in
a bunch of old messages in the wrong order at boot time, and they shouldn't have touched those
old messages because they were already logged.

The versions here are:
rsyslog-8.8.0-3.fc22.x86_64
systemd-219-24.fc22.x86_64

Comment 19 Paul DeStefano 2016-01-16 22:58:29 UTC
I just upgraded to F23 and decided to check on this bug.  Looking at my older logs, however, I see that nothing has been out of order since 6 December; I think that's when I did a previous upgrade/update on F22.  Did this get silently fixed?

Could others check and see if they still have this problem?

Comment 20 long 2016-01-19 19:06:05 UTC
On my fresh install of F23 I do not seem to be having this problem any longer (except for the out of order rsyslogd start messages which I can live with I guess):



Dec 18 17:26:10 kite systemd: Stopping User Manager for UID 989...
Dec 18 17:26:11 kite rsyslogd: [origin software="rsyslogd" swVersion="8.10.0" x-pid="743" x-info="http://www.rsyslog.com"] exiting on signal 15.
Dec 21 08:41:59 kite rsyslogd: [origin software="rsyslogd" swVersion="8.10.0" x-pid="831" x-info="http://www.rsyslog.com"] start
Dec 18 17:26:10 kite systemd: Unmounting RPC Pipe File System...
Dec 18 17:26:10 kite avahi-daemon[745]: Got SIGTERM, quitting.

Comment 21 Tomas Heinrich 2016-01-20 16:13:33 UTC
(In reply to long from comment #20)
> On my fresh install of F23 I do not seem to be having this problem any
> longer (except for the out of order rsyslogd start messages which I can live
> with I guess):

One thought here...

This is the chronological order:

> Dec 18 17:26:10 kite systemd: Stopping User Manager for UID 989...

1 - messages are being generated...

> Dec 18 17:26:11 kite rsyslogd: [origin software="rsyslogd"
> swVersion="8.10.0" x-pid="743" x-info="http://www.rsyslog.com"] exiting on
> signal 15.

2 - rsyslog shuts down
3 - more messages are being generated and stored in the journal...

> Dec 21 08:41:59 kite rsyslogd: [origin software="rsyslogd"
> swVersion="8.10.0" x-pid="831" x-info="http://www.rsyslog.com"] start

4 - rsyslog starts and generates the starting message with current time

> Dec 18 17:26:10 kite systemd: Unmounting RPC Pipe File System...
> Dec 18 17:26:10 kite avahi-daemon[745]: Got SIGTERM, quitting.

5 - rsyslog reads messages that were generated while it was down (point 3) from journald and uses the original timestamps


The imuxsock module (originally reading messages from /dev/log) by default ignores timestamps from messages and uses the time of reception. Thus the messages are almost always in chronological order.
imjournal, on the other hand, uses the timestamps from the journal.

imuxsock has an option to change this behavior but imjournal doesn't. I think the current behavior is the better one of the two but i guess an option to change this behavior in imjournal wouldn't hurt.

Comment 22 Paul DeStefano 2016-06-24 03:48:09 UTC
Wow, it's been three full releases since this was reported.

Heinrich, thanks for the explanation.  That sounds like the root cause, but also sounds like it might be like that for some reason we don't know.

The answer to my question is no, it's not fixed:

$ egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' /var/log/messages | awk '{print $1,$2}' | uniq
Jun 1
Jun 2
Jun 3
Jun 4
Jun 5
Jun 6
Jun 7
Jun 8
Jun 9
Jun 10
Jun 11
Jun 12
Jun 13
Jun 14
Jun 15
Jun 16
Jun 17
Jun 18
Jun 19
Jun 20
Jun 21
Apr 9
Apr 10
Jun 21
Jun 22
Jun 23

The regular journalctl output seems file, still.

NEEDINFO: Is there anything else I can do here?  Is there still interest in fixeding this bug?

Comment 23 Michal Sekletar 2016-08-04 09:07:19 UTC
This bug was moved against F24. Let's make sure everyone is on the same page.

Paul can you please clarify what versions of systemd and rsyslog were you using last time you saw this issue? Also can you make sure that out of order messages are not start messages from rsyslog (I assume those being out of order is actually OK).

Comment 24 Paul DeStefano 2016-08-05 07:45:01 UTC
$ egrep --no-filename --text '^(Jan|Feb|Mar|Apr|M
ay|Jun|Jul|Aug|Sep|Nov|Dec)' $(ls /var/log/messages-*) | awk '{print $1,$2,$3}'|cut -c1-6 |uniq
 
Jul 3 
Jul 4 
Jul 5 
Jul 6 
Jul 7 
Jul 8 
Jul 9 
Jul 10
Jul 11
Jul 12
Jul 13
Jul 14
Jul 4 
Jul 14
Jul 4 
Jul 14
Jul 4 
Jul 14
Jul 4 
Jul 14
Jul 4 
Jul 14
Jul 4 
...

So, yes, it's still happening.

Regarding the types of messages, while it includes boot messages, it starts with the shutdown messages immediately preceding some other *previous* reboot.  So, it's not even the boot messages for the current boot, it's just a random chuck of old messages repeated in the new log.  Those messages themselves are in order, but just repeated, which is just plain wrong.  This all sounds very familiar...yeah, I already did all this troubleshooting.  see comment 7 and comment 8.  I thought someone had figured out why this happens, but I don't see that exactly in this bug.

Jul  4 23:15:34 <hostname> systemd: Reached target Unmount All Filesystems.
Jul  4 23:15:34 <hostname> systemd: Removed slice system-systemd\x2dcryptsetup.slice.
Jul  4 23:15:34 <hostname> systemd: Reached target Shutdown.
Jul  4 23:15:34 <hostname> systemd: Reached target Final Step.
Jul  4 23:15:34 <hostname> systemd: Starting Reboot...
Jul  4 23:15:34 <hostname> systemd: Shutting down.
Jul  4 23:15:35 <hostname> systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Jul  4 23:15:35 <hostname> systemd-journald: Journal stopped
Jul 14 09:00:04 <hostname> kernel: [   69.592428] IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready
Jul  4 23:18:20 <hostname> systemd-journald: Runtime journal (/run/log/journal/) is 8.0M, max 1.5G, 1.5G free.
Jul  4 23:18:20 <hostname> kernel: Linux version 4.6.3-300.fc24.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 6.1.1 20160510 (Red Hat 6.1.1-2) (GCC) ) #1 SMP Fri Jun 24 20:52:41 UTC 2016

Do you not see this in your logs?  My test command is getting pretty refined; it should work without any effort for...everyone, I think.

Comment 25 Paul DeStefano 2016-08-05 07:46:44 UTC
rsyslog-8.12.0-3.fc24.x86_64
systemd-229-9.fc24.x86_64

Comment 26 Fedora End Of Life 2017-07-25 18:41:07 UTC
This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged  change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 27 Paul DeStefano 2017-07-26 00:13:59 UTC
egrep --no-filename --text '^(Jan|Feb|Mar|Apr|M
ay|Jun|Jul|Aug|Sep|Nov|Dec)' $(ls /var/log/messages-*) | awk '{print $1,$2,$3}'|cut -c1-6 |uniq

Jun 25
Jun 26
Jun 27
Jun 28
Jun 29
Jun 30
Jul 1 
Jul 2 
Jul 3 
Jul 4 
Jul 5 
Jul 6 
Jul 7 
Jul 8 
Jul 9 
Jul 10
Jul 11
Jul 12
Jul 13
Jul 14
Jul 15
Jul 16
Jul 17
Jul 18
Jul 19
Jul 20
Jul 21
Jul 22
Jul 23


I'm no longer able to keep a whole year of logs, so I'm not 100 % sure this has been fixed.  But, the 30 day data shows it hasn't recurred, lately.  It happens most often after crashes, which I haven't had, lately.

This issue was confirmed by several and I don't know why it still has a status of NEW.  Would some others else please give their results, so it's not just me?

On the other had, I am 100 % sure someone knows whether this bug was fixed or not.  And it's not me.

Comment 28 Paul DeStefano 2017-07-26 00:18:21 UTC
rsyslog-8.27.0-1.fc25.x86_64
systemd-231-17.fc25.x86_64

Comment 29 Ian Donaldson 2017-07-26 00:22:04 UTC
I've just checked a few fc24, fc25 and fc26 systems and can't see an issue anymore;
all have been recently rebooted a few times and all are running latest code.

Comment 30 Fedora End Of Life 2017-08-08 11:47:33 UTC
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 31 RobbieTheK 2018-12-18 21:35:56 UTC
Fedora 28, rsyslog-8.39.0-1.fc28.x86_64, systemd-238-10.git438ac26.fc28.x86_64

journalctl --verify, everything passes, and journalctl mirrors /var/log/messages.

I do see a whole bunch of Dec 16 19:26:37 storm rsyslogd: imjournal: journal reloaded... [v8.39.0 try http://www.rsyslog.com/e/0 ]

and then:
Dec 16 19:26:37 storm rsyslogd: imjournal: loaded invalid cursor, seeking to the head of journal : Cannot assign requested address [v8.39.0 try http://www.rsyslog.com/e/2027 ]
Dec 16 19:26:37 storm rsyslogd: imjournal: journal reloaded... [v8.39.0 try http://www.rsyslog.com/e/0 ]
Oct 27 07:02:24 storm systemd[2462]: Stopped target Default.
Oct 27 07:02:25 storm systemd[2462]: Stopped target Basic System.

Then logs like these:
Dec 14 17:10:21 storm systemd[8381]: Starting Exit the Session...
Dec 14 17:10:21 storm systemd[8381]: Received SIGRTMIN+24 from PID 10899 (kill).
Jan  8 03:08:09 storm systemd[27471]: Stopped target Default.
Jan  8 03:08:10 storm systemd[27471]: Stopped target Basic System.
Jan  8 03:08:10 storm systemd[27471]: Stopped target Paths.
Jan  8 03:08:10 storm systemd[27471]: Stopped target Sockets.
Jan  8 03:08:10 storm systemd[27471]: Closed D-Bus User Message Bus Socket.
Jan  8 03:08:10 storm systemd[27471]: Reached target Shutdown.
Jan  8 03:08:10 storm systemd[27471]: Starting Exit the Session...
Jan  8 03:08:10 storm systemd[27471]: Stopped target Timers.
Jan  8 03:08:10 storm systemd[27471]: Received SIGRTMIN+24 from PID 30660 (kill).
May 15 07:03:38 storm systemd[7003]: Stopped target Default.
May 15 07:04:02 storm systemd[24508]: Received SIGRTMIN+24 from PID 28198 (kill).

Nov  8 13:26:54 storm systemd[30136]: Closed Multimedia System.
Nov  8 13:26:54 storm systemd[30136]: Reached target Shutdown.
Nov  8 13:26:54 storm systemd[30136]: Starting Exit the Session...
Nov  8 13:26:54 storm systemd[30136]: Received SIGRTMIN+24 from PID 6795 (kill).
Dec 16 19:26:11 storm systemd[7685]: Listening on Sound System.
Dec 16 19:26:38 storm systemd[7685]: Listening on Multimedia System.

So is this SIGRTMIN+24 a clue?

# egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' /var/log/messages | awk '{print $1,$2}' | uniq
Dec 16
Dec 27
May 17
Nov 2
Dec 7
Dec 8
Dec 9
Dec 10
Dec 11
Dec 12
Dec 13
Dec 14
Jan 8
May 15
May 16
May 17
May 18
May 19
May 20
May 21
May 23
May 25
May 26
May 29
May 30
May 31
Jun 3
Jun 4
Jun 5
Jun 6
Jul 10
Nov 4
Nov 5
Nov 6
Nov 7
Nov 8
Dec 16
Dec 17
Dec 18
Dec 27
May 17
Nov 2
Dec 7
Dec 8
Dec 9
Dec 10
Dec 11
Dec 12
Dec 13
Dec 14
Jan 8
May 15
May 16
May 17
May 18
May 19
May 20
May 21
May 23
May 25
May 26
May 29
May 30
May 31
Jun 3
Jun 4
Jun 5
Jun 6
Jul 10
Nov 4
Nov 5
Nov 6
Nov 7
Nov 8
Dec 18

Comment 32 Zbigniew Jędrzejewski-Szmek 2018-12-19 07:24:36 UTC
> So is this SIGRTMIN+24 a clue?
This message comes from the user instance of systemd and means that it got a request
to terminate. So that message generally means that the machine is shutting down, nothing
nefarious.

It's hard to say what is going on here. It seems that rsyslog for some reason gets an
invalid cursor, but it could be that the cursor is just too old, or that there's something
wrong with it, for example it wasn't stored properly. Without additional debugging it's
hard to tell. My suggestion would be to:
- look at journalctl output
- If it looks incorrect (out of order messages or very old messages followed by much
  later messages with a huge time jump inbetween), file a new bug against systemd.
  Attach the output from 'journalctl --verify'.
- If it looks correct, than the problem is mostly likely on rsyslog side. File a new
  bug against rsyslog.

Comment 33 RobbieTheK 2018-12-19 16:12:02 UTC
I created https://bugzilla.redhat.com/show_bug.cgi?id=1660933 but could https://bugzilla.redhat.com/show_bug.cgi?id=1615014 be related as that has the same error logs such as "sd_journal_get_cursor() failed: Cannot assign requested address"?

Note journalctl appears normal and all passed on running verify.


Note You need to log in before you can comment on or make changes to this bug.