Bug 1120894
Summary: | syslog messages very badly out of order in /var/log/messages | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Paul DeStefano <prd-fedora> | ||||
Component: | systemd | Assignee: | systemd-maint | ||||
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 24 | CC: | dad, iand, imc, jlieskov, johannbg, jsynacek, lars, lkundrak, lnykryn, long, mah.darade, msekleta, pvrabec, rkudyba, s, systemd-maint, vpavlin, zbyszek | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-08-08 11:47:33 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Paul DeStefano
2014-07-17 23:09:26 UTC
Wow, it's even worse than I thought. $ egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' /var/log/messages | awk '{print $1,$2}' | uniq Jul 1 Jun 10 Jul 1 Jun 10 Jun 11 Jun 12 Jun 13 Jul 1 Jan 1 Jul 1 Jan 1 May 8 May 9 May 10 May 11 May 12 May 13 May 14 Jan 1 May 14 Jan 1 May 14 May 15 Jan 1 Jul 1 May 19 May 20 May 21 Jul 1 Jan 1 Jul 1 Jan 1 May 8 May 9 May 10 Jan 1 May 10 May 11 May 12 May 13 Jan 1 May 13 May 14 May 15 Jan 1 Jul 1 May 19 May 20 May 21 Jul 1 Jul 2 Jul 3 Jul 4 Jul 5 Jul 6 Jul 7 Jul 8 Jul 9 Jul 10 Jul 11 Jul 12 Jul 13 Jul 14 Jul 15 Jul 16 Jul 2 Jul 3 Jul 4 Jul 16 Jul 17 Jul 2 Jul 3 Jul 17 Jul 3 Jul 17 Jul 3 Jul 4 Jul 17 Jul 2 Jul 3 Jul 4 Jul 17 (In reply to Paul DeStefano from comment #0) > Description of problem: > Messages that end up in /var/log/messages are often horribly out-of-order or > disordered. It's not individual messages, but vary large chunks of the log > are in the wrong order. It's so bad that, even after the log has been > rotated, messages from the previous log period can appear in the current log. rsyslog writes messages in the order it receives them and therefore can't enforce chronological order. With flat text files, there's not much you can do about that. In this sense, such behavior is not a bug. But since you actually see large blocks with a big delta, this looks more like one. On f20, rsyslog is just pulling messages from journald. Whatever gets passed, in whatever order, is logged. From sd_journal_next(3): The journal is strictly ordered by reception time, and hence advancing to the next entry guarantees that the entry then pointing to is later in time than then previous one, or has the same timestamp. I'll switch this to systemd. > Additional info: > I supposed I should be using journalctl, but I probably will not make time > to do that unless /var/log/messages is fully retired. If the present > syslogd/journald duality is causing this problem, I would like to know how > to make a clean break and move completely to journald such that I upgrades > will not break things. Perhaps that's not possible, but I'd like to know. If you want, you can just uninstall rsyslog and make sure journald is storing the logs in a permanent location, not just tmpfs. Look it up in the systemd docs. And if you run the same commands on 'journalctl' output instead of /var/log/messages? Thanks. Regarding traditional syslog behavior, yes, I understand that. I've noticed short delays in syslog data or out-of-order entries before under special conditions. But, as you say, this is very different. The journal looks good. $ journalctl | egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' | awk '{print $1,$2}' | uniq May 24 May 25 May 26 May 27 May 28 May 29 May 30 May 31 Jun 01 Jun 02 Jun 03 Jun 04 Jun 05 Jun 06 Jun 07 Jun 08 Jun 09 Jun 10 Jun 11 Jun 12 Jun 13 Jun 14 Jun 15 Jun 16 Jun 17 Jun 18 Jun 19 Jun 20 Jun 21 Jun 22 Jun 23 Jun 24 Jun 25 Jun 26 Jun 27 Jun 28 Jun 29 Jun 30 Jan 01 Jun 30 Jul 01 Jan 01 Jul 01 Jan 01 Jul 01 Jan 01 Jul 01 Jul 02 Jul 03 Jul 04 Jul 05 Jul 06 Jul 07 Jul 08 Jul 09 Jul 10 Jul 11 Jul 12 Jul 13 Jul 14 Jul 15 Jul 16 Jul 17 Jul 18 I was going to see how this bug turned out, but since the journal is on the table, I'll mention it. It looks to me like the journal, while properly ordered, is missing messages submitted by 'logger'. $ journalctl --user | egrep 'Jul 18.*Temp:' | sort --key=1M --key=2n --key=3 | awk '{print $1,$2,$3,$6,$7,$8}'|head -n 20 Jul 18 00:42:01 CPU Temp: +56.8°C Jul 18 01:24:01 CPU Temp: +54.4°C Jul 18 02:00:01 MB Temp: +62.0°C Jul 18 02:12:01 MB Temp: +59.0°C Jul 18 02:36:01 MB Temp: +59.0°C Jul 18 02:42:01 MB Temp: +59.0°C Jul 18 03:54:01 CPU Temp: +47.9°C Jul 18 04:15:01 MB Temp: +56.0°C Jul 18 04:30:01 MB Temp: +56.0°C Jul 18 04:48:01 MB Temp: +55.0°C Jul 18 04:51:01 MB Temp: +55.0°C Jul 18 04:57:01 MB Temp: +56.0°C Jul 18 05:06:01 MB Temp: +56.0°C Jul 18 05:18:01 MB Temp: +57.0°C Jul 18 05:27:01 MB Temp: +57.0°C Jul 18 06:09:01 CPU Temp: +52.1°C Jul 18 07:51:01 MB Temp: +55.0°C Jul 18 08:12:01 CPU Temp: +60.9°C Jul 18 08:15:01 CPU Temp: +49.9°C Jul 18 08:30:01 MB Temp: +59.0°C $ grep 'Jul 18.*Temp:' /var/log/messages| sort --key=1M --key=2n --key=3 | awk '{print $1,$2,$3,$6,$7,$8}'|head -n 20 Jul 18 00:00:01 CPU Temp: +50.0°C Jul 18 00:00:01 MB Temp: +59.0°C Jul 18 00:03:01 CPU Temp: +49.8°C Jul 18 00:03:01 MB Temp: +60.0°C Jul 18 00:06:01 CPU Temp: +53.0°C Jul 18 00:06:01 MB Temp: +59.0°C Jul 18 00:09:01 CPU Temp: +53.6°C Jul 18 00:09:01 MB Temp: +59.0°C Jul 18 00:12:01 CPU Temp: +50.9°C Jul 18 00:12:01 MB Temp: +59.0°C Jul 18 00:15:01 CPU Temp: +48.9°C Jul 18 00:15:01 MB Temp: +58.0°C Jul 18 00:18:01 CPU Temp: +49.1°C Jul 18 00:18:01 MB Temp: +59.0°C Jul 18 00:21:01 CPU Temp: +47.9°C Jul 18 00:21:01 MB Temp: +59.0°C Jul 18 00:24:02 CPU Temp: +48.0°C Jul 18 00:24:02 MB Temp: +59.0°C Jul 18 00:27:01 CPU Temp: +51.4°C Jul 18 00:27:01 MB Temp: +59.0°C What do you think? Could this be related? Is this a logger issue? I'm inclined to think it's not related, but just in case, that's what it looks like and you can decide. (In reply to Paul DeStefano from comment #4) > $ journalctl | egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' | awk > '{print $1,$2}' | uniq > ... > Jun 26 > Jun 27 > Jun 28 > Jun 29 > Jun 30 > Jan 01 <--- > Jun 30 > Jul 01 > Jan 01 <--- > Jul 01 > Jan 01 <--- > Jul 01 > Jan 01 <--- > Jul 01 > Jul 02 > Jul 03 > Jul 04 > Jul 05 > Jul 06 > Jul 07 > ... Woops!!! I almost missed it. Damn it. How embarrassing. Either my clock bat is dead or these January dates correspond to a BIOS upgrade. I see these particular messages are from chronyd and Tor saying the system clock time is way off. So these January dates can be ignored. I will check my battery. But, the out-of-order July and June dates are still wrong. (In reply to Paul DeStefano from comment #5) > It looks to me like the journal, while properly ordered, is missing messages > submitted by 'logger'. Most likely the messages from logger are not missing, but misplaced. For short lived processes that go away before journald gets to processing them, some metadata is missing and they might not show up in 'journalctl --user' output. It is a known problem. If they are missing from plain'journalctl' output, please file a separate bug. (In reply to Paul DeStefano from comment #6) > (In reply to Paul DeStefano from comment #4) > > $ journalctl | egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' | awk > > '{print $1,$2}' | uniq > > ... > > Jun 26 > > Jun 27 > > Jun 28 > > Jun 29 > > Jun 30 > > Jan 01 <--- > > Jun 30 > > Jul 01 > > Jan 01 <--- > > Jul 01 > > Jan 01 <--- > > Jul 01 > > Jan 01 <--- > > Jul 01 > > Jul 02 > > Jul 03 > > Jul 04 > > Jul 05 > > Jul 06 > > Jul 07 ... > I see these particular messages are > from chronyd and Tor saying the system clock time is way off. So these > January dates can be ignored. I will check my battery. Yeah, that looks OK. > But, the out-of-order July and June dates are still wrong. I had a look at https://github.com/rsyslog/rsyslog/blob/master/plugins/imjournal/imjournal.c. I don't see anything obviously wrong. One iffy thing is that it truncates the state file first, then writes to it. In theory this will cause a loss of state, but the system would have to crash at the right time, so I think this is unlikely to be responsible for your problems. To test things further, can you 1. run 'journalctl --verify'? 2. check whether the out-of-order runs in /var/log/messages look like rsyslog simply started pulling in logs from a point that is too old after a reboot, or is the pattern more complicated? Ah, okay.(In reply to Zbigniew Jędrzejewski-Szmek from comment #7) > Most likely the messages from logger are not missing, but misplaced. For > short lived processes that go away before journald gets to processing them, > some metadata is missing and they might not show up in 'journalctl --user' > output. It is a known problem. If they are missing from plain'journalctl' > output, please file a separate bug. Okay, journalctl does not show the same problem as journalctl --user. Thank you for explaining. I'm not so worried about that at the moment. > (In reply to Paul DeStefano from comment #6) > > (In reply to Paul DeStefano from comment #4) > > > $ journalctl | egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' | awk > > > '{print $1,$2}' | uniq > > > ... > > > Jun 26 > > > Jun 27 > > > Jun 28 > > > Jun 29 > > > Jun 30 > > > Jan 01 <--- > > > Jun 30 > > > Jul 01 > > > Jan 01 <--- > > > Jul 01 > > > Jan 01 <--- > > > Jul 01 > > > Jan 01 <--- > > > Jul 01 > > > Jul 02 > > > Jul 03 > > > Jul 04 > > > Jul 05 > > > Jul 06 > > > Jul 07 > ... > > I see these particular messages are > > from chronyd and Tor saying the system clock time is way off. So these > > January dates can be ignored. I will check my battery. > Yeah, that looks OK. I shutdown and check the voltage on the battery; it was good. And, while I forgot to check the time in the BIOS, I don't see any more Tor or chronyd complaints about the time. Plus, boot messages since then have reasonable times, so, I think it was not a problem with the battery. Could have been a BIOS upgrade, though. > > But, the out-of-order July and June dates are still wrong. > I had a look at > https://github.com/rsyslog/rsyslog/blob/master/plugins/imjournal/imjournal.c. > I don't see anything obviously wrong. One iffy thing is that it truncates > the state file first, then writes to it. In theory this will cause a loss of > state, but the system would have to crash at the right time, so I think this > is unlikely to be responsible for your problems. Okay. I wouldn't rule that out entirely; I've have some strange crashes since upgrading to F20. But, as you say, still unlikely. > To test things further, can you > 1. run 'journalctl --verify'? > 2. check whether the out-of-order runs in /var/log/messages look like > rsyslog simply started pulling in logs from a point that is too old after a > reboot, or is the pattern more complicated? 1. journalctl --verify Ooo, --verify output is so fancy! I like it. $ journalctl --verify PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990 PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013~ PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system~ PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013 PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013 PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-985.journal PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990 Missing entry array░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2% File corruption detected at /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990~:000000 (of 8388608 bytes, 0%). FAIL: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990~ (Bad message) PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013 PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013.journal PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013 PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system~ PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-990.journal PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system~ PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013 PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system~ PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system.journal Missing entry array░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2% File corruption detected at /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-997.journal:000000 (of 2162688 bytes, 0%). FAIL: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-997.journal (Bad message) PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013~ PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013 PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/system PASS: /var/log/journal/f723337b53e54d03907ea57c18c2c125/user-13013 So, that's that. I cannot interpret these errors for myself. Regarding /var/log/messages, yes, I would say it looks like rsyslog just pulled the wrong messages after a reboot. I will attach a large sample from last month's log file. You can see that it starts out with July 1, which makes sense. Then, at 17:42, it suddenly switches to Jun 10. It seems chronological, again, until Jun 13 02:38, when it switches back to Jul 1 17:42 for a few messages and then has boot messages starting at 17:44. This seems sort of like the "simple" scenario you describe. Created attachment 924620 [details]
sample out-of-order /var/log/messages
Hi, is this problem getting any attention? I'm also experiencing it in F21. I also have seen this on a colleague's machine... Notice that the misplaced messages are the kernel ones. Also, there are a lot of ^@ (NUL) characters on line 1575 of the attached log, that might also indicate a problem. This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. *** Bug 1231957 has been marked as a duplicate of this bug. *** Bumping per comment #10. I was preparing a "new" BZ and found this one, almost one year old, that mimics my observations. Except that the problem persists in Fedora 22, too. I observe that /var/log/messages is mangled at a reboot by appending old stuff from the journal, instead of properly merely adding new stuff subsequent to the reboot. So,instead of a new BZ, here's my ME TOO! The /var/log/messages log file is no longer monotonic. WTF!! We are promised (/var/log/README) that "syslog-ng or rsyslog may be installed side-by-side with the journal and will continue to function the way they always did." This is simply untrue. Instead of being a continuous log of events, the messages file now contains journal data that is regurgitated and repeated at each reboot. This makes it very hard to understand. I have one machine that spontaneously freezes about once a day. I suspect a hardware problem. The only clue so far is the gkrellm display on a remote machine that marks the freeze with a red blinker and the time. Trying to find an event - anything - that occurred at that moment, I discovered that /var/log/messges is non-monotonic. In rc.local I play a sound file which serves as a convenient marker for each reboot. Grepping the journal for this marker, I see 14 reboots since Jul 16, tidily in proper time sequence, with 4 occurring since the last logrotate on Jul 26. In /var/log/messages these four reboots are reported as 10 reboots, with 1 listed 4 times, 1 listed 3 times, 1 listed 2 times, 1 once. With the extraordinary verbosity now extant, this is makes the log files mostly useless. Please fix journal, systemd, rsyslog so that events are reported once and only once in /var/log/messages, as they once were. Incidentally, despite the extremely verbose reporting of every conceivable event, I have found nothing reported at the time of the system freeze. I'm still hunting... Hey David, Thanks for updating this. I'm glad I'm not the only one disturbed by it. I hadn't bothered to check into this, lately, but I checked after your report and found the same thing. If you care about logs, I'm betting you may also "enjoy" these bugs: bug 1227379 and bug 1160046. So, the situation is really messy, at present. Me too, seen this since upgrading from fc19 to fc21, and now to fc22. (using yum distro-sync) My rsyslog.conf is the stock one from each version as there have been major changes related to journald imports. This is truly bizarre. It appears that the log files are overwritten without truncation as new events appear in the middle until there are enough events that cause them to be appended. tail -f initially is useless. I've done journalctl --verify, removed any corrupt (there were a few, some old, some current) and journalctl --update-catalog and rebooted; still happening. It _looks_ like rsyslogd isn't issuing an open with O_APPEND; instead its issuing an O_WRONLY and writing from the top or something. In addition I've seen after reboot that nothing new appears in some logs for a long time... presumably rsyslogd is scanning journal logs for new stuff for a long time. The journal area is multiple gigabytes in size on many systems. General agreement here. I wanted to check something in /var/log/cron and found this: $ egrep 'Oct|Nov' /var/log/cron|awk '{print $1,$2}'|uniq Nov 8 Nov 9 Nov 10 Nov 11 Nov 12 Nov 13 Nov 14 Oct 29 Oct 30 Oct 31 Nov 1 Nov 2 Nov 14 Significantly, my laptop has just rebooted after I accidentally left the power unplugged, and the Nov 14 messages at the end are the first ones after the reboot. But a bunch of these messages are also in the log that was rotated on November 8: $ egrep 'Oct|Nov' /var/log/cron-20151108|awk '{print $1,$2}'|uniq Nov 1 Nov 2 Nov 3 Nov 4 Nov 5 Nov 6 Oct 29 Oct 30 Oct 31 Nov 1 Nov 2 Nov 7 Nov 8 And again November 7 was the last reboot. So it does look like the flat-file logs are pulling in a bunch of old messages in the wrong order at boot time, and they shouldn't have touched those old messages because they were already logged. The versions here are: rsyslog-8.8.0-3.fc22.x86_64 systemd-219-24.fc22.x86_64 I just upgraded to F23 and decided to check on this bug. Looking at my older logs, however, I see that nothing has been out of order since 6 December; I think that's when I did a previous upgrade/update on F22. Did this get silently fixed? Could others check and see if they still have this problem? On my fresh install of F23 I do not seem to be having this problem any longer (except for the out of order rsyslogd start messages which I can live with I guess): Dec 18 17:26:10 kite systemd: Stopping User Manager for UID 989... Dec 18 17:26:11 kite rsyslogd: [origin software="rsyslogd" swVersion="8.10.0" x-pid="743" x-info="http://www.rsyslog.com"] exiting on signal 15. Dec 21 08:41:59 kite rsyslogd: [origin software="rsyslogd" swVersion="8.10.0" x-pid="831" x-info="http://www.rsyslog.com"] start Dec 18 17:26:10 kite systemd: Unmounting RPC Pipe File System... Dec 18 17:26:10 kite avahi-daemon[745]: Got SIGTERM, quitting. (In reply to long from comment #20) > On my fresh install of F23 I do not seem to be having this problem any > longer (except for the out of order rsyslogd start messages which I can live > with I guess): One thought here... This is the chronological order: > Dec 18 17:26:10 kite systemd: Stopping User Manager for UID 989... 1 - messages are being generated... > Dec 18 17:26:11 kite rsyslogd: [origin software="rsyslogd" > swVersion="8.10.0" x-pid="743" x-info="http://www.rsyslog.com"] exiting on > signal 15. 2 - rsyslog shuts down 3 - more messages are being generated and stored in the journal... > Dec 21 08:41:59 kite rsyslogd: [origin software="rsyslogd" > swVersion="8.10.0" x-pid="831" x-info="http://www.rsyslog.com"] start 4 - rsyslog starts and generates the starting message with current time > Dec 18 17:26:10 kite systemd: Unmounting RPC Pipe File System... > Dec 18 17:26:10 kite avahi-daemon[745]: Got SIGTERM, quitting. 5 - rsyslog reads messages that were generated while it was down (point 3) from journald and uses the original timestamps The imuxsock module (originally reading messages from /dev/log) by default ignores timestamps from messages and uses the time of reception. Thus the messages are almost always in chronological order. imjournal, on the other hand, uses the timestamps from the journal. imuxsock has an option to change this behavior but imjournal doesn't. I think the current behavior is the better one of the two but i guess an option to change this behavior in imjournal wouldn't hurt. Wow, it's been three full releases since this was reported. Heinrich, thanks for the explanation. That sounds like the root cause, but also sounds like it might be like that for some reason we don't know. The answer to my question is no, it's not fixed: $ egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' /var/log/messages | awk '{print $1,$2}' | uniq Jun 1 Jun 2 Jun 3 Jun 4 Jun 5 Jun 6 Jun 7 Jun 8 Jun 9 Jun 10 Jun 11 Jun 12 Jun 13 Jun 14 Jun 15 Jun 16 Jun 17 Jun 18 Jun 19 Jun 20 Jun 21 Apr 9 Apr 10 Jun 21 Jun 22 Jun 23 The regular journalctl output seems file, still. NEEDINFO: Is there anything else I can do here? Is there still interest in fixeding this bug? This bug was moved against F24. Let's make sure everyone is on the same page. Paul can you please clarify what versions of systemd and rsyslog were you using last time you saw this issue? Also can you make sure that out of order messages are not start messages from rsyslog (I assume those being out of order is actually OK). $ egrep --no-filename --text '^(Jan|Feb|Mar|Apr|M ay|Jun|Jul|Aug|Sep|Nov|Dec)' $(ls /var/log/messages-*) | awk '{print $1,$2,$3}'|cut -c1-6 |uniq Jul 3 Jul 4 Jul 5 Jul 6 Jul 7 Jul 8 Jul 9 Jul 10 Jul 11 Jul 12 Jul 13 Jul 14 Jul 4 Jul 14 Jul 4 Jul 14 Jul 4 Jul 14 Jul 4 Jul 14 Jul 4 Jul 14 Jul 4 ... So, yes, it's still happening. Regarding the types of messages, while it includes boot messages, it starts with the shutdown messages immediately preceding some other *previous* reboot. So, it's not even the boot messages for the current boot, it's just a random chuck of old messages repeated in the new log. Those messages themselves are in order, but just repeated, which is just plain wrong. This all sounds very familiar...yeah, I already did all this troubleshooting. see comment 7 and comment 8. I thought someone had figured out why this happens, but I don't see that exactly in this bug. Jul 4 23:15:34 <hostname> systemd: Reached target Unmount All Filesystems. Jul 4 23:15:34 <hostname> systemd: Removed slice system-systemd\x2dcryptsetup.slice. Jul 4 23:15:34 <hostname> systemd: Reached target Shutdown. Jul 4 23:15:34 <hostname> systemd: Reached target Final Step. Jul 4 23:15:34 <hostname> systemd: Starting Reboot... Jul 4 23:15:34 <hostname> systemd: Shutting down. Jul 4 23:15:35 <hostname> systemd-shutdown[1]: Sending SIGTERM to remaining processes... Jul 4 23:15:35 <hostname> systemd-journald: Journal stopped Jul 14 09:00:04 <hostname> kernel: [ 69.592428] IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready Jul 4 23:18:20 <hostname> systemd-journald: Runtime journal (/run/log/journal/) is 8.0M, max 1.5G, 1.5G free. Jul 4 23:18:20 <hostname> kernel: Linux version 4.6.3-300.fc24.x86_64 (mockbuild.fedoraproject.org) (gcc version 6.1.1 20160510 (Red Hat 6.1.1-2) (GCC) ) #1 SMP Fri Jun 24 20:52:41 UTC 2016 Do you not see this in your logs? My test command is getting pretty refined; it should work without any effort for...everyone, I think. rsyslog-8.12.0-3.fc24.x86_64 systemd-229-9.fc24.x86_64 This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. egrep --no-filename --text '^(Jan|Feb|Mar|Apr|M ay|Jun|Jul|Aug|Sep|Nov|Dec)' $(ls /var/log/messages-*) | awk '{print $1,$2,$3}'|cut -c1-6 |uniq Jun 25 Jun 26 Jun 27 Jun 28 Jun 29 Jun 30 Jul 1 Jul 2 Jul 3 Jul 4 Jul 5 Jul 6 Jul 7 Jul 8 Jul 9 Jul 10 Jul 11 Jul 12 Jul 13 Jul 14 Jul 15 Jul 16 Jul 17 Jul 18 Jul 19 Jul 20 Jul 21 Jul 22 Jul 23 I'm no longer able to keep a whole year of logs, so I'm not 100 % sure this has been fixed. But, the 30 day data shows it hasn't recurred, lately. It happens most often after crashes, which I haven't had, lately. This issue was confirmed by several and I don't know why it still has a status of NEW. Would some others else please give their results, so it's not just me? On the other had, I am 100 % sure someone knows whether this bug was fixed or not. And it's not me. rsyslog-8.27.0-1.fc25.x86_64 systemd-231-17.fc25.x86_64 I've just checked a few fc24, fc25 and fc26 systems and can't see an issue anymore; all have been recently rebooted a few times and all are running latest code. Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed. Fedora 28, rsyslog-8.39.0-1.fc28.x86_64, systemd-238-10.git438ac26.fc28.x86_64 journalctl --verify, everything passes, and journalctl mirrors /var/log/messages. I do see a whole bunch of Dec 16 19:26:37 storm rsyslogd: imjournal: journal reloaded... [v8.39.0 try http://www.rsyslog.com/e/0 ] and then: Dec 16 19:26:37 storm rsyslogd: imjournal: loaded invalid cursor, seeking to the head of journal : Cannot assign requested address [v8.39.0 try http://www.rsyslog.com/e/2027 ] Dec 16 19:26:37 storm rsyslogd: imjournal: journal reloaded... [v8.39.0 try http://www.rsyslog.com/e/0 ] Oct 27 07:02:24 storm systemd[2462]: Stopped target Default. Oct 27 07:02:25 storm systemd[2462]: Stopped target Basic System. Then logs like these: Dec 14 17:10:21 storm systemd[8381]: Starting Exit the Session... Dec 14 17:10:21 storm systemd[8381]: Received SIGRTMIN+24 from PID 10899 (kill). Jan 8 03:08:09 storm systemd[27471]: Stopped target Default. Jan 8 03:08:10 storm systemd[27471]: Stopped target Basic System. Jan 8 03:08:10 storm systemd[27471]: Stopped target Paths. Jan 8 03:08:10 storm systemd[27471]: Stopped target Sockets. Jan 8 03:08:10 storm systemd[27471]: Closed D-Bus User Message Bus Socket. Jan 8 03:08:10 storm systemd[27471]: Reached target Shutdown. Jan 8 03:08:10 storm systemd[27471]: Starting Exit the Session... Jan 8 03:08:10 storm systemd[27471]: Stopped target Timers. Jan 8 03:08:10 storm systemd[27471]: Received SIGRTMIN+24 from PID 30660 (kill). May 15 07:03:38 storm systemd[7003]: Stopped target Default. May 15 07:04:02 storm systemd[24508]: Received SIGRTMIN+24 from PID 28198 (kill). Nov 8 13:26:54 storm systemd[30136]: Closed Multimedia System. Nov 8 13:26:54 storm systemd[30136]: Reached target Shutdown. Nov 8 13:26:54 storm systemd[30136]: Starting Exit the Session... Nov 8 13:26:54 storm systemd[30136]: Received SIGRTMIN+24 from PID 6795 (kill). Dec 16 19:26:11 storm systemd[7685]: Listening on Sound System. Dec 16 19:26:38 storm systemd[7685]: Listening on Multimedia System. So is this SIGRTMIN+24 a clue? # egrep '^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Nov|Dec)' /var/log/messages | awk '{print $1,$2}' | uniq Dec 16 Dec 27 May 17 Nov 2 Dec 7 Dec 8 Dec 9 Dec 10 Dec 11 Dec 12 Dec 13 Dec 14 Jan 8 May 15 May 16 May 17 May 18 May 19 May 20 May 21 May 23 May 25 May 26 May 29 May 30 May 31 Jun 3 Jun 4 Jun 5 Jun 6 Jul 10 Nov 4 Nov 5 Nov 6 Nov 7 Nov 8 Dec 16 Dec 17 Dec 18 Dec 27 May 17 Nov 2 Dec 7 Dec 8 Dec 9 Dec 10 Dec 11 Dec 12 Dec 13 Dec 14 Jan 8 May 15 May 16 May 17 May 18 May 19 May 20 May 21 May 23 May 25 May 26 May 29 May 30 May 31 Jun 3 Jun 4 Jun 5 Jun 6 Jul 10 Nov 4 Nov 5 Nov 6 Nov 7 Nov 8 Dec 18 > So is this SIGRTMIN+24 a clue?
This message comes from the user instance of systemd and means that it got a request
to terminate. So that message generally means that the machine is shutting down, nothing
nefarious.
It's hard to say what is going on here. It seems that rsyslog for some reason gets an
invalid cursor, but it could be that the cursor is just too old, or that there's something
wrong with it, for example it wasn't stored properly. Without additional debugging it's
hard to tell. My suggestion would be to:
- look at journalctl output
- If it looks incorrect (out of order messages or very old messages followed by much
later messages with a huge time jump inbetween), file a new bug against systemd.
Attach the output from 'journalctl --verify'.
- If it looks correct, than the problem is mostly likely on rsyslog side. File a new
bug against rsyslog.
I created https://bugzilla.redhat.com/show_bug.cgi?id=1660933 but could https://bugzilla.redhat.com/show_bug.cgi?id=1615014 be related as that has the same error logs such as "sd_journal_get_cursor() failed: Cannot assign requested address"? Note journalctl appears normal and all passed on running verify. |