Bug 1294002 - failed to look up boot -1 no such boot ID in journal
failed to look up boot -1 no such boot ID in journal
Product: Fedora
Classification: Fedora
Component: systemd (Show other bugs)
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Zbigniew Jędrzejewski-Szmek
Fedora Extras Quality Assurance
: 1295612 (view as bug list)
Depends On:
  Show dependency treegraph
Reported: 2015-12-23 20:21 EST by Chris Murphy
Modified: 2016-12-04 23 EST (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
journals.tar.gz (2.76 MB, application/x-gzip)
2015-12-23 20:21 EST, Chris Murphy
no flags Details
journals systemd-231-10 (2.16 MB, application/x-gzip)
2016-12-04 23:45 EST, Chris Murphy
no flags Details

  None (edit)
Description Chris Murphy 2015-12-23 20:21:36 EST
Created attachment 1109064 [details]

Description of problem:

Using -b switch I can't see previous boots, I just have to omit the -b switch and then scroll through a long pile to find what I'm looking for.

[root@f23a ~]# journalctl --list-boots
-5 eb5bc3dfefb849a5935ace8474373181 Sun 2015-12-20 03:44:32 MST—Sun 2015-12-20 03:47:58 MST
-4 22feafe74d6a4b9d940459cecc96dbf4 Sun 2015-12-20 21:25:41 MST—Mon 2015-12-21 00:13:10 MST
-3 99f2036a416446d8a2f0f49e2ca6d05a Mon 2015-12-21 01:03:24 MST—Mon 2015-12-21 01:04:13 MST
-2 78ddbd9a3e5244e29a230dc0b15ad2a5 Tue 2015-12-22 12:07:02 MST—Tue 2015-12-22 22:23:07 MST
-1 6b67a7de83a54b9d8be283dcccce9eef Wed 2015-12-23 09:22:54 MST—Wed 2015-12-23 13:29:58 MST
 0 e9db32cba47c45029752b628b6379097 Wed 2015-12-23 17:46:53 MST—Wed 2015-12-23 17:56:52 MST

So journalctl -b-1 should show the last boot but it fails.

Version-Release number of selected component (if applicable):

How reproducible:
With the journal files I currently have, always.
If I erase them and start clean, and never have any crashes, then -b-1, -b-2 etc will work OK. But as soon as there's an unclean shutdown, there is some kind of journal problem always. It might be this problem as reported, or it might be journal entries out of order.

Steps to Reproduce:
1. journalctl -b-1

Actual results:

Failed to look up boot -1: No such boot ID in journal

Expected results:

Since other options appear to find the journal entries (they aren't completely corrupted, I can find them other ways), this one ought to work also so there must be some cross indexing or hint that broke in these log files.

Additional info:

journalctl by itself show all the boots in one long list that I can appear to scroll through. Also this seems to work:

[root@f23a ~]# journalctl --since 2015-12-23

You can see at the end that something (I think it was lvmthinp) soaked all the memory and resulted in a bunch of oom killers that never worked enough to get back control so the powerkey was used (that's the "crash" in this case).

The journal logs are on btrfs on a hard drive partition, default mount options, nothing fancy. Btrfs has no complaints on mount, scrub comes up clean also.
Comment 1 Chris Murphy 2015-12-23 21:07:01 EST
HUH. This is vaguely interesting, and also not expected behavior.

If I use 'journalctl --since 2015-12-23' it ends with:

Dec 23 17:42:01 f23a.localdomain systemd-logind[744]: Power key pressed.
lines 11375-11429/11429 (END)

But this is not the end of the journal. There is another boot after this which I can access with 'journalctl -b' and starts with

Dec 23 17:46:53 localhost systemd-journal[187]: Runtime journal is using 8.0M (max allowed 193.6M, trying to leave 290.4M free of 1.8G available → current limit 193.6M).

I would expect --since 2015-12-23 would eventually roll into this boot, but instead it ends at a previous boot.

And --list-boots lists the current boot, which --since isn't rolling into.
Comment 2 Zbigniew Jędrzejewski-Szmek 2015-12-25 01:05:53 EST
Yeah, the root issue seems to be this:

Can't iterate through d98f033569044c0887c819c792012e6c/system@0005279a2d32b4c9-a2b3d2cebfe3d4df.journal~, ignoring: Bad message
File d98f033569044c0887c819c792012e6c/system@0005279a2d32b4c9-a2b3d2cebfe3d4df.journal~ removed.
Failed to look up boot -1: No such boot ID in journal

So we find the journal file we want to show, but than we refuse to show it because it is corrupted. Instead, we should show as much as possible. This would help in various other cases, not only here.

First thing is to fix journalctl to actually allow looking at files:
Comment 3 Chris Murphy 2016-08-10 10:51:37 EDT
This is still a problem in systemd-229-9.fc24.x86_64 where -b-1, -b-2, -b-3, -b-4 are showing maybe a dozen lines rather than showing the actual start to finish boot that includes all the bios and memory initialization stuff from the kernel, initramfs unpacking, systemd and services starting.

For example -b-4
[root@f24s ~]# journalctl -b-4
-- Logs begin at Sat 2016-01-30 22:04:43 MST, end at Wed 2016-08-10 08:40:16 MDT. --
Aug 01 22:30:46 f24s systemd[1271]: Reached target Timers.
Aug 01 22:30:46 f24s systemd[1271]: Starting D-Bus User Message Bus Socket.
Aug 01 22:30:46 f24s systemd[1271]: Reached target Paths.
Aug 01 22:30:47 f24s systemd[1271]: Listening on D-Bus User Message Bus Socket.
Aug 01 22:30:47 f24s systemd[1271]: Reached target Sockets.
Aug 01 22:30:47 f24s systemd[1271]: Reached target Basic System.
Aug 01 22:30:47 f24s systemd[1271]: Reached target Default.
Aug 01 22:30:47 f24s systemd[1271]: Startup finished in 146ms.
Aug 01 22:31:20 f24s systemd[1271]: Stopped target Default.
Aug 01 22:31:20 f24s systemd[1271]: Stopped target Basic System.
Aug 01 22:31:20 f24s systemd[1271]: Stopped target Paths.
Aug 01 22:31:20 f24s systemd[1271]: Stopped target Sockets.
Aug 01 22:31:20 f24s systemd[1271]: Closed D-Bus User Message Bus Socket.
Aug 01 22:31:20 f24s systemd[1271]: Reached target Shutdown.
Aug 01 22:31:20 f24s systemd[1271]: Starting Exit the Session...
Aug 01 22:31:20 f24s systemd[1271]: Stopped target Timers.
Aug 01 22:31:21 f24s systemd[1271]: Received SIGRTMIN+24 from PID 2234 (kill).

That's it, and it's the wrong date.

journalctl --list-boots

 -4 c9d3d01eb9a9483aa9f295bc22b96415 Wed 2016-07-13 11:28:18 MDT—Thu 2016-07-14 21:23:55 MDT

Boot -4 is  Jul 13, two weeks before the information -b-4 switch is showing. So it's still totally busted near as I can tel.

If I use # journalctl --since="2016-07-13 11:28:18" copy/pasting directly from the list-boots output, I get exactly what I want. So the info I want is clearly in the journal files, it's clearly accessible by --since, but not by -b.
Comment 4 Zbigniew Jędrzejewski-Szmek 2016-08-10 11:00:20 EDT
It's even worse with current systemd git :(

$ ./journalctl -D /var/tmp/d98f033569044c0887c819c792012e6c --list-boots
-9 eb5bc3dfefb849a5935ace8474373181 Sun 2015-12-20 05:44:32 EST—Sun 2015-12-20 05:47:58 EST
-8 22feafe74d6a4b9d940459cecc96dbf4 Sun 2015-12-20 23:25:41 EST—Mon 2015-12-21 02:13:10 EST
-7 99f2036a416446d8a2f0f49e2ca6d05a Mon 2015-12-21 03:03:24 EST—Mon 2015-12-21 03:04:13 EST
-6 78ddbd9a3e5244e29a230dc0b15ad2a5 Tue 2015-12-22 14:07:02 EST—Wed 2015-12-23 00:23:07 EST
-5 6b67a7de83a54b9d8be283dcccce9eef Wed 2015-12-23 11:22:54 EST—Wed 2015-12-23 15:29:58 EST
-4 ddd6ad5f748d40728d5326a471c17ed0 Wed 2015-12-23 15:30:21 EST—Wed 2015-12-23 16:27:43 EST
-3 16ed3611de4743369a53373a062eb7dd Wed 2015-12-23 16:28:06 EST—Wed 2015-12-23 16:29:58 EST
-2 96f9e3146557409c9f585f0e11c3f5ee Wed 2015-12-23 16:35:33 EST—Wed 2015-12-23 17:22:55 EST
-1 f4d7c894b77545f0ba844bcecb2c187b Wed 2015-12-23 17:23:19 EST—Wed 2015-12-23 17:25:19 EST
 0 e9db32cba47c45029752b628b6379097 Wed 2015-12-23 19:46:53 EST—Wed 2015-12-23 20:02:00 EST
$ ./journalctl -D /var/tmp/d98f033569044c0887c819c792012e6c -b -1
Data from the specified boot (-1) is not available: No such boot ID in journal
$ ./journalctl -D /var/tmp/d98f033569044c0887c819c792012e6c -b -2
Data from the specified boot (-2) is not available: No such boot ID in journal
Comment 5 Chris Murphy 2016-08-10 11:42:35 EDT
Well in some ways it's a better failure to return nothing with a bogus error, rather than returning incomplete data from a totally different boot.
Comment 6 Chris Murphy 2016-08-10 11:44:31 EDT
*** Bug 1295612 has been marked as a duplicate of this bug. ***
Comment 7 Fedora End Of Life 2016-11-24 09:29:49 EST
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 8 Chris Murphy 2016-12-04 23:40:11 EST
Still a problem with systemd-231-10.fc25.x86_64.
Comment 9 Chris Murphy 2016-12-04 23:45 EST
Created attachment 1227967 [details]
journals systemd-231-10

[chris@f25s ~]$ sudo journalctl --list-boots
-2 d6b4e63d8a19482eb9e4f7383e191bb3 Fri 2016-12-02 22:53:11 MST—Sat 2016-12-03 00:01:04 MST
-1 44de0ff7a2674f9d9dc7533d49d0c0f5 Sat 2016-12-03 00:20:09 MST—Sun 2016-12-04 21:33:26 MST
 0 bbc2079f1cc448469b9a8a4839af1c59 Sun 2016-12-04 21:34:47 MST—Sun 2016-12-04 21:38:03 MST

# journalctl --verify
e18fd8: Unused data (entry_offset==0)                                                                                                                              
e18fd8: Invalid hash (920d2338a477f20 vs. edb8b54ac6b13cdc                                                                                                         
e18fd8: Invalid object contents: Bad message                                                                                                                       
File corruption detected at /var/log/journal/507442eec2af4c03ae5602ccbb5417b3/system@000542e1cf2a81ac-516939278f38daaa.journal~:e18fd8 (of 16777216 bytes, 88%).   
FAIL: /var/log/journal/507442eec2af4c03ae5602ccbb5417b3/system@000542e1cf2a81ac-516939278f38daaa.journal~ (Bad message)
PASS: /var/log/journal/507442eec2af4c03ae5602ccbb5417b3/system.journal                                                                                             
PASS: /var/log/journal/507442eec2af4c03ae5602ccbb5417b3/user-1000@000542e1d0c3eff9-0384d8857ae523fd.journal~                                                       
PASS: /var/log/journal/507442eec2af4c03ae5602ccbb5417b3/user-1000.journal                                                                                          

There are only four journal files. I know that boots -2 and -1 were working before I had to do a forced reboot with echo b > /proc/sysrq-trigger, but now they aren't working when I do journalctl -b-2 and -b-1 it shows a handful of lines instead of what's full in the log using either --since or just omitting -b, so basically -b X is still broken.

Attaching a tar.bz2 of all four journal files on this cleanly installed F25 system.

Note You need to log in before you can comment on or make changes to this bug.