Bug 1401444 - Unconditional kernel crash in early boot
Summary: Unconditional kernel crash in early boot
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 25
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1419432
Depends On:
Blocks:
 
Reported: 2016-12-05 09:45 UTC by Stephan Mueller
Modified: 2019-01-09 12:54 UTC
CC List: 25 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-12 10:54:48 UTC
Type: Bug
Embargoed:


Attachments
Grub config (22.63 KB, text/plain)
2016-12-06 07:35 UTC, Stephan Mueller

Description Stephan Mueller 2016-12-05 09:45:00 UTC
Description of problem:

After updating from F24 to F25, the system no longer boots because of a kernel crash early in the boot process (before the console switches from text mode to VGA/Plymouth).

The crash is unconditional and happens with both released F25 kernels, 4.8.10-300.fc25.x86_64 and 4.8.8-300.fc25.x86_64. Booting with the last F24 kernel still installed (4.8.8-200.fc24.x86_64) works fine.

Unfortunately I cannot debug the kernel panic, as only part of the panic message is displayed (the top has scrolled off the screen).

The final message is about init being killed. So I suspect it could even be an initramfs issue in which the init process inside the initramfs gets killed (the stack trace shows the panic being invoked after a do_exit call). But I have no idea how to debug this, given that the log messages scroll by so fast.


Version-Release number of selected component (if applicable):

All F25 kernels.


How reproducible:

Always; a simple reboot triggers it.

Note: I do not see this on other systems that have been upgraded from F24 to F25.


Additional info:

Comment 1 Stephan Mueller 2016-12-05 11:33:49 UTC
When booting the F25 kernels with the initramfs from the F24 kernel, the crash disappears, but the system does not come up properly.

So the crash is tied to the initramfs. However, none of the dracut boot parameters (rd.shell, rd.break) prevent the crash, i.e. I cannot get into the initramfs to debug it.

Comment 2 Laura Abbott 2016-12-05 18:47:00 UTC
Can you attach your grub.cfg?

Comment 3 Stephan Mueller 2016-12-06 07:35:01 UTC
Created attachment 1228306 [details]
Grub config

Comment 4 Laura Abbott 2016-12-06 22:23:23 UTC
Almost always init being killed is a problem with the initramfs. Moving this to dracut.

Comment 5 Zbigniew Jędrzejewski-Szmek 2016-12-07 01:39:35 UTC
Hm, not too much information to go on.

> When booting the F25 kernels with the initramfs from the F24 kernel, crash vanishes, but the system does not come up nicely as expected.

This is not a useful test. No modules will be loaded from the F24 initramfs when booting with F25 kernel, so it's expected that the machine does not boot.

Could you please try the following:
- boot with the F24 kernel and initrd
- capture the boot journal (journalctl -b -o short-monotonic --no-hostname)
- create an initramfs for the F24 kernel, under a NEW NAME, so you can still boot with the old one if necessary (sudo mkinitrd /boot/initramfs-`uname -r`.img2 `uname -r`)
- reboot and in grub select the F24 kernel with the regenerated image.

This will show whether the problem lies in the F25 dracut/userspace or in the F25 kernel. Further steps will depend on whether this works.
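The procedure above can be sketched as a small shell function. `prepare_fips_test_initramfs` is a hypothetical helper, not part of dracut; it assumes it is run as root while booted into the working F24 kernel, and takes the boot directory and journal path as parameters so the steps can be tried outside /boot.

```shell
#!/bin/sh
# Hypothetical helper sketching the steps from comment 5.
# Run as root while booted into the working (F24) kernel.
prepare_fips_test_initramfs() {
    bootdir=$1   # normally /boot
    kver=$2      # normally "$(uname -r)"
    journal=$3   # where to save the boot journal

    # Capture the journal of the current, working boot.
    journalctl -b -o short-monotonic --no-hostname > "$journal" || return 1

    # Regenerate the initramfs with the installed (F25) userspace under a
    # NEW name, so the original image stays available as a fallback.
    mkinitrd "$bootdir/initramfs-$kver.img2" "$kver" || return 1

    echo "Reboot, press 'e' on the F24 entry in GRUB and point its initrd"
    echo "line at initramfs-$kver.img2"
}

# Usage (as root): prepare_fips_test_initramfs /boot "$(uname -r)" boot-journal.txt
```

Using a new file name rather than overwriting the existing image is the important part: if the regenerated image fails, the original entry still boots.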

Comment 6 Stephan Mueller 2016-12-07 14:54:07 UTC
Booting with the F24 kernel and the original initramfs works as described.

After re-creating the initramfs for the F24 kernel with the F25 userspace and booting with the new initramfs, the system boots normally.

For better debugging: do you know of any way to scroll back through the kernel oops, or otherwise access the full oops message?

Comment 7 Zbigniew Jędrzejewski-Szmek 2016-12-07 17:33:19 UTC
(In reply to Stephan Mueller from comment #6)
> Booting with the F24 kernel and the original initramfs works as described.
> 
> After re-creating the initramfs for the F24 kernel with the F25 user space
> and booting with the new initramfs, the system comes up nicely as expected.
This strongly suggests that the issue is either with the kernel, or in some incompatibility between the new kernel and the new dracut/systemd combo.

> For better debugging: are you aware of any way how to either scroll up the
> kernel oops or otherwise access the full kernel oops message?

Shift+PageUp/Shift+PageDown, but scrolling stops working once the kernel oopses. Something like netconsole might help.

Comment 8 Stephan Mueller 2016-12-08 07:50:12 UTC
I found the cause: the initramfs for the F25 kernels contained the file /etc/system-fips, which triggers the crash.
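To check whether a given image carries the flag without booting it, the file listing printed by dracut's `lsinitrd` can be searched. The function name `initramfs_has_fips_flag` is hypothetical, and the match assumes the usual `lsinitrd` listing in which the path is the last field of each line.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the initramfs image $1 contains
# etc/system-fips. Relies on dracut's lsinitrd to list the image contents.
initramfs_has_fips_flag() {
    lsinitrd "$1" 2>/dev/null | grep -q 'etc/system-fips$'
}

# Usage: compare the images of all installed kernels.
# for img in /boot/initramfs-*.img; do
#     if initramfs_has_fips_flag "$img"; then echo "$img: FIPS flag present"; fi
# done
```

After removing dracut-fips, regenerating the image (for example with `dracut -f`) and re-running the check should show the flag gone, matching what is reported below.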

I think this points to several problems at once:

1. I had dracut-fips installed, but the system was booted *without* fips=1. Interestingly, under F24 the /etc/system-fips file was never added to the initramfs, but after switching to F25 it started to be included. After removing dracut-fips and regenerating the initramfs for the F25 kernel, the file was gone from the image.

2. FIPS mode seems to be broken in F25. I assume that loading the kernel crypto modules during boot is what causes the crash, but I cannot say for sure because I am missing parts of the kernel panic (netconsole did not work). If that is the case, there is a problem in the kernel crypto API. There is another possible reason for the issue: a failing sha512hmac invocation (note that I have /boot on a separate partition and did not set boot= on the kernel command line). If that is the case, the dracut-fips code has a bug, because it attempts the integrity check of the vmlinuz file even when fips=1 is *not* set on the kernel command line.
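The behaviour expected in point 2 (run the kernel integrity check only with fips=1, and know the boot= device when /boot is separate) can be sketched as a decision on the command line string. dracut modules normally read arguments with the `getarg` helper from dracut-lib.sh; this standalone sketch uses awk instead so it runs outside the initramfs, and `cmdline_value` and `fips_check_needed` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical: print the value of KEY=VALUE ($1 = KEY) from the kernel
# command line string $2; the last occurrence wins. Fails if KEY is absent.
cmdline_value() {
    printf '%s\n' "$2" | awk -v k="$1" '
        { for (i = 1; i <= NF; i++)
              if (index($i, k "=") == 1) v = substr($i, length(k) + 2) }
        END { if (v == "") exit 1; print v }'
}

# Hypothetical: the integrity check should only run when fips=1 is given.
fips_check_needed() {
    [ "$(cmdline_value fips "$1")" = 1 ]
}

# Usage:
# cmdline=$(cat /proc/cmdline)
# if fips_check_needed "$cmdline"; then
#     cmdline_value boot "$cmdline" >/dev/null ||
#         echo "fips=1 without boot=: a separate /boot cannot be verified" >&2
# fi
```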

Comment 9 Harald Hoyer 2017-01-12 14:10:05 UTC
Booting in QEMU with "systemd.log_level=debug systemd.log_target=console console=ttyS0 loglevel=7":

[    0.945320] Freeing unused kernel memory: 1900K (ffff8bbddb825000 - ffff8bbddba00000)
[    0.947343] Freeing unused kernel memory: 704K (ffff8bbddbd50000 - ffff8bbddbe00000)
[    0.953108] x86/mm: Checked W+X mappings: passed, no W+X pages found.
Fatal: no entrop[    0.959585] traps: init[1] general protection ip:7fb641fa3642 sp:7fff9098d400 error:0y gathering module detected
[    0.960370]  in libc-2.24.so[7fb641f6c000+1bd000]
[    0.961106] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.961106] 
[    0.961807] CPU: 2 PID: 1 Comm: init Not tainted 4.9.2-200.fc25.x86_64 #1
[    0.962329] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1.fc26 04/01/2014

it seems like systemd is crashing in libc with

"Fatal: no entropy"

no part of dracut has even run yet

The only part added to the initramfs is "/etc/system-fips"

Comment 10 Florian Weimer 2017-01-12 14:23:58 UTC
The full string is “Fatal: no entropy gathering module detected” and comes from libgcrypt.  I'm not sure why this would be a glibc bug.

Harald, shouldn't this be assigned to libgcrypt?  It looks like FIPS mode is broken.

Comment 11 Harald Hoyer 2017-01-12 14:24:49 UTC
ah, sorry, that is:

# fgrep -r 'system-fips' $(ldd /usr/lib/systemd/systemd | while read _ _ a _; do echo $a;done)
Binary file /lib64/libgcrypt.so.20 matches
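The `while read _ _ a _` loop keeps the third field of each `ldd` line, i.e. the resolved path after `=>`; lines without `=>` (the vDSO and the dynamic loader itself) give an empty field and drop out. A slightly more explicit sketch, with the hypothetical function name `ldd_paths`, that also keeps the loader path:

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin, print one path per line.
#   libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x...)  -> third field
#   /lib64/ld-linux-x86-64.so.2 (0x...)                -> first field
#   linux-vdso.so.1 (0x...)                            -> skipped, no file
ldd_paths() {
    while read -r first second third _; do
        if [ "$second" = "=>" ]; then
            case $third in
                /*) printf '%s\n' "$third" ;;   # skips "=> not found"
            esac
        else
            case $first in
                /*) printf '%s\n' "$first" ;;
            esac
        fi
    done
}

# Usage, equivalent to the command above (grep -F is the modern fgrep):
# grep -F -l 'system-fips' $(ldd /usr/lib/systemd/systemd | ldd_paths)
```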

Comment 12 Harald Hoyer 2017-01-12 14:28:31 UTC
So, a solution would be either:

1) create /etc/system-fips dynamically from within dracut by parsing the kernel command line
2) or, if /etc/system-fips exists, check the kernel command line from within libgcrypt
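Option 1 could look roughly like the following. `sync_fips_flag` is a hypothetical name, and the flag path is a parameter so the same code works for /etc/system-fips or the /run variant from comment 13.

```shell
#!/bin/sh
# Hypothetical sketch of option 1: create the FIPS flag file only when the
# kernel command line ($1) contains fips=1; otherwise make sure it is absent.
# $2 is the flag path, e.g. /etc/system-fips or /run/system-fips.
sync_fips_flag() {
    case " $1 " in
        *" fips=1 "*) : > "$2" ;;     # FIPS requested: create empty flag
        *)            rm -f "$2" ;;   # not requested: remove any stale flag
    esac
}

# Usage in early boot: sync_fips_flag "$(cat /proc/cmdline)" /run/system-fips
```

One caveat follows from comments 9 and 11: systemd itself links libgcrypt, so a flag created by such a hook only affects processes started after the hook runs, not PID 1.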

Comment 13 Harald Hoyer 2017-01-12 14:55:35 UTC
(In reply to Harald Hoyer from comment #12)
> So, a solution would be either:
> 
> 1) create /etc/system-fips dynamically from within dracut by parsing the
> kernel command line

In this case, it might be better to switch to /run/system-fips in case of a read-only /etc

Comment 14 Harald Hoyer 2017-01-12 14:59:25 UTC
(In reply to Harald Hoyer from comment #13)
> (In reply to Harald Hoyer from comment #12)
> > So, a solution would be either:
> > 
> > 1) create /etc/system-fips dynamically from within dracut by parsing the
> > kernel command line
> 
> In this case, it might be better to switch to /run/system-fips in case of a
> read-only /etc

or take the proven concept of:

/run/system-fips
/etc/system-fips
/usr/lib/system-fips

where the flag is invalidated if the file is a symlink to /dev/null
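A minimal sketch of that lookup, assuming the paths are checked in the order listed and the first one that exists decides. `fips_flag_enabled` is a hypothetical name, and the root prefix parameter is only there so it can be exercised in a scratch directory.

```shell
#!/bin/sh
# Hypothetical sketch: the first existing path decides; a symlink to
# /dev/null masks (disables) the flag. $1 is a root prefix ("" = live system).
fips_flag_enabled() {
    root=$1
    for p in /run/system-fips /etc/system-fips /usr/lib/system-fips; do
        f=$root$p
        # Check the mask first: a /dev/null symlink also passes [ -e ].
        if [ -L "$f" ] && [ "$(readlink "$f")" = /dev/null ]; then
            return 1          # masked: flag explicitly invalidated
        fi
        if [ -e "$f" ]; then
            return 0          # flag present
        fi
    done
    return 1                  # no flag anywhere
}
```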

Comment 15 Tomas Mraz 2017-01-17 08:51:48 UTC
libgcrypt reports this error when /dev/urandom is not present or not accessible during its initialization. Could it be added?

The difference is that when /etc/system-fips is present, libgcrypt must initialize in a constructor when it is loaded. When the file is not present, the application chooses when to initialize it.
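The failure condition described above can be turned into a preflight check on an unpacked initramfs tree. This is a hypothetical sketch (`fips_entropy_ok` is not a dracut function); it only models the two conditions named in the comment and does not reproduce libgcrypt's actual initialization logic.

```shell
#!/bin/sh
# Hypothetical preflight check for an unpacked initramfs tree ($1).
# With etc/system-fips present, libgcrypt initializes in its load-time
# constructor and needs /dev/urandom; without the flag, nothing is forced.
fips_entropy_ok() {
    root=$1
    [ -e "$root/etc/system-fips" ] || return 0   # no forced early init
    [ -c "$root/dev/urandom" ]                   # must be a character device
}
```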

Comment 16 Harald Hoyer 2017-01-18 08:18:36 UTC
(In reply to Tomas Mraz from comment #15)
> The libgcrypt reports this error when there is no /dev/urandom present or
> accessible during its initialization. Could it be possible to add it?
> 
> The difference between /etc/system-fips present and not is that if it is
> present the libgcrypt must initialize in a constructor during its load. When
> it is not present the application chooses when it wants to initialize it.

OK, so I will have to add /dev/urandom to the static device nodes, like /dev/null.
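For illustration only (this is not the actual dracut change), adding such a static node to an initramfs staging directory comes down to one `mknod` call. On Linux, /dev/urandom is the character device with major 1, minor 9 (/dev/null is 1:3). `add_static_urandom` is a hypothetical name, and creating device nodes requires root.

```shell
#!/bin/sh
# Hypothetical sketch: create dev/urandom (char 1:9) inside the staging
# directory $1, next to the existing static /dev/null. Needs root.
add_static_urandom() {
    mkdir -p "$1/dev" || return 1
    [ -c "$1/dev/urandom" ] && return 0        # already present
    mknod -m 0666 "$1/dev/urandom" c 1 9       # world-readable, like the real node
}
```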

Comment 17 Harald Hoyer 2017-01-18 10:39:33 UTC
commit 83a6d6f928e98fea559d5fde46f326a4df35e114

Comment 18 Tomas Mraz 2017-01-18 11:22:53 UTC
As for the placement of the system-fips file: it is part of the FIPS module as defined by the FIPS security policy, so I think its most appropriate place is /usr/lib. The file is part of the dracut-fips package, and by installing that package you indicate that you want the FIPS modules to be "functional". So the historical placement in /etc is not the most correct one. On the other hand, I do not know whether it is worth modifying all the FIPS modules to look elsewhere (note that we have 6 FIPS modules depending on it). Also, any "symlink to /dev/null" tricks would probably not conform to FIPS requirements, and they are unneeded anyway: if you do not want FIPS, you simply do not install dracut-fips, and then /etc/system-fips (or /usr/lib/system-fips if we move it) is not present.

Although booting without fips=1 on the kernel command line while /etc/system-fips is present is, in general, an "unsupported combination", we should not leave that combination completely broken, so dynamically adding or removing /etc/system-fips in the initramfs based on the kernel command line should not be needed. It would also have made no difference here, as the cause was different.

Comment 19 Florian Weimer 2017-02-06 21:10:35 UTC
*** Bug 1419432 has been marked as a duplicate of this bug. ***

Comment 20 Fedora End Of Life 2017-11-16 19:22:19 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 21 Fedora End Of Life 2017-12-12 10:54:48 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

