Bug 2090561

Summary: Possible to detect mismatched kernel/initrd and install.img to prevent weird failures?
Product: [Fedora] Fedora
Reporter: Jason Tibbitts <j>
Component: dracut
Assignee: dracut-maint-list
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 36
CC: anaconda-maint-list, bcl, bugzilla, dracut-maint-list, jamacku, jkonecny, jonathan, jstodola, lnykryn, pvalena, reallylongword
Last Closed: 2023-05-25 15:56:02 UTC
Type: Bug

Description Jason Tibbitts 2022-05-26 03:23:17 UTC
It was suggested by some anaconda devs that I file this somewhere, though it was less clear where it should be filed.  Anaconda itself was ruled out but lorax was a possibility, so here I am.

To install machines, I boot from PXE images and pull install.img over the network.  Every few releases, while setting things up, I somehow manage to mess up so that the PXE images and the install.img don't match.  When this happens, things break in bizarre and occasionally amusing ways and if I don't catch it I can waste a lot of time trying to debug.  And I know I'm not the only person who runs into this occasionally.

So I was wondering if there is any way to detect this and at least emit a prominent warning somewhere.  I noticed that the install.img has kernel modules in it and thus the expected kernel version is already available; it seems like it would be relatively simple to use that as at least a basic check.

I'm going to see if I can bodge something into my kickstart %pre just to see if I can save myself the hassle the next time I make the same mistake, but it would probably be better if it were properly integrated and there by default.
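
For illustration, a minimal sketch of such a check, not a tested integration. It assumes Python is available in the installer environment and that the check runs after the install.img has become the root filesystem, so its modules live under /lib/modules. The function names and the warning wording are made up for this example.

```python
import os
from pathlib import Path


def module_versions(root="/"):
    """Return the kernel versions that have a module directory under root."""
    base = Path(root) / "lib" / "modules"
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())


def kernel_matches(root="/", running=None):
    """True if the running kernel release has modules under root."""
    # os.uname().release is the same string that `uname -r` prints.
    running = running or os.uname().release
    return running in module_versions(root)


if __name__ == "__main__":
    if not kernel_matches():
        print("WARNING: running kernel %s has no matching modules; found: %s"
              % (os.uname().release, ", ".join(module_versions()) or "none"))
```

In a kickstart this could sit in a %pre section run with a Python interpreter. Whether a warning printed there is prominent enough is a separate question.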

In case you're curious about how it breaks, I ran into two problems when using F35 kernel/initrd with the F36 install.img.  The first was a failure of the zram module to load, though I could find no diagnostics which showed me why it failed to load.  The second was the installer saying that I needed to add an additional -1.66 GiB of storage (yes, a negative value) to the root filesystem in order for the install to continue.

Comment 1 Brian Lane 2022-05-26 15:41:27 UTC
I've seen this before with xfs refusing to work when things are mismatched. But I don't think this is something Lorax can (or should) try to do. It seems like dracut in the initramfs is a good candidate for checking this. It handles mounting the root filesystem, and it knows what kernel is running, so it should be possible for it to make sure it has the matching modules. Reassigning there for comment.

Comment 2 Jason Tibbitts 2022-05-26 15:47:10 UTC
Yes, dracut was the other possibility, though I wasn't aware that install.img had much at all to do with dracut (outside of being fetched by it).

Comment 3 Chris Murphy 2022-05-27 03:13:05 UTC
I guess the question is whether to panic or drop to a dracut shell? I think this combination is undefined and potentially it's bad, so it might be safer to panic.

Comment 4 Jiri Konecny 2022-05-27 09:08:30 UTC
Please let me bring a bit of insight here. There are two things we need to solve, and both touch Dracut.

First: Kernel mismatch with initrd
In this case the misbehavior could surface in any module. Dracut could have this check enabled by default and kill the boot with a message that the kernel version doesn't match the modules in the initrd, so that people don't spend time on unexpected behavior before they realize what the issue is. My noob PoV here would be a simple check of `uname -r` against the kernel module version directories in the initrd.


Second: initrd mismatch with stage2 image (Anaconda installation environment but not only that)
This is definitely a more significant issue for Anaconda than for anything else, but I still think this should be part of Dracut (feel free to correct me).

To explain how Anaconda works: we download the stage2 image, do some stuff around that (detect repos etc.), and then mount it[1] for Dracut at the given place so Dracut can do a switchroot. The question here is whether this should be handled by Anaconda or Dracut; however, I think Dracut can do a check similar to the one proposed above, verifying that the kernel modules match the current kernel before the switchroot. Doing this in Dracut would help even in cases outside of Anaconda where just root= is mismatched with something else.
Again, the behavior should be to kill the boot process, without doing a switchroot, and print a message saying what was wrong.

If you want to make it robust, you can add a boot option to disable the check if needed.
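
To make the proposal concrete, here is a sketch of the decision logic only, written in Python for readability; a real dracut hook would be a shell script. The option name `rd.kmodcheck` is hypothetical (dracut defines no such option), and the sketch assumes the new root is mounted at /sysroot before the switchroot.

```python
import os

# Hypothetical boot option: rd.kmodcheck=0 would disable the check.
DISABLE_OPTION = "rd.kmodcheck"


def check_enabled(cmdline):
    """Return False if the kernel command line carries rd.kmodcheck=0."""
    for word in cmdline.split():
        key, _, value = word.partition("=")
        if key == DISABLE_OPTION and value == "0":
            return False
    return True


def verify_new_root(newroot="/sysroot", running=None, cmdline=None):
    """Return (ok, message) for the root about to be switched into."""
    if cmdline is None:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    if not check_enabled(cmdline):
        return True, "kernel module check disabled on the command line"
    running = running or os.uname().release
    moddir = os.path.join(newroot, "lib", "modules")
    found = sorted(os.listdir(moddir)) if os.path.isdir(moddir) else []
    if running in found:
        return True, "modules for %s found in %s" % (running, newroot)
    return False, ("running kernel %s has no modules in %s (found: %s)"
                   % (running, newroot, ", ".join(found) or "none"))
```

A caller would stop the boot and print the message when `ok` is false. Whether stopping means a panic or an emergency shell is the open question from comment 3.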

[1]: https://github.com/rhinstaller/anaconda/blob/master/dracut/anaconda-lib.sh#L156=

Comment 5 Ben Cotton 2023-04-25 17:15:33 UTC
This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 6 Ludek Smid 2023-05-25 15:56:02 UTC
Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16.

Fedora Linux 36 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.