Bug 2059565
Summary: | Installed system not bootable due to wrong kernel arguments in BLS snippets | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Christian Kellner <ckellner> | |
Component: | anaconda | Assignee: | Jiri Konecny <jkonecny> | |
Status: | CLOSED ERRATA | QA Contact: | Release Test Team <release-test-team-automation> | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | 9.0 | CC: | bootloader-eng-team, fmartine, jaredz, jkonecny, jrusz, jstodola, lnykryn, mlewando, msekleta, obudai, perobins, pvlasin, pzatko, rharwood, rvykydal | |
Target Milestone: | rc | Keywords: | Triaged | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | anaconda-34.25.0.29-1.el9_0 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2065170 | Environment: | |
Last Closed: | 2022-05-17 12:31:03 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2065170 |
Description
Christian Kellner
2022-03-01 10:50:36 UTC
Hello Christian, thanks a lot for this great investigation.

Not sure whether this is something Anaconda should solve or whether it should be solved in the bootloader tools. Anyway, switching the component to grub to decide; feel free to return it in case Anaconda should handle this. Could the bootloader team please take a look at this?

Hello Jiri,

(In reply to Jiri Konecny from comment #1)
> Hello Christian, thanks a lot for this great investigation.
>
> Not sure whether this is something Anaconda should solve or whether it
> should be solved in the bootloader tools. Anyway, switching the component
> to grub to decide; feel free to return it in case Anaconda should handle
> this. Could the bootloader team please take a look at this?

I believe the solution should be twofold. Anaconda should create an /etc/kernel/cmdline file, as mentioned by Christian, since that should be done at installation time. Then grubby should also update that file (and grub2-mkconfig should probably create it as well?) so that new kernel installs create BLS snippets that contain the latest params.

Actually, after writing the previous comment I realized that Jiri may be correct and that /etc/kernel/cmdline could just be created by grub2-mkconfig, since IIRC Anaconda already runs it during the bootloader configuration phase. There is no need to duplicate the same logic in the Anaconda bootloader support.

I tried to understand why this works on Fedora 35 and not on RHEL 9. My current hypothesis for why it fails in the bug case:

In either case "20-grub.install" is useless, since it uses "/proc/cmdline" (the installer kernel command line) because there is no "/etc/kernel/cmdline". But this is "fixed" via:

org.fedoraproject.Anaconda.Modules.Storage[1902]: INFO:program:Running in chroot '/mnt/sysroot'... grub2-mkconfig -o /boot/grub2/grub.cfg

which calls update_bls_cmdline() via "/etc/grub.d/10_linux", which in turn uses

local cmdline="root=${LINUX_ROOT_DEVICE} ro ${GRUB_CMDLINE_LINUX} ${GRUB_CMDLINE_LINUX_DEFAULT}"

and "GRUB_CMDLINE_LINUX" is set by anaconda. BUT then later in the log we find:

org.fedoraproject.Anaconda.Modules.Storage[1902]: INFO:program:Running in chroot '/mnt/sysroot'... kernel-install add 5.14.0-69.el9.x86_64 /lib/modules/5.14.0-69.el9.x86_64/vmlinuz

which again calls 20-grub.install, which will use "/proc/cmdline", overwriting the good configuration again. I don't think that happens in the Fedora case, although Fedora used a different payload, so this might also be payload specific.

We (Achilleas and I) did some more digging. In a bad installer version grub2-mkconfig does not properly update the BLS snippet, because `get_sorted_bls` in `10_linux` returns an empty list. This seems to be because `/mnt/sysroot/boot/loader/entries/` has the actual entry with a "wrong" prefix, i.e. "ffffffffffffffffffffffffffffffff-5.14.0-70.el9.x86_64.conf" instead of "${machine_id}-5.14.0-70.el9.x86_64.conf" (as in the "good" image). The initial BLS snippets, if I read the anaconda code correctly, are created by 20-grub.install, via kernel-install add called from anaconda's `create_bls_entries` in `anaconda/pyanaconda/modules/storage/bootloader/utils.py`.

Now the key question is what changed so that the boot loader entries are created with the `ffff..` prefix and not the machine-id. We tried to hunt down code changes in anaconda. Maybe https://github.com/rhinstaller/anaconda/commit/dbac59ad430 but that is from June.

Digging some more: on the broken installer, manually removing the BLS entry in /boot/loader/entries and then calling

kernel-install add 5.14.0-70.el9.x86_64 /lib/modules/5.14.0-70.el9.x86_64/vmlinuz

results in

-rw-r--r--. 1 root root 380 Mar 10 17:42 /boot/loader/entries/ffffffffffffffffffffffffffffffff-5.14.0-70.el9.x86_64.conf

with

options inst.stage2=hd:LABEL=CentOS-Stream-9-BaseOS-x86_64 inst.ks=hd:LABEL=CentOS-Stream-9-BaseOS-x86_64:/osbuild.ks quiet inst.sshd $tuned_params

kernel-install is part of systemd. The good installer has systemd-249-9.el9.x86_64, the bad one systemd-250-4.el9.x86_64. There has been a change related to the machine ID in kernel-install, introduced in v250-rc3: https://github.com/systemd/systemd/commit/357376d0bb525b064f468e0e2af8193b4b90d257

I manually backed this change out and this resulted in the correct behaviour:

-rw-r--r--. 1 root root 380 Mar 10 18:09 fea38a705d7b4b14a842659b7f993cdd-5.14.0-70.el9.x86_64.conf

Just for the record:

[anaconda root@ibm-p8-kvm-03-guest-02 ~]# cat /etc/machine-info
KERNEL_INSTALL_MACHINE_ID=ffffffffffffffffffffffffffffffff

which, due to https://github.com/systemd/systemd/commit/357376d0bb525b064f468e0e2af8193b4b90d257, is now preferred:

# Prefer to use an existing machine ID from /etc/machine-info or /etc/machine-id. If we're using the machine
# ID /etc/machine-id, try to persist it in /etc/machine-info. If no machine ID is found, try to generate
# a new machine ID in /etc/machine-info. If that fails, use "Default".
[ -z "$MACHINE_ID" ] && [ -f /etc/machine-info ] && source /etc/machine-info && MACHINE_ID="$KERNEL_INSTALL_MACHINE_ID"
[ -z "$MACHINE_ID" ] && [ -f /etc/machine-id ] && read -r MACHINE_ID </etc/machine-id

I think we are missing this patch: https://github.com/rhinstaller/anaconda/commit/35f510934902a63c21c2aea45cf32a543d9e7886

Hi Christian, yes, it seems that could be the issue. I will test these changes to verify whether this is really the cause, but it seems probable. However, I would like to hear about this from the systemd developers. Basically, we got that PR in Fedora because Live images (similar to your use case) were not booting on Fedora.
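
Editorial aside: the lookup order in the kernel-install lines quoted above can be tried out against a scratch directory. The following is only a sketch under assumptions, not the kernel-install code itself: the function names `pick_machine_id` and `bls_entry_name` and the `ROOT` argument are hypothetical, added for illustration, and the "generate a new machine ID" step mentioned in the quoted comment is left out.

```shell
# Sketch of the machine ID lookup order from the kernel-install lines quoted
# above. The ROOT argument is a hypothetical addition so that the logic can be
# run against a scratch tree instead of the live system.
# The function body is a subshell "( ... )" so sourced variables do not leak.
pick_machine_id() (
    root="${1:-}"
    machine_id=""
    KERNEL_INSTALL_MACHINE_ID=""

    # 1. /etc/machine-info is consulted first.
    if [ -f "$root/etc/machine-info" ]; then
        . "$root/etc/machine-info"
        machine_id="$KERNEL_INSTALL_MACHINE_ID"
    fi

    # 2. /etc/machine-id is only read when step 1 produced nothing.
    if [ -z "$machine_id" ] && [ -f "$root/etc/machine-id" ]; then
        read -r machine_id <"$root/etc/machine-id"
    fi

    # 3. Fallback named in the quoted comment (ID generation omitted here).
    printf '%s\n' "${machine_id:-Default}"
)

# BLS entry file names in this report have the form
# "<machine id>-<kernel version>.conf".
bls_entry_name() {
    printf '%s-%s.conf\n' "$(pick_machine_id "$1")" "$2"
}
```

With a tree whose /etc/machine-info sets KERNEL_INSTALL_MACHINE_ID to the all-f value, `bls_entry_name "$tree" 5.14.0-70.el9.x86_64` yields the `ffff...` file name seen on the broken installer, even when /etc/machine-id in the same tree holds a different ID.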
Here is the PR with the discussion of why it is needed: https://github.com/rhinstaller/anaconda/pull/3770

However, I was then pinged about another PR coming into systemd which may also change the behavior. There was again some discussion, but I must admit that my knowledge in this field is not good enough to decide whether it would require another change in Anaconda in the future, or whether it will go into RHEL in the future. As I understand it, the patch in Anaconda shouldn't be necessary in a future release? https://github.com/systemd/systemd/pull/22463#issuecomment-1040585085

Could I please ask systemd for clear verification that backporting the PR above to Anaconda is the correct thing to do, and whether we should expect further changes to be required in the Anaconda code in the near future?

PR: https://github.com/rhinstaller/anaconda/pull/3957

I did the testing and it seems that a backport of the PR mentioned in comment 18, with modifications for compatibility, will resolve this issue.

(In reply to Jiri Konecny from comment #18)
> Could I please ask systemd for clear verification that backporting the PR
> above to Anaconda is the correct thing to do, and whether we should expect
> further changes to be required in the Anaconda code in the near future?

Yes, the previous backport was the correct thing to do. However, I think that further changes will still be required. AFAICT, right now we will end up in a situation where /etc/machine-id contains a different UUID than the value of KERNEL_INSTALL_MACHINE_ID (from /etc/machine-info) that is used as a prefix in BLS config file names. This shouldn't cause any obvious failures but is confusing. We will discuss further steps with systemd upstream and we will keep you posted.

Checked that anaconda-34.25.0.29-1.el9_0 is in nightly compose RHEL-9.0.0-20220322.0

Moving to VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (new packages: anaconda), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2326

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.
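
Editorial aside: the symptom reported in this bug, installer-only `inst.*` arguments ending up on the `options` line of a BLS snippet, can be checked for with a few lines of shell. This is a minimal sketch under assumptions: the helper name `list_leaked_installer_args` and its `ROOT` argument are hypothetical and not part of anaconda, grub2 or systemd, and the check only looks for arguments starting with `inst.` such as the `inst.stage2`, `inst.ks` and `inst.sshd` values shown in the thread.

```shell
# Hypothetical helper: print the BLS entries under ROOT/boot/loader/entries
# whose "options" line carries installer arguments (inst.*).
# ROOT defaults to the running system; pass e.g. /mnt/sysroot to inspect an
# installed tree. Exit status follows grep: 0 if any entry matched, 1 if none.
list_leaked_installer_args() (
    root="${1:-}"
    status=1
    for entry in "$root"/boot/loader/entries/*.conf; do
        # Skip the unexpanded glob when the directory has no .conf files.
        [ -e "$entry" ] || continue
        # Match "inst." only at the start of an argument, not inside one.
        if grep -Eq '^options[[:space:]](.*[[:space:]])?inst\.' "$entry"; then
            echo "${entry##*/}"
            status=0
        fi
    done
    exit "$status"
)
```

For example, `list_leaked_installer_args /mnt/sysroot` run from the installer environment would list any snippet that still carries the installer command line.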