Bug 1804483
Summary: Grubby on aarch64 seems to corrupt grub.cfg
Product: Fedora
Reporter: Michael H. Warfield <mhw>
Component: grubby
Assignee: Peter Jones <pjones>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 31
CC: fmartine, pjones
Hardware: aarch64
OS: Linux
Last Closed: 2020-11-24 18:23:19 UTC
Type: Bug
Description
Michael H. Warfield
2020-02-18 23:38:08 UTC
Original Fedora image was the Xfce image, if it matters.

Another point on the curve. This does NOT seem to be recent. Two of the six systems I was recovering had crashed completely during the dnf update, in the kernel-core update script. After I edited the grub.cfg file, one of them got past U-Boot but crashed loading the latest kernel because there was no initramfs. Fine. That's a known headache. I rebooted to the 5.4.18 kernel, it came back up, and I reinstalled the 5.4.19 kernel-core package to fix the initramfs. While doing that, I monitored the grub.cfg file, from the known-good file through the end of the reinstall. This is what I saw...

This:
--
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
--
Became this during the install:
--
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
--
And became this after the reinstall was complete:
--
set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
--

It duplicated the root UUID on the default_kernelopts line several times, and rebooting with that bad line still worked. It seems to be the size of that duplicated root UUID string that finally borks it. Each time the kernel gets updated, another string gets appended, until it blows chunks. This may have been in there for a long time; I'm just really aggressive about doing kernel updates.

It points to a couple of problems, including a potentially over-long option somewhere and a possible buffer overrun, but it doesn't blow chunks until that line reaches around 800K or so. I tried to upload the bad cfg file, but the upload gave me an error. It doesn't appear to affect x86_64 systems, but it's catching all of my aarch64 systems in the same way.
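A quick way to spot this corruption before rebooting is to look for more than one UUID inside the root= argument of the `set default_kernelopts=` line. The Python sketch below assumes the grub.cfg layout quoted in this report; the helper names (`root_uuid_count`, `kernelopts_is_corrupted`, `check_grub_cfg`) are hypothetical and not part of grubby or GRUB, and the sketch only detects the pattern, it does not repair the file.

```python
import os
import re
from typing import Tuple

# A canonical 8-4-4-4-12 hex UUID, e.g. 92567a3e-b3d2-494a-b4ec-ebb4687d6b40.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
# Captures the quoted value of each `set default_kernelopts="..."` line
# (leading spaces/tabs allowed, one line at a time).
OPTS_RE = re.compile(r'^[ \t]*set[ \t]+default_kernelopts="([^"]*)"', re.MULTILINE)


def root_uuid_count(kernelopts: str) -> int:
    """Count the UUIDs in the root= argument of a kernel command line (0 if none)."""
    for arg in kernelopts.split():
        if arg.startswith("root="):
            return len(UUID_RE.findall(arg))
    return 0


def kernelopts_is_corrupted(cfg_text: str) -> bool:
    """True if any default_kernelopts line carries more than one UUID in root=."""
    return any(root_uuid_count(m.group(1)) > 1 for m in OPTS_RE.finditer(cfg_text))


def check_grub_cfg(path: str = "/boot/efi/EFI/fedora/grub.cfg") -> Tuple[bool, int]:
    """Hypothetical helper: return (corrupted?, file size in bytes) for a grub.cfg."""
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    return kernelopts_is_corrupted(text), os.path.getsize(path)
```

On an affected system this would be called as `check_grub_cfg("/boot/efi/EFI/fedora/grub.cfg")`. The file size is returned alongside the flag because, as described above, the file kept growing with each kernel update.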

Could you please attach the following files:
/etc/default/grub
/etc/grub2-efi.cfg
/boot/grub2/grubenv
/boot/loader/entries/*

Created attachment 1664060 [details]
/etc/grub2.cfg -> ../boot/efi/EFI/fedora/grub.cfg
/etc/grub2.cfg is a symlink to ../boot/efi/EFI/fedora/grub.cfg
I have attached the latter. The attachments were taken from the same system I mentioned when tracking the changes to the grub.cfg file.
Created attachment 1664062 [details]
/boot/grub2/grub.env
Created attachment 1664066 [details]
/etc/default/grub
Created attachment 1664069 [details]
/boot/loader/entries/* #1
First of the boot/loader/entries/* files. 4 in total.
Created attachment 1664070 [details]
/boot/loader/entries/* #2
Created attachment 1664071 [details]
/boot/loader/entries/* #3
Created attachment 1664072 [details]
/boot/loader/entries/* #4
Created attachment 1664073 [details]
/boot/loader/entries/* #5
#5. My original count was off.

On those attachments, the /boot/loader/entries/* files were as follows:
32db7df466bc45f9b1c0f514329fb96e-5.3.7-301.fc31.aarch64.conf
cebb70635a3d4a669cddb17ac389fc78-0-rescue.conf
cebb70635a3d4a669cddb17ac389fc78-5.4.17-200.fc31.aarch64.conf
cebb70635a3d4a669cddb17ac389fc78-5.4.18-200.fc31.aarch64.conf
cebb70635a3d4a669cddb17ac389fc78-5.4.19-200.fc31.aarch64.conf

On the /boot/efi/EFI/fedora/grub.cfg file: this is one of the lightly corrupted files, with only a handful of dups. I did not fix it, but it did work. I have copies of the heavily corrupted files, but they're all just additional dups on the "set default_kernelopts=" line.

Another point on the curve. Possibly related. I reinstalled kernel-core-5.4.19-200.fc31.aarch64 using dnf and saw this result, which I had not noticed in the original updates (but could easily have missed):
--
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                        1/1
  Reinstalling     : kernel-core-5.4.19-200.fc31.aarch64    1/2
  Running scriptlet: kernel-core-5.4.19-200.fc31.aarch64    1/2
  Running scriptlet: kernel-core-5.4.19-200.fc31.aarch64    2/2
  Cleanup          : kernel-core-5.4.19-200.fc31.aarch64    2/2
  Running scriptlet: kernel-core-5.4.19-200.fc31.aarch64    2/2
grubby fatal error: unable to find a suitable template
  Verifying        : kernel-core-5.4.19-200.fc31.aarch64    1/2
  Verifying        : kernel-core-5.4.19-200.fc31.aarch64    2/2
Completion plugin: Generating completion cache...

Reinstalled:
  kernel-core-5.4.19-200.fc31.aarch64

Complete!
--
Checking /boot/efi/EFI/fedora/grub.cfg BEFORE rebooting, it was now 841768 in size, in one go. Fortunately, I had saved a working copy of grub.cfg and replaced the bad one.

(In reply to Michael H. Warfield from comment #15)
> Another point on the curve. Possibly related.
> Reinstalled kernel-core-5.4.19-200.fc31.aarch64 using dnf and saw this result I had not
> noticed in the original updates (but could have easily missed it):
>
> --
> Running transaction check
> Transaction check succeeded.
> Running transaction test
> Transaction test succeeded.
> Running transaction
>   Preparing        :                                        1/1
>   Reinstalling     : kernel-core-5.4.19-200.fc31.aarch64    1/2
>   Running scriptlet: kernel-core-5.4.19-200.fc31.aarch64    1/2
>   Running scriptlet: kernel-core-5.4.19-200.fc31.aarch64    2/2
>   Cleanup          : kernel-core-5.4.19-200.fc31.aarch64    2/2
>   Running scriptlet: kernel-core-5.4.19-200.fc31.aarch64    2/2
> grubby fatal error: unable to find a suitable template
>

Hmm, that error seems to come from the grubby tool that's in the grubby-deprecated package. Do you have that package installed? If so, could you please remove it and reinstall the kernel to see if the issue is still present?

(In reply to Michael H. Warfield from comment #5)
> Created attachment 1664060 [details]
> /etc/grub2.cfg -> ../boot/efi/EFI/fedora/grub.cfg
>
> /etc/grub2.cfg is a symlink to ../boot/efi/EFI/fedora/grub.cfg
>
> I have attached the later. Attachments are taken from the same system in
> mentioning tracking the changes to the grub.cfg file.

So all the files look correct... (besides the grub.cfg, of course).

Yes, the grubby-deprecated package was installed. Removing it also removed extlinux-bootloader.
--
-> Starting dependency resolution
--> Finding unneeded leftover dependencies
---> Package extlinux-bootloader.aarch64 1.2-10.fc31 will be erased
---> Package grubby-deprecated.aarch64 8.40-36.fc31 will be erased
--> Finished dependency resolution
Dependencies resolved.
================================================================================
 Package                Architecture   Version          Repository      Size
================================================================================
Removing:
 grubby-deprecated      aarch64        8.40-36.fc31     @fedora        127 k
Removing dependent packages:
 extlinux-bootloader    aarch64        1.2-10.fc31      @fedora        2.0 k

Transaction Summary
================================================================================
Remove  2 Packages

Freed space: 129 k
--
Completing that and reinstalling 5.4.19 works fine. That seems to be at the heart of the problem. Checking both the downloaded aarch64 and armhfp Xfce images in the download directories, they BOTH seem to have grubby-deprecated in the builds. But it doesn't seem to impact my armhfp (RPi V2-B+) systems. In fact, there's nothing at all in the /boot/efi/EFI/fedora directories on them. I have not checked any other spins.

I have tested it on all six of the affected systems. Removing grubby-deprecated seems to have resolved the issue. The real issue now might be: why was grubby-deprecated packaged in those stock images, and are any other images affected?

(In reply to Michael H. Warfield from comment #18)
> Yes, the grubby-deprecated package was installed. Removing it also removed
> extlinux-bootloader.
>

The extlinux-bootloader package is only needed for armv7, not for aarch64. And extlinux-bootloader has the grubby-deprecated package as a dependency because extlinux doesn't have BLS support like grub2.

[snip]

>
> Checking both the downloaded aarch64 and armfp images for Xfce in the
> download directories, they BOTH seem to have grubby-deprecated in the

That doesn't seem correct. The grubby-deprecated package should only be present for armv7 and not for aarch64. But still, installing grubby-deprecated should be a no-op if you have GRUB_ENABLE_BLSCFG=true in /etc/default/grub, which is your case.

> builds. But it doesn't seem to impact my armfp (RPi V2-B+) systems.
> In fact, there's nothing at all in the /boot/efi/EFI/fedora directories at all.
> Have not checked any other spins.

Right, for armv7 that directory would be empty since it doesn't use the u-boot EFI stub to chain-load grub2.

For some reason I was not able to reproduce your issue, but could you please test the following on the systems where you had the problem:

$ grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg   # so your grub.cfg is correct
$ dnf install -y extlinux-bootloader                # to pull in grubby-deprecated
$ dnf reinstall -y kernel-core                      # this should corrupt your grub.cfg

And then try the following:

$ grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg   # so your grub.cfg is correct again
$ chmod -x /usr/lib/kernel/install.d/20-grubby.install   # so the /usr/lib/kernel/install.d/20-grub2.install plugin is executed instead
$ dnf reinstall -y kernel-core                      # I think your grub.cfg should be correct this time

(In reply to Javier Martinez Canillas from comment #20)
> (In reply to Michael H. Warfield from comment #18)
> > Yes, the grubby-deprecated package was installed. Removing it also removed
> > extlinux-bootloader.
> >
>
> The extlinux-bootloader is only needed for armv7, not for aarch64. And
> extlinux-bootloader has as a dependency the grubby-deprecated package
> because extlinux doesn't have BLS support like grub2.
>
> [snip]
>
> >
> > Checking both the downloaded aarch64 and armfp images for Xfce in the
> > download directories, they BOTH seem to have grubby-deprecated in the
> That doesn't seem correct. The grubby-deprecated package should only be
> present for armv7 and not for aarch64.

That proved to be an error on my end. I checked a fresh arm-installer install for that package, and it was not present. Checking logs, I found it had snuck in somehow during what should have been a routine update. I think my central management was to blame. A thread on the Fedora ARM mailing list helped me track some of that down.
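Javier's comment #20 rests on two conditions that can be checked directly on an affected system: whether /etc/default/grub sets GRUB_ENABLE_BLSCFG=true (in which case, per that comment, installing grubby-deprecated should be a no-op), and whether /usr/lib/kernel/install.d/20-grubby.install is still executable (his second test disables it with chmod -x so that 20-grub2.install runs instead). The Python sketch below reports both; the function names are hypothetical, and the parser only handles simple KEY=value lines, not full shell syntax.

```python
import os
import shlex
from typing import Dict


def parse_default_grub(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines from /etc/default/grub-style text.

    Simplified shell reading: blank lines and comments are skipped, quotes are
    removed with shlex, and lines that fail to parse (e.g. unbalanced quotes)
    are ignored.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            tokens = shlex.split(value, comments=True)
        except ValueError:
            continue
        values[key.strip()] = tokens[0] if tokens else ""
    return values


def bls_enabled(default_grub_text: str) -> bool:
    """True if GRUB_ENABLE_BLSCFG is set to true (quoted or unquoted)."""
    return parse_default_grub(default_grub_text).get("GRUB_ENABLE_BLSCFG", "").lower() == "true"


def grubby_plugin_active(path: str = "/usr/lib/kernel/install.d/20-grubby.install") -> bool:
    """True if the grubby kernel-install plugin exists and is executable."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

Usage would be along the lines of `bls_enabled(open("/etc/default/grub").read())` and `grubby_plugin_active()`. The sketch only reports state; it changes nothing on the system.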
> But still installing grubby-deprecated should be a no-op if you have
> GRUB_ENABLE_BLSCFG=true in /etc/default/grub which is your case.
>
> > builds. But it doesn't seem to impact my armfp (RPi V2-B+) systems. In
> > fact, there's nothing at all in the /boot/efi/EFI/fedora directories at all.
> > Have not checked any other spins.
> Right, for armv7 that directory would be empty since it doesn't use the
> u-boot EFI stub to chain-load grub2.
> For some reason I was not able to reproduce your issue, but could you please
> test the following in the systems where you had the problem:
>
> $ grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg // so your grub.cfg is
> correct

I first made a reference copy of /boot/efi/EFI/fedora/grub.cfg (which was now good after the uninstalls, fixes, and reinstalls) so I could do diffs. The grub2-mkconfig command resulted in a few minor cosmetic differences like this:
--
91c91
< set default=1
---
> set default="1"
124c124
< set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB "
---
> set default_kernelopts="root=UUID=92567a3e-b3d2-494a-b4ec-ebb4687d6b40 ro cma=192MB"
--
Par for the course.

> $ dnf install -y extlinux-bootloader // to pull grubby-deprecated

Done.

> $ dnf resintall -y kernel-core // this should corrupt your grub.cfg

Hahahahaha... This one got funny. They say timing is everything, and it is... I got an error that none of my kernel-core packages were available for reinstall. What the? Oh, kernel 5.4.20 had popped out of the queue overnight. So I did an update instead. Some days. :-)

Well, the grubby error is back:
--
  Cleanup          : ibus-setup-1.5.21-7.fc31.noarch         64/76
  Running scriptlet: kernel-core-5.4.17-200.fc31.aarch64     65/76
grubby fatal error: unable to find a suitable template
grubby: doing this would leave no kernel entries. Not writing out new config.
  Erasing          : kernel-core-5.4.17-200.fc31.aarch64     65/76
--
This didn't end up with a corrupted grub.cfg, but it did generate the earlier grubby error plus a bit more.
Did a reinstall of kernel-core-5.4.20:
--
Transaction test succeeded.
Running transaction
  Preparing        :                                        1/1
  Reinstalling     : kernel-core-5.4.20-200.fc31.aarch64    1/2
  Running scriptlet: kernel-core-5.4.20-200.fc31.aarch64    1/2
  Running scriptlet: kernel-core-5.4.20-200.fc31.aarch64    2/2
grubby fatal error: unable to find a suitable template
grubby: doing this would leave no kernel entries. Not writing out new config.
  Cleanup          : kernel-core-5.4.20-200.fc31.aarch64    2/2
  Running scriptlet: kernel-core-5.4.20-200.fc31.aarch64    2/2
grubby fatal error: unable to find a suitable template
grubby fatal error: unable to find a suitable template
grubby: doing this would leave no kernel entries. Not writing out new config.
  Verifying        : kernel-core-5.4.20-200.fc31.aarch64    1/2
  Verifying        : kernel-core-5.4.20-200.fc31.aarch64    2/2

Reinstalled:
  kernel-core-5.4.20-200.fc31.aarch64
--
Still... no corruption. I'm baffled. Removing the errant packages "fixed" the problem, and reinstalling them brought the grubby errors back, but that doesn't seem to have reintroduced the problem. The only difference I see now is that second error line, "grubby: doing this would leave no kernel entries. Not writing out new config." That line wasn't there before. Something changed. I checked the version numbers of grubby-deprecated and extlinux-bootloader, and they match the earlier versions. I'm baffled.

> And then try the following:
> $ grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg // so your grub.cfg is
> correct again
> $ chmod -x /usr/lib/kernel/install.d/20-grubby.install // so the
> /usr/lib/kernel/install.d/20-grub2.install plugin is executed instead
> $ dnf resintall -y kernel-core // I think your grub.cfg should be correct
> this time

As mentioned, you should remove the grubby-deprecated package, since it shouldn't be used on aarch64, and re-generate your grub.cfg file with grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg.

This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not able to fix it before Fedora 31 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result, we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release.

If you experience problems, please add a comment to this bug.

Thank you for reporting this bug, and we are sorry it could not be fixed.