Bug 2162113 - Booting fails after some grub2 updates
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: grub2
Version: 38
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Javier Martinez Canillas
QA Contact: Fedora Extras Quality Assurance
Reported: 2023-01-18 19:34 UTC by Bruno Wolff III
Modified: 2024-05-21 14:26 UTC (History)
Last Closed: 2024-05-21 14:26:00 UTC
Type: Bug

Description Bruno Wolff III 2023-01-18 19:34:37 UTC
Description of problem:
Rebooting after a week failed on one of three machines. During the week there were a few grub2 updates. The machine where it failed has /boot on mdraid. I tried downgrading grub2 from a live image (using --installroot) but that didn't help. Running grub2-install in an appropriate chroot on a live image fixed the problem. This kind of issue has happened in the past, but I don't think it happens on every grub2 update.
The machines are all x86_64. Two use legacy BIOS and one uses EFI. The one that broke uses legacy BIOS.
In this particular case the error was:
452: out of range pointer 0x9b40b010
Backtrace (.text 0xa04e .data 0x1516c):

Version-Release number of selected component (if applicable):
2.06-76.fc38

How reproducible:
Once the problem happens it happens all of the time. I don't believe it happens on every grub2 update though.

Comment 1 W. Michael Petullo 2023-01-19 22:34:45 UTC
I had a similar experience to Bruno. I performed a "dnf update" of a Fedora computer. After the update, the computer would no longer boot. Immediately after selecting a kernel to boot in the grub menu, I saw:

452: out of range pointer: 0xcefff010
Backtrace (.text 0xa05d .data 0x1510c):
Aborted. Press any key to exit.

2.06-75.fc37 is the version I updated to, which causes the problem.

I tried a number of things to get the computer to boot again, using a rescue disk:

(1) Downgrade grub.
(2) Run grub2-mkconfig.
(3) Remove and reinstall kernel package to trigger grub action.
(4) Run grub2-install /dev/sda

Attempt (4) finally allowed the computer to boot.
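
A minimal sketch of attempt (4) as run from a rescue image, for a legacy-BIOS system. The mount point /mnt/sysroot, the DRY_RUN switch, and the helper names run and reinstall_grub are assumptions made for illustration and do not come from this report. A separate /boot or an mdraid array would need extra steps that are not shown.

```shell
#!/bin/sh
# Sketch only: reinstall GRUB boot code on a legacy-BIOS system from a rescue image.
# Assumptions (not from this report): the mount point /mnt/sysroot, the DRY_RUN
# switch, and the helper names run/reinstall_grub are all hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # In dry-run mode print the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_grub() {
    root_dev="$1"                 # partition holding the installed root filesystem
    disk="$2"                     # whole disk whose MBR holds GRUB, e.g. /dev/sda
    sysroot="${3:-/mnt/sysroot}"

    run mkdir -p "$sysroot" || return 1
    run mount "$root_dev" "$sysroot" || return 1
    # A separate /boot (or an mdraid array) would have to be assembled and
    # mounted under "$sysroot/boot" here; that step is omitted.
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$sysroot/$fs" || return 1
    done
    run chroot "$sysroot" grub2-install "$disk"
}

# Only act when a root partition and a target disk are given on the command line.
if [ "$#" -ge 2 ]; then
    reinstall_grub "$@"
fi
```

By default the script only prints the commands it would run. Setting DRY_RUN=0 and passing the real root partition and disk executes them, which needs root privileges.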

Comment 2 Ben Cotton 2023-02-07 15:05:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 38 development cycle.
Changing version to 38.

Comment 3 mydarkstar 2023-05-06 04:17:17 UTC
I encountered this bug when installing Fedora 38 (KDE Spin) on an older system.
The system also uses legacy BIOS, as Bruno specified.

Both the regular ISO and the latest ISO (live spins) produced an installation that would not boot
and that failed with almost exactly the same error message:

> 452: out of range pointer: 0xbefff010
> Backtrace (.text 0xa05d .data 0x1510c):
> Aborted. Press any key to exit.

Naturally, I checked the SMART status of all drives and ran a memtest with no errors.
Afterwards I found this bug report and decided to install Fedora 37 instead for the time being.
F37 runs perfectly fine, without any issues.

I also found similar reports from other Fedora users, such as:
https://phpc.social/@imabug/109897344783738133

Ubuntu 23.04 seems to be affected by the same issue,
and the folks at Rufus also determined GRUB to be the problem:
https://github.com/pbatard/rufus/issues/2233

Comment 4 Anton Guda 2023-05-07 17:35:56 UTC
I ran some tests with rebuilds and can confirm that the changes between releases 74 and 75 cause the problem.
Sorry, the patch set is too large for me to locate the bug, and debugging during boot is not a trivial task.

Comment 5 Marta Lewandowska 2023-05-26 07:26:10 UTC
Hi,
Have you managed to resolve this with a newer version of grub2? Installation on mdraid was temporarily broken, but should be working starting from grub version 2.06-79 or so. 
If you're still having issues, please try to upgrade your grub.

Comment 6 Anton Guda 2023-11-16 16:24:49 UTC
Tested with grub2-2.06-109.fc40 - still broken.
Last working version - 2.06-74
I suspect that the offending patch is somewhere in the range 239-302 (patch numbering from the 109 release).

After the first error, it is possible to return to the main menu and boot again without errors.
This works for different Linux kernels, Windows, and even memtest+.
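
To compare an installed build against the releases mentioned in this thread, a small sketch follows. The 74/75 boundary reflects only what commenters here observed on some setups; it is not an official list of affected builds. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: classify a grub2 version-release string against the reports in this
# thread (2.06-74 reported working; 75, 76 and 109 reported failing on some setups).
# The helper names are hypothetical.

grub_release_number() {
    # "2.06-76.fc38" or "grub2-2.06-109.fc40" -> "76" / "109"
    s="${1%.fc*}"      # drop the dist tag and anything after it
    echo "${s##*-}"    # keep what follows the last "-"
}

classify_grub_release() {
    n="$(grub_release_number "$1")"
    case "$n" in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
    if [ "$n" -le 74 ]; then
        echo "reported-working"
    else
        echo "reported-affected"
    fi
}

# Versions quoted in the comments of this report.
for v in 2.06-74 2.06-75.fc37 2.06-76.fc38 grub2-2.06-109.fc40; do
    printf '%s: %s\n' "$v" "$(classify_grub_release "$v")"
done
```

On an installed system the input string could come from something like `rpm -q grub2-common`, assuming that package name applies to the machine in question.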

Comment 7 Anton Guda 2023-11-17 13:13:49 UTC
Some additional observations:

This error does not appear in a single-disk configuration.
The host with this problem has the following disk layout:

/dev/sda - primary HDD, with GRUB in the MBR and other auxiliary filesystems; /dev/sda5 is NTFS.
/dev/sdb - SSD, with the Linux root at /dev/sdb5 and no separate boot partition. It also holds some other filesystems and swap.

I added a special menu entry to grub.cfg (along with the part_msdos, ext2, and fat modules):

menuentry "Linux XF2" --id lin2 --class gnu-linux --class gnu --class os --unrestricted  {
  set gfxpayload=keep
  echo  '### ENV:'
  list_env
  echo  '### SET:'
  set
  echo  "### root: $root"
  ls /
  echo  "### drivemap:"
  drivemap -l
  read xxx
  echo  'Loading Linux'
  linux /boot/vmlinuz_c root=/dev/sdb5  ro selinux=0   
}
# /boot/vmlinuz_c - symlink to current kernel

The first time I selected this menu entry, some of the output was unexpected:
boot=hd0,msdos5 # reported as hd0, but it should be hd1,msdos5. No root= or search command ran before this.
prefix=(hd0,msdos5)/boot/grub2
root=hd0,msdos5
No remap
But ls shows the contents of /dev/sdb5 = (hd1,msdos5).

After the error occurred, I returned to the menu and selected the same entry again:
boot=hd1,msdos5
prefix=(hd1,msdos5)/boot/grub2
root=hd1,msdos5
No remap

And the kernel boots without errors.

Comment 8 Marta Lewandowska 2023-11-21 21:16:17 UTC
Anton, I guess you are also using BIOS? Would you mind running `lsblk -f` to show how it's all laid out?

Comment 9 Anton Guda 2023-11-21 22:42:44 UTC
sda
├─sda1  ntfs         System Reserved 4D28D99D616C31C5
├─sda2
├─sda5  ntfs         hatu_c6         7FB809A66BFEC7CA
├─sda6  ntfs         hatu_d6         6EF63DA8535B912A
├─sda7  vfat   FAT32 hatu_e6         5A0C-43C4                              30.5G     5% /dose
├─sda8  swap   1     hatu_sw6        5ef1dfb2-c668-4dd1-b10c-582cf40e1b9e                [SWAP]
├─sda9  ext4   1.0   hatu_home26     a90e16d7-1df5-4d29-a173-528028400ddf  493.9G    48% /home2
└─sda10 ext4   1.0   hatu_boot       f0f5fe8d-6686-4dbe-a4ac-a902b0f00385
sdb
├─sdb1  ext4   1.0   hatu_xx7        81b4de56-959e-4194-9084-d872ae1d3d8b
├─sdb2
├─sdb5  ext4   1.0   hatu_root7      a0a75949-5026-4596-859f-3e14d09672d9   51.1G    69% /
├─sdb6  swap   1     hatu_sw7        b0f0c104-c865-4f9e-8c2d-45a05e16d164                [SWAP]
└─sdb7  ext4   1.0   hatu_home7      6546b59d-ae8b-463a-b040-8f44f6b70645     93G    52% /home
sr0
zram0                                                                                    [SWAP]

Yes, I use BIOS for now because of an old, evil OS that is still present (sda1/sda5).
I noticed that the hd0/hd1 order differs between release 74 and later releases.
I plan a hardware (and maybe software) upgrade, and I assume that if
the root partition is on the first disk, the problem will disappear.

For now I have fallen back to release 74.
And, for example, to boot Windows from /dev/sda1, I use (hd1,1), not (hd0,1).
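
Since the hd0/hd1 numbering is what changes here, one thing that could be tried is a menu entry that finds its root filesystem by UUID instead of by a fixed (hdN,msdosM) name. The sketch below prints such an entry. It uses the UUID of sdb5 from the lsblk output above and the kernel path and arguments from the diagnostic entry in Comment 7. The function name is hypothetical, and nothing in this report confirms that this avoids the "out of range pointer" error.

```shell
#!/bin/sh
# Sketch: print a GRUB menu entry that sets root via "search --fs-uuid"
# rather than a hard-coded (hdN,msdosM) device name.
# ASSUMPTION: this only addresses the disk-ordering observation in this thread;
# it is not a confirmed workaround for the boot failure itself.

print_uuid_menuentry() {
    title="$1"; fs_uuid="$2"; kernel="$3"; kernel_args="$4"
    cat <<EOF
menuentry "$title" --class gnu-linux --class os --unrestricted {
  insmod part_msdos
  insmod ext2
  search --no-floppy --fs-uuid --set=root $fs_uuid
  echo 'Loading Linux'
  linux $kernel $kernel_args
}
EOF
}

# UUID of sdb5, kernel symlink and kernel arguments as quoted in this thread.
print_uuid_menuentry "Linux XF2 (by UUID)" \
    a0a75949-5026-4596-859f-3e14d09672d9 \
    /boot/vmlinuz_c \
    "root=/dev/sdb5 ro selinux=0"
```

The kernel's own root= argument is left as in the original entry, because Linux device naming (/dev/sdb5) is independent of GRUB's hdN numbering.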

Comment 10 Marta Lewandowska 2023-11-23 13:35:07 UTC
I think there's a good chance that the problem will disappear if you have /boot on the first disk.
It certainly seems that grub is looking at the wrong device in the first place. Do you still have a device.map in /boot/grub2? Is it correct?

The change from 74 -> 75 is a bunch of memory allocation patches from upstream
https://src.fedoraproject.org/rpms/grub2/c/7be2bf00c3aa7f7610ad092dec1569202a1e00e6?branch=rawhide
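
A quick way to answer the device.map question is sketched below, assuming the usual "(hdN) /dev/..." line format. The function name check_device_map is hypothetical. It only checks that each listed device node exists; it cannot tell whether the hdN order matches what the BIOS presents at boot.

```shell
#!/bin/sh
# Sketch: list the entries of a GRUB device.map and flag device paths that
# are not block devices on the running system.

check_device_map() {
    map="${1:-/boot/grub2/device.map}"
    if [ ! -f "$map" ]; then
        echo "no device.map at $map"
        return 0
    fi
    # Each non-comment line is expected to look like: (hd0)  /dev/sda
    while read -r drive dev rest; do
        case "$drive" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        if [ -b "$dev" ]; then
            echo "$drive -> $dev (ok)"
        else
            echo "$drive -> $dev (MISSING)"
        fi
    done < "$map"
}

check_device_map "$@"
```

The result can be compared by hand with what `grub2-probe --target=drive /boot` reports on the installed system.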

Comment 11 Aoife Moloney 2024-05-07 15:55:34 UTC
This message is a reminder that Fedora Linux 38 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 38 on 2024-05-21.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '38'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 38 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 12 Aoife Moloney 2024-05-21 14:26:00 UTC
Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21.

Fedora Linux 38 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

