Bug 1871162

Summary: Azure Gen1 (BIOS) VM occasionally stuck in grub rescue console after upgrade from RHEL7.8 to RHEL8.2 - error: symbol 'grub_calloc' not found
Product: Red Hat Enterprise Linux 7
Reporter: Michal Reznik <mreznik>
Component: leapp-repository
Assignee: Leapp team <leapp-notifications>
Status: CLOSED ERRATA
QA Contact: Alois Mahdal <amahdal>
Severity: high
Docs Contact:
Priority: high
Version: 7.8
CC: cbesson, fj-lsoft-ofuku, fmartine, hhei, javierm, jcastran, leapp-notifications, mbocek, mmacura, pstodulk, ribarry, xuli, yacao, ykawada, yuxisun
Target Milestone: rc
Flags: mreznik: needinfo-, pm-rhel: mirror+
Target Release: 7.9
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: leapp-repository-0.12.0-1.el7_9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-11-04 06:40:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1768952, 1872255
Attachments:
  grub2-install from the successful upgrade (Flags: none)
  grub2-install verbose from the stuck upgrade (Flags: none)

Description Michal Reznik 2020-08-21 13:16:18 UTC
Description of problem:

After an upgrade from RHEL 7.8 to RHEL 8.2, an Azure Gen1 (BIOS) VM occasionally gets stuck in the GRUB rescue console with the message: error: symbol 'grub_calloc' not found

This happens in approximately 1 out of 10 runs. Note: we run "grub2-install" after the el7 to el8 RPM upgrade transaction.


Version-Release number of selected component (if applicable):

grub2-pc-1:2.02-87.el8_2.x86_64

How reproducible:

Upgrade from RHEL 7.8 to RHEL 8.2 on Azure using a Gen1 VM. I can provide a reproducer on demand.

I also found this question, which suggests it can also happen after a normal upgrade. Interestingly, it was raised after the BootHole fix.

https://askubuntu.com/questions/1263125/how-to-fix-a-grub-boot-error-symbol-grub-calloc-not-found

Actual results:

VM stuck in GRUB rescue console (as per the summary)

Expected results:

VM boots fine.

Additional info:

Comment 1 Michal Reznik 2020-08-31 07:21:47 UTC
Created attachment 1713115 [details]
grub2-install from the successful upgrade

Comment 2 Michal Reznik 2020-08-31 07:23:21 UTC
Created attachment 1713116 [details]
grub2-install verbose from the stuck upgrade

Comment 3 Michal Reznik 2020-08-31 08:31:03 UTC
I attached debug output of "grub2-install" from both successful and failed (stuck) upgrades.

What looks suspicious is that, although we ran the "grub2-install" command against /dev/sda in both cases, the log from the failed upgrade shows that GRUB tried to work with /dev/sdb:

[  922.373423] upgrade[629]: grub2-install: info: copying `/usr/share/grub/unicode.pf2' -> `/boot/grub2/fonts/unicode.pf2'.
[  922.383788] upgrade[629]: grub2-install: info: /dev/sdb2 is not present.
[  922.419914] upgrade[629]: grub2-install: info: Looking for /dev/sdb2.
[  922.426850] upgrade[629]: grub2-install: info: /dev/sdb is a parent of /dev/sdb2.
[  922.463843] upgrade[629]: grub2-install: info: /dev/sdb2 starts from 1026048.
[  922.490511] upgrade[629]: grub2-install: info: opening the device hostdisk//dev/sdb.
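
One way to spot this kind of mismatch in a captured log is to collect every disk name that grub2-install mentions and compare it with the disk that was requested. The following is a minimal Python sketch, not part of leapp; the function names are illustrative, the sample log is trimmed from the excerpt above, and the regular expression assumes sdX-style device names only.

```python
import re

# Matches whole-disk or partition names such as /dev/sda or /dev/sdb2.
# Assumption: only sdX-style names; NVMe or virtio names are not handled.
DEVICE_RE = re.compile(r"/dev/sd[a-z]+\d*")


def disks_in_log(log_text):
    """Return the set of parent disks (e.g. /dev/sdb) mentioned in the log."""
    disks = set()
    for match in DEVICE_RE.findall(log_text):
        # Strip the trailing partition number to get the parent disk.
        disks.add(match.rstrip("0123456789"))
    return disks


def unexpected_disks(log_text, requested_disk):
    """Return disks that appear in the log but were not the requested target."""
    return disks_in_log(log_text) - {requested_disk}


SAMPLE_LOG = """\
grub2-install: info: /dev/sdb2 is not present.
grub2-install: info: /dev/sdb is a parent of /dev/sdb2.
grub2-install: info: opening the device hostdisk//dev/sdb.
"""

if __name__ == "__main__":
    print(sorted(unexpected_disks(SAMPLE_LOG, "/dev/sda")))
```

A non-empty result for the requested target is a hint that the device names seen by grub2-install differ from the ones the caller assumed.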

Comment 4 Michal Reznik 2020-08-31 09:22:29 UTC
The "grub-mkimage" invocation also differs.

When ok:

grub-mkimage --directory '/usr/lib/grub/i386-pc' --prefix '(,gpt2)/grub2' --output '/boot/grub2/i386-pc/core.img'  --dtb '' --format 'i386-pc' --compression 'auto'  'xfs' 'part_gpt' 'biosdisk'

When stuck:

grub-mkimage --directory '/usr/lib/grub/i386-pc' --prefix '/grub2' --output '/boot/grub2/i386-pc/core.img'  --dtb '' --format 'i386-pc' --compression 'auto'  --config '/boot/grub2/i386-pc/load.cfg' 'xfs' 'part_gpt' 'biosdisk' 'search_fs_uuid'
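
To make the difference between the two invocations easier to see, they can be split into options and module names and compared. This is a rough Python sketch with hypothetical helper names; it assumes every `--option` is followed by a separate value, which holds for the two command lines quoted above but is not a general parser for grub-mkimage.

```python
import shlex


def parse_mkimage(cmdline):
    """Split a grub-mkimage command line into (options, modules).

    Assumption: each --option takes exactly one separate value.
    """
    tokens = iter(shlex.split(cmdline)[1:])  # drop the program name
    options, modules = {}, []
    for token in tokens:
        if token.startswith("--"):
            options[token] = next(tokens, "")
        else:
            modules.append(token)
    return options, modules


def diff_mkimage(ok_cmd, stuck_cmd):
    """Return (changed_options, extra_modules) of stuck_cmd relative to ok_cmd."""
    ok_opts, ok_mods = parse_mkimage(ok_cmd)
    stuck_opts, stuck_mods = parse_mkimage(stuck_cmd)
    changed = {
        key: (ok_opts.get(key), stuck_opts.get(key))
        for key in sorted(set(ok_opts) | set(stuck_opts))
        if ok_opts.get(key) != stuck_opts.get(key)
    }
    extra = [mod for mod in stuck_mods if mod not in ok_mods]
    return changed, extra


OK_CMD = (
    "grub-mkimage --directory '/usr/lib/grub/i386-pc' --prefix '(,gpt2)/grub2' "
    "--output '/boot/grub2/i386-pc/core.img'  --dtb '' --format 'i386-pc' "
    "--compression 'auto'  'xfs' 'part_gpt' 'biosdisk'"
)
STUCK_CMD = (
    "grub-mkimage --directory '/usr/lib/grub/i386-pc' --prefix '/grub2' "
    "--output '/boot/grub2/i386-pc/core.img'  --dtb '' --format 'i386-pc' "
    "--compression 'auto'  --config '/boot/grub2/i386-pc/load.cfg' "
    "'xfs' 'part_gpt' 'biosdisk' 'search_fs_uuid'"
)

if __name__ == "__main__":
    changed, extra = diff_mkimage(OK_CMD, STUCK_CMD)
    print(changed)
    print(extra)
```

For the two commands above, the differences are the `--prefix` value, the added `--config` option, and the additional `search_fs_uuid` module.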

Comment 5 Michal Reznik 2020-08-31 14:35:31 UTC
Ok, it looks like the problem is with storage device name swapping. Right before running "grub2-install":

[ 1654.605904] upgrade[634]: |-sdb1              8:17   0  500M  0 part /boot/efi  C13D-C339                              |-sdb1
[ 1654.616522] upgrade[634]: |-sdb2              8:18   0  500M  0 part /boot      8cc4c23c-fa7b-4a4d-bba8-4108b7ac0135   |-sdb2
[ 1654.623843] upgrade[634]: |-sdb3              8:19   0    2M  0 part                                                   |-sdb3

/dev/sda becomes /dev/sdb. 

Anyway, I wonder whether it should still work even if we did not actually upgrade the GRUB core. I recall that when we did not do this during the upgrade, we had issues such as booting into the RHEL 7 kernel instead of the RHEL 8 one, but it did not get stuck in GRUB rescue.
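
Since the sdX names can change between boots, one option is to derive the target disk from the mount point instead of hard-coding /dev/sda. The sketch below is only an illustration in Python and is not the fix that shipped in leapp-repository. It parses the key="value" output of `lsblk -P -o NAME,PKNAME,MOUNTPOINT` (assuming an lsblk version that provides the PKNAME column); the sample text is hypothetical output modelled on the listing above, and the function names are made up for this example.

```python
import shlex


def parse_lsblk_pairs(text):
    """Parse `lsblk -P` output (KEY="value" pairs) into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = {}
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            row[key] = value
        rows.append(row)
    return rows


def disk_for_mountpoint(text, mountpoint="/boot"):
    """Return the parent disk (e.g. /dev/sdb) of the device mounted at mountpoint."""
    for row in parse_lsblk_pairs(text):
        if row.get("MOUNTPOINT") == mountpoint:
            # PKNAME is empty for a whole disk; fall back to the device itself.
            parent = row.get("PKNAME") or row["NAME"]
            return "/dev/" + parent
    return None


# Hypothetical output of: lsblk -P -o NAME,PKNAME,MOUNTPOINT
SAMPLE_LSBLK = '''\
NAME="sdb" PKNAME="" MOUNTPOINT=""
NAME="sdb1" PKNAME="sdb" MOUNTPOINT="/boot/efi"
NAME="sdb2" PKNAME="sdb" MOUNTPOINT="/boot"
NAME="sdb3" PKNAME="sdb" MOUNTPOINT=""
'''

if __name__ == "__main__":
    print(disk_for_mountpoint(SAMPLE_LSBLK))
```

The returned disk could then be passed to "grub2-install" in place of a fixed device name, so that a swap between /dev/sda and /dev/sdb does not change the target.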

Comment 6 Javier Martinez Canillas 2020-08-31 14:40:52 UTC
(In reply to Michal Reznik from comment #5)
> Ok, it looks like the problem is with storage device name swapping. Right
> before running "grub2-install":
> 
> [ 1654.605904] upgrade[634]: |-sdb1              8:17   0  500M  0 part
> /boot/efi  C13D-C339                              |-sdb1
> [ 1654.616522] upgrade[634]: |-sdb2              8:18   0  500M  0 part
> /boot      8cc4c23c-fa7b-4a4d-bba8-4108b7ac0135   |-sdb2
> [ 1654.623843] upgrade[634]: |-sdb3              8:19   0    2M  0 part     
> |-sdb3
> 
> /dev/sda becomes /dev/sdb. 
>

That explains it, then: only the GRUB modules are updated, but not the GRUB
core image.
 
> Anyway, wondering whether it should not work even if we did not actually
> upgraded GRUB core. I recall when we did not use to do it during the
> upgrade, we had e.g. issues with booting into RHEL7 kernel instead of RHEL8
> but it did not get stuck in grub rescue.

It fails because a new function (grub_calloc) was added in 8.2. If only the
GRUB modules are updated but the GRUB core is not, the modules will reference
a 'grub_calloc' symbol that the old core does not provide.

This is the error that you are seeing and why the menu can't be displayed.

The issue you mentioned about the wrong boot entry being selected was due to a
bug that got fixed in a newer version. That is why not updating led to the
RHEL7 kernel being booted instead of the RHEL8 one.
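
The mismatch described above can be pictured with a toy model in Python: a core image exports a set of symbols, each module needs some of them, and any symbol a module needs that the core lacks is unresolved. This is only an illustration and not how GRUB's module loader is implemented; which modules use which symbols below is invented for the example.

```python
def unresolved_symbols(core_exports, modules):
    """Map each module name to the symbols it needs that the core lacks."""
    missing = {}
    for name, needed in modules.items():
        absent = sorted(set(needed) - set(core_exports))
        if absent:
            missing[name] = absent
    return missing


# Old core image left on disk: it predates grub_calloc.
OLD_CORE = {"grub_malloc", "grub_free"}
# Core image matching the updated modules.
NEW_CORE = OLD_CORE | {"grub_calloc"}

# Updated modules; the symbol lists are hypothetical.
NEW_MODULES = {
    "normal": ["grub_malloc", "grub_calloc"],
    "xfs": ["grub_calloc", "grub_free"],
}

if __name__ == "__main__":
    print(unresolved_symbols(OLD_CORE, NEW_MODULES))
    print(unresolved_symbols(NEW_CORE, NEW_MODULES))
```

With the old core, every updated module that needs grub_calloc fails to resolve it, which corresponds to the "symbol 'grub_calloc' not found" error; with a matching core, nothing is left unresolved.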

Comment 12 Christophe Besson 2020-09-11 09:45:24 UTC
This bug has been encountered on an Azure VM using RHSM for the upgrade.
The issue didn't occur with /boot on the same disk.

It has been reproduced twice, each time with /boot on another disk (/dev/sdb).
Using rd.upgrade.break=leapp-upgrade and reinstalling GRUB within a chroot into /sysroot (with the virtual filesystems bind mounted) worked.
However, after unmounting all filesystems except /sysroot itself, I exited the shell and the upgrade sequence then failed with:
~~~
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.287 DEBUG    PID: 3 leapp.repository.system_upgrade_el7toel8: Running actor discovery
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.292 DEBUG    PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/addupgradebootentry
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.313 DEBUG    PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/authselectapply
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.329 DEBUG    PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/authselectcheck
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.344 DEBUG    PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/authselectscanner
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.376 DEBUG    PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/biosdevname
Sep 11 09:32:44 localhost upgrade[16139]: Process Process-5:
Sep 11 09:32:44 localhost upgrade[16139]: Traceback (most recent call last):
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib64/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Sep 11 09:32:44 localhost upgrade[16139]:     self.run()
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
Sep 11 09:32:44 localhost upgrade[16139]:     self._target(*self._args, **self._kwargs)
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 26, in inspect_actor
Sep 11 09:32:44 localhost upgrade[16139]:     definition.load()
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 155, in load
Sep 11 09:32:44 localhost upgrade[16139]:     self._module = importer.find_module(name).load_module(name)
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib64/python2.7/pkgutil.py", line 243, in load_module
Sep 11 09:32:44 localhost upgrade[16139]:     mod = imp.load_module(fullname, self.file, self.filename, self.etc)
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/biosdevname/actor.py", line 2, in <module>
Sep 11 09:32:44 localhost upgrade[16139]:     from leapp.libraries.actor.library import check_biosdevname
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib64/python2.7/pkgutil.py", line 243, in load_module
Sep 11 09:32:44 localhost upgrade[16139]:     mod = imp.load_module(fullname, self.file, self.filename, self.etc)
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/biosdevname/libraries/library.py", line 3, in <module>
Sep 11 09:32:44 localhost upgrade[16139]:     import pyudev
Sep 11 09:32:44 localhost upgrade[16139]: ImportError: No module named pyudev
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.393 ERROR    PID: 3 leapp.repository.system_upgrade_el7toel8: Process inspecting actor in actors/biosdevname failed with 1
Sep 11 09:32:44 localhost upgrade[16139]: Error: Inspection of actor in actors/biosdevname failed
Sep 11 09:32:44 localhost kernel: XFS (vda1): Unmounting Filesystem
Sep 11 09:32:44 localhost upgrade[16139]: Container sysroot failed with error code 2.
~~~

Or maybe due to:
~~~
Sep 11 09:33:13 localhost umount[16244]: umount: /sysroot: target is busy.
~~~

For now, I have no workaround to suggest.

Comment 13 ykawada 2020-09-17 07:51:23 UTC
*** Bug 1872165 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2020-11-04 06:40:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (leapp, leapp-repository, and cockpit-leapp bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4882