Bug 1871162
| Summary: | Azure Gen1 (BIOS) VM occasionally stuck in grub rescue console after upgrade from RHEL7.8 to RHEL8.2 - error: symbol 'grub_calloc' not found | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Michal Reznik <mreznik> | ||||||
| Component: | leapp-repository | Assignee: | Leapp team <leapp-notifications> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Alois Mahdal <amahdal> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 7.8 | CC: | cbesson, fj-lsoft-ofuku, fmartine, hhei, javierm, jcastran, leapp-notifications, mbocek, mmacura, pstodulk, ribarry, xuli, yacao, ykawada, yuxisun | ||||||
| Target Milestone: | rc | Flags: | mreznik: needinfo-, pm-rhel: mirror+ | ||||||
| Target Release: | 7.9 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | leapp-repository-0.12.0-1.el7_9 | Doc Type: | If docs needed, set a value | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2020-11-04 06:40:13 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1768952, 1872255 | ||||||||
| Attachments: |
|
||||||||
|
Description
Michal Reznik
2020-08-21 13:16:18 UTC
Created attachment 1713115 [details]
grub2-install from the successful upgrade
Created attachment 1713116 [details]
grub2-install verbose from the stuck upgrade
I attached debug output of "grub2-install" from both the successful and the failed (stuck) upgrades. What looks suspicious is that, although we ran the "grub2-install" command against /dev/sda in both cases, the log from the failed upgrade shows that grub tried to work with /dev/sdb:

[ 922.373423] upgrade[629]: grub2-install: info: copying `/usr/share/grub/unicode.pf2' -> `/boot/grub2/fonts/unicode.pf2'.
[ 922.383788] upgrade[629]: grub2-install: info: /dev/sdb2 is not present.
[ 922.419914] upgrade[629]: grub2-install: info: Looking for /dev/sdb2.
[ 922.426850] upgrade[629]: grub2-install: info: /dev/sdb is a parent of /dev/sdb2.
[ 922.463843] upgrade[629]: grub2-install: info: /dev/sdb2 starts from 1026048.
[ 922.490511] upgrade[629]: grub2-install: info: opening the device hostdisk//dev/sdb.

The "grub-mkimage" invocation also differs.

When ok:
grub-mkimage --directory '/usr/lib/grub/i386-pc' --prefix '(,gpt2)/grub2' --output '/boot/grub2/i386-pc/core.img' --dtb '' --format 'i386-pc' --compression 'auto' 'xfs' 'part_gpt' 'biosdisk'

When stuck:
grub-mkimage --directory '/usr/lib/grub/i386-pc' --prefix '/grub2' --output '/boot/grub2/i386-pc/core.img' --dtb '' --format 'i386-pc' --compression 'auto' --config '/boot/grub2/i386-pc/load.cfg' 'xfs' 'part_gpt' 'biosdisk' 'search_fs_uuid'

Ok, it looks like the problem is with storage device name swapping. Right before running "grub2-install":

[ 1654.605904] upgrade[634]: |-sdb1 8:17 0 500M 0 part /boot/efi C13D-C339 |-sdb1
[ 1654.616522] upgrade[634]: |-sdb2 8:18 0 500M 0 part /boot 8cc4c23c-fa7b-4a4d-bba8-4108b7ac0135 |-sdb2
[ 1654.623843] upgrade[634]: |-sdb3 8:19 0 2M 0 part |-sdb3

/dev/sda becomes /dev/sdb.

Anyway, I wonder whether it should still work even if we did not actually upgrade the GRUB core. I recall that when we did not do this during the upgrade, we had e.g. issues with booting into the RHEL7 kernel instead of RHEL8, but it did not get stuck in grub rescue.
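The device name swap described above suggests looking up the disk that holds /boot at run time instead of assuming /dev/sda. The following is only a minimal sketch of that idea, not the fix applied in leapp-repository. It assumes that `lsblk -P -o NAME,PKNAME,MOUNTPOINT` is available in the upgrade environment; the helper names and the sample data (modelled on the lsblk excerpt above) are hypothetical.

```python
import shlex


def parse_lsblk_pairs(text):
    """Parse `lsblk -P` style output (KEY="value" pairs) into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        row = {}
        # shlex handles the double-quoted values, including empty ones.
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            row[key] = value
        rows.append(row)
    return rows


def boot_disk(rows, mountpoint="/boot"):
    """Return the whole-disk device holding `mountpoint`, or None if not found."""
    for row in rows:
        if row.get("MOUNTPOINT") == mountpoint:
            # PKNAME is empty when the filesystem sits directly on a disk.
            parent = row.get("PKNAME") or row.get("NAME")
            return "/dev/" + parent
    return None


# Hypothetical sample, modelled on the lsblk excerpt in this comment.
SAMPLE = '''
NAME="sdb" PKNAME="" MOUNTPOINT=""
NAME="sdb1" PKNAME="sdb" MOUNTPOINT="/boot/efi"
NAME="sdb2" PKNAME="sdb" MOUNTPOINT="/boot"
NAME="sdb3" PKNAME="sdb" MOUNTPOINT=""
'''

if __name__ == "__main__":
    print(boot_disk(parse_lsblk_pairs(SAMPLE)))
```

With the sample above, the lookup resolves to the disk that currently carries /boot, whatever name the kernel assigned to it on this boot.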
(In reply to Michal Reznik from comment #5)
> Ok, it looks like the problem is with storage device name swapping. Right
> before running "grub2-install":
>
> [ 1654.605904] upgrade[634]: |-sdb1 8:17 0 500M 0 part
> /boot/efi C13D-C339 |-sdb1
> [ 1654.616522] upgrade[634]: |-sdb2 8:18 0 500M 0 part
> /boot 8cc4c23c-fa7b-4a4d-bba8-4108b7ac0135 |-sdb2
> [ 1654.623843] upgrade[634]: |-sdb3 8:19 0 2M 0 part
> |-sdb3
>
> /dev/sda becomes /dev/sdb.
>

That explains it, then: only the GRUB modules are updated, but not the GRUB core image.

> Anyway, wondering whether it should not work even if we did not actually
> upgraded GRUB core. I recall when we did not use to do it during the
> upgrade, we had e.g. issues with booting into RHEL7 kernel instead of RHEL8
> but it did not get stuck in grub rescue.

It fails because a new function (grub_calloc) was added in 8.2. If only the GRUB modules are updated but the GRUB core is not, then the modules reference a 'grub_calloc' symbol that the old core does not provide. This is the error that you are seeing and why the menu can't be displayed.

The bug you mentioned, where the wrong boot entry was selected, was due to a bug that got fixed in a newer version. That's why not updating led to the RHEL7 kernel being booted instead of the RHEL8 one.

Fixed in https://github.com/oamg/leapp-repository/pull/584/commits/0643ddb4f8e4e296bd858678395dd7a812b3a8c3

This bug has been encountered on an Azure VM using RHSM for the upgrade. The issue didn't occur with /boot on the same disk. It has been reproduced twice, each time with /boot on another disk (/dev/sdb).

Using rd.upgrade.break=leapp-upgrade and reinstalling grub within a chroot into /sysroot (with the virtual filesystems bind mounted) worked.
But after unmounting all FS but /sysroot itself, I exited the shell and then the upgrade sequence failed with:

~~~
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.287 DEBUG PID: 3 leapp.repository.system_upgrade_el7toel8: Running actor discovery
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.292 DEBUG PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/addupgradebootentry
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.313 DEBUG PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/authselectapply
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.329 DEBUG PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/authselectcheck
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.344 DEBUG PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/authselectscanner
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.376 DEBUG PID: 3 leapp.repository.system_upgrade_el7toel8: Starting actor discovery in actors/biosdevname
Sep 11 09:32:44 localhost upgrade[16139]: Process Process-5:
Sep 11 09:32:44 localhost upgrade[16139]: Traceback (most recent call last):
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib64/python2.7/multiprocessing/process.py", line 267, in _bootstrap
Sep 11 09:32:44 localhost upgrade[16139]:     self.run()
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
Sep 11 09:32:44 localhost upgrade[16139]:     self._target(*self._args, **self._kwargs)
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 26, in inspect_actor
Sep 11 09:32:44 localhost upgrade[16139]:     definition.load()
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 155, in load
Sep 11 09:32:44 localhost upgrade[16139]:     self._module = importer.find_module(name).load_module(name)
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib64/python2.7/pkgutil.py", line 243, in load_module
Sep 11 09:32:44 localhost upgrade[16139]:     mod = imp.load_module(fullname, self.file, self.filename, self.etc)
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/biosdevname/actor.py", line 2, in <module>
Sep 11 09:32:44 localhost upgrade[16139]:     from leapp.libraries.actor.library import check_biosdevname
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/lib64/python2.7/pkgutil.py", line 243, in load_module
Sep 11 09:32:44 localhost upgrade[16139]:     mod = imp.load_module(fullname, self.file, self.filename, self.etc)
Sep 11 09:32:44 localhost upgrade[16139]:   File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/biosdevname/libraries/library.py", line 3, in <module>
Sep 11 09:32:44 localhost upgrade[16139]:     import pyudev
Sep 11 09:32:44 localhost upgrade[16139]: ImportError: No module named pyudev
Sep 11 09:32:44 localhost upgrade[16139]: 2020-09-11 11:32:44.393 ERROR PID: 3 leapp.repository.system_upgrade_el7toel8: Process inspecting actor in actors/biosdevname failed with 1
Sep 11 09:32:44 localhost upgrade[16139]: Error: Inspection of actor in actors/biosdevname failed
Sep 11 09:32:44 localhost kernel: XFS (vda1): Unmounting Filesystem
Sep 11 09:32:44 localhost upgrade[16139]: Container sysroot failed with error code 2.
~~~

Or maybe due to:

~~~
Sep 11 09:33:13 localhost umount[16244]: umount: /sysroot: target is busy.
~~~

For now, I have no workaround to suggest.

*** Bug 1872165 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (leapp, leapp-repository, and cockpit-leapp bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4882 |
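The root cause discussed in this report (RHEL 8.2 GRUB modules loaded by a core image that was never rewritten) can be pictured as a plain set difference between the symbols a module needs and the symbols the core exports. The sketch below is only a toy model under stated assumptions: it does not inspect real GRUB images, the helper name is hypothetical, and the symbol sets are illustrative apart from grub_calloc, which is the symbol named in the error.

```python
def missing_symbols(core_exports, module_imports):
    """Return the symbols a module needs that the core image does not export."""
    return sorted(set(module_imports) - set(core_exports))


# Illustrative symbol sets only; real core images export many more symbols.
OLD_CORE_EXPORTS = {"grub_malloc", "grub_free"}           # core left from RHEL 7
NEW_CORE_EXPORTS = OLD_CORE_EXPORTS | {"grub_calloc"}     # core rewritten by grub2-install
NEW_MODULE_IMPORTS = {"grub_malloc", "grub_calloc"}       # module shipped with RHEL 8.2

if __name__ == "__main__":
    # Stale core plus new modules: the unresolved symbol is reported.
    print(missing_symbols(OLD_CORE_EXPORTS, NEW_MODULE_IMPORTS))
    # Core and modules updated together: nothing is missing.
    print(missing_symbols(NEW_CORE_EXPORTS, NEW_MODULE_IMPORTS))
```

In this model the first case corresponds to grub2-install writing to the wrong disk after the device name swap, and the second to a successful upgrade where core and modules stay in step.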