Bug 2131234

Summary: [leapp] /boot mounted too late for the "remove_boot_files" actor
Product: Red Hat Enterprise Linux 7 Reporter: Christophe Besson <cbesson>
Component: leapp-repository Assignee: Leapp Notifications Bot <leapp-notifications-bot>
Status: NEW --- QA Contact: upgrades-and-conversions
Severity: high Docs Contact:
Priority: high    
Version: 7.9 CC: jcastran
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Christophe Besson 2022-09-30 11:34:59 UTC
Description of problem:
During an IPU 7>8, the customer reported that the upgrade to RHEL 8.6 stalls at reboot.

Checking the logs, it appears:
 - the upgrade images have not been removed from /boot, which explains why the system always reboots into them.
 - after rebooting on the RHEL 7.9 kernel, the OS seems to be correctly upgraded anyway (rpms are updated and /etc/redhat-release shows 8.6).
 - there is no el8 kernel boot entry.

The sosreport shows they have 40 disks with multipath and lpfc nics.
However, sda, which holds the rootfs, /boot and /boot/efi, is not a multipath device.

NAME                     MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                        8:0    0 558.4G  0 disk  
|-sda1                     8:1    0   200M  0 part  /boot/efi
|-sda2                     8:2    0   500M  0 part  /boot
`-sda3                     8:3    0 557.7G  0 part  
  |-rhel-root            253:0    0    50G  0 lvm   /
  |-rhel-swap            253:1    0     4G  0 lvm   [SWAP]


Version-Release number of selected component (if applicable):
leapp-upgrade-el7toel8-0.16.0-4.el7_9

How reproducible:
Always for the customer.
Cannot reproduce internally.

Actual results:
$ grep -e leapp.workflow.Preparation.remove_boot_files -e sda2 0020-leapp-logs.tar/leapp/leapp-upgrade.log | grep localhost | head -n5
Sep 20 00:56:40 localhost kernel:  sda: sda1 sda2 sda3
Sep 20 00:59:02 localhost upgrade[4157]: 2022-09-19 17:59:02.297 ERROR    PID: 862 leapp.workflow.Preparation.remove_boot_files: Could not remove /boot/vmlinuz-upgrade.x86_64: [Errno 2] No such file or directory: '/boot/vmlinuz-upgrade.x86_64'.
Sep 20 00:59:02 localhost upgrade[4157]: 2022-09-19 17:59:02.362 ERROR    PID: 862 leapp.workflow.Preparation.remove_boot_files: Could not remove /boot/initramfs-upgrade.x86_64.img: [Errno 2] No such file or directory: '/boot/initramfs-upgrade.x86_64.img'.
Sep 20 01:21:09 localhost kernel: XFS (sda2): Mounting V4 Filesystem          <===
Sep 20 01:21:21 localhost kernel: XFS (sda2): Ending clean mount
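
The ordering visible in the log excerpt above can be checked mechanically. The following is a minimal sketch, not part of leapp: it compares the syslog prefix timestamps of the first remove_boot_files line and the first "XFS (sda2): Mounting" line. The function names are hypothetical, and the year passed to the parser is an assumption taken from the embedded 2022 timestamps, since the syslog prefix carries no year.

```python
from datetime import datetime

# Lines copied from the log excerpt in this report (the "<===" marker is dropped).
SAMPLE_LOG = """\
Sep 20 00:56:40 localhost kernel:  sda: sda1 sda2 sda3
Sep 20 00:59:02 localhost upgrade[4157]: 2022-09-19 17:59:02.297 ERROR    PID: 862 leapp.workflow.Preparation.remove_boot_files: Could not remove /boot/vmlinuz-upgrade.x86_64: [Errno 2] No such file or directory: '/boot/vmlinuz-upgrade.x86_64'.
Sep 20 01:21:09 localhost kernel: XFS (sda2): Mounting V4 Filesystem
Sep 20 01:21:21 localhost kernel: XFS (sda2): Ending clean mount
"""


def syslog_time(line, year=2022):
    """Parse the 15-character syslog prefix, e.g. 'Sep 20 00:59:02'.

    The year is an assumption: the syslog prefix does not include one.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def first_time(lines, needle):
    """Return the timestamp of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            return syslog_time(line)
    return None


def mount_delay(log_text, device="sda2"):
    """Return (mount time - actor time), or None if either line is missing.

    A positive result means the file system was mounted after the actor ran.
    """
    lines = log_text.splitlines()
    actor = first_time(lines, "remove_boot_files")
    mount = first_time(lines, f"XFS ({device}): Mounting")
    if actor is None or mount is None:
        return None
    return mount - actor
```

Calling mount_delay(SAMPLE_LOG) on the lines above gives a positive delay of a little over 22 minutes, i.e. the actor ran before sda2 was mounted.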

Additional info:
- Need to request an sosreport taken on the upgraded OS, with the leapp.db.
- It seems that replacing the UUIDs with the device names in /etc/fstab did not help.
- Hacking the leapp dracut do-upgrade script to introduce a delay and force the mount of /boot did not help either; the customer reported "/dev/sda2 can't open blockdev".
- Only workaround for now: fix the issue manually after the IPU (outline: reboot on the el7 kernel, remove the upgrade images from /boot, reinstall the el8 kernel, and check that grub.cfg and the BLS entries are correct).
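
As an aid for the manual cleanup outlined above, here is a small diagnostic sketch. It is not leapp's code and it removes nothing: it only reports whether /boot is a mount point and whether the two upgrade images named in the error messages are still present. The function names are hypothetical.

```python
import os

# File names taken from the error messages in the log excerpt of this report.
UPGRADE_IMAGES = ("vmlinuz-upgrade.x86_64", "initramfs-upgrade.x86_64.img")


def leftover_upgrade_images(boot_dir="/boot"):
    """Return the paths of upgrade images that still exist under boot_dir."""
    paths = (os.path.join(boot_dir, name) for name in UPGRADE_IMAGES)
    return [p for p in paths if os.path.exists(p)]


def check_boot(boot_dir="/boot"):
    """Report whether boot_dir is a mount point and which images are left."""
    return {
        "mounted": os.path.ismount(boot_dir),
        "leftovers": leftover_upgrade_images(boot_dir),
    }
```

If check_boot() reports "mounted" as False, an empty "leftovers" list only describes the unmounted directory, not the real /boot partition, which is the situation the actor appears to have hit in this report.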