Bug 1925078 - RHOSP13-16.1 FFU: Overcloud upgrade hangs in controller after failed attempt with reference to wrong ceph image.
Summary: RHOSP13-16.1 FFU: Overcloud upgrade hangs in controller after failed attempt ...
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: All
OS: Unspecified
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Lukas Bezdicka
QA Contact: Jason Grosso
Duplicates: 1906681
Depends On:
Blocks: 1768952
Reported: 2021-02-04 11:20 UTC by Shravan Kumar Tiwari
Modified: 2021-04-12 14:24 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210104205662.el8ost.2
Doc Type: Known Issue
Doc Text:
Systems that use UEFI boot and a UEFI bootloader in OSP13 might run into a UEFI issue that results in:
* /etc/fstab not being updated
* grub-install used incorrectly on EFI system
If your systems use UEFI, contact Red Hat Technical Support. For more information, see the Red Hat Knowledgebase solution https://access.redhat.com/solutions/5861031[FFU 13 to 16.1: Leapp fails to update the kernel on UEFI based systems and /etc/fstab does not contain the EFI partition]
Clone Of:
Last Closed: 2021-03-17 15:36:38 UTC
Target Upstream Version:


System ID Private Priority Status Summary Last Updated
OpenStack gerrit 774679 0 None MERGED [train-only] Add FFWD workaround for UEFI systems 2021-06-02 15:04:04 UTC
Red Hat Product Errata RHBA-2021:0817 0 None None None 2021-03-17 15:38:22 UTC

Description Shravan Kumar Tiwari 2021-02-04 11:20:09 UTC
Description of problem:

The overcloud upgrade run failed because container-prepare-image.yaml had the wrong reference to ceph3_image. The customer corrected it later and ran upgrade prepare again to get the new values into the plan.

However, the overcloud upgrade run failed again and was still trying to pull the wrong ceph image.

Later, the systemd files on the controllers were manually updated to reference the correct ceph image. The logs then showed that the correct image was pulled, but the overcloud upgrade run still hangs and the customer is not able to proceed.

Version-Release number of selected component (if applicable):
FFU from RHOSP13z14 to RHOSP16.1

The undercloud upgrade and the overcloud upgrade for the first controller are in progress.

How reproducible:

Steps to Reproduce:

Actual results:
overcloud upgrade run hangs for controller1

Expected results:
The upgrade run should proceed and succeed.

Additional info:

Comment 3 Lukas Bezdicka 2021-02-05 16:29:45 UTC
Breakdown of the problem:

1) The customer used the wrong ceph image and needed to update it in the systemd service file for ceph to stop the recurring log message about podman failing to pull the image. This was irrelevant to the stuck upgrade.
2) The stuck upgrade came from the mysql upgrade container getting stuck, with podman reporting it as running but nothing happening.
3) The container was stuck due to the wrong kernel: post Leapp, the system booted into the old 3.10.0-1160 instead of 4...
4) This was due to Leapp failing to update the kernel because the system itself was EFI based but /etc/fstab did not contain the EFI partition.
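
For anyone triaging a similar hang, point 3 can be checked quickly on the node. The following is only a rough sketch in Python: the function names are made up for illustration, and it assumes the pre-upgrade kernel is a 3.x release while the upgraded kernel is 4.x or later, as described above.

```python
import platform


def kernel_major(release: str) -> int:
    """Return the leading major version of a kernel release string."""
    return int(release.split(".", 1)[0])


def booted_pre_upgrade_kernel(release: str) -> bool:
    """True if the release looks like the old 3.x kernel.

    Assumption: old kernel is 3.x, upgraded kernel is 4.x or later.
    """
    return kernel_major(release) < 4


if __name__ == "__main__":
    # platform.release() reports the currently running kernel release.
    running = platform.release()
    state = "old kernel" if booted_pre_upgrade_kernel(running) else "new kernel"
    print(running, "->", state)
```

If this reports the old kernel after the Leapp reboot, the stuck container is most likely a symptom rather than the cause.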

To break this down: during deployment, Ironic runs:

Jan 11 05:22:25 host-192-168-0-200 ironic-python-agent[2108]: 2021-01-11 05:22:25.019 2108 DEBUG oslo_concurrency.processutils [-] CMD "mount /dev/sda1 /tmp/tmpFpRF6S/boot/efi" returned: 0 in 0.076s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
Jan 11 05:22:25 host-192-168-0-200 ironic-python-agent[2108]: 2021-01-11 05:22:25.281 2108 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): chroot /tmp/tmpFpRF6S /bin/sh -c "grub2-install /dev/sda" execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
Jan 11 05:22:27 host-192-168-0-200 ironic-python-agent[2108]: 2021-01-11 05:22:27.290 2108 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): chroot /tmp/tmpFpRF6S /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg" execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372

On the controller we found signs of grub-install being used on an EFI system, which creates setups that are not Secure Boot compatible:

Boot0018* red   HD(1,GPT,be5dd387-fc63-4d37-b5a8-68ccca72b172,0x800,0x64000)/File(\EFI\red\grubx64.efi)
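
A boot entry like the one above can be inspected programmatically. This is a minimal sketch, not part of any fix: the helper names are hypothetical, and it only extracts the File(...) element from a line of efibootmgr output so that the loader directory can be compared with the expected one (the efibootmgr command in comment 5 uses \EFI\redhat).

```python
import re
from typing import Optional

# Matches the File(...) element of an efibootmgr boot entry.
_FILE_RE = re.compile(r"File\(([^)]*)\)")


def loader_path(entry: str) -> Optional[str]:
    """Return the loader path from a boot entry line, or None if absent."""
    match = _FILE_RE.search(entry)
    return match.group(1) if match else None


def loader_directory(entry: str) -> Optional[str]:
    """Return the directory under \\EFI that holds the loader, if any."""
    path = loader_path(entry)
    if path is None:
        return None
    parts = [p for p in path.split("\\") if p]
    # Expected layout: EFI / <directory> / <loader file>
    if len(parts) >= 3 and parts[0].upper() == "EFI":
        return parts[1]
    return None


# The entry found on the controller in this bug.
entry = (r"Boot0018* red   HD(1,GPT,be5dd387-fc63-4d37-b5a8-68ccca72b172,"
         r"0x800,0x64000)/File(\EFI\red\grubx64.efi)")
print(loader_directory(entry))
```

A directory other than redhat is a hint that the entry was not created by the grub2-efi-x64 and shim-x64 packages.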

Here we can see that the partitions are present on the disk, so if the system boots via EFI, it does so through an unmounted and not updated partition:

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sda: 300.0 GB, 299966136320 bytes, 585871360 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes
Disk label type: gpt
Disk identifier: 7954D191-CD2D-4DC3-A3A2-2696AB9E3634

#         Start          End    Size  Type            Name
 1         2048       411647    200M  EFI System      primary
 2       411648       413695      1M  Microsoft basic primary
 3       413696    585871325  279.2G  Microsoft basic primary

LABEL=img-rootfs / xfs defaults 0 1

1) /etc/fstab was not updated
2) grub-install was incorrectly used on EFI system
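
The first condition can be detected before starting the upgrade. The sketch below is illustrative only and the helper names are hypothetical. It relies on the Linux convention that /sys/firmware/efi exists when the system was booted via UEFI, and it treats the second whitespace-separated field of each /etc/fstab line as the mount point.

```python
import os
from typing import List


def booted_via_efi() -> bool:
    """True if the running system was booted through UEFI firmware."""
    return os.path.isdir("/sys/firmware/efi")


def fstab_mount_points(fstab_text: str) -> List[str]:
    """Return the mount points listed in fstab content, skipping comments."""
    points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            points.append(fields[1])
    return points


def missing_efi_mount(fstab_text: str, efi_boot: bool) -> bool:
    """True if the system boots via EFI but fstab has no /boot/efi entry."""
    return efi_boot and "/boot/efi" not in fstab_mount_points(fstab_text)


# The fstab content found on the affected controller.
sample = "LABEL=img-rootfs / xfs defaults 0 1\n"
print(missing_efi_mount(sample, efi_boot=True))
```

On a real node the check would be called with the content of /etc/fstab and the result of booted_via_efi().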

Comment 5 Lukas Bezdicka 2021-02-11 16:00:46 UTC
blkid output:
/dev/sda1: SEC_TYPE="msdos" LABEL="efi-part" UUID="1930-AFD0" TYPE="vfat" PARTLABEL="primary" PARTUUID="c5f32f78-0c85-469c-8649-1bfb1f56d116"

add /etc/fstab record:
UUID="1930-AFD0" /boot/efi vfat umask=0077 0 1

mount /boot/efi

dnf/yum reinstall grub2-efi-x64 shim-x64

efibootmgr -c --disk /dev/sda -p 1 -w -L RHEL -l "\\EFI\\redhat\\grubx64.efi" 

grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
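
To avoid copying the UUID by hand, the /etc/fstab record from the steps above can be derived from the blkid output. This is a small sketch with hypothetical helper names. It assumes the blkid line has the "device: KEY="value" ..." shape shown in this comment, and it reuses the mount options from the record above.

```python
import shlex
from typing import Dict, Tuple


def parse_blkid_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split a blkid output line into the device and its attributes."""
    device, _, rest = line.partition(": ")
    attrs = {}
    # shlex handles the double-quoted values, e.g. UUID="1930-AFD0".
    for token in shlex.split(rest):
        key, _, value = token.partition("=")
        attrs[key] = value
    return device, attrs


def efi_fstab_record(attrs: Dict[str, str],
                     mount_point: str = "/boot/efi",
                     options: str = "umask=0077") -> str:
    """Build the fstab record for the EFI partition from blkid attributes."""
    return 'UUID="{}" {} {} {} 0 1'.format(
        attrs["UUID"], mount_point, attrs["TYPE"], options)


blkid_line = ('/dev/sda1: SEC_TYPE="msdos" LABEL="efi-part" UUID="1930-AFD0" '
              'TYPE="vfat" PARTLABEL="primary" '
              'PARTUUID="c5f32f78-0c85-469c-8649-1bfb1f56d116"')
device, attrs = parse_blkid_line(blkid_line)
print(efi_fstab_record(attrs))
```

The printed record matches the one added manually above; the remaining steps (mount, package reinstall, efibootmgr, grub2-mkconfig) still have to be run on the node.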


Comment 6 Lukas Bezdicka 2021-02-24 12:37:54 UTC
*** Bug 1906681 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2021-03-17 15:36:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 24 Steve Baker 2021-03-18 19:14:29 UTC
*** Bug 1936523 has been marked as a duplicate of this bug. ***
