Bug 1909455
| Summary: | Boot disk RAID will not boot if the primary disk enumerates but fails I/O | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Benjamin Gilbert <bgilbert> |
| Component: | RHCOS | Assignee: | Benjamin Gilbert <bgilbert> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.7 | CC: | bbreard, imcleod, jligon, nstielau |
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:47:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1915617 | ||
Description
Benjamin Gilbert
2020-12-20 05:07:30 UTC
Unable to simulate a disk I/O error, so I only verified that /boot is on RAID and that the grub.cfg file contains the correct bits.
[core@cosa-devsh ~]$ rpm-ostree status
State: idle
Deployments:
* ostree://8e87a86b9444784ab29e7917fa82e00d5e356f18b19449946b687ee8dc27c51a
Version: 47.83.202101161239-0 (2021-01-16T12:43:01Z)
[core@cosa-devsh ~]$ lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sr0
vda
|-vda1
|-vda2 vfat esp-1 925B-A4E7
|-vda3 linux_raid_member any:md-boot 719af5c2-ad77-c76d-5bf7-386f2615494c
| `-md127 ext4 boot 7b8a382d-3039-4910-bc03-82b2775c2a64 /boot
`-vda4 linux_raid_member any:md-root fc5fb428-9c1c-15ee-c51e-45258bc646fe
`-md126 xfs root 0f752e48-64b5-4db0-907a-e736e1d2313e /sysroot
vdb
|-vdb1
|-vdb2 vfat esp-2 925B-FC96
|-vdb3 linux_raid_member any:md-boot 719af5c2-ad77-c76d-5bf7-386f2615494c
| `-md127 ext4 boot 7b8a382d-3039-4910-bc03-82b2775c2a64 /boot
`-vdb4 linux_raid_member any:md-root fc5fb428-9c1c-15ee-c51e-45258bc646fe
`-md126 xfs root 0f752e48-64b5-4db0-907a-e736e1d2313e /sysroot
vdc
|-vdc1
|-vdc2 vfat EFI-SYSTEM F811-ED3D
|-vdc3 ext4 boot 07ca1891-f27a-421d-a2f9-70326ca46858
`-vdc4 xfs root 910678ff-f77e-4a7d-8d53-86f2ac47a823
[core@cosa-devsh ~]$ cat /boot/grub2/grub.cfg
set pager=1
# petitboot doesn't support -e and doesn't support an empty path part
if [ -d (md/md-boot)/grub2 ]; then
# fcct currently creates /boot RAID with superblock 1.0, which allows
# component partitions to be read directly as filesystems. This is
# necessary because transposefs doesn't yet rerun grub2-install on BIOS,
# so GRUB still expects /boot to be a partition on the first disk.
#
# There are two consequences:
# 1. On BIOS and UEFI, the search command might pick an individual RAID
# component, but we want it to use the full RAID in case there are bad
# sectors etc. The undocumented --hint option is supposed to support
# this sort of override, but it doesn't seem to work, so we set $boot
# directly.
# 2. On BIOS, the "normal" module has already been loaded from an
# individual RAID component, and $prefix still points there. We want
# future module loads to come from the RAID, so we reset $prefix.
# (On UEFI, the stub grub.cfg has already set $prefix properly.)
set boot=md/md-boot
set prefix=($boot)/grub2
else
search --label boot --set boot
fi
set root=$boot
if [ -f ${config_directory}/grubenv ]; then
load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
load_env
fi
if [ x"${feature_menuentry_id}" = xy ]; then
menuentry_id_option="--id"
else
menuentry_id_option=""
fi
function load_video {
if [ x$feature_all_video_module = xy ]; then
insmod all_video
else
insmod efi_gop
insmod efi_uga
insmod ieee1275_fb
insmod vbe
insmod vga
insmod video_bochs
insmod video_cirrus
fi
}
serial --speed=115200
terminal_input serial console
terminal_output serial console
if [ x$feature_timeout_style = xy ] ; then
set timeout_style=menu
set timeout=1
# Fallback normal timeout code in case the timeout_style feature is
# unavailable.
else
set timeout=1
fi
# Determine if this is a first boot and set the ${ignition_firstboot} variable
# which is used in the kernel command line.
set ignition_firstboot=""
if [ -f "/ignition.firstboot" ]; then
# Default networking parameters to be used with ignition.
set ignition_network_kcmdline=''
# Source in the `ignition.firstboot` file which could override the
# above $ignition_network_kcmdline with static networking config.
# This override feature is also by coreos-installer to persist static
# networking config provided during install to the first boot of the machine.
source "/ignition.firstboot"
set ignition_firstboot="ignition.firstboot ${ignition_network_kcmdline}"
fi
blscfg
[core@cosa-devsh ~]$ rpm-ostree status
State: idle
Deployments:
* ostree://8e87a86b9444784ab29e7917fa82e00d5e356f18b19449946b687ee8dc27c51a
Version: 47.83.202101161239-0 (2021-01-16T12:43:01Z)
[core@cosa-devsh ~]$
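The two manual checks in the session above (that /boot sits on an md array with members on more than one disk, and that grub.cfg points $boot and $prefix at the array) could be scripted roughly as follows. This is a minimal sketch, not part of the fix or of any RHCOS tooling: the function names `disks_backing` and `grub_cfg_uses_raid` are hypothetical, the JSON layout is assumed to be what `lsblk --json -o NAME,FSTYPE,MOUNTPOINT` emits, and the grub.cfg check only looks for the three lines shown in the file above.

```python
import json
import subprocess


def disks_backing(lsblk_json, mountpoint):
    """Return the names of top-level disks that hold a RAID member whose
    array is mounted at `mountpoint`.

    Assumes the JSON shape of `lsblk --json -o NAME,FSTYPE,MOUNTPOINT`:
    {"blockdevices": [{"name": ..., "fstype": ..., "mountpoint": ...,
                       "children": [...]}, ...]}
    """
    disks = set()
    for disk in json.loads(lsblk_json)["blockdevices"]:
        stack = [disk]
        while stack:
            node = stack.pop()
            children = node.get("children", [])
            # lsblk lists the md device as a child of every member partition.
            if node.get("fstype") == "linux_raid_member" and any(
                child.get("mountpoint") == mountpoint for child in children
            ):
                disks.add(disk["name"])
            stack.extend(children)
    return disks


def grub_cfg_uses_raid(cfg_text):
    """Check for the three grub.cfg lines that select the md-boot array.

    This is a plain text match against the lines quoted in this bug,
    not a GRUB script parser.
    """
    lines = {line.strip() for line in cfg_text.splitlines()}
    wanted = {
        "if [ -d (md/md-boot)/grub2 ]; then",
        "set boot=md/md-boot",
        "set prefix=($boot)/grub2",
    }
    return wanted <= lines


if __name__ == "__main__":
    # Run on the machine under test.
    out = subprocess.run(
        ["lsblk", "--json", "-o", "NAME,FSTYPE,MOUNTPOINT"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("disks backing /boot:", sorted(disks_backing(out, "/boot")))
    with open("/boot/grub2/grub.cfg") as f:
        print("grub.cfg selects md-boot:", grub_cfg_uses_raid(f.read()))
```

Like the manual verification, this only confirms the configuration. It does not exercise the failure case in the summary, where the primary disk enumerates but fails I/O.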
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633