Bug 1909453 - Boot disk RAID can corrupt ESP if UEFI firmware writes to it
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Benjamin Gilbert
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1915617
 
Reported: 2020-12-20 05:00 UTC by Benjamin Gilbert
Modified: 2021-02-24 15:47 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:47:16 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | coreos fcct pull 178 | 0 | None | closed | config/fcos/v1_[34]: un-RAID ESP | 2021-01-25 13:29:35 UTC
Github | coreos fedora-coreos-config pull 794 | 0 | None | closed | 40ignition-ostree: copy ESP contents as independent filesystems | 2021-01-25 13:29:36 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:47:59 UTC

Description Benjamin Gilbert 2020-12-20 05:00:19 UTC
If

1. boot disk RAID is enabled on a UEFI system, and
2. the firmware decides to write to the ESP,

then

3. the ESP RAID will desynchronize,
4. subsequent ESP reads inside the OS may return incoherent FS metadata, and
5. subsequent ESP writes inside the OS may corrupt filesystem state based on that FS metadata.

The fix is to stop RAIDing the ESP, and instead maintain multiple independent replicas that are synchronized by the OS at the file level.
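
As a rough illustration of what file-level synchronization between independent ESP replicas could look like, here is a minimal sketch. It is not the code from the linked pull requests; the function name and the mount points in the usage comment are hypothetical, and it only copies files forward (it does not remove files that exist solely on a replica).

```shell
#!/bin/bash
# Hypothetical helper: copy the contents of one mounted ESP onto each
# replica mount point, so that every replica stays an independent
# filesystem holding the same files.
# Limitation: files present only on a replica are left in place.
sync_esp_replicas() {
    local src=$1
    shift
    local dst
    for dst in "$@"; do
        # "$src"/. copies the directory contents, including dotfiles,
        # rather than the directory itself.
        cp -r "$src"/. "$dst"/ || return 1
    done
}

# Example invocation (assumed mount points, not taken from this report):
#   sync_esp_replicas /mnt/esp-1 /mnt/esp-2
```

Because each replica is a separate filesystem, a firmware write to one of them can no longer put a RAID array out of sync; at worst the replicas differ at the file level until the OS copies the files again.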

Comment 3 Michael Nguyen 2021-01-22 23:20:27 UTC
cat << 'EOF' > test.fcc
variant: fcos
version: 1.3.0
passwd:
  users:
    - name: core
      password_hash: "$6$ZgbiFMCFmY/pLBLH$u3kTFAmzDCvnThFyBR931rWyN7xHa44BCBru9RNFgkKQbyycQEviaCNJhYQXyJ5NMqg2QvrzoScM8y4MJzWC11"
      ssh_authorized_keys:
        - ssh-rsa AAA
boot_device:
  mirror:
    devices:
      - /dev/vda
      - /dev/vdb
EOF


podman run -i --rm quay.io/coreos/fcct:release --pretty --strict < test.fcc > test.ign


cosa run --qemu-image=rhcos-47.83.202101161239-0-qemu.x86_64.qcow2 --ignition test.ign --add-disk 5G --add-disk 5G --memory 4096

[core@cosa-devsh boot]$ lsblk -f
NAME      FSTYPE            LABEL       UUID                                 MOUNTPOINT
sr0                                                                          
vda                                                                          
|-vda1                                                                       
|-vda2    vfat              esp-1       59F7-8E27                            
|-vda3    linux_raid_member any:md-boot a7f0332b-efc4-eac3-b81c-838de023a5c7 
| `-md127 ext4              boot        fda55eb7-fcb3-44c4-81ec-ef307ba95dc4 /boot
`-vda4    linux_raid_member any:md-root 57bfdcae-8abe-5583-5ec8-f10ad772342c 
  `-md126 xfs               root        9b7e18c6-c278-4428-82e4-f931d2f7eeec /sysroot
vdb                                                                          
|-vdb1                                                                       
|-vdb2    vfat              esp-2       59F7-A8C5                            
|-vdb3    linux_raid_member any:md-boot a7f0332b-efc4-eac3-b81c-838de023a5c7 
| `-md127 ext4              boot        fda55eb7-fcb3-44c4-81ec-ef307ba95dc4 /boot
`-vdb4    linux_raid_member any:md-root 57bfdcae-8abe-5583-5ec8-f10ad772342c 
  `-md126 xfs               root        9b7e18c6-c278-4428-82e4-f931d2f7eeec /sysroot
vdc                                                                          
|-vdc1                                                                       
|-vdc2    vfat              EFI-SYSTEM  F811-ED3D                            
|-vdc3    ext4              boot        2a7a3d36-e6a9-40e3-87aa-2d08945671b0 
`-vdc4    xfs               root        910678ff-f77e-4a7d-8d53-86f2ac47a823
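
In the listing above, vda2 and vdb2 appear as plain vfat filesystems labelled esp-1 and esp-2, while the boot and root partitions are linux_raid_member devices. A check along those lines could be scripted as in the sketch below. The function name is hypothetical, and the esp-N label pattern is an assumption based only on this output.

```shell
#!/bin/bash
# Hypothetical check: read "NAME FSTYPE LABEL" rows on stdin, the shape
# produced by `lsblk -nr -o NAME,FSTYPE,LABEL`, and print how many
# partitions are plain vfat filesystems labelled esp-N.
count_independent_esps() {
    awk '$2 == "vfat" && $3 ~ /^esp-[0-9]+$/ { n++ } END { print n + 0 }'
}

# Example invocation on a live system:
#   lsblk -nr -o NAME,FSTYPE,LABEL | count_independent_esps
```

With two mirrored devices, as in the configuration used here, the expected count would be 2.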

Comment 6 errata-xmlrpc 2021-02-24 15:47:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

