Bug 1909453

Summary: Boot disk RAID can corrupt ESP if UEFI firmware writes to it
Product: OpenShift Container Platform Reporter: Benjamin Gilbert <bgilbert>
Component: RHCOS Assignee: Benjamin Gilbert <bgilbert>
Status: CLOSED ERRATA QA Contact: Michael Nguyen <mnguyen>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.7 CC: bbreard, imcleod, jligon, nstielau
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:47:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1915617    

Description Benjamin Gilbert 2020-12-20 05:00:19 UTC
If

1. boot disk RAID is enabled on a UEFI system, and
2. the firmware decides to write to the ESP,

then

3. the ESP RAID will desynchronize,
4. subsequent ESP reads inside the OS may return incoherent FS metadata, and
5. subsequent ESP writes inside the OS may corrupt filesystem state based on that FS metadata.
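
The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyMirror and its methods are hypothetical names, reads simply alternate between members, and none of it reflects how the md driver is actually implemented. It shows why a write that bypasses the RAID layer leaves the two members disagreeing, so that two reads of the same block can return different data.

```python
# Toy model of a two-member RAID 1 mirror (hypothetical, not the md driver).
class ToyMirror:
    def __init__(self, nblocks):
        self.members = [[b""] * nblocks, [b""] * nblocks]
        self._next = 0  # index of the member that serves the next read

    def write(self, block, data):
        # The OS writes through the RAID layer, so both members are updated.
        for member in self.members:
            member[block] = data

    def firmware_write(self, member_index, block, data):
        # Firmware sees one plain FAT partition and bypasses the RAID layer,
        # so only a single member changes.
        self.members[member_index][block] = data

    def read(self, block):
        # A mirror may satisfy a read from either member; alternate to show it.
        member = self.members[self._next]
        self._next = (self._next + 1) % len(self.members)
        return member[block]


mirror = ToyMirror(nblocks=4)
mirror.write(0, b"fat-v1")              # OS write: members agree
mirror.firmware_write(0, 0, b"fat-v2")  # firmware write: members diverge
first, second = mirror.read(0), mirror.read(0)
print(first, second)                    # the same block now reads two ways
```

In the model, any later OS write that is computed from one of those reads is then mirrored to both members, which is how stale metadata can end up overwriting newer state.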

The fix is to stop RAIDing the ESP, and instead maintain multiple independent replicas that are synchronized by the OS at the file level.
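
A minimal sketch of what file-level synchronization between independent replicas could look like follows. This report does not say which tool performs the synchronization or how, so sync_replica and the one-way copy policy are assumptions made for illustration only. The sketch treats each ESP as a mounted directory and copies files that are missing or different in the replica.

```python
import filecmp
import shutil
from pathlib import Path


def sync_replica(source: Path, replica: Path) -> list:
    """Copy new or changed files from source to replica.

    Hypothetical helper: returns the relative paths that were copied.
    Files that exist only in the replica are left untouched.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = replica / rel
        # Compare file contents, not just size and mtime.
        if dst.is_file() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Because each replica is an ordinary filesystem, a firmware write to one of them changes only that replica's files and cannot make the other replica's metadata incoherent.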

Comment 3 Michael Nguyen 2021-01-22 23:20:27 UTC
cat << 'EOF' > test.fcc
variant: fcos
version: 1.3.0
passwd:
  users:
    - name: core
      password_hash: "$6$ZgbiFMCFmY/pLBLH$u3kTFAmzDCvnThFyBR931rWyN7xHa44BCBru9RNFgkKQbyycQEviaCNJhYQXyJ5NMqg2QvrzoScM8y4MJzWC11"
      ssh_authorized_keys:
        - ssh-rsa AAA
boot_device:
  mirror:
    devices:
      - /dev/vda
      - /dev/vdb
EOF


podman run -i --rm quay.io/coreos/fcct:release --pretty --strict < test.fcc > test.ign


cosa run --qemu-image=rhcos-47.83.202101161239-0-qemu.x86_64.qcow2 --ignition test.ign --add-disk 5G --add-disk 5G --memory 4096

[core@cosa-devsh boot]$ lsblk -f
NAME      FSTYPE            LABEL       UUID                                 MOUNTPOINT
sr0                                                                          
vda                                                                          
|-vda1                                                                       
|-vda2    vfat              esp-1       59F7-8E27                            
|-vda3    linux_raid_member any:md-boot a7f0332b-efc4-eac3-b81c-838de023a5c7 
| `-md127 ext4              boot        fda55eb7-fcb3-44c4-81ec-ef307ba95dc4 /boot
`-vda4    linux_raid_member any:md-root 57bfdcae-8abe-5583-5ec8-f10ad772342c 
  `-md126 xfs               root        9b7e18c6-c278-4428-82e4-f931d2f7eeec /sysroot
vdb                                                                          
|-vdb1                                                                       
|-vdb2    vfat              esp-2       59F7-A8C5                            
|-vdb3    linux_raid_member any:md-boot a7f0332b-efc4-eac3-b81c-838de023a5c7 
| `-md127 ext4              boot        fda55eb7-fcb3-44c4-81ec-ef307ba95dc4 /boot
`-vdb4    linux_raid_member any:md-root 57bfdcae-8abe-5583-5ec8-f10ad772342c 
  `-md126 xfs               root        9b7e18c6-c278-4428-82e4-f931d2f7eeec /sysroot
vdc                                                                          
|-vdc1                                                                       
|-vdc2    vfat              EFI-SYSTEM  F811-ED3D                            
|-vdc3    ext4              boot        2a7a3d36-e6a9-40e3-87aa-2d08945671b0 
`-vdc4    xfs               root        910678ff-f77e-4a7d-8d53-86f2ac47a823
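
The listing above shows esp-1 and esp-2 as plain vfat partitions while only the boot and root partitions are linux_raid_member. A rough sketch of checking that mechanically, assuming the JSON form of the same listing (lsblk -f -J) with "blockdevices", "name", "fstype", "label" and "children" keys; esp_report and esp_is_unraided are hypothetical helper names, and the sample is trimmed from the output above.

```python
import json


def esp_report(lsblk_json: str) -> dict:
    """Map each device whose label starts with 'esp-' to its fstype."""
    report = {}

    def walk(devices):
        for dev in devices:
            label = dev.get("label") or ""
            if label.startswith("esp-"):
                report[dev["name"]] = dev.get("fstype")
            walk(dev.get("children") or [])

    walk(json.loads(lsblk_json)["blockdevices"])
    return report


def esp_is_unraided(report: dict) -> bool:
    # True only if at least one ESP replica exists and all of them are vfat.
    return bool(report) and all(fs == "vfat" for fs in report.values())


# Sample trimmed from the listing above (assumed JSON shape).
sample = json.dumps({"blockdevices": [
    {"name": "vda", "fstype": None, "label": None, "children": [
        {"name": "vda2", "fstype": "vfat", "label": "esp-1"},
        {"name": "vda3", "fstype": "linux_raid_member", "label": "any:md-boot",
         "children": [{"name": "md127", "fstype": "ext4", "label": "boot"}]},
    ]},
    {"name": "vdb", "fstype": None, "label": None, "children": [
        {"name": "vdb2", "fstype": "vfat", "label": "esp-2"},
    ]},
]})
report = esp_report(sample)
print(report, esp_is_unraided(report))
```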

Comment 6 errata-xmlrpc 2021-02-24 15:47:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift
Container Platform 4.7.0 security, bug fix, and enhancement
update), and where to find the updated files, follow the
link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633