1916502 – Boot disk mirroring fails with mdadm error

Summary: Boot disk mirroring fails with mdadm error

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	RHCOS
Sub Component:
Version:	4.7
Hardware:	ppc64le
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Jonathan Lebon
QA Contact:	Michael Nguyen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1915617

Reported:	2021-01-14 22:37 UTC by Prashanth Sundararaman
Modified:	2021-02-24 15:54 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-24 15:53:46 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	coreos coreos-installer pull 453	0	None	closed	blockdev: use --nodeps when querying single device	2021-01-27 08:14:03 UTC
Red Hat Product Errata	RHSA-2020:5633	0	None	None	None	2021-02-24 15:54:00 UTC

Description Prashanth Sundararaman 2021-01-14 22:37:35 UTC

As part of testing boot disk mirroring on ppc64le (https://coreos.github.io/fcct/examples/#mirrored-boot-disk), I tried to mirror the boot disk inside cosa. This is the FCC I used:

variant: fcos
version: 1.3.0
boot_device:
  layout: ppc64le
  mirror:
    devices:
      - /dev/vda
      - /dev/vdb

I generated the Ignition file from this and ran cosa run with two extra disks attached:

cosa run --add-disk 4G --add-disk 4G -i ign.test --devshell-console

And after ignition runs, I see this:

[   70.100303] systemd[1]: Starting CoreOS Boot Edit...
         Starting CoreOS Boot Edit...
[   70.110989] EXT4-fs (md127): mounted filesystem with ordered data mode. Opts: (null)
[   70.146240] coreos-boot-edit[1093]: mdadm: /dev/vdb3 does not appear to be an md device
[   70.147305] coreos-boot-edit[1093]: Error: "mdadm" "--detail" "--export" "/dev/vdb3" failed with exit code: 1
[   70.201503] systemd[1]: Reloading.
[   70.440946] systemd[1]: coreos-boot-edit.service: Main process exited, code=exited, status=1/FAILURE
[   70.442726] systemd[1]: coreos-boot-edit.service: Failed with result 'exit-code'.
[FAILED] Failed to start CoreOS Boot Edit.

Comment 1 Benjamin Gilbert 2021-01-15 02:49:01 UTC

This is happening on x86_64 as well.

From the log, it appears rdcore is trying to examine a RAID member as though it's a RAID device.
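
To illustrate the distinction: mdadm queries an assembled array with --detail and a member device with --examine. The sketch below is not rdcore's code. It uses a hypothetical helper, mdadm_query_for, that only prints the command one might run for a given lsblk TYPE value, assuming array types start with "raid" and anything else is treated as a potential member.

```shell
# Hypothetical helper (not part of rdcore or coreos-installer):
# print the mdadm query that matches a device's lsblk TYPE.
#   $1 = lsblk TYPE value (e.g. "raid1" or "part")
#   $2 = device path
mdadm_query_for() {
    case "$1" in
        # Assembled MD arrays report types such as raid0, raid1, ...
        raid*) echo "mdadm --detail --export $2" ;;
        # Anything else (e.g. a partition) can only be a member, so examine it.
        *)     echo "mdadm --examine --export $2" ;;
    esac
}

mdadm_query_for raid1 /dev/md127
mdadm_query_for part /dev/vdb3
```

Running --detail against /dev/vdb3, as in the log above, fails because that partition is a member, not an array.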

Comment 2 Benjamin Gilbert 2021-01-15 03:08:52 UTC

# lsblk --pairs  --paths --output TYPE /dev/vda4

On FCOS this produces:

TYPE="part"
TYPE="raid1"

On RHCOS this produces:

TYPE="raid1"
TYPE="part"

We expect "part", and lsblk_single() takes the first result.
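
A minimal sketch of why the ordering matters, using the two outputs quoted above as canned input. The helper first_type is hypothetical and only mimics "take the first result"; it is not the actual lsblk_single() implementation.

```shell
# Hypothetical helper: return the first TYPE value from
# `lsblk --pairs --output TYPE` output read on stdin.
first_type() {
    sed -n 's/^TYPE="\([^"]*\)"$/\1/p' | head -n 1
}

# Outputs copied from this comment.
fcos_output='TYPE="part"
TYPE="raid1"'
rhcos_output='TYPE="raid1"
TYPE="part"'

printf '%s\n' "$fcos_output" | first_type    # prints: part
printf '%s\n' "$rhcos_output" | first_type   # prints: raid1

# On a real system, --nodeps restricts lsblk to the named device itself,
# so the holder (the RAID array) is not listed at all:
#   lsblk --nodeps --pairs --paths --output TYPE /dev/vda4
```

The linked coreos-installer pull request title ("use --nodeps when querying single device") suggests the fix takes this second route rather than relying on output order.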

Comment 4 Micah Abbott 2021-01-15 20:36:27 UTC

Higher priority work has prevented this issue from being solved; adding UpcomingSprint keyword.

Comment 5 Prashanth Sundararaman 2021-01-15 21:23:45 UTC

Tested with the coreos-installer fix on ppc64le and it works fine.

Comment 7 Michael Nguyen 2021-01-20 22:42:27 UTC

Verified on RHCOS 47.83.202101161239-0.

cat << 'EOF' > test.fcc
variant: fcos
version: 1.3.0
passwd:
  users:
    - name: core
      password_hash: "$6$ZgbiFMCFmY/pLBLH$u3kTFAmzDCvnThFyBR931rWyN7xHa44BCBru9RNFgkKQbyycQEviaCNJhYQXyJ5NMqg2QvrzoScM8y4MJzWC11"
boot_device:
  mirror:
    devices:
      - /dev/vda
      - /dev/vdb
EOF

podman run -i --rm quay.io/coreos/fcct:release --pretty --strict < test.fcc > test.ign
coreos-assembler shell
kola qemuexec --qemu-image rhcos-47.83.202101161239-0-qemu.x86_64.qcow2 -i test.ign --add-disk 5G --add-disk 5G --memory 4096


[root@ibm-p8-kvm-03-guest-02 md]# rpm-ostree status
State: idle
Deployments:
* ostree://8e87a86b9444784ab29e7917fa82e00d5e356f18b19449946b687ee8dc27c51a
                   Version: 47.83.202101161239-0 (2021-01-16T12:43:01Z)
[root@ibm-p8-kvm-03-guest-02 md]# mdadm --detail --scan
ARRAY /dev/md/md-boot metadata=1.0 name=any:md-boot UUID=87d7e42b:78ad6a74:04e7f781:ba9c6057
ARRAY /dev/md/md-root metadata=1.2 name=any:md-root UUID=2699ce86:00acc9a9:9a6cb951:a40674cd
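
If this check needs to be scripted, one possible approach is to look for both array names in the scan output. The helper has_arrays below is hypothetical, and the names md-boot and md-root are taken from the output above.

```shell
# Hypothetical check: succeed only if both mirrored arrays appear in
# `mdadm --detail --scan` output passed as the first argument.
has_arrays() {
    scan_output=$1
    for name in md-boot md-root; do
        printf '%s\n' "$scan_output" | grep -q "^ARRAY /dev/md/$name " || return 1
    done
}

# Canned output copied from the verification run above.
sample='ARRAY /dev/md/md-boot metadata=1.0 name=any:md-boot UUID=87d7e42b:78ad6a74:04e7f781:ba9c6057
ARRAY /dev/md/md-root metadata=1.2 name=any:md-root UUID=2699ce86:00acc9a9:9a6cb951:a40674cd'

if has_arrays "$sample"; then
    echo "both arrays present"
fi

# On a live system: has_arrays "$(mdadm --detail --scan)"
```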

Comment 10 errata-xmlrpc 2021-02-24 15:53:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
