Bug 2352953 - SW RAID on top of disks fails to boot (when biosboot is inside the RAID)
Summary: SW RAID on top of disks fails to boot (when biosboot is inside the RAID)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: cockpit
Version: 42
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Marius Vollmer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On: 2353195
Blocks: F42FinalBlocker
Reported: 2025-03-17 15:58 UTC by Kamil Páral
Modified: 2025-03-30 14:28 UTC (History)

Fixed In Version: cockpit-336-1.fc43 cockpit-336-1.fc42 cockpit-336-1.fc41 cockpit-336.2-1.fc43 cockpit-336.2-1.fc42
Clone Of:
Environment:
Last Closed: 2025-03-30 14:28:08 UTC
Type: Bug
Embargoed:


Attachments
layout.png (78.57 KB, image/png)
2025-03-17 15:59 UTC, Kamil Páral
layout-valid.png (86.88 KB, image/png)
2025-03-17 15:59 UTC, Kamil Páral
review.png (61.96 KB, image/png)
2025-03-17 15:59 UTC, Kamil Páral
boot-fail.png (6.88 KB, image/png)
2025-03-17 15:59 UTC, Kamil Páral
anaconda.log (7.48 KB, text/plain)
2025-03-17 16:00 UTC, Kamil Páral
dbus.log (3.91 KB, text/plain)
2025-03-17 16:00 UTC, Kamil Páral
journal.log (2.51 MB, text/plain)
2025-03-17 16:00 UTC, Kamil Páral
packaging.log (19.59 KB, text/plain)
2025-03-17 16:00 UTC, Kamil Páral
program.log (1.68 KB, text/plain)
2025-03-17 16:00 UTC, Kamil Páral
storage.log (174.78 KB, text/plain)
2025-03-17 16:00 UTC, Kamil Páral
updates.img (1.95 MB, application/gzip)
2025-03-18 10:07 UTC, Katerina Koukiou
mdraid0 error message (91.27 KB, image/png)
2025-03-25 09:46 UTC, Kamil Páral

Description Kamil Páral 2025-03-17 15:58:57 UTC
Description of problem:
When I create a SW RAID0 directly on top of disks (not on top of partitions), the system fails to boot; it seems that it doesn't find any bootloader.

See screenshots.

Version-Release number of selected component (if applicable):
anaconda-42.27.3-1.fc42.x86_64
F42 Beta Workstation

How reproducible:
always

Steps to Reproduce:
1. use two completely clean disks (no partitioning, not even a part table) in a BIOS (virtual) machine
2. in webUI, create a SW RAID0 on top of both disks
3. create biosboot, /boot and / partitions
4. install
5. see that it fails to boot (from any of the two disks)

Comment 1 Kamil Páral 2025-03-17 15:59:46 UTC
Created attachment 2080578 [details]
layout.png

Comment 2 Kamil Páral 2025-03-17 15:59:49 UTC
Created attachment 2080579 [details]
layout-valid.png

Comment 3 Kamil Páral 2025-03-17 15:59:52 UTC
Created attachment 2080580 [details]
review.png

Comment 4 Kamil Páral 2025-03-17 15:59:55 UTC
Created attachment 2080581 [details]
boot-fail.png

Comment 5 Kamil Páral 2025-03-17 16:00:02 UTC
Created attachment 2080582 [details]
anaconda.log

Comment 6 Kamil Páral 2025-03-17 16:00:09 UTC
Created attachment 2080584 [details]
dbus.log

Comment 7 Kamil Páral 2025-03-17 16:00:14 UTC
Created attachment 2080585 [details]
journal.log

Comment 8 Kamil Páral 2025-03-17 16:00:19 UTC
Created attachment 2080586 [details]
packaging.log

Comment 9 Kamil Páral 2025-03-17 16:00:22 UTC
Created attachment 2080587 [details]
program.log

Comment 10 Kamil Páral 2025-03-17 16:00:26 UTC
Created attachment 2080588 [details]
storage.log

Comment 11 Kamil Páral 2025-03-18 09:55:08 UTC
Proposing as a Final blocker:
"The installer must be able to create and install to any workable partition layout using any file system and/or container format combination offered in a default installer configuration. "
https://fedoraproject.org/wiki/Fedora_42_Final_Release_Criteria#Disk_layouts

Let's add a bit of background. SW RAID on disks (rather than on partitions) wasn't possible in the GTK UIs, so the webUI is the first place where we can have it. This is all new. According to Katerina from Anaconda, grub should be able to boot from SW RAID (containing biosboot and /boot), but confirmation from a storage expert is needed. The simplest solution at the moment is probably for anaconda to require a separate non-RAID disk to hold biosboot (or ESP) and /boot, so this is the likely fix.

Comment 12 Katerina Koukiou 2025-03-18 09:59:53 UTC
Let's prevent having bootloaders on MDRAID. https://github.com/rhinstaller/anaconda-webui/pull/706

Comment 13 Katerina Koukiou 2025-03-18 10:07:54 UTC
Created attachment 2080720 [details]
updates.img

Here is the updates.img from the upstream PR. It's still under review.

Comment 14 Katerina Koukiou 2025-03-18 10:26:23 UTC
Split the relevant commit for this into a separate PR: https://github.com/rhinstaller/anaconda-webui/pull/707

Comment 15 Vojtech Trefny 2025-03-18 10:59:20 UTC
INFO:program:Running [6] mdadm --examine --brief /dev/vda ...
INFO:program:stdout[6]: ARRAY /dev/md/raid0  metadata=1.2 UUID=4109b403:bd3bd693:0ac8a43b:b82cfabf

So it looks like cockpit created the MD array with metadata version 1.2 which cannot be used for /boot and other boot devices.
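
As an illustration of the check described here, the following minimal Python sketch pulls the metadata version out of an `mdadm --examine --brief` line. The sample line is copied from the log excerpt above. The set of boot-capable versions (1.0 and 0.90) is an assumption taken from the later discussion in this report, not from mdadm itself, and the helper names are hypothetical.

```python
import re

# Sample line copied from the program.log excerpt quoted in this comment.
SAMPLE = "ARRAY /dev/md/raid0  metadata=1.2 UUID=4109b403:bd3bd693:0ac8a43b:b82cfabf"

# Assumption: versions this report names as usable for boot devices.
BOOT_OK_METADATA = {"0.90", "1.0"}


def parse_brief(line):
    """Return the key=value fields of one ARRAY line, plus the device path."""
    fields = dict(re.findall(r"(\w+)=(\S+)", line))
    parts = line.split()
    is_array = len(parts) > 1 and parts[0] == "ARRAY"
    fields["device"] = parts[1] if is_array else None
    return fields


def metadata_ok_for_boot(line):
    """True if the array's metadata version is in the assumed boot-capable set."""
    return parse_brief(line).get("metadata") in BOOT_OK_METADATA


if __name__ == "__main__":
    info = parse_brief(SAMPLE)
    print(info["device"], info["metadata"], metadata_ok_for_boot(SAMPLE))
```

For the array in this report the helper reports version 1.2, which falls outside the assumed boot-capable set.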

Comment 16 Kamil Páral 2025-03-18 12:39:49 UTC
(In reply to Katerina Koukiou from comment #13)
> Created attachment 2080720 [details]
> updates.img
> 
> Here is the updates.img from the upstream PR. It's still under review.

This works: it no longer allows biosboot to be on top of RAID. So I need an extra disk, but it installs and boots.

Comment 17 Katerina Koukiou 2025-03-18 13:07:13 UTC
According to https://bugzilla.redhat.com/show_bug.cgi?id=2352953#c15 this is caused by the wrong metadata version. The storage validation in the anaconda backend should be able to catch such issues when we apply the partitioning, before we proceed with the installation. Let's handle this in the backend.

Comment 18 Marius Vollmer 2025-03-18 16:51:07 UTC
(In reply to Kamil Páral from comment #0)

> When I create a SW RAID0 directly on top of disks (not on top of
> partitions), the system fails to boot, it doesn't find any bootloader it
> seems.

Does it work when you explicitly create the MDRAID device on partitions?

Comment 19 Katerina Koukiou 2025-03-19 05:40:56 UTC
@Marius There are many parameters for deciding valid usage of RAID when it comes to bootloaders.

For example EFI needs RAID1 and metadata 1.0 for stage1 [0].

So, in short, an MDRAID device on partitions also has constraints, which are checked automatically by the verify_bootloader [1] method when we try to apply the partitioning.

Blivet tries to satisfy the constraints before creation, which is possible because it is based on storage planning. No MDRAID array is created until the user has specified the whole storage configuration, including mount points. So, when the MDRAID is formatted as an EFI bootloader device, Blivet adjusts, for example, the default metadata version from 1.2 to 1.0 [2].

In cockpit-storage we cannot do that, so I suggest:

- Disable creating RAID directly on disks for now: Blivet GUI never supported this, and it is not extensively tested at the moment
- Disable formatting bootloader types on MDRAID devices (this is currently hidden [3])  

[0] https://github.com/rhinstaller/anaconda/blob/main/pyanaconda/modules/storage/platform.py#L209
[1] https://github.com/rhinstaller/anaconda/blob/main/pyanaconda/modules/storage/checker/utils.py#L170 
[2] https://github.com/storaged-project/blivet/blob/main/blivet/devices/md.py#L561
[3] https://github.com/cockpit-project/cockpit/blob/main/pkg/storaged/block/format-dialog.jsx#L154
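
The difference between the two approaches described in this comment can be sketched as a toy model in Python. This is not Blivet or Cockpit code; the function names and the "efi" format label are hypothetical, and only the 1.2 default and the 1.0 boot version come from this report.

```python
DEFAULT_METADATA = "1.2"   # default version mentioned in this report
BOOT_METADATA = "1.0"      # version this report says EFI stage1 needs
BOOT_FORMATS = {"efi"}     # hypothetical label for a bootloader format


def plan_then_create(planned_format):
    """Planning model: the format is known before the array is created,
    so the metadata version can still be adjusted."""
    if planned_format in BOOT_FORMATS:
        return BOOT_METADATA
    return DEFAULT_METADATA


def create_then_format(later_format):
    """Immediate model: the array already exists when the format is chosen,
    so the version was fixed without looking at later_format."""
    created_with = DEFAULT_METADATA
    return created_with


if __name__ == "__main__":
    print(plan_then_create("efi"), create_then_format("efi"))
```

In the immediate model the later choice of format cannot influence the metadata version, which is why the suggestions above restrict what may be created rather than trying to adjust it afterwards.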

Comment 20 Kamil Páral 2025-03-20 11:09:27 UTC
(In reply to Marius Vollmer from comment #18)
> Does it work when you explicitly create the MDRAID device on partitions?

Currently this doesn't have a simple answer. There are way too many possible combinations, and due to several different bugs and non-implemented features (e.g. a bootloader selection in anaconda webui), it's very difficult to test. My advice to Katerina was to limit the scope instead of trying to fix it hastily. There's no-one pressuring us to support raid-on-disks in F42. Because we're already after Beta and nearing Final, let's just drop it and avoid (at least partially) this can of worms for the moment. Disallowing bootloader partitions on raid is also very sensible atm.

Comment 21 Marius Vollmer 2025-03-20 13:30:57 UTC
(In reply to Kamil Páral from comment #20)
> (In reply to Marius Vollmer from comment #18)
> > Does it work when you explicitly create the MDRAID device on partitions?
> 
> Currently this doesn't have a simple answer.

I mean, did you try it?  I guess not. :-)

I would ask you to try it, but I think I know the answer: The system would still not boot and Anaconda would still not catch this.  The "directly on top of disks (not on top of partitions)" part is a red herring, as it turns out. I did not know this going in, but it's pretty clear now, I'd say.

To the best of my understanding:

 1) Bootloaders need mdraids with metadata version 1.0 (or 0.90).  The reason (I think) is that this version of the metadata is written to the end of the member devices, and for a raid 1 (mirror), each of the member devices then looks exactly like the content of the mdraid, and the bootloaders don't need to know anything about mdraid. If the mdraid mirror contains a partition table, for example, this partition table is right at the start of each member device (because the mdraid superblock is at the end), and the bootloader will just find it there.  I am not sure of this, but it makes sense to me at least, and would also explain why bootloaders still don't support version 1.2 although that format is already over 20 years old.

 2) Anaconda has a check for this, and will reject a configuration that has biosboot on a mdraid with the wrong metadata version, for example. But it doesn't descend into the content of the mdraid deep enough when performing this check. If there is a partition table on the mdraid and biosboot is one of its partitions, then Anaconda will miss it.

 3) Cockpit just blissfully creates mdraids with the default metadata version, which is 1.2.

These three things together create the bug reported here.

Fixing 1), even long term, seems out of the question.

2) is being fixed by https://github.com/rhinstaller/anaconda/pull/6279 (when checking the md device validity check also parents)

3) is being fixed by https://github.com/cockpit-project/cockpit/pull/21731 (Use mdraid metadata version 1.0 when in Anaconda mode)

To reduce the scope, I would propose to completely disable creation of mdraids in Cockpit (https://github.com/cockpit-project/cockpit/pull/21730), and enable it again with https://github.com/cockpit-project/cockpit/pull/21731.
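
Point 2) above can be illustrated with a toy device tree in Python. This is a minimal sketch under stated assumptions, not Anaconda's actual checker: the `Device` class, the `shallow_check` and `deep_check` helpers, and the device names are hypothetical. It only shows why a check that looks at the biosboot device alone misses an mdraid further up the parent chain.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption from point 1) above: versions usable for boot devices.
BOOT_OK_METADATA = {"0.90", "1.0"}


@dataclass
class Device:
    name: str
    kind: str                        # "disk", "mdraid" or "partition"
    metadata: Optional[str] = None   # only meaningful for kind == "mdraid"
    parents: List["Device"] = field(default_factory=list)


def shallow_check(dev):
    """Inspect only the device itself."""
    return dev.kind != "mdraid" or dev.metadata in BOOT_OK_METADATA


def deep_check(dev):
    """Inspect the device and, recursively, all of its parents."""
    return shallow_check(dev) and all(deep_check(p) for p in dev.parents)


# Layout from this report: biosboot is a partition inside an mdraid on two disks.
vda = Device("vda", "disk")
vdb = Device("vdb", "disk")
md = Device("raid0", "mdraid", metadata="1.2", parents=[vda, vdb])
biosboot = Device("raid0p1", "partition", parents=[md])

if __name__ == "__main__":
    print(shallow_check(biosboot), deep_check(biosboot))
```

In this model the shallow check accepts the layout because biosboot itself is a plain partition, while the deep check rejects it once it reaches the version 1.2 array.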

Comment 22 Kamil Páral 2025-03-20 14:10:26 UTC
(In reply to Marius Vollmer from comment #21)
> I mean, did you try it?  I guess not. :-)

Well, thank you. I've spent a good chunk of the past week testing just swraid in anaconda (and I can't do just that forever). I've reported several different bugs. The unfortunate situation is that there are currently anaconda bugs which make swraid testing difficult. Your question was not specific (which raid level, bios or uefi, what bootloader partition placement, etc), and I can't spend another day or two trying to make a table, when we just want to disable it.

I can go back to swraid testing once at least some anaconda fixes are present in current composes.

> I did not know this going into this, but it's pretty clear now, I'd say.

Thank you for your investigation.

Comment 23 Marius Vollmer 2025-03-20 14:38:24 UTC
(In reply to Kamil Páral from comment #22)
> (In reply to Marius Vollmer from comment #21)
> > I mean, did you try it?  I guess not. :-)
> 
> Well, thank you.

No, I thank _you_ for testing and reporting! :)

> Your question was not specific (which raid level, bios or uefi, what bootloader partition placement, etc)

Just exactly what you wrote down in the "steps to reproduce", but step 1 would be "Two disks, each with a partition table and a single partition spanning the whole disk".  But no need to bother.

Comment 24 Martin Pitt 2025-03-24 07:06:53 UTC
https://github.com/cockpit-project/cockpit/pull/21731 quick-fixes this by forcing mdraid format 1.0 in Anaconda mode.

Comment 25 Fedora Update System 2025-03-24 08:13:44 UTC
FEDORA-2025-5f443b7585 (cockpit-336-1.fc43) has been submitted as an update to Fedora 43.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-5f443b7585

Comment 26 Fedora Update System 2025-03-24 08:16:03 UTC
FEDORA-2025-f696bd8401 (cockpit-336-1.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-f696bd8401

Comment 27 Fedora Update System 2025-03-24 08:16:40 UTC
FEDORA-2025-d0209082ac (cockpit-336-1.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-d0209082ac

Comment 28 Adam Williamson 2025-03-24 15:52:15 UTC
+7 in https://pagure.io/fedora-qa/blocker-review/issue/1799 , marking accepted.

Comment 29 Fedora Update System 2025-03-24 23:54:00 UTC
FEDORA-2025-5f443b7585 (cockpit-336-1.fc43) has been pushed to the Fedora 43 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 30 Fedora Update System 2025-03-25 00:16:36 UTC
FEDORA-2025-f696bd8401 (cockpit-336-1.fc42) has been pushed to the Fedora 42 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 31 Fedora Update System 2025-03-25 04:00:18 UTC
FEDORA-2025-d0209082ac has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-d0209082ac`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-d0209082ac

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 32 Kamil Páral 2025-03-25 08:47:36 UTC
The blocker solution should also require this update, currently in testing:
https://bodhi.fedoraproject.org/updates/FEDORA-2025-63711d039a

Reopening for tracking and verification.

Comment 33 Kamil Páral 2025-03-25 09:45:19 UTC
With both
https://bodhi.fedoraproject.org/updates/FEDORA-2025-63711d039a
https://bodhi.fedoraproject.org/updates/FEDORA-2025-f696bd8401

I can now create a MDRAID1 device containing biosboot and boot from it. Great. Marking as VERIFIED.

When I try with MDRAID0, I see this error message:
> Failed to find a suitable stage1 device: RAID sets that contain 'RAID Device' must have one of the following raid levels: raid1.

I believe this is trying to say that I can't boot from MDRAID0. But it confuses me: should it say "biosboot partition" (or "ESP partition") instead of 'RAID Device'?

Comment 34 Kamil Páral 2025-03-25 09:46:13 UTC
Created attachment 2081865 [details]
mdraid0 error message

Comment 35 Fedora Update System 2025-03-26 02:22:41 UTC
FEDORA-2025-d0209082ac (cockpit-336-1.fc41) has been pushed to the Fedora 41 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 36 Kamil Páral 2025-03-26 10:56:02 UTC
Still waiting for https://bodhi.fedoraproject.org/updates/FEDORA-2025-63711d039a

Comment 37 Martin Pitt 2025-03-27 15:30:19 UTC
Apparently that hack made it worse, see bug 2355346. Katerina asked us to revert the patch for this bug, which I did in https://github.com/cockpit-project/cockpit/pull/21793 . As soon as that lands, I'll backport it to Fedora 42, and do an upload together with refreshed pt-br translations (see bug 2354986). Instead anaconda will forbid /boot on mdraid at least for 42, and this will be tracked in bug 2355346 instead.

Comment 38 Martin Pitt 2025-03-27 18:33:46 UTC
This is still being disputed; arguably the parted invocation is just broken. Resetting to VERIFIED for the time being.

Comment 39 Martin Pitt 2025-03-28 10:11:45 UTC
After discussion between Katerina, Marius, and Vojtech, the revert landed after all: https://github.com/cockpit-project/cockpit/pull/21793 I will push this out as another point release.

The hack will be replaced with https://github.com/rhinstaller/anaconda-webui/pull/707 and tracked in bug 2355346

Comment 40 Fedora Update System 2025-03-28 12:06:33 UTC
FEDORA-2025-e5052a0568 (cockpit-336.2-1.fc43) has been submitted as an update to Fedora 43.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-e5052a0568

Comment 41 Fedora Update System 2025-03-28 13:15:37 UTC
FEDORA-2025-746939659d (cockpit-336.2-1.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-746939659d

Comment 42 Fedora Update System 2025-03-28 13:33:51 UTC
FEDORA-2025-e5052a0568 (cockpit-336.2-1.fc43) has been pushed to the Fedora 43 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 43 Martin Pitt 2025-03-28 13:37:20 UTC
Let's track this in F42. I tried to unset "close bugs on stable" in bodhi, but this can't be done as it's not a "real" bodhi update.

Comment 44 Fedora Update System 2025-03-28 14:39:24 UTC
FEDORA-2025-900bb2b163 (anaconda-42.27.10-1.fc42 and anaconda-webui-31-1.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-900bb2b163

Comment 45 Fedora Update System 2025-03-29 01:31:37 UTC
FEDORA-2025-746939659d has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-746939659d`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-746939659d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 46 Fedora Update System 2025-03-29 01:31:45 UTC
FEDORA-2025-900bb2b163 has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-900bb2b163`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-900bb2b163

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 47 Fedora Update System 2025-03-30 00:16:35 UTC
FEDORA-2025-746939659d (cockpit-336.2-1.fc42) has been pushed to the Fedora 42 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 48 Fedora Update System 2025-03-30 00:16:44 UTC
FEDORA-2025-900bb2b163 (anaconda-42.27.10-1.fc42 and anaconda-webui-31-1.fc42) has been pushed to the Fedora 42 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 49 Kamil Páral 2025-03-30 14:28:08 UTC
With the latest updates, this is now resolved by Anaconda *not allowing* biosboot/ESP on any MDRAID volume at all. Tested, seems to work reliably.

