When deleting an MD RAID array using the Cockpit Storage UI, the removal process does not properly wipe the array metadata. This causes problems when a new RAID array is created using the same name and disks: a "broken" GPT table may appear on the newly created array, leading to partitioning errors.

Reproducible: Always

Steps to Reproduce:
1. Create a RAID0 array using /dev/vda, /dev/vdb, and /dev/vdc.
2. Apply a GPT label and create three partitions.
3. Delete the RAID array using the Cockpit Storage UI.
4. Create a RAID1 array using the same disks and the same name.
5. Apply a GPT label and create three partitions.
6. Run parted -l

Actual Results:
This error shows up:

  Not all of the space available to /dev/md/raid0 appears to be used, you can fix the GPT to use all of the space (an extra 62877696 blocks) or continue with the current setting?

The newly created RAID1 array has leftover metadata from the previously deleted RAID0 array, leading to GPT corruption and partitioning errors.

Expected Results:
Deleting a RAID array in Cockpit Storage should properly clean up all metadata, ensuring no residual partition tables or RAID metadata interfere with future operations.

This caused a bug in anaconda, as seen in the attached screenshot and journal.
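For reference, a rough command-line equivalent of the steps above, written in the same os.system style as the minimal reproducer later in this bug. This is only a sketch: the disk sizes, partition boundaries, and the exact commands Cockpit runs under the hood are assumptions.

```
import os

# Steps 1-2: RAID0 with a GPT label and three partitions (boundaries are placeholders)
os.system("mdadm --create --run /dev/md/raid0 --level=raid0 --raid-devices=3 /dev/vda /dev/vdb /dev/vdc")
os.system("parted --script /dev/md/raid0 mklabel gpt mkpart p1 1MiB 10GiB mkpart p2 10GiB 20GiB mkpart p3 20GiB 30GiB")

# Step 3: delete the array, including the metadata cleanup Cockpit is expected to do
os.system("mdadm --stop /dev/md/raid0")
os.system("mdadm --zero-superblock /dev/vda /dev/vdb /dev/vdc")

# Steps 4-5: RAID1 with the same name on the same disks, GPT label and three partitions
os.system("mdadm --create --run /dev/md/raid0 --level=raid1 --raid-devices=3 /dev/vda /dev/vdb /dev/vdc")
os.system("parted --script /dev/md/raid0 mklabel gpt mkpart p1 1MiB 5GiB mkpart p2 5GiB 10GiB mkpart p3 10GiB 14GiB")

# Step 6: inspect the result
os.system("parted -l")
```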
Created attachment 2083251 [details] journalctl output
Created attachment 2083252 [details] anaconda crash
*** Bug 2357128 has been marked as a duplicate of this bug. ***
I reported bug 2357128, which was marked as a duplicate of this one, so some additional logs etc. can be found there. It was proposed as a blocker, and the proposal got transferred here.
We have done some debugging using this reproducing test case: https://github.com/KKoukiou/anaconda-webui/tree/rhbz%232357214

I can not find anything wrong with the things that Cockpit and UDisks2 have done up to the point where Anaconda takes over again with the "Checking storage configuration" dialog. All udev properties are as expected and parted correctly identifies the partition labels on all devices. Parted also does not complain about "Not all of the space available appears to be used" when run on the command line.

The first time "Checking storage configuration" happens with the raid0 (striping) mdraid called "SOMERAID", everything succeeds. Note that this time /dev/md/SOMERAID is 30 GiB in size (two times 15 GiB because of striping).

Then the test deletes /dev/md/SOMERAID and creates a new mdraid with the same name but level raid1 (mirroring). The second time "Checking storage configuration" is done with this array, /dev/md/SOMERAID is only 15 GiB (one times 15 GiB because of mirroring). This time parted (or probably rather libparted) reports:

INFO:blivet:parted exception: Not all of the space available to /dev/md/SOMERAID appears to be used, you can fix the GPT to use all of the space (an extra 31438848 blocks) or continue with the current setting?
INFO:blivet:parted exception: Invalid argument during seek for write on /dev/md/SOMERAID

Parted (or libparted) thinks that /dev/md/SOMERAID is 31438848 blocks larger than what the partition table on it says. Note that 31438848 times 512 bytes is 15 GiB. My engineering pinky tells me that somehow libparted is still working with the assumption that /dev/md/SOMERAID is 30 GiB in size.

Anaconda tells libparted to go ahead and fix this inconsistency, but then we see "Invalid argument during seek for write on /dev/md/SOMERAID". Did libparted try to write something at what it thinks is the end of the block device? That would fail with such an error because the block device is of course only 15 GiB and not 30 GiB as parted might assume.

And indeed, the fixing does not result in something consistent: Afterwards, the partition table type of /dev/md/SOMERAID is detected as "PMBR" by blkid. This is not a partition table type that we should ever see. It's a naked "Protective Master Boot Record" that should normally always be followed by the rest of the GPT structure. Confusion arises from this, with the block device for the partition /dev/md/SOMERAID1 still existing but not having any of the expected udev properties of partitions.

If we change the test to create the first (striped, 30 GiB) mdraid with name "SOMERAID" and the second (mirrored, 15 GiB) mdraid with name "OTHERRAID", then there is no message from parted. No "fixing" happens and all devices keep their expected udev properties. Anaconda goes on to do its things and then fails due to another bug: https://bugzilla.redhat.com/show_bug.cgi?id=2354798
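As a cross-check of the size mismatch described above, here is a minimal sketch (not part of the test case, and the device path is the one from the test) that asks the kernel directly for its current view of the device size via the BLKGETSIZE64 ioctl, bypassing anything libparted or blivet may have cached:

```
import fcntl
import os
import struct

BLKGETSIZE64 = 0x80081272  # _IOR(0x12, 114, size_t) on Linux

def device_size_bytes(path):
    # Ask the kernel for the current byte size of the block device.
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = fcntl.ioctl(fd, BLKGETSIZE64, b"\0" * 8)
        return struct.unpack("Q", buf)[0]
    finally:
        os.close(fd)

size = device_size_bytes("/dev/md/SOMERAID")
print("kernel reports %d bytes (%.1f GiB)" % (size, size / 2**30))
print("parted's 'extra' 31438848 sectors would be %.1f GiB" % (31438848 * 512 / 2**30))
# If the kernel says ~15 GiB while the "extra" space is also ~15 GiB, something in the
# stack is evidently still assuming the old 30 GiB (raid0) geometry for this device name.
```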
(In reply to Marius Vollmer from comment #5)
> Anaconda goes on to do its things

I forgot to mention that while doing its thing, we see this error 12 times or so:

WARNING:dasbus.server.handler:The call org.fedoraproject.Anaconda.Modules.Storage.DeviceTree.Viewer.GetDiskTotalSpace has failed with an exception:
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/dasbus/server/handler.py", line 455, in _method_callback
    result = self._handle_call(
        interface_name,
        ...<2 lines>...
        **additional_args
    )
  File "/usr/lib/python3.13/site-packages/dasbus/server/handler.py", line 265, in _handle_call
    return handler(*parameters, **additional_args)
  File "/usr/lib64/python3.13/site-packages/pyanaconda/modules/storage/devicetree/viewer_interface.py", line 185, in GetDiskTotalSpace
    return self.implementation.get_disk_total_space(disk_ids)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/usr/lib64/python3.13/site-packages/pyanaconda/modules/storage/devicetree/viewer.py", line 495, in get_disk_total_space
    disks = self._get_devices(disk_ids)
  File "/usr/lib64/python3.13/site-packages/pyanaconda/modules/storage/devicetree/viewer.py", line 292, in _get_devices
    return list(map(self._get_device, device_ids))
  File "/usr/lib64/python3.13/site-packages/pyanaconda/modules/storage/devicetree/viewer.py", line 282, in _get_device
    raise UnknownDeviceError(device_id)
pyanaconda.modules.common.errors.storage.UnknownDeviceError: MDRAID-SOMERAID

Anaconda is still trying to access the "SOMERAID" device, which no longer exists. I take this as a hint that Anaconda or Blivet or libparted might indeed keep outdated information about block devices, maybe including their size.
The error from the comment above is a red herring. Anaconda-webui indeed keeps some stale state and tries to read information about devices that no longer exist, but that is not the problem here. I have an open PR to fix it [1], and with that change the error no longer appears in the journal (we ignore it now anyway).

[1] https://github.com/rhinstaller/anaconda-webui/pull/754
when we delete the first SOMERAID, do we do the equivalent of a `wipefs -a` on the disks to ensure all RAID metadata related to it is wiped? If not, I can certainly see that might cause issues.
(In reply to Adam Williamson from comment #8)
> when we delete the first SOMERAID, do we do the equivalent of a `wipefs -a`
> on the disks to ensure all RAID metadata related to it is wiped?

There is wiping, but it is done with the libblkid function blkid_do_wipe after using libblkid to probe for superblocks. I don't know whether that is equivalent to the "-a" flag. The same kind of wiping also happens on a newly created mdraid device.
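For comparison, this is what an explicit full wipe of the member disks would look like, as a sketch in the style of the reproducer below. Whether blkid_do_wipe already covers the same signatures, and whether Cockpit/UDisks2 should do this at all, is exactly the open question; the device names are assumptions.

```
import os

MEMBERS = ["/dev/vda", "/dev/vdb", "/dev/vdc"]  # assumed member disks

os.system("mdadm --stop /dev/md/SOMERAID")
for disk in MEMBERS:
    # wipefs -a erases all signatures libblkid can detect (GPT, PMBR, mdraid superblock, ...)
    os.system("wipefs -a %s" % disk)
    # --zero-superblock targets the mdraid metadata specifically
    # (redundant after wipefs -a, kept here for illustration)
    os.system("mdadm --zero-superblock %s" % disk)
```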
(In reply to Marius Vollmer from comment #5)
> Afterwards, the partition table type of /dev/md/SOMERAID is detected as
> "PMBR" by blkid. This is not a partition table type that we should ever see.

I have noticed this during my anaconda workflows. Sometimes, after deleting an MDRAID device (IIRC), one of the former raid-member disks was marked as having a PMBR partition table. I paid no special attention to it, and I don't know how to trigger it intentionally, but I have seen it multiple times.
Here is another observation. The reproducing test does these steps:

1) Cockpit: Create a level 0 mdraid of size 30 GiB named "SOMERAID" on two devices, with a root partition on it.
2) Anaconda: "Check storage configuration", then cancel and return to Cockpit.
3) Cockpit: Delete SOMERAID.
4) Cockpit: Create a level 1 mdraid of size 15 GiB named "SOMERAID" on the same two devices, with a root partition on it.
5) Anaconda: "Check storage configuration".

With these steps, parted does the autofixing in step 5 and destroys the partitions on SOMERAID in the process. (This is what I have described in comment 5.)

The new observation is: if we remove step 2) from this, then parted does no autofixing, the partitions on SOMERAID stay intact, and the test passes. Thus, if Anaconda/Blivet/Parted never "see" SOMERAID while it is 30 GiB, they accept it without complaint when it is 15 GiB.
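One experiment that might narrow down where the stale size lives is a before/after comparison with pyparted. This is only a sketch under my own assumptions (array name, member devices, and whether libparted caches Device objects per path are exactly what it would probe); it is not part of the reproducing test.

```
import os
import parted

def report(label):
    # Ask pyparted/libparted for its current idea of the device geometry.
    dev = parted.getDevice("/dev/md/SOMERAID")
    print("%s: %s is %d sectors of %d bytes" % (label, dev.path, dev.length, dev.sectorSize))

# 30 GiB striped array (two assumed 15 GiB devices)
os.system("mdadm --create --run /dev/md/SOMERAID --level=raid0 --raid-devices=2 /dev/vdb /dev/vdc")
os.system("udevadm settle")
report("raid0")

# Delete and re-create as a 15 GiB mirrored array with the same name
os.system("mdadm --stop /dev/md/SOMERAID")
os.system("mdadm --zero-superblock /dev/vdb /dev/vdc")
os.system("mdadm --create --run /dev/md/SOMERAID --level=raid1 --raid-devices=2 /dev/vdb /dev/vdc")
os.system("udevadm settle")
report("raid1")

# If the second report still shows the raid0 size, the stale state sits in libparted's
# device handling; if it shows ~15 GiB, it lives higher up (blivet or anaconda).
```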
I was able to reproduce this with only blivet and mdadm. I am not sure if this is a blivet or py/libparted issue yet, but I am moving the bug to blivet for now, because anaconda and cockpit are not involved.

Minimal reproducer:

```
import os
import blivet

# RAID 0 creation with GPT and single 50 GiB partition
os.system("mdadm --create --run SOMERAID --level=raid0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd")
os.system("parted --script /dev/md/SOMERAID mklabel gpt mkpart primary 1MiB 50GiB")

# blivet rescan
b = blivet.Blivet()
b.reset()

# Remove MD metadata from disks
os.system("mdadm --stop /dev/md/SOMERAID")
os.system("mdadm --zero-superblock /dev/sdb /dev/sdc /dev/sdd")

# RAID 1 creation with GPT and single 15 GiB partition
os.system("mdadm --create --run SOMERAID --level=raid1 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd")
os.system("parted --script /dev/md/SOMERAID mklabel gpt mkpart primary 1MiB 15GiB")

# blivet rescan
try:
    b.reset()
except Exception as e:
    print("Rescan failed: %s" % str(e))

    # check the partition table with fdisk
    os.system("fdisk -l /dev/md/SOMERAID")
finally:
    # cleanup
    os.system("wipefs -a /dev/md/SOMERAID")
    os.system("mdadm --stop /dev/md/SOMERAID")
    os.system("mdadm --zero-superblock /dev/sdb /dev/sdc /dev/sdd")
```
Discussed during the 2025-04-07 blocker review meeting [1]:

* AGREED: 2357214 - Accepted FinalBlocker - This violates our criterion: "...installer must be able to: Correctly interpret, and modify ... software RAID arrays at RAID levels 0, 1 and 5 containing ext4 partitions".

[1] https://meetbot.fedoraproject.org/blocker-review_matrix_fedoraproject-org/2025-04-07/f42-blocker-review.2025-04-07-16.01.log.html
upstream PR: https://github.com/storaged-project/blivet/pull/1365
FEDORA-2025-c88bd0d892 (python-blivet-3.12.1-2.fc42) has been submitted as an update to Fedora 42. https://bodhi.fedoraproject.org/updates/FEDORA-2025-c88bd0d892
With F42 RC 1.1, I've made an installation where I changed MDRAID0 to MDRAID1 while keeping the device name. No crash occurred and the system works fine. It looks resolved.
FEDORA-2025-c88bd0d892 has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-c88bd0d892`

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-c88bd0d892

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2025-c88bd0d892 (python-blivet-3.12.1-2.fc42) has been pushed to the Fedora 42 stable repository. If problem still persists, please make note of it in this bug report.