Bug 2325906 - [live] Can't reuse existing RAID partitioning
Summary: [live] Can't reuse existing RAID partitioning
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 42
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Jonathan Wright
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Duplicates: 2351809
Depends On:
Blocks: F42FinalBlocker
 
Reported: 2024-11-13 14:36 UTC by Jiri Kortus
Modified: 2025-04-08 12:18 UTC (History)

Fixed In Version: mdadm-4.3-7.fc43 mdadm-4.3-7.fc42
Clone Of:
Environment:
Last Closed: 2025-04-08 12:18:04 UTC
Type: ---
Embargoed:


Attachments
anaconda.log (9.77 KB, text/plain) - 2024-11-13 14:40 UTC, Jiri Kortus
dbus.log (3.91 KB, text/plain) - 2024-11-13 14:40 UTC, Jiri Kortus
lvm.log (42.45 KB, text/plain) - 2024-11-13 14:40 UTC, Jiri Kortus
packaging.log (3.24 KB, text/plain) - 2024-11-13 14:40 UTC, Jiri Kortus
program.log (6.92 KB, text/plain) - 2024-11-13 14:40 UTC, Jiri Kortus
storage.log (245.95 KB, text/plain) - 2024-11-13 14:41 UTC, Jiri Kortus
storage.state (16.00 KB, application/octet-stream) - 2024-11-13 14:41 UTC, Jiri Kortus
storage.log, F41 (332.33 KB, text/plain) - 2024-11-21 14:11 UTC, Jan Stodola


Links
Github md-raid-utilities mdadm pull 160 (open): Rename is_name_posix_compatible to is_name_valid, allow : (last updated 2025-03-21 23:05:44 UTC)

Description Jiri Kortus 2024-11-13 14:36:50 UTC
It's not possible to reuse existing RAID partitioning in the web UI (tested with anaconda-42.13-1.fc42 + LVM on top of RAID1 with an existing Fedora 40 installation), as the partitioning phase fails with the following error:

org.fedoraproject.Anaconda.StorageInstallationError: Process reported exit code 1: mdadm: Value "system2:pv00" cannot be set as devname. Reason: Not POSIX compatible.

Moreover, even though the root LV (on top of the RAID1 device) can be selected for reuse, there isn't any indication that it's a root device, which can be slightly confusing.

Reproducible: Always

Steps to Reproduce:
1. Install Fedora 40 with a default LVM partitioning on top of RAID1.
2. Start Fedora Rawhide installation, select the available disk devices for partitioning.
3. Select that you want to reuse the existing partitions for installation (existing boot partition for /boot, and root LV for /) and enable reformatting them. Proceed with the installation.
Actual Results:  
A critical error occurs in the partitioning phase.

Expected Results:  
Installation is successful, the selected LV(s) on top of existing RAID, as well as other partitions, are reused in the installed system.

Trying to activate the RAID devices manually prior to installation doesn't make any difference.

Comment 1 Jiri Kortus 2024-11-13 14:40:46 UTC
Created attachment 2057505 [details]
anaconda.log

Comment 2 Jiri Kortus 2024-11-13 14:40:49 UTC
Created attachment 2057506 [details]
dbus.log

Comment 3 Jiri Kortus 2024-11-13 14:40:53 UTC
Created attachment 2057507 [details]
lvm.log

Comment 4 Jiri Kortus 2024-11-13 14:40:56 UTC
Created attachment 2057508 [details]
packaging.log

Comment 5 Jiri Kortus 2024-11-13 14:40:59 UTC
Created attachment 2057509 [details]
program.log

Comment 6 Jiri Kortus 2024-11-13 14:41:02 UTC
Created attachment 2057510 [details]
storage.log

Comment 7 Jiri Kortus 2024-11-13 14:41:05 UTC
Created attachment 2057511 [details]
storage.state

Comment 8 Katerina Koukiou 2024-11-21 12:20:17 UTC
@Jiri I believe the error you are seeing has not much to do with anaconda-webui or anaconda in general.

It's possibly caused by a change introduced in the latest mdadm release [1] regarding its naming policy. It might also be related to the md array name containing the hostname of the system where it was created (system2). Maybe this needs special handling, considering a hostname change.

I would either close or reassign to mdadm for further analysis. 

[1] https://github.com/md-raid-utilities/mdadm/releases/tag/mdadm-4.3 -> Strong name rules from Mariusz Tkaczyk.


Note: I implemented an e2e test for the RAID on partition level -> root on LVM scenario.
https://github.com/rhinstaller/anaconda-webui/pull/527

Comment 9 Jan Stodola 2024-11-21 14:11:10 UTC
Created attachment 2059063 [details]
storage.log, F41

This problem can be reproduced also with Fedora Workstation 41, which uses the GTK GUI. The initial installation was successful, but the subsequent re-installation failed.

Comment 10 Vojtech Trefny 2024-11-21 16:12:59 UTC
I am moving this to mdadm; the name with the hostname prefix is already used in the live system before Anaconda starts:

------------------------
liveuser@localhost-live:~$ ls /dev/md*
/dev/md127

/dev/md:
localhost-live:pv00

liveuser@localhost-live:~$ udevadm info /dev/md/localhost-live:pv00 
P: /devices/virtual/block/md127
M: md127
R: 127
U: block
T: disk
D: b 9:127
N: md127
L: 100
S: disk/by-id/md-name-localhost-live:pv00
S: disk/by-id/md-uuid-eb7f5e8f:0a0c4be4:4d42d3fe:d211d8a3
S: disk/by-id/lvm-pv-uuid-2gS5tF-gca2-h7Vq-7hdy-m7hl-6fHW-MSY59R
S: md/localhost-live:pv00
...
E: MD_LEVEL=raid1
E: MD_DEVICES=2
E: MD_METADATA=1.2
E: MD_UUID=eb7f5e8f:0a0c4be4:4d42d3fe:d211d8a3
E: MD_DEVNAME=localhost-live:pv00
E: MD_NAME=localhost-live:pv00

liveuser@localhost-live:~$ sudo mdadm -D /dev/md/localhost-live:pv00 --export
MD_LEVEL=raid1
MD_DEVICES=2
MD_METADATA=1.2
MD_UUID=eb7f5e8f:0a0c4be4:4d42d3fe:d211d8a3
MD_DEVNAME=localhost-live:pv00
MD_NAME=localhost-live:pv00
MD_DEVICE_dev_vda3_ROLE=0
MD_DEVICE_dev_vda3_DEV=/dev/vda3
MD_DEVICE_dev_vdb1_ROLE=1
MD_DEVICE_dev_vdb1_DEV=/dev/vdb1
------------------------

so we are simply trying to reuse the name that was used when the live system started (internally we get the name of the array from udev, from the `MD_DEVNAME` property, which normally doesn't contain the hostname prefix).
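
For reference, a minimal sketch of how that lookup can be reproduced outside blivet, just shelling out to udevadm as shown above (the device path and property name come from the output above; the helper itself is only illustrative, not blivet's actual code):

------------------------
import subprocess

def md_devname(device="/dev/md127"):
    """Return udev's MD_DEVNAME property for an md device, if set."""
    out = subprocess.run(
        ["udevadm", "info", "--query=property", "--name", device],
        capture_output=True, text=True, check=True,
    ).stdout
    props = dict(line.split("=", 1) for line in out.splitlines() if "=" in line)
    # On the live image this returns e.g. "localhost-live:pv00"
    return props.get("MD_DEVNAME")
------------------------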

Comment 11 XiaoNi 2024-12-02 13:09:16 UTC
Hi Vojtech and Katerina

I don't understand this problem well. Can you explain more? In the second installation, the array is assembled, but the name is not the expected one, so it can't be used?

Thanks
Xiao

Comment 12 Aoife Moloney 2025-02-26 13:16:18 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 42 development cycle.
Changing version to 42.

Comment 13 Vojtech Trefny 2025-03-13 08:10:04 UTC
*** Bug 2351809 has been marked as a duplicate of this bug. ***

Comment 14 Vojtech Trefny 2025-03-13 08:18:42 UTC
(In reply to XiaoNi from comment #11)
> In the second installation, the array is assembled but the name is not expected, so it
> can't be used?

These are not two installations, just a single installation with a LiveCD. The array is assembled during the boot of the LiveCD with the "localhost-live:pv00" name. During the installation we stop the array and try to start it again. To do that we use the name "localhost-live:pv00" because that's what udev tells us is the name of the array (we are simply using the MD_DEVNAME property). But mdadm now tells us we cannot use "localhost-live:pv00" as array name even though it was used during boot.

Comment 15 lnie 2025-03-13 11:22:27 UTC
This bug can be easily reproduced with Fedora-Workstation-Live-42_Beta-1.4.x86_64.iso.
Here is the reproducer:
1. Perform a fresh mdraid installation with a non-F42 installer (you won't be able to perform an F42 one due to https://bugzilla.redhat.com/show_bug.cgi?id=2351848).
2. Boot Fedora-Workstation-Live-42_Beta-1.4.x86_64.iso, select the disks containing the mdraid devices, and click "Use entire disk" to perform a guided installation.
The crash happens immediately after clicking "Erase data and install".

Comment 16 Fedora Blocker Bugs Application 2025-03-13 11:24:43 UTC
Proposed as a Blocker for 42-beta by Fedora user lnie using the blocker tracking app because:

 Affects this criterion: https://fedoraproject.org/wiki/Fedora_42_Beta_Release_Criteria#Guided_partitioning

Comment 17 Adam Williamson 2025-03-13 17:50:19 UTC
So based on above comments and my own testing, it seems the case here is not really 'webui' but 'live images'. It affects F41 and F42 lives (so that's GTK UI and web UI). It does not affect F41 netinst (and probably not F42 netinst either, though I didn't specifically test yet). The trigger here is 'we don't entirely control storage initialization in the live case', nothing to do with webUI.

Also from my testing - this fails if there's an existing RAID set *even if you try to completely delete it and create a 'new' one*, rather than reusing it.

Comment 18 Kamil Páral 2025-03-14 09:29:13 UTC
Discussed at Go/No-Go meeting [1]:

!agreed 2325906 - AcceptedBlocker (Beta) - this is accepted as a violation of "the installer must be able to: ... Correctly interpret, and modify as described below any disk with a valid ms-dos or gpt disk label and partition table containing ... software RAID arrays at RAID levels 0, 1 and 5 containing ext4 partitions ... Remove existing storage volumes ... Assign mount points to existing storage volumes", on live images (non-live seem unaffected)

!agreed 2325906 - waived to Fedora 42 Final under both the "Last minute" and "Difficult to fix" justifications

[1] https://meetbot.fedoraproject.org/meeting_matrix_fedoraproject-org/2025-03-13/f42-beta-go-no-go-meeting.2025-03-13-17.02.log.html


Accepted as a Final blocker.

Comment 19 Adam Williamson 2025-03-21 21:35:16 UTC
Hum, so here's some interesting code in mdadm - https://github.com/md-raid-utilities/mdadm/blob/ee3a6cab09c8acaf6706b3710f5652e9be43b57e/super1.c#L1664-L1678 :

	if (homehost &&
	    strchr(name, ':') == NULL &&
	    strlen(homehost) + 1 + strlen(name) < 32) {
		strcpy(sb->set_name, homehost);
		strcat(sb->set_name, ":");
		strcat(sb->set_name, name);
	} else {
		int namelen;

		namelen = min((int)strlen(name),
			      (int)sizeof(sb->set_name) - 1);
		memcpy(sb->set_name, name, namelen);
		memset(&sb->set_name[namelen], '\0',
		       sizeof(sb->set_name) - namelen);
	}

that is, given a string 'name', the property sb->set_name will be '(hostname):(name)' *if* that's less than 32 characters; otherwise it'll just be the first 31 characters (I guess, assuming the size of sb->set_name is 32) of 'name'.

This does seem to be the thing that gets printed out as MD_NAME (at line 897 of the same file).
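
In other words (a quick Python restatement of that logic, with the 32-byte size of sb->set_name assumed as guessed above; purely for illustration):

------------------------
SET_NAME_LEN = 32  # assuming sizeof(sb->set_name) == 32

def set_name(name, homehost):
    # the hostname prefix is only added if the name has no ':' and the result fits
    if homehost and ":" not in name and len(homehost) + 1 + len(name) < SET_NAME_LEN:
        return f"{homehost}:{name}"
    # otherwise the name is just truncated to fit
    return name[:SET_NAME_LEN - 1]

# e.g. set_name("pv00", "system2") -> "system2:pv00",
# which matches the name in the error from the Description
------------------------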

As far as MD_DEVNAME goes...well, there are more wrinkles. That, AFAIK, is backed by the 'path' property of the device as read out of the mdadm map file, which for us is /run/mdadm/map . This file lists one device per line, with four space-separated values per device. The fourth is the 'path'. When we print MD_DEVNAME for a given device, we get its path from the map file, strip off DEV_MD_DIR if the path starts with it (that's /dev/md/ for us), and print the remainder. So if the path in the file is /dev/md/root , we print MD_DEVNAME as root.
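
(A rough sketch of that derivation, as I read it; the map file format and the /dev/md/ prefix come from the mdadm sources, the parsing helper is just illustrative:)

------------------------
DEV_MD_DIR = "/dev/md/"

def devnames_from_map(map_path="/run/mdadm/map"):
    """Map md device name -> what would be printed as MD_DEVNAME."""
    names = {}
    with open(map_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 4:
                continue  # one device per line, four space-separated values
            mdname, metadata, uuid, path = fields
            # strip the /dev/md/ prefix if present, keep the rest as MD_DEVNAME
            names[mdname] = path[len(DEV_MD_DIR):] if path.startswith(DEV_MD_DIR) else path
    return names
------------------------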

As to what gets written *into* the map file, well, yikes, it seems quite complicated! https://github.com/md-raid-utilities/mdadm/blob/ee3a6cab09c8acaf6706b3710f5652e9be43b57e/mapfile.c#L411C1-L491C5

			path = map_dev(major(devid), minor(devid), 0);
			if (path == NULL ||
			    strncmp(path, DEV_MD_DIR, DEV_MD_DIR_LEN) != 0) {
				/* We would really like a name that provides
				 * an MD_DEVNAME for udev.
				 * The name needs to be unique both in /dev/md/
				 * and in this mapfile.
				 * It needs to match what -I or -As would come
				 * up with.
				 * That means:
				 *   Check if array is in mdadm.conf
				 *        - if so use that.
				 *   determine trustworthy from homehost etc
				 *   find a unique name based on metadata name.
				 *
				 */
				struct mddev_ident *match = conf_match(st, info,
								       NULL, 0,
								       NULL);
				struct stat stb;
				if (match && match->devname && match->devname[0] == '/') {
					path = match->devname;
					if (path[0] != '/') {
						strcpy(namebuf, DEV_MD_DIR);
						strcat(namebuf, path);
						path = namebuf;
					}
				} else {
					int unum = 0;
					char *sep = "_";
					const char *name;
					int conflict = 1;
					if ((homehost == NULL ||
					     st->ss->match_home(st, homehost) != 1) &&
					    st->ss->match_home(st, "any") != 1 &&
					    (require_homehost ||
					     !conf_name_is_free(info->name)))
						/* require a numeric suffix */
						unum = 0;
					else
						/* allow name to be used as-is if no conflict */
						unum = -1;
					name = info->name;
					if (!*name) {
						name = st->ss->name;
						if (!isdigit(name[strlen(name)-1]) &&
						    unum == -1) {
							unum = 0;
							sep = "";
						}
					}
					if (strchr(name, ':')) {
						/* Probably a uniquifying
						 * hostname prefix.  Allow
						 * without a suffix, and strip
						 * hostname if it is us.
						 */
						if (homehost && unum == -1 &&
						    strncmp(name, homehost,
							    strlen(homehost)) == 0 &&
						    name[strlen(homehost)] == ':')
							name += strlen(homehost)+1;
						unum = -1;
					}

					while (conflict) {
						if (unum >= 0)
							sprintf(namebuf, DEV_MD_DIR "%s%s%d",
								name, sep, unum);
						else
							sprintf(namebuf, DEV_MD_DIR "%s",
								name);
						unum++;
						if (lstat(namebuf, &stb) != 0 &&
						    (map == NULL ||
						     !map_by_name(&map, namebuf+8)))
							conflict = 0;
					}
					path = namebuf;
				}
			}

so, there's, uh, a lot going on there. We call `map_dev` first. If that gives us a result that starts with /dev/md , we take it. Otherwise we take the device's name - which I think is the same 'name' that backs MD_NAME and thus subject to the same interesting 'might have a hostname in it or might not' condition - and fiddle with it. Notably, if it *does* include a hostname, we strip the hostname *if the current hostname is the same*. So if, for instance, the name is `localhost-live:root` and the current system hostname is `localhost-live`, we'd make the path just `root`. If the current system hostname is *not* `localhost-live`, we'd make the path `localhost-live:root`. That...certainly sounds a lot like it could be what we're running into here; the hostname in an installer environment is absolutely not guaranteed to be the same as the hostname in whatever environment you created the RAID device. Of course, if that's what we're running into, it'd be interesting to know why `map_dev` didn't give us a result we liked, otherwise we wouldn't be in this whole block at all.

Lots of mysteries still to poke into here, but on the whole it seems to me that it's not entirely safe to assume that MD_DEVNAME should never have a hostname prefix. It clearly *is* possible for it to. None of the code pointed out here is new, AFAICS; it all seems to date back to 2009-2014.

However, it does sadly seem like the device 'name' can itself have a : in it - note the check `strchr(name, ':') == NULL` in the code from super1.c at the start of this comment - so it's probably not safe to just unconditionally split MD_DEVNAME on : and throw away the leftmost portion :/ Maybe we can try using MD_DEVNAME as is first, then if that doesn't work, try a split/strip on : ?
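
(If we went that way, the fallback might look roughly like this on the blivet side; purely a sketch, the function and its callers are made up, only the mdadm --assemble form mirrors what the installer runs:)

------------------------
import subprocess

def assemble_md(devname, uuid, members):
    """Try the udev-reported MD_DEVNAME first; if mdadm rejects it,
    retry with the part before the first ':' stripped off."""
    candidates = [devname]
    stripped = devname.split(":", 1)[-1]
    if stripped != devname:
        candidates.append(stripped)
    for name in candidates:
        cmd = ["mdadm", "--assemble", f"/dev/md/{name}", "--run",
               f"--uuid={uuid}"] + list(members)
        if subprocess.run(cmd).returncode == 0:
            return name
    raise RuntimeError(f"could not assemble array {devname}")
------------------------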

Comment 20 Adam Williamson 2025-03-21 21:53:46 UTC
CCing Neil Brown, who apparently wrote a bunch of this stuff more than a decade ago, to see if he a) remembers any of it and b) feels at all like helping :D no pressure if you don't, Neil!

Comment 21 Adam Williamson 2025-03-21 22:06:26 UTC
oh, hmm. Now I poke into what `map_dev` does...it seems like it ultimately winds up returning the device's name too! All roads lead back to the device's name. So I think the key factor here is really the whole thing about the hostname sometimes getting prepended to the device's name and sometimes not?

Comment 22 Adam Williamson 2025-03-21 22:40:27 UTC
Ooooh. So I figured out what the difference is here: the POSIX compatibility check is new in mdadm 4.3. It was added in https://github.com/md-raid-utilities/mdadm/commit/e2eb503bd797908f515b58428b274f1ba6a05349 .

So I think anaconda/blivet's behaviour has been the same all along, and is not incorrect here. anaconda's just doing `mdadm --assemble /dev/md/system2:pv00 --run --uuid=8e4473bc:997208c1:da327078:0dc4d134 /dev/sda3 /dev/sdb1` , and that seems correct. The device almost certainly *is* at /dev/md/system2:pv00 . mdadm needs to either relax its check to allow : characters, or not apply it to this codepath. I'll file an mdadm issue.

Comment 23 Adam Williamson 2025-03-22 00:31:20 UTC
Can folks who can reproduce this please test with https://adamwill.fedorapeople.org/04847690-Fedora-Workstation-Live-x86_64-130581978.iso ? I believe it should fix the bug.

Comment 24 lnie 2025-03-24 08:57:10 UTC
I tried the reproducers mentioned in comment 15 and the Description; looks like it is indeed fixed :)

Comment 25 Adam Williamson 2025-03-24 22:37:15 UTC
Thanks. Unfortunately upstream is being mysterious about this, so I'm waiting to hear more from them before backporting this.

Comment 26 Adam Williamson 2025-04-02 18:19:44 UTC
So, upstream is objecting to this on grounds I'm having trouble making out, but it *seems* to boil down to "we don't want to let people put : in names when manually naming devices, and this codepath sort of gets sucked up into that". I don't understand why "mdadm can't assemble a device under the name mdadm itself gave it" is not a slam dunk "oh god of course we need to fix that", but oh well.

They are making suggestions like "mess with the hostname so mdadm doesn't put : in the device name!" and "rewrite your RAID code to use `mdadm --incremental` or `mdadm -Dbs` then `mdadm -As`", neither of which seems like a remotely sane thing to try and do on F42 timeframe (and I personally wouldn't want to try and figure it out on an F43 timeframe and be confident it didn't break anything, I dunno about anyone else).

So...I'm a bit worried about this one. On the whole I think I'd prefer to just backport my patch (maybe in a tweaked version which doesn't rename the function, in case it's API in any way), and keep arguing with upstream about it until we hopefully get it merged or figure out another resolution which doesn't involve completely rewriting blivet's RAID handling. What does anyone else think?

Comment 27 XiaoNi 2025-04-03 05:53:02 UTC
(In reply to Adam Williamson from comment #26)
> So, upstream is objecting to this on grounds I'm having trouble making out,
> but it *seems* to boil down to "we don't want to let people put : in names
> when manually naming devices, and this codepath sort of gets sucked up into
> that". I don't understand why "mdadm can't assemble a device under the name
> mdadm itself gave it" is not a slam dunk "oh god of course we need to fix
> that", but oh well.
> 
> They are making suggestions like "mess with the hostname so mdadm doesn't
> put : in the device name!" and "rewrite your RAID code to use `mdadm
> --incremental` or `mdadm -Dbs` then `mdadm -As`", neither of which seems
> like a remotely sane thing to try and do on F42 timeframe (and I personally
> wouldn't want to try and figure it out on an F43 timeframe and be confident
> it didn't break anything, I dunno about anyone else).
> 
> So...I'm a bit worried about this one. On the whole I think I'd prefer to
> just backport my patch (maybe in a tweaked version which doesn't rename the
> function, in case it's API in any way), and keep arguing with upstream about
> it until we hopefully get it merged or figure out another resolution which
> doesn't involve completely rewriting blivet's RAID handling. What does
> anyone else think?

Hi Adam

It should be a regression, similar to https://issues.redhat.com/browse/RHEL-72756
I tried to fix this with https://github.com/md-raid-utilities/mdadm/pull/159

Your PR will change the way it works now (it doesn't allow ':' when creating an array). So we can try to persuade Mariusz to accept my PR, which only allows ':' when assembling an array.

Thanks
Xiao

Comment 28 NeilBrown 2025-04-03 23:45:29 UTC
Thanks for bringing this to my attention.

I've given my opinion on github:  https://github.com/md-raid-utilities/mdadm/pull/159#issuecomment-2777219122

Comment 29 Fedora Update System 2025-04-07 00:56:55 UTC
FEDORA-2025-325b5e64af (mdadm-4.3-7.fc43) has been submitted as an update to Fedora 43.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-325b5e64af

Comment 30 Fedora Update System 2025-04-07 01:03:53 UTC
FEDORA-2025-bf5ca47207 (mdadm-4.3-7.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2025-bf5ca47207

Comment 31 Adam Williamson 2025-04-07 01:42:06 UTC
OK, can we please retest with https://adamwill.fedorapeople.org/03366408-FEDORA-2025-bf5ca47207-netinst-x86_64.iso ? That's a new ISO with a different fix, the one from the update above. It just reverts the questionable 'posix' name check entirely, which is more or less in line with what upstream is proposing in https://github.com/md-raid-utilities/mdadm/pull/165 (but a bit different because things changed a lot between 4.3 and current git, so that PR doesn't apply directly). Thanks!

Comment 32 Fedora Update System 2025-04-07 02:36:56 UTC
FEDORA-2025-325b5e64af (mdadm-4.3-7.fc43) has been pushed to the Fedora 43 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 33 Kamil Páral 2025-04-07 15:40:01 UTC
(In reply to Fedora Update System from comment #30)
> FEDORA-2025-bf5ca47207 (mdadm-4.3-7.fc42) has been submitted as an update to
> Fedora 42.
> https://bodhi.fedoraproject.org/updates/FEDORA-2025-bf5ca47207

Resolves the bug, the installation can now proceed.

Comment 34 Fedora Update System 2025-04-08 02:19:14 UTC
FEDORA-2025-bf5ca47207 has been pushed to the Fedora 42 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2025-bf5ca47207`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2025-bf5ca47207

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 35 Fedora Update System 2025-04-08 12:18:04 UTC
FEDORA-2025-bf5ca47207 (mdadm-4.3-7.fc42) has been pushed to the Fedora 42 stable repository.
If problem still persists, please make note of it in this bug report.

