As a result of https://bugzilla.redhat.com/show_bug.cgi?id=2239121, users may delete a btrfs volume partition without being aware of it. After a reboot, the system then hangs, apparently forever (?). If they boot that system with rhgb and quiet removed and nomodeset added, they will see the kernel trying to start that device with no time limit. IMO, we should tell users what's going on and drop to a dracut shell, just like we do when an LVM partition with an ext4/xfs file system is deleted.
Reproducible: Always
Created attachment 1989321: screenshot showing the warning that the PV was deleted
Created attachment 1989364: screenshot after the btrfs volume partition was deleted
Strictly speaking this is not a bug, it's a feature request. I don't know if this can be done in systemd. The mdadm equivalent is done entirely in dracut. https://pagure.io/fedora-btrfs/project/issue/59 The indefinite hang is the result of a udev rule, /usr/lib/udev/rules.d/64-btrfs.rules, that waits for all Btrfs member devices to become available. In the case of a device failure or removal, that never happens, so the wait is indefinite. If we remove this udev rule, then if any device is missing, we get a mount failure and drop to a shell. From a certain point of view that might be better than an indefinite hang. A risk is if users were to add the `degraded` mount option to the kernel command line *and* remove this udev rule. In that case any delay from any device would result in a degraded mount, and btrfs right now cannot properly handle more than one degraded mount; doing so can lead to a kind of split-brain situation. The problem then has to be rectified manually by finding the missing device and scrubbing the file system to "catch up" all member devices. There's no equivalent of a resync and write-intent bitmap to automatically catch up a formerly missing member device. Anyway, there's work to be done here. It's not a great failure mode. But it's also not really a bug.
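The boot outcomes described in the comment above can be summarized as a small decision table. This is a simplified model, not the actual logic of 64-btrfs.rules, dracut, or the kernel: the function `boot_outcome`, its parameters, and the outcome strings are hypothetical, and device IDs stand in for the Btrfs member devices.

```python
def boot_outcome(expected: set[int], present: set[int],
                 udev_wait_rule: bool = True, degraded_opt: bool = False) -> str:
    """Hypothetical model of booting from a multi-device Btrfs root."""
    if expected <= present:
        # Every member device is present: a normal mount.
        return "mounted"
    if udev_wait_rule:
        # The udev rule keeps the file system "not ready" until all members
        # appear, so a removed device means the boot waits indefinitely.
        return "hang"
    if degraded_opt:
        # Without the wait, `degraded` lets the mount proceed with members
        # missing, including ones that are merely slow to appear (the risky case).
        return "mounted degraded"
    # Without the wait and without `degraded`, the mount fails and
    # the boot drops to a shell.
    return "emergency shell"
```

For a two-device raid1 with device 2 removed, `boot_outcome({1, 2}, {1})` yields "hang", which is the behavior reported in this bug.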
I'm not sure what our short-term workaround should be, but the indefinite hang is not a good user experience. If we get to a dracut shell, we can at least mount degraded, continue the boot, and from there run `btrfs replace start` if this is actually a device failure. Right now I'm not sure how to work around the indefinite wait in the udev rule, even with an extra kernel parameter. I would feel more comfortable if the kernel had some protection against the split-brain scenario, e.g. a "has been mounted degraded" flag, possibly in the dev tree. If the flag is present, a subsequent mount (degraded or not) would only be permitted when all the flagged degraded devices have the same transid. The typical problem scenario is a 2-device raid1 where each device is separately mounted degraded. In rare cases the transid of both devices could be the same; maybe it's enough to catch this by disallowing mounts where all devices present have the degraded flag set, because that's suspicious. At least one device must not have the degraded flag set, and it should have a lower transid than the degraded-mounted devices.
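The proposed protection can be written as a mount-time check. This is only a sketch of the rules proposed above: the per-device `degraded_flag` is the hypothetical flag from the proposal, not an existing btrfs on-disk field, and the names `Member` and `mount_permitted` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    devid: int
    transid: int
    degraded_flag: bool  # hypothetical "has been mounted degraded" flag


def mount_permitted(members: list[Member]) -> bool:
    """Apply the proposed split-brain rules to the member devices present."""
    flagged = [m for m in members if m.degraded_flag]
    if not flagged:
        # No device was ever mounted degraded: nothing to check.
        return True
    if len(flagged) == len(members):
        # Every present device was mounted degraded on its own: suspicious,
        # e.g. both halves of a raid1 mounted degraded separately.
        return False
    if len({m.transid for m in flagged}) != 1:
        # Flagged devices were written together, so they must agree on transid.
        return False
    # Unflagged devices missed those degraded writes, so they must lag behind.
    flagged_transid = flagged[0].transid
    return all(m.transid < flagged_transid for m in members if not m.degraded_flag)
```

Rejecting the all-flagged case is what catches the rare split brain where both halves of a raid1 happen to reach the same transid.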
I think a timeout is possible (man dracut.cmdline):
```
rd.timeout=<seconds>
    specify how long dracut should wait for devices to appear. The default is 0, which means forever. Note that this timeout should be longer than rd.retry to allow for proper configuration.
```
If this doesn't work, it's probably a bug; please report it upstream (or fill in more info/reproducer here, and I'll forward that). https://github.com/dracutdevs/dracut/
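The semantics quoted from the man page (a timeout in seconds, where 0 means wait forever) can be modeled as a polling loop. This is not dracut's implementation, and whether the btrfs udev wait honors `rd.timeout` is exactly the open question here; `wait_for_device` and its parameters are hypothetical. The clock and sleep functions are injectable so the give-up path can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_device(is_present: Callable[[], bool], timeout: float,
                    poll: float = 0.5,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until is_present() is true; timeout == 0 means wait forever."""
    deadline = None if timeout == 0 else clock() + timeout
    while not is_present():
        if deadline is not None and clock() >= deadline:
            # Give up; the caller could then drop to an emergency shell.
            return False
        sleep(poll)
    return True
```

With `timeout=0` and a device that never appears, this loops forever, which mirrors the hang reported in this bug; any positive timeout turns the hang into a failure the boot can react to.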
Fedora Linux 39 entered end-of-life (EOL) status on 2024-11-26. Fedora Linux 39 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release. Thank you for reporting this bug and we are sorry it could not be fixed.