Bug 2239419 - system keeps trying to start a non-exist btrfs device
Summary: system keeps trying to start a non-exist btrfs device
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 39
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-09-18 09:06 UTC by lnie
Modified: 2024-11-27 21:30 UTC (History)
17 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-11-27 21:30:21 UTC
Type: ---
Embargoed:


Attachments
screenshot showing pv is deleted warning (61.28 KB, image/png)
2023-09-18 09:09 UTC, lnie
screenshot after btrfs volume partition is deleted (56.31 KB, image/png)
2023-09-18 11:42 UTC, lnie


Links
System ID Private Priority Status Summary Last Updated
Fedora Pagure fedora-btrfs/project issue 59 0 None None None 2023-09-24 23:53:05 UTC
Github dracutdevs dracut issues 1922 0 None open enable unattended degraded boot for btrfs 2023-09-26 08:45:33 UTC

Description lnie 2023-09-18 09:06:45 UTC
As a result of https://bugzilla.redhat.com/show_bug.cgi?id=2239121, users may delete a btrfs volume partition without being aware of it. After a reboot, the system hangs, apparently forever(?). If they boot that system with rhgb and quiet removed and nomodeset added, they will see the kernel trying to start that device with no time limit.
IMO, we should tell users what's going on and drop to a dracut shell, just as we do after an LVM partition with an ext4/xfs filesystem is deleted.

Reproducible: Always

Comment 1 lnie 2023-09-18 09:09:47 UTC
Created attachment 1989321 [details]
screenshot showing pv is deleted warning

Comment 2 lnie 2023-09-18 11:42:49 UTC
Created attachment 1989364 [details]
screenshot after btrfs volume partition is deleted

Comment 3 Chris Murphy 2023-09-24 22:37:11 UTC
Strictly speaking this is not a bug, it's a feature request. I don't know if this can be done in systemd. The mdadm equivalent is done entirely in dracut.
https://pagure.io/fedora-btrfs/project/issue/59

The indefinite hang is the result of a udev rule, /usr/lib/udev/rules.d/64-btrfs.rules that waits for all Btrfs member devices to become available. In the case of a device failure or removal, that never happens so the wait is indefinite.

If we remove this udev rule, then if any device is missing, we get a mount failure and drop to a shell. From a certain point of view that might be better than an indefinite hang. A risk is if users were to add the `degraded` mount option to the kernel command line *and* remove this udev rule. In that case, any delay from any device would result in a degraded mount, and btrfs right now cannot properly handle more than one degraded mount, or it can get into a kind of split brain situation. We need to manually rectify the problem by finding the missing device and scrubbing the file system to "catch up" all member devices. There's no equivalent of a resync and write intent bitmap to automatically catch up a formerly missing member device.

Anyway, there's work to be done here. It's not a great failure mode. But it's also not really a bug.

Comment 4 Chris Murphy 2023-09-26 22:25:10 UTC
I'm not sure what our short-term workaround should be, but the indefinite hang is not a good user experience. If we get to a dracut shell, we can at least mount degraded, then continue the boot, and from there `btrfs replace start` if this is actually a device failure.
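
As a rough illustration of the recovery path described above, the following Python sketch only builds the two command lines; it does not run them. The helper names, the device paths, the devid `2`, and the `/sysroot` mount point are hypothetical placeholders, and the sketch assumes the usual `mount -o degraded <device> <mountpoint>` and `btrfs replace start <devid> <targetdev> <path>` forms.

```python
import shlex


def degraded_mount_argv(device: str, mountpoint: str) -> list[str]:
    """Command to mount a btrfs volume while a member device is missing."""
    return ["mount", "-o", "degraded", device, mountpoint]


def replace_argv(missing_devid: int, new_device: str, mountpoint: str) -> list[str]:
    """Command to rebuild onto new_device in place of the missing devid."""
    return ["btrfs", "replace", "start", str(missing_devid), new_device, mountpoint]


if __name__ == "__main__":
    # Hypothetical values: adjust to the actual system before use.
    for argv in (
        degraded_mount_argv("/dev/sda3", "/sysroot"),
        replace_argv(2, "/dev/sdc1", "/sysroot"),
    ):
        print(shlex.join(argv))
```

Keeping the commands as argument lists makes it easy to review them before anything is executed by hand in the emergency shell.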

Right now I'm not sure how to work around the indefinite wait on the udev rule, even with an extra kernel parameter.

I would feel more comfortable if the kernel had some protection against the split brain scenario, e.g. a "has been mounted degraded" flag, possibly in the dev tree? If present, only permit a subsequent mount (degraded or not) when all the flagged degraded devices have the same transid. The typical problem scenario is a 2x device raid1 where each device is separately mounted degraded. In rare cases the transid of both devices could be the same; maybe it's enough to catch this by disallowing "all devices present have degraded flag set" because that's suspicious. At least one device must be non-degraded mounted, and should have a lower transid than the degraded mounted devices.
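
To make the proposed check concrete, here is a minimal Python model of it. This is only one reading of the idea above, not kernel code: btrfs has no such flag today, and the `Member` type, the `was_degraded` field, and the `mount_permitted` function are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    transid: int
    was_degraded: bool  # hypothetical "has been mounted degraded" flag


def mount_permitted(present: list[Member]) -> bool:
    """Model of the proposed split-brain guard for the devices present."""
    flagged = [m for m in present if m.was_degraded]
    clean = [m for m in present if not m.was_degraded]
    if not flagged:
        return True  # nothing was ever mounted degraded
    if not clean:
        return False  # every present device is flagged: suspicious
    if len({m.transid for m in flagged}) != 1:
        return False  # flagged devices diverged from each other
    # Devices never mounted degraded should lag behind the flagged ones.
    return all(m.transid < flagged[0].transid for m in clean)
```

In the 2x device raid1 case, two members that were each mounted degraded on their own are rejected even if their transids happen to match, because no unflagged member is present.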

Comment 5 Pavel Valena 2023-09-29 14:40:36 UTC
I think there is a timeout possible (man dracut.cmdline): 

```
rd.timeout=<seconds>
           specify how long dracut should wait for devices to appear. The default is 0, which
           means forever. Note that this timeout should be longer than rd.retry to allow for
           proper configuration.

```

If this doesn't work, it's probably a bug, please report it upstream (or fill in more info/reproducer here, and I'll forward that).

https://github.com/dracutdevs/dracut/
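
The man page excerpt above implies two conditions for a bounded wait: `rd.timeout` must be non-zero, and it should be longer than `rd.retry` when that is also set. The Python sketch below checks a kernel command line string against those two conditions. The function names are made up for this example, and the sample command lines in the usage note are hypothetical; the sketch says nothing about whether the timeout actually takes effect for the btrfs case in this bug.

```python
def parse_rd_options(cmdline: str) -> dict[str, int]:
    """Extract numeric rd.timeout / rd.retry values from a kernel command line."""
    opts: dict[str, int] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key in ("rd.timeout", "rd.retry") and sep and value.isdigit():
            opts[key] = int(value)
    return opts


def bounded_wait(cmdline: str) -> bool:
    """True if the device wait is finite and longer than rd.retry (if given)."""
    opts = parse_rd_options(cmdline)
    timeout = opts.get("rd.timeout", 0)  # 0 (the default) means wait forever
    if timeout == 0:
        return False
    retry = opts.get("rd.retry")
    return retry is None or timeout > retry
```

For example, `bounded_wait("rd.timeout=120 rd.retry=90")` is true, while a command line without `rd.timeout` leaves the wait unbounded.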

Comment 6 Aoife Moloney 2024-11-27 21:30:21 UTC
Fedora Linux 39 entered end-of-life (EOL) status on 2024-11-26.

Fedora Linux 39 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

