Bug 1096910 - failure at basic.target hangs indefinitely instead of dropping to shell
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 38
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1186908 1878652
Depends On:
Blocks:
 
Reported: 2014-05-12 16:22 UTC by Chris Murphy
Modified: 2024-05-21 14:09 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-05-21 14:09:54 UTC
Type: Bug
Embargoed:



Description Chris Murphy 2014-05-12 16:22:50 UTC
Description of problem: When booting a system with a degraded rootfs, systemd hangs indefinitely at basic.target instead of dropping to a shell and producing the sosreport for troubleshooting.


Version-Release number of selected component (if applicable):
systemd-212-4


How reproducible:
Always


Steps to Reproduce:
0. Any failure to mount /sysroot will do. In my case it's a Btrfs raid1 volume with one device removed, which makes it degraded; Btrfs currently does not mount a degraded volume automatically.


Actual results:

Indefinite hang, Cylon eye, at basic.target.


Expected results:

Eventual timeout and shell prompt.

Additional info:

This works as expected on Fedora 20, systemd-209.

If I use systemctl enable debug-shell.service and retry, I still cannot get to a shell on any tty; but even without debug-shell enabled we really need to eventually fail on basic.target and drop to a shell. Indefinite hang prevents troubleshooting even basic causes for boot failures.

Comment 1 Chris Murphy 2014-05-12 16:24:46 UTC
During the hang, the console shows the following four lines, repeated every 15s to 55s (the interval varies).

[    **] A start job is running for dev-disk-by\x2uuid-7b742…55s / no limit)
Got notification message of unit systemd-journald.service
systemd-journald.service: Got notification message from PID 105 (WATCHDOG=1…)
systemd-journald.service: got WATCHDOG=1

Comment 2 Chris Murphy 2014-05-18 04:24:24 UTC
This also hangs on Fedora 20 after updating systemd-208-9 to systemd-208-16; however, there's no timer, it just says:


[  ***  ] A start job is running for dev-disk-by\x2uuid-9ff63..b4fb6d66.device

After 1 hour it's still hung.

Comment 3 Gene Czarcinski 2014-10-21 17:32:29 UTC
So far, the only way I have been able to fix a btrfs raid1 volume with a missing device is to boot up in rescue mode, run btrfs device add for a new device, and then run btrfs device delete missing.

Comment 4 Zbigniew Jędrzejewski-Szmek 2015-02-05 20:30:06 UTC
*** Bug 1186908 has been marked as a duplicate of this bug. ***

Comment 5 Jaroslav Reznik 2015-03-03 15:48:19 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 6 Fedora End Of Life 2016-07-19 11:30:26 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 7 Alex G. 2021-05-02 01:48:59 UTC
Re-opening, as we have the exact same problem in F34

Comment 8 Chris Murphy 2021-05-02 18:45:20 UTC
Prior upstream discussion
https://lists.freedesktop.org/archives/systemd-devel/2021-February/045973.html

I don't actually know that this really belongs in dracut per the discussion. On the one hand, degraded arrays are the domain of dracut. mdadm doesn't do it automatically; basically, dracut loops waiting for all devices to appear, and if they don't (after I think 300 seconds), it does the work to assemble the array in degraded mode. Equivalent code to do this for Btrfs doesn't exist in dracut.
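
As a rough illustration of the wait-then-fall-back pattern described above (this is not dracut's actual code; the function name, the callback, and the return values are hypothetical, and the timeout is just a parameter):

```python
import time
from typing import Callable


def wait_for_devices(
    all_present: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until all member devices show up or the deadline passes.

    Returns "normal" if every device appeared in time, otherwise
    "degraded" so the caller can decide to assemble or mount degraded
    (or drop to a shell) instead of waiting forever.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if all_present():  # hypothetical probe supplied by the caller
            return "normal"
        sleep(poll_s)
    return "degraded"
```

The point of the sketch is only that the wait is bounded: once the deadline passes, something else happens.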

On the other hand, there is a udev rule in place prior to even attempting to mount a multiple device Btrfs. And udev has no concept of timers. If it's never ready because it's degraded, we simply never get to the next step. This implies we need a better udev rule; the upstream bug for that is
https://github.com/kdave/btrfs-progs/issues/264
And maybe
https://github.com/kdave/btrfs-progs/issues/302

And it might imply udev needs work to better understand btrfs multiple devices, so I'm just going to leave this on systemd for now, due to these other btrfs+udev issues related to multiple devices:
https://github.com/systemd/systemd/issues/19393
https://github.com/systemd/systemd/issues/14674

Comment 9 Chris Murphy 2021-05-02 18:47:33 UTC
Oops, setting back to systemd.
And also the top of the systemd thread is in January, here:
https://lists.freedesktop.org/archives/systemd-devel/2021-January/045918.html

Comment 10 Chris Murphy 2021-05-02 19:00:15 UTC
Ha, ok, after re-reading all of that, I think we need someone who understands udev, libblkid, and dracut better than I do. Right now it may really be dracut's responsibility to do a timeout. But as I read this: https://lists.freedesktop.org/archives/systemd-devel/2021-January/045928.html I can't help but think that's a problem of its own. How can we not distinguish between a failed device and a user who has just wandered away? This bug isn't about cryptsetup (it happens without LUKS being used), but it seems we need some way of determining whether all devices needed to boot are present and, if not, dropping to a shell. Hanging forever isn't great.

A workaround though is to have x-systemd.device-timeout=300 set in fstab for the / UUID.
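
For anyone who wants to script that workaround, here is a minimal sketch that adds the option to the / entry of an fstab given as text. The function name and the sample UUID are made up for illustration; it only edits a string and leaves writing /etc/fstab to the caller.

```python
def add_root_device_timeout(fstab_text: str, seconds: int = 300) -> str:
    """Return fstab_text with x-systemd.device-timeout=<seconds> on the / entry."""
    option = f"x-systemd.device-timeout={seconds}"
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave comments, blank lines, short lines, and non-root entries alone.
        if line.lstrip().startswith("#") or len(fields) < 4 or fields[1] != "/":
            out.append(line)
            continue
        # Drop any existing device-timeout option so it is not duplicated.
        opts = [
            o for o in fields[3].split(",")
            if not o.startswith("x-systemd.device-timeout=")
        ]
        opts.append(option)
        fields[3] = ",".join(opts)
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


sample = "UUID=0000-example / btrfs subvol=root 0 0\n"  # placeholder UUID
print(add_root_device_timeout(sample), end="")
```

Note that the rewritten line is tab-separated regardless of how the original was spaced.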

Comment 11 Zbigniew Jędrzejewski-Szmek 2021-05-18 17:30:42 UTC
*** Bug 1878652 has been marked as a duplicate of this bug. ***

Comment 12 Ben Cotton 2021-08-10 12:44:20 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle.
Changing version to 35.

Comment 13 Ben Cotton 2022-11-29 16:44:18 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 14 Ben Cotton 2022-12-13 15:11:27 UTC
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 15 Dominik 'Rathann' Mierzejewski 2022-12-13 20:56:31 UTC
Bumping to rawhide as this doesn't seem to have been fixed.

Comment 16 Ben Cotton 2023-02-07 14:50:56 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 38 development cycle.
Changing version to 38.

Comment 17 Aoife Moloney 2024-05-07 15:40:39 UTC
This message is a reminder that Fedora Linux 38 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 38 on 2024-05-21.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '38'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 38 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 18 Aoife Moloney 2024-05-21 14:09:54 UTC
Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21.

Fedora Linux 38 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

