Bug 1956133 - System hangs in shutdown stage - mdmon killed by dracut shutdown script
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 34
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-02 23:35 UTC by Dmitriy Kargapolov
Modified: 2021-05-03 02:04 UTC

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug


Attachments
rdsosreport.txt (14.00 KB, text/plain)
2021-05-02 23:35 UTC, Dmitriy Kargapolov
process list from emergency shell (12.59 KB, text/plain)
2021-05-02 23:36 UTC, Dmitriy Kargapolov

Description Dmitriy Kargapolov 2021-05-02 23:35:41 UTC
Created attachment 1778770 [details]
rdsosreport.txt

Description of problem:
After upgrading to FC34, the system hangs on every shutdown. The last message on the console is “Unmounting /oldroot timed out.”

Version-Release number of selected component (if applicable):
dracut-053-5.fc34.x86_64

How reproducible:
Always.

Steps to Reproduce:
Shut down the system, for example by issuing the “shutdown now” command.

Actual results:
The system hangs after the “Unmounting /oldroot timed out.” message.

Expected results:
The system is expected to complete the shutdown.


Additional info:
Following the “Debugging dracut on shutdown” recommendations, I found that the process never reached the “shutdown” entry point of the emergency shell, while the “pre-shutdown” entry point is placed _before_ any attempt to unmount /oldroot. So I altered the /usr/lib/dracut/modules.d/99shutdown/shutdown.sh script, adding an emergency shell entry point just after the call to the internal function umount_a() and before the call to the internal function _check_shutdown(), which invokes the various shutdown hooks. After modifying the script, I used the following commands to run the process:

dracut -f
mkdir -p /run/initramfs/etc/cmdline.d
echo "rd.debug" > /run/initramfs/etc/cmdline.d/debug.conf
touch /run/initramfs/.need_shutdown
shutdown -H now

Unfortunately, I couldn’t figure out how to save the debug output from the dracut shutdown script, but I noticed the following:

1. After "umount /oldroot" timed out, the unmount appeared to have actually succeeded; at least, "/oldroot" was no longer listed in /proc/mounts.

2. Still, the umount process was alive and could not be killed with SIGKILL. Something was blocking it; it was in uninterruptible sleep (state D):

root       17933  0.0  0.0   3876  1180 ?        D    23:12   0:00 umount /oldroot

3. No process was found that was using /oldroot and preventing it from being unmounted properly.

4. After exiting the emergency shell and letting the shutdown proceed, I found the command that finally hangs: "mdadm -vv --wait-clean --scan"

5. Repeating the test, I ran "mdadm -vv --wait-clean --scan" manually from the emergency shell with the same result: the command never returned and could not be killed.
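
The checks in points 1 and 2 can be repeated with two small helpers. This is only a minimal sketch, not something shipped with dracut: the function names is_mounted and list_blocked are hypothetical, and the optional second argument (a mounts file or a /proc-like directory) exists only so the helpers can be tried against test data. It assumes a POSIX shell and uses no external tools.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a stuck shutdown; not part of dracut.

# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed as a mount target (second field)
# in MOUNTS_FILE, which defaults to /proc/mounts.
is_mounted() {
    _target=$1
    _file=${2:-/proc/mounts}
    while read -r _dev _mp _rest; do
        [ "$_mp" = "$_target" ] && return 0
    done < "$_file"
    return 1
}

# list_blocked [PROC_DIR]
# Prints "PID NAME" for every process whose state is D (uninterruptible
# sleep), read from PROC_DIR/PID/status. PROC_DIR defaults to /proc.
# Only the first word of a process name is printed.
list_blocked() {
    _proc=${1:-/proc}
    for _dir in "$_proc"/[0-9]*; do
        [ -r "$_dir/status" ] || continue
        _name=
        _state=
        while read -r _key _val _rest; do
            case $_key in
                Name:) _name=$_val ;;
                State:) _state=$_val ;;
            esac
        done < "$_dir/status"
        [ "$_state" = "D" ] && echo "${_dir##*/} $_name"
    done
    return 0
}
```

From the emergency shell, 'is_mounted /oldroot || echo "/oldroot is not mounted"' followed by 'list_blocked' would show whether the mount is gone while umount is still blocked.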

The hanging command is part of the /usr/lib/dracut/modules.d/90mdraid/md-shutdown.sh script (which is invoked as a hook /usr/lib/dracut/hooks/shutdown/30-md-shutdown.sh in the initramfs).

It is unclear whether the partially completed unmount caused mdadm --wait-clean to hang, or whether the latter has a problem of its own. I also doubt that anything is wrong with my hardware, since everything worked fine with the latest FC33.

Comment 1 Dmitriy Kargapolov 2021-05-02 23:36:57 UTC
Created attachment 1778771 [details]
process list from emergency shell

Comment 2 Dmitriy Kargapolov 2021-05-03 01:54:35 UTC
I altered killall_proc_mountpoint() in /usr/lib/dracut/modules.d/99base/dracut-lib.sh, commenting out the line that kills any process suspected of using the given mount point (/oldroot) and printing information about the process instead.

The only process found was '@usr/sbin/mdmon --offroot --takeover md127'.
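
A read-only check of this kind can be sketched as a standalone function. This is an independent illustration, not the code from dracut-lib.sh: the name list_proc_mountpoint and its optional second argument (a /proc-like directory, useful for testing) are hypothetical, it matches link targets by path prefix, and it assumes readlink and tr are available in the environment where it runs.

```shell
#!/bin/sh
# list_proc_mountpoint MOUNTPOINT [PROC_DIR]
# Hypothetical helper, not part of dracut. Prints "PID COMMAND LINE" for
# every process whose cwd, root, exe, or an open file descriptor points
# at MOUNTPOINT or below it. Nothing is killed.
list_proc_mountpoint() {
    _mnt=$1
    _proc=${2:-/proc}
    for _dir in "$_proc"/[0-9]*; do
        [ -d "$_dir" ] || continue
        for _link in "$_dir/cwd" "$_dir/root" "$_dir/exe" "$_dir"/fd/*; do
            # Skip entries that are missing or are not symbolic links.
            _dest=$(readlink "$_link" 2>/dev/null) || continue
            case $_dest in
                "$_mnt" | "$_mnt"/*)
                    # cmdline is NUL-separated; turn it into one line.
                    _cmd=$(tr '\0' ' ' 2>/dev/null < "$_dir/cmdline")
                    echo "${_dir##*/} ${_cmd% }"
                    break
                    ;;
            esac
        done
    done
    return 0
}
```

Running 'list_proc_mountpoint /oldroot' prints one line per process that still references the old root, which is enough to identify it before deciding whether it is safe to kill.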

I think it should not be killed (especially with -9), because it is used by the RAID array that underlies the filesystem still mounted as /oldroot. The mdmon man page (section START UP AND SHUTDOWN) says that "At shutdown time, mdmon should not be killed along with other processes."
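
The leading '@' in the command line above is what mdmon sets with --offroot, and systemd treats a process whose argv[0] starts with '@' as a root storage daemon that is excluded from its final kill at shutdown. A test for that marker could look like the sketch below. It only illustrates the check and is not a proposed patch to dracut: the function name is_offroot_daemon and its optional second argument (a /proc-like directory, for testing) are hypothetical, and it assumes dd is available.

```shell
#!/bin/sh
# is_offroot_daemon PID [PROC_DIR]
# Hypothetical helper, not part of dracut. Succeeds if the first byte of
# the process's argv[0] is '@', the marker mdmon sets with --offroot.
is_offroot_daemon() {
    _cmdline=${2:-/proc}/$1/cmdline
    [ -r "$_cmdline" ] || return 1
    # Read exactly one byte; an empty cmdline yields an empty string.
    _first=$(dd if="$_cmdline" bs=1 count=1 2>/dev/null)
    [ "$_first" = "@" ]
}
```

A loop that kills processes holding a mount point could call 'is_offroot_daemon "$pid" && continue' before sending any signal, so that mdmon keeps running while its array is still in use.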

I am not sure why this scenario is even possible.

