Description of problem:

Using RAID1/NVMe drives with an encrypted root partition:

$ cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 nvme0n1[1] nvme1n1[0]
      468838400 blocks super external:/md127/0 [2/2] [UU]

md127 : inactive nvme1n1[1](S) nvme0n1[0](S)
      10402 blocks super external:imsm

unused devices: <none>

On shutdown, md-shutdown.sh (dracut) gets executed:

_do_md_shutdown() {
    local ret
    local final=$1

    info "Waiting for mdraid devices to be clean."
    mdadm -vv --wait-clean --scan | vinfo
    ret=$?
    info "Disassembling mdraid devices."
    mdadm -vv --stop --scan | vinfo
    ret=$((ret + $?))
    if [ "x$final" != "x" ]; then
        info "/proc/mdstat:"
        vinfo < /proc/mdstat
    fi
    return $ret
}

The system then hangs on the call to "mdadm .. --wait-clean ..", preventing it from rebooting or powering off. If I comment out the command in question and rebuild the initramfs, the system reboots and powers down successfully.

Seems similar to: https://bugzilla.redhat.com/show_bug.cgi?id=1092937

Version-Release number of selected component (if applicable):
mdadm-4.1-7.fc34.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up RAID1 on dual NVMe drives and an encrypted root partition
2. Reboot

Actual results:
Hangs

Expected results:
Successful reboot

Additional info:
SuperMicro X11SPW-TF
dracut-054-12.git20210521.fc34.x86_64
kernel-5.12.9-300.fc34.x86_64
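As a side note for anyone debugging the same hang: the clean/dirty state that --wait-clean waits on can be checked directly via sysfs before shutting down. A minimal sketch, using the array names from this report (md126/md127); the expected values are indicative only:

cat /proc/mdstat
# The RAID1 volume should report "clean" (or "read-auto"/"readonly") once writes have stopped:
cat /sys/block/md126/md/array_state
# The IMSM container itself normally stays "inactive":
cat /sys/block/md127/md/array_state

# The same call the dracut hook makes; a prompt return here but a hang at
# shutdown points at something specific to the shutdown environment:
mdadm -vv --wait-clean --scan; echo "exit: $?"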
Additional information

sysrq: Show Blocked State

task:dmcrypt_write/2 state:D stack:    0 pid:  1155 ppid:     2 flags:0x00004000
<snip>
task:umount          state:D stack:    0 pid: 42002 ppid:     1 flags:0x00004004
<snip>
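For reference, that blocked-task dump can be captured on demand from a console session when the shutdown hangs. A generic SysRq sketch (not output from this machine):

# Enable all SysRq functions, then dump tasks stuck in uninterruptible (D) state:
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 100   # the "Show Blocked State" report lands in the kernel log

# On a hung console with no shell, Alt+SysRq+w triggers the same dump.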
Bueller?
Hi Chad,

Thanks for reporting this. I'll try to reproduce it.

By the way, I remember you reported this against RHEL as well. What's that bug number? I can't find it now.

Thanks
Xiao
This is the only ticket I've submitted.
Hi

I just tried this on my side; it seems to work well.

Hardware:
Manufacturer: Hewlett-Packard
Product Name: 304Bh
Hard Disks: 4 SSDs (sda sdb sdc sdd); RAID1(sda, sdb)

OS / rpm package versions:
Fedora-Server-dvd-x86_64-34-1.2.iso
kernel-5.11.12-300.fc34.x86_64
mdadm-4.1-7.fc34.x86_64
dracut-053-4.fc34.x86_64

After installing the OS with RAID1 and an encrypted filesystem, it boots up fine.

[root@top-server-hp ~]# lsblk
NAME                                            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                               8:0    0 223.6G  0 disk
└─md126                                           9:126  0 223.6G  0 raid1
  ├─md126p1                                     259:1    0     1G  0 part  /boot
  └─md126p2                                     259:2    0 222.6G  0 part
    └─luks-1f216872-9f48-489e-b1ae-6a9664fc7952 253:0    0 222.6G  0 crypt
      └─fedora_top--server--hp-root             253:1    0    15G  0 lvm   /
sdb                                               8:16   0 223.6G  0 disk
└─md126                                           9:126  0 223.6G  0 raid1
  ├─md126p1                                     259:1    0     1G  0 part  /boot
  └─md126p2                                     259:2    0 222.6G  0 part
    └─luks-1f216872-9f48-489e-b1ae-6a9664fc7952 253:0    0 222.6G  0 crypt
      └─fedora_top--server--hp-root             253:1    0    15G  0 lvm   /
sdc                                               8:32   0 223.6G  0 disk
sdd                                               8:48   0 223.6G  0 disk
zram0                                           252:0    0   7.7G  0 disk  [SWAP]
nvme0n1                                         259:0    0 119.2G  0 disk

[root@top-server-hp ~]# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda[1] sdb[0]
      234428416 blocks super external:/md127/0 [2/2] [UU]

md127 : inactive sdb[1](S) sda[0](S)
      5296 blocks super external:imsm

unused devices: <none>
[root@top-server-hp ~]#
Thanks Fine for testing this.

@Chad, we can't reproduce this problem. Are your test steps the same as these? Fine used SSDs rather than NVMe disks. Are there any other differences besides that one?
A couple of notes.

1) The server boots just fine, but hangs on a reboot or poweroff, as mentioned above.

2) The NVMe-based RAID1 drives are configured using Intel Virtual RAID on CPU (VROC), Standard version (AOC-VROCSTNMOD); a short sketch for inspecting the VROC/IMSM metadata follows at the end of this comment.
   * https://www.supermicro.com/manuals/other/AOC-VROCxxxMOD_Windows.pdf

3) The server boots UEFI and uses standard partitioning, no LVM:

$ lsblk
NAME                                            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                               8:0    0 232.9G  0 disk
├─sda1                                            8:1    0     4G  0 part
└─sda2                                            8:2    0 228.9G  0 part
  └─backup                                      253:2    0 228.9G  0 crypt /backup
sdb                                               8:16   0 232.9G  0 disk
sdc                                               8:32   0 232.9G  0 disk
nvme0n1                                         259:0    0 447.1G  0 disk
└─md126                                           9:126  0 447.1G  0 raid1
  ├─md126p1                                     259:2    0   512M  0 part  /boot/efi
  ├─md126p2                                     259:3    0   512M  0 part  /boot
  ├─md126p3                                     259:4    0     8G  0 part
  │ └─swap                                      253:1    0     8G  0 crypt [SWAP]
  └─md126p4                                     259:5    0 438.1G  0 part
    └─luks-bfb7aac3-4bfc-4bef-bf3d-32bf7bf712c9 253:0    0 438.1G  0 crypt /
nvme1n1                                         259:1    0 447.1G  0 disk
└─md126                                           9:126  0 447.1G  0 raid1
  ├─md126p1                                     259:2    0   512M  0 part  /boot/efi
  ├─md126p2                                     259:3    0   512M  0 part  /boot
  ├─md126p3                                     259:4    0     8G  0 part
  │ └─swap                                      253:1    0     8G  0 crypt [SWAP]
  └─md126p4                                     259:5    0 438.1G  0 part
    └─luks-bfb7aac3-4bfc-4bef-bf3d-32bf7bf712c9 253:0    0 438.1G  0 crypt /

4) This exact configuration works, without issue, on Fedora 32:
   * mdadm-4.1-5.fc32.x86_64
   * kernel-5.11.22-100.fc32.x86_64
   * dracut-050-61.git20200529.fc32.x86_64
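For completeness, here is the sketch mentioned in note 2 for inspecting the VROC/IMSM container and volume from the running system. The device names are taken from the lsblk output above; these are plain mdadm calls, not output from this machine:

# What the Intel VROC/IMSM platform supports:
mdadm --detail-platform

# IMSM metadata on a member disk, the container, and the RAID1 volume:
mdadm -E /dev/nvme0n1
mdadm -D /dev/md127   # IMSM container
mdadm -D /dev/md126   # RAID1 volume inside the container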
(In reply to Chad from comment #7)

Thanks for the information. Here is what I got:

[root@top-server-hp ~]# reboot
or
[root@top-server-hp ~]# poweroff

It hangs there:

dracut Warning: Killing all remaining processes
dracut Warning: Unmounting /oldroot time out.

I waited for more than 30 minutes and it was still hanging there. The only way to make it reboot is to unplug the power cable (or long-press the power button), then press the power button again.
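If it helps with debugging, dracut can be told to log verbosely and drop to a shell during the shutdown sequence. This is the generic procedure from the dracut documentation, quoted from memory, so the exact options should be checked against the installed dracut version:

# Ask dracut to be verbose and break before the shutdown scripts run:
mkdir -p /run/initramfs/etc/cmdline.d
echo "rd.debug rd.break=pre-shutdown" > /run/initramfs/etc/cmdline.d/debug.conf
touch /run/initramfs/.need_shutdown
reboot
# The shutdown initramfs should then drop to an emergency shell, where
# /proc/mdstat and the md-shutdown.sh hook can be run by hand.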
Hi Fine,

Thanks for the help. Does the command `mdadm -vv --wait-clean --scan` hang if you run it manually before rebooting?
(In reply to XiaoNi from comment #9)

Hi Xiao,

The command finishes quickly:

[root@top-server-hp ~]# mdadm -vv --wait-clean --scan
[root@top-server-hp ~]#

But after that, the shutdown still hangs there:

dracut Warning: Killing all remaining processes
dracut Warning: Unmounting /oldroot time out.
https://bugzilla.redhat.com/show_bug.cgi?id=1956133
The problem can be fixed by using the latest upstream mdadm. I'll try to find which patch fixes it.
Xiao,

Were you able to locate the patch? Will it be ported to Fedora 34's mdadm soon?

Chad
(In reply to Chad from comment #13)
> Xiao,
> 
> Were you able to locate the patch? Will it be ported to Fedora 34's mdadm
> soon?
> 

Yes, I plan to fix this in f34. It's not practical to backport all of the patches to f34, so it's better to backport only the patch that fixes this problem. But f34 is still using mdadm-4.1-rc2, which is quite old, and there are hundreds of patches between mdadm-4.1-rc2 and the latest upstream, so it will take some time to find the one that fixes this. I'll update mdadm to the latest upstream in f35.

Thanks
Xiao
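For what it's worth, one way to narrow this down is to bisect mdadm between the packaged baseline and current upstream. A rough sketch; the tag names are assumptions, and the starting point should be whatever commit the f34 package is actually built from:

git clone git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
cd mdadm
# We are hunting for the commit that *fixes* the hang, so use custom terms
# instead of the usual good/bad labels:
git bisect start --term-old=hangs --term-new=works
git bisect hangs mdadm-4.1-rc2   # assumed tag matching the f34 package baseline
git bisect works master          # latest upstream, reported to shut down cleanly
# At each step: build, install, regenerate the initramfs, reboot, then mark
# the result with `git bisect works` or `git bisect hangs`.
make && make install && dracut -f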
After further investigation, I believe the issue stems from this update to systemd:

https://github.com/systemd/systemd/commit/0b220a5f2a31844eaa1f5426bab02d41d54f471c

On shutdown/reboot, systemd-shutdown stops /dev/md127 but not /dev/md126. This causes the mdadm hang in md-shutdown.sh (dracut).

To test, I installed systemd-shutdown (and libsystemd-shared-246.so) from systemd-246.6-3 on my system, and md-shutdown.sh works as expected. The system is able to reboot and power off successfully.
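As a quick sanity check on which behaviour a given systemd-shutdown binary has (a rough heuristic only; the exact log strings may differ between versions):

# The MD detach logic first appeared in systemd v247:
rpm -q systemd
systemctl --version

# Look for MD-related shutdown messages in the binary; builds containing the
# commit above carry strings about stopping/detaching MD devices, while v246
# and earlier should not:
strings /usr/lib/systemd/systemd-shutdown | grep -i 'md devices'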
I also built an mdadm package against the git master branch and it made no difference.
Hi Chad,

Thanks very much for this information. But in my testing, the bug was fixed just by updating mdadm to the latest upstream version; I didn't change anything in systemd. Any thoughts about this?
(In reply to Chad from comment #15)
> After further investigation, I believe the issue stems from this update to
> systemd:
> 
> https://github.com/systemd/systemd/commit/0b220a5f2a31844eaa1f5426bab02d41d54f471c
> 
> On shutdown/reboot, systemd-shutdown stops /dev/md127 but not /dev/md126.
> This causes the mdadm hang in md-shutdown.sh (dracut).
> 
> To test, I installed systemd-shutdown (and libsystemd-shared-246.so) from
> systemd-246.6-3 on my system, and md-shutdown.sh works as expected.

I want to confirm one thing: systemd-246.6-3 does not have patch 0b220a5f2a31844eaa1f5426bab02d41d54f471c. It's an older version, and the bug goes away with that older version. Is that right?
(In reply to XiaoNi from comment #17)
> Hi Chad,
> 
> Thanks very much for this information. But in my testing, the bug was fixed
> just by updating mdadm to the latest upstream version; I didn't change
> anything in systemd. Any thoughts about this?

As mentioned earlier, I built an mdadm package against the git master branch and it made no difference.
(In reply to XiaoNi from comment #18)
> (In reply to Chad from comment #15)
> > After further investigation, I believe the issue stems from this update to
> > systemd:
> > 
> > https://github.com/systemd/systemd/commit/0b220a5f2a31844eaa1f5426bab02d41d54f471c
> > 
> > On shutdown/reboot, systemd-shutdown stops /dev/md127 but not /dev/md126.
> > This causes the mdadm hang in md-shutdown.sh (dracut).
> > 
> > To test, I installed systemd-shutdown (and libsystemd-shared-246.so) from
> > systemd-246.6-3 on my system, and md-shutdown.sh works as expected.
> 
> I want to confirm one thing: systemd-246.6-3 does not have patch
> 0b220a5f2a31844eaa1f5426bab02d41d54f471c. It's an older version, and the bug
> goes away with that older version. Is that right?

v247 was the first release to include the "Try stopping MD RAID devices in shutdown too" patch. That's why I chose an earlier version of systemd to test against: in v246, systemd-shutdown does not yet try to handle MD devices.
(In reply to Chad from comment #19)
> 
> As mentioned earlier, I built an mdadm package against the git master branch
> and it made no difference.

I didn't build a package. This is what I did:

1. git clone git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
2. make && make install
3. reboot (it gets stuck this time)
4. manually reboot (press the reboot button on the machine and wait for it to boot)
5. reboot (it boots successfully this time)

In step 3, mdmonitor is still running from the old version, so the first reboot gets stuck. After step 4, mdmonitor is running from the new version, and the reboot in step 5 succeeds.

Do you reboot twice?
(In reply to XiaoNi from comment #21)
> (In reply to Chad from comment #19)
> > 
> > As mentioned earlier, I built an mdadm package against the git master
> > branch and it made no difference.
> 
> I didn't build a package. This is what I did:
> 
> 1. git clone git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
> 2. make && make install
> 3. reboot (it gets stuck this time)
> 4. manually reboot (press the reboot button on the machine and wait for it
>    to boot)
> 5. reboot (it boots successfully this time)
> 
> In step 3, mdmonitor is still running from the old version, so the first
> reboot gets stuck. After step 4, mdmonitor is running from the new version,
> and the reboot in step 5 succeeds.
> 
> Do you reboot twice?

I repeated your steps exactly.

$ mdadm -V
mdadm - v4.1-140-g1f5d54a0 - 2021-05-26

(same version as my package)

Updated the initramfs and rebooted multiple times. The server still hangs in the same place (mdadm --wait-clean --scan) every time.
A simple hack to systemd-shutdown (in the latest systemd-248.4-1 package) to prevent MD detachment resolves the issue on my server.

--- ./src/shutdown/shutdown.c	2021-07-15 12:21:19.463658519 -0500
+++ ../systemd-stable-248.4.orig/src/shutdown/shutdown.c	2021-07-12 06:38:53.000000000 -0500
@@ -402,8 +402,7 @@
         need_swapoff = !in_container;
         need_loop_detach = !in_container;
         need_dm_detach = !in_container;
-        //need_md_detach = !in_container;
-        need_md_detach = false;
+        need_md_detach = !in_container;
         can_initrd = !in_container && !in_initrd() && access("/run/initramfs/shutdown", X_OK) == 0;
 
         /* Unmount all mountpoints, swaps, and loopback devices */
Sorry, the diff in my previous comment is backwards. Here's the correct version:

--- ../systemd-stable-248.4.orig/src/shutdown/shutdown.c	2021-07-12 06:38:53.000000000 -0500
+++ ./src/shutdown/shutdown.c	2021-07-15 12:21:19.463658519 -0500
@@ -402,7 +402,8 @@
         need_swapoff = !in_container;
         need_loop_detach = !in_container;
         need_dm_detach = !in_container;
-        need_md_detach = !in_container;
+        //need_md_detach = !in_container;
+        need_md_detach = false;
         can_initrd = !in_container && !in_initrd() && access("/run/initramfs/shutdown", X_OK) == 0;
 
         /* Unmount all mountpoints, swaps, and loopback devices */
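In case someone else wants to repeat that test: the patched binary only needs to replace /usr/lib/systemd/systemd-shutdown on the root filesystem, since systemd-shutdown is exec'd from the real root before dracut's shutdown initramfs takes over. A very rough sketch, assuming a meson build directory called "build" and a source tree matching the installed systemd version (the binary links against the matching libsystemd-shared-<version>.so, as noted in comment 15):

# Build just the shutdown binary from the patched tree:
meson setup build
ninja -C build systemd-shutdown

# Keep a copy of the packaged binary, then drop in the patched one:
cp -a /usr/lib/systemd/systemd-shutdown /usr/lib/systemd/systemd-shutdown.rpm-orig
install -m 0755 build/systemd-shutdown /usr/lib/systemd/systemd-shutdown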
(In reply to Chad from comment #22)
> (In reply to XiaoNi from comment #21)
> > (In reply to Chad from comment #19)
> > > 
> > > As mentioned earlier, I built an mdadm package against the git master
> > > branch and it made no difference.
> > 
> > I didn't build a package. This is what I did:
> > 
> > 1. git clone git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
> > 2. make && make install
> > 3. reboot (it gets stuck this time)
> > 4. manually reboot (press the reboot button on the machine and wait for it
> >    to boot)
> > 5. reboot (it boots successfully this time)
> > 
> > In step 3, mdmonitor is still running from the old version, so the first
> > reboot gets stuck. After step 4, mdmonitor is running from the new version,
> > and the reboot in step 5 succeeds.
> > 
> > Do you reboot twice?
> 
> I repeated your steps exactly.
> 
> $ mdadm -V
> mdadm - v4.1-140-g1f5d54a0 - 2021-05-26
> 
> (same version as my package)
> 
> Updated the initramfs and rebooted multiple times. The server still hangs in
> the same place (mdadm --wait-clean --scan) every time.

I see. I only ran `make && make install` and didn't update the initramfs. What commands do you use to update the initramfs? And how is the reboot related to the initramfs?
Since Chad has found some hints pointing at systemd, I'm moving this to the systemd component first. Feel free to move it back to me if I'm wrong.
(In reply to XiaoNi from comment #25)
> (In reply to Chad from comment #22)
> > (In reply to XiaoNi from comment #21)
> > > (In reply to Chad from comment #19)
> > > > 
> > > > As mentioned earlier, I built an mdadm package against the git master
> > > > branch and it made no difference.
> > > 
> > > I didn't build a package. This is what I did:
> > > 
> > > 1. git clone git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
> > > 2. make && make install
> > > 3. reboot (it gets stuck this time)
> > > 4. manually reboot (press the reboot button on the machine and wait for
> > >    it to boot)
> > > 5. reboot (it boots successfully this time)
> > > 
> > > In step 3, mdmonitor is still running from the old version, so the first
> > > reboot gets stuck. After step 4, mdmonitor is running from the new
> > > version, and the reboot in step 5 succeeds.
> > > 
> > > Do you reboot twice?
> > 
> > I repeated your steps exactly.
> > 
> > $ mdadm -V
> > mdadm - v4.1-140-g1f5d54a0 - 2021-05-26
> > 
> > (same version as my package)
> > 
> > Updated the initramfs and rebooted multiple times. The server still hangs
> > in the same place (mdadm --wait-clean --scan) every time.
> 
> I see. I only ran `make && make install` and didn't update the initramfs.
> What commands do you use to update the initramfs?

dracut

> And how is the reboot related to the initramfs?

mdadm/mdmon are installed into the initramfs (see 90mdraid/module-setup.sh), so the binaries used during early boot and the dracut shutdown path come from the initramfs.
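For reference, regenerating the initramfs for the running kernel and confirming which mdadm/mdmon binaries ended up inside it looks roughly like this:

dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
lsinitrd /boot/initramfs-$(uname -r).img | grep -E 'mdadm|mdmon'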
(In reply to Chad from comment #22)
> > 
> > I didn't build a package. This is what I did:
> > 
> > 1. git clone git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
> > 2. make && make install
> > 3. reboot (it gets stuck this time)
> > 4. manually reboot (press the reboot button on the machine and wait for it
> >    to boot)
> > 5. reboot (it boots successfully this time)
> > 
> > In step 3, mdmonitor is still running from the old version, so the first
> > reboot gets stuck. After step 4, mdmonitor is running from the new version,
> > and the reboot in step 5 succeeds.
> > 
> > Do you reboot twice?
> 
> I repeated your steps exactly.

Hi Chad,

Does that mean you installed mdadm locally rather than updating it in the initramfs? In my test, I didn't update the initramfs; I only installed the latest upstream on the machine.
*** Bug 1956133 has been marked as a duplicate of this bug. ***
This message is a reminder that Fedora Linux 34 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '34'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 34 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07. Fedora Linux 34 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. Thank you for reporting this bug and we are sorry it could not be fixed.