Bug 834245 - Shutdown does not complete. hangs unmounting oldroot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 17
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 853467 887562
Depends On:
Blocks:
Reported: 2012-06-21 10:11 UTC by Sergio Pascual
Modified: 2021-07-28 00:26 UTC
CC List: 27 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-20 21:38:27 UTC
Type: Bug
Embargoed:


Attachments
Dmesg-txenoo (71.03 KB, text/plain)
2012-12-03 18:19 UTC, Chema Casanova

Description Sergio Pascual 2012-06-21 10:11:40 UTC
Description of problem:

My system hangs when powering off. Log messages are roughly as follows:

Sending SIGTERM
Sending SIGKILL
Unmounting file systems
/sys/kernel/config
/sys/kernel/debug
/dev/mqueue
(more mounts follow)
Disabling swaps
Disabling loop devices
Detaching DM devices
    one left
Cannot finalize, trying to kill
Detaching DM devices
    one left
Cannot finalize, giving up
Successfully changed into root pivot
Umounted /oldroot/proc
Umounted /oldroot/dev/pts
Umounted /oldroot/run
(more mounts here)
Umounted /oldroot/sys

And then it hangs.

Version-Release number of selected component (if applicable):
kernel-3.4.2-4.fc17.x86_64

How reproducible:
Always

Additional info:
The system does reboot cleanly with kernels kernel-3.3.7-1.fc17.x86_64 and before
The system does not reboot with kernel-3.4.2 and kernel-3.4.0

My system has Intel BIOS Raid. It suffered bug #752593 (shutdown does not complete with Intel BIOS RAID)

Could this be the same bug in disguise?

Comment 1 beejay.uk 2012-06-22 05:30:50 UTC
Exactly the same problem here with an IMSM Raid 10 array.

Comment 2 Sergio Pascual 2012-06-22 13:49:33 UTC
kernel-3.4.3-1.fc17.x86_64 does not work either

Comment 3 Sergio Pascual 2012-07-03 13:41:28 UTC
Kernel 3.4.4-3.fc17.x86_64 does not work

Comment 4 dan 2012-07-07 23:23:26 UTC
Confirm same issue as Mr. Pascual.

Kernel 3.4.4-3.fc17.x86_64

System also is Intel BIOS Raid.

Comment 5 Taylor Gunnoe 2012-07-18 14:02:27 UTC
I can also confirm the issue using Intel BIOS raid 10

Comment 6 bender 2012-07-25 07:55:14 UTC
Confirming issue with 3.4.6-2.fc17.x86_64 and Intel BIOS raid 1

Comment 7 Thomas Clark 2012-08-10 19:07:47 UTC
I have exactly the same problem, using Intel BIOS raid and kernel 3.4.7-1.fc16.i686.PAE. Like Sergio Pascual (comment 0), I have wondered if this might be a disguised return of bug 752593, which I also experienced.

Comment 8 Jari Turkia 2012-08-19 19:47:54 UTC
(In reply to comment #0)
> Description of problem:
...
> Could this be the same bug in disguise?

This issue happens in F17 with kernel 3.5.2-1.fc17.x86_64.

My system has ICH10R-chip on motherboard and there is RAID1 volume configured.

# mdadm -D /dev/md127
/dev/md127:
        Version : imsm
     Raid Level : container
  Total Devices : 2
Working Devices : 2
  Member Arrays : /dev/md/Mirror0

I typically shut down the system with Alt-SysRq-o, since it is well stuck there. A fix would be appreciated.

Comment 9 Sergio Pascual 2012-08-26 14:21:22 UTC
Raising severity to High, perhaps we can get some attention from kernel maintainers

Comment 10 Jes Sorensen 2012-09-10 09:12:46 UTC
What mdadm version is installed on the systems where the reboots are failing?

When did the problem occur? IMSM RAID has a close tie between kernel _and_
mdadm.

Jes

Comment 11 Sergio Pascual 2012-09-10 09:51:47 UTC
I have mdadm-3.2.5-4.fc17.x86_64. This started happening with kernels >= 3.4, just a few kernel updates after Fedora 17 was released. 

I don't recall which version of mdadm I had when it worked.

Comment 12 Jes Sorensen 2012-09-10 11:33:16 UTC
Hmmm, it really shouldn't be this, but did you update via yum or manually?
If so, did you remember to run 'dracut -f' afterwards?

Is the IMSM array your root device?

Any chance you can try and downgrade the kernels to a < 3.4 F17 one and see
if it still shows the problem?

Cheers,
Jes

Comment 13 Sergio Pascual 2012-09-10 13:51:14 UTC
I update via yum, and the IMSM array is my root device. I have installed kernel-3.3.4-5.fc17.x86_64; in the next comment I'll tell you whether it worked.

Comment 14 Jes Sorensen 2012-09-10 13:55:25 UTC
Ok, after upgrading via yum, please try to run 'dracut -f' before rebooting
into the new system.

Once the system is booted, could you try running
'ps aux | grep dmon' 
'ps aux | grep mdadm'

Thanks,
Jes

Comment 15 Sergio Pascual 2012-09-10 14:04:10 UTC
Too late for dracut -f, I have rebooted already. Was it important?

The system *does* reboot with kernel-3.3.4-5.fc17.x86_64

These are the outputs of the commands

$ ps aux | grep dmon
root       405  0.0  0.0  15004 10904 ?        SLsl 15:57   0:00 @dmon --offroot md127
root       647  0.0  0.0  14972 10872 ?        SLsl 15:57   0:00 /sbin/mdmon --takeover md127

$ ps aux | grep mdadm
root      1173  0.0  0.0   4908   492 ?        Ss   15:57   0:00 /sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid

Perhaps you need this also:

$ cat /proc/mdstat 
Personalities : [raid1] 
md126 : active raid1 sda[1] sdb[0]
      976759808 blocks super external:/md127/0 [2/2] [UU]
      
md127 : inactive sdb[1](S) sda[0](S)
      5288 blocks super external:imsm
       
unused devices: <none>
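
For readers comparing their own setup against this output, the two lines that matter are the ones marked "super external:": the container (md127 here) reports external:imsm, and the member array (md126) points back at its container. A minimal sketch for pulling those pairs out of /proc/mdstat follows. The function name external_md_arrays is made up for illustration, and the parsing assumes the layout shown above.

```shell
# Print "<device> <external metadata reference>" for each md device
# that uses external metadata. Reads /proc/mdstat-style text on stdin.
# (external_md_arrays is a hypothetical helper name, not part of mdadm.)
external_md_arrays() {
    awk '
        # Remember the device name from lines such as "md126 : active raid1 ..."
        /^md[0-9]+ *:/ { dev = $1 }
        # The detail line carries "super external:<something>"
        /super external:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^external:/) {
                    sub(/^external:/, "", $i)
                    print dev, $i
                }
        }
    '
}

# Assumed usage on a live system:
#   external_md_arrays < /proc/mdstat
```

With the mdstat output above, this should list md126 as a member of /md127/0 and md127 as the imsm container.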

Comment 16 Jes Sorensen 2012-09-11 07:46:26 UTC
Sergio,

Thanks, you have the @dmon there, so that means the initramfs is launching
mdmon correctly.

Given that this changes based on the kernel version, it sounds like we need
to look for the problem there or in the init scripts, rather than mdadm.

Cheers,
Jes

Comment 17 Aram Agajanian 2012-09-12 15:32:09 UTC
*** Bug 853467 has been marked as a duplicate of this bug. ***

Comment 18 Olivier 2012-09-23 12:04:20 UTC
Same problem here. power-off is okay with F17 fresh install (Kernel 3.3.4). But when updated to latest version via 'Software Update' (Kernel 3.5.4): power-off or restart hang at 'Umounted /oldroot/sys'

With the updated system, booting with the old 3.3.4 kernel solves the problem.

Gigabyte Z68P-DS3 rev1.0 F9 - Intel i5 - RAID 1

Comment 19 Aram Agajanian 2012-10-15 02:47:45 UTC
This problem seems to be fixed in kernel-3.6.1-1.fc17.x86_64.

Comment 20 Olivier 2012-10-15 07:58:09 UTC
Interesting - I've been using the 3.3.4 kernel since the occurrence of this bug. When I try to switch to 3.6.1-1, I have the following error at the 'password' prompt (the RAID1 drives are encrypted):

udevd [241] inotify_add_watch /dev/sda1, 10 : failed, no such file or directory

This looks like a new bug, preventing me from confirming Aram's comment 19.

Comment 21 Sergio Pascual 2012-10-15 08:47:03 UTC
In my case, it works partially with 3.6.1-1. I can reboot the system, but I cannot power off the system. 

On power off, the system hangs at the Fedora logo. It doesn't respond even to SysRq.

Comment 22 Jes Sorensen 2012-10-15 09:03:18 UTC
(In reply to comment #20)
> Interesting - I've been using the 3.3.4 kernel since the occurrence of this
> bug. When I try to switch to 3.6.1-1, I have the following error at the
> 'password' prompt (the RAID1 drives are encrypted):
> 
> udevd [241] inotify_add_watch /dev/sda1, 10 : failed, no such file or
> directory
> 
> This looks like a new bug, preventing me from confirming Aram's comment 19.

This is a different bug, it sounds like something goes wrong with the boot
scripts.

Jes

Comment 23 Jes Sorensen 2012-10-15 09:04:24 UTC
(In reply to comment #21)
> In my case, it works partially with 3.6.1-1. I can reboot the system, but I
> cannot power off the system. 
> 
> On power off, the system hangs at the Fedora logo. It doesn't respond even
> to SysRq.

Please try to remove 'rhgb quiet' from the kernel boot command line and
see where it hangs.

Comment 24 Olivier 2012-10-15 09:33:11 UTC
(In reply to comment #22)
> (In reply to comment #20)
> > Interesting - I've been using the 3.3.4 kernel since the occurrence of this
> > bug. When I try to switch to 3.6.1-1, I have the following error at the
> > 'password' prompt (the RAID1 drives are encrypted):
> > 
> > udevd [241] inotify_add_watch /dev/sda1, 10 : failed, no such file or
> > directory
> > 
> > This looks like a new bug, preventing me from confirming Aram's comment 19.
> 
> This is a different bug, it sounds like something goes wrong with the boot
> scripts.
> 
> Jes

Thanks Jes - FYI, I've created Bug #866395

Comment 25 Sergio Pascual 2012-10-18 08:58:18 UTC
I have tested several times and it seems to work correctly. My system now both reboots and powers off. I don't know why it didn't power off on Monday.

Comment 26 dan 2012-11-10 12:26:36 UTC
Issue no longer occurs under kernel 3.6.6-1.fc17.x86_64.

Comment 27 Sergio Pascual 2012-12-03 09:49:03 UTC
I'm hitting this problem again. My system neither shuts down nor reboots with kernel-3.6.8-2.fc17.x86_64. The reboot process hangs after:

Sending SIGTERM
Sending SIGKILL
Unmounting file systems
/sys/kernel/config
/sys/kernel/debug
/dev/mqueue
/dev/hugepages

Comment 28 Jes Sorensen 2012-12-03 10:02:54 UTC
When providing updated version numbers, please make sure to include:

mdadm
dracut
kernel

Thanks,
Jes

Comment 29 Sergio Pascual 2012-12-03 10:24:11 UTC
Here it is

mdadm-3.2.6-1.fc17.x86_64
dracut-018-105.git20120927.fc17.noarch
kernel-3.6.8-2.fc17.x86_64

Comment 30 Chema Casanova 2012-12-03 17:46:58 UTC
I have the same problem in F18 Beta.

mdadm-3.2.6-1.fc18.x86_64
dracut-024-10.git20121121.fc18.x86_64
kernel-3.6.7-5.fc18.x86_64

It was impossible for me to reboot after an update with fedup. In the end I did a clean install from the F18 DVD, but the problem remained, so I always have to do a hard reset: neither reboot nor poweroff gets any further than unmounting devices.

Comment 31 Jes Sorensen 2012-12-03 17:55:31 UTC
Chema,

Any chance you can provide us the output of /proc/mdstat as well as 'dmesg' ?

Thanks,
Jes

Comment 32 Thomas Clark 2012-12-03 17:59:23 UTC
Just resurfaced again for me also, with the latest kernel.  Currently running:

mdadm-3.2.6-1.fc16.i686
dracut-018-60.git20120927.fc16.noarch
kernel-3.6.7-4.fc16.i686.PAE

Comment 33 Chema Casanova 2012-12-03 18:19:52 UTC
Created attachment 656878
Dmesg-txenoo

Comment 34 Chema Casanova 2012-12-03 18:22:11 UTC
Jes,

I've just attached the dmesg, and here is the output of /proc/mdstat:

Personalities : [raid1] 
md126 : active raid1 sdb[1] sdc[0]
      1953511424 blocks super external:/md127/0 [2/2] [UU]
      
md127 : inactive sdc[1](S) sdb[0](S)
      6056 blocks super external:imsm
       
unused devices: <none>

The following volumes, plus the swap, are mounted from the RAID:

/dev/md126p2 on / type ext4 (rw,relatime,data=ordered)
/dev/md126p1 on /boot type ext4 (rw,relatime,data=ordered)
/dev/md126p3 on /home type ext4 (rw,relatime,data=ordered)

Chema

Comment 35 Jes Sorensen 2012-12-04 15:57:24 UTC
Interesting, so basically the problem seems to be gone in kernels
between 3.6.1 and 3.6.6 but back again with kernel 3.6.7+

Harald, any idea what to look for next?

Comment 36 Sergio Pascual 2012-12-04 16:51:51 UTC
I don't know if it's related, but a few weeks ago my md126 device changed its name to md125. In other aspects, my raid configuration is equivalent to Chema's (Intel BIOS Raid)

Comment 37 Jes Sorensen 2012-12-04 16:55:52 UTC
Sergio,

I don't think it is related. In principle the names should be assigned
automatically, unless you have explicitly assigned a name in your
/etc/mdadm.conf

Cheers,
Jes

Comment 38 Sergio Pascual 2012-12-04 17:02:08 UTC
I have downgraded mdadm to mdadm-3.2.3-6.fc17.x86_64 (just to be sure) and it still hangs, but later. The reboot process continues until the word "reboot" appears on the console and then hangs.

Comment 39 Jes Sorensen 2012-12-05 09:07:35 UTC
3.2.3-6 should still have the --offroot support, as this was added in 3.2.3-4,
so it shouldn't be that causing it.

3.2.3-3 and older are expected to hang, 3.2.3-4 and later shouldn't :(
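
To check an installed package string against the 3.2.3-4 threshold mentioned above, something like the following sketch could be used. It is only an approximation: mdadm_has_offroot is a hypothetical helper name, the comparison relies on GNU `sort -V` rather than RPM's own version ordering, and it assumes Fedora-style names with a `.fcNN` dist tag.

```shell
# Succeed if the given mdadm version-release is 3.2.3-4 or later,
# i.e. a build that should include --offroot according to the comment above.
# (Hypothetical helper; uses GNU sort -V, not rpm's comparison rules.)
mdadm_has_offroot() {
    threshold="3.2.3-4"
    v=${1#mdadm-}     # drop the package name, if present
    v=${v%%.fc*}      # drop the dist tag and architecture, e.g. ".fc17.x86_64"
    # The smaller of the two versions sorts first; the build is new enough
    # exactly when that smaller one is the threshold itself.
    lowest=$(printf '%s\n' "$threshold" "$v" | sort -V | head -n 1)
    [ "$lowest" = "$threshold" ]
}

# Assumed usage:
#   mdadm_has_offroot "$(rpm -q mdadm)" && echo "--offroot expected"
```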

Comment 40 Taylor Gunnoe 2012-12-06 18:49:37 UTC
I am also experiencing this problem again, but now I can't even boot.

kernel 3.6.7+, freezes on "Unmounting file systems" on boot.

Comment 41 Sergio Pascual 2012-12-11 15:51:34 UTC
Last week I installed a new F17 x86_64 system with Intel BIOS RAID, using a Live CD. After the install, the system was able to reboot (with kernel-3.3.4-5).

Then I updated the system. After the update, the system could not reboot anymore, even with the 3.3.4 kernel. So the kernel is not the (only?) culprit.

Comment 42 Jes Sorensen 2012-12-12 06:57:58 UTC
Sergio,

If possible, could you try and downgrade dracut to the version from the install
as well? Please run 'dracut -f' after upgrading/downgrading it.

Cheers,
Jes

Comment 43 Sergio Pascual 2012-12-12 10:08:43 UTC
I have downgraded to dracut-018-35.git20120510.fc17.noarch

and I have run dracut -f afterwards. The system freezes on "Unmounting file systems". Perhaps it is a systemd bug? How safe is it to downgrade systemd?

Comment 44 Michal Schmidt 2012-12-12 14:38:07 UTC
Downgrading systemd is reasonably safe.

Comment 45 Michal Schmidt 2012-12-12 15:05:56 UTC
If you add "rd.break=pre-shutdown" to the kernel command line, dracut should give you a shell before it runs its unmounting loop. In the shell it would be helpful to explore what filesystems are still mounted and what processes are running.
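
Once in that shell, one way to see what is still mounted under the old root is to filter /proc/mounts. The sketch below is an illustration only: oldroot_mounts is a made-up helper name, and it assumes the old root is at /oldroot, as the "Umounted /oldroot/..." messages in the description suggest.

```shell
# List mount points at or below /oldroot, deepest paths first,
# which is the order they would have to be unmounted in.
# Reads /proc/mounts-style lines on stdin. (Hypothetical helper.)
oldroot_mounts() {
    awk '$2 == "/oldroot" || index($2, "/oldroot/") == 1 { print $2 }' |
        LC_ALL=C sort -r
}

# Assumed usage in the pre-shutdown shell:
#   oldroot_mounts < /proc/mounts
```

Anything that remains in this list after the unmount loop has run is a candidate for whatever is keeping the old root busy.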

Comment 46 Sergio Pascual 2012-12-13 18:38:40 UTC
It's weird, but with "rd.break=pre-shutdown" my kernel stops on boot, before printing "Fedora 17...." in blue on the text console. On shutdown nothing different happens (apart from hanging on "Unmounting file systems", of course).

Kernel 3.6.9-2.fc17.x86_64

Comment 47 Jes Sorensen 2012-12-18 11:28:22 UTC
Running tests with Fedora 18 TC3, I am seeing the same problem there after
fresh installs. Basically the install goes fine, but on reboot it goes:

Stopping Software RAID monitor takeover
Unmounting /

<hang>

Looks like the latest systemd or dracut is doing something wrong or ignoring
the --offroot argument.

Jes

Comment 48 Harald Hoyer 2012-12-20 13:32:51 UTC
If stopping "mdmon" is the culprit, then it should not have been doing the takeover in the real root in the first place, but leave the mdmon from the initramfs running (which should have set "@" as the first char in argv[0]).
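
The "@" convention can be checked from a running system: in comment 15 the initramfs instance shows up in ps as "@dmon --offroot md127", while the takeover instance appears as /sbin/mdmon. A small sketch that picks out such processes is given below. The helper name initramfs_survivors is invented here, and the filter only looks at the first character of the command name.

```shell
# Print the PID of every process whose command name starts with '@'.
# Reads lines of "PID ARGS..." on stdin. (Hypothetical helper name.)
initramfs_survivors() {
    awk '$2 ~ /^@/ { print $1 }'
}

# Assumed usage on a live system:
#   ps -eo pid=,args= | initramfs_survivors
```

If this prints nothing while the root filesystem is on an IMSM array, the mdmon started in the initramfs is probably no longer the one in charge.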

Comment 49 Diego Rossi Mafioletti 2012-12-20 15:14:21 UTC
I do not know if it's relevant to the current subject, but...

On my Fedora 17 x86_64, the reboot completes approximately 10 minutes after the "Unmounting file systems." message, but the shutdown hangs indefinitely. (Some watchdog?)

It is a clean installation of Fedora 17 x86_64 from DVD, plus "yum update -y".


dracut-018-105.git20120927.fc17.noarch
mdadm-3.2.6-1.fc17.x86_64
kernel-3.6.10-2.fc17.x86_64

Personalities : [raid1]
md126 : active raid1 sda[1] sdb[0]
      488383488 blocks super external:/md127/0 [2/2] [UU]
      
md127 : inactive sdb[1](S) sda[0](S)
      5928 blocks super external:imsm
       
unused devices: <none>


Diego

Comment 50 Diego Rossi Mafioletti 2012-12-27 10:56:07 UTC
Hmm... after a "yum downgrade mdadm" plus "dracut -f", my system reboots and powers off normally again!
Apparently, mdadm was the culprit.


Current versions:
mdadm-3.2.3-6.fc17.x86_64
dracut-018-105.git20120927.fc17.noarch
kernel-3.6.10-2.fc17.x86_64

Comment 51 Fabrício Godoy 2013-01-02 01:51:18 UTC
"yum downgrade mdadm" does the trick!

mdadm-3.2.3-6.fc17.x86_64
dracut-018-105.git20120927.fc17.noarch
kernel-3.6.10-2.fc17.x86_64

Comment 52 SpuyMore 2013-01-02 21:18:37 UTC
Confirming the same problem in FC18 with any upgrades up to:

dracut-024-16.git20121220.fc18.x86_64
kernel-3.6.11-3.fc18.x86_64
mdadm-3.2.6-7.fc18.x86_64

I think my bug report #879327 can be marked as a duplicate of this.

Comment 53 Jes Sorensen 2013-01-03 13:29:41 UTC
*** Bug 887562 has been marked as a duplicate of this bug. ***

Comment 54 Harald Hoyer 2013-01-03 13:51:16 UTC
(In reply to comment #51)
> "yum downgrade mdadm" does the trick!

reassigning back to mdadm

Comment 55 Fedora Update System 2013-01-04 17:03:35 UTC
mdadm-3.2.6-8.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/mdadm-3.2.6-8.fc17

Comment 56 Jes Sorensen 2013-01-04 17:17:37 UTC
Ok I think I finally understood the problem here. Basically when we rolled out
the --offroot support there was a bug in the mdadm upstream code which meant
that mdmon processes launched with --offroot would not be taken over in case
of a follow-on 'mdmon --takeover' launch from the mdmonitor-takeover.service.
This was fixed between 3.2.5 and 3.2.6 upstream, which is why the problem is
showing up now.

Basically mdmonitor-takeover.service is now obsolete since we roll back to the
initrd during shutdown, and we rely on the mdmon launched from there to handle
the metadata writeout before rebooting.

I have pushed mdadm-3.2.6-8 into updates-testing which should fix this problem
for Fedora 17+

Please give it a spin and report back.

Thanks,
Jes

Comment 57 SpuyMore 2013-01-04 20:51:52 UTC
Hi Jes,

I have updated but it didn't work. The system now does not boot completely. It hangs right after "started initialized storage subsystems (RAID, LVM, etc.)" and "started monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling". Help! How can I boot the system now? I must add that I also installed the latest release version of dracut (yesterday's release). After upgrading mdadm and dracut I ran dracut -f. I suspect that mdmon is not running.

Thanks, Dennis

Comment 58 SpuyMore 2013-01-04 20:57:08 UTC
Sorry, disregard the above comment. It was meant for bug #879327

Comment 59 Fedora Update System 2013-01-05 06:57:20 UTC
Package mdadm-3.2.6-8.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing mdadm-3.2.6-8.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-0275/mdadm-3.2.6-8.fc17
then log in and leave karma (feedback).

Comment 60 Thomas Clark 2013-01-05 15:42:18 UTC
Any chance this fix could make it into Fedora 16?

Comment 61 dan 2013-01-05 17:01:51 UTC
mdadm-3.2.6-8.fc17 installed.  I can now reboot successfully.

Comment 62 Jes Sorensen 2013-01-07 15:50:34 UTC
(In reply to comment #60)
> Any chance this fix could make it into Fedora 16?

Thomas,

I have a build for F16 which also fixes the dangling symlink to
mdmonitor-takeover.service which reappeared in 3.2.6-8

I don't have a Fedora 16 system ready for testing so if you want to test this 
build and report back, that would be useful. Just be sure to install it,
run dracut -f, and try (the first reboot after the dracut run will still hang).

http://alt.fedoraproject.org/pub/alt/stage/18-RC1/Fedora/x86_64/

Note this is at your risk, but I hope it works.

Jes
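
A quick way to look for a dangling symlink like the one mentioned above is to test each link in a directory for a missing target. The following is a rough sketch: dangling_symlinks is a hypothetical helper, and the directory to pass in is an assumption, since the comment does not say where the link lives.

```shell
# Print every symlink directly inside the given directory whose target
# does not exist. (Hypothetical helper; not part of mdadm or systemd.)
dangling_symlinks() {
    for link in "$1"/*; do
        # -L: is a symlink; ! -e: following the link finds nothing
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Assumed usage (the directory is a guess, adjust as needed):
#   dangling_symlinks /etc/systemd/system/multi-user.target.wants
```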

Comment 63 Sergio Pascual 2013-01-09 11:46:08 UTC
We are near F18 release and I wonder if this problem will appear in F18 install media

Comment 64 Jes Sorensen 2013-01-09 11:59:25 UTC
Sergio,

I think we're ok for the installation itself, but post installation mdadm
will need to be updated to 3.2.6-11 at least.

Note there is still a problem if a user has two BIOS raid arrays, see
BZ#879327

Jes

Comment 65 Thomas Clark 2013-01-09 12:57:34 UTC
Thanks, Jes!  I would be happy to test with Fedora 16.  However, I can't find anything labeled for Fedora 16 in that link.  Can you point me in the right direction?  I am running Fedora 16 32-bit.

Comment 66 Sergio Pascual 2013-01-09 13:00:26 UTC
(In reply to comment #64)
> Sergio,
> 
> I think we're ok for the installation itself, but post installation mdadm
> will need to be updated to 3.2.6-11 at least.
> 

If I understand correctly, this means that a host installing F18 will experience a hang when rebooting after installing the OS and bootloader?


> Note there is still a problem if a user has two BIOS raid arrays, see
> BZ#879327
> 
> Jes

Comment 67 Jes Sorensen 2013-01-24 10:24:41 UTC
Sergio,

Yes indeed it will - F18 installs will need the fixes from here to be able
to reboot correctly:

https://admin.fedoraproject.org/updates/dracut-024-23.git20130118.fc18,mdadm-3.2.6-12.fc18

Jes

Comment 68 Ian Neal 2013-02-27 00:06:08 UTC
Any news on when this is going to land in F17?

Comment 69 Olivier 2013-03-18 21:43:22 UTC
Any update on these fixes in F17? It's been 'ON_QA' for a while now. Unforeseen troubles? Thanks.

Comment 70 Mark Harfouche 2013-03-18 22:35:44 UTC
It has been ok for me. F17x64.

Comment 71 Fedora Update System 2013-03-20 21:38:30 UTC
mdadm-3.2.6-8.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 72 Olivier 2013-03-21 15:36:00 UTC
oldroot now unmounts okay.

but then dracut says 'waiting mraid devices to be clean' and hangs.

I suspect this is because my raid array is in 'verify' mode, since I had to manually reset & power off the PC many times.

Comment 73 Olivier 2013-03-22 08:24:19 UTC
everything now fine when the raid array is in a normal state.

Thanks.

