Bug 752593 - shutdown does not complete with Intel BIOS RAID - hangs at remounting root read-only
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: systemd
Version: 16
Assigned To: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Duplicates: 751060 752498 753339
Depends On:
Blocks: 771405 785737 785739 887562
Reported: 2011-11-09 17:57 EST by Aram Agajanian
Modified: 2012-12-17 10:23 EST
Doc Type: Bug Fix
Last Closed: 2012-03-12 22:27:30 EDT
Description Aram Agajanian 2011-11-09 17:57:47 EST
Description of problem:
When a shutdown command is entered, the computer starts to shut down but then stops.  The last messages displayed on the console are:

Unmounting file systems.
Unmounted /sys/fs/fuse/connections.
Unmounted /var/lib/nfs/rpc_pipefs.
Unmounted /sys/kernel/debug.
Unmounted /dev/hugepages.
Unmounted /sys/kernel/security.
Unmounted /dev/mqueue.

It seems that the order of the lines listed under "Unmounting file systems" varies from shutdown to shutdown.

Version-Release number of selected component (if applicable):
systemd-36-3.fc16.x86_64

How reproducible:
Happens at every reboot or shutdown.  It happened when the system was instructed to reboot in Anaconda after installation.

Steps to Reproduce:
1.  Boot computer.
2.  Enter a shutdown command

  
Actual results:
Computer does not shut down completely.  It halts and never turns off.

Expected results:
Computer should shut down completely and the power should turn off.

Additional info:
In order to boot the system, I have to enter a kernel parameter of selinux=0.  Otherwise, the OS tries to relabel the SELinux contexts on all of the files.  This won't work because it can't successfully reboot after relabeling.

This bug seems similar to bug #749572.  However, this computer has a CPU with an x86_64 architecture, not PPC.

When installing, I used custom partitioning and I reused existing partitions from earlier Fedora installations.  I did create a new BIOS Boot Partition.

This computer uses Intel BIOS RAID 1.

When installing, I initially selected to install grub2 on the /boot partition.  However, the computer couldn't boot after that.  So, I booted into rescue mode on the installation CD and ran the command "grub2-install /dev/md126".  After that, I was able to boot the OS.
Comment 1 Michal Schmidt 2011-11-11 09:46:33 EST
Could you test with this systemd build?:
http://kojipkgs.fedoraproject.org/scratch/michich/task_3506609/
I added some debug prints in the shutdown path.
Comment 2 Aram Agajanian 2011-11-11 19:39:59 EST
Here is what I see when trying to shut down:

Unmounting file systems.
In mount_points_list_umount()
  m->path = /var/lib/nfs/rpc_pipefs
Unmounted /var/lib/nfs/rpc_pipefs.
  m->path = /dev/hugepages
Unmounted /dev/hugepages.
  m->path = /sys/kernel/security
Unmounted /sys/kernel/security.
  m->path = /dev/mqueue
Unmounted /dev/mqueue.
  m->path = /sys/kernel/debug
Unmounted /sys/kernel/debug.
  m->path = /
out mount_points_list_umount()
in mount_points_list_umount()
  m->path = /
out mount_points_list_umount()
in mount_points_list_umount()
  m->path = /
out mount_points_list_umount()
in mount_points_list_remount_read_only()
  m->path = / ; m->skip_ro = 0
Comment 3 Michal Schmidt 2011-11-12 11:09:04 EST
Thanks. So it is hung inside this syscall to remount root read-only:
mount(NULL, "/", NULL, MS_MGC_VAL|MS_REMOUNT|MS_RDONLY, NULL);
Comment 4 Michal Schmidt 2011-11-14 04:14:08 EST
*** Bug 752498 has been marked as a duplicate of this bug. ***
Comment 5 Michal Schmidt 2011-11-14 04:18:54 EST
Could you describe your partitions and filesystem layout in detail? How many disks, what RAID arrays are there, what filesystem types are used, ...
Comment 6 brian.broussard 2011-11-14 08:46:26 EST
I have several systems, all RAID 1 using the Intel Matrix solution (Dell OptiPlex 960 and 990). We have been running Fedora 11 on the 960 and moved to Fedora 15 on the 990 to support the newer chipset. I noticed that the 2.6.38 kernel works OK, but the 2.6.40 kernel and supporting packages have negative effects on the RAID upon shutdown.

Thus I am looking at Fedora 16 very carefully to ensure we can update our fleet here in the near term.

physical solutions: 

SATA Drive 160 or 250 GB (2 disk)
RAID 1 from the Intel Matrix Storage Manager  ROM V8.5 and ROM V10

built all on the raid
boot            ext4
/               ext4
swap

Let me know if you need a test done.  

I am failing at the same point 
in mount_points_list_remount_read_only()
  m->path = / ; m->skip_ro = 0
Comment 7 Michal Schmidt 2011-11-14 09:24:28 EST
Intel BIOS RAID is the common pattern here.
The problem is that Intel RAID is partially implemented in userspace (mdmon) and we kill all userspace processes before umounting.
There's been a discussion about this on systemd-devel recently. The fix will be to stop doing the mdmon takeover on boot. Instead the mdmon instance started from the initramfs will be kept alive all the time. On shutdown we won't kill it and umount/remount will succeed.
Comment 8 brian.broussard 2011-11-14 09:39:53 EST
Will this keep the RAID from failing, as it currently does in Fedora 15 with the latest yum update?
Comment 9 Aram Agajanian 2011-11-14 10:12:40 EST
I would guess that Comment #8 is referring to Bug #736387.
Comment 10 brian.broussard 2011-11-14 10:30:48 EST
(In reply to comment #9)
> I would guess that Comment #8 is referring to Bug #736387.

Yes I am looking at that right now will test here shortly. Thanks.
Comment 11 Doug Ledford 2011-11-16 12:17:18 EST
(In reply to comment #7)
> Intel BIOS RAID is the common pattern here.
> The problem is that Intel RAID is partially implemented in userspace
> (mdmonitor) and we kill all userspace processes before umounting.
> There's been a discussion about this on systemd-devel recently. The fix will be
> to stop doing the mdmon takeover on boot. Instead the mdmonitor running from
> initramfs will be kept alive all the time. On shutdown we won't kill it and
> umount/remount will succeed.

I'm sorry, but where is this conversation at?  And since the fix is to stop mdmon from doing a takeover, maybe it should have involved the mdadm/mdmon maintainers?
Comment 12 Kay Sievers 2011-11-16 16:08:36 EST
(In reply to comment #11)
> I'm sorry, but where is this conversation at?

  http://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg03659.html
Comment 13 Doug Ledford 2011-11-16 16:57:57 EST
I'm not entirely convinced by that thread that all things will be OK with systemd, but we can try and see.
Comment 14 brian.broussard 2011-11-17 16:05:34 EST
(In reply to comment #12)
> (In reply to comment #11)
> > I'm sorry, but where is this conversation at?
> 
>   http://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg03659.html

This is taken from the given link:

+ That would be the ideal solution. Having the roofs depending on a
+ tools that runs off the rootfs just asks for serious trouble. If all
+ that can't move to the kernel, the initramfs-only solution with the
+ above mentioned constrains, seems like the best option.
+
+ People who like to put their rootfs on a userspace managed raid device
+ just get what they asked for. :)
+
+ Kay


Can you point me to the Fedora documentation on recommended partition configuration when using a BIOS RAID?  Most of my servers and all my robotic controllers have them. I am starting with a clean sheet with Fedora 16 x86_64 and would like to follow an approved methodology, one that is covered by the distro testing plans, to avoid future issues.

thanks, 
brian
Comment 15 brian.broussard 2011-11-17 16:08:51 EST
(In reply to comment #14)
> (In reply to comment #12)
> > (In reply to comment #11)
> > > I'm sorry, but where is this conversation at?
> > 
> >   http://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg03659.html
> 
> This is take from the given link:
> 
> + That would be the ideal solution. Having the roofs depending on a
> + tools that runs off the rootfs just asks for serious trouble. If all
> + that can't move to the kernel, the initramfs-only solution with the
> + above mentioned constrains, seems like the best option.
> +
> + People who like to put their rootfs on a userspace managed raid device
> + just get what they asked for. :)
> +
> + Kay
> 
> 
> Can you point me to the Fedora documentation on recommended partition
> configuration when using a BIOS RAID?  As most of my servers and all my robotic
> controllers have them... I am stating with at clean sheet with Fedora 16 x86_64
> and would like to follow an approved mythology, that is under the distro
> testing plans, to avoid future issues.
> 
> thanks, 
> brian

Specifically the Intel Matrix Family
Comment 16 Peng Huang 2011-11-22 15:22:51 EST
My two HP Z600 workstations have the same problem.
Comment 17 Reartes Guillermo 2011-11-26 17:58:00 EST
I recently had a problem:

"f16 gpt corrupt header, repair it and later the other one gets corrupted"

which I solved by switching the SATA controller from RAID mode (it came configured
that way) to AHCI. While the disk or controller cannot be ruled out 100%, I was surprised that the problem was related to RAID mode. I did not expect this.
Come to think of it, I remember having to power off the laptop because it sometimes did not power off... hmm...

Details, even gpt headers dumps where the corruption shows:

http://forums.fedoraforum.org/showthread.php?t=272868

It may be another controller, but it is an INTEL RAID. Since it is a laptop, there was no volume created, but that was enough to corrupt the disk. 
It worked well with F15 dual boot mbr partition, but got exposed by f16 single boot gpt partitioned disk.
Comment 18 Peter Bieringer 2011-11-27 14:43:04 EST
Also hit by this bug; I also have Intel RAID.
Comment 19 Peter Bieringer 2011-12-18 11:07:02 EST
Perhaps already mentioned: it happens on my system only on "shutdown", not on "reboot". Any hints where I can find the script that runs

Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
Unmounting file systems.
Unmounted ...
....

to add some debug code there?

Or is everything handled by /lib/systemd/systemd-shutdown (a binary program), but in this case, how to debug easily? Is there a debug mode available for this program then? Systemd Debug options in grub2.cfg already tried, did not help here :-(
Comment 20 Peter Bieringer 2011-12-19 14:18:11 EST
After adding some logging code to systemd-shutdown.c, I can confirm comments #2 to #4: the system hangs while trying to remount "/" read-only in

static int mount_points_list_remount_read_only(MountPoint **head, bool *changed) {
....
                /* Trying to remount read-only */
                if (mount(NULL, m->path, NULL, MS_MGC_VAL|MS_REMOUNT|MS_RDONLY, NULL) == 0) {
                        if (changed)
                                *changed = true;

The mount call never returns on "/". This is bad, because it blocks forever. Can someone send me a code snippet to abort this system call after a specific time?
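
One way to bound the wait, offered as a minimal sketch and not as what systemd-shutdown itself does: run the blocking call in a forked child and stop waiting for it after a deadline. The names blocking_fn, remount_read_only, sleep_for and run_with_timeout are hypothetical, and MS_MGC_VAL from the original call is left out on the assumption that current kernels ignore it. A child that is stuck in uninterruptible sleep inside mount() may not die on SIGKILL; in that case the parent simply stops waiting and moves on.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback type: returns 0 on success, non-zero on failure. */
typedef int (*blocking_fn)(void *arg);

/* The remount that hangs in this bug; arg is the mount point path.
 * Assumption: MS_MGC_VAL is omitted because current kernels ignore it. */
int remount_read_only(void *arg) {
        const char *path = arg;
        return mount(NULL, path, NULL, MS_REMOUNT | MS_RDONLY, NULL) == 0 ? 0 : 1;
}

/* Harmless stand-in for trying the helper without touching real mounts:
 * sleeps for *(unsigned int *)arg seconds, then reports success. */
int sleep_for(void *arg) {
        sleep(*(unsigned int *)arg);
        return 0;
}

/* Run fn(arg) in a child process and wait at most timeout_ms for it.
 * Returns 0 if fn succeeded, -EIO if it failed or the child died,
 * -ETIMEDOUT if it was still running at the deadline, -errno otherwise. */
int run_with_timeout(blocking_fn fn, void *arg, long timeout_ms) {
        const struct timespec step = { 0, 50L * 1000 * 1000 }; /* poll every 50 ms */
        pid_t pid;
        long waited;

        assert(fn != NULL);
        pid = fork();
        if (pid < 0)
                return -errno;
        if (pid == 0)
                _exit(fn(arg) == 0 ? 0 : 1); /* child: report result via exit code */

        for (waited = 0; ; waited += 50) {
                int status;
                pid_t r = waitpid(pid, &status, WNOHANG);
                if (r == pid)
                        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -EIO;
                if (r < 0 && errno != EINTR)
                        return -errno;
                if (waited >= timeout_ms)
                        break;
                nanosleep(&step, NULL);
        }

        /* Deadline passed: try to kill the child, but do not block on it. */
        kill(pid, SIGKILL);
        (void)waitpid(pid, NULL, WNOHANG);
        return -ETIMEDOUT;
}
```

A caller could then use run_with_timeout(remount_read_only, (void *)"/", 10000) and treat -ETIMEDOUT as "give up on this mount point". Note that this only keeps the shutdown sequence moving; it does not make the remount itself succeed.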


Searching for the root cause, I booted into runlevel 1 and killed mdmon before executing "reboot", but this leads to an even stranger scenario: the unmount part is no longer reached at all, and the system hangs after

Successfully openend /dev/console for logging

without further notifications :-(


BTW: strangely, "poweroff" is successful in this state (it does not hang in the umount). So there must be some difference between "shutdown" and "poweroff" in systemd when started from runlevel 1.
Comment 21 Doug Ledford 2012-01-03 11:06:16 EST
We need to know what versions of systemd under f15/f16/rawhide implement the mentioned changes (unrolling to the initramfs environment) so we can update mdadm/mdmon to match and include an appropriate Requires: systemd >= for each release to make sure compatibility is maintained.
Comment 22 phisto 2012-01-13 01:29:46 EST
The same bug on FC16 with kernel 3.1.7-1.fc16.x86_64 and systemd-37-3.fc16.x86_64 Motherboard on Intel P55. Intel BIOS RAID1.
Comment 23 Thomas Clark 2012-01-21 14:40:28 EST
Same bug on F16, kernel 3.1.9-1.fc16.i686.PAE, systemd-37-3.fc16
Intel BIOS RAID1
Comment 24 Cédric M. Campos 2012-01-29 04:27:54 EST
This bug is still present on a fully updated fresh install:
- Kernel 3.2.2-1.fc16.x86_64
- ICH9R RAID5 (with 3 disks)
- 1 BIOS boot, 2 Ext4 (/ and /home), unused free space (I intended to install Win7, but it does not support GPT for installation... really...)
Comment 25 Jes Sorensen 2012-01-30 08:57:54 EST
phisto, Thomas, Cedric,

We are aware of the problem, the fix is not ready yet, but we are working
on it.

Thanks,
Jes
Comment 26 Cédric M. Campos 2012-01-30 09:03:13 EST
Thx Jes for the update.

If you need some extra info, don't hesitate to ask.
Comment 27 Dmitri 2012-02-11 01:50:47 EST
I'm not 100% sure if it's relevant, but I saw some comments mentioning that having the root fs be based on the raid is part of the problem. I'm experiencing this bug but my root fs is on a separate, non-RAID SSD that's plain old EXT4. The only things in my RAID array are /home, /var, /tmp.

Also, when it does hang on shutdown, I always see this message a few lines above the point where it hangs:
umount: /var: device is busy.

It looks like /var is never getting unmounted correctly.
Comment 28 Cédric M. Campos 2012-02-11 09:27:48 EST
Dmitri, could you test each case in order to isolate the problem? What happens if you mount only one of them from the RAID and the other two from somewhere non-RAID?
Comment 29 Michal Schmidt 2012-02-14 17:30:14 EST
*** Bug 753339 has been marked as a duplicate of this bug. ***
Comment 30 Cédric M. Campos 2012-02-16 08:39:00 EST
Still present in 3.2.5-3.fc16.x86_64 and 3.2.6-3.fc16.x86_64.
Comment 31 Jes Sorensen 2012-02-16 08:57:10 EST
You will need mdadm-3.2.3-3 or later, as well as dracut-016-1, and the
corresponding systemd version - all should be in f17/rawhide by now.

Once we verified it works there and there are no bad side effects, we'll
look at back porting the fixes to f16/f15.

Jes
Comment 32 Thomas Clark 2012-02-16 17:44:15 EST
Jes, thanks for the update.  I hope this fix can make it into F16 quickly.  Right now, the only way to reboot is with the reset button, and that's really living on the edge!
Comment 33 Peter Bieringer 2012-02-17 01:50:16 EST
(In reply to comment #32)
> Jes, thanks for the update.  I hope this fix can make it into F16 quickly. 
> Right now, the only way to reboot is with the reset button, and that's really
> living on the edge!

Hitting the reset button is the worst method, I always use:

https://en.wikipedia.org/wiki/Magic_SysRq_key

Alt SysRq: S (Sync) -> U (Unmount) -> B (reBoot)

can be enabled with kernel boot options (sysrq=1) or later using /etc/sysctl.conf (kernel.sysrq = 1)

While not default, I recommend enabling this.
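
The sysctl setting above corresponds to the file /proc/sys/kernel/sysrq, and the key presses have a file-based equivalent in /proc/sysrq-trigger, which is handy when no keyboard is attached. A minimal sketch, assuming root privileges and a kernel built with SysRq support; the helper names write_string, sysrq_enable and sysrq_trigger are hypothetical. Note that the U action remounts filesystems read-only, and that triggering 'b' or 'o' takes effect immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write a short string to a file. Returns 0 on success, -errno on failure. */
int write_string(const char *path, const char *value) {
        FILE *f;
        int rc = 0;

        assert(path != NULL && value != NULL);
        f = fopen(path, "w");
        if (!f)
                return -errno;
        if (fputs(value, f) == EOF)
                rc = -EIO;
        if (fclose(f) != 0 && rc == 0)
                rc = -errno;
        return rc;
}

/* Same effect as "kernel.sysrq = 1", but only until the next boot. */
int sysrq_enable(void) {
        return write_string("/proc/sys/kernel/sysrq", "1");
}

/* Trigger one SysRq action: 's' sync, 'u' remount read-only,
 * 'o' power off, 'b' reboot. The last two do not return on success. */
int sysrq_trigger(char key) {
        const char buf[2] = { key, '\0' };
        return write_string("/proc/sysrq-trigger", buf);
}
```

The S, U, O sequence from this comment would then be sysrq_trigger('s'), sysrq_trigger('u'), sysrq_trigger('o'), ideally with a short pause between the steps so that each one can finish.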

BTW: I found a difference in behaviour while recovering from this hanging unmount while a RAID sync was running:

Alt SysRq: S (Sync) -> U (Unmount) -> O (powerOff)
 triggers the Intel Software RAID driver to push the latest sync state to disk (expected), then shuts down the system

Alt SysRq: S (Sync) -> U (Unmount) -> B (reBoot)
 reboots immediately and *does not* trigger the Intel Software RAID driver to push the latest sync state to disk (unexpected), which means that in this case the resync can start from 0 again.

Is this a known "feature" or a bug?
Comment 34 Jes Sorensen 2012-02-17 06:09:17 EST
(In reply to comment #33)
> Alt SysRq: S (Sync) -> U (Unmount) -> B (reBoot)
>  Immediate reboot, *did not* triggers Intel Software RAID driver to push latest
> sync state to disk (unexpected), which means, that in this cases it can happen
> that the resync starts from 0 again.
> 
> Is this a known "feature" or a bug?

Peter, I cannot say for sure what is happening here. However, one issue is
that the 'sync' will only flush currently outstanding writes. It doesn't
flush any metadata that mdmon is still sitting on but it hasn't tried to
write out yet.

Cheers,
Jes
Comment 35 Jens Tingleff 2012-02-21 02:55:31 EST
Sounds terribly familiar... https://bugzilla.redhat.com/process_bug.cgi

Interestingly, I tried fedora17 Alpha 2 and still had this problem. Should I be doing something different?
Comment 36 Jes Sorensen 2012-02-21 03:10:24 EST
Jens,

You pasted the submit completion link rather than the actual bz you
were referring to.

What dracut/mdadm is in Fedora Alpha 2?

Jes
Comment 37 Dmitri 2012-02-21 09:22:42 EST
So for some reason the behavior of my setup changed a few days ago. Specifically, as the system shuts down, it appears to hang, but after 90 seconds throws something like the following lines (it's not in front of me right now, I'm just pulling this from memory):

Unmount of /home timed out. Stopping.
Unmount of /var timed out. Stopping.
Unmount of /tmp timed out. Stopping.

Waiting another 90 seconds results in a similar 3 lines, only this time saying the stop timed out and it tries to kill instead.
Another 90 seconds later, it says that the process still exists after SIGKILL, and tries to continue. There's a few messages about DM still being busy, but then it does eventually successfully stop the md devices and shutdown. All in all, it takes about ~5-10 minutes to complete the shutdown (especially since about 50% of the time the stopping NetworkManager times out on my system too)

Now, I haven't tried this enough times to be sure, but I think that when I do a restart instead of a shutdown, it works just fine. I'm not sure if it's the fact that I'm restarting that made it work, or it's just an intermittent problem and coincidentally it worked that one time.

Lastly, I can confirm the behavior of the SysRq keys as mentioned above:
SysRq + R E I S U O results in proper stopping of md devices
SysRq + R E I S U B results in instant reboot which dirties the array(s)
Comment 38 Jens Tingleff 2012-02-23 01:18:46 EST
Hi Jes,

This link, then? https://bugzilla.redhat.com/show_bug.cgi?id=751060

It's ID 751060 in any case.

I'll have a look at the exact version (that computer is undergoing different experiments right now)
Comment 39 Jens Tingleff 2012-02-27 00:51:01 EST
Hi

Fedora 17 Alpha RC4 does shutdown cleanly. Yay.

mdadm version 3.2.3-6fc17
dracut version 0.17

When shutting down, it talks about stopping DM devices, and giving up (1 device still attached).

Best regards

    Jens
Comment 40 Jes Sorensen 2012-02-28 04:40:38 EST
Jens,

Thanks for the heads up - this is good news. Any chance you can capture the
message you get during the reboot so we can see what may not go as expected?

Cheers,
Jes
Comment 41 Jes Sorensen 2012-02-28 10:17:05 EST
I think we are getting close to a resolution. For those hitting this problem
on Fedora 16, it should be possible to do the following:

Update to mdadm-3.2.3-4 or later (mdadm-3.2.3-6 is in -testing)
Update to dracut-017-3 or later from Fedora 17.

After updating the two packages, one needs to run 
  dracut -f ""
to update the initramfs.

I just did it on my test system here, and after that it reboots happily.
Prior it would hang like described in the bug.

If you try this out, please report back positive karma in bodhi so we can
get it promoted to stable.

I also discussed this with Harald and he will look into pushing the
needed changes or dracut-017 back to F16/F15, but this will take a little
longer.

Thanks,
Jes
Comment 42 Dmitri 2012-02-28 10:26:22 EST
I set my repos to pull from updates testing a few days ago and have had clean shutdowns since. I didn't seem to have to pull anything from Fedora 17. I don't have a lot of samples (only shutting down overnight, and I'm not always booting into linux), but I'll keep it as-is right now, and if it hangs again I'll try pulling dracut-017-3.
Comment 43 Peter Bieringer 2012-02-28 14:33:23 EST
dracut-017-1.fc17.noarch works for me on a F16 system now in combination with updated mdadm, the backport to regular F16 package would be very nice.
Comment 44 Josh Boyer 2012-02-28 19:24:51 EST
*** Bug 751060 has been marked as a duplicate of this bug. ***
Comment 45 Jens Tingleff 2012-02-29 01:27:59 EST
messages at halt[1] - I did "sync; sync; sync; halt" on an xterm from a logged in f17 alpha RC4 system in the vanilla Gnome environment of a new install

unmounting /dev/hugepage
unmounting /dev/mqueue
Disabling swaps
Detaching loop devices
Detaching DM devices
Not all DM devices detached, 1 left
Cannot finalise remaining file system and devices, trying to kill remaining processes
Detaching DM devices
Not all DM devices detached, 1 left
Cannot finalise remaining file system and devices, giving up
[ 361534.4545] System halted

At this point the system does appear quite dead, i.e. no response to "caps lock" from keyboard. The Raid array does come back clean.

Typing "sync; sync; sync; reboot" gets me a reboot with clean disks. Yay! I do get more messages in the "reboot" case than in the "halt" case, starting with "unmounting .." but I can't read them.

Best Regards

Jens

[1] I'd like to capture, rather than type, is there a way to do that? I tried fooling around with logging at an earlier stage, but that didn't work.
Comment 46 Jes Sorensen 2012-02-29 03:10:37 EST
Jens,

Are you running MD or DM devices? Your log indicates DM devices which are
not affected by the changes we have made to mdadm/systemd/dracut.

For logging a serial port or serial over LAN if your system has a BMC,
is the easiest.

Cheers,
Jes
Comment 47 Jens Tingleff 2012-03-01 01:06:53 EST
I'm definitely running MD (I get /dev/md126 as sole disk available when running the installer - without selecting special storage devices - I get /proc/mdstat, and so on and so forth)

I did wonder about the DM words that I had not seen before, I thought they indicated that the shutdown script side of things had good functionality and bad messages in it...

For logging, I do have a LAN and a laptop with linux, is there a cheat-sheet somewhere?

Best regards

Jens
Comment 48 Jens Tingleff 2012-03-01 02:05:35 EST
(Sorry, I'm not up to speed.)

I could do a mdadm from updates-testing by running "yum --enablerepo=updates-testing update mdadm" but I don't know where to find the F17 dracut.

So, which is the repository I need to use for that dracut, please?

I got mdadm.x86_64 version 3.2.3-6.fc16 but I did not get mdadm-sysvinit. I don't get a proper shutdown (but the RAID array comes up clean). Do I need to manually update mdadm-sysvinit as well?

Thanks for the help so far :)

Jens
Comment 49 Jes Sorensen 2012-03-01 04:17:09 EST
Jens,

The Fedora 17 version of dracut you can find here:
http://mirrors.kernel.org/fedora/releases/test/17-Alpha/Fedora/x86_64/os/Packages/dracut-016-1.fc17.noarch.rpm

You'll have to install it manually using 'rpm -Uvh http....'

Once you have updated mdadm and dracut, you will have to update your initramfs
like this 'dracut -f ""'.

I don't understand why you need mdadm-sysvinit? In Fedora 16 that is just a
single init script for mdmonitor, but if you have a clean Fedora 16 install
I don't believe it would be installed in the first place.

Note that you should make sure you have a recovery boot entry in your grub
config, so you can boot in and downgrade dracut again. Just to be on the safe
side in case it goes wrong.

I hope this helps.

Cheers,
Jes
Comment 50 Jes Sorensen 2012-03-01 04:20:11 EST
(In reply to comment #47)
> I'm definitely running MD (I get /dev/md126 as sole disk available when running
> the installer - without selecting special storage devices - I get /proc/mdstat,
> and so on and so forth)
> 
> I did wonder about the DM words that I had not seen before, I thought they
> indicated that the shutdown script side of things had good functionality and
> bad messages in it...
> 
> For logging, I do have a LAN and a laptop with linux, is there a cheat-sheet
> somewhere?

This is odd, I'll check with the dracut maintainer why you're seeing DM
messages in your log.

If your laptop has a real serial port, you should be able to connect it to
another system and then specify 'console=ttyS0,115200 console=tty0' on the
kernel boot line in order to get console output on the serial port. The
above assumes you are using the first serial port (ttyS0), connected to another system at 115200 baud.
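
On the receiving machine, the port has to be opened at the same speed before the console output can be read and saved. A minimal sketch, assuming the capture side is also Linux and that its port is something like /dev/ttyS0 (both assumptions); the function names serial_set_115200_raw and serial_open_115200 are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Put an already-open serial device into raw mode at 115200 baud.
 * Returns 0 on success, -errno on failure (e.g. -ENOTTY for a non-tty). */
int serial_set_115200_raw(int fd) {
        struct termios tio;

        if (tcgetattr(fd, &tio) < 0)
                return -errno;
        cfmakeraw(&tio);                 /* no echo, no line editing, 8-bit chars */
        tio.c_cflag |= CLOCAL | CREAD;   /* ignore modem lines, enable receiver */
        if (cfsetispeed(&tio, B115200) < 0 || cfsetospeed(&tio, B115200) < 0)
                return -errno;
        if (tcsetattr(fd, TCSANOW, &tio) < 0)
                return -errno;
        return 0;
}

/* Open a serial device and configure it as above.
 * Returns a file descriptor (>= 0) on success, -errno on failure. */
int serial_open_115200(const char *device) {
        int fd, rc;

        assert(device != NULL);
        fd = open(device, O_RDWR | O_NOCTTY);
        if (fd < 0)
                return -errno;
        rc = serial_set_115200_raw(fd);
        if (rc < 0) {
                close(fd);
                return rc;
        }
        return fd;
}
```

Bytes read from the returned descriptor can then be appended to a log file, which captures the shutdown messages that scroll by too quickly to copy by hand.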

Cheers,
Jes
Comment 51 Harald Hoyer 2012-03-01 06:49:42 EST
(In reply to comment #48)
> So, which is the repository I need to use for that dracut, please?

https://admin.fedoraproject.org/updates/FEDORA-2012-2699/dracut-017-17.git20120229.2.fc17
Comment 52 Harald Hoyer 2012-03-01 06:50:34 EST
(In reply to comment #50)
> (In reply to comment #47)
> > I'm definitely running MD (I get /dev/md126 as sole disk available when running
> > the installer - without selecting special storage devices - I get /proc/mdstat,
> > and so on and so forth)
> > 

What is the output of:

# dmsetup ls --tree
Comment 53 Ed Raynes 2012-03-02 11:09:28 EST
For what it's worth, I have been having the same shutdown bug, even after multiple installs of F16. I do not have a RAID controller.
Comment 54 Jes Sorensen 2012-03-02 11:18:54 EST
Ed,

Are you running RAID at all, in which case, what is your setup?
Does this happen for you with F17 Alpha?

Thanks,
Jes
Comment 55 Ed Raynes 2012-03-02 12:03:42 EST
Jes,

I am not running RAID. I have not tried F17 Alpha because this is a production machine in a small office environment. However, when I fell back to F14, all is well
Comment 56 Jes Sorensen 2012-03-02 12:08:53 EST
Ed,

Ok, that means you are hitting a different issue than what we have been
dealing with so far. The fixes that went in were directed at solving the
reboot hangs on RAID.

Cheers,
Jes
Comment 57 Jens Tingleff 2012-03-03 10:57:03 EST
All,

to recap: I did an install of Fedora 16, KDE live CD, x86_64.

I upgraded mdadm and dracut using the following:

yum --enablerepo=updates-testing update mdadm
rpm -Uvh http://mirrors.kernel.org/fedora/releases/test/17-Alpha/Fedora/x86_64/os/Packages/dracut-016-1.fc17.noarch.rpm

which gives me

dracut.noarch                            016-1.fc17                     installed
mdadm.x86_64                             3.2.3-6.fc16                   @updates-testing

I get messages about DM devices on shutdown, but in the end everything is OK (I do see the "stopping /dev/md126" and /dev/md127 right before the machine reboots).

The DM devices could be LVM and/or crypto (I have /dev/mapper entries for both LVM and an encrypted partition inside my LVM, as I'm sure you see below).

The reason I mentioned DM messages on shutdown after installing MDADM vers 3.2.3-6 was that this was new to me - it could have been a coincidence.

@Harald:

[root@localhostDesk ~]# dmsetup ls --tree
vg-lv_test (253:3)
 └─ (259:6)
vg-lv_swap (253:1)
 └─ (259:6)
vg-lv_root (253:0)
 └─ (259:6)
vg-lv_iest (253:4)
 └─ (259:6)
luks-92028dec-4282-4ffb-af92-eee85439567b (253:5)
 └─vg-lv_crypt (253:2)
    └─ (259:6)

[root@localhostDesk ~]# cat /proc/mdstat 
Personalities : [raid10] 
md126 : active raid10 sda[3] sdb[2] sdc[1] sdd[0]
      1953519616 blocks super external:/md127/0 64K chunks 2 near-copies [4/4] [UUUU]
      
md127 : inactive sdc[3](S) sdd[2](S) sda[1](S) sdb[0](S)
      10576 blocks super external:imsm
       
unused devices: <none>
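
In the /proc/mdstat output above, "super external:" marks arrays whose metadata is handled in userspace, which is the case that depends on mdmon staying alive through shutdown. A small sketch, with the hypothetical names md_external_count and mdstat_count_external, that counts such entries in mdstat text: a value starting with "/" (as in external:/md127/0) is taken to be a member array, anything else (as in external:imsm) a container.

```c
#include <assert.h>
#include <string.h>

/* Result of scanning mdstat text for externally managed metadata. */
struct md_external_count {
        int containers; /* e.g. "super external:imsm" */
        int members;    /* e.g. "super external:/md127/0" */
};

/* Count "super external:" entries in the given /proc/mdstat contents. */
struct md_external_count mdstat_count_external(const char *text) {
        static const char marker[] = "super external:";
        struct md_external_count c = { 0, 0 };
        const char *p;

        assert(text != NULL);
        for (p = strstr(text, marker); p != NULL; p = strstr(p, marker)) {
                p += sizeof(marker) - 1;  /* skip past the marker itself */
                if (*p == '/')
                        c.members++;      /* points at a container, so a member array */
                else
                        c.containers++;   /* names a metadata format, so a container */
        }
        return c;
}
```

For the output in this comment, the function reports one container (md127) and one member array (md126). A system with only native md metadata reports zero for both, which suggests the mdmon-related hang discussed here does not apply to it.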

Thanks for the help, everyone

Best Regards

    Jens
Comment 58 Jes Sorensen 2012-03-03 12:10:18 EST
(In reply to comment #57)
> All,
> 
> to recoup: I did an install of Fedora 16, KDE live CD, x86_64.
> 
> I upgraded mdadm and dracut using the following:
> 
> yum --enablerepo=updates-testing update mdadm
> rpm -Uvh
> http://mirrors.kernel.org/fedora/releases/test/17-Alpha/Fedora/x86_64/os/Packages/dracut-016-1.fc17.noarch.rpm
> 
> which gives me
> 
> dracut.noarch                            016-1.fc17                    
> installed
> mdadm.x86_64                             3.2.3-6.fc16                  
> @updates-testing

Jens,

Just to be sure, after installing dracut and mdadm, you did update your
initramfs by running:

  dracut -f ""

?

Thanks,
Jes
Comment 59 Jens Tingleff 2012-03-05 01:40:11 EST
Hi Jes,

(doh!)

Yes, indeed, I did run dracut -f ""

(should have mentioned that - I was so excited :) ).

Best Regards

    Jens
Comment 60 Harald Hoyer 2012-03-05 07:52:26 EST
(In reply to comment #58)
> Jens,
> 
> Just to be sure, after installing dracut and mdadm, you did update your
> initramfs by running:
> 
>   dracut -f ""

why "" ? 

# dracut -f 

should work just fine...
Comment 61 Jes Sorensen 2012-03-06 04:37:27 EST
(In reply to comment #60)
> (In reply to comment #58)
> > Jens,
> > 
> > Just to be sure, after installing dracut and mdadm, you did update your
> > initramfs by running:
> > 
> >   dracut -f ""
> 
> why "" ? 
> 
> # dracut -f 
> 
> should work just fine...

I have been using 'dracut -f ""' since you told me to do so a while ago.
If 'dracut -f' is sufficient, that is all good with me. However what I was
actually after was to make sure dracut had been run after the install to
make sure the initramfs had been updated.

Cheers,
Jes
Comment 62 Martin Albrecht 2012-03-06 18:01:55 EST
(In reply to comment #51)
> (In reply to comment #48)
> > So, which is the repository I need to use for that dracut, please?
> 
> https://admin.fedoraproject.org/updates/FEDORA-2012-2699/dracut-017-17.git20120229.2.fc17

Hello everybody

Just to let you know:

I'm running Fedora 16 (3.2.7-1.fc16.x86_64). Root file system is a btrfs subvolume stored on intel raid1 array. No luks, no lvm2.

I've updated my system using:
> yum --enablerepo=updates-testing update mdadm
> rpm --upgrade http://kojipkgs.fedoraproject.org/packages/dracut/017/22.git20120302.fc17/noarch/dracut-017-22.git20120302.fc17.noarch.rpm
>dracut --force

After verifying that initramfs has been updated, I performed one last shutdown the hard way... and now shutdown and reboots are working fine again.

Thank you,
Martin
Comment 64 Manuel Bejarano 2012-03-07 14:08:24 EST
Hi guys,

I confirm that upgrading those packages fixed the problem. I'm running F16 x86_64. I should point out that dracut depends on hardlink, which must be installed first to satisfy its dependencies:

> yum --enablerepo=updates-testing update mdadm
> yum install hardlink
> rpm --upgrade http://kojipkgs.fedoraproject.org/packages/dracut/017/22.git20120302.fc17/noarch/dracut-017-22.git20120302.fc17.noarch.rpm
> dracut --force

Thanks!
Comment 65 Manuel Bejarano 2012-03-08 03:45:37 EST
It seems that the problem is still there; it just did not appear the first time I rebooted the system. The behavior is still the same when trying to power off.

Any clue?
Comment 66 Martin Albrecht 2012-03-09 14:50:44 EST
Hi Manuel

The fix works stable on my system. I've shut down my system using 'Power Off' from the desktop menu about 30 times and I've rebooted my system about 10 times since the upgrade to gain a high level of confidence. Meanwhile I've upgraded the kernel to 3.2.9-1. I didn't encounter any problems so far.

Just an idea: Unless otherwise specified, the dracut command updates the initramfs of the current kernel. In case you've upgraded your kernel before you've deployed the fix, you may still be using an initramfs generated with a previous version of dracut. Running dracut again should solve the problem.
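
One way to check this is to look at the modification time of the initramfs that belongs to the running kernel and compare it with the time dracut was upgraded. A minimal sketch, assuming the usual Fedora naming scheme /boot/initramfs-<kernel release>.img; the function names initramfs_path and running_initramfs_mtime are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/utsname.h>
#include <time.h>

/* Build the initramfs path for a kernel release string.
 * Assumes the naming scheme /boot/initramfs-<release>.img.
 * Returns 0 on success, -ENAMETOOLONG if buf is too small. */
int initramfs_path(const char *release, char *buf, size_t len) {
        int n;

        assert(release != NULL && buf != NULL);
        n = snprintf(buf, len, "/boot/initramfs-%s.img", release);
        return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Store the modification time of the running kernel's initramfs in *mtime.
 * Returns 0 on success, -errno on failure (e.g. -ENOENT if it is missing). */
int running_initramfs_mtime(time_t *mtime) {
        struct utsname u;
        struct stat st;
        char path[512];
        int rc;

        assert(mtime != NULL);
        if (uname(&u) < 0)
                return -errno;
        rc = initramfs_path(u.release, path, sizeof(path));
        if (rc < 0)
                return rc;
        if (stat(path, &st) < 0)
                return -errno;
        *mtime = st.st_mtime;
        return 0;
}
```

If the reported time is older than the dracut upgrade, the image for that kernel was not regenerated, and running dracut again for that kernel is the next thing to try.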

Kind regards,
Martin
Comment 67 Manuel Bejarano 2012-03-09 18:49:56 EST
Hi Martin,

Thanks for your comments. Sometimes my system reboots fine but most of the time it won't power off properly. I'm using LVM and I installed everything as described in #c64. Do I need to update systemd to the latest version in Fedora 17 repositories? Could it be caused because I'm not running some needed service?

Regards,
Manuel.
Comment 68 Martin Albrecht 2012-03-10 02:39:34 EST
Hi Manuel

Besides dracut and mdadm I have only F16 standard + update repo packages installed.

As mentioned in my first post above, I'm no longer using LVM. It would be interesting to hear from the others who have successfully upgraded whether some of them are using LVM.

I'm not an expert regarding this subject, so I can't help you any further. But maybe it would help if you post where exactly your shutdown process hangs.

Kind regards,
Martin
Comment 69 Dmitri 2012-03-10 16:52:40 EST
After upgrading to F17's dracut my shutdown/restart is now clean. I do use LVM on the actual RAID array itself.

I do have the same issue as mentioned in Comment 47, and it was there even before the fix. It doesn't really cause much of a delay though.

Now my only remaining shutdown issue is that NetworkManager doesn't stop properly and hangs for 90 seconds, but that's not relevant to the RAID thing.
Comment 70 Lennart Poettering 2012-03-12 22:04:54 EDT
(In reply to comment #45)
> messages at halt[1] - I did "sync; sync; sync; halt" on an xterm from a logged
> in f17 alpha RC4 system in the vanilla Gnome environment of a new install
> 
> unmounting /dev/hugepage
> unmounting /dev/mqueue
> Disabling swaps
> Detaching loop devices
> Detaching DM devices
> Not all DM devices detached, 1 left
> Cannot finalise remaining file system and devices, trying to kill remaining
> processes
> Detaching DM devices
> Not all DM devices detached, 1 left
> Cannot finalise remaining file system and devices, giving up
> [ 361534.4545] System halted
> 
> At this point the system does appear quite dead, i.e. no response to "caps
> lock" from keyboard. The Raid array does come back clean.

You asked the system to halt and that's what it did. If you want the system to poweroff use the "poweroff" command instead. Everything appears perfectly in order here.
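
The difference between the two commands shows up at the system call level: reboot(2) takes a different argument for "stop the system" than for "stop the system and cut power". A minimal sketch of that mapping, using the RB_* constants from <sys/reboot.h>; the function name reboot_cmd_for is hypothetical, and the sketch only looks up the constant without calling reboot().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/reboot.h>

/* Look up the reboot(2) argument that matches a command name.
 * Stores the value in *cmd and returns 0, or returns -EINVAL
 * for a name this sketch does not know. */
int reboot_cmd_for(const char *name, unsigned int *cmd) {
        assert(name != NULL && cmd != NULL);
        if (strcmp(name, "halt") == 0)
                *cmd = RB_HALT_SYSTEM;  /* stop the OS, leave the power on */
        else if (strcmp(name, "poweroff") == 0)
                *cmd = RB_POWER_OFF;    /* stop the OS and switch the power off */
        else if (strcmp(name, "reboot") == 0)
                *cmd = RB_AUTOBOOT;     /* restart the machine */
        else
                return -EINVAL;
        return 0;
}
```

This is why a halted machine printing "System halted" and ignoring the keyboard is the expected end state for "halt", while "poweroff" is the command that turns the machine off.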
Comment 71 Lennart Poettering 2012-03-12 22:27:30 EDT
Hmm, it appears to me that all issues here are either closed or misunderstandings. Closing.
Comment 72 Fabrício Godoy 2012-12-11 18:16:01 EST
I'm running "dracut-018-105.git20120927.fc17.noarch" on a new F17 installation, and shutdown still hangs after deactivating the md device, before unmounting.
Comment 73 Mark Harfouche 2012-12-13 14:05:50 EST
Fedora 17 x64

mdadm v3.2.6
dracut version 18-105.git20120927.fc17

Still hangs on shutdown.....

This is a new problem (last month or so) because I was rebooting my computer like crazy during the install
Comment 74 Ian Neal 2012-12-16 18:47:26 EST
(In reply to comment #73)
> Fedora 17 x64
> 
> mdadm v3.2.6
> dracut version 18-105.git20120927.fc17
> 
> Still hangs on shutdown.....
> 
> This is a new problem (last month of so) because I was rebooting my computer
> like crazy during the install

This is probably a regression, I have logged bug 887562 on this.
Comment 75 Aram Agajanian 2012-12-17 10:23:33 EST
For similar issues in Fedora 17, please see bug #834245 .
