Red Hat Bugzilla – Full Text Bug Listing
|Summary:||Shutdown does not complete. hangs unmounting oldroot|
|Product:||[Fedora] Fedora||Reporter:||Sergio Pascual <sergio.pasra>|
|Component:||mdadm||Assignee:||Jes Sorensen <Jes.Sorensen>|
|Status:||CLOSED ERRATA||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||17||CC:||agajania, agk, beejay.uk, bugzillaRedHat.bsquare, dan, dledford, dracut-maint, edunph, gansalmon, harald, iannbugzilla, itamar, Jes.Sorensen, jmcasanova, jmjt, jonathan, kernel-maint, loxxxa, lukasz.dorau, maciej.patelczyk, madhu.chinakonda, mark.harfouche, mschmidt, olivier, redhat, skarllot, t|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2013-03-20 17:38:27 EDT||Type:||Bug|
Description Sergio Pascual 2012-06-21 06:11:40 EDT
Description of problem:
My system hangs when powering off. Log messages are roughly as follows:
Sending SIGTERM
Sending SIGKILL
Unmounting file systems
/sys/kernel/config
/sys/kernel/debug
/dev/mqueue
(more mounts follow)
Disabling swaps
Disabling loop devices
Detaching DM devices
one left
Cannot finalize, trying to kill
Detaching DM devices
one left
Cannot finalize, giving up
Successfully changed into root pivot
Umounted /oldroot/proc
Umounted /oldroot/dev/pts
Umounted /oldroot/run
(more mounts here)
Umounted /oldroot/sys
And then it hangs.
Version-Release number of selected component (if applicable): kernel-3.4.2-4.fc17.x86_64
How reproducible: Always
Additional info:
The system does reboot cleanly with kernel-3.3.7-1.fc17.x86_64 and earlier. It does not reboot with kernel-3.4.2 or kernel-3.4.0. My system has Intel BIOS RAID. It previously suffered from bug #752593 (shutdown does not complete with Intel BIOS RAID). Could this be the same bug in disguise?
Comment 1 beejay.uk 2012-06-22 01:30:50 EDT
Exactly the same problem here with an IMSM Raid 10 array.
Comment 2 Sergio Pascual 2012-06-22 09:49:33 EDT
kernel-3.4.3-1.fc17.x86_64 does not work either
Comment 3 Sergio Pascual 2012-07-03 09:41:28 EDT
Kernel 3.4.4-3.fc17.x86_64 does not work
Comment 4 dan 2012-07-07 19:23:26 EDT
Confirming the same issue as Mr. Pascual with kernel 3.4.4-3.fc17.x86_64. This system also uses Intel BIOS RAID.
Comment 5 Taylor Gunnoe 2012-07-18 10:02:27 EDT
I can also confirm the issue using Intel BIOS raid 10
Comment 6 bender 2012-07-25 03:55:14 EDT
Confirming issue with 3.4.6-2.fc17.x86_64 and Intel BIOS raid 1
Comment 7 Thomas Clark 2012-08-10 15:07:47 EDT
I have exactly the same problem, using Intel BIOS raid and kernel 3.4.7-1.fc16.i686.PAE. Like Sergio Pascual (comment 0), I have wondered if this might be a disguised return of bug 752593, which I also experienced.
Comment 8 Jari Turkia 2012-08-19 15:47:54 EDT
(In reply to comment #0)
> Description of problem: ...
> May this bug be the same under disguise?
This issue happens in F17 with kernel 3.5.2-1.fc17.x86_64. My system has an ICH10R chip on the motherboard and a RAID1 volume configured.
# mdadm -D /dev/md127
/dev/md127:
Version : imsm
Raid Level : container
Total Devices : 2
Working Devices : 2
Member Arrays : /dev/md/Mirror0
I typically shut down the system with Alt-SysRq-o, since it is well stuck there. A fix would be appreciated.
Comment 9 Sergio Pascual 2012-08-26 10:21:22 EDT
Raising severity to High, perhaps we can get some attention from kernel maintainers
Comment 10 Jes Sorensen 2012-09-10 05:12:46 EDT
What mdadm version is installed on the systems where the reboots are failing? When did the problem occur? IMSM RAID has a close tie between kernel _and_ mdadm. Jes
Comment 11 Sergio Pascual 2012-09-10 05:51:47 EDT
I have mdadm-3.2.5-4.fc17.x86_64. This started happening with kernels >= 3.4, just a few kernel updates after Fedora 17 was released. I don't recall which version of mdadm I had when it worked.
Comment 12 Jes Sorensen 2012-09-10 07:33:16 EDT
Hmmm, it really shouldn't be this, but did you update via yum or manually? If so, did you remember to run 'dracut -f' afterwards? Is the IMSM array your root device? Any chance you can try and downgrade the kernels to a < 3.4 F17 one and see if it still shows the problem? Cheers, Jes
Comment 13 Sergio Pascual 2012-09-10 09:51:14 EDT
I update via yum and the IMSM array is my root device. I have installed kernel-3.3.4-5.fc17.x86_64; in the next comment I'll tell you whether it worked.
Comment 14 Jes Sorensen 2012-09-10 09:55:25 EDT
Ok, after upgrading via yum, please try to run 'dracut -f' before rebooting into the new system. Once the system is booted, could you try running 'ps aux | grep dmon' and 'ps aux | grep mdadm'? Thanks, Jes
Comment 15 Sergio Pascual 2012-09-10 10:04:10 EDT
Too late for dracut -f, I have rebooted already. Was it important?
The system *does* reboot with kernel-3.3.4-5.fc17.x86_64.
These are the outputs of the commands:
$ ps aux | grep dmon
root 405 0.0 0.0 15004 10904 ? SLsl 15:57 0:00 @dmon --offroot md127
root 647 0.0 0.0 14972 10872 ? SLsl 15:57 0:00 /sbin/mdmon --takeover md127
$ ps aux | grep mdadm
root 1173 0.0 0.0 4908 492 ? Ss 15:57 0:00 /sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid
Perhaps you need this also:
$ cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda sdb
976759808 blocks super external:/md127/0 [2/2] [UU]
md127 : inactive sdb(S) sda(S)
5288 blocks super external:imsm
unused devices: <none>
Comment 16 Jes Sorensen 2012-09-11 03:46:26 EDT
Sergio, Thanks, you have the @dmon there, so that means the initramfs is launching mdmon correctly. Given that this changes based on the kernel version, it sounds like we need to look for the problem there or in the init scripts, rather than mdadm. Cheers, Jes
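
The check Jes describes (is there an mdmon whose command name begins with '@'?) can be sketched as a small shell filter. This is only an illustration: the function name classify_mdmon and the labels "offroot" and "unmarked" are made up here, and the input is assumed to be "PID COMMAND ARGS" lines such as `ps -eo pid=,args=` prints.

```shell
#!/bin/sh
# Hypothetical helper, not part of mdadm: classify mdmon processes by
# whether their command name carries the '@' marker seen in comment 15.
# Reads "PID COMMAND [ARGS...]" lines on stdin.
classify_mdmon() {
    while read -r pid cmd rest; do
        case "$cmd" in
            @*)            echo "$pid offroot" ;;   # name starts with '@'
            mdmon|*/mdmon) echo "$pid unmarked" ;;  # plain mdmon, no marker
        esac                                        # anything else is ignored
    done
}

# Typical use on a live system (assumes a procps-style ps):
#   ps -eo pid=,args= | classify_mdmon
```

Fed the two mdmon lines from comment 15, the filter would label PID 405 as "offroot" and PID 647 as "unmarked"; the mdadm monitor process is skipped.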
Comment 17 Aram Agajanian 2012-09-12 11:32:09 EDT
*** Bug 853467 has been marked as a duplicate of this bug. ***
Comment 18 Olivier 2012-09-23 08:04:20 EDT
Same problem here. power-off is okay with F17 fresh install (Kernel 3.3.4). But when updated to latest version via 'Software Update' (Kernel 3.5.4): power-off or restart hang at 'Umounted /oldroot/sys' With the updated system, booting with the old 3.3.4 kernel solves the problem. Gigabyte Z68P-DS3 rev1.0 F9 - Intel i5 - RAID 1
Comment 19 Aram Agajanian 2012-10-14 22:47:45 EDT
This problem seems to be fixed in kernel-3.6.1-1.fc17.x86_64.
Comment 20 Olivier 2012-10-15 03:58:09 EDT
Interesting - I've been using the 3.3.4 kernel since the occurrence of this bug. When I try to switch to 3.6.1-1, I have the following error at the 'password' prompt (the RAID1 drives are encrypted): udevd  inotify_add_watch /dev/sda1, 10 : failed, no such file or directory This looks like a new bug, preventing me from confirming Aram's comment 19.
Comment 21 Sergio Pascual 2012-10-15 04:47:03 EDT
In my case, it works partially with 3.6.1-1. I can reboot the system, but I cannot power it off. On power-off the system hangs at the Fedora logo. It doesn't respond even to SysRq.
Comment 22 Jes Sorensen 2012-10-15 05:03:18 EDT
(In reply to comment #20) > Interesting - I've been using the 3.3.4 kernel since the occurrence of this > bug. When I try to switch to 3.6.1-1, I have the following error at the > 'password' prompt (the RAID1 drives are encrypted): > > udevd  inotify_add_watch /dev/sda1, 10 : failed, no such file or > directory > > This looks like a new bug, preventing me from confirming Aram's comment 19. This is a different bug, it sounds like something goes wrong with the boot scripts. Jes
Comment 23 Jes Sorensen 2012-10-15 05:04:24 EDT
(In reply to comment #21) > In my case, it works partially with 3.6.1-1. I can reboot the system, but I > cannot power off the system. > > In power off the system hangs with the fedora logo. In doesn't respond even > to sys req. Please try to remove 'rhgb quiet' from the kernel boot command line and see where it hangs.
Comment 24 Olivier 2012-10-15 05:33:11 EDT
(In reply to comment #22) > (In reply to comment #20) > > Interesting - I've been using the 3.3.4 kernel since the occurrence of this > > bug. When I try to switch to 3.6.1-1, I have the following error at the > > 'password' prompt (the RAID1 drives are encrypted): > > > > udevd  inotify_add_watch /dev/sda1, 10 : failed, no such file or > > directory > > > > This looks like a new bug, preventing me from confirming Aram's comment 19. > > This is a different bug, it sounds like something goes wrong with the boot > scripts. > > Jes Thanks Jes - FYI, I've created Bug #866395
Comment 25 Sergio Pascual 2012-10-18 04:58:18 EDT
I have tested several times and it seems to work correctly. My system does reboot and poweroff. I don't know why it didn't poweroff on Monday
Comment 26 dan 2012-11-10 07:26:36 EST
Issue no longer occurs under kernel 3.6.6-1.fc17.x86_64.
Comment 27 Sergio Pascual 2012-12-03 04:49:03 EST
I'm suffering from this problem again. My system neither shuts down nor reboots with kernel-3.6.8-2.fc17.x86_64. The reboot process hangs after:
Sending SIGTERM
Sending SIGKILL
Unmounting file systems
/sys/kernel/config
/sys/kernel/debug
/dev/mqueue
/dev/hugepages
Comment 28 Jes Sorensen 2012-12-03 05:02:54 EST
When providing updated version numbers, please make sure to include: mdadm dracut kernel Thanks, Jes
Comment 29 Sergio Pascual 2012-12-03 05:24:11 EST
Here it is mdadm-3.2.6-1.fc17.x86_64 dracut-018-105.git20120927.fc17.noarch kernel-3.6.8-2.fc17.x86_64
Comment 30 Chema Casanova 2012-12-03 12:46:58 EST
I have the same problem in F18 Beta.
mdadm-3.2.6-1.fc18.x86_64
dracut-024-10.git20121121.fc18.x86_64
kernel-3.6.7-5.fc18.x86_64
It was impossible for me to reboot after an update with fedup. Finally I decided to do a clean install from the F18 DVD, but the problem remained, so I always need to do a hard reset because neither reboot nor poweroff gets past unmounting devices.
Comment 31 Jes Sorensen 2012-12-03 12:55:31 EST
Chema, Any chance you can provide us the output of /proc/mdstat as well as 'dmesg' ? Thanks, Jes
Comment 32 Thomas Clark 2012-12-03 12:59:23 EST
Just resurfaced again for me also, with the latest kernel. Currently running: mdadm-3.2.6-1-fc16.i686 dracut-018-60.git20120927.fc16.noarch kernel-3.6.7-4.fc16.i686.PAE
Comment 34 Chema Casanova 2012-12-03 13:22:11 EST
Jes, I've just added the dmesg, and here is the output of /proc/mdstat:
Personalities : [raid1]
md126 : active raid1 sdb sdc
1953511424 blocks super external:/md127/0 [2/2] [UU]
md127 : inactive sdc(S) sdb(S)
6056 blocks super external:imsm
unused devices: <none>
I have the following volumes and the swap mounted on the RAID:
/dev/md126p2 on / type ext4 (rw,relatime,data=ordered)
/dev/md126p1 on /boot type ext4 (rw,relatime,data=ordered)
/dev/md126p3 on /home type ext4 (rw,relatime,data=ordered)
Chema
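
For readers comparing their own setup with the one above, the relationship between the member array and its container can be pulled out of /proc/mdstat with a short filter. This is a sketch only: external_members is a hypothetical name, and it assumes the usual multi-line /proc/mdstat layout in which the "super external:/mdNNN/N" detail follows the array's header line.

```shell
#!/bin/sh
# Hypothetical helper, not part of mdadm: print "ARRAY CONTAINER" for every
# md array whose metadata is held externally by a container (e.g. IMSM).
# Reads /proc/mdstat-formatted text on stdin.
external_members() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }            # remember the current array name
        match($0, /external:\/md[0-9]+\//) {  # e.g. "external:/md127/0"
            # skip the 10-character "external:/" prefix and the trailing "/"
            print dev, substr($0, RSTART + 10, RLENGTH - 11)
        }
    '
}

# Typical use on a live system:
#   external_members < /proc/mdstat
```

With the mdstat contents from this comment, the filter would report md126 as a member of container md127; the container line itself ("external:imsm") produces no output.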
Comment 35 Jes Sorensen 2012-12-04 10:57:24 EST
Interesting, so basically the problem seems to be gone in kernels between 3.6.1 and 3.6.6 but back again with kernel 3.6.7+ Harald, any idea what to look for next?
Comment 36 Sergio Pascual 2012-12-04 11:51:51 EST
I don't know if it's related, but a few weeks ago my md126 device changed its name to md125. In other aspects, my raid configuration is equivalent to Chema's (Intel BIOS Raid)
Comment 37 Jes Sorensen 2012-12-04 11:55:52 EST
Sergio, I don't think it is related. In principle the names should be assigned automatically, unless you have explicitly assigned a name in your /etc/mdadm.conf Cheers, Jes
Comment 38 Sergio Pascual 2012-12-04 12:02:08 EST
I have downgraded mdadm to mdadm-3.2.3-6.fc17.x86_64 (just to be sure) and it still hangs, but later. The reboot process continues until the word "reboot" appears on the console and then hangs.
Comment 39 Jes Sorensen 2012-12-05 04:07:35 EST
3.2.3-6 should still have the --offroot support, as this was added in 3.2.3-4, so it shouldn't be that causing it. 3.2.3-3 and older are expected to hang, 3.2.3-4 and later shouldn't :(
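
The threshold in this comment (3.2.3-4 and later include --offroot) can be checked with a version comparison. The sketch below is an assumption-laden illustration: version_ge and has_offroot are hypothetical names, and it relies on GNU `sort -V`, which only approximates RPM's own version ordering.

```shell
#!/bin/sh
# Hypothetical helpers. version_ge A B succeeds when A sorts at or after B
# under GNU sort's version ordering (an approximation of RPM ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Per comment 39: --offroot support was added in mdadm 3.2.3-4.
has_offroot() {
    version_ge "$1" "3.2.3-4"
}

# Example with the version-release from comment 38:
if has_offroot "3.2.3-6"; then
    echo "3.2.3-6: --offroot expected"
fi
```

The version-release string would come from something like `rpm -q --qf '%{VERSION}-%{RELEASE}\n' mdadm`, with the ".fc17" dist suffix stripped before comparing.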
Comment 40 Taylor Gunnoe 2012-12-06 13:49:37 EST
I am also experiencing this problem again, but now I can't even boot. With kernel 3.6.7+, it freezes on "Unmounting file systems" on boot.
Comment 41 Sergio Pascual 2012-12-11 10:51:34 EST
Last week I installed a new F17 x86_64 system with Intel BIOS RAID. I used a Live CD. After the install, the system was able to reboot (with kernel-3.3.4-5). Then I updated the system. After the update, the system could not reboot anymore, even with the 3.3.4 kernel. So the kernel is not (the only?) culprit.
Comment 42 Jes Sorensen 2012-12-12 01:57:58 EST
Sergio, If possible, could you try and downgrade dracut to the version from the install as well? Please run 'dracut -f' after upgrading/downgrading it. Cheers, Jes
Comment 43 Sergio Pascual 2012-12-12 05:08:43 EST
I have downgraded to dracut-018-35.git20120510.fc17.noarch and I have run dracut -f afterwards. The system freezes on "Unmounting file systems". Perhaps it is a systemd bug? How safe is it to downgrade systemd?
Comment 44 Michal Schmidt 2012-12-12 09:38:07 EST
Downgrading systemd is reasonably safe.
Comment 45 Michal Schmidt 2012-12-12 10:05:56 EST
If you add "rd.break=pre-shutdown" to the kernel command line, dracut should give you a shell before it runs its unmounting loop. In the shell it would be helpful to explore what filesystems are still mounted and what processes are running.
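
Once inside that shell, one way to see what is still mounted from the old root is to filter /proc/mounts. The following is a minimal sketch in plain sh (no awk, since the initramfs may not ship it). The name oldroot_mounts is hypothetical, and the /oldroot prefix is taken from the "Umounted /oldroot/..." messages in the description; whether it applies at the pre-shutdown breakpoint is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: list mount points at or below /oldroot together with
# their filesystem type. Reads /proc/mounts-formatted text on stdin
# (fields: device, mount point, fstype, options, dump, pass).
oldroot_mounts() {
    while read -r dev mnt fstype rest; do
        case "$mnt" in
            /oldroot|/oldroot/*) echo "$mnt $fstype" ;;
        esac
    done
}

# Typical use in the dracut shell:
#   oldroot_mounts < /proc/mounts
```

The running processes can be inspected separately, for example with `ps` if the initramfs provides it.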
Comment 46 Sergio Pascual 2012-12-13 13:38:40 EST
It's weird, but with "rd.break=pre-shutdown" my kernel stops on boot, before printing "Fedora 17...." in blue on the text console. On shutdown nothing happens (apart from hanging on "Unmounting file systems", of course). Kernel 3.6.9-2.fc17.x86_64
Comment 47 Jes Sorensen 2012-12-18 06:28:22 EST
Running tests with Fedora 18 TC3, I am seeing the same problem there after fresh installs. Basically the install goes fine, but on reboot it goes:
Stopping Software RAID monitor takeover
Unmounting /
<hang>
Looks like the latest systemd or dracut is doing something wrong or ignoring the --offroot argument.
Jes
Comment 48 Harald Hoyer 2012-12-20 08:32:51 EST
If stopping "mdmon" is the culprit, then it should not have been doing the takeover in the real root in the first place, but leave the mdmon from the initramfs running (which should have set "@" as the first char in argv).
Comment 49 Diego Rossi Mafioletti 2012-12-20 10:14:21 EST
I do not know if it's relevant to the current subject, but... On my Fedora 17 x86_64 system, the reboot eventually completes roughly 10 minutes after the "Unmounting file systems." message, but shutdown hangs indefinitely. (Some watchdog?) It is a clean installation of Fedora 17 x86_64 from DVD, plus "yum update -y".
dracut-018-105.git20120927.fc17.noarch
mdadm-3.2.6-1.fc17.x86_64
kernel-3.6.10-2.fc17.x86_64
Personalities : [raid1]
md126 : active raid1 sda sdb
488383488 blocks super external:/md127/0 [2/2] [UU]
md127 : inactive sdb(S) sda(S)
5928 blocks super external:imsm
unused devices: <none>
Diego
Comment 50 Diego Rossi Mafioletti 2012-12-27 05:56:07 EST
Hmm... after a "yum downgrade mdadm" plus "dracut -f", my system went back to rebooting and powering off normally! Apparently, the culprit was mdadm. Current versions:
mdadm-3.2.3-6.fc17.x86_64
dracut-018-105.git20120927.fc17.noarch
kernel-3.6.10-2.fc17.x86_64
Comment 51 Fabrício Godoy 2013-01-01 20:51:18 EST
"yum downgrade mdadm" does the trick! mdadm-3.2.3-6.fc17.x86_64 dracut-018-105.git20120927.fc17.noarch kernel-3.6.10-2.fc17.x86_64
Comment 52 SpuyMore 2013-01-02 16:18:37 EST
Confirming the same problem in FC18 with any upgrades up to: dracut-024-16.git20121220.fc18.x86_64 kernel-3.6.11-3.fc18.x86_64 mdadm-3.2.6-7.fc18.x86_64 I think my bug report #879327 can be marked as a duplicate of this.
Comment 53 Jes Sorensen 2013-01-03 08:29:41 EST
*** Bug 887562 has been marked as a duplicate of this bug. ***
Comment 54 Harald Hoyer 2013-01-03 08:51:16 EST
(In reply to comment #51) > "yum downgrade mdadm" does the trick! reassigning back to mdadm
Comment 55 Fedora Update System 2013-01-04 12:03:35 EST
mdadm-3.2.6-8.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/mdadm-3.2.6-8.fc17
Comment 56 Jes Sorensen 2013-01-04 12:17:37 EST
Ok I think I finally understood the problem here. Basically when we rolled out the --offroot support there was a bug in the mdadm upstream code which meant that mdmon processes launched with --offroot would not be taken over in case of a follow-on 'mdmon --takeover' launch from the mdmonitor-takeover.service. This was fixed between 3.2.5 and 3.2.6 upstream, which is why the problem is showing up now. Basically mdmonitor-takeover.service is now obsolete since we roll back to the initrd during shutdown, and we rely on the mdmon launched from there to handle the metadata writeout before rebooting. I have pushed mdadm-3.2.6-8 into updates-testing which should fix this problem for Fedora 17+ Please give it a spin and report back. Thanks, Jes
Comment 57 SpuyMore 2013-01-04 15:51:52 EST
Hi Jes, I have updated but it didn't work. The system now does not boot completely. It hangs right after "started initialized storage subsystems (RAID, LVM, etc.)" and "started monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling". Help! How can I boot the system now? I must add that I also installed the latest release version of dracut (yesterday's release). After upgrading mdadm and dracut I ran dracut -f. I suspect that mdmon is not running. Thanks, Dennis
Comment 58 SpuyMore 2013-01-04 15:57:08 EST
Sorry, disregard the above comment. It was meant for bug #879327
Comment 59 Fedora Update System 2013-01-05 01:57:20 EST
Package mdadm-3.2.6-8.fc17: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing mdadm-3.2.6-8.fc17' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2013-0275/mdadm-3.2.6-8.fc17 then log in and leave karma (feedback).
Comment 60 Thomas Clark 2013-01-05 10:42:18 EST
Any chance this fix could make it into Fedora 16?
Comment 61 dan 2013-01-05 12:01:51 EST
mdadm-3.2.6-8.fc17 installed. I can now reboot successfully.
Comment 62 Jes Sorensen 2013-01-07 10:50:34 EST
(In reply to comment #60)
> Any chance this fix could make it into Fedora 16?
Thomas, I have a build for F16 which also fixes the dangling symlink to mdmonitor-takeover.service which reappeared in 3.2.6-8. I don't have a Fedora 16 system ready for testing, so if you want to test this build and report back, that would be useful. Just be sure to install it, run dracut -f, and try (the first reboot after the dracut run will still hang). http://alt.fedoraproject.org/pub/alt/stage/18-RC1/Fedora/x86_64/ Note this is at your own risk, but I hope it works. Jes
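
A dangling symlink like the one mentioned here can be spotted with a generic check. This is a sketch, not something taken from the mdadm packaging: dangling_links is a hypothetical name, and the directory in the usage comment is only an assumption about where a unit's enablement symlinks might live.

```shell
#!/bin/sh
# Hypothetical helper: print every symlink directly inside directory $1
# whose target does not exist.
dangling_links() {
    for f in "$1"/*; do
        # -L: the entry is a symlink; ! -e: following it finds nothing
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Possible use (the path is an assumption, adjust to your system):
#   dangling_links /etc/systemd/system/multi-user.target.wants
```

An empty result means no broken links were found in that directory; it does not cover subdirectories.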
Comment 63 Sergio Pascual 2013-01-09 06:46:08 EST
We are near F18 release and I wonder if this problem will appear in F18 install media
Comment 64 Jes Sorensen 2013-01-09 06:59:25 EST
Sergio, I think we're ok for the installation itself, but post installation mdadm will need to be updated to 3.2.6-11 at least. Note there is still a problem if a user has two BIOS raid arrays, see BZ#879327 Jes
Comment 65 Thomas Clark 2013-01-09 07:57:34 EST
Thanks, Jes! I would be happy to test with Fedora 16. However, I can't find anything labeled for Fedora 16 in that link. Can you point me in the right direction? I am running Fedora 16 32-bit.
Comment 66 Sergio Pascual 2013-01-09 08:00:26 EST
(In reply to comment #64) > Sergio, > > I think we're ok for the installation itself, but post installation mdadm > will need to be updates to 3.2.6-11 at least. > If I understand correctly, this means that a host installing F18 will experience a hang when rebooting after installing the OS and bootloader? > Note there is still a problem if a user has two BIOS raid arrays, see > BZ#879327 > > Jes
Comment 67 Jes Sorensen 2013-01-24 05:24:41 EST
Sergio, Yes indeed it will - F18 installs will need the fixes from here to be able to reboot correctly: https://admin.fedoraproject.org/updates/dracut-024-23.git20130118.fc18,mdadm-3.2.6-12.fc18 Jes
Comment 68 Ian Neal 2013-02-26 19:06:08 EST
Any news on when this is going to land in F17?
Comment 69 Olivier 2013-03-18 17:43:22 EDT
Any update on these fixes in F17? It's been 'ON_QA' for a while now. Unforeseen troubles? Thanks.
Comment 70 Mark Harfouche 2013-03-18 18:35:44 EDT
It has been ok for me. F17x64.
Comment 71 Fedora Update System 2013-03-20 17:38:30 EDT
mdadm-3.2.6-8.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
Comment 72 Olivier 2013-03-21 11:36:00 EDT
oldroot now unmounts okay, but then dracut says 'waiting mraid devices to be clean' and hangs. I suspect this is because my raid array is in 'verify' mode, since I had to manually reset and power off the PC many times.
Comment 73 Olivier 2013-03-22 04:24:19 EDT
Everything is now fine when the raid array is in a normal state. Thanks.