Bug 753339

Summary: Hang during reboot - unable to umount partition
Product: Fedora
Reporter: Peter Bieringer <pb>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 16
CC: alfredo.maria.ferrari, bugzilla, chrisburel, dwalsh, gansalmon, harald, hvtaifwkbgefbaei, itamar, jan.public, Jes.Sorensen, jfeeney, jgetsoian, johannbg, jonathan, kernel-maint, lpoetter, madasafan, madhu.chinakonda, metherid, ml054, mschmidt, naveed, notting, pb, plautrba, systemd-maint, tygrys
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Clone Of: 713224
Last Closed: 2012-02-14 22:30:14 UTC

Description Peter Bieringer 2011-11-11 22:15:41 UTC
This hit me on an upgraded FC16 as well, but here at least some more details are shown.

Last lines:

Unmounted /sys/kernel/security.
Unmounted /dev/mqueue.

-> now hanging

On poweroff, this problem does not occur (and the unmount list is longer...)


+++ This bug was initially created as a clone of Bug #713224 +++

Description of problem:
My Fedora doesn't restart. The reboot sequence falls into an infinite loop when trying to umount the partition mounted at /home/ml054/mirror.

Version-Release number of selected component (if applicable):
Fedora 15. I installed Fedora 12, then updated to 13, 14 and now to 15.

How reproducible:
It's hard to say; I think it's caused by an error in systemd, or by an incorrect configuration. However, when I perform umount /home/ml054/mirror and then reboot, the restart is performed correctly; there is no hang.


Steps to Reproduce:
1. Just try to reboot my computer.
  
Actual results:
Hang

Expected results:
Normal reboot

Additional info:

What additional information should I provide?

--- Additional comment from chrisburel on 2011-06-18 02:51:35 EDT ---

I'm having similar issues that are related to my raid array.  When I let systemd mount the raid, it fails to unmount it, eventually timing out.  But when the shutdown sequence gets to "Unmounting file systems.", I then get "EXT4-fs (sda5): re-mounted, Opts: (null)", which presumably is causing the hang.

Here's what appears to be the relevant information from the shutdown:
Stopping sandbox     [ OK ]
[  835.418833] systemd[1]: mnt-data.mount unmounting timed out. Stopping
[  925.419449] systemd[1]: mnt-data.mount unmounting timed out. Killing
[ 1015.428862] systemd[1]: mnt-data.mount mount process still around after SIGKILL. Ignoring.
[ 1015.420520] systemd[1]: Unit mnt-data.mount entered failed state.
[ 1015.444267] systemd[1]: Shutting down.
Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
Unmounting file systems.
[ 1025.674507] EXT4-fs (sda5): re-mounted, Opts: (null)
Disabling swaps.
Detaching loop devices.
Detaching DM devices.
<computer hangs>

If I run:
> systemctl stop mnt-data.mount
> mount /dev/md126p1 /mnt/data
I get a normal shutdown, even though the drive is mounted.  It appears that systemd's mnt-data.mount unit is causing the hang, whereas the normal "Unmounting file systems" process (is it its own unit?) successfully unmounts my raid.  I'm new to systemd, though, so I don't know what code is running to even diagnose the problem.  "find /lib/systemd -name mnt-data.mount" doesn't return anything, presumably because the unit is created dynamically based on the contents of /etc/fstab.

Reproducible: Always
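Since these mount units are generated from /etc/fstab at runtime rather than shipped as unit files, they can still be inspected through systemctl. A minimal sketch, assuming a systemd of the F15 era and the mnt-data.mount unit name from above:

```
# Show the runtime state and properties of the generated unit;
# this works even though no mnt-data.mount file exists under /lib/systemd.
systemctl status mnt-data.mount
systemctl show mnt-data.mount
```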

--- Additional comment from michael.class on 2011-06-19 07:33:14 EDT ---

Hello,

I can report the same behaviour (the system hangs during reboot; I can work around it by pressing CTRL-ALT-DEL while it hangs).

It is a fully patched FC15 system as of June 18th. The hang is caused by NFS mounts in /etc/fstab

With the following fragment in /etc/fstab the issue occurs:

vdr:/medien/video   /video/vdr  nfs	bg,soft,intr	0	0	
vdr:/medien/audio   /audio	nfs	bg,soft,intr	0	0
vdr:/medien/pics    /pics	nfs	bg,soft,intr	0	0	
# for nfsv4
/video    /srv/nfsv4/video        none    bind    0       0
# end nfsv4

Without the NFS mounts everything is fine.

Cheers,
Michael

--- Additional comment from michael.class on 2011-06-19 07:46:14 EDT ---

Hello,

actually checked something more:

It is just the following entry in /etc/fstab that causes the hang on reboot:

/video    /srv/nfsv4/video        none    bind    0       0


I see this behaviour 100% reproducible on two different machines.

Cheers,
Michael

--- Additional comment from mschmidt on 2011-06-29 19:13:53 EDT ---

Everyone,
please attach your complete /etc/fstab to this bug.
Do you have lvm2-monitor.service active?

--- Additional comment from michael.class on 2011-06-30 04:08:24 EDT ---

As requested:

#
# /etc/fstab
# Created by anaconda on Thu Nov 27 06:19:51 2008
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or vol_id(8) for more info
#
UUID=a5424e10-fdcf-403c-bbcb-9cd12362dee4 /     ext4    defaults        1 1
UUID=fec821ba-e502-4875-8f69-56294209bceb /boot ext3    defaults        1 2
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
sysfs                   /sys                    sysfs   defaults        0 0
UUID=8659bd70-5346-4d2d-be84-123e7ee55959 swap  swap    defaults        0 0

# for nfsv4
/home    /srv/nfsv4/home	none	rw,bind	0	0	
/medien  /srv/nfsv4/medien	none	rw,bind	0	0
# end nfsv4
[michaelc@vdr ~]$ chkconfig --list | fgrep lvm

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

lvm2-monitor   	0:Aus	1:Ein	2:Ein	3:Ein	4:Ein	5:Ein	6:Aus


Cheers,
Michael

--- Additional comment from mschmidt on 2011-06-30 07:56:22 EDT ---

Michael,
in comment #2 you had some NFS mounts from "vdr:/...". Did you remove them?
/home and /medien are simply directories on the root filesystem? Not separate mounts?

> [michaelc@vdr ~]$ chkconfig --list | fgrep lvm

I'd rather see: systemctl status lvm2-monitor.service
Thanks!

--- Additional comment from ml054 on 2011-06-30 08:20:31 EDT ---

I have the following configuration:


UUID=e6770ed7-bb20-4bb7-89b9-c2b40f04ddf8       /       ext3    defaults        1       1
/dev/md125p6    swap    swap    defaults        0       0
tmpfs   /dev/shm        tmpfs   defaults        0       0
devpts  /dev/pts        devpts  gid=5,mode=620  0       0
#devpts options modified by setup update to fix #515521 ugly way
sysfs   /sys    sysfs   defaults        0       0
proc    /proc   proc    defaults        0       0
#/dev/md126p1   /home/ml054/lustro      ext3    defaults        1       1
UUID=01a7fef3-8b87-4fb9-9ec8-211445d5b0b2       /home/ml054/lustro      ext3    defaults        1       1


and 


[ml054@raptor ~]$ systemctl status lvm2-monitor.service
lvm2-monitor.service - LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
          Loaded: loaded (/etc/rc.d/init.d/lvm2-monitor)
          Active: active (exited) since Thu, 30 Jun 2011 13:02:37 +0200; 1h 16min ago
         Process: 981 ExecStart=/etc/rc.d/init.d/lvm2-monitor start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/lvm2-monitor.service


When I perform umount /home/ml054/lustro before restarting, my computer restarts correctly. Otherwise it doesn't.

--- Additional comment from michael.class on 2011-06-30 08:31:54 EDT ---

Hello,


[michaelc@vdr ~]$ sudo systemctl status lvm2-monitor.service
lvm2-monitor.service - LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
	  Loaded: loaded (/etc/rc.d/init.d/lvm2-monitor)
	  Active: active (exited) since Thu, 23 Jun 2011 09:00:47 +0200; 1 weeks and 0 days ago
	  CGroup: name=systemd:/system/lvm2-monitor.service


About your comment on the NFS mounts: sorry, I grabbed the fstab from the other machine, which experiences the same behaviour. The culprit is the "bind" mounts.
The original machine is not online (and I am currently traveling ...)

Cheers,
Michael

--- Additional comment from mschmidt on 2011-06-30 08:38:24 EDT ---

(In reply to comment #7)
> UUID=01a7fef3-8b87-4fb9-9ec8-211445d5b0b2       /home/ml054/lustro      ext3   

Marcin,
could you describe your disk layout? I can see you have some md RAID arrays. Is the filesystem for /home/ml054/lustro also located on an md device? Are any of the md arrays monitored by mdmon?

--- Additional comment from ml054 on 2011-06-30 09:06:57 EDT ---

[root@raptor ml054]# cat /proc/mdstat 
Personalities : [raid1] [raid0] 
md125 : active raid0 sda[1] sdb[0]
      629145600 blocks super external:/md127/0 128k chunks
      
md126 : active raid1 sda[1] sdb[0]
      173807616 blocks super external:/md127/1 [2/2] [UU]
      
md127 : inactive sdb[1](S) sda[0](S)
      4514 blocks super external:imsm
       
unused devices: <none>



[root@raptor ml054]# fdisk -l
Warning: invalid flag 0x0000 of partition table 5 will be corrected by w(rite)

Disk /dev/sda: 500.1 GB, 500106780160 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976771055 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x29711a93

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   409602047   204800000    7  HPFS/NTFS/exFAT
/dev/sda2   *   409602048   886032944   238215448+  83  Linux
/dev/sda4       886032945  1258291124   186129090    f  W95 Ext'd (LBA)

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/md126: 178.0 GB, 177978998784 bytes
255 heads, 63 sectors/track, 21638 cylinders, total 347615232 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00006d0d

      Device Boot      Start         End      Blocks   Id  System
/dev/md126p1              63   347614469   173807203+  83  Linux

Disk /dev/md125: 644.2 GB, 644245094400 bytes
255 heads, 63 sectors/track, 78325 cylinders, total 1258291200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 131072 bytes / 262144 bytes
Disk identifier: 0x29711a93

      Device Boot      Start         End      Blocks   Id  System
/dev/md125p1            2048   409602047   204800000    7  HPFS/NTFS/exFAT
/dev/md125p2   *   409602048   886032944   238215448+  83  Linux
/dev/md125p4       886032945  1258291124   186129090    f  W95 Ext'd (LBA)
Partition 4 does not start on physical sector boundary.
/dev/md125p5       919608795  1258275059   169333132+   7  HPFS/NTFS/exFAT
Partition 5 does not start on physical sector boundary.
/dev/md125p6       886049073   919608794    16779861   82  Linux swap / Solaris
Partition 6 does not start on physical sector boundary.

Partition table entries are not in disk order

--- Additional comment from chrisburel on 2011-07-01 00:08:01 EDT ---

I have one hard drive with a bunch of different partitions on it that I boot from, and a 2-drive raid mirror.

I also see an error from lvm2-monitor.service during shutdown:
Not stopping monitoring, this is a dangerous operation.  Please use force-stop to override.
systemd[1]: lvm2-monitor.service: control process exited, code=exited status=1
systemd[1]: Unit lvm2-monitor.service entered failed state.

Here's the info on fstab, disk layout, and the lvm2-monitor.service:

> systemctl status lvm2-monitor.service
lvm2-monitor.service - LSB: Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
          Loaded: loaded (/etc/rc.d/init.d/lvm2-monitor)
          Active: active (exited) since Thu, 30 Jun 2011 20:51:46 -0700; 4min 13s ago
         Process: 828 ExecStart=/etc/rc.d/init.d/lvm2-monitor start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/lvm2-monitor.service

-------------

> cat /etc/fstab
UUID=d523dccd-94b1-4a84-bbb9-36edca9c712f /                       ext4    defaults        1 1
UUID=219e07b4-6406-48d3-b21a-380631a48c60 /boot                   ext3    defaults        1 2
UUID=2e6810f7-8cce-4392-8e46-4d59e1430302 /mnt/lfs                ext3    defaults        1 2
/dev/md126p1                              /mnt/data               ext3    defaults        1 2
UUID=89e8c5be-0c12-45d8-b094-aa68d04c7a94 swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0

---------------
fdisk -l gave the following warning:
WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

so here's the output of parted -l:
> parted -l
Model: ATA WDC WD1200JB-00G (scsi)
Disk /dev/sda: 120GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
 1      32.3kB  39.9GB  39.9GB  primary   ntfs            boot
 2      39.9GB  79.9GB  39.9GB  primary   hfs+
 3      79.9GB  80.0GB  107MB   primary   ext3
 4      80.0GB  120GB   40.0GB  extended                  lba
 5      80.0GB  104GB   24.1GB  logical   ext4
 6      104GB   118GB   13.8GB  logical   ext3
 7      118GB   120GB   2147MB  logical   linux-swap(v1)


Model: ATA ST3320620AS (scsi)
Disk /dev/sdb: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size   Type     File system  Flags
 1      32.3kB  320GB  320GB  primary  ext3         boot


Model: ATA ST3320620AS (scsi)
Disk /dev/sdc: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size   Type     File system  Flags
 1      32.3kB  320GB  320GB  primary  ext3


Error: /dev/md127: unrecognised disk label                                
Warning: Error fsyncing/closing /dev/md127: Input/output error            
Retry/Ignore? i                                                           

Model: Linux Software RAID Array (md)
Disk /dev/md126: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End    Size   Type     File system  Flags
 1      32.3kB  320GB  320GB  primary  ext3         boot

--- Additional comment from mschmidt on 2011-07-01 07:15:53 EDT ---

(In reply to comment #10)
> md127 : inactive sdb[1](S) sda[0](S)
>       4514 blocks super external:imsm

I see. It's an array with external metadata (Intel Matrix Storage). This depends on the mdmon daemon. systemd may be doing something wrong in this case. I'll see if I can test it myself.

(In reply to comment #11)
> I also see an error from lvm2-monitor.service during shutdown:
> Not stopping monitoring, this is a dangerous operation.  Please use force-stop
> to override.

Chris, you're seeing bug 681582.

--- Additional comment from pb on 2011-07-03 15:22:53 EDT ---

I'm hit by the same problem, also using software RAID and having "mdmon md0" in the process table. It worked fine with F14, but upgrading to F15 made the system no longer usable as a desktop system; I always have to use the SysRq keys to get the box off :(

--- Additional comment from pb on 2011-07-03 15:24:35 EDT ---

By "software RAID" I also mean the cheap Intel Matrix Storage.

--- Additional comment from madasafan on 2011-07-05 09:05:22 EDT ---

Have been having this problem too since upgrading from F14.
The system often does not halt cleanly; it hangs at unmounting filesystems and needs an Alt-SysRq-K etc. to shut down fully. The array was rarely shut down cleanly and was constantly rebuilding on the next boot.
A fresh F15 install onto a test machine with isw raid also showed the same problems.

The current workaround for me is switching to dmraid and turning off mdraid. This required creating a new initramfs

dracut -v -f -o mdraid -a dmraid initramfs-dmraid.img

and some changes to grub.conf for the new initramfs, along with dracut option changes

rd_DM_UUID=isw_ccchgfgdia_vol0 rd_NO_MDIMSM


I preferred having mdraid control things; I'm happy to run tests to help get it back to working as it did in F14.

--- Additional comment from pb on 2011-07-08 00:47:51 EDT ---

(In reply to comment #15)
 
> and some change to grub.conf for the new initramfs and dracut option changes
> 
> rd_DM_UUID=isw_ccchgfgdia_vol0 rd_NO_MDIMSM

Is "isw_ccchgfgdia_vol0" a special token? Or was this only a change from

rd_MD_UUID=isw_ccchgfgdia_vol0

to

rd_DM_UUID=isw_ccchgfgdia_vol0


If so, then your hint caused, at least on my system, a damaged / filesystem. Before the filesystem crashed I saw that /dev/sdb1 was mounted for / instead of a raid device...

--- Additional comment from madasafan on 2011-07-08 04:03:27 EDT ---

isw_ccchgfgdia_vol0 is the name of the array in my system.
To find yours you need to use
# dmraid -s 

It's not a great solution. If you ever rebuild the array from within the BIOS, the name will get changed and boot will fail to find it. I imagine many people using isw arrays are dual-booting with Windows; to keep the name from being changed, resyncing in Windows is the safest option.

grub.conf should not have rd_NO_DM in it and no rd_MD_UUID=xxxx entries for any isw arrays.
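A sketch of what the resulting grub.conf kernel stanza might look like under these instructions (kernel version, root UUID and array name are placeholders; use the name reported by "dmraid -s"):

```
# grub.conf fragment (example values only):
# note rd_DM_UUID instead of rd_MD_UUID, plus rd_NO_MDIMSM,
# and no rd_NO_DM anywhere on the line.
kernel /vmlinuz-2.6.38.8-35.fc15.x86_64 ro root=UUID=<root-uuid> rd_DM_UUID=isw_ccchgfgdia_vol0 rd_NO_MDIMSM
initrd /initramfs-dmraid.img
```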

(Mandriva bugzilla https://qa.mandriva.com/show_bug.cgi?id=61857 is the same problem too.)

--- Additional comment from tygrys on 2011-07-09 10:01:18 EDT ---

I have the same problem:

[root@kitana ~]# cat /proc/mdstat
Personalities : [raid1]
md127 : active raid1 sda[1] sdb[0]
      58612736 blocks super external:/md0/0 [2/2] [UU]

md0 : inactive sdb[1](S) sda[0](S)
      4514 blocks super external:imsm

[root@kitana sysconfig]# fdisk -l

Disk /dev/sda: 60.0 GB, 60022480896 bytes
255 heads, 63 sectors/track, 7297 cylinders, total 117231408 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0005c7da

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048   117225471    58099712   83  Linux

Disk /dev/sdb: 60.0 GB, 60022480896 bytes
255 heads, 63 sectors/track, 7297 cylinders, total 117231408 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0005c7da

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *        2048     1026047      512000   83  Linux
/dev/sdb2         1026048   117225471    58099712   83  Linux

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x54019fd6

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048     1026047      512000   83  Linux
/dev/sdc2         1026048   103424295    51199124   83  Linux

Disk /dev/md127: 60.0 GB, 60019441664 bytes
2 heads, 4 sectors/track, 14653184 cylinders, total 117225472 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0005c7da

      Device Boot      Start         End      Blocks   Id  System
/dev/md127p1   *        2048     1026047      512000   83  Linux
/dev/md127p2         1026048   117225471    58099712   83  Linux

[root@kitana ~]# cat /etc/fstab
UUID=e71ce8a5-baac-4289-acc6-f8076d40e34f /                       ext4    noatime,nodiratime,discard,errors=remount-ro        1 1
UUID=079475f8-33eb-47a8-873e-6aef75289779 /boot                   ext4    noatime,nodiratime,discard,errors=remount-ro        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0


Clean FC15 install...

--- Additional comment from pb on 2011-09-11 12:22:48 EDT ---

It is very strange that "poweroff" works fine, but "reboot" hangs on "Unmounting file system." - what's different in the scripts here?

--- Additional comment from fedora-admin-xmlrpc on 2011-10-20 12:28:06 EDT ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from bugzilla on 2011-11-03 19:20:48 EDT ---

Hej folks,

similar problems here (I have an Intel 82801 SATA RAID controller), but I also get them on poweroff.
I boot my system from an MD RAID, and on about 70% of all shutdowns or reboots it hangs while displaying: Unmounting file systems.
If this succeeds (the other 30%), all the detaching of swap, loop and DM devices works fine and it turns off.

If I can provide some more infos, then let me know.

Thx x lot! :-)

--- Additional comment from jgetsoian on 2011-11-06 02:23:34 EST ---

Another victim here. I've just installed a 'fake (Intel) RAID' on an ASUS P7P55 system. Only data is on the RAID, so I can manually umount the RAID partitions before shutdown and then get a clean exit, but that is the only way. I tried to put the umounts into a shutdown script, but no luck; I ran into timing issues, I guess.

j getsoian
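One pattern for getting such umounts to run at the right point in shutdown is a systemd service whose ExecStop does the unmounting: since systemd stops units in reverse dependency order, ordering the service After= the generated mount unit makes its stop action run before systemd tears the mount down. This is a sketch only, untested on F15/F16; the unit name and mount point are examples:

```
# /etc/systemd/system/umount-raid.service  (hypothetical name)
[Unit]
Description=Unmount RAID data partition early in shutdown (example)
After=mnt-data.mount

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/umount /mnt/data

[Install]
WantedBy=multi-user.target
```

Enabled with "systemctl enable umount-raid.service"; whether this avoids the timeout depends on why the umount hangs in the first place.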

Comment 1 Gerhard 2011-11-17 06:34:25 UTC
Meanwhile I also use FC16 on my workstation and noticed the problem during shutdown at two different lines:
*) Unmounted /dev/mqueue.
*) Unmounted /dev/hugepages.

Comment 2 Jes Sorensen 2011-12-30 14:40:11 UTC
I did a fresh install of F16 (multiple to test), and I see
this now - hangs on every shutdown.

Previously it didn't happen, but that was an install upgraded from F15.

If I leave it long enough I get a timeout on the console which references
systemd holding a lock.

The same issue shows itself in rawhide - this is starting to look like a
showstopper.....

Jes

Comment 3 Michal Schmidt 2012-01-02 13:31:59 UTC
(In reply to comment #2)
> If I leave it long enough I get a timeout on the console which references
> systemd holding a lock.

Could you be more specific? Is it a lockdep message, a softlockup, or what?
Can this bug be explained by this?:
https://bugzilla.redhat.com/show_bug.cgi?id=752593#c7

Comment 4 Jes Sorensen 2012-01-03 09:03:11 UTC
Sorry I didn't get a log of it, and it seems to have to sit there for a very
long time before it shows up. It was a console message with a backtrace from
the kernel. I do not remember if it was lockdep or softlockup though.

It is most likely due to what you explain in 752593. I added the comment
because I see it too on fresh installs (I didn't with my older upgraded
ones), so it is going to become an urgent issue to get the case you mention
in 752593 fixed.

Comment 5 Alfredo Ferrari 2012-01-06 12:49:57 UTC
I have the same problem on two different brand new laptops (a Dell E6220 and a Dell E6520), both running a fully updated fresh install of Fedora 16 (x86_64). None of them has a raid system (even though the BIOS could support them) or mounts nfs partitions. They both run a very plain disk layout (dual boot W7/F16). If I hit "shutdown" on the graphical menu, or I issue "shutdown -h now" the laptops come down neatly. If I hit "restart" or I issue "shutdown -r now" they both start the shutdown process (the window manager disappears and the graphical boot logo appears) but then they hang forever with no relevant message in the system logs and only the power button can succeed in powering them off.

Fully reproducible (I was never able to get reboot working even once). 

In contrast, a Dell E6500 (two years older), with F16 (x86) upgraded from F15, reboots successfully, but half of the times I hit "shutdown" it reboots instead... Finally, a Dell E4310 (1 year old), with F16 (x86) upgraded from F15, reboots and shuts down correctly. Apart from their model/age, x86 vs x86_64, and fresh install vs upgrade, the four laptops have nearly identical disk layouts and F16 installations.

I truly appreciate the fun that systemd, or more generally F15 and F16, brought to us (this is just one of many examples; I just spent yesterday getting vncserver working again "a la systemd"). As an old Linux user (I started with RedHat 4.2, and I use Linux almost exclusively for work on several machines and clusters, some managed by me), it is the first time I feel Windows is more robust... I never thought "reboot" could become such a complex issue.

Comment 6 Michal Schmidt 2012-01-06 13:01:56 UTC
(In reply to comment #5)
> I never thought "reboot" could become such a complex issue.

There are other possible reasons besides the systemd + storage-daemons-in-userspace why machines may have trouble rebooting:  http://mjg59.livejournal.com/137313.html

Comment 7 Michal Schmidt 2012-02-14 12:08:09 UTC
The Dells will probably be able to reboot if you pass "reboot=pci" on the kernel command line. Give it a try.
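On an F16 GRUB 2 setup the parameter can be made persistent roughly like this (a sketch; file paths are those of a default install and are assumptions here):

```
# Add reboot=pci to GRUB_CMDLINE_LINUX in /etc/default/grub,
# e.g.  GRUB_CMDLINE_LINUX="... reboot=pci"
# then regenerate the GRUB configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg
```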

Peter, are you still seeing the problem? What hardware do you have?

Comment 8 Daniel Walsh 2012-02-14 13:50:09 UTC
Could you turn off the sandbox init script and see if this fixes the problem.

Comment 9 Alfredo Ferrari 2012-02-14 15:11:39 UTC
About the suggestion to boot with reboot=pci for the DELL laptops.

Thanks for the suggestion.
It works for the E6220 (yet to try on the other one); with that kernel
parameter it reboots correctly. Assuming the other laptop will be fixed as well, how could a "normal" (and decently experienced) user guess this? And for which hardware/software configurations is this switch required (I have other Dell laptops/desktops which reboot without it)? Also, wouldn't it be better if the installer took care of checking and possibly providing the correct boot line?

Comment 10 Michal Schmidt 2012-02-14 15:31:28 UTC
(In reply to comment #9)
> how could a "normal" (and decently experienced) user guess it?

I guessed it by querying "E6220 linux reboot" in Google.

> And for which hardware/software configurations is this switch required
> (I have other DELL laptops/desktops which reboot without it)?

For the broken ones. I don't know the list.

> Also wouldn't it be better if the installer takes care of checking and
> possibly providing the correct boot line?

No. It would be better if this were fixed in the kernel.

Apparently Ubuntu have added quirks for some Dells to their "sauce" patchset. They tried to push them upstream, but they failed:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/833705/comments/13

Comment 11 Michal Schmidt 2012-02-14 16:18:57 UTC
Matthew,
would you know more about the problem that Dell E6220 and E6520 require 'reboot=pci'? Why weren't the Ubuntu quirks accepted upstream? Is there going to be a better solution?

Comment 12 Matthew Garrett 2012-02-14 17:02:24 UTC
It's a Dell firmware bug. They work fine if you disable VT-d first. If Dell aren't going to fix their firmware then we need to tear down VT-d state before reboot, not keep adding to a list of quirks.

Comment 13 Jóhann B. Guðmundsson 2012-02-14 17:37:56 UTC
Just out of curiosity, has someone tried to contact Dell and inform them about this issue?

If so, what was their response (if any)? Do they accept this as a valid bug, or do they (or vendors in general) dismiss firmware bugs since this is not running Windows?

Comment 14 Peter Bieringer 2012-02-14 20:51:42 UTC
(In reply to comment #7)
> The Dells will probably be able to reboot if you pass "reboot=pci" on the
> kernel command line. Give it a try.
> 
> Peter, are you still seeing the problem? What hardware do you have?

Yes, I still have the problem. The mainboard is an Intel DX58SO with BIOS SOX5810J.86A.5559.2011.0405.2144. VT-d is disabled at the moment; I also tried "reboot=pci", but it doesn't help.

at the moment hanging on

Unmounting /sys/kernel/debug

For me it looks like it depends on the Intel Software RAID.

Comment 15 Jóhann B. Guðmundsson 2012-02-14 21:13:05 UTC
Harald recently landed in dracut what I believe is the final missing piece of the puzzle for a properly working solution.

http://git.kernel.org/?p=boot/dracut/dracut.git;a=commit;h=e539fa99801d0b0b316017702ee013a50d2c19d3

Not sure if the solution will make it into the F17 Alpha spin, though, so that someone using Intel BIOS RAID can start testing it...

Comment 16 Michal Schmidt 2012-02-14 22:30:14 UTC
(In reply to comment #14)
> for me it looks like that it's depending on the Intel Software RAID.

Then it's indeed bug 752593 or something closely related. I'm closing this as a duplicate. When that bug gets fixed, if you're still seeing the problem, please file a new one.

*** This bug has been marked as a duplicate of bug 752593 ***