918901 – fedora-18 panic on reboot/poweroff

Bug 918901 - fedora-18 panic on reboot/poweroff

Summary: fedora-18 panic on reboot/poweroff

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	dracut
Sub Component:
Version:	18
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	dracut-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-03-07 08:04 UTC by Bill Perkins
Modified:	2013-04-19 13:09 UTC (History)
CC List:	13 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-04-19 13:09:50 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
This is a photograph of the final screen. (3.33 MB, image/jpeg) 2013-03-07 08:04 UTC, Bill Perkins	no flags	Details
Test containg uname, lsmod, lspci, dmesg and rpm -qa (208.21 KB, application/octet-stream) 2013-03-07 16:29 UTC, Bill Perkins	no flags	Details
Last dmesg file created 9 April 2013 during an unseccessful reboot (122.35 KB, text/plain) 2013-04-09 19:16 UTC, Bill Perkins	no flags	Details
Last screen of a failed reboot 9 April 2013 (3.03 MB, image/jpeg) 2013-04-09 19:20 UTC, Bill Perkins	no flags	Details
First dracut reboot test screen (2.69 MB, image/jpeg) 2013-04-10 20:07 UTC, Bill Perkins	no flags	Details
First dracut reboot dmesg file (83.08 KB, text/plain) 2013-04-10 20:10 UTC, Bill Perkins	no flags	Details
Second dracut reboot test screen (2.67 MB, image/jpeg) 2013-04-10 20:11 UTC, Bill Perkins	no flags	Details
Second dracut reboot dmesg file (85.69 KB, text/plain) 2013-04-10 20:13 UTC, Bill Perkins	no flags	Details
Third dracut reboot dmesg file (82.34 KB, text/plain) 2013-04-10 20:14 UTC, Bill Perkins	no flags	Details
Fourth dracut reboot test screen (2.59 MB, image/jpeg) 2013-04-10 20:15 UTC, Bill Perkins	no flags	Details
Fourth dracut reboot dmesg file (81.59 KB, text/plain) 2013-04-10 20:16 UTC, Bill Perkins	no flags	Details
/etc/fstab file from the test system Pooh. (2.71 KB, text/plain) 2013-04-10 20:42 UTC, Bill Perkins	no flags	Details
Dmesg file for panic testing of file systems (74.61 KB, text/plain) 2013-04-11 00:19 UTC, Bill Perkins	no flags	Details
Screenshot of shutdown panic for Comment 32 (2.80 MB, image/jpeg) 2013-04-16 20:42 UTC, Bill Perkins	no flags	Details
Dmesg debug text for Comment 32 (252.42 KB, text/plain) 2013-04-16 20:44 UTC, Bill Perkins	no flags	Details
Screenshot of shutdown panic after selinux disabled (3.20 MB, image/jpeg) 2013-04-17 16:15 UTC, Bill Perkins	no flags	Details
shutdown-log.txt file after selinux disabled (251.49 KB, text/plain) 2013-04-17 16:17 UTC, Bill Perkins	no flags	Details
Screenshot of shutdown panic for Comment 46 (3.00 MB, image/jpeg) 2013-04-17 18:26 UTC, Bill Perkins	no flags	Details
shutdown-log.txt for Comment 46 (251.54 KB, text/plain) 2013-04-17 18:28 UTC, Bill Perkins	no flags	Details
ls -lah /dev/disk/by-id for Comment 52. Included for completeness. (6.41 KB, text/plain) 2013-04-17 22:42 UTC, Bill Perkins	no flags	Details
shutdown-log.txt written after UUID mods to /etc/fstab & reboot for Comment 52 (251.94 KB, text/plain) 2013-04-17 22:44 UTC, Bill Perkins	no flags	Details
systemd-dump.txt (593.02 KB, text/plain) 2013-04-18 16:19 UTC, Bill Perkins	no flags	Details
View All

Description Bill Perkins 2013-03-07 08:04:37 UTC

Created attachment 706413 [details]
This is a photograph of the final screen.

Description of problem: System freezes during poweroff or reboot.


Version-Release number of selected component (if applicable): 197-1


How reproducible: Boot system; login to X using fvwm as the window manager; run a few applications; terminate those applications; type reboot or poweroff in a xterm or use ctrl-alt-delete; system will to shutdown and freeze with a panic displayed.


Steps to Reproduce:
1.
2.
3.
  
Actual results:
System halts/freezes before shutdown is completed; caps lock and scroll lock indicators will rapidly flash; keyboard will be unresponsive and the following is displayed every time as the final message:

Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100

Pid: 1, comm: shutdown Not tainted 3.8.1-201.fc18.x86_64 #1
Call Trace:
 [<ffffffff81645b9e>] panic+0xc1/0x1d0
 [<ffffffff810647a8>] do_exit_0x918/0x9e0
 [<ffffffff810648ff>] do_group_exit+0x3f/0xa0
 [<ffffffff81064977>] sys_exit_group_0x17/0x20
 [<ffffffff81657c19>] system_call_fastpath+0x16/0x1b
drm_kms_helper: panic occurred, switching back to text console


Expected results:
System reboots or powers off as expected.

Additional info: The system will reboot/poweroff everytime if I type in "sync && reboot" or "sync && poweroff".  The system freezes too early to get the dmesg output.  I have tried the vt and serial port debug tricks and have received no joy.  On an alternate system with a different hardware configuration the same problem does occur when running the same version of FVWM as the window manager.  The problem on that system does not appear to occur when running Mate or XFCE as the desktops in place of FVWM.  I have a third system running Fedora-17 that does not this and and other problems at the present time and is running a desktop configuration with FVWM that is identical to this system running Fedora-18.  The Radeon and nVidia video cards I am using do not have the horse power to run the Gnome 3 desktop.  So, I have not tested this situation with Gnome 3.

Comment 1 Jóhann B. Guðmundsson 2013-03-07 10:23:12 UTC

Is this fresh install or upgrade? 

What HW is this?

( could be a race of some sort see similar bug 831634 )

Comment 2 Lennart Poettering 2013-03-07 14:46:04 UTC

Is this reproducible if you disable selinux?

Comment 3 Bill Perkins 2013-03-07 16:29:25 UTC

Created attachment 706714 [details]
Test containg uname, lsmod, lspci, dmesg and rpm -qa

> What HW is this?

This is a four core x86_64 with eight gig of memory.  Additional information is in the text attachment.

Comment 4 Bill Perkins 2013-03-07 18:14:14 UTC

> Is this reproducible if you disable selinux?

The problem has been tested by including enforcing=0 in the kernel command boot line to disable selinux.  The conditions of the problem does not change if you disable selinux.

> Is this fresh install or upgrade? 

This system was upgraded from Fedora 17 to Fedora 18 using the Yum Update Process.

> ( could be a race of some sort see similar bug 831634 )

I did look at this bug and do not think that it is similar.  The software RAID configuration I am using has been quite stable and has been in use for greater than five years.  Also, if I run X, but do not log into X, and instead run the system from the command line before the reboot/poweroff, regardless of how long the system remains up, the system will reboot/poweroff normally with no apparent problem during the shutdown.  

I have tried the "dracut -f" to rebuild initramfs image; rebooted several time with no change with the problem.

As to whether this might be a race condition or not, that might be a possibility.  I use FVWM as the window manager during the X session, XDM to login into X and the Mate (Gnome 2 before that) panel application during the session.  If I log into X and run a few applications (it does not seem to matter which applications are run), and than reboot, the system MAY reboot without any apparent problems.  If I repeat the process and run a few more applications (again, it matters not what applications) and, maybe wait awhile before the reboot, the system will generally always panic when you do finally reboot/poweroff.

I have build a new, but with somewhat different hardware (it has fewer processors and less memory), system using fedora 18.  This system has exhibited most of the same symptoms as the system being used in this report.  It runs a simpler desktop using FVWM and uses Lightdm, instead of XDM, to login into X.  (The .Xclients script has to be run manually.  I have not taken the time yet to figure out how to fix that issue.)  This configuration can run any of the installed desktops: FVWM, Mate and XFCE.  Gnome is not installed or being used.

What I have found with this second system (the TEST system) is that the reboot/poweroff does not seem to ever fail if I run the Mate or XFCE desktops.  Only, sometimes, if I only run FVWM as the window manager does the reboot/poweroff problem occur.  I am still testing on this system and have just now performed a reboot.  Results: a panic screen with the same information as given above in the "Actual Results" and this is the TEST system, not the system originally used for this report.

Comment 5 Lennart Poettering 2013-04-09 11:16:20 UTC

Hmm, could you drop in a script in /usr/lib/systemd/system-shutdown/ and make it executable, with the following contents:

#!/bin/bash
echo "Hello World"

and then shutdown and see if you see "Hello World" on screen? If you do this looks like a dracut problem.

Comment 6 Bill Perkins 2013-04-09 19:13:45 UTC

Lennart Poettering <lpoetter>
> Hmm, could you drop in a script in /usr/lib/systemd/system-shutdown/ and make
it executable, with the following contents:
>
> #!/bin/bash
> echo "Hello World"
>
> and then shutdown and see if you see "Hello World" on screen? If you do this
looks like a dracut problem.

Okay, I have done as you asked.  This is what I have in /usr/lib/systemd/system-shutdown/: 

# ls -l /usr/lib/systemd/system-shutdown/
total 8
-rwxr-xr-x. 1 root root 80 Mar  6 23:58 debug.sh
-rwxr-xr-x. 1 root root 31 Apr  9 13:48 hello-world.sh

I then rebooted the work station under test, and the system hung as before and as it has done so since I installed Fedora-18.  Now it might be getting interesting...

The hello-world.sh script produced no output that I could see on the screen.  However, the debug.sh script did produce a dmesg log file in root:

# cat /usr/lib/systemd/system-shutdown/debug.sh 
#!/bin/sh
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /

I had created the debug.sh script sometime soon after I originally created this report, but did not see any /shutdown-log.txt being created the several times I tried to reboot the system after doing so.  Well, lo and behold, I went and check to see if this file was now being created, and it was!  I checked my backups for this system to see if the file had previously been created during shutdown, and it had:

# ls -l /backup/pooh/?/Root/shutdown-log.txt
-rw-r--r--. 1 root root  87488 Apr  9 01:29 /backup/pooh/0/Root/shutdown-log.txt
-rw-r--r--. 1 root root 131855 Apr  8 03:31 /backup/pooh/1/Root/shutdown-log.txt
-rw-r--r--. 1 root root 112878 Apr  7 00:35 /backup/pooh/2/Root/shutdown-log.txt
-rw-r--r--. 1 root root 124768 Apr  6 03:01 /backup/pooh/3/Root/shutdown-log.txt
-rw-r--r--. 1 root root 100181 Apr  5 03:52 /backup/pooh/4/Root/shutdown-log.txt
-rw-r--r--. 1 root root  84481 Apr  4 02:36 /backup/pooh/5/Root/shutdown-log.txt
-rw-r--r--. 1 root root  89152 Apr  3 02:22 /backup/pooh/6/Root/shutdown-log.txt

This system is up to date with all installed Fedora 18 packages, so I do not know why I did not see the /shutdown-log.txt being created during earlier shutdowns.  I have attached the latest picture of the last screen during shutdown and the last /shutdown-log.txt file as well.  I hope this helps.

Bill

Comment 7 Bill Perkins 2013-04-09 19:16:14 UTC

Created attachment 733335 [details]
Last dmesg file created 9 April 2013 during an unseccessful reboot

Comment 8 Bill Perkins 2013-04-09 19:20:11 UTC

Created attachment 733336 [details]
Last screen of a failed reboot 9 April 2013

Comment 9 Lennart Poettering 2013-04-10 12:56:46 UTC

Hmm, indeed to those drop-ins are actually not executed with a console initialized, so you didn't get any output. If the dmesg file did get generated however this is an indication that the hang happens later than the execution of the drop-in scripts, which points to dracut hence. Reassigning.

Comment 10 Harald Hoyer 2013-04-10 15:18:47 UTC

Does the file "/run/initramfs/.need_shutdown" exist?

If no, then it's a systemd bug.

If yes:

Please do:

# mkdir -p /run/initramfs/etc/cmdline.d
# echo "rd.debug rd.shell rd.break=pre-shutdown" > /run/initramfs/etc/cmdline.d/debug.conf

and see, if you dropped into a dracut shell

you can exit, and see it progressing

Might want to take a screenshot, when it hangs

Comment 11 Bill Perkins 2013-04-10 20:04:19 UTC

> Does the file "/run/initramfs/.need_shutdown" exist?

> If no, then it's a systemd bug.

> If yes:

> Please do:

> # mkdir -p /run/initramfs/etc/cmdline.d
> # echo "rd.debug rd.shell rd.break=pre-shutdown" > /run/initramfs/etc/cmdline.d/debug.conf

> and see, if you dropped into a dracut shell

> you can exit, and see it progressing

> Might want to take a screenshot, when it hangs

the "/run/initramfs/.need_shutdown" file does indeed exist.  So, I ran the follow list of commands four times with a reboot each time as instructed and received some variable results:

[root@pooh:pts/8] Wed Apr 10 14:59:59 EDT (13100) ~
# ls -l /run/initramfs/.need_shutdown
-rw-r--r--. 1 root root 0 Apr 10 14:56 /run/initramfs/.need_shutdown
[root@pooh:pts/8] Wed Apr 10 15:00:05 EDT (13100) ~
# mkdir -vp /run/initramfs/etc/cmdline.d
mkdir: created directory '/run/initramfs/etc'
mkdir: created directory '/run/initramfs/etc/cmdline.d'
[root@pooh:pts/8] Wed Apr 10 15:00:50 EDT (13100) ~
# echo "rd.debug rd.shell rd.break=pre-shutdown" > /run/initramfs/etc/cmdline.d/debug.conf
[root@pooh:pts/8] Wed Apr 10 15:01:08 EDT (13100) ~
# cat /run/initramfs/etc/cmdline.d/debug.conf
rd.debug rd.shell rd.break=pre-shutdown
[root@pooh:pts/8] Wed Apr 10 15:01:17 EDT (13100) ~

The first reboot hit a system hang (imgp3184.jpg).  No dracut shell and no panic information.  I waited ten minutes before I did a reset on the system.
(shutdown-log.txt-130410-133138)

The second reboot produced the usual panic with no dracut shell interaction.
(imgp3187.jpg, shutdown-log.txt-130410-145651)

The third reboot did have a dracut shell, which I exited from.  The system continued through a normal reboot process with no panics.  The problem as I am seeing it appears to have a time or resource element.  If you leave the system running for awhile running X and the FVWM window manager and some other applications, the system will panic during shutdown.  If you have a short session running the same X setup, the system may shutdown normally with no apparent problems. (shutdown-log.txt-130410-151452)

The fourth time rebooting produced the same results as the second reboot: the usual panic with no dracut shell. (imgp3188.jpg, shutdown-log.txt-130410-153408)

Bill

Comment 12 Bill Perkins 2013-04-10 20:07:54 UTC

Created attachment 733864 [details]
First dracut reboot test screen

Comment 13 Bill Perkins 2013-04-10 20:10:08 UTC

Created attachment 733865 [details]
First dracut reboot dmesg file

Comment 14 Bill Perkins 2013-04-10 20:11:44 UTC

Created attachment 733866 [details]
Second dracut reboot test screen

Comment 15 Bill Perkins 2013-04-10 20:13:22 UTC

Created attachment 733867 [details]
Second dracut reboot dmesg file

Comment 16 Bill Perkins 2013-04-10 20:14:33 UTC

Created attachment 733868 [details]
Third dracut reboot dmesg file

Comment 17 Bill Perkins 2013-04-10 20:15:40 UTC

Created attachment 733869 [details]
Fourth dracut reboot test screen

Comment 18 Bill Perkins 2013-04-10 20:16:48 UTC

Created attachment 733870 [details]
Fourth dracut reboot dmesg file

Comment 19 Jóhann B. Guðmundsson 2013-04-10 20:24:47 UTC

Are you using raid? 

Could you attach /etc/fstab?

Comment 20 Bill Perkins 2013-04-10 20:39:45 UTC

> Jóhann B. Guðmundsson 2013-04-10 16:24:47 EDT

> Are you using raid? 

> Could you attach /etc/fstab?

Yes, I am running raid: two sets of disks running RAID1 and two set of disks running RAID5.

Lately I have been un-mounting the two RAID5 disks, /test and /backup before doing the reboots to avoid a possible fsck run during the following boot.  Even though the file systems are EXT4, fsck can still take quite awhile to run.

Comment 21 Bill Perkins 2013-04-10 20:42:18 UTC

Created attachment 733883 [details]
/etc/fstab file from the test system Pooh.

Comment 22 Jóhann B. Guðmundsson 2013-04-10 20:58:48 UTC

"EXT4-fs (dm-1): re-mounted. Opts: (null)"

What's your RAID device on dm-1 as in what's it composed of (i.e., what are the underlying devices). ?

Comment 23 Bill Perkins 2013-04-10 21:27:22 UTC

> Jóhann B. Guðmundsson 2013-04-10 16:58:48 EDT

> "EXT4-fs (dm-1): re-mounted. Opts: (null)"

> What's your RAID device on dm-1 as in what's it composed of (i.e., what are the underlying devices). ?

dm-1 translates as the logical volume /dev/vg0/lv1 (/ and /usr) which is built on raid device /dev/md1.  md1 is a RAID1 device running on /dev/sdf2 and /dev/sdg2.  This raid partition using LVM2 holds the swap, /, /usr/local, /var, /tmp, /opt, /home and /work file systems.  In other words all of the logical volumes mounted on the system.

If you think that this may a raid issue, I think I can disabuse you.  I had a second test system set up running Fedora-18 with different x86_64 hardware, a Dell GX-620, and a very similar software set-up but with no raid disks.  I was able to produce the same panic information on this second system.  See Comment 4 above.  I did not in that comment say if this Dell had any raid disks, which it did not.

Bill

Comment 24 Jóhann B. Guðmundsson 2013-04-10 22:28:49 UTC

It's my firm suspicion that this is either disk setup or network related so walk through your partitions in /etc/fstab an one by one disable them and see if you can find what triggers it 

You can start with /usr/local ( or manually unmount it before shutdown ) next up disable your nfs mounts and see if still happens.  

Was the other system local disks only or nfs as and same partition setup?

Comment 25 Bill Perkins 2013-04-11 00:19:17 UTC

Created attachment 733939 [details]
Dmesg file for panic testing of file systems

Comment 26 Bill Perkins 2013-04-11 20:41:44 UTC

Hmmmm....

I think I may have found a reason for my problem and also how to resolve this problem.

I am still testing to make sure of the reason and the solution that I have applied.

More information later....

Bill

Comment 27 Bill Perkins 2013-04-13 20:34:01 UTC

Okay, credit to Jóhann for pointing out that the problem might be related to the disk setup and suggesting trying to narrow the problem down to one of the filesystems. If I umount the /boot filesystem before performing the shutdown, either reboot or poweroff, the systems do not panic during the shutdown. If I do not umount /boot, the systems generally will panic, if enough uptime has accumulated before the shutdown. The amount of accumulated time required to cause the panic seems to vary based on how busy is the system. I generally have used the systems as I normally would for about an hour before trying to do the reboots. As to why this panic happens, I am unable at this time to explain. So here, at least, is the explanation of the filesystems, raid devices and partition layouts that I use on my systems.

First, there two servers and one workstation. The workstation is a home built 64-bit SMP (4 cores) AMD system. The two servers are Dell GX620s using 64-bit Pentium-D (2 cores) processors. One of them runs as my mail and DNS server. The other is used in audio production work. The file system layout I use is the same, except for some minor variations in sizing, on all three systems. The workstation and the audio system are running Fedora-18. The mail/DNS server is running Fedora-17. The software on all three systems is kept up to date.

The mail/DNS server running Fedora-17 has never experienced this problem. I only include it so as to be able to say that the layout of its filesystems is identical, except for sizing differences, to the other two systems. Its two 80 GB raid hard drives are smaller as compared to the 300 GB hard drives used in the other two systems. It also does not have the /work filesystem of the other two systems.

All three systems are running software raid. The two main disks on all systems are running RAID1 with each identical physical disk having two primary partitions. The first ones are setup for the raid device /dev/md0 which is used solely for the /boot filesystem. The second partitions use the remainder of the disk drives to form the /dev/md1 raid device. The md1 device is than subdivided using LVM2 for the swap, /(root), /usr/local, /var, /tmp, /opt, /home and /work file systems.

The workstation also has two RAID5 devices that are used only for data storage. As to whether they have any affect this problem, I do not believe so, since they seem to have had no effect on this problem mounted or unmounted, assembled or not.

This is the list of filesystems as setup on the workstation, All filesystems on all systems are formatted ext4, except for /boot, which is formatted ext3:

Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md0 372603 98802 254141 28% /boot
/dev/mapper/vg0-lv0 18874364 -- -- - (swap)
/dev/mapper/vg0-lv1 18447100 10345044 7158340 60% /
/dev/mapper/vg0-lv2 1998672 815444 1078372 44% /usr/local
/dev/mapper/vg0-lv3 6061632 4145288 1601772 73% /var
/dev/mapper/vg0-lv4 3997376 11272 3776392 1% /tmp
/dev/mapper/vg0-lv5 1998672 177192 1716624 10% /opt
/dev/mapper/vg0-lv6 16382888 13057576 2486452 85% /home
/dev/mapper/vg0-lv7 219774772 105965436 102638728 51% /work

/proc/mdstat (without the RAID5 devices)

md0 : active raid1 sdf1[0] sdg1[1]
393204 blocks super 1.0 [2/2] [UU]

md1 : active raid1 sdf2[0] sdg2[1]
292640632 blocks super 1.2 [2/2] [UU]
bitmap: 1/3 pages [4KB], 65536KB chunk

Bill

Comment 28 Harald Hoyer 2013-04-15 06:52:11 UTC

If you umount /boot, then the service dracut-shutdown.service cannot unpack /boot/initramfs-$(uname -r).img to /run/initramfs and thus systemd does not call /run/initramfs/shutdown and the whole shutdown procedure mentioned in comment 10 does not kick in.

Comment 29 Bill Perkins 2013-04-15 16:40:26 UTC

> Harald Hoyer 2013-04-15 02:52:11 EDT
>
> If you umount /boot, then the service dracut-shutdown.service cannot unpack /boot/initramfs-$(uname -r).img to /run/initramfs and thus systemd does not call /run/initramfs/shutdown and the whole shutdown procedure mentioned in comment 10 does not kick in.

Okay...  So I go back and umount all the filesystems that I can before doing the shutdown:

/boot
/(root)
/var

These few above listed filesystems are all that are mounted and I had to go to run level three to get that far.  I also had run Harald's suggested actions in Comment 10 to see if the shutdown would drop into the dracut shell, which it did not.  The shutdown in this instance halted at "Fail to acquire terminal: No such file or directory" and the keyboard lights started flashing.  There were no panic numbers displayed.  The /shutdown-log.txt contained no information this late in the shutdown.  The init 3 action may have kill putting any additional information in the dmesg.  The last time entry in the /shutdown-log.txt was "218.637906".  The last time entry displayed during shutdown was "3582.193786".

Bill

Comment 30 Bill Perkins 2013-04-15 22:19:09 UTC

Since Jóhann made the suggestion in Comment 24 that the issue might be a filesystem or network related problem, I have been pursuing a process of finding out if a mounted filesystem might be involved with the problem.  Using a process of divide and conquer (Ha!), I have determined that if I umount the /boot filesystem before doing the shutdown, that the panic during shutdown does not happen.

Harald in Comment 28 says that in umounting /boot before the shutdown prevents the call to /run/initramfs/shutdown and the dracut shell from happening.  So, I decided to find out whether the issue causing the shutdown panic was the /boot filesystem structure or something in its contents.  I umounted /boot and mounted /dev/md0 on /mnt and than rsynced all of the contents of /mnt to the directory called boot on the /(root) filesystem.  I than umounted /mnt and ran the commands that Harald gave us in Comment 10.  I left /boot umounted and performed a reboot.  During the shutdown, the system did panic just as it had before when /boot was normally mounted.  No dracut shell happened.

Bill

Comment 31 Harald Hoyer 2013-04-16 07:52:07 UTC

Ok, then let's do one more check:

# systemctl start dracut-shutdown.service 
# for i in proc sys dev run; do mount --bind /$i /run/initramfs/$i;done
# chroot /run/initramfs
# [ -x ./shutdown ] && echo OK
OK
# exit
# for i in proc sys dev run; do umount /run/initramfs/$i;done

If this all works, I don't have an idea and we need more debug data

You might want to boot with:
"systemd.log_level=debug systemd.log_target=kmsg loglevel=7 rd.debug plymouth.enable=0"

and without

"quiet rhgb"

on the kernel command line. Then shutdown.

Comment 32 Bill Perkins 2013-04-16 20:39:48 UTC

Okay...  Doing the sequence given by Harald in Comment 31. 

[root@pooh:pts/3] Tue Apr 16 08:39:30 EDT (13106) ~
# systemctl start dracut-shutdown.service
[root@pooh:pts/3] Tue Apr 16 09:17:33 EDT (13106) ~
# for i in proc sys dev run; do mount --bind /$i /run/initramfs/$i; done
[root@pooh:pts/3] Tue Apr 16 09:17:54 EDT (13106) ~
# chroot /run/initramfs
bash: tty: command not found
bash: cut: command not found
bash: date: command not found
[root@pooh:]  /
# [ -x ./shutdown ] && echo OK
OK
bash: tty: command not found
bash: cut: command not found
bash: date: command not found
[root@pooh:]  /
# exit
exit
[root@pooh:pts/3] Tue Apr 16 09:19:30 EDT (13106) ~
# for i in proc sys dev run; do umount /run/initramfs/$i; done
[root@pooh:pts/3] Tue Apr 16 09:19:51 EDT (13106) ~

I got an "OK", so I assume there is no joy there...

Alright, for more debug data, we added some addition text to the kernel line in /etc/grub2/grub.cfg as requested:

linux   /vmlinuz-3.8.7-201.fc18.x86_64 root=/dev/mapper/vg0-lv1 ro LANG=en_US.UTF-8  KEYTABLE=us SYSFONT=True rd.md.uuid=767469e8:7a9835b2:e92c29b8:4d4e0bd7 rd.luks=0 rd.dm=0 rd.md.uuid=eab643ef:757c0e0d:42209589:19541488 rd.lvm.lv=vg0/lv1 rd.lvm.lv=vg0/lv0 systemd.log_level=debug systemd.log_target=kmsg loglevel=7 rd.debug plymouth.enable=0

I never use the usual "quiet rhgb" tail.

I also included the dracut shell additions from Comment 10.

I than rebooted, and the system paniced during the shutdown process.  There was no dracut shell.  Two new files are attacted.

Bill

Comment 33 Bill Perkins 2013-04-16 20:42:16 UTC

Created attachment 736543 [details]
Screenshot of shutdown panic for Comment 32

Comment 34 Bill Perkins 2013-04-16 20:44:24 UTC

Created attachment 736544 [details]
Dmesg debug text for Comment 32

Comment 35 Jóhann B. Guðmundsson 2013-04-16 21:40:13 UTC

[ 3472.041233] EXT4-fs (dm-3): re-mounted. Opts: (null)
[ 3472.576255] EXT4-fs (dm-1): re-mounted. Opts: (null)
[ 3472.614549] EXT4-fs (dm-1): re-mounted. Opts: (null)
[ 3472.623927] EXT4-fs (dm-1): re-mounted. Opts: (null)
[ 3472.837573] EXT4-fs (dm-1): re-mounted. Opts: (null)

Well to me this points to the problem being dm-3 and dm-1 maybe Harald got some magic insight and I'm not so sure this is the right combo for dmraid to work ( but then again the whole activation/tear-down logic behind it never did sit right with me )

root=/dev/mapper/vg0-lv1 
rd.md.uuid=767469e8:7a9835b2:e92c29b8:4d4e0bd7 rd.md.uuid=eab643ef:757c0e0d:42209589:19541488 
rd.lvm.lv=vg0/lv1 
rd.lvm.lv=vg0/lv0

In anycase before we get the dmraid experts on this run this and paste the output from this

1. 
# dmraid -r

2.
# dmsetup status

3. 
# dmsetup deps

4. 
# ls -al /dev/mapper

5.
# cat /proc/mdstat

If nothing screams at us then we can move it to them and they throw it back at us if the entire setup is legit

Comment 36 Bill Perkins 2013-04-16 22:13:06 UTC

Regarding Comment 35: It better be legit!  This is pretty much the way I have always setup my filesystems since I started using software raid.  When I used Fedora-15 and older versions, I used to embed /boot in /(root) on a raid device using the first partitions of the disks.  /usr, /usr/local, /opt, /var, /tmp and /home filesystems were put LVM logical volumes on a second raid device made up of the second partitions of those same disks.  It worked for me.  I had to rethink this layout when /usr was bundled into the /(root) filesystem.

1. # dmraid -r
no raid disks
(Raid disks are built and assembled using mdadm)

2. # dmsetup status
vg0-lv7: 0 446824448 linear 
vg0-lv6: 0 33554432 linear 
vg0-lv5: 0 4194304 linear 
vg0-lv4: 0 8388608 linear 
vg0-lv3: 0 12582912 linear 
vg0-lv2: 0 4194304 linear 
vg0-lv1: 0 37748736 linear 
vg0-lv0: 0 37748736 linear 

3. # dmsetup deps
vg0-lv7: 1 dependencies : (9, 1)
vg0-lv6: 1 dependencies : (9, 1)
vg0-lv5: 1 dependencies : (9, 1)
vg0-lv4: 1 dependencies : (9, 1)
vg0-lv3: 1 dependencies : (9, 1)
vg0-lv2: 1 dependencies : (9, 1)
vg0-lv1: 1 dependencies : (9, 1)
vg0-lv0: 1 dependencies : (9, 1)

4. # ls -al /dev/mapper
total 0
drwxr-xr-x.  2 root root     220 Apr 16 16:16 .
drwxr-xr-x. 21 root root    4360 Apr 16 17:47 ..
crw-------.  1 root root 10, 236 Apr 16 16:16 control
lrwxrwxrwx.  1 root root       7 Apr 16 16:16 vg0-lv0 -> ../dm-0
lrwxrwxrwx.  1 root root       7 Apr 16 16:16 vg0-lv1 -> ../dm-1
lrwxrwxrwx.  1 root root       7 Apr 16 16:16 vg0-lv2 -> ../dm-2
lrwxrwxrwx.  1 root root       7 Apr 16 16:16 vg0-lv3 -> ../dm-3
lrwxrwxrwx.  1 root root       7 Apr 16 16:16 vg0-lv4 -> ../dm-6
lrwxrwxrwx.  1 root root       7 Apr 16 16:16 vg0-lv5 -> ../dm-7
lrwxrwxrwx.  1 root root       7 Apr 16 16:16 vg0-lv6 -> ../dm-4
lrwxrwxrwx.  1 root root       7 Apr 16 16:16 vg0-lv7 -> ../dm-5

5. # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] 
md3 : active raid5 sdi2[4] sdj2[3] sdk2[0] sdl2[1]
      732169728 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      
md2 : active raid5 sdi1[1] sdh1[0] sdj1[5] sdl1[3] sdk1[2]
      2929768448 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      
md1 : active raid1 sdf2[0] sdg2[1]
      292640632 blocks super 1.2 [2/2] [UU]
      bitmap: 1/3 pages [4KB], 65536KB chunk

md0 : active raid1 sdg1[1] sdf1[0]
      393204 blocks super 1.0 [2/2] [UU]
-----

Bill

Comment 37 Jóhann B. Guðmundsson 2013-04-16 22:54:57 UTC

What I suspect is happening here EXT4-fs (dm-X): re-mounted. Opts: (null)
is that sync isn't returning because some process is still writing to the file system and the house of (dm)raids cards comes crumbling down on shutdown...

Which explains why "The system will reboot/poweroff everytime if I type in "sync && reboot" or "sync && poweroff"" works thus we just need to figure out what's writing to dm-3/dm-1 and we can potential find out a race condition 

Now let's look at this starting from the kernel cmd what happens if you just remove "rd.lvm.lv=vg0/lv0" which according to your fstab is swap

dm-0 == dev/mapper/vg0-lv0   swap

( might be confusing shutdown logic )

dm-1 == /dev/mapper/vg0-lv1   /
dm-3 == /dev/mapper/vg0-lv3   /var

Have you tried it with the latest systemd release 

1. http://koji.fedoraproject.org/koji/buildinfo?buildID=411809
( might be the journal is still writing something or potental issue between fstab and tmp.mount since in F18 we default to have /tmp on /tmpfs )

tmp on tmpfs should kick in if you hash out the /tmp from /etc/fstab or visa versa you can rule out tmp.mount by running systemctl mask tmp.mount

And further could you run filesystem check on everything and restorecon -v -R on everything as well or literally disable selinux altogether ( just to rule potential hd failure or corruption and selinux altogether or a combination of selinux reporting even in permissive mode and journal logging )

Comment 38 Bill Perkins 2013-04-17 03:05:22 UTC

> Jóhann B. Guðmundsson 2013-04-16 18:54:57 EDT

> What I suspect is happening here EXT4-fs (dm-X): re-mounted. Opts: (null)
is that sync isn't returning because some process is still writing to the file system and the house of (dm)raids cards comes crumbling down on shutdown...

> Which explains why "The system will reboot/poweroff everytime if I type in "sync && reboot" or "sync && poweroff"" works thus we just need to figure out what's writing to dm-3/dm-1 and we can potential find out a race condition 

I had not before tried the "sync && reboot" and leaving /boot mounted to see what might happen to the shutdown process.  I tried it tonight and it worked with a very long pause in the middle of the process.

> Now let's look at this starting from the kernel cmd what happens if you just remove "rd.lvm.lv=vg0/lv0" which according to your fstab is swap

Yeah, I spotted that issue this afternoon and remove the "rd.lvm.lv=vg0/lv0" from the kernel boot command line.  Doing so did not seem to charge the characteristics of the problem: the system panics if /boot is mounted, or it does not if /boot is umounted before dhutdown.

> dm-0 == dev/mapper/vg0-lv0   swap

> ( might be confusing shutdown logic )

> dm-1 == /dev/mapper/vg0-lv1   /
> dm-3 == /dev/mapper/vg0-lv3   /var

> Have you tried it with the latest systemd release 

No, I have not.  I will download and recompile it to see if it changes the problem.  I will report on this later.

> 1. http://koji.fedoraproject.org/koji/buildinfo?buildID=411809
( might be the journal is still writing something or potental issue between fstab and tmp.mount since in F18 we default to have /tmp on /tmpfs )

> tmp on tmpfs should kick in if you hash out the /tmp from /etc/fstab or visa versa you can rule out tmp.mount by running systemctl mask tmp.mount

I would rather keep /tmp as a real filesystem separate from /var/tmp.  There is always some transient history stored there that I might want to look at on the next system boot.

> And further could you run filesystem check on everything and restorecon -v -R on everything as well or literally disable selinux altogether ( just to rule potential hd failure or corruption and selinux altogether or a combination of selinux reporting even in permissive mode and journal logging )

I have already done fsck on all file systems and found nothing wrong, and I do NOT want to do a restorecon.  The system is running in selinux permissive for the duration of this problem and that should be enough.  Disabling selinux completely is a possibility.  Also, SMART on all of drives shows no problems and the SATA driver is showing no errors or warnings in the logs, except for a eSATA connected disk that is now shut off and disconnected.

Sometimes after a panic during shutdown event and than rebooting, the RAID device /dev/md1 will do a rsync.  Why?  Did the RAID device not get shutdown correctly during shutdown?

I might try disabling selinux if only for the reason that every time it performs a relabel operation, it takes 3-1/2 hours to complete it.  That makes for very long boot up times.

Comment 39 Harald Hoyer 2013-04-17 09:31:24 UTC

What I suspect, is that /lib/systemd/systemd-shutdown exits because of an error.

Now, if only we could get it to log things.

Comment 40 Jóhann B. Guðmundsson 2013-04-17 09:53:07 UTC

As soon as he does what we ask him to and disables selinux we will get that shutdown log

[15099.484240] type=1400 audit(1365531957.255:6332): avc:  denied  { write } for  pid=15035 comm="dmesg" path="/shutdown-log.txt" dev="dm-1" ino=18540 scontext=system_u:system_r:dmesg_t:s0 tcontext=system_u:object_r:root_t:s0 tclass=file

And he will need to disable selinux entirely. 

"I will download and recompile it to see if it changes the problem." and this indicates this is not a stock install. 

Heck if this is an upgrade from ( or even earlier ) F16 --> F17 --> F18 which includes the "usermove" we might be dealing with all kinds upgrade crap due to that.

If this is an clean F17 install that got upgraded to F18 he should not be experience it

Comment 41 Bill Perkins 2013-04-17 13:18:03 UTC

> As soon as he does what we ask him to and disables selinux we will get that shutdown log

Jóhann, are you saying that all of those plain-text files that I have attaching to this report are not the shutdown log files that you have wanted?  See Comment 6.

I am running selinux in permissive mode.  It is that not enough?

This system is running a standard install of Fedora-17 that was updated to Fedora-18 by doing a yum upgrade.  The non-standard install of a pre-release version of systemd was done as requested earlier.  See Jóhann's suggestion in  Comment 37.  I am returning systemd back to the standard update version (197-1) because the pre-release version does not appear to change any characteristics of the problem. 

I have also disabled Selinux as requested earlier and rebooted.

# cat /etc/selinux/config 

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected. 
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted 

----
Bill

Comment 42 Bill Perkins 2013-04-17 16:10:28 UTC

Okay, I reinstall the current version of systemd and libgudev1:

# rpm -qa | grep systemd
systemd-libs-197-1.fc18.2.i686
systemd-libs-197-1.fc18.2.x86_64
systemd-python-197-1.fc18.2.x86_64
systemd-devel-197-1.fc18.2.x86_64
wine-systemd-1.5.24-1.fc18.noarch
systemd-197-1.fc18.2.x86_64
systemd-sysv-197-1.fc18.2.x86_64
# rpm -qa | grep libgudev1
libgudev1-197-1.fc18.2.x86_64

I have disabled selinux:

# sestatus
SELinux status:                 disabled

I have done a reboot with panic and attached a new screenshot and shutdown-log.txt files.

I am seeing in the shutdown-log.txt near the end of the file the following when the the system panics during shutdown:

[  386.631470] systemd[1]: Shutting down.
[  386.753949] systemd-journald[500]: Received SIGTERM
[  387.005515] EXT4-fs (dm-2): re-mounted. Opts: (null)
[  387.058928] EXT4-fs (md0): re-mounted. Opts: (null)
[  387.150675] EXT4-fs (dm-0): re-mounted. Opts: (null)
[  387.186441] EXT4-fs (dm-0): re-mounted. Opts: (null)
[  387.194849] EXT4-fs (dm-0): re-mounted. Opts: (null)
[  387.419315] EXT4-fs (dm-0): re-mounted. Opts: (null)

If I boot the system, login into X, quickly run a few apps and that reboot without getting a shutdown panic, I see this:

[  106.483553] systemd[1]: Shutting down.
[  106.604738] systemd-journald[509]: Received SIGTERM
[  107.002091] EXT4-fs (dm-2): re-mounted. Opts: (null)
[  107.111747] EXT4-fs (dm-0): re-mounted. Opts: (null)
[  107.134482] EXT4-fs (dm-0): re-mounted. Opts: (null)
[  107.136614] EXT4-fs (dm-0): re-mounted. Opts: (null)
[  107.342259] EXT4-fs (dm-0): re-mounted. Opts: (null)

In both of these cases, I did not umount /boot before each of these shutdowns.

I wonder if the md0 remounted line in the first instance indicates anything?

Bill

Comment 43 Bill Perkins 2013-04-17 16:15:22 UTC

Created attachment 736937 [details]
Screenshot of shutdown panic after selinux disabled

Comment 44 Bill Perkins 2013-04-17 16:17:08 UTC

Created attachment 736938 [details]
shutdown-log.txt file after selinux disabled

Comment 45 Jóhann B. Guðmundsson 2013-04-17 17:09:26 UTC

Can you reboot if you stop and disable nfs and bind services ?

Comment 46 Bill Perkins 2013-04-17 18:24:31 UTC

>  Jóhann B. Guðmundsson 2013-04-17 13:09:26 EDT
> Can you reboot if you stop and disable nfs and bind services ?

# systemctl stop named-chroot.service
# systemctl stop nfs.service
# systemctl stop nfs-server.service
# systemctl disable named-chroot.service
rm '/etc/systemd/system/multi-user.target.wants/named-chroot.service'
# systemctl disable nfs-server.service
rm '/etc/systemd/system/multi-user.target.wants/nfs-server.service'

I than rebooted twice: once immediately after stopping and disabling nfs and named, a second time from a session in which named and nfs were not started at boot time.  Both paniced in the same way.

Bill

Comment 47 Bill Perkins 2013-04-17 18:26:34 UTC

Created attachment 736993 [details]
Screenshot of shutdown panic for Comment 46

Comment 48 Bill Perkins 2013-04-17 18:28:58 UTC

Created attachment 736994 [details]
shutdown-log.txt for Comment 46

Comment 49 Jóhann B. Guðmundsson 2013-04-17 19:33:26 UTC

/var gets cleanly unmounted now which I do believe indicates wrong ordering dependency in the nfs ( a bug ) and we no longer see "EXT4-fs (dm-2): re-mounted. Opts: (null)" so you can enable bind again. 

Which leaves us with /boot
[ 1651.670223] systemd[1]: Forked /bin/umount as 3770
...
[ 1651.793018] systemd[1]: Got SIGCHLD for process 3770 (umount)
[ 1651.793170] systemd[1]: Child 3770 died (code=exited, status=32/n/a)
[ 1651.793180] systemd[1]: Child 3770 belongs to boot.mount
[ 1651.793201] systemd[1]: boot.mount mount process exited, code=exited status=32
[ 1651.801284] systemd[1]: boot.mount changed unmounting -> mounted
[ 1651.801295] systemd[1]: Job boot.mount/stop finished, result=failed
[ 1651.801338] systemd[1]: Failed unmounting /boot.

Now another mystery is this somehow it is determined that dm-3 is swap

[  384.265907] systemd[1]: Deactivating swap /dev/dm-3...
[  384.265985] systemd[1]: About to execute /sbin/swapoff /dev/dm-3
[  384.266347] systemd[1]: Forked /sbin/swapoff as 2856
[  384.276856] systemd[1]: dev-dm\x2d3.swap changed active -> deactivating
[  384.276874] systemd[1]: Redirecting stop request from dev-vg0-lv0.swap to dev-dm\x2d3.swap.
[  384.276883] systemd[1]: Deactivating swap /dev/dm-3...
[  384.276914] systemd[1]: Redirecting stop request from dev-disk-by\x2did-dm\x2dname\x2dvg0\x2dlv0.swap to dev-dm\x2d3.swap.
[  384.276920] systemd[1]: Deactivating swap /dev/dm-3...
[  384.276933] systemd[1]: Redirecting stop request from dev-disk-by\x2did-dm\x2duuid\x2dLVM\x2dKyaatOWRtwFoKczvyHtyglVZdJiFRGVYEkTgyI6sZklKppsJiU80PukfE8PyBToj.swap to dev-dm\x2d3.swap.
[  384.276939] systemd[1]: Deactivating swap /dev/dm-3...
[  384.276950] systemd[1]: Redirecting stop request from dev-disk-by\x2duuid-271d7447\x2db92d\x2d4d76\x2d8e3f\x2d258ba1a7b7d2.swap to dev-dm\x2d3.swap.
[  384.276956] systemd[1]: Deactivating swap /dev/dm-3...


Now dev/dm-3 == /dev/mapper/vg0-lv3 and the fstab entry for that is 
/dev/mapper/vg0-lv3     /var                    ext4    defaults        1 2
so I'm not so sure why dm-3 instead of dm-0 suddenly has become the swap partition but let's remove the swap entry from the kernel command line and switch to UUID entries in /etc/fstab you can find those via ls -alh /dev/disk/by-uuid/ and provide the shutdown.log.

Also let's add to the mix the contents of he contents of /proc/mdstat , /etc/mdadm.conf and mdadm -E on a constituent element of each md raid array on the system to have a better look mdraid.

Comment 50 Jóhann B. Guðmundsson 2013-04-17 19:42:33 UTC

Btw there is no need for you to post images of the same kernel panic over and over again. It's just a standard panic when it kills the init ( pid 1 ) and yes I agree that kernel should handle that more gracefully

Comment 51 Bill Perkins 2013-04-17 20:05:46 UTC

>  Jóhann B. Guðmundsson 2013-04-17 15:42:33 EDT

> Btw there is no need for you to post images of the same kernel panic over and 
> over again. It's just a standard panic when it kills the init ( pid 1 ) and 
> yes I agree that kernel should handle that more gracefully

Jóhann, Thank you telling me not to post any more images.  I could see that they
were duplicates, but I did not know if the lack of change in the numbers 
represented real information.  I will not post any more images for this report
unless asked to do so, or if real changed information can be shown.

Bill

Comment 52 Bill Perkins 2013-04-17 22:34:59 UTC

Jóhann,

It turns out that dm-3 really is the swap partition.  Here is a portion of the
list "ls -lah /dev/disk/by-id":

lrwxrwxrwx 1 root root   10 Apr 17 17:46 dm-name-vg0-lv0 -> ../../dm-3
lrwxrwxrwx 1 root root   10 Apr 17 17:46 dm-name-vg0-lv1 -> ../../dm-0
lrwxrwxrwx 1 root root   10 Apr 17 17:46 dm-name-vg0-lv2 -> ../../dm-1
lrwxrwxrwx 1 root root   10 Apr 17 17:46 dm-name-vg0-lv3 -> ../../dm-2
lrwxrwxrwx 1 root root   10 Apr 17 17:46 dm-name-vg0-lv4 -> ../../dm-6
lrwxrwxrwx 1 root root   10 Apr 17 17:46 dm-name-vg0-lv5 -> ../../dm-7
lrwxrwxrwx 1 root root   10 Apr 17 17:46 dm-name-vg0-lv6 -> ../../dm-4
lrwxrwxrwx 1 root root   10 Apr 17 17:46 dm-name-vg0-lv7 -> ../../dm-5

As you can see. the logical volume names do not line up with the dm numbering
scheme.  So, dm-3 is the same as /dev/vg0/lv0 which is the swap partition.
The /(root) filesystem is dm-0 or /dev/vg0/lv1.

Now I did not know if you wanted the /etc/fstab modified to replace all 
entries with logical volume names with UUID numbers. or just to do swap.
Well, I wanted to see if it would work, So I replaced all vg0-lvx entries
with UUIDs.  This was the original /etc/fstab file:

#
# /etc/fstab
#
UUID=c8830ea0-64f9-4999-8d57-d6d52e09f8d4 /boot ext3    defaults        1 2
/dev/mapper/vg0-lv0     swap                    swap    defaults        0 0
/dev/mapper/vg0-lv1     /                       ext4    defaults        1 1
/dev/mapper/vg0-lv2     /usr/local              ext4    defaults        1 2
/dev/mapper/vg0-lv3     /var                    ext4    defaults        1 2
/dev/mapper/vg0-lv4     /tmp                    ext4    defaults        1 2
/dev/mapper/vg0-lv5     /opt                    ext4    defaults        1 2
/dev/mapper/vg0-lv6     /home                   ext4    defaults        1 2
/dev/mapper/vg0-lv7     /work                   ext4    defaults        1 2
UUID=7dd0bfab-7983-488e-9163-6c442701cda7 /backup ext4  defaults        1 2
UUID=e70bd9e6-3245-4dd5-898c-475ad2e7142c /test   ext4  defaults        1 2
#
LABEL=/backup15    /backup15        auto    noauto,owner,noatime,barrier=1 0 0
LABEL=/backup20    /backup20        auto    noauto,owner,noatime,barrier=1 0 0
/dev/sr0           /cd0             auto    noauto,owner,user,ro    0 0
/dev/sr1           /cd1             auto    noauto,owner,user,ro    0 0
/dev/fd0           /fd0             auto    noauto,owner,user,noatime       0 0
/backup/dvdfs      /dvdfs           auto    noauto,loop,owner,noatime       0 0
/backup/dvdfs-dl   /dvdfs-dl        auto    noauto,loop,owner,noatime       0 0

This is the new fstab file with UUIDs:

#
# /etc/fstab
#
UUID=c8830ea0-64f9-4999-8d57-d6d52e09f8d4  /boot       ext3  defaults        1 2
UUID=271d7447-b92d-4d76-8e3f-258ba1a7b7d2  swap        swap  defaults        0 0
UUID=57a6fb28-c4a2-4430-b64a-dcbd6c87a4ad  /           ext4  defaults        1 1
UUID=25aae70b-bfa8-42bf-9265-eef723cdeec3  /usr/local  ext4  defaults        1 2
UUID=8d1acf0d-0327-4711-a5f6-4ada8bb02124  /var        ext4  defaults        1 2
UUID=5b6d71b7-2dee-4b4b-ae0a-1ed7c623c9ed  /tmp        ext4  defaults        1 2
UUID=7338b671-fa46-4a95-ae6c-1e8900f75ce8  /opt        ext4  defaults        1 2
UUID=df485586-0e0d-4204-904e-0e62634b8fad  /home       ext4  defaults        1 2
UUID=537658bb-5d08-466c-874b-4bb4a57c3931  /work       ext4  defaults        1 2
UUID=7dd0bfab-7983-488e-9163-6c442701cda7  /backup     ext4  defaults        1 2
UUID=e70bd9e6-3245-4dd5-898c-475ad2e7142c  /test       ext4  defaults        1 2
#
LABEL=/backup15       /backup15  auto  noauto,owner,noatime,barrier=1        0 0
LABEL=/backup20       /backup20  auto  noauto,owner,noatime,barrier=1        0 0
/dev/sr0              /cd0       auto  noauto,owner,user,ro                  0 0
/dev/sr1              /cd1       auto  noauto,owner,user,ro                  0 0
/dev/fd0              /fd0       auto  noauto,owner,user,noatime             0 0
/backup/dvdfs         /dvdfs     auto  noauto,loop,owner,noatime             0 0
/backup/dvdfs-dl      /dvdfs-dl  auto  noauto,loop,owner,noatime             0 0

Not very readable, but it does mount everything in the correct places.

Now for additional information.

The entry for swap in the kernel boot line "rd.lvm.lv=vg0/lv0" had already
been removed from the /etc/default/grub and /boot/grub2/grub.cfg files.

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] 
md2 : active raid5 sdk1[2] sdi1[1] sdj1[5] sdh1[0] sdl1[3]
      2929768448 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      
md3 : active raid5 sdj2[3] sdi2[4] sdl2[1] sdk2[0]
      732169728 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      
md1 : active raid1 sdf2[0] sdg2[1]
      292640632 blocks super 1.2 [2/2] [UU]
      bitmap: 2/3 pages [8KB], 65536KB chunk

md0 : active raid1 sdf1[0] sdg1[1]
      393204 blocks super 1.0 [2/2] [UU]
      
unused devices: <none>

# cat /etc/mdadm.conf 
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md0 level=raid1 num-devices=2 UID=767469e8:7a9835b2:e92c29b8:4d4e0bd7
ARRAY /dev/md1 level=raid1 num-devices=2 UID=eab643ef:757c0e0d:42209589:19541488
ARRAY /dev/md2 metadata=1.2 name=pooh.grnwood.net:2 UUID=75a8f389:a777b702:06cc82ca:929cdd4c
ARRAY /dev/md3 metadata=1.2 name=pooh.grnwood.net:4 UUID=20bebd3f:3eb5ccae:9ccc1d36:b98a877f


# mdadm -E /dev/sdf1
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : 767469e8:7a9835b2:e92c29b8:4d4e0bd7
           Name : pooh.grnwood.net:0  (local to host pooh.grnwood.net)
  Creation Time : Fri Jan 25 19:40:26 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 786408 (384.05 MiB 402.64 MB)
     Array Size : 393204 (384.05 MiB 402.64 MB)
   Super Offset : 786416 sectors
          State : clean
    Device UUID : 5e768690:884c6f06:ccf2b8db:815ac8f2

    Update Time : Wed Apr 17 18:22:27 2013
       Checksum : 8a05136c - correct
         Events : 112


   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing)

# mdadm -E /dev/sdg2
/dev/sdg2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : eab643ef:757c0e0d:42209589:19541488
           Name : pooh.grnwood.net:1  (local to host pooh.grnwood.net)
  Creation Time : Fri Jan 25 19:40:07 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 585281536 (279.08 GiB 299.66 GB)
     Array Size : 292640632 (279.08 GiB 299.66 GB)
  Used Dev Size : 585281264 (279.08 GiB 299.66 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 04f869de:63f95416:1d472a2f:e1763d3c

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Apr 17 18:28:32 2013
       Checksum : ae7e707e - correct
         Events : 13602


   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing)

# mdadm -E /dev/sdh1
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 75a8f389:a777b702:06cc82ca:929cdd4c
           Name : pooh.grnwood.net:2  (local to host pooh.grnwood.net)
  Creation Time : Thu Jan 17 10:00:05 2013
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 1464884976 (698.51 GiB 750.02 GB)
     Array Size : 2929768448 (2794.04 GiB 3000.08 GB)
  Used Dev Size : 1464884224 (698.51 GiB 750.02 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 8e1e6f2e:297b747d:e823f3db:a1921c64

    Update Time : Wed Apr 17 17:46:32 2013
       Checksum : 39b97f3f - correct
         Events : 34646

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing)

# mdadm -E /dev/sdl2
/dev/sdl2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 20bebd3f:3eb5ccae:9ccc1d36:b98a877f
           Name : pooh.grnwood.net:4  (local to host pooh.grnwood.net)
  Creation Time : Wed Jan 16 07:49:52 2013
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 488113856 (232.75 GiB 249.91 GB)
     Array Size : 732169728 (698.25 GiB 749.74 GB)
  Used Dev Size : 488113152 (232.75 GiB 249.91 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 4108a158:496763d8:0bb5f3e7:5c4cbc82

    Update Time : Wed Apr 17 17:46:31 2013
       Checksum : cc4eb5d2 - correct
         Events : 10597

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing)

----
Bill

Comment 53 Bill Perkins 2013-04-17 22:42:18 UTC

Created attachment 737019 [details]
ls -lah /dev/disk/by-id for Comment 52.  Included for completeness.

Comment 54 Bill Perkins 2013-04-17 22:44:43 UTC

Created attachment 737020 [details]
shutdown-log.txt  written after UUID mods to /etc/fstab & reboot for Comment 52

Comment 55 Harald Hoyer 2013-04-18 11:26:12 UTC

Can you take a screenshot of the shutdown screen, if you boot with "systemd.log_level=debug systemd.log_target=kmsg loglevel=7" on the kernel command line?

Comment 56 Bill Perkins 2013-04-18 13:44:15 UTC

> Harald Hoyer 2013-04-18 07:26:12 EDT

> Can you take a screenshot of the shutdown screen, if you boot with
> "systemd.log_level=debug systemd.log_target=kmsg loglevel=7" on the kernel
> command line?

The attachments associated with Comment 46 already include this information
along with "rd.debug plymouth.enable=0".  I have been told that all  of the
additional screenshots are redundant and basically contain the same information.

Bill

Comment 57 Jóhann B. Guðmundsson 2013-04-18 14:14:33 UTC

Without knowing what's failing we cant fix it ;( 

Had you already followed this ( debug shell/serial console )

http://freedesktop.org/wiki/Software/systemd/Debugging#Shutdown_Never_Finishes

Comment 58 Harald Hoyer 2013-04-18 14:53:51 UTC

(In reply to comment #56)
> > Harald Hoyer 2013-04-18 07:26:12 EDT
> 
> > Can you take a screenshot of the shutdown screen, if you boot with
> > "systemd.log_level=debug systemd.log_target=kmsg loglevel=7" on the kernel
> > command line?
> 
> The attachments associated with Comment 46 already include this information
> along with "rd.debug plymouth.enable=0".  I have been told that all  of the
> additional screenshots are redundant and basically contain the same
> information.
> 
> Bill

are you sure, this had "systemd.log_target=kmsg loglevel=7" ?

Comment 59 Harald Hoyer 2013-04-18 15:40:39 UTC

(In reply to comment #58)
> (In reply to comment #56)
> > > Harald Hoyer 2013-04-18 07:26:12 EDT
> > 
> > > Can you take a screenshot of the shutdown screen, if you boot with
> > > "systemd.log_level=debug systemd.log_target=kmsg loglevel=7" on the kernel
> > > command line?
> > 
> > The attachments associated with Comment 46 already include this information
> > along with "rd.debug plymouth.enable=0".  I have been told that all  of the
> > additional screenshots are redundant and basically contain the same
> > information.
> > 
> > Bill
> 
> are you sure, this had "systemd.log_target=kmsg loglevel=7" ?

ok, sorry, the screenshot says "shutdown" and not "systemd-shutdown"

Comment 60 Harald Hoyer 2013-04-18 15:45:37 UTC

(In reply to comment #12)
> Created attachment 733864 [details]
> First dracut reboot test screen

This is the key screenshot.

int make_console_stdio(void) {
...
       fd = acquire_terminal("/dev/console", false, true, true, (usec_t) -1);
       log_error("Failed to acquire terminal: %s", strerror(-fd));
...

in shutdown.c:

        chroot(".");

        setsid();
        make_console_stdio();

        log_info("Successfully changed into root pivot.");


So, it seems there is no "/dev/console" ??

Comment 61 Bill Perkins 2013-04-18 16:16:47 UTC

> Jóhann B. Guðmundsson 2013-04-18 10:14:33 EDT>
> Without knowing what's failing we cant fix it ;( 
> Had you already followed this ( debug shell/serial console )
> http://freedesktop.org/wiki/Software/systemd/Debugging#Shutdown_Never_Finishes

Well, I have been down that road already, but I did not document all of the trip.

For me the serial console is impractical.  I lack the needed null modem cable,
and serial ports on computers are getting scare.  So, I have gone with the
script: /lib/systemd/system-shutdown/debug.sh.  Every log text I have attached since Comment 6 has included this debug information.  Booting the kernel uses
the following command line:

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.8.7-201.fc18.x86_64 root=/dev/mapper/vg0-lv1 ro LANG=en_US.UTF-8 KEYTABLE=us SYSFONT=True rd.md.uuid=767469e8:7a9835b2:e92c29b8:4d4e0bd7 rd.luks=0 rd.dm=0 rd.md.uuid=eab643ef:757c0e0d:42209589:19541488 rd.lvm.lv=vg0/lv1 systemd.log_level=debug systemd.log_target=kmsg loglevel=7 rd.debug plymouth.enable=0

I have tried the debug shell, but once the panic occurs there is no way to get
to it via CTRL+ALT+F9 because the keyboard is locked up.  I do not think a
serial console would work either in this case.  The panic appears to happen 
just before the kernel would issue the reset for the shutdown.

Doing a "sync && reboot -f" or "sync && poweroff -f" immediately causes a
successful shutdown to run.  So, I would guess it is not a kernel bug.

The last part of that web page talks about "Reporting systemd Bugs"  I do not
think I have included for this report the outputs for the following two 
commands:

# systemctl dump > systemd-dump.txt
# systemd --test --system --log-level=debug > systemd-test.txt 2>&1 (not by
root)

So I include them now for what they may be worth.

Bill

Comment 62 Bill Perkins 2013-04-18 16:19:46 UTC

Created attachment 737352 [details]
systemd-dump.txt

Comment 63 Bill Perkins 2013-04-18 18:10:04 UTC

>  Harald Hoyer 2013-04-18 11:45:37 EDT 

> So, it seems there is no "/dev/console" ??

Sometimes a cause to a problem is not found until someone asks the correct
question.

And that it the correct question to ask.

The answer is no, at least at the point of time in the shutdown cycle about 
which we are concerned.  There was a /dev/console device earlier in the session,
but it was a soft link pointing to a xterm window device so that we could 
monitor the console's output during the X session.  The xterm and where the
link was pointing, of course, gets destroyed when the X session terminates 
during the shutdown process.  I guess systemd needs the console during shutdown
to display some messages and panics when it can not find the console device.

Well, I removed the script that created the soft link for the console and now
the reboot and poweroff shutdowns happen correctly as they should on the two
systems running Fedora-18.  The system running Fedora-17 using this same soft
linked console device script reboots and poweroff just fine.  Go figure!

Bill

Comment 64 Lennart Poettering 2013-04-18 23:49:49 UTC

Uh oh. On Linux /dev/console is not something you can replace by a FIFO! It's a device node with very special semantics! That this every worked is pure luck... Really, don't do this. That's totally unsupported. Yikes...

Comment 65 Michal Schmidt 2013-04-19 10:16:34 UTC

I see. dracut's shutdown script begins with:
  exec </dev/console >>/dev/console 2>>/dev/console
An exec failure causes a non-interactive shell to exit.

I agree that /dev/console should not be removed, but the script can still be made robust against this.

Is the redirection really necessary? systemd will call the shutdown script with the fds 0, 1, 2 redirected to /dev/console anyway.

If it is necessary for some reason (non-systemd systems?), perhaps the exec can be protected with:
[ -w /dev/console ] && exec ...

Comment 66 Harald Hoyer 2013-04-19 13:09:50 UTC

(In reply to comment #65)
> I see. dracut's shutdown script begins with:
>   exec </dev/console >>/dev/console 2>>/dev/console
> An exec failure causes a non-interactive shell to exit.
> 
> I agree that /dev/console should not be removed, but the script can still be
> made robust against this.
> 
> Is the redirection really necessary? systemd will call the shutdown script
> with the fds 0, 1, 2 redirected to /dev/console anyway.
> 
> If it is necessary for some reason (non-systemd systems?), perhaps the exec
> can be protected with:
> [ -w /dev/console ] && exec ...

yes.. added upstream, thanks!

Closing as NOTABUG though, because one should not fiddle with /dev/console.

Note You need to log in before you can comment on or make changes to this bug.