Created attachment 706413 [details] This is a photograph of the final screen. Description of problem: System freezes during poweroff or reboot. Version-Release number of selected component (if applicable): 197-1 How reproducible: Boot system; login to X using fvwm as the window manager; run a few applications; terminate those applications; type reboot or poweroff in a xterm or use ctrl-alt-delete; system will to shutdown and freeze with a panic displayed. Steps to Reproduce: 1. 2. 3. Actual results: System halts/freezes before shutdown is completed; caps lock and scroll lock indicators will rapidly flash; keyboard will be unresponsive and the following is displayed every time as the final message: Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 Pid: 1, comm: shutdown Not tainted 3.8.1-201.fc18.x86_64 #1 Call Trace: [<ffffffff81645b9e>] panic+0xc1/0x1d0 [<ffffffff810647a8>] do_exit_0x918/0x9e0 [<ffffffff810648ff>] do_group_exit+0x3f/0xa0 [<ffffffff81064977>] sys_exit_group_0x17/0x20 [<ffffffff81657c19>] system_call_fastpath+0x16/0x1b drm_kms_helper: panic occurred, switching back to text console Expected results: System reboots or powers off as expected. Additional info: The system will reboot/poweroff everytime if I type in "sync && reboot" or "sync && poweroff". The system freezes too early to get the dmesg output. I have tried the vt and serial port debug tricks and have received no joy. On an alternate system with a different hardware configuration the same problem does occur when running the same version of FVWM as the window manager. The problem on that system does not appear to occur when running Mate or XFCE as the desktops in place of FVWM. I have a third system running Fedora-17 that does not this and and other problems at the present time and is running a desktop configuration with FVWM that is identical to this system running Fedora-18. The Radeon and nVidia video cards I am using do not have the horse power to run the Gnome 3 desktop. So, I have not tested this situation with Gnome 3.
Is this fresh install or upgrade? What HW is this? ( could be a race of some sort see similar bug 831634 )
Is this reproducible if you disable selinux?
Created attachment 706714 [details] Test containg uname, lsmod, lspci, dmesg and rpm -qa > What HW is this? This is a four core x86_64 with eight gig of memory. Additional information is in the text attachment.
> Is this reproducible if you disable selinux? The problem has been tested by including enforcing=0 in the kernel command boot line to disable selinux. The conditions of the problem does not change if you disable selinux. > Is this fresh install or upgrade? This system was upgraded from Fedora 17 to Fedora 18 using the Yum Update Process. > ( could be a race of some sort see similar bug 831634 ) I did look at this bug and do not think that it is similar. The software RAID configuration I am using has been quite stable and has been in use for greater than five years. Also, if I run X, but do not log into X, and instead run the system from the command line before the reboot/poweroff, regardless of how long the system remains up, the system will reboot/poweroff normally with no apparent problem during the shutdown. I have tried the "dracut -f" to rebuild initramfs image; rebooted several time with no change with the problem. As to whether this might be a race condition or not, that might be a possibility. I use FVWM as the window manager during the X session, XDM to login into X and the Mate (Gnome 2 before that) panel application during the session. If I log into X and run a few applications (it does not seem to matter which applications are run), and than reboot, the system MAY reboot without any apparent problems. If I repeat the process and run a few more applications (again, it matters not what applications) and, maybe wait awhile before the reboot, the system will generally always panic when you do finally reboot/poweroff. I have build a new, but with somewhat different hardware (it has fewer processors and less memory), system using fedora 18. This system has exhibited most of the same symptoms as the system being used in this report. It runs a simpler desktop using FVWM and uses Lightdm, instead of XDM, to login into X. (The .Xclients script has to be run manually. I have not taken the time yet to figure out how to fix that issue.) This configuration can run any of the installed desktops: FVWM, Mate and XFCE. Gnome is not installed or being used. What I have found with this second system (the TEST system) is that the reboot/poweroff does not seem to ever fail if I run the Mate or XFCE desktops. Only, sometimes, if I only run FVWM as the window manager does the reboot/poweroff problem occur. I am still testing on this system and have just now performed a reboot. Results: a panic screen with the same information as given above in the "Actual Results" and this is the TEST system, not the system originally used for this report.
Hmm, could you drop in a script in /usr/lib/systemd/system-shutdown/ and make it executable, with the following contents: #!/bin/bash echo "Hello World" and then shutdown and see if you see "Hello World" on screen? If you do this looks like a dracut problem.
Lennart Poettering <lpoetter> > Hmm, could you drop in a script in /usr/lib/systemd/system-shutdown/ and make it executable, with the following contents: > > #!/bin/bash > echo "Hello World" > > and then shutdown and see if you see "Hello World" on screen? If you do this looks like a dracut problem. Okay, I have done as you asked. This is what I have in /usr/lib/systemd/system-shutdown/: # ls -l /usr/lib/systemd/system-shutdown/ total 8 -rwxr-xr-x. 1 root root 80 Mar 6 23:58 debug.sh -rwxr-xr-x. 1 root root 31 Apr 9 13:48 hello-world.sh I then rebooted the work station under test, and the system hung as before and as it has done so since I installed Fedora-18. Now it might be getting interesting... The hello-world.sh script produced no output that I could see on the screen. However, the debug.sh script did produce a dmesg log file in root: # cat /usr/lib/systemd/system-shutdown/debug.sh #!/bin/sh mount -o remount,rw / dmesg > /shutdown-log.txt mount -o remount,ro / I had created the debug.sh script sometime soon after I originally created this report, but did not see any /shutdown-log.txt being created the several times I tried to reboot the system after doing so. Well, lo and behold, I went and check to see if this file was now being created, and it was! I checked my backups for this system to see if the file had previously been created during shutdown, and it had: # ls -l /backup/pooh/?/Root/shutdown-log.txt -rw-r--r--. 1 root root 87488 Apr 9 01:29 /backup/pooh/0/Root/shutdown-log.txt -rw-r--r--. 1 root root 131855 Apr 8 03:31 /backup/pooh/1/Root/shutdown-log.txt -rw-r--r--. 1 root root 112878 Apr 7 00:35 /backup/pooh/2/Root/shutdown-log.txt -rw-r--r--. 1 root root 124768 Apr 6 03:01 /backup/pooh/3/Root/shutdown-log.txt -rw-r--r--. 1 root root 100181 Apr 5 03:52 /backup/pooh/4/Root/shutdown-log.txt -rw-r--r--. 1 root root 84481 Apr 4 02:36 /backup/pooh/5/Root/shutdown-log.txt -rw-r--r--. 1 root root 89152 Apr 3 02:22 /backup/pooh/6/Root/shutdown-log.txt This system is up to date with all installed Fedora 18 packages, so I do not know why I did not see the /shutdown-log.txt being created during earlier shutdowns. I have attached the latest picture of the last screen during shutdown and the last /shutdown-log.txt file as well. I hope this helps. Bill
Created attachment 733335 [details] Last dmesg file created 9 April 2013 during an unseccessful reboot
Created attachment 733336 [details] Last screen of a failed reboot 9 April 2013
Hmm, indeed to those drop-ins are actually not executed with a console initialized, so you didn't get any output. If the dmesg file did get generated however this is an indication that the hang happens later than the execution of the drop-in scripts, which points to dracut hence. Reassigning.
Does the file "/run/initramfs/.need_shutdown" exist? If no, then it's a systemd bug. If yes: Please do: # mkdir -p /run/initramfs/etc/cmdline.d # echo "rd.debug rd.shell rd.break=pre-shutdown" > /run/initramfs/etc/cmdline.d/debug.conf and see, if you dropped into a dracut shell you can exit, and see it progressing Might want to take a screenshot, when it hangs
> Does the file "/run/initramfs/.need_shutdown" exist? > If no, then it's a systemd bug. > If yes: > Please do: > # mkdir -p /run/initramfs/etc/cmdline.d > # echo "rd.debug rd.shell rd.break=pre-shutdown" > /run/initramfs/etc/cmdline.d/debug.conf > and see, if you dropped into a dracut shell > you can exit, and see it progressing > Might want to take a screenshot, when it hangs the "/run/initramfs/.need_shutdown" file does indeed exist. So, I ran the follow list of commands four times with a reboot each time as instructed and received some variable results: [root@pooh:pts/8] Wed Apr 10 14:59:59 EDT (13100) ~ # ls -l /run/initramfs/.need_shutdown -rw-r--r--. 1 root root 0 Apr 10 14:56 /run/initramfs/.need_shutdown [root@pooh:pts/8] Wed Apr 10 15:00:05 EDT (13100) ~ # mkdir -vp /run/initramfs/etc/cmdline.d mkdir: created directory '/run/initramfs/etc' mkdir: created directory '/run/initramfs/etc/cmdline.d' [root@pooh:pts/8] Wed Apr 10 15:00:50 EDT (13100) ~ # echo "rd.debug rd.shell rd.break=pre-shutdown" > /run/initramfs/etc/cmdline.d/debug.conf [root@pooh:pts/8] Wed Apr 10 15:01:08 EDT (13100) ~ # cat /run/initramfs/etc/cmdline.d/debug.conf rd.debug rd.shell rd.break=pre-shutdown [root@pooh:pts/8] Wed Apr 10 15:01:17 EDT (13100) ~ The first reboot hit a system hang (imgp3184.jpg). No dracut shell and no panic information. I waited ten minutes before I did a reset on the system. (shutdown-log.txt-130410-133138) The second reboot produced the usual panic with no dracut shell interaction. (imgp3187.jpg, shutdown-log.txt-130410-145651) The third reboot did have a dracut shell, which I exited from. The system continued through a normal reboot process with no panics. The problem as I am seeing it appears to have a time or resource element. If you leave the system running for awhile running X and the FVWM window manager and some other applications, the system will panic during shutdown. If you have a short session running the same X setup, the system may shutdown normally with no apparent problems. (shutdown-log.txt-130410-151452) The fourth time rebooting produced the same results as the second reboot: the usual panic with no dracut shell. (imgp3188.jpg, shutdown-log.txt-130410-153408) Bill
Created attachment 733864 [details] First dracut reboot test screen
Created attachment 733865 [details] First dracut reboot dmesg file
Created attachment 733866 [details] Second dracut reboot test screen
Created attachment 733867 [details] Second dracut reboot dmesg file
Created attachment 733868 [details] Third dracut reboot dmesg file
Created attachment 733869 [details] Fourth dracut reboot test screen
Created attachment 733870 [details] Fourth dracut reboot dmesg file
Are you using raid? Could you attach /etc/fstab?
> Jóhann B. Guðmundsson 2013-04-10 16:24:47 EDT > Are you using raid? > Could you attach /etc/fstab? Yes, I am running raid: two sets of disks running RAID1 and two set of disks running RAID5. Lately I have been un-mounting the two RAID5 disks, /test and /backup before doing the reboots to avoid a possible fsck run during the following boot. Even though the file systems are EXT4, fsck can still take quite awhile to run.
Created attachment 733883 [details] /etc/fstab file from the test system Pooh.
"EXT4-fs (dm-1): re-mounted. Opts: (null)" What's your RAID device on dm-1 as in what's it composed of (i.e., what are the underlying devices). ?
> Jóhann B. Guðmundsson 2013-04-10 16:58:48 EDT > "EXT4-fs (dm-1): re-mounted. Opts: (null)" > What's your RAID device on dm-1 as in what's it composed of (i.e., what are the underlying devices). ? dm-1 translates as the logical volume /dev/vg0/lv1 (/ and /usr) which is built on raid device /dev/md1. md1 is a RAID1 device running on /dev/sdf2 and /dev/sdg2. This raid partition using LVM2 holds the swap, /, /usr/local, /var, /tmp, /opt, /home and /work file systems. In other words all of the logical volumes mounted on the system. If you think that this may a raid issue, I think I can disabuse you. I had a second test system set up running Fedora-18 with different x86_64 hardware, a Dell GX-620, and a very similar software set-up but with no raid disks. I was able to produce the same panic information on this second system. See Comment 4 above. I did not in that comment say if this Dell had any raid disks, which it did not. Bill
It's my firm suspicion that this is either disk setup or network related so walk through your partitions in /etc/fstab an one by one disable them and see if you can find what triggers it You can start with /usr/local ( or manually unmount it before shutdown ) next up disable your nfs mounts and see if still happens. Was the other system local disks only or nfs as and same partition setup?
Created attachment 733939 [details] Dmesg file for panic testing of file systems
Hmmmm.... I think I may have found a reason for my problem and also how to resolve this problem. I am still testing to make sure of the reason and the solution that I have applied. More information later.... Bill
Okay, credit to Jóhann for pointing out that the problem might be related to the disk setup and suggesting trying to narrow the problem down to one of the filesystems. If I umount the /boot filesystem before performing the shutdown, either reboot or poweroff, the systems do not panic during the shutdown. If I do not umount /boot, the systems generally will panic, if enough uptime has accumulated before the shutdown. The amount of accumulated time required to cause the panic seems to vary based on how busy is the system. I generally have used the systems as I normally would for about an hour before trying to do the reboots. As to why this panic happens, I am unable at this time to explain. So here, at least, is the explanation of the filesystems, raid devices and partition layouts that I use on my systems. First, there two servers and one workstation. The workstation is a home built 64-bit SMP (4 cores) AMD system. The two servers are Dell GX620s using 64-bit Pentium-D (2 cores) processors. One of them runs as my mail and DNS server. The other is used in audio production work. The file system layout I use is the same, except for some minor variations in sizing, on all three systems. The workstation and the audio system are running Fedora-18. The mail/DNS server is running Fedora-17. The software on all three systems is kept up to date. The mail/DNS server running Fedora-17 has never experienced this problem. I only include it so as to be able to say that the layout of its filesystems is identical, except for sizing differences, to the other two systems. Its two 80 GB raid hard drives are smaller as compared to the 300 GB hard drives used in the other two systems. It also does not have the /work filesystem of the other two systems. All three systems are running software raid. The two main disks on all systems are running RAID1 with each identical physical disk having two primary partitions. The first ones are setup for the raid device /dev/md0 which is used solely for the /boot filesystem. The second partitions use the remainder of the disk drives to form the /dev/md1 raid device. The md1 device is than subdivided using LVM2 for the swap, /(root), /usr/local, /var, /tmp, /opt, /home and /work file systems. The workstation also has two RAID5 devices that are used only for data storage. As to whether they have any affect this problem, I do not believe so, since they seem to have had no effect on this problem mounted or unmounted, assembled or not. This is the list of filesystems as setup on the workstation, All filesystems on all systems are formatted ext4, except for /boot, which is formatted ext3: Filesystem 1K-blocks Used Available Use% Mounted on /dev/md0 372603 98802 254141 28% /boot /dev/mapper/vg0-lv0 18874364 -- -- - (swap) /dev/mapper/vg0-lv1 18447100 10345044 7158340 60% / /dev/mapper/vg0-lv2 1998672 815444 1078372 44% /usr/local /dev/mapper/vg0-lv3 6061632 4145288 1601772 73% /var /dev/mapper/vg0-lv4 3997376 11272 3776392 1% /tmp /dev/mapper/vg0-lv5 1998672 177192 1716624 10% /opt /dev/mapper/vg0-lv6 16382888 13057576 2486452 85% /home /dev/mapper/vg0-lv7 219774772 105965436 102638728 51% /work /proc/mdstat (without the RAID5 devices) md0 : active raid1 sdf1[0] sdg1[1] 393204 blocks super 1.0 [2/2] [UU] md1 : active raid1 sdf2[0] sdg2[1] 292640632 blocks super 1.2 [2/2] [UU] bitmap: 1/3 pages [4KB], 65536KB chunk Bill
If you umount /boot, then the service dracut-shutdown.service cannot unpack /boot/initramfs-$(uname -r).img to /run/initramfs and thus systemd does not call /run/initramfs/shutdown and the whole shutdown procedure mentioned in comment 10 does not kick in.
> Harald Hoyer 2013-04-15 02:52:11 EDT > > If you umount /boot, then the service dracut-shutdown.service cannot unpack /boot/initramfs-$(uname -r).img to /run/initramfs and thus systemd does not call /run/initramfs/shutdown and the whole shutdown procedure mentioned in comment 10 does not kick in. Okay... So I go back and umount all the filesystems that I can before doing the shutdown: /boot /(root) /var These few above listed filesystems are all that are mounted and I had to go to run level three to get that far. I also had run Harald's suggested actions in Comment 10 to see if the shutdown would drop into the dracut shell, which it did not. The shutdown in this instance halted at "Fail to acquire terminal: No such file or directory" and the keyboard lights started flashing. There were no panic numbers displayed. The /shutdown-log.txt contained no information this late in the shutdown. The init 3 action may have kill putting any additional information in the dmesg. The last time entry in the /shutdown-log.txt was "218.637906". The last time entry displayed during shutdown was "3582.193786". Bill
Since Jóhann made the suggestion in Comment 24 that the issue might be a filesystem or network related problem, I have been pursuing a process of finding out if a mounted filesystem might be involved with the problem. Using a process of divide and conquer (Ha!), I have determined that if I umount the /boot filesystem before doing the shutdown, that the panic during shutdown does not happen. Harald in Comment 28 says that in umounting /boot before the shutdown prevents the call to /run/initramfs/shutdown and the dracut shell from happening. So, I decided to find out whether the issue causing the shutdown panic was the /boot filesystem structure or something in its contents. I umounted /boot and mounted /dev/md0 on /mnt and than rsynced all of the contents of /mnt to the directory called boot on the /(root) filesystem. I than umounted /mnt and ran the commands that Harald gave us in Comment 10. I left /boot umounted and performed a reboot. During the shutdown, the system did panic just as it had before when /boot was normally mounted. No dracut shell happened. Bill
Ok, then let's do one more check: # systemctl start dracut-shutdown.service # for i in proc sys dev run; do mount --bind /$i /run/initramfs/$i;done # chroot /run/initramfs # [ -x ./shutdown ] && echo OK OK # exit # for i in proc sys dev run; do umount /run/initramfs/$i;done If this all works, I don't have an idea and we need more debug data You might want to boot with: "systemd.log_level=debug systemd.log_target=kmsg loglevel=7 rd.debug plymouth.enable=0" and without "quiet rhgb" on the kernel command line. Then shutdown.
Okay... Doing the sequence given by Harald in Comment 31. [root@pooh:pts/3] Tue Apr 16 08:39:30 EDT (13106) ~ # systemctl start dracut-shutdown.service [root@pooh:pts/3] Tue Apr 16 09:17:33 EDT (13106) ~ # for i in proc sys dev run; do mount --bind /$i /run/initramfs/$i; done [root@pooh:pts/3] Tue Apr 16 09:17:54 EDT (13106) ~ # chroot /run/initramfs bash: tty: command not found bash: cut: command not found bash: date: command not found [root@pooh:] / # [ -x ./shutdown ] && echo OK OK bash: tty: command not found bash: cut: command not found bash: date: command not found [root@pooh:] / # exit exit [root@pooh:pts/3] Tue Apr 16 09:19:30 EDT (13106) ~ # for i in proc sys dev run; do umount /run/initramfs/$i; done [root@pooh:pts/3] Tue Apr 16 09:19:51 EDT (13106) ~ I got an "OK", so I assume there is no joy there... Alright, for more debug data, we added some addition text to the kernel line in /etc/grub2/grub.cfg as requested: linux /vmlinuz-3.8.7-201.fc18.x86_64 root=/dev/mapper/vg0-lv1 ro LANG=en_US.UTF-8 KEYTABLE=us SYSFONT=True rd.md.uuid=767469e8:7a9835b2:e92c29b8:4d4e0bd7 rd.luks=0 rd.dm=0 rd.md.uuid=eab643ef:757c0e0d:42209589:19541488 rd.lvm.lv=vg0/lv1 rd.lvm.lv=vg0/lv0 systemd.log_level=debug systemd.log_target=kmsg loglevel=7 rd.debug plymouth.enable=0 I never use the usual "quiet rhgb" tail. I also included the dracut shell additions from Comment 10. I than rebooted, and the system paniced during the shutdown process. There was no dracut shell. Two new files are attacted. Bill
Created attachment 736543 [details] Screenshot of shutdown panic for Comment 32
Created attachment 736544 [details] Dmesg debug text for Comment 32
[ 3472.041233] EXT4-fs (dm-3): re-mounted. Opts: (null) [ 3472.576255] EXT4-fs (dm-1): re-mounted. Opts: (null) [ 3472.614549] EXT4-fs (dm-1): re-mounted. Opts: (null) [ 3472.623927] EXT4-fs (dm-1): re-mounted. Opts: (null) [ 3472.837573] EXT4-fs (dm-1): re-mounted. Opts: (null) Well to me this points to the problem being dm-3 and dm-1 maybe Harald got some magic insight and I'm not so sure this is the right combo for dmraid to work ( but then again the whole activation/tear-down logic behind it never did sit right with me ) root=/dev/mapper/vg0-lv1 rd.md.uuid=767469e8:7a9835b2:e92c29b8:4d4e0bd7 rd.md.uuid=eab643ef:757c0e0d:42209589:19541488 rd.lvm.lv=vg0/lv1 rd.lvm.lv=vg0/lv0 In anycase before we get the dmraid experts on this run this and paste the output from this 1. # dmraid -r 2. # dmsetup status 3. # dmsetup deps 4. # ls -al /dev/mapper 5. # cat /proc/mdstat If nothing screams at us then we can move it to them and they throw it back at us if the entire setup is legit
Regarding Comment 35: It better be legit! This is pretty much the way I have always setup my filesystems since I started using software raid. When I used Fedora-15 and older versions, I used to embed /boot in /(root) on a raid device using the first partitions of the disks. /usr, /usr/local, /opt, /var, /tmp and /home filesystems were put LVM logical volumes on a second raid device made up of the second partitions of those same disks. It worked for me. I had to rethink this layout when /usr was bundled into the /(root) filesystem. 1. # dmraid -r no raid disks (Raid disks are built and assembled using mdadm) 2. # dmsetup status vg0-lv7: 0 446824448 linear vg0-lv6: 0 33554432 linear vg0-lv5: 0 4194304 linear vg0-lv4: 0 8388608 linear vg0-lv3: 0 12582912 linear vg0-lv2: 0 4194304 linear vg0-lv1: 0 37748736 linear vg0-lv0: 0 37748736 linear 3. # dmsetup deps vg0-lv7: 1 dependencies : (9, 1) vg0-lv6: 1 dependencies : (9, 1) vg0-lv5: 1 dependencies : (9, 1) vg0-lv4: 1 dependencies : (9, 1) vg0-lv3: 1 dependencies : (9, 1) vg0-lv2: 1 dependencies : (9, 1) vg0-lv1: 1 dependencies : (9, 1) vg0-lv0: 1 dependencies : (9, 1) 4. # ls -al /dev/mapper total 0 drwxr-xr-x. 2 root root 220 Apr 16 16:16 . drwxr-xr-x. 21 root root 4360 Apr 16 17:47 .. crw-------. 1 root root 10, 236 Apr 16 16:16 control lrwxrwxrwx. 1 root root 7 Apr 16 16:16 vg0-lv0 -> ../dm-0 lrwxrwxrwx. 1 root root 7 Apr 16 16:16 vg0-lv1 -> ../dm-1 lrwxrwxrwx. 1 root root 7 Apr 16 16:16 vg0-lv2 -> ../dm-2 lrwxrwxrwx. 1 root root 7 Apr 16 16:16 vg0-lv3 -> ../dm-3 lrwxrwxrwx. 1 root root 7 Apr 16 16:16 vg0-lv4 -> ../dm-6 lrwxrwxrwx. 1 root root 7 Apr 16 16:16 vg0-lv5 -> ../dm-7 lrwxrwxrwx. 1 root root 7 Apr 16 16:16 vg0-lv6 -> ../dm-4 lrwxrwxrwx. 1 root root 7 Apr 16 16:16 vg0-lv7 -> ../dm-5 5. # cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md3 : active raid5 sdi2[4] sdj2[3] sdk2[0] sdl2[1] 732169728 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU] md2 : active raid5 sdi1[1] sdh1[0] sdj1[5] sdl1[3] sdk1[2] 2929768448 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU] md1 : active raid1 sdf2[0] sdg2[1] 292640632 blocks super 1.2 [2/2] [UU] bitmap: 1/3 pages [4KB], 65536KB chunk md0 : active raid1 sdg1[1] sdf1[0] 393204 blocks super 1.0 [2/2] [UU] ----- Bill
What I suspect is happening here EXT4-fs (dm-X): re-mounted. Opts: (null) is that sync isn't returning because some process is still writing to the file system and the house of (dm)raids cards comes crumbling down on shutdown... Which explains why "The system will reboot/poweroff everytime if I type in "sync && reboot" or "sync && poweroff"" works thus we just need to figure out what's writing to dm-3/dm-1 and we can potential find out a race condition Now let's look at this starting from the kernel cmd what happens if you just remove "rd.lvm.lv=vg0/lv0" which according to your fstab is swap dm-0 == dev/mapper/vg0-lv0 swap ( might be confusing shutdown logic ) dm-1 == /dev/mapper/vg0-lv1 / dm-3 == /dev/mapper/vg0-lv3 /var Have you tried it with the latest systemd release 1. http://koji.fedoraproject.org/koji/buildinfo?buildID=411809 ( might be the journal is still writing something or potental issue between fstab and tmp.mount since in F18 we default to have /tmp on /tmpfs ) tmp on tmpfs should kick in if you hash out the /tmp from /etc/fstab or visa versa you can rule out tmp.mount by running systemctl mask tmp.mount And further could you run filesystem check on everything and restorecon -v -R on everything as well or literally disable selinux altogether ( just to rule potential hd failure or corruption and selinux altogether or a combination of selinux reporting even in permissive mode and journal logging )
> Jóhann B. Guðmundsson 2013-04-16 18:54:57 EDT > What I suspect is happening here EXT4-fs (dm-X): re-mounted. Opts: (null) is that sync isn't returning because some process is still writing to the file system and the house of (dm)raids cards comes crumbling down on shutdown... > Which explains why "The system will reboot/poweroff everytime if I type in "sync && reboot" or "sync && poweroff"" works thus we just need to figure out what's writing to dm-3/dm-1 and we can potential find out a race condition I had not before tried the "sync && reboot" and leaving /boot mounted to see what might happen to the shutdown process. I tried it tonight and it worked with a very long pause in the middle of the process. > Now let's look at this starting from the kernel cmd what happens if you just remove "rd.lvm.lv=vg0/lv0" which according to your fstab is swap Yeah, I spotted that issue this afternoon and remove the "rd.lvm.lv=vg0/lv0" from the kernel boot command line. Doing so did not seem to charge the characteristics of the problem: the system panics if /boot is mounted, or it does not if /boot is umounted before dhutdown. > dm-0 == dev/mapper/vg0-lv0 swap > ( might be confusing shutdown logic ) > dm-1 == /dev/mapper/vg0-lv1 / > dm-3 == /dev/mapper/vg0-lv3 /var > Have you tried it with the latest systemd release No, I have not. I will download and recompile it to see if it changes the problem. I will report on this later. > 1. http://koji.fedoraproject.org/koji/buildinfo?buildID=411809 ( might be the journal is still writing something or potental issue between fstab and tmp.mount since in F18 we default to have /tmp on /tmpfs ) > tmp on tmpfs should kick in if you hash out the /tmp from /etc/fstab or visa versa you can rule out tmp.mount by running systemctl mask tmp.mount I would rather keep /tmp as a real filesystem separate from /var/tmp. There is always some transient history stored there that I might want to look at on the next system boot. > And further could you run filesystem check on everything and restorecon -v -R on everything as well or literally disable selinux altogether ( just to rule potential hd failure or corruption and selinux altogether or a combination of selinux reporting even in permissive mode and journal logging ) I have already done fsck on all file systems and found nothing wrong, and I do NOT want to do a restorecon. The system is running in selinux permissive for the duration of this problem and that should be enough. Disabling selinux completely is a possibility. Also, SMART on all of drives shows no problems and the SATA driver is showing no errors or warnings in the logs, except for a eSATA connected disk that is now shut off and disconnected. Sometimes after a panic during shutdown event and than rebooting, the RAID device /dev/md1 will do a rsync. Why? Did the RAID device not get shutdown correctly during shutdown? I might try disabling selinux if only for the reason that every time it performs a relabel operation, it takes 3-1/2 hours to complete it. That makes for very long boot up times.
What I suspect, is that /lib/systemd/systemd-shutdown exits because of an error. Now, if only we could get it to log things.
As soon as he does what we ask him to and disables selinux we will get that shutdown log [15099.484240] type=1400 audit(1365531957.255:6332): avc: denied { write } for pid=15035 comm="dmesg" path="/shutdown-log.txt" dev="dm-1" ino=18540 scontext=system_u:system_r:dmesg_t:s0 tcontext=system_u:object_r:root_t:s0 tclass=file And he will need to disable selinux entirely. "I will download and recompile it to see if it changes the problem." and this indicates this is not a stock install. Heck if this is an upgrade from ( or even earlier ) F16 --> F17 --> F18 which includes the "usermove" we might be dealing with all kinds upgrade crap due to that. If this is an clean F17 install that got upgraded to F18 he should not be experience it
> As soon as he does what we ask him to and disables selinux we will get that shutdown log Jóhann, are you saying that all of those plain-text files that I have attaching to this report are not the shutdown log files that you have wanted? See Comment 6. I am running selinux in permissive mode. It is that not enough? This system is running a standard install of Fedora-17 that was updated to Fedora-18 by doing a yum upgrade. The non-standard install of a pre-release version of systemd was done as requested earlier. See Jóhann's suggestion in Comment 37. I am returning systemd back to the standard update version (197-1) because the pre-release version does not appear to change any characteristics of the problem. I have also disabled Selinux as requested earlier and rebooted. # cat /etc/selinux/config # This file controls the state of SELinux on the system. # SELINUX= can take one of these three values: # enforcing - SELinux security policy is enforced. # permissive - SELinux prints warnings instead of enforcing. # disabled - No SELinux policy is loaded. SELINUX=disabled # SELINUXTYPE= can take one of these two values: # targeted - Targeted processes are protected, # minimum - Modification of targeted policy. Only selected processes are protected. # mls - Multi Level Security protection. SELINUXTYPE=targeted ---- Bill
Okay, I reinstall the current version of systemd and libgudev1: # rpm -qa | grep systemd systemd-libs-197-1.fc18.2.i686 systemd-libs-197-1.fc18.2.x86_64 systemd-python-197-1.fc18.2.x86_64 systemd-devel-197-1.fc18.2.x86_64 wine-systemd-1.5.24-1.fc18.noarch systemd-197-1.fc18.2.x86_64 systemd-sysv-197-1.fc18.2.x86_64 # rpm -qa | grep libgudev1 libgudev1-197-1.fc18.2.x86_64 I have disabled selinux: # sestatus SELinux status: disabled I have done a reboot with panic and attached a new screenshot and shutdown-log.txt files. I am seeing in the shutdown-log.txt near the end of the file the following when the the system panics during shutdown: [ 386.631470] systemd[1]: Shutting down. [ 386.753949] systemd-journald[500]: Received SIGTERM [ 387.005515] EXT4-fs (dm-2): re-mounted. Opts: (null) [ 387.058928] EXT4-fs (md0): re-mounted. Opts: (null) [ 387.150675] EXT4-fs (dm-0): re-mounted. Opts: (null) [ 387.186441] EXT4-fs (dm-0): re-mounted. Opts: (null) [ 387.194849] EXT4-fs (dm-0): re-mounted. Opts: (null) [ 387.419315] EXT4-fs (dm-0): re-mounted. Opts: (null) If I boot the system, login into X, quickly run a few apps and that reboot without getting a shutdown panic, I see this: [ 106.483553] systemd[1]: Shutting down. [ 106.604738] systemd-journald[509]: Received SIGTERM [ 107.002091] EXT4-fs (dm-2): re-mounted. Opts: (null) [ 107.111747] EXT4-fs (dm-0): re-mounted. Opts: (null) [ 107.134482] EXT4-fs (dm-0): re-mounted. Opts: (null) [ 107.136614] EXT4-fs (dm-0): re-mounted. Opts: (null) [ 107.342259] EXT4-fs (dm-0): re-mounted. Opts: (null) In both of these cases, I did not umount /boot before each of these shutdowns. I wonder if the md0 remounted line in the first instance indicates anything? Bill
Created attachment 736937 [details] Screenshot of shutdown panic after selinux disabled
Created attachment 736938 [details] shutdown-log.txt file after selinux disabled
Can you reboot if you stop and disable nfs and bind services ?
> Jóhann B. Guðmundsson 2013-04-17 13:09:26 EDT > Can you reboot if you stop and disable nfs and bind services ? # systemctl stop named-chroot.service # systemctl stop nfs.service # systemctl stop nfs-server.service # systemctl disable named-chroot.service rm '/etc/systemd/system/multi-user.target.wants/named-chroot.service' # systemctl disable nfs-server.service rm '/etc/systemd/system/multi-user.target.wants/nfs-server.service' I than rebooted twice: once immediately after stopping and disabling nfs and named, a second time from a session in which named and nfs were not started at boot time. Both paniced in the same way. Bill
Created attachment 736993 [details] Screenshot of shutdown panic for Comment 46
Created attachment 736994 [details] shutdown-log.txt for Comment 46
/var gets cleanly unmounted now which I do believe indicates wrong ordering dependency in the nfs ( a bug ) and we no longer see "EXT4-fs (dm-2): re-mounted. Opts: (null)" so you can enable bind again. Which leaves us with /boot [ 1651.670223] systemd[1]: Forked /bin/umount as 3770 ... [ 1651.793018] systemd[1]: Got SIGCHLD for process 3770 (umount) [ 1651.793170] systemd[1]: Child 3770 died (code=exited, status=32/n/a) [ 1651.793180] systemd[1]: Child 3770 belongs to boot.mount [ 1651.793201] systemd[1]: boot.mount mount process exited, code=exited status=32 [ 1651.801284] systemd[1]: boot.mount changed unmounting -> mounted [ 1651.801295] systemd[1]: Job boot.mount/stop finished, result=failed [ 1651.801338] systemd[1]: Failed unmounting /boot. Now another mystery is this somehow it is determined that dm-3 is swap [ 384.265907] systemd[1]: Deactivating swap /dev/dm-3... [ 384.265985] systemd[1]: About to execute /sbin/swapoff /dev/dm-3 [ 384.266347] systemd[1]: Forked /sbin/swapoff as 2856 [ 384.276856] systemd[1]: dev-dm\x2d3.swap changed active -> deactivating [ 384.276874] systemd[1]: Redirecting stop request from dev-vg0-lv0.swap to dev-dm\x2d3.swap. [ 384.276883] systemd[1]: Deactivating swap /dev/dm-3... [ 384.276914] systemd[1]: Redirecting stop request from dev-disk-by\x2did-dm\x2dname\x2dvg0\x2dlv0.swap to dev-dm\x2d3.swap. [ 384.276920] systemd[1]: Deactivating swap /dev/dm-3... [ 384.276933] systemd[1]: Redirecting stop request from dev-disk-by\x2did-dm\x2duuid\x2dLVM\x2dKyaatOWRtwFoKczvyHtyglVZdJiFRGVYEkTgyI6sZklKppsJiU80PukfE8PyBToj.swap to dev-dm\x2d3.swap. [ 384.276939] systemd[1]: Deactivating swap /dev/dm-3... [ 384.276950] systemd[1]: Redirecting stop request from dev-disk-by\x2duuid-271d7447\x2db92d\x2d4d76\x2d8e3f\x2d258ba1a7b7d2.swap to dev-dm\x2d3.swap. [ 384.276956] systemd[1]: Deactivating swap /dev/dm-3... Now dev/dm-3 == /dev/mapper/vg0-lv3 and the fstab entry for that is /dev/mapper/vg0-lv3 /var ext4 defaults 1 2 so I'm not so sure why dm-3 instead of dm-0 suddenly has become the swap partition but let's remove the swap entry from the kernel command line and switch to UUID entries in /etc/fstab you can find those via ls -alh /dev/disk/by-uuid/ and provide the shutdown.log. Also let's add to the mix the contents of he contents of /proc/mdstat , /etc/mdadm.conf and mdadm -E on a constituent element of each md raid array on the system to have a better look mdraid.
Btw there is no need for you to post images of the same kernel panic over and over again. It's just a standard panic when it kills the init ( pid 1 ) and yes I agree that kernel should handle that more gracefully
> Jóhann B. Guðmundsson 2013-04-17 15:42:33 EDT > Btw there is no need for you to post images of the same kernel panic over and > over again. It's just a standard panic when it kills the init ( pid 1 ) and > yes I agree that kernel should handle that more gracefully Jóhann, Thank you telling me not to post any more images. I could see that they were duplicates, but I did not know if the lack of change in the numbers represented real information. I will not post any more images for this report unless asked to do so, or if real changed information can be shown. Bill
Jóhann, It turns out that dm-3 really is the swap partition. Here is a portion of the list "ls -lah /dev/disk/by-id": lrwxrwxrwx 1 root root 10 Apr 17 17:46 dm-name-vg0-lv0 -> ../../dm-3 lrwxrwxrwx 1 root root 10 Apr 17 17:46 dm-name-vg0-lv1 -> ../../dm-0 lrwxrwxrwx 1 root root 10 Apr 17 17:46 dm-name-vg0-lv2 -> ../../dm-1 lrwxrwxrwx 1 root root 10 Apr 17 17:46 dm-name-vg0-lv3 -> ../../dm-2 lrwxrwxrwx 1 root root 10 Apr 17 17:46 dm-name-vg0-lv4 -> ../../dm-6 lrwxrwxrwx 1 root root 10 Apr 17 17:46 dm-name-vg0-lv5 -> ../../dm-7 lrwxrwxrwx 1 root root 10 Apr 17 17:46 dm-name-vg0-lv6 -> ../../dm-4 lrwxrwxrwx 1 root root 10 Apr 17 17:46 dm-name-vg0-lv7 -> ../../dm-5 As you can see. the logical volume names do not line up with the dm numbering scheme. So, dm-3 is the same as /dev/vg0/lv0 which is the swap partition. The /(root) filesystem is dm-0 or /dev/vg0/lv1. Now I did not know if you wanted the /etc/fstab modified to replace all entries with logical volume names with UUID numbers. or just to do swap. Well, I wanted to see if it would work, So I replaced all vg0-lvx entries with UUIDs. This was the original /etc/fstab file: # # /etc/fstab # UUID=c8830ea0-64f9-4999-8d57-d6d52e09f8d4 /boot ext3 defaults 1 2 /dev/mapper/vg0-lv0 swap swap defaults 0 0 /dev/mapper/vg0-lv1 / ext4 defaults 1 1 /dev/mapper/vg0-lv2 /usr/local ext4 defaults 1 2 /dev/mapper/vg0-lv3 /var ext4 defaults 1 2 /dev/mapper/vg0-lv4 /tmp ext4 defaults 1 2 /dev/mapper/vg0-lv5 /opt ext4 defaults 1 2 /dev/mapper/vg0-lv6 /home ext4 defaults 1 2 /dev/mapper/vg0-lv7 /work ext4 defaults 1 2 UUID=7dd0bfab-7983-488e-9163-6c442701cda7 /backup ext4 defaults 1 2 UUID=e70bd9e6-3245-4dd5-898c-475ad2e7142c /test ext4 defaults 1 2 # LABEL=/backup15 /backup15 auto noauto,owner,noatime,barrier=1 0 0 LABEL=/backup20 /backup20 auto noauto,owner,noatime,barrier=1 0 0 /dev/sr0 /cd0 auto noauto,owner,user,ro 0 0 /dev/sr1 /cd1 auto noauto,owner,user,ro 0 0 /dev/fd0 /fd0 auto noauto,owner,user,noatime 0 0 /backup/dvdfs /dvdfs auto noauto,loop,owner,noatime 0 0 /backup/dvdfs-dl /dvdfs-dl auto noauto,loop,owner,noatime 0 0 This is the new fstab file with UUIDs: # # /etc/fstab # UUID=c8830ea0-64f9-4999-8d57-d6d52e09f8d4 /boot ext3 defaults 1 2 UUID=271d7447-b92d-4d76-8e3f-258ba1a7b7d2 swap swap defaults 0 0 UUID=57a6fb28-c4a2-4430-b64a-dcbd6c87a4ad / ext4 defaults 1 1 UUID=25aae70b-bfa8-42bf-9265-eef723cdeec3 /usr/local ext4 defaults 1 2 UUID=8d1acf0d-0327-4711-a5f6-4ada8bb02124 /var ext4 defaults 1 2 UUID=5b6d71b7-2dee-4b4b-ae0a-1ed7c623c9ed /tmp ext4 defaults 1 2 UUID=7338b671-fa46-4a95-ae6c-1e8900f75ce8 /opt ext4 defaults 1 2 UUID=df485586-0e0d-4204-904e-0e62634b8fad /home ext4 defaults 1 2 UUID=537658bb-5d08-466c-874b-4bb4a57c3931 /work ext4 defaults 1 2 UUID=7dd0bfab-7983-488e-9163-6c442701cda7 /backup ext4 defaults 1 2 UUID=e70bd9e6-3245-4dd5-898c-475ad2e7142c /test ext4 defaults 1 2 # LABEL=/backup15 /backup15 auto noauto,owner,noatime,barrier=1 0 0 LABEL=/backup20 /backup20 auto noauto,owner,noatime,barrier=1 0 0 /dev/sr0 /cd0 auto noauto,owner,user,ro 0 0 /dev/sr1 /cd1 auto noauto,owner,user,ro 0 0 /dev/fd0 /fd0 auto noauto,owner,user,noatime 0 0 /backup/dvdfs /dvdfs auto noauto,loop,owner,noatime 0 0 /backup/dvdfs-dl /dvdfs-dl auto noauto,loop,owner,noatime 0 0 Not very readable, but it does mount everything in the correct places. Now for additional information. The entry for swap in the kernel boot line "rd.lvm.lv=vg0/lv0" had already been removed from the /etc/default/grub and /boot/grub2/grub.cfg files. # cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md2 : active raid5 sdk1[2] sdi1[1] sdj1[5] sdh1[0] sdl1[3] 2929768448 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU] md3 : active raid5 sdj2[3] sdi2[4] sdl2[1] sdk2[0] 732169728 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU] md1 : active raid1 sdf2[0] sdg2[1] 292640632 blocks super 1.2 [2/2] [UU] bitmap: 2/3 pages [8KB], 65536KB chunk md0 : active raid1 sdf1[0] sdg1[1] 393204 blocks super 1.0 [2/2] [UU] unused devices: <none> # cat /etc/mdadm.conf MAILADDR root AUTO +imsm +1.x -all ARRAY /dev/md0 level=raid1 num-devices=2 UID=767469e8:7a9835b2:e92c29b8:4d4e0bd7 ARRAY /dev/md1 level=raid1 num-devices=2 UID=eab643ef:757c0e0d:42209589:19541488 ARRAY /dev/md2 metadata=1.2 name=pooh.grnwood.net:2 UUID=75a8f389:a777b702:06cc82ca:929cdd4c ARRAY /dev/md3 metadata=1.2 name=pooh.grnwood.net:4 UUID=20bebd3f:3eb5ccae:9ccc1d36:b98a877f # mdadm -E /dev/sdf1 /dev/sdf1: Magic : a92b4efc Version : 1.0 Feature Map : 0x0 Array UUID : 767469e8:7a9835b2:e92c29b8:4d4e0bd7 Name : pooh.grnwood.net:0 (local to host pooh.grnwood.net) Creation Time : Fri Jan 25 19:40:26 2013 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 786408 (384.05 MiB 402.64 MB) Array Size : 393204 (384.05 MiB 402.64 MB) Super Offset : 786416 sectors State : clean Device UUID : 5e768690:884c6f06:ccf2b8db:815ac8f2 Update Time : Wed Apr 17 18:22:27 2013 Checksum : 8a05136c - correct Events : 112 Device Role : Active device 0 Array State : AA ('A' == active, '.' == missing) # mdadm -E /dev/sdg2 /dev/sdg2: Magic : a92b4efc Version : 1.2 Feature Map : 0x1 Array UUID : eab643ef:757c0e0d:42209589:19541488 Name : pooh.grnwood.net:1 (local to host pooh.grnwood.net) Creation Time : Fri Jan 25 19:40:07 2013 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 585281536 (279.08 GiB 299.66 GB) Array Size : 292640632 (279.08 GiB 299.66 GB) Used Dev Size : 585281264 (279.08 GiB 299.66 GB) Data Offset : 2048 sectors Super Offset : 8 sectors State : clean Device UUID : 04f869de:63f95416:1d472a2f:e1763d3c Internal Bitmap : 8 sectors from superblock Update Time : Wed Apr 17 18:28:32 2013 Checksum : ae7e707e - correct Events : 13602 Device Role : Active device 1 Array State : AA ('A' == active, '.' == missing) # mdadm -E /dev/sdh1 /dev/sdh1: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 75a8f389:a777b702:06cc82ca:929cdd4c Name : pooh.grnwood.net:2 (local to host pooh.grnwood.net) Creation Time : Thu Jan 17 10:00:05 2013 Raid Level : raid5 Raid Devices : 5 Avail Dev Size : 1464884976 (698.51 GiB 750.02 GB) Array Size : 2929768448 (2794.04 GiB 3000.08 GB) Used Dev Size : 1464884224 (698.51 GiB 750.02 GB) Data Offset : 262144 sectors Super Offset : 8 sectors State : clean Device UUID : 8e1e6f2e:297b747d:e823f3db:a1921c64 Update Time : Wed Apr 17 17:46:32 2013 Checksum : 39b97f3f - correct Events : 34646 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 0 Array State : AAAAA ('A' == active, '.' == missing) # mdadm -E /dev/sdl2 /dev/sdl2: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 20bebd3f:3eb5ccae:9ccc1d36:b98a877f Name : pooh.grnwood.net:4 (local to host pooh.grnwood.net) Creation Time : Wed Jan 16 07:49:52 2013 Raid Level : raid5 Raid Devices : 4 Avail Dev Size : 488113856 (232.75 GiB 249.91 GB) Array Size : 732169728 (698.25 GiB 749.74 GB) Used Dev Size : 488113152 (232.75 GiB 249.91 GB) Data Offset : 262144 sectors Super Offset : 8 sectors State : clean Device UUID : 4108a158:496763d8:0bb5f3e7:5c4cbc82 Update Time : Wed Apr 17 17:46:31 2013 Checksum : cc4eb5d2 - correct Events : 10597 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 1 Array State : AAAA ('A' == active, '.' == missing) ---- Bill
Created attachment 737019 [details] ls -lah /dev/disk/by-id for Comment 52. Included for completeness.
Created attachment 737020 [details] shutdown-log.txt written after UUID mods to /etc/fstab & reboot for Comment 52
Can you take a screenshot of the shutdown screen, if you boot with "systemd.log_level=debug systemd.log_target=kmsg loglevel=7" on the kernel command line?
> Harald Hoyer 2013-04-18 07:26:12 EDT > Can you take a screenshot of the shutdown screen, if you boot with > "systemd.log_level=debug systemd.log_target=kmsg loglevel=7" on the kernel > command line? The attachments associated with Comment 46 already include this information along with "rd.debug plymouth.enable=0". I have been told that all of the additional screenshots are redundant and basically contain the same information. Bill
Without knowing what's failing we cant fix it ;( Had you already followed this ( debug shell/serial console ) http://freedesktop.org/wiki/Software/systemd/Debugging#Shutdown_Never_Finishes
(In reply to comment #56) > > Harald Hoyer 2013-04-18 07:26:12 EDT > > > Can you take a screenshot of the shutdown screen, if you boot with > > "systemd.log_level=debug systemd.log_target=kmsg loglevel=7" on the kernel > > command line? > > The attachments associated with Comment 46 already include this information > along with "rd.debug plymouth.enable=0". I have been told that all of the > additional screenshots are redundant and basically contain the same > information. > > Bill are you sure, this had "systemd.log_target=kmsg loglevel=7" ?
(In reply to comment #58) > (In reply to comment #56) > > > Harald Hoyer 2013-04-18 07:26:12 EDT > > > > > Can you take a screenshot of the shutdown screen, if you boot with > > > "systemd.log_level=debug systemd.log_target=kmsg loglevel=7" on the kernel > > > command line? > > > > The attachments associated with Comment 46 already include this information > > along with "rd.debug plymouth.enable=0". I have been told that all of the > > additional screenshots are redundant and basically contain the same > > information. > > > > Bill > > are you sure, this had "systemd.log_target=kmsg loglevel=7" ? ok, sorry, the screenshot says "shutdown" and not "systemd-shutdown"
(In reply to comment #12) > Created attachment 733864 [details] > First dracut reboot test screen This is the key screenshot. int make_console_stdio(void) { ... fd = acquire_terminal("/dev/console", false, true, true, (usec_t) -1); log_error("Failed to acquire terminal: %s", strerror(-fd)); ... in shutdown.c: chroot("."); setsid(); make_console_stdio(); log_info("Successfully changed into root pivot."); So, it seems there is no "/dev/console" ??
> Jóhann B. Guðmundsson 2013-04-18 10:14:33 EDT> > Without knowing what's failing we cant fix it ;( > Had you already followed this ( debug shell/serial console ) > http://freedesktop.org/wiki/Software/systemd/Debugging#Shutdown_Never_Finishes Well, I have been down that road already, but I did not document all of the trip. For me the serial console is impractical. I lack the needed null modem cable, and serial ports on computers are getting scare. So, I have gone with the script: /lib/systemd/system-shutdown/debug.sh. Every log text I have attached since Comment 6 has included this debug information. Booting the kernel uses the following command line: # cat /proc/cmdline BOOT_IMAGE=/vmlinuz-3.8.7-201.fc18.x86_64 root=/dev/mapper/vg0-lv1 ro LANG=en_US.UTF-8 KEYTABLE=us SYSFONT=True rd.md.uuid=767469e8:7a9835b2:e92c29b8:4d4e0bd7 rd.luks=0 rd.dm=0 rd.md.uuid=eab643ef:757c0e0d:42209589:19541488 rd.lvm.lv=vg0/lv1 systemd.log_level=debug systemd.log_target=kmsg loglevel=7 rd.debug plymouth.enable=0 I have tried the debug shell, but once the panic occurs there is no way to get to it via CTRL+ALT+F9 because the keyboard is locked up. I do not think a serial console would work either in this case. The panic appears to happen just before the kernel would issue the reset for the shutdown. Doing a "sync && reboot -f" or "sync && poweroff -f" immediately causes a successful shutdown to run. So, I would guess it is not a kernel bug. The last part of that web page talks about "Reporting systemd Bugs" I do not think I have included for this report the outputs for the following two commands: # systemctl dump > systemd-dump.txt # systemd --test --system --log-level=debug > systemd-test.txt 2>&1 (not by root) So I include them now for what they may be worth. Bill
Created attachment 737352 [details] systemd-dump.txt
> Harald Hoyer 2013-04-18 11:45:37 EDT > So, it seems there is no "/dev/console" ?? Sometimes a cause to a problem is not found until someone asks the correct question. And that it the correct question to ask. The answer is no, at least at the point of time in the shutdown cycle about which we are concerned. There was a /dev/console device earlier in the session, but it was a soft link pointing to a xterm window device so that we could monitor the console's output during the X session. The xterm and where the link was pointing, of course, gets destroyed when the X session terminates during the shutdown process. I guess systemd needs the console during shutdown to display some messages and panics when it can not find the console device. Well, I removed the script that created the soft link for the console and now the reboot and poweroff shutdowns happen correctly as they should on the two systems running Fedora-18. The system running Fedora-17 using this same soft linked console device script reboots and poweroff just fine. Go figure! Bill
Uh oh. On Linux /dev/console is not something you can replace by a FIFO! It's a device node with very special semantics! That this every worked is pure luck... Really, don't do this. That's totally unsupported. Yikes...
I see. dracut's shutdown script begins with: exec </dev/console >>/dev/console 2>>/dev/console An exec failure causes a non-interactive shell to exit. I agree that /dev/console should not be removed, but the script can still be made robust against this. Is the redirection really necessary? systemd will call the shutdown script with the fds 0, 1, 2 redirected to /dev/console anyway. If it is necessary for some reason (non-systemd systems?), perhaps the exec can be protected with: [ -w /dev/console ] && exec ...
(In reply to comment #65) > I see. dracut's shutdown script begins with: > exec </dev/console >>/dev/console 2>>/dev/console > An exec failure causes a non-interactive shell to exit. > > I agree that /dev/console should not be removed, but the script can still be > made robust against this. > > Is the redirection really necessary? systemd will call the shutdown script > with the fds 0, 1, 2 redirected to /dev/console anyway. > > If it is necessary for some reason (non-systemd systems?), perhaps the exec > can be protected with: > [ -w /dev/console ] && exec ... yes.. added upstream, thanks! Closing as NOTABUG though, because one should not fiddle with /dev/console.