Bug 1537845 - Won't power off or reboot
Summary: Won't power off or reboot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 27
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-01-24 00:41 UTC by Todd
Modified: 2018-03-22 17:37 UTC (History)
18 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-22 17:37:04 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Server #1 shutdown free at kill all (4.91 MB, image/jpeg)
2018-02-03 01:52 UTC, Todd
no flags Details
Server #2 shutdown free at kill all (2.76 MB, image/jpeg)
2018-02-03 01:54 UTC, Todd
no flags Details

Description Todd 2018-01-24 00:41:17 UTC
Hi All,

I have a Fedora 27 server out there that won't poweroff or reboot properly. When told to power off or reboot, it gets to the point where it seems that everything is terminated, but it won't throw the power switch or the reboot command. The keyboard is locked up so tight that your can not even toggle the num lock key. No keystroke is effective. To boot back up, you have to pull the power cord.

I turned off the "f" splash screen to see what the last thing it showed was:

Code:

   mount: /oldsys/sys: file system was mounted, but any subsequent operation failed: permission denied
    mdmon: error connecting monitor with run/mdadm/md127.sock: permission denied
    dracut Warning: killing all remaining processes

The last thing I see with "# journalctl -b -1 | tail" is:

Code:

Jan 15 19:55:09 FedoraServer.foo.local systemd[1]: Reached target Shutdown.
    Jan 15 19:55:09 FedoraServer.foo.local systemd[1]: Reached target Final Step.
    Jan 15 19:55:09 FedoraServer.foo.local systemd[1]: Starting Reboot...
    Jan 15 19:55:09 FedoraServer.foo.local systemd[1]: Shutting down.
    Jan 15 19:55:09 FedoraServer.foo.local lvm[2963]: /run/lvm/lvmetad.socket: connect failed: Connection refused
    Jan 15 19:55:09 FedoraServer.foo.local lvm[2963]:   WARNING: Failed to connect to lvmetad. Falling back to device scanning.
    Jan 15 19:55:09 FedoraServer.foo.local systemd-shutdown[1]: Sending SIGTERM to remaining processes...
    Jan 15 19:55:09 FedoraServer.foo.local systemd-journald[665]: Journal stopped


The motherboard is a Supermicro X11SAE-M:
http://www.supermicro.com/products/motherboard/xeon/c236_c232/x11sae-m.cfm

RSTe RAID 1 is being used.

Problem occued under Fedora 26 before upgrading to 27 as well.

Many thanks,
-T

Comment 1 Todd 2018-01-24 03:51:54 UTC
This data may help:

# lsblk --list
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda       8:0    0 894.3G  0 disk  
sdb       8:16   0 894.3G  0 disk  
sdc       8:32   0 894.3G  0 disk  
sdd       8:48   0   1.8T  0 disk  
sdd1      8:49   0   1.8T  0 part  
md126     9:126  0 849.5G  0 raid1 
md126     9:126  0 849.5G  0 raid1 
md126p1 259:0    0   200M  0 md    /boot/efi
md126p1 259:0    0   200M  0 md    /boot/efi
md126p2 259:1    0     1G  0 md    /boot
md126p2 259:1    0     1G  0 md    /boot
md126p3 259:2    0  15.7G  0 md    [SWAP]
md126p3 259:2    0  15.7G  0 md    [SWAP]
md126p4 259:3    0 832.7G  0 md    /
md126p4 259:3    0 832.7G  0 md    /

# find / -type d -iname md127
/sys/devices/virtual/block/md127

# ls -al /sys/devices/virtual/block/md127
total 0
drwxr-xr-x. 9 root root    0 Jan 23 16:14 .
drwxr-xr-x. 5 root root    0 Jan 23 16:14 ..
-r--r--r--. 1 root root 4096 Jan 23 19:41 alignment_offset
lrwxrwxrwx. 1 root root    0 Jan 23 19:41 bdi -> ../../bdi/9:127
-r--r--r--. 1 root root 4096 Jan 23 19:41 capability
-r--r--r--. 1 root root 4096 Jan 23 16:14 dev
-r--r--r--. 1 root root 4096 Jan 23 19:41 discard_alignment
-r--r--r--. 1 root root 4096 Jan 23 19:41 ext_range
drwxr-xr-x. 2 root root    0 Jan 23 19:10 holders
-r--r--r--. 1 root root 4096 Jan 23 19:41 inflight
drwxr-xr-x. 2 root root    0 Jan 23 19:10 integrity
drwxr-xr-x. 5 root root    0 Jan 23 16:14 md
drwxr-xr-x. 2 root root    0 Jan 23 19:10 power
drwxr-xr-x. 2 root root    0 Jan 23 19:06 queue
-r--r--r--. 1 root root 4096 Jan 23 19:41 range
-r--r--r--. 1 root root 4096 Jan 23 19:41 removable
-r--r--r--. 1 root root 4096 Jan 23 19:41 ro
-r--r--r--. 1 root root 4096 Jan 23 19:06 size
drwxr-xr-x. 2 root root    0 Jan 23 19:10 slaves
-r--r--r--. 1 root root 4096 Jan 23 19:41 stat
lrwxrwxrwx. 1 root root    0 Jan 23 16:14 subsystem -> ../../../../class/block
drwxr-xr-x. 2 root root    0 Jan 23 19:10 trace
-rw-r--r--. 1 root root 4096 Jan 23 16:14 uevent


# lsof | grep md127
mdmon       784         root    3u     unix 0xffff9c9d18561800          0t0      18189 /run/mdadm/md127.sock type=STREAM
mdmon       784  786    root    3u     unix 0xffff9c9d18561800          0t0      18189 /run/mdadm/md127.sock type=STREAM

Comment 4 Todd 2018-02-03 01:52:43 UTC
Created attachment 1390426 [details]
Server #1 shutdown free at kill all

I now have two servers doing this.  Both seize at the killall.  This screen shot is the clearer of the two

Comment 5 Todd 2018-02-03 01:54:20 UTC
Created attachment 1390427 [details]
Server #2 shutdown free at kill all

This screen shot is blurry, but if you squint right, the issue is also at the killall stage (I confirmed it with the customer).

Comment 6 Todd 2018-02-19 12:18:14 UTC
Extra information, possibly a clue.

I now have a server in my office with the identical motherboard, memory sticks, processor, and operating system installed from the same medium.

My server has no such shutdown issue.

The difference is that my server is using SATA and not RSTe RAID.

If interest, my server is set to boot from either a (BIOS) SATA drive or a (EUFI) NVMe drive.  The issue does not occur from either drive.

Comment 7 Todd 2018-03-08 07:07:10 UTC
Looking at the screen shot and comparing to my non-software raid computers I see both have the following:

Mar 07 02:16:23 rn6.foo.local systemd[1]: Reached target Shutdown.

But following is missing from the raid ones:

Mar 07 02:16:23 rn6.foo.local systemd[1]: Reached target Final Step.
Mar 07 02:16:23 rn6.foo.local systemd[1]: Starting Power-Off...
Mar 07 02:16:23 rn6.foo.local audit[1]: USER_AVC pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc:  received setenforce notice (enforcing=1)
                                                 exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
Mar 07 02:16:23 rn6.foo.local systemd[1]: Shutting down.
Mar 07 02:16:24 rn6.foo.local systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Mar 07 02:16:24 rn6.foo.local systemd-journald[1003]: Journal stopped

The raid one get the following instead:

mount: /oldsys/sys: filesystem was mounted, but subsequent operations failed: Permission denied
mount: /oldsys/proc: filesystem was mounted, but subsequent operations failed: Permission denied
mount: /oldsys/run: filesystem was mounted, but subsequent operations failed: Permission denied
mount: /oldsys/dev: filesystem was mounted, but subsequent operations failed: Permission denied
mdmon: Error connecting monitor with /run/mdadm/md127.sock: Permission denied
dracut Warning: Killing all remaining processes

Comment 8 Todd 2018-03-19 20:40:11 UTC
Ran updates and retested.  Went through about ten reboot and three poweroff's on he worst offending server, which froze all the time without a hitch.  The second server that froze about 50% of the time, went through two reboots without a hitch.

And now they are both working perfectly.

THANK YOU!

$ uname -r
4.15.8-300.fc27.x86_64

Comment 9 Jeremy Cline 2018-03-22 17:37:04 UTC
Great, thanks for letting us know.


Note You need to log in before you can comment on or make changes to this bug.