Description of problem: systemctl commands timeout This bug has been reported on Ubuntu and Arch Linux, and is now occurring on Fedor27 According to: https://unix.stackexchange.com/questions/98128/what-are-the-systemd-command-line-options-switched-root-and-deserialize https://serverfault.com/questions/712928/systemctl-commands-timeout-when-ran-as-root this bug is related to the use of --switched-root --deserialize 24 options use when root starts systemd Problem occurs on both x86_64, and i686 fedora27 architectures Problem is not currently occurring on fedore26 or earlier. Version-Release number of selected component (if applicable): systemd-233-4.fc27 How reproducible: 100% Steps to Reproduce: 1. normal reboot 2. login as root 3. ps auxwq 1 4. systemctl Actual results: [root@bld-fedora27 ~]# ps auxwq 1 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 1 0.2 0.5 158188 11852 ? Ss 11:22 0:02 /usr/lib/systemd/systemd --switched-root --system --deserialize 24 [root@bld-fedora27 ~]# systemctl Failed to list units: Connection timed out [root@bld-fedora27 ~]# .... also .... The Fedora icon on boot screen never completes. If "rhgb quiet" is removed from the kernel line then the bug doesn't happen. See https://bugzilla.redhat.com/show_bug.cgi?id=1452808 .... also ... ssh -vvv logins from other systems hang for ~60 seconds after "pledge: exec" while waiting for debug3: receive packet: type 80 debug1: client_input_global_request: rtype hostkeys-00 Other reports suggest that UsePAM=0 avoids this hang. I've not tried, but maybe pam is involved here. Again, removing "rhgb quiet" stops the problem See https://bugzilla.redhat.com/show_bug.cgi?id=1452804 https://serverfault.com/questions/792486/ssh-connection-takes-forever-to-initiate-stuck-at-pledge-network Expected results: None of the above Additional info: Related bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1452804 https://bugzilla.redhat.com/show_bug.cgi?id=1452808 https://unix.stackexchange.com/questions/98128/what-are-the-systemd-command-line-options-switched-root-and-deserialize https://serverfault.com/questions/712928/systemctl-commands-timeout-when-ran-as-root https://serverfault.com/questions/792486/ssh-connection-takes-forever-to-initiate-stuck-at-pledge-network
Hm, hanging systemctl usually means that PID1 crashed. Can you show 'journalctl -b' output?
Created attachment 1292011 [details] output of "journalctl -b -1" output contains this crash: Jun 26 10:51:00 bld-fedora27 kernel: ------------[ cut here ]------------ Jun 26 10:51:00 bld-fedora27 kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo_util.c:589! Jun 26 10:51:00 bld-fedora27 kernel: invalid opcode: 0000 [#1] SMP Jun 26 10:51:00 bld-fedora27 kernel: Modules linked in: sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq Jun 26 10:51:00 bld-fedora27 kernel: CPU: 1 PID: 339 Comm: plymouthd Not tainted 4.12.0-0.rc6.git3.1.fc27.x86_64 #1 Jun 26 10:51:00 bld-fedora27 kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Jun 26 10:51:00 bld-fedora27 kernel: task: ffff91dd77990000 task.stack: ffffb69e00720000 Jun 26 10:51:00 bld-fedora27 kernel: RIP: 0010:ttm_bo_kmap+0x1af/0x250 [ttm] Jun 26 10:51:00 bld-fedora27 kernel: RSP: 0018:ffffb69e00723ba8 EFLAGS: 00010206 Jun 26 10:51:00 bld-fedora27 kernel: RAX: ffff91dd77987b50 RBX: ffff91dd779dc858 RCX: ffff91dd779dcc08 Jun 26 10:51:00 bld-fedora27 kernel: RDX: 0000000000000300 RSI: 0000000000000000 RDI: ffff91dd77430d90 Jun 26 10:51:00 bld-fedora27 kernel: RBP: ffffb69e00723be0 R08: 0000000000010004 R09: ffff91dd779dc928 Jun 26 10:51:00 bld-fedora27 kernel: R10: 0000000000000008 R11: 0000000000000000 R12: ffffb69e00723c48 Jun 26 10:51:00 bld-fedora27 kernel: R13: 0000000000000000 R14: ffff91dd77430d70 R15: 0000000000000000 Jun 26 10:51:00 bld-fedora27 kernel: FS: 00007fa4e051fd00(0000) GS:ffff91dd7d400000(0000) knlGS:0000000000000000 Jun 26 10:51:00 bld-fedora27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 26 10:51:00 bld-fedora27 kernel: CR2: 00005616290356f8 CR3: 0000000035b8b000 CR4: 00000000001406e0 Jun 26 10:51:00 bld-fedora27 kernel: Call Trace: Jun 26 10:51:00 bld-fedora27 kernel: qxl_bo_kmap+0x42/0x70 [qxl] Jun 26 10:51:00 bld-fedora27 kernel: qxl_draw_dirty_fb+0x1f5/0x420 [qxl] Jun 26 10:51:00 bld-fedora27 kernel: qxl_framebuffer_surface_dirty+0xa0/0xf0 [qxl] Jun 26 10:51:00 bld-fedora27 kernel: drm_mode_dirtyfb_ioctl+0x190/0x1d0 [drm] Jun 26 10:51:00 bld-fedora27 kernel: drm_ioctl+0x232/0x500 [drm] Jun 26 10:51:00 bld-fedora27 kernel: ? drm_mode_getfb+0x110/0x110 [drm] Jun 26 10:51:00 bld-fedora27 kernel: ? __might_sleep+0x4a/0x80 Jun 26 10:51:00 bld-fedora27 kernel: ? selinux_file_ioctl+0x124/0x1e0 Jun 26 10:51:00 bld-fedora27 kernel: do_vfs_ioctl+0xa6/0x720 Jun 26 10:51:00 bld-fedora27 kernel: ? getnstimeofday64+0xe/0x20 Jun 26 10:51:00 bld-fedora27 kernel: ? security_file_ioctl+0x43/0x60 Jun 26 10:51:00 bld-fedora27 kernel: SyS_ioctl+0x79/0x90 Jun 26 10:51:00 bld-fedora27 kernel: do_syscall_64+0x6c/0x1c0 Jun 26 10:51:00 bld-fedora27 kernel: entry_SYSCALL64_slow_path+0x25/0x25 Jun 26 10:51:00 bld-fedora27 kernel: RIP: 0033:0x7fa4df504eb7 Jun 26 10:51:00 bld-fedora27 kernel: RSP: 002b:00007ffca83efbc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 Jun 26 10:51:00 bld-fedora27 kernel: RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007fa4df504eb7 Jun 26 10:51:00 bld-fedora27 kernel: RDX: 00007ffca83efc00 RSI: 00000000c01864b1 RDI: 0000000000000009 Jun 26 10:51:00 bld-fedora27 kernel: RBP: 00007ffca83efc00 R08: 00007fa4dd42577c R09: 0000000000000010 Jun 26 10:51:00 bld-fedora27 kernel: R10: 00007fa4dd424884 R11: 0000000000000246 R12: 00000000c01864b1 Jun 26 10:51:00 bld-fedora27 kernel: R13: 0000000000000009 R14: 0000555e394ef330 R15: 00007fa4ddf8278c Jun 26 10:51:00 bld-fedora27 kernel: Code: d0 48 8b bb 80 00 00 00 48 c1 e6 0c f6 43 62 04 74 44 48 03 7b 70 4c 01 ef e8 8e 21 bd ef 49 89 04 24 44 8b 45 d0 e9 18 ff ff ff <0f> 0b 4b 8d 7c 3e 58 89 4 Jun 26 10:51:00 bld-fedora27 kernel: RIP: ttm_bo_kmap+0x1af/0x250 [ttm] RSP: ffffb69e00723ba8 Jun 26 10:51:00 bld-fedora27 kernel: ---[ end trace 13b661e538049e06 ]---
That oops doesn't explain why systemd itself would not respond. Looks like a bug in the QXL graphics subsystem, you might want to file that as a separate bug. Based on the logs, everything seems to be running OK. Is dbus generally functional? E.g. can you run 'loginctl' or 'udisksctl status'?
'loginctl' and 'udisksctl status' both appear to run ok. https://bugzilla.redhat.com/show_bug.cgi?id=1465148 filed for the QXL issue
I don't see any substantial info in the bug reports and stackoverflow questions you linked. Seems to be a bunch of unrelated issues. Returning to your problem: it seems that somehow either the communication between systemctl and systemd fail, or systemd fails to provide a response. It would love to figure out what is going on here... First question: from the logs, I see you are running with selinux in enforcing mode. If you switch to permissive ('setenforce 0'), does that change anything? Second question: when you run the systemctl command, and it's waiting for the reply, can you start 'gdb -p 1' in another terminal, and paste the output of 'bt full'? This should help with figuring out if its systemd that's hanging.
> If you switch to permissive ('setenforce 0'), does that change anything? I changed to "SELINUX=disabled" in /etc/syslinux/config and rebooted (with "rhgb qyuet" present"). Disabling selinux made no difference. One new observation is that the first ssh -l root, and systemctl, was successful. On the second attempt, a couple of minutes later, the ssh login was really slow, and the systemctl command hung.
Created attachment 1293513 [details] output of: gdb -p 1 systemctl was running and not yet timed-out when this "gdb -p 1" output was captured.
Created attachment 1293514 [details] gdb output of: bt full
Created attachment 1293515 [details] gdb -p 1 ... bt full You probably wanted those from a single gdb session.
*** Bug 1452808 has been marked as a duplicate of this bug. ***
Thanks for the data. > Jun 26 10:51:22 bld-fedora27 systemd[1]: plymouth-quit.service: Start operation timed out. Terminating. The pattern here and in #1452808 seems to be the same: - systemd starts plymouth-quit.service - plymouth-quit.service fails because of a timeout - systemd tries to write to the console - writev() never returns – systemd hangs The only difference is that you see a QXL backtrace in the logs, but it seems reasonable to assume that in either case the graphics subsystem is hosed. We'll need to discuss the solution to this upstream. One option would be to set a timeout on the writev, but that's quite messy.
After discussing this with Lennart, it seems that isn't much that systemd can do. Once the kernel does an oops, any code which calls into the subsystem will also hang. Plymouth is hanging, and systemd will hang too once it calls writev() on the console, even if we used O_NONBLOCK. So the only solution is to fix the kernel code. *** This bug has been marked as a duplicate of bug 1465148 ***