Bug 1463745
| Summary: | systemctl commands timeout | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | John Ellson <john.ellson> | ||||||||||
| Component: | systemd | Assignee: | systemd-maint | ||||||||||
| Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||||
| Severity: | unspecified | Docs Contact: | |||||||||||
| Priority: | unspecified | ||||||||||||
| Version: | rawhide | CC: | johannbg, lnykryn, msekleta, muadda, ssahani, s, systemd-maint, zbyszek | ||||||||||
| Target Milestone: | --- | ||||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | Unspecified | ||||||||||||
| OS: | Unspecified | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | 2017-07-02 21:23:11 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Attachments: |
|
||||||||||||
|
Description
John Ellson
2017-06-21 15:55:12 UTC
Hm, hanging systemctl usually means that PID1 crashed. Can you show 'journalctl -b' output? Created attachment 1292011 [details]
output of "journalctl -b -1"
output contains this crash:
Jun 26 10:51:00 bld-fedora27 kernel: ------------[ cut here ]------------
Jun 26 10:51:00 bld-fedora27 kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo_util.c:589!
Jun 26 10:51:00 bld-fedora27 kernel: invalid opcode: 0000 [#1] SMP
Jun 26 10:51:00 bld-fedora27 kernel: Modules linked in: sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq
Jun 26 10:51:00 bld-fedora27 kernel: CPU: 1 PID: 339 Comm: plymouthd Not tainted 4.12.0-0.rc6.git3.1.fc27.x86_64 #1
Jun 26 10:51:00 bld-fedora27 kernel: Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Jun 26 10:51:00 bld-fedora27 kernel: task: ffff91dd77990000 task.stack: ffffb69e00720000
Jun 26 10:51:00 bld-fedora27 kernel: RIP: 0010:ttm_bo_kmap+0x1af/0x250 [ttm]
Jun 26 10:51:00 bld-fedora27 kernel: RSP: 0018:ffffb69e00723ba8 EFLAGS: 00010206
Jun 26 10:51:00 bld-fedora27 kernel: RAX: ffff91dd77987b50 RBX: ffff91dd779dc858 RCX: ffff91dd779dcc08
Jun 26 10:51:00 bld-fedora27 kernel: RDX: 0000000000000300 RSI: 0000000000000000 RDI: ffff91dd77430d90
Jun 26 10:51:00 bld-fedora27 kernel: RBP: ffffb69e00723be0 R08: 0000000000010004 R09: ffff91dd779dc928
Jun 26 10:51:00 bld-fedora27 kernel: R10: 0000000000000008 R11: 0000000000000000 R12: ffffb69e00723c48
Jun 26 10:51:00 bld-fedora27 kernel: R13: 0000000000000000 R14: ffff91dd77430d70 R15: 0000000000000000
Jun 26 10:51:00 bld-fedora27 kernel: FS: 00007fa4e051fd00(0000) GS:ffff91dd7d400000(0000) knlGS:0000000000000000
Jun 26 10:51:00 bld-fedora27 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 10:51:00 bld-fedora27 kernel: CR2: 00005616290356f8 CR3: 0000000035b8b000 CR4: 00000000001406e0
Jun 26 10:51:00 bld-fedora27 kernel: Call Trace:
Jun 26 10:51:00 bld-fedora27 kernel: qxl_bo_kmap+0x42/0x70 [qxl]
Jun 26 10:51:00 bld-fedora27 kernel: qxl_draw_dirty_fb+0x1f5/0x420 [qxl]
Jun 26 10:51:00 bld-fedora27 kernel: qxl_framebuffer_surface_dirty+0xa0/0xf0 [qxl]
Jun 26 10:51:00 bld-fedora27 kernel: drm_mode_dirtyfb_ioctl+0x190/0x1d0 [drm]
Jun 26 10:51:00 bld-fedora27 kernel: drm_ioctl+0x232/0x500 [drm]
Jun 26 10:51:00 bld-fedora27 kernel: ? drm_mode_getfb+0x110/0x110 [drm]
Jun 26 10:51:00 bld-fedora27 kernel: ? __might_sleep+0x4a/0x80
Jun 26 10:51:00 bld-fedora27 kernel: ? selinux_file_ioctl+0x124/0x1e0
Jun 26 10:51:00 bld-fedora27 kernel: do_vfs_ioctl+0xa6/0x720
Jun 26 10:51:00 bld-fedora27 kernel: ? getnstimeofday64+0xe/0x20
Jun 26 10:51:00 bld-fedora27 kernel: ? security_file_ioctl+0x43/0x60
Jun 26 10:51:00 bld-fedora27 kernel: SyS_ioctl+0x79/0x90
Jun 26 10:51:00 bld-fedora27 kernel: do_syscall_64+0x6c/0x1c0
Jun 26 10:51:00 bld-fedora27 kernel: entry_SYSCALL64_slow_path+0x25/0x25
Jun 26 10:51:00 bld-fedora27 kernel: RIP: 0033:0x7fa4df504eb7
Jun 26 10:51:00 bld-fedora27 kernel: RSP: 002b:00007ffca83efbc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 26 10:51:00 bld-fedora27 kernel: RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007fa4df504eb7
Jun 26 10:51:00 bld-fedora27 kernel: RDX: 00007ffca83efc00 RSI: 00000000c01864b1 RDI: 0000000000000009
Jun 26 10:51:00 bld-fedora27 kernel: RBP: 00007ffca83efc00 R08: 00007fa4dd42577c R09: 0000000000000010
Jun 26 10:51:00 bld-fedora27 kernel: R10: 00007fa4dd424884 R11: 0000000000000246 R12: 00000000c01864b1
Jun 26 10:51:00 bld-fedora27 kernel: R13: 0000000000000009 R14: 0000555e394ef330 R15: 00007fa4ddf8278c
Jun 26 10:51:00 bld-fedora27 kernel: Code: d0 48 8b bb 80 00 00 00 48 c1 e6 0c f6 43 62 04 74 44 48 03 7b 70 4c 01 ef e8 8e 21 bd ef 49 89 04 24 44 8b 45 d0 e9 18 ff ff ff <0f> 0b 4b 8d 7c 3e 58 89 4
Jun 26 10:51:00 bld-fedora27 kernel: RIP: ttm_bo_kmap+0x1af/0x250 [ttm] RSP: ffffb69e00723ba8
Jun 26 10:51:00 bld-fedora27 kernel: ---[ end trace 13b661e538049e06 ]---
That oops doesn't explain why systemd itself would not respond. Looks like a bug in the QXL graphics subsystem, you might want to file that as a separate bug. Based on the logs, everything seems to be running OK. Is dbus generally functional? E.g. can you run 'loginctl' or 'udisksctl status'? 'loginctl' and 'udisksctl status' both appear to run ok. https://bugzilla.redhat.com/show_bug.cgi?id=1465148 filed for the QXL issue I don't see any substantial info in the bug reports and stackoverflow questions you linked. Seems to be a bunch of unrelated issues.
Returning to your problem: it seems that somehow either the communication between systemctl and systemd fail, or systemd fails to provide a response. It would love to figure out what is going on here...
First question: from the logs, I see you are running with selinux in enforcing mode. If you switch to permissive ('setenforce 0'), does that change anything?
Second question: when you run the systemctl command, and it's waiting for the reply, can you start 'gdb -p 1' in another terminal, and paste the output of 'bt full'? This should help with figuring out if its systemd that's hanging.
> If you switch to permissive ('setenforce 0'), does that change anything?
I changed to "SELINUX=disabled" in /etc/syslinux/config
and rebooted (with "rhgb qyuet" present").
Disabling selinux made no difference.
One new observation is that the first ssh -l root, and systemctl, was successful. On the second attempt, a couple of minutes later, the ssh login was really slow, and the systemctl command hung.
Created attachment 1293513 [details]
output of: gdb -p 1
systemctl was running and not yet timed-out when this "gdb -p 1" output was captured.
Created attachment 1293514 [details]
gdb output of: bt full
Created attachment 1293515 [details]
gdb -p 1 ... bt full
You probably wanted those from a single gdb session.
*** Bug 1452808 has been marked as a duplicate of this bug. *** Thanks for the data.
> Jun 26 10:51:22 bld-fedora27 systemd[1]: plymouth-quit.service: Start operation timed out. Terminating.
The pattern here and in #1452808 seems to be the same:
- systemd starts plymouth-quit.service
- plymouth-quit.service fails because of a timeout
- systemd tries to write to the console
- writev() never returns – systemd hangs
The only difference is that you see a QXL backtrace in the logs, but it seems reasonable to assume that in either case the graphics subsystem is hosed.
We'll need to discuss the solution to this upstream. One option would be to set a timeout on the writev, but that's quite messy.
After discussing this with Lennart, it seems that isn't much that systemd can do. Once the kernel does an oops, any code which calls into the subsystem will also hang. Plymouth is hanging, and systemd will hang too once it calls writev() on the console, even if we used O_NONBLOCK. So the only solution is to fix the kernel code. *** This bug has been marked as a duplicate of bug 1465148 *** |