1494974 – Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.

Bug 1494974 - Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.

Summary: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the on...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2017-09-24 13:23 UTC by David Hill
Modified:	2018-04-08 14:48 UTC (History)
CC List:	24 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-04-08 14:48:49 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Linux Kernel	197861	0	None	None	None	2019-07-23 07:19:50 UTC

Description David Hill 2017-09-24 13:23:33 UTC

Description of problem:
Kernel 4.14 breaks KVM easily

[ 7496.552971] INFO: task qemu-system-x86:5978 blocked for more than 120 seconds.
[ 7496.552987]       Tainted: G          I     4.14.0-0.rc1.git3.1.fc28.x86_64 #1
[ 7496.552996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7496.553006] qemu-system-x86 D12240  5978      1 0x00000004
[ 7496.553024] Call Trace:
[ 7496.553044]  __schedule+0x2dc/0xbb0
[ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
[ 7496.553074]  schedule+0x3d/0x90
[ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
[ 7496.553100]  ? finish_wait+0x90/0x90
[ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
[ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
[ 7496.553166]  SyS_ioctl+0x79/0x90
[ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 7496.553190] RIP: 0033:0x7fa1ea0e1817
[ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007fa1ea0e1817
[ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI: 0000000000000021
[ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09: 000055e330245d92
[ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12: 000055e33351a000
[ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15: 0000000000000000
[ 7496.553284]
               Showing all locks held in the system:
[ 7496.553313] 1 lock held by khungtaskd/161:
[ 7496.553319]  #0:  (tasklist_lock){.+.+}, at: [<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0
[ 7496.553373] 1 lock held by in:imklog/1194:
[ 7496.553379]  #0:  (&f->f_pos_lock){+.+.}, at: [<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60
[ 7496.553541] 1 lock held by qemu-system-x86/5978:
[ 7496.553547]  #0:  (&dev->mutex#3){+.+.}, at: [<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net]



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 David Hill 2017-09-24 13:30:01 UTC

This happens when I shutdown a VM.

Comment 2 David Hill 2017-11-08 01:07:51 UTC

This appears to be only when the VM is being shutdown.   I'm trying to bisect this but since it takes me some hours to reproduce this or not, it's a slow process.

Comment 3 David Hill 2017-11-29 02:54:48 UTC

Should experimental_zcopytx be disabled by default in Fedora too?   Disabling it in that kernel serie solves this issue:

[root@zappa modprobe.d]# cat vhost-net.conf
options vhost_net  experimental_zcopytx=0

Comment 4 Sverd Johnsen 2017-12-01 19:01:43 UTC

Confirmed on vanilla 4.14.2. Hit this for the first time after upgrading to 4.14, same setup did not have problems earlier.

Very simple macvtap like this:

ip link add link enp0s31f6 name "mvfoobar" type macvtap mode bridge
(moved into netns)
-device virtio-net-pci,netdev=net0,mac=foobar,disable-modern=off,disable-legacy=on -netdev tap,vhost=on,id=net0,fd=3 

On the host I have htb + fq_codel and some simple (u32) filter rules.

Guest is Windows 10 with VFIO based gpu passthrough.

 INFO: task qemu-system-x86:3895 blocked for more than 60 seconds.
       Not tainted 4.14.2-2-fb #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 qemu-system-x86 D    0  3895   3888 0x00000080
 Call Trace:
  ? __schedule+0x186/0x5b0
  schedule+0x2a/0x80
  vhost_net_ubuf_put_and_wait+0x50/0x80 [vhost_net]
  ? wait_woken+0x80/0x80
  vhost_net_ioctl+0x42e/0x8b0 [vhost_net]
  ? do_futex+0x451/0xa90
  ? cpuacct_charge+0x2c/0x90
  do_vfs_ioctl+0x9c/0x610
  ? __audit_syscall_entry+0xba/0x100
  ? syscall_trace_enter+0x1e7/0x260
  SyS_ioctl+0x6f/0x80
  do_syscall_64+0x49/0x100
  entry_SYSCALL64_slow_path+0x25/0x25
 RIP: 0033:0x7f8e9dbff337
 RSP: 002b:00007f8e9047ae88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007f8e92b9d380 RCX: 00007f8e9dbff337
 RDX: 00007f8e9047ae90 RSI: 000000004008af30 RDI: 000000000000000f
 RBP: 00007f8e9047ae90 R08: 0000561d7cf06f70 R09: 00000000000000ff
 R10: 0000000000000014 R11: 0000000000000246 R12: 00007f8c0aea6520
 R13: 0000000000000001 R14: 00007f8c0aea64a8 R15: 00007f8e92c1bac0

 INFO: task qemu-system-x86:3895 blocked for more than 60 seconds.
       Not tainted 4.14.2-2-fb #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 qemu-system-x86 D    0  3895   3888 0x00000084
 Call Trace:
  ? __schedule+0x186/0x5b0
  schedule+0x2a/0x80
  vhost_net_ubuf_put_and_wait+0x50/0x80 [vhost_net]
  ? wait_woken+0x80/0x80
  vhost_net_ioctl+0x42e/0x8b0 [vhost_net]
  ? do_futex+0x451/0xa90
  ? cpuacct_charge+0x2c/0x90
  do_vfs_ioctl+0x9c/0x610
  ? __audit_syscall_entry+0xba/0x100
  ? syscall_trace_enter+0x1e7/0x260
  SyS_ioctl+0x6f/0x80
  do_syscall_64+0x49/0x100
  entry_SYSCALL64_slow_path+0x25/0x25
 RIP: 0033:0x7f8e9dbff337
 RSP: 002b:00007f8e9047ae88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007f8e92b9d380 RCX: 00007f8e9dbff337
 RDX: 00007f8e9047ae90 RSI: 000000004008af30 RDI: 000000000000000f
 RBP: 00007f8e9047ae90 R08: 0000561d7cf06f70 R09: 00000000000000ff
 R10: 0000000000000014 R11: 0000000000000246 R12: 00007f8c0aea6520
 R13: 0000000000000001 R14: 00007f8c0aea64a8 R15: 00007f8e92c1bac0

 INFO: task qemu-system-x86:3895 blocked for more than 60 seconds.
       Not tainted 4.14.2-2-fb #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 qemu-system-x86 D    0  3895      1 0x00000084
 Call Trace:
  ? __schedule+0x186/0x5b0
  schedule+0x2a/0x80
  vhost_net_ubuf_put_and_wait+0x50/0x80 [vhost_net]
  ? wait_woken+0x80/0x80
  vhost_net_ioctl+0x42e/0x8b0 [vhost_net]
  ? do_futex+0x451/0xa90
  ? cpuacct_charge+0x2c/0x90
  do_vfs_ioctl+0x9c/0x610
  ? __audit_syscall_entry+0xba/0x100
  ? syscall_trace_enter+0x1e7/0x260
  SyS_ioctl+0x6f/0x80
  do_syscall_64+0x49/0x100
  entry_SYSCALL64_slow_path+0x25/0x25
 RIP: 0033:0x7f8e9dbff337
 RSP: 002b:00007f8e9047ae88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007f8e92b9d380 RCX: 00007f8e9dbff337
 RDX: 00007f8e9047ae90 RSI: 000000004008af30 RDI: 000000000000000f
 RBP: 00007f8e9047ae90 R08: 0000561d7cf06f70 R09: 00000000000000ff
 R10: 0000000000000014 R11: 0000000000000246 R12: 00007f8c0aea6520
 R13: 0000000000000001 R14: 00007f8c0aea64a8 R15: 00007f8e92c1bac0

 INFO: task qemu-system-x86:3895 blocked for more than 60 seconds.
       Not tainted 4.14.2-2-fb #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 qemu-system-x86 D    0  3895      1 0x00000084
 Call Trace:
  ? __schedule+0x186/0x5b0
  schedule+0x2a/0x80
  vhost_net_ubuf_put_and_wait+0x50/0x80 [vhost_net]
  ? wait_woken+0x80/0x80
  vhost_net_ioctl+0x42e/0x8b0 [vhost_net]
  ? do_futex+0x451/0xa90
  ? cpuacct_charge+0x2c/0x90
  do_vfs_ioctl+0x9c/0x610
  ? __audit_syscall_entry+0xba/0x100
  ? syscall_trace_enter+0x1e7/0x260
  SyS_ioctl+0x6f/0x80
  do_syscall_64+0x49/0x100
  entry_SYSCALL64_slow_path+0x25/0x25
 RIP: 0033:0x7f8e9dbff337
 RSP: 002b:00007f8e9047ae88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007f8e92b9d380 RCX: 00007f8e9dbff337
 RDX: 00007f8e9047ae90 RSI: 000000004008af30 RDI: 000000000000000f
 RBP: 00007f8e9047ae90 R08: 0000561d7cf06f70 R09: 00000000000000ff
 R10: 0000000000000014 R11: 0000000000000246 R12: 00007f8c0aea6520
 R13: 0000000000000001 R14: 00007f8c0aea64a8 R15: 00007f8e92c1bac0

Disabling experimental_zcopytx seems to solve it.

Comment 5 Laura Abbott 2017-12-01 19:11:53 UTC

Fedora goes with the defaults that upstream provides and it looks like that's the default. Can you test on the latest rawhide kernel and see if this is still a problem? There are a couple of changes that came in which might be related.

Comment 6 David Hill 2017-12-04 20:34:01 UTC

We can reproduce this issue with v4.15.0-rc* also.

Comment 7 David Hill 2017-12-12 04:18:39 UTC

Looks like this series of commit breaks it:

3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
commit 3ece782693c4b64d588dd217868558ab9a19bfe7
Author: Willem de Bruijn <willemb>
Date:   Thu Aug 3 16:29:38 2017 -0400

    sock: skb_copy_ubufs support for compound pages
    
    Refine skb_copy_ubufs to support compound pages. With upcoming TCP
    zerocopy sendmsg, such fragments may appear.
    
    The existing code replaces each page one for one. Splitting each
    compound page into an independent number of regular pages can result
    in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.
    
    Instead, fill all destination pages but the last to PAGE_SIZE.
    Split the existing alloc + copy loop into separate stages:
    1. compute bytelength and minimum number of pages to store this.
    2. allocate
    3. copy, filling each page except the last to PAGE_SIZE bytes
    4. update skb frag array
    
    Signed-off-by: Willem de Bruijn <willemb>
    Signed-off-by: David S. Miller <davem>

:040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M	include
:040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb 4fc8384362693e4619fab39b0a945f6f2349226b M	net

Comment 8 Laura Abbott 2017-12-13 00:59:56 UTC

Thanks for the bisect. I see discussion is going on at the kernel.org bugzilla so if there is a final patch, please let me know so it can be applied.

Comment 9 David Hill 2017-12-13 03:51:55 UTC

I'm reverting this patch and will try reproducing the issue we're all hitting.   I'll keep you posted of my success and/or failures.

Comment 10 Laura Abbott 2018-04-06 18:52:35 UTC

Did this get resolved?

Comment 11 David Hill 2018-04-08 14:48:49 UTC

Yes it's fixed by upstream .     It was caused by vhost_net  experimental_zcopytx feature.

Note You need to log in before you can comment on or make changes to this bug.

airlied
ajax
bskeggs
dhill
eparis
esandeen
hdegoede
ichavero
itamar
jarodwilson
jforbes
jglisse
jonathan
josef
jwboyer
kernel-maint
labbott
linville
mchehab
mjg59
nhorman
quintela
steved
sverd.johnsen