Bug 1005150

Summary: qemu dies when replugging virtserialport too quickly
Product: [Fedora] Fedora
Reporter: Lukáš Doktor <ldoktor>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 19
CC: amit.shah, berrange, cfergeau, crobinso, dwmw2, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, marcelo.barbosa, pbonzini, rjones, scottt.tw, skottler, virt-maint
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-09-20 11:05:37 UTC
Type: Bug
Bug Blocks: 1005677
Attachments:
    Guest script which repeatedly writes some data into vs1 serial port. (flags: none)
    Debug log from the modified autotest test run with kernel oops (flags: none)

Description Lukáš Doktor 2013-09-06 10:07:52 UTC
Created attachment 794638 [details]
Guest script which repeatedly writes some data into vs1 serial port.

Description of problem:
While testing the fix for the port hot-plugging issue in https://bugzilla.redhat.com/show_bug.cgi?id=796048, I discovered another problem.

I used the virtio_console.spread_2.specifiable.virtserialport.virtio_console_smoke.open autotest test. I removed all sleeps from the port re-connection and fixed the test so that it only opens an existing file (I will send the fix upstream later).

On the first run the guest kernel crashed (see the attachment with the test log output).

Then I tried to reproduce this with a simple reproducer that does the same thing. Apparently the reproducer is quicker, so qemu exits with return code 1 without any message.

Version-Release number of selected component (if applicable):
Host (F19):
qemu-kvm-1.4.2-7.fc19.x86_64
Guest (F19):
kernel-3.10.10-200.fc19.x86_64

How reproducible:
Always (after a few seconds)

Steps to Reproduce (simple reproducer):
1. Execute qemu with one virtserialport:
    -device virtio-serial-pci,id=virtio_serial_pci2
    -chardev socket,id=devvs1,path=/tmp/virtio_port-vs1-20130906-103138-MfrEYy0Ca,server,nowait
    -device virtserialport,chardev=devvs1,name=vs1,id=vs1,bus=virtio_serial_pci2.0
2. In the guest, execute simple_reproducer.py.
3. On the host, read from vs1:
    while :; do sudo socat /tmp/virtio_port-vs1-20130906-103138-MfrEYy0Ca -; done
4. On the host, start re-plugging the port:
    while :; do echo -e "device_del vs1\ndevice_add virtserialport,id=vs1,chardev=devvs1,name=vs1" | sudo socat /tmp/monitor-hmp1-20130906-103138-MfrEYy0Ca -; sleep 0.01; done
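
The attached simple_reproducer.py is not reproduced in this report. As a rough idea of what step 2 does, here is a minimal sketch of a guest-side writer. It assumes the guest's udev rules expose the named port as /dev/virtio-ports/vs1; the payload and the helper names write_once and write_forever are illustrative, not taken from the attachment.

```python
import os
import time

# Assumption: the named port shows up in the guest under this path.
PORT_PATH = "/dev/virtio-ports/vs1"


def write_once(path, payload):
    """Open the port, write payload once, and close it again.

    Returns the number of bytes written, or None when the port cannot be
    opened or written (for example because it is currently unplugged).
    """
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError:
        return None
    try:
        return os.write(fd, payload)
    except OSError:
        return None
    finally:
        os.close(fd)


def write_forever(path=PORT_PATH, payload=b"hello\n"):
    """Keep writing to the port, backing off briefly while it is absent."""
    while True:
        if write_once(path, payload) is None:
            time.sleep(0.001)
```

Calling write_forever() in the guest keeps the port busy while the host unplugs and re-adds it, which is the condition this report is about.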

Actual results:
qemu exits (ret=1) without any message, and nothing is logged on the serial console. With a sleep longer than 0.1 s between replugs I have not seen the problem; without any sleep the failure is immediate.
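
To compare different delays between replugs more easily, the host-side loop from step 4 could also be written in Python. This is only a sketch: it sends the same two HMP commands over the monitor's UNIX socket, but it closes the connection right after sending, so its timing is not identical to the socat loop. The function names and the iterations parameter are illustrative.

```python
import socket
import time


def replug_commands(port_id="vs1", chardev="devvs1", name="vs1"):
    """Build the HMP command pair used in step 4 (unplug, then re-add)."""
    return (
        "device_del {0}\n"
        "device_add virtserialport,id={0},chardev={1},name={2}\n"
    ).format(port_id, chardev, name)


def replug_loop(monitor_path, delay, iterations):
    """Send the replug command pair `iterations` times, `delay` seconds apart.

    monitor_path is the path of the HMP monitor UNIX socket on the host.
    """
    payload = replug_commands().encode()
    for _ in range(iterations):
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as mon:
            mon.connect(monitor_path)
            mon.sendall(payload)
        time.sleep(delay)
```

For example, replug_loop("/tmp/monitor-hmp1-20130906-103138-MfrEYy0Ca", 0.01, 1000) corresponds to the 0.01 s delay used above, and raising the delay above 0.1 corresponds to the case where the problem was not seen.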

Expected results:
Some data should be received on the host (or at least the guest should survive).

Comment 1 Lukáš Doktor 2013-09-06 10:13:46 UTC
Created attachment 794639 [details]
Debug log from the modified autotest test run with kernel oops

Comment 2 Amit Shah 2013-09-06 10:57:36 UTC
From the kernel log:

general protection fault: 0000 [#1] SMP 
[  133.095018] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables crc32_pclmul crc32c_intel ghash_clmulni_intel microcode i2c_piix4 i2c_core virtio_net virtio_blk
[  133.095018] CPU: 1 PID: 866 Comm: python Not tainted 3.10.10-200.fc19.x86_64 #1
[  133.095018] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  133.095018] task: ffff88003c6ea620 ti: ffff880036eee000 task.ti: ffff880036eee000
[  133.095018] RIP: 0010:[<ffffffff8117fb95>]  [<ffffffff8117fb95>] __kmalloc+0x95/0x270
[  133.095018] RSP: 0018:ffff880036eefe38  EFLAGS: 00010246
[  133.095018] RAX: 0000000000000000 RBX: 0000000000000007 RCX: ffff880036eeffd8
[  133.095018] RDX: 000000000001bc3a RSI: 0000000000000000 RDI: 0000000000000008
[  133.095018] RBP: ffff880036eefe78 R08: 0000000000016e80 R09: ffffffff813c77be
[  133.095018] R10: ffff88003e001a00 R11: 0000000000000293 R12: 00000000000000d0
[  133.095018] R13: ff48464d53514d4f R14: 0000000000000048 R15: ffff88003e001a00
[  133.095018] FS:  00007ff23bfff700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[  133.095018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  133.095018] CR2: 00007f34d0edc000 CR3: 000000003d674000 CR4: 00000000000407e0
[  133.095018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  133.095018] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  133.095018] Stack:
[  133.095018]  ffffffff813c7728 0000000000000000 ffffffff813c77be 0000000000000007
[  133.095018]  0000000000000000 0000000000000007 0000000000000007 00000000020bb880
[  133.095018]  ffff880036eefea0 ffffffff813c77be 0000000000000007 0000000002113f84
[  133.095018] Call Trace:
[  133.095018]  [<ffffffff813c7728>] ? wait_port_writable+0x1d8/0x240
[  133.095018]  [<ffffffff813c77be>] ? alloc_buf.isra.23+0x2e/0xa0
[  133.095018]  [<ffffffff813c77be>] alloc_buf.isra.23+0x2e/0xa0
[  133.095018]  [<ffffffff813c853d>] port_fops_write+0x6d/0x100
[  133.095018]  [<ffffffff81285881>] ? security_file_permission+0x21/0xa0
[  133.095018]  [<ffffffff81197d6d>] vfs_write+0xbd/0x1e0
[  133.095018]  [<ffffffff81198739>] SyS_write+0x49/0xa0
[  133.095018]  [<ffffffff810d94a6>] ? __audit_syscall_exit+0x1f6/0x2a0
[  133.095018]  [<ffffffff81647c59>] system_call_fastpath+0x16/0x1b
[  133.095018] Code: db 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 30 01 00 00 48 85 c0 0f 84 27 01 00 00 49 63 42 20 4d 8b 02 41 f6 c0 0f <49> 8b 5c 05 00 0f 85 7e 01 00 00 48 8d 4a 01 4c 89 e8 65 49 0f 
[  133.095018] RIP  [<ffffffff8117fb95>] __kmalloc+0x95/0x270
[  133.095018]  RSP <ffff880036eefe38>
[  133.201981] ---[ end trace 8e93be13fa06da4a ]---
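
One observation on the register dump, offered as a reading aid rather than a diagnosis: the marked instruction bytes <49> 8b 5c 05 00 decode to mov rbx, [r13+rax], RAX is 0, and R13 (ff48464d53514d4f) is not a canonical x86-64 address, which is consistent with a general protection fault in __kmalloc. The low seven bytes of R13 are also printable ASCII when read in little-endian order, which may hint that allocator data was overwritten with payload bytes, although the log alone does not prove that. The sketch below only checks these two properties; the helper names are illustrative and 48-bit virtual addressing is assumed.

```python
R13 = 0xFF48464D53514D4F  # value of R13 from the oops above


def is_canonical(addr):
    """True if addr is canonical for 48-bit virtual addresses.

    Bits 63..47 must be all zeros or all ones.
    """
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def as_little_endian_bytes(value):
    """Return the 8 bytes of a 64-bit value in memory (little-endian) order."""
    return value.to_bytes(8, "little")


print(is_canonical(R13))            # False
print(as_little_endian_bytes(R13))  # b'OMQSMFH\xff'
```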

Comment 3 Amit Shah 2013-09-06 11:12:14 UTC
There are two separate bugs. Let's track the kernel bug here; please file or clone a separate bug for qemu quitting. For qemu, please try running it under gdb to see whether anything interesting can be obtained.

Comment 4 Josh Boyer 2013-09-18 20:32:39 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 5 Lukáš Doktor 2013-09-20 10:21:51 UTC
Hi Josh, I'm unable to reproduce the GP fault in the guest. I only see the second problem, qemu exiting with ret=1 without any message, which is reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1005677 so I'll update only that bug.

Comment 6 Amit Shah 2013-09-20 11:05:37 UTC
Thanks!

There were lots of fixes made in the 3.11 release for port unplug, and those seem to have fixed this issue.

Comment 7 Lukáš Doktor 2013-09-22 07:04:55 UTC
Well, the issue is not fixed: qemu still dies, but I'm unable to reproduce the kernel message. Anyway, as there is another bug that covers the qemu exit during re-plugging, this bugzilla can be closed.