Bug 637505

Summary: Windows guest crashes when hot-unplugging a virtio NIC using virt-manager
Product: Red Hat Enterprise Linux 6
Component: qemu-kvm
Version: 6.0
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: low
Target Milestone: rc
Target Release: ---
Reporter: Keqin Hong <khong>
Assignee: Virtualization Maintenance <virt-maint>
QA Contact: Virtualization Bugs <virt-bugs>
CC: alex.williamson, lihuang, michen, mkenneth, virt-maint
Doc Type: Bug Fix
Last Closed: 2010-09-27 02:55:17 UTC

Description Keqin Hong 2010-09-26 09:03:16 UTC
Description of problem:
When a Windows guest is booted with two virtual NICs, using virt-manager to hot-unplug a virtio NIC causes the guest to crash.
The problem occurs on Windows XP, 2003, 2008, etc. with a virtio NIC and does NOT occur with an rtl8139 NIC.
The problem does not occur on RHEL guests.

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.113.el6_0.1.x86_64
libvirt-0.8.1-27.el6.x86_64
virtio-win-1.1.12-0.3.el6.noarch.rpm 

How reproducible:
100%

Steps to Reproduce:
1. Boot a Win2008-R2 guest with two virtio NICs
2. Wait until the guest is up, then click the 'remove' button in virt-manager to hot-unplug a NIC

  
Actual results:
Guest crashed (the qemu-kvm process segfaults in tap_set_offload; see the backtrace under Additional info)

Expected results:
No crash; the NIC is hot-unplugged cleanly

Additional info:
(gdb) bt
#0  tap_set_offload (nc=0x0, csum=0, tso4=0, tso6=0, ecn=0, ufo=0) at net/tap.c:252
#1  0x00000000004205a3 in virtio_net_set_features (vdev=0x50d9010, features=0)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-net.c:220
#2  0x000000000042102e in virtio_ioport_write (opaque=<value optimized out>, 
    addr=<value optimized out>, val=0)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-pci.c:204
#3  0x000000000042ab48 in kvm_handle_io (env=0x2824d90)
    at /usr/src/debug/qemu-kvm-0.12.1.2/kvm-all.c:541
#4  kvm_run (env=0x2824d90) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:975
#5  0x000000000042ac09 in kvm_cpu_exec (env=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1658
#6  0x000000000042b82f in kvm_main_loop_cpu (_env=0x2824d90)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1900
#7  ap_main_loop (_env=0x2824d90) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1950
#8  0x00007f59d28067e1 in ?? ()
#9  0x00007f59cb5fe710 in ?? ()
#10 0x0000000000000000 in ?? ()

Comment 2 Keqin Hong 2010-09-26 09:15:01 UTC
The libvirt log shows libvirt sending two commands, device_del followed by netdev_del, about 1 ms apart (note the marked timestamps). This short interval triggers the crash.

11:05:20.867: debug : qemuMonitorJSONCommandWithFd:217 : Send command '{"execute":"device_del","arguments":{"id":"net1"}}' for write with FD -1
         ^^^
11:05:20.868: debug : qemuMonitorJSONIOProcessLine:115 : Line [{"return": {}}]
11:05:20.868: debug : qemuMonitorJSONIOProcess:188 : Total used 16 bytes out of 16 available in buffer
11:05:20.868: debug : qemuMonitorJSONCommandWithFd:222 : Receive command reply ret=0 errno=0 14 bytes '{"return": {}}'
11:05:20.868: debug : qemuMonitorJSONCommandWithFd:217 : Send command '{"execute":"netdev_del","arguments":{"id":"hostnet1"}}' for write with FD -1
         ^^^
11:05:20.923: debug : qemuMonitorJSONIOProcessLine:115 : Line [{"return": {}}]
11:05:20.923: debug : qemuMonitorJSONIOProcess:188 : Total used 16 bytes out of 16 available in buffer
11:05:20.923: debug : qemuMonitorJSONCommandWithFd:222 : Receive command reply ret=0 errno=0 14 bytes '{"return": {}}'
11:05:21.014: debug : qemudGetProcessInfo:4602 : Got status for 4525/0 user=4254 sys=3282 cpu=14
11:05:21.014: debug : qemuMonitorJSONCommandWithFd:217 : Send command '{"execute":"query-balloon"}' for write with FD -1
11:06:57.348: debug : qemuHandleMonitorEOF:1117 : Received EOF on 0x239a940 'win2008-r2'
11:06:57.348: debug : qemudShutdownVMDaemon:4256 : Shutting down VM 'win2008-r2' migrated=0
11:06:57.350: debug : qemuMonitorJSONCommandWithFd:222 : Receive command reply ret=-1 errno=104 0 bytes '(null)'
11:06:57.350: error : qemuMonitorJSONCommandWithFd:242 : cannot send monitor command '{"execute":"query-balloon"}': Connection reset by peer

Comment 3 Keqin Hong 2010-09-26 09:20:41 UTC
Based on Comment 2, the problem can be reproduced from the command line:

# cat cmdfile
{"execute":"qmp_capabilities"}
{"execute":"device_del","arguments":{"id":"net1"}}
{"execute":"netdev_del","arguments":{"id":"hostnet1"}}

Steps:
1. # /usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 8192 -smp 8,sockets=8,cores=1,threads=1 -name win2008-r2 -uuid 74091ced-659f-0706-9d05-dca6c44682a5 -nodefconfig -nodefaults -chardev socket,id=monitor1,path=/var/lib/libvirt/qemu/win2008-r2.monitor,server,nowait -mon chardev=monitor1,mode=control -monitor stdio -rtc base=localtime -boot c -drive file=/var/lib/libvirt/images/160nfs/win2008R2.img,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:0c:f7:ac,bus=pci.0,addr=0x3 -netdev tap,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:75:70:b9,bus=pci.0,addr=0x7 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc :0 -vga std -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

2. # nc -U /var/lib/libvirt/qemu/win2008-r2.monitor < cmdfile

Comment 4 Keqin Hong 2010-09-26 09:24:39 UTC
Additional debuginfo:
(/usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-net.c:220)

219	    if (n->has_vnet_hdr) {
220	        tap_set_offload(n->nic->nc.peer,
221	                        (features >> VIRTIO_NET_F_GUEST_CSUM) & 1,
222	                        (features >> VIRTIO_NET_F_GUEST_TSO4) & 1,
223	                        (features >> VIRTIO_NET_F_GUEST_TSO6) & 1,
224	                        (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
(gdb) p n->nic 
$18 = (NICState *) 0x447cb80
(gdb) p n->nic->nc.peer
$19 = (VLANClientState *) 0x0
(gdb) s
tap_set_offload (nc=0x0, csum=0, tso4=0, tso6=0, ecn=0, ufo=0) at net/tap.c:252
252	    return tap_fd_set_offload(s->fd, csum, tso4, tso6, ecn, ufo);
(gdb) s

Program received signal SIGSEGV, Segmentation fault.
tap_set_offload (nc=0x0, csum=0, tso4=0, tso6=0, ecn=0, ufo=0) at net/tap.c:252
252	    return tap_fd_set_offload(s->fd, csum, tso4, tso6, ecn, ufo);
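
The gdb session above shows n->nic->nc.peer already NULL when the guest writes its feature bits, presumably because netdev_del removed the tap backend while the virtio-net device was still visible to the guest. The following is a minimal, self-contained sketch of a defensive guard around that call. It is an illustration only and not necessarily the fix adopted for bug 634661. The struct layouts are simplified stand-ins for the real QEMU types, and tap_set_offload_stub, set_features_guarded, GUEST_CSUM_BIT, offload_calls and last_csum are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the QEMU types seen in the gdb session.
 * Illustrative only; these are not the real definitions. */
typedef struct VLANClientState VLANClientState;
struct VLANClientState {
    VLANClientState *peer; /* NULL once the backend (netdev) is gone */
    int fd;                /* hypothetical stand-in for the tap fd */
};

typedef struct NICState {
    VLANClientState nc;
} NICState;

typedef struct VirtIONet {
    NICState *nic;
    int has_vnet_hdr;
} VirtIONet;

/* Bit number of VIRTIO_NET_F_GUEST_CSUM in the virtio-net feature word. */
#define GUEST_CSUM_BIT 1

/* Bookkeeping so the effect of the stub can be observed. */
static int offload_calls = 0;
static int last_csum = -1;

/* Hypothetical stub standing in for tap_set_offload(). Like the real
 * function at net/tap.c:252, it dereferences nc, so nc must not be NULL. */
static void tap_set_offload_stub(VLANClientState *nc, int csum)
{
    assert(nc != NULL);
    (void)nc->fd;
    last_csum = csum;
    offload_calls++;
}

/* Guarded variant of the call at virtio-net.c:220: skip the offload
 * update when the NIC no longer has a peer. Returns 1 if the backend
 * was updated, 0 if the update was skipped. */
static int set_features_guarded(VirtIONet *n, unsigned int features)
{
    assert(n != NULL && n->nic != NULL);
    if (n->has_vnet_hdr && n->nic->nc.peer != NULL) {
        tap_set_offload_stub(n->nic->nc.peer,
                             (features >> GUEST_CSUM_BIT) & 1);
        return 1;
    }
    return 0;
}
```

With a live peer, set_features_guarded forwards the checksum-offload bit to the stub. With peer set to NULL, as in the crash above, it returns 0 without touching the backend. A guard of this kind only avoids the NULL dereference; it does not address the ordering of device_del and netdev_del described in Comment 2.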

Comment 5 lihuang 2010-09-27 02:55:17 UTC

*** This bug has been marked as a duplicate of bug 623735 ***

Comment 6 Alex Williamson 2010-09-27 03:13:08 UTC
bz623735 has additional issues due to vhost.  I think this is more appropriately a duplicate of bz634661.

*** This bug has been marked as a duplicate of bug 634661 ***