Bug 962413

Summary: qemu-kvm exits after system_reset : Guest moved used index from 648 to 8224
Product: Red Hat Enterprise Linux 6
Reporter: Miya Chen <michen>
Component: qemu-kvm
Assignee: Fam Zheng <famz>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: low
Priority: low
Version: 6.5
CC: acathrow, akong, amit.shah, armbru, bsarathy, chayang, jasowang, juzhang, lersek, mkenneth, pbonzini, qzhang, sluo, virt-maint, xutian, xwei
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-07-31 01:31:30 UTC
Type: Bug

Description Miya Chen 2013-05-13 12:02:09 UTC
Description of problem:
qemu-kvm exits after issuing system_reset in monitor: Guest moved used index from 648 to 8224

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-0.12.1.2-2.369.el6.x86_64
2.6.32-376.el6.x86_64
seabios-0.6.1.2-27.el6.x86_64


How reproducible:
only once


Steps to Reproduce:
1. Start guest with:
# /usr/libexec/qemu-kvm -cpu Opteron_G3 -M rhel6.5.0 -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name rhel6.5-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93001 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -drive file=rhel65-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:12,bus=pci.0,addr=0x4,bootindex=0 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -vnc :1 -boot menu=on

2.
(qemu) info status
VM status: running
(qemu) syst
system_wakeup     system_reset      system_powerdown  
(qemu) system_reset 
(qemu) Guest moved used index from 648 to 8224

  
Actual results:
qemu-kvm exits.

Expected results:


Additional info:
No OS is installed in the image and no VGA is specified.

Comment 2 Qunfang Zhang 2013-05-14 00:58:25 UTC
Can be reproduced with qemu-kvm-366 with "-M rhel6.4.0" machine type as well.

Comment 3 Amit Shah 2013-05-14 10:54:20 UTC
system_reset doesn't co-operate with the guest, so this is expected.  I doubt we even support this.  Markus can help answer that question.

Comment 4 Markus Armbruster 2013-06-19 13:15:32 UTC
Comment#3 was a bit too terse for me to understand, so I asked Amit.
Here's what I learned; I'm sure Amit will correct misunderstandings.

system_reset is roughly equivalent to a physical reset button.

The "Guest moved used index" message comes from a virtio device's
virtqueue_num_heads().  It means the guest changed the ring buffer it
shares with QEMU in an unexpected way.

The bug is that QEMU fails to anticipate the effect of reset, namely
the guest initializing the virtio device's ring buffer.  I figure the
bug makes system_reset effectively useless for guests with virtio
devices.

system_reset being unsupported can't make that not a bug, it can only
make it a bug not worth fixing in RHEL-6.

I can't say off-hand whether system_reset is supported.  See also bug
739597.
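
As a rough illustration of the check described above, here is a minimal, self-contained sketch. It is not the qemu-kvm source: the function name ring_num_heads, its parameters, and the return-code convention are assumptions made for the example. The only behavior taken from this report is that the check prints the "Guest moved used index" message and that qemu-kvm then exits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Simplified stand-in for the sanity check discussed in this comment
 * (hypothetical helper, not the QEMU function).
 *
 * avail_idx      - index the guest last published in the shared ring
 * last_avail_idx - index up to which the host has consumed buffers
 *
 * Both are free-running 16-bit counters, so the subtraction wraps.
 */
static int ring_num_heads(uint16_t avail_idx, uint16_t last_avail_idx,
                          uint16_t ring_size, uint16_t *num_heads)
{
    assert(ring_size != 0);
    assert(num_heads != NULL);

    uint16_t pending = (uint16_t)(avail_idx - last_avail_idx);

    /* The guest can never have more buffers pending than the ring holds. */
    if (pending > ring_size) {
        fprintf(stderr, "Guest moved used index from %u to %u\n",
                (unsigned)last_avail_idx, (unsigned)avail_idx);
        return -1; /* per the report, qemu-kvm exits at this point */
    }

    *num_heads = pending;
    return 0;
}
```

With the values from this report and an assumed ring of 256 entries, ring_num_heads(8224, 648, 256, &n) fails because 8224 - 648 = 7576 is larger than the ring, while ring_num_heads(650, 648, 256, &n) succeeds with n == 2.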

Comment 5 Laszlo Ersek 2013-07-15 17:00:44 UTC
See the following sections in the virtio spec (0.9.5):
- 2.2.1 Device Initialization Sequence
- 2.2.2 Virtio Header
- 2.2.2.1 Device Status

If the guest doesn't reset the virtio device during boot, as the first step of device initialization, then this is arguably a guest bug. If the guest does write 0 to the Device Status port, and qemu can still exit with the above error, then the bug is in qemu.

Whether the "system_reset" HMP command is supported or not should not matter; the same qemu_system_reset_request() function can be called in response to guest actions too (e.g. keyboard controller reset).

... FWIW one could argue that system_reset should power-cycle virtio devices:

http://thread.gmane.org/gmane.comp.emulators.qemu/198545/focus=198604
http://thread.gmane.org/gmane.comp.emulators.qemu/198545/focus=198649

"all hardware except RAM in self-refresh and registers explicitly hooked to battery sleep power rail has reset state on resume from S3 [...] Power on and resume are identical except for memory controller init and what should happen once init is finished"

This could be fixed by registering reset hooks (with qemu_register_reset()) in virtio devices. What should happen in the hook is visible in virtio_ioport_write() / VIRTIO_PCI_STATUS, val==0:
- virtio_pci_stop_ioeventfd()
- virtio_reset()
- msix_unuse_all_vectors()
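
The idea above can be sketched as a small standalone model. Everything here is hypothetical and only mirrors the shape of the proposal: the model_* names, the struct fields, and the fixed-size handler table are assumptions for illustration, and model_register_reset() is a stand-in for the concept of qemu_register_reset(), not the QEMU function itself. The sketch shows one reset routine shared by both paths: a machine-wide reset, and a guest write of 0 to the Device Status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a virtio PCI device; field names are illustrative. */
struct model_virtio_dev {
    uint8_t  status;              /* Device Status register */
    uint16_t last_avail_idx;      /* host-side position in the ring */
    bool     ioeventfd_started;
    unsigned msix_vectors_in_use;
};

typedef void (*reset_fn)(void *opaque);

#define MAX_RESET_HANDLERS 8

static struct { reset_fn fn; void *opaque; } reset_handlers[MAX_RESET_HANDLERS];
static size_t n_reset_handlers;

/* Stand-in for registering a reset hook; returns -1 if the table is full. */
static int model_register_reset(reset_fn fn, void *opaque)
{
    if (n_reset_handlers == MAX_RESET_HANDLERS) {
        return -1;
    }
    reset_handlers[n_reset_handlers].fn = fn;
    reset_handlers[n_reset_handlers].opaque = opaque;
    n_reset_handlers++;
    return 0;
}

/* What a machine-wide reset does in this model: run every registered hook. */
static void model_system_reset(void)
{
    for (size_t i = 0; i < n_reset_handlers; i++) {
        reset_handlers[i].fn(reset_handlers[i].opaque);
    }
}

/* The three steps listed above, in the same order. */
static void model_virtio_pci_reset(void *opaque)
{
    struct model_virtio_dev *dev = opaque;

    assert(dev != NULL);
    dev->ioeventfd_started = false;  /* models virtio_pci_stop_ioeventfd() */
    dev->status = 0;                 /* models virtio_reset() ... */
    dev->last_avail_idx = 0;         /* ... including the ring position */
    dev->msix_vectors_in_use = 0;    /* models msix_unuse_all_vectors() */
}

/* Guest write to the Device Status register; writing 0 requests a reset. */
static void model_status_write(struct model_virtio_dev *dev, uint8_t val)
{
    if (val == 0) {
        model_virtio_pci_reset(dev);
    } else {
        dev->status = val;
    }
}
```

In this model, calling model_register_reset(model_virtio_pci_reset, &dev) once at device creation means a later model_system_reset() leaves the device in the same state as a guest-initiated model_status_write(&dev, 0).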

Comment 6 Paolo Bonzini 2013-08-29 14:25:20 UTC
Reset should happen, see this commit in hw/virtio-pci.c

commit e489030df2448d22b3cb92fd5dcb22c6fa0fc9e1
Author: Michael S. Tsirkin <mst>
Date:   Wed Sep 16 13:40:37 2009 +0300

    qemu/virtio: fix reset with device removal
    
    virtio pci registers its own reset handler, but fails to unregister it,
    which will lead to crashes after device removal.  Solve this problem by
    switching to qdev reset handler, which is automatically unregistered.
    
    Signed-off-by: Michael S. Tsirkin <mst>
    Signed-off-by: Anthony Liguori <aliguori.com>

@@ -534,6 +532,7 @@ static PCIDeviceInfo virtio_info[] = {
             DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
             DEFINE_PROP_END_OF_LIST(),
         },
+        .qdev.reset = virtio_pci_reset,
     },{
         .qdev.name  = "virtio-net-pci",
         .qdev.size  = sizeof(VirtIOPCIProxy),

etc.

It is missing for virtio-scsi-pci, which is probably a separate bug due to bad conflict resolution.  But this bug uses virtio-blk-pci.

Miya, can you reproduce this under gdb, with a breakpoint on "exit", and get a backtrace?

Comment 8 Sibiao Luo 2013-08-30 07:25:12 UTC
(In reply to Paolo Bonzini from comment #6)
> Miya, can you reproduce this under gdb, with a breakpoint on "exit", and get
> a backtrace?
Yes, I tried it many times and hit this issue on my AMD host. Note: no OS is installed in the image, no VGA is specified, and '-nodefconfig -nodefaults' is on the command line.

host info:
# uname -r && rpm -q qemu-kvm
2.6.32-414.el6.x86_64
qemu-kvm-0.12.1.2-2.398.el6.x86_64

# gdb /usr/libexec/qemu-kvm
(gdb) b exit
Breakpoint 1 at 0x83990
(gdb) r -cpu host -M pc -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name rhel6.5-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93011 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -drive file=rhel65-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:11,bus=pci.0,addr=0x4,bootindex=0 -vnc :1 -boot menu=on
Starting program: /usr/libexec/qemu-kvm -cpu host -M pc -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name rhel6.5-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93011 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -drive file=rhel65-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:11,bus=pci.0,addr=0x4,bootindex=0 -vnc :1 -boot menu=on
(qemu) system_reset 
(qemu) system_reset 
(qemu) system_reset 
(qemu) system_reset 
(qemu) system_reset 
(qemu) Guest moved used index from 710 to 8224
Breakpoint 1, 0x00007ffff4ca1d40 in exit () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff4ca1d40 in exit () from /lib64/libc.so.6
#1  0x00007ffff7f23a33 in virtqueue_num_heads (vq=0x7ffff9ca0030, idx=710)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio.c:297
#2  0x00007ffff7f23f11 in virtqueue_pop (vq=0x7ffff9ca0030, elem=0x7ffffffecc60)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio.c:427
#3  0x00007ffff7ddfe51 in virtio_net_receive (nc=<value optimized out>, buf=<value optimized out>, size=70)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-net.c:722
#4  0x00007ffff7e36494 in qemu_deliver_packet (sender=<value optimized out>, flags=<value optimized out>, 
    data=<value optimized out>, size=<value optimized out>, opaque=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/net.c:446
#5  0x00007ffff7e38746 in qemu_net_queue_deliver (queue=0x7ffff88b1980, sender=0x7ffff86e45f0, flags=0, 
    data=0x7ffff86e4bc4 "", size=70, sent_cb=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net/queue.c:154
#6  qemu_net_queue_send (queue=0x7ffff88b1980, sender=0x7ffff86e45f0, flags=0, data=0x7ffff86e4bc4 "", size=70, 
    sent_cb=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net/queue.c:188
#7  0x00007ffff7e3aacc in tap_send (opaque=0x7ffff86e45f0) at /usr/src/debug/qemu-kvm-0.12.1.2/net/tap.c:210
#8  0x00007ffff7dc9e3b in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4053
#9  0x00007ffff7decd3a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2245
#10 0x00007ffff7dcccf9 in main_loop (argc=33, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4266
#11 main (argc=33, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6644
(gdb) 

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    1
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             AuthenticAMD
CPU family:            16
Model:                 8
Stepping:              0
CPU MHz:               800.000
BogoMIPS:              4399.63
Virtualization:        AMD-V
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              6144K
NUMA node0 CPU(s):     0,2,4,6,8,10
NUMA node1 CPU(s):     1,3,5,7,9,11

Best Regards,
sluo

Comment 9 Paolo Bonzini 2013-09-06 15:22:53 UTC
The bug happens in virtio-net, adding Jason and Amos.

Comment 12 Fam Zheng 2014-04-28 09:48:24 UTC
The backtrace falls into the same category as

https://bugzilla.redhat.com/show_bug.cgi?id=1067892
(virtio aborts QEMU process on invalid input)

But it can also be read as a separate question about whether system_reset properly resets devices.

So I'm leaving it open to keep track of that.

Fam

Comment 13 Fam Zheng 2014-07-30 06:09:06 UTC
(In reply to Sibiao Luo from comment #8)
> yes, i just tried it for many times and met this issue in my AMD host. note:
> No os is installed in the image and no vga is specified and with
> '-nodefconfig -nodefaults' in cli.
> 

Debugged the issue together with Jason. I can't hit the code path (tap_send in the backtrace is never called) unless the guest is booted and the NIC is brought up.

Sibiao, could you try again? If you reproduce it on your machine, I can log in to have a look.

Fam

Comment 14 Sibiao Luo 2014-07-30 08:12:51 UTC
(In reply to Fam Zheng from comment #13)
> Debugged the issue together with Jason. I can't hit the code path (tap_send
> in the backtrace is never called) unless guest is booted and the nic is
> brought up.
> 
> Sibiao, could you try again? If you reproduce it on your machine, I can
> login to have a look.
> 
> Fam

Still hit it on my AMD host; maybe this issue is indeed specific to AMD hosts. You can use my host to debug it, host_addr: 10.66.85.229 user/passwd: root/redhat.

host info:
# uname -r && rpm -q qemu-kvm-rhev
2.6.32-493.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.432.el6.x86_64

(gdb) bt
#0  0x00007ffff483ad40 in exit () from /lib64/libc.so.6
#1  0x00007ffff7f1a553 in virtqueue_num_heads (vq=0x7ffff9cc2030, idx=441)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio.c:297
#2  0x00007ffff7f1aac1 in virtqueue_pop (vq=0x7ffff9cc2030, elem=0x7ffffffecbb0)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio.c:448
#3  0x00007ffff7dc59b1 in virtio_net_receive (nc=<value optimized out>, buf=<value optimized out>, size=70)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-net.c:725
#4  0x00007ffff7e24494 in qemu_deliver_packet (sender=<value optimized out>, flags=<value optimized out>, 
    data=<value optimized out>, size=<value optimized out>, opaque=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/net.c:446
#5  0x00007ffff7e26756 in qemu_net_queue_deliver (queue=0x7ffff88b36a0, sender=0x7ffff86e6df0, flags=0, 
    data=0x7ffff86e73c4 "", size=70, sent_cb=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net/queue.c:154
#6  qemu_net_queue_send (queue=0x7ffff88b36a0, sender=0x7ffff86e6df0, flags=0, data=0x7ffff86e73c4 "", size=70, 
    sent_cb=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net/queue.c:188
#7  0x00007ffff7e28adc in tap_send (opaque=0x7ffff86e6df0) at /usr/src/debug/qemu-kvm-0.12.1.2/net/tap.c:210
#8  0x00007ffff7daf66b in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4055
#9  0x00007ffff7dd2bba in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2258
#10 0x00007ffff7db2560 in main_loop (argc=37, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4268
#11 main (argc=37, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6711
(gdb) 

Best Regards,
sluo

Comment 15 Fam Zheng 2014-07-31 01:31:30 UTC
Thanks. That host is connected to a PXE network (the Red Hat office network). Since no boot device is connected, PXE boot is used.

The PXE ROM doesn't handle virtio-net properly; this is a misbehaving guest, not a bug in qemu-kvm. (But we do have BZ 1067892 to avoid qemu-kvm exiting on a broken vq.)

Note that the ROM above only hits the error with "-vga none" (implied by -nodefaults). With the default or any other -vga setting, it works fine. So it's a corner case.
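
For context on what "avoid exiting with a broken vq" could look like, here is a minimal standalone sketch. It is not the fix tracked in BZ 1067892 and not qemu-kvm code: the model_vq struct, the broken flag, and both function names are assumptions chosen for illustration. The sketch only shows the general shape of the alternative, where an inconsistent ring index marks the queue unusable until the next device reset instead of terminating the process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical host-side view of one virtqueue. */
struct model_vq {
    uint16_t ring_size;
    uint16_t last_avail_idx;
    bool     broken;   /* set when the guest-visible ring looks inconsistent */
};

/*
 * Returns how many buffers the guest has made available, or 0 if there
 * are none or the queue is (or just became) broken.
 */
static uint16_t model_vq_pending(struct model_vq *vq, uint16_t guest_avail_idx)
{
    assert(vq != NULL);

    if (vq->broken) {
        return 0; /* ignore the queue until the device is reset */
    }

    uint16_t pending = (uint16_t)(guest_avail_idx - vq->last_avail_idx);

    if (pending > vq->ring_size) {
        fprintf(stderr, "Guest moved used index from %u to %u\n",
                (unsigned)vq->last_avail_idx, (unsigned)guest_avail_idx);
        vq->broken = true; /* instead of exit(1) */
        return 0;
    }
    return pending;
}

/* A device reset clears the broken state along with the ring position. */
static void model_vq_reset(struct model_vq *vq)
{
    assert(vq != NULL);
    vq->last_avail_idx = 0;
    vq->broken = false;
}
```

In this model, a receive path that sees 0 pending buffers simply drops or queues the packet, so a misbehaving guest such as the PXE ROM here would lose its network traffic but the process would keep running.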

Closing as WONTFIX.