Bug 962413
Summary: | qemu-kvm exits after system_reset : Guest moved used index from 648 to 8224 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Miya Chen <michen> |
Component: | qemu-kvm | Assignee: | Fam Zheng <famz> |
Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 6.5 | CC: | acathrow, akong, amit.shah, armbru, bsarathy, chayang, jasowang, juzhang, lersek, mkenneth, pbonzini, qzhang, sluo, virt-maint, xutian, xwei |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-07-31 01:31:30 UTC | Type: | Bug |
Description
Miya Chen
2013-05-13 12:02:09 UTC
Can be reproduced with qemu-kvm-366 with "-M rhel6.4.0" machine type as well.

system_reset doesn't co-operate with the guest, so this is expected. I doubt we even support this. Markus can help answer that question.

Comment#3 was a bit too terse for me to understand, so I asked Amit. Here's what I learned; I'm sure Amit will correct misunderstandings.

system_reset is roughly equivalent to a physical reset button. The "Guest moved used index" message comes from a virtio device's virtqueue_num_heads(). It means the guest changed the ring buffer it shares with QEMU in an unexpected way. The bug is that QEMU fails to anticipate the effect of reset, namely the guest initializing the virtio device's ring buffer.

I figure the bug makes system_reset effectively useless for guests with virtio devices. system_reset being unsupported can't make that not a bug, it can only make it a bug not worth fixing in RHEL-6. I can't say off-hand whether system_reset is supported. See also bug 739597.

See the following sections in the virtio spec (0.9.5):
- 2.2.1 Device Initialization Sequence
- 2.2.2 Virtio Header
- 2.2.2.1 Device Status

If the guest doesn't reset the virtio device during boot, as the first step of device initialization, then this is arguably a guest bug. If the guest does write 0 to the Device Status port, and qemu can still exit with the above error, then the bug is in qemu.

Whether the "system_reset" HMP command is supported or not should not matter; the same qemu_system_reset_request() function can be called in response to guest actions too (e.g. keyboard controller reset).

... FWIW one could argue that system_reset should power-cycle virtio devices:

http://thread.gmane.org/gmane.comp.emulators.qemu/198545/focus=198604
http://thread.gmane.org/gmane.comp.emulators.qemu/198545/focus=198649

"all hardware except RAM in self-refresh and registers explicitly hooked to battery sleep power rail has reset state on resume from S3 [...]
Power on and resume are identical except for memory controller init and what should happen once init is finished"

This could be fixed by registering reset hooks (with qemu_register_reset()) in virtio devices. What should happen in the hook is visible in virtio_ioport_write() / VIRTIO_PCI_STATUS, val==0:
- virtio_pci_stop_ioeventfd()
- virtio_reset()
- msix_unuse_all_vectors()

Reset should happen, see this commit in hw/virtio-pci.c

commit e489030df2448d22b3cb92fd5dcb22c6fa0fc9e1
Author: Michael S. Tsirkin <mst>
Date: Wed Sep 16 13:40:37 2009 +0300

    qemu/virtio: fix reset with device removal

    virtio pci registers its own reset handler, but fails to unregister it,
    which will lead to crashes after device removal. Solve this problem by
    switching to qdev reset handler, which is automatically unregistered.

    Signed-off-by: Michael S. Tsirkin <mst>
    Signed-off-by: Anthony Liguori <aliguori.com>

@@ -534,6 +532,7 @@ static PCIDeviceInfo virtio_info[] = {
             DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
             DEFINE_PROP_END_OF_LIST(),
         },
+        .qdev.reset = virtio_pci_reset,
     },{
         .qdev.name = "virtio-net-pci",
         .qdev.size = sizeof(VirtIOPCIProxy),

etc.

It is missing for virtio-scsi-pci, which is probably a separate bug due to bad conflict resolution. But this bug uses virtio-blk-pci.

Miya, can you reproduce this under gdb, with a breakpoint on "exit", and get a backtrace?

(In reply to Paolo Bonzini from comment #6)
> Reset should happen, see this commit in hw/virtio-pci.c
>
> commit e489030df2448d22b3cb92fd5dcb22c6fa0fc9e1
> Author: Michael S. Tsirkin <mst>
> Date: Wed Sep 16 13:40:37 2009 +0300
>
>     qemu/virtio: fix reset with device removal
>
>     virtio pci registers its own reset handler, but fails to unregister it,
>     which will lead to crashes after device removal. Solve this problem by
>     switching to qdev reset handler, which is automatically unregistered.
>
>     Signed-off-by: Michael S. Tsirkin <mst>
>     Signed-off-by: Anthony Liguori <aliguori.com>
>
> @@ -534,6 +532,7 @@ static PCIDeviceInfo virtio_info[] = {
>              DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
>              DEFINE_PROP_END_OF_LIST(),
>          },
> +        .qdev.reset = virtio_pci_reset,
>      },{
>          .qdev.name = "virtio-net-pci",
>          .qdev.size = sizeof(VirtIOPCIProxy),
>
> etc.
>
> It is missing for virtio-scsi-pci, which is probably a separate bug due to
> bad conflict resolution. But this bug uses virtio-blk-pci.
>
> Miya, can you reproduce this under gdb, with a breakpoint on "exit", and get
> a backtrace?

Yes, I tried it many times and hit this issue on my AMD host. Note: no OS is installed in the image, no VGA is specified, and '-nodefconfig -nodefaults' is on the command line.

host info:
# uname -r && rpm -q qemu-kvm
2.6.32-414.el6.x86_64
qemu-kvm-0.12.1.2-2.398.el6.x86_64
# gdb /usr/libexec/qemu-kvm
(gdb) b exit
Breakpoint 1 at 0x83990
(gdb) r -cpu host -M pc -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name rhel6.5-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93011 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -drive file=rhel65-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:11,bus=pci.0,addr=0x4,bootindex=0 -vnc :1 -boot menu=on
Starting program: /usr/libexec/qemu-kvm -cpu host -M pc -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name rhel6.5-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93011 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -drive file=rhel65-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:11,bus=pci.0,addr=0x4,bootindex=0 -vnc :1 -boot menu=on
(qemu) system_reset
(qemu) system_reset
(qemu) system_reset
(qemu) system_reset
(qemu) system_reset
(qemu) Guest moved used index from 710 to 8224

Breakpoint 1, 0x00007ffff4ca1d40 in exit () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff4ca1d40 in exit () from /lib64/libc.so.6
#1 0x00007ffff7f23a33 in virtqueue_num_heads (vq=0x7ffff9ca0030, idx=710) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio.c:297
#2 0x00007ffff7f23f11 in virtqueue_pop (vq=0x7ffff9ca0030, elem=0x7ffffffecc60) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio.c:427
#3 0x00007ffff7ddfe51 in virtio_net_receive (nc=<value optimized out>, buf=<value optimized out>, size=70) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-net.c:722
#4 0x00007ffff7e36494 in qemu_deliver_packet (sender=<value optimized out>, flags=<value optimized out>, data=<value optimized out>, size=<value optimized out>, opaque=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net.c:446
#5 0x00007ffff7e38746 in qemu_net_queue_deliver (queue=0x7ffff88b1980, sender=0x7ffff86e45f0, flags=0, data=0x7ffff86e4bc4 "", size=70, sent_cb=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net/queue.c:154
#6 qemu_net_queue_send (queue=0x7ffff88b1980, sender=0x7ffff86e45f0, flags=0, data=0x7ffff86e4bc4 "", size=70, sent_cb=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net/queue.c:188
#7 0x00007ffff7e3aacc in tap_send (opaque=0x7ffff86e45f0) at /usr/src/debug/qemu-kvm-0.12.1.2/net/tap.c:210
#8 0x00007ffff7dc9e3b in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4053
#9 0x00007ffff7decd3a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2245
#10 0x00007ffff7dcccf9 in main_loop (argc=33, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4266
#11 main (argc=33, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6644
(gdb)

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 16
Model: 8
Stepping: 0
CPU MHz: 800.000
BogoMIPS: 4399.63
Virtualization: AMD-V
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 6144K
NUMA node0 CPU(s): 0,2,4,6,8,10
NUMA node1 CPU(s): 1,3,5,7,9,11

Best Regards,
sluo

The bug happens in virtio-net, adding Jason and Amos.

The backtrace falls into the same category as https://bugzilla.redhat.com/show_bug.cgi?id=1067892 (virtio aborts QEMU process on invalid input), but it can also be read as a separate question about system_reset properly resetting devices. So I'm leaving it open to keep track of it.

Fam

(In reply to Sibiao Luo from comment #8)
> Yes, I tried it many times and hit this issue on my AMD host. Note: no OS is
> installed in the image, no VGA is specified, and '-nodefconfig -nodefaults'
> is on the command line.
>

Debugged the issue together with Jason. I can't hit the code path (tap_send in the backtrace is never called) unless the guest is booted and the NIC is brought up.

Sibiao, could you try again? If you reproduce it on your machine, I can log in to have a look.

Fam

(In reply to Fam Zheng from comment #13)
> (In reply to Sibiao Luo from comment #8)
> > Yes, I tried it many times and hit this issue on my AMD host. Note: no OS is
> > installed in the image, no VGA is specified, and '-nodefconfig -nodefaults'
> > is on the command line.
> >
>
> Debugged the issue together with Jason. I can't hit the code path (tap_send
> in the backtrace is never called) unless the guest is booted and the NIC is
> brought up.
>
> Sibiao, could you try again? If you reproduce it on your machine, I can
> log in to have a look.
>
> Fam

Still hit it on my AMD host; maybe this issue is indeed specific to AMD hosts.
You can use my host to debug it, host_addr: 10.66.85.229 user/passwd: root/redhat.

host info:
# uname -r && rpm -q qemu-kvm-rhev
2.6.32-493.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.432.el6.x86_64
(gdb) bt
#0 0x00007ffff483ad40 in exit () from /lib64/libc.so.6
#1 0x00007ffff7f1a553 in virtqueue_num_heads (vq=0x7ffff9cc2030, idx=441) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio.c:297
#2 0x00007ffff7f1aac1 in virtqueue_pop (vq=0x7ffff9cc2030, elem=0x7ffffffecbb0) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio.c:448
#3 0x00007ffff7dc59b1 in virtio_net_receive (nc=<value optimized out>, buf=<value optimized out>, size=70) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-net.c:725
#4 0x00007ffff7e24494 in qemu_deliver_packet (sender=<value optimized out>, flags=<value optimized out>, data=<value optimized out>, size=<value optimized out>, opaque=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net.c:446
#5 0x00007ffff7e26756 in qemu_net_queue_deliver (queue=0x7ffff88b36a0, sender=0x7ffff86e6df0, flags=0, data=0x7ffff86e73c4 "", size=70, sent_cb=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net/queue.c:154
#6 qemu_net_queue_send (queue=0x7ffff88b36a0, sender=0x7ffff86e6df0, flags=0, data=0x7ffff86e73c4 "", size=70, sent_cb=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/net/queue.c:188
#7 0x00007ffff7e28adc in tap_send (opaque=0x7ffff86e6df0) at /usr/src/debug/qemu-kvm-0.12.1.2/net/tap.c:210
#8 0x00007ffff7daf66b in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4055
#9 0x00007ffff7dd2bba in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2258
#10 0x00007ffff7db2560 in main_loop (argc=37, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4268
#11 main (argc=37, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6711
(gdb)

Best Regards,
sluo

Thanks.
That host is connected to a PXE network (the Red Hat office network). Since no boot device is connected, PXE boot is used. The PXE ROM doesn't handle virtio-net properly, so this is a misbehaving guest, not a bug in qemu-kvm. (But we do have BZ 1067892 to avoid qemu-kvm exiting on a broken vq.)

Note that the ROM above only hits the error with "-vga none" (implied by -nodefaults). With the default or any other -vga setting, it works fine. So it's a corner case.

Closing as WONTFIX.
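
For readers following the backtraces, the sanity check behind the "Guest moved used index" message can be sketched roughly as below. This is a simplified, self-contained model and not the qemu-kvm source: the helper name num_heads_checked() and its parameters are hypothetical, the ring size of 256 used in the note after the code is an assumption, and returning -1 instead of calling exit() is only an illustration of the non-fatal handling that BZ 1067892 asks for.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical model of the check done when popping from a virtqueue.
 * avail_idx      - index the guest wrote into the shared available ring
 * last_avail_idx - index the device side had consumed up to
 * ring_size      - number of descriptors in the ring
 * Returns the number of newly available heads, or -1 if the guest-written
 * index is implausible (more new entries than the ring can hold).
 */
static int num_heads_checked(uint16_t avail_idx, uint16_t last_avail_idx,
                             uint16_t ring_size)
{
    assert(ring_size > 0);

    /* 16-bit unsigned subtraction copes with index wraparound at 65536. */
    uint16_t num_heads = (uint16_t)(avail_idx - last_avail_idx);

    if (num_heads > ring_size) {
        /* The code in the backtraces exits at this point; this sketch
         * reports the error to the caller instead. */
        fprintf(stderr, "Guest moved used index from %u to %u\n",
                (unsigned)last_avail_idx, (unsigned)avail_idx);
        return -1;
    }
    return num_heads;
}
```

With the values from the first backtrace (last index 710, guest index 8224) and an assumed 256-entry ring, the difference is 7514, far more than the ring can hold, so the check fails. That fits the final diagnosis: the device side still holds its old index while the ring memory contains data it does not expect.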