Bug 1013418

Summary: qemu-kvm with a virtio-scsi controller without devices attached quits after stop/cont in HMP/QMP
Product: Red Hat Enterprise Linux 7
Component: seabios
Version: 7.0
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Keywords: Reopened
Target Milestone: rc
Target Release: ---
Reporter: Jun Li <juli>
Assignee: Laszlo Ersek <lersek>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, areis, armbru, hhuang, huding, juzhang, lersek, michen, qzhang, sluo, stefanha, virt-maint, xfu
Fixed In Version: seabios-1.7.2.2-11.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-13 11:24:56 UTC
Attachments: virt-manager screenshot

Description Jun Li 2013-09-30 02:29:19 UTC
Description of problem:
Boot qemu-kvm with only a "virtio-scsi-pci" controller (no devices attached), then run "stop".
qemu-kvm quits after running "cont" while the guest is paused.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1.Boot qemu-kvm with only "virtio-scsi-pci".
# /usr/libexec/qemu-kvm -device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0,indirect_desc=off,event_idx=off  -spice port=5830,disable-ticketing -vga qxl -monitor stdio
2.run "stop" inside HMP.
(qemu) stop
3.run "cont" inside HMP.
(qemu) cont 

Actual results:
qemu-kvm quits. The error message is the following:
(qemu) qemu-kvm: virtio: trying to map MMIO memory

Expected results:
qemu-kvm keeps working after "cont" is run inside HMP.

Additional info:
When a device is attached to the "virtio-scsi-pci" controller and the above operations are performed, qemu-kvm works well.
The command line is the following:
# /usr/libexec/qemu-kvm -device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0,indirect_desc=off,event_idx=off -drive if=none,id=drive-cd-disk-1,media=cdrom,format=raw,cache=none,werror=stop,rerror=stop -device scsi-cd,drive=drive-cd-disk-1,bus=scsi0.0,id=scsi_cd2,bootindex=2  -spice port=5830,disable-ticketing -vga qxl -monitor stdio

Comment 2 Jun Li 2013-09-30 02:47:12 UTC
Add the Version-Release of component:
Version-Release number of selected component (if applicable):
seabios-1.7.2.2-3.el7.x86_64
qemu-kvm-rhev-1.5.3-7.el7.x86_64
3.10.0-29.el7.x86_64

Comment 3 Ademar Reis 2013-12-10 17:02:03 UTC
Reproduced with upstream packages.

But since HMP will not be supported in RHEL7, I'm closing it as WONTFIX.

Comment 7 Jun Li 2013-12-12 04:31:53 UTC
Created attachment 835576 [details]
virt-manager screenshot

Comment 9 Markus Armbruster 2013-12-19 18:25:57 UTC
Reproduced with upstream qemu and a locally built qemu-kvm-1.5.3-30.el7: crashes with -machine accel=kvm (default in RHEL), doesn't crash with accel=tcg (default upstream).

Comment 10 Markus Armbruster 2013-12-19 18:32:54 UTC
(gdb) bt
#0  virtqueue_map_sg (sg=0x55555672aa70, addr=0x555556724a70, num_sg=1, 
    is_write=0) at /work/armbru/qemu-kvm-rhel7/hw/virtio/virtio.c:420
#1  0x00005555557d8a00 in virtqueue_pop (vq=vq@entry=0x55555659f420, 
    elem=elem@entry=0x555556722a60)
    at /work/armbru/qemu-kvm-rhel7/hw/virtio/virtio.c:497
#2  0x00005555557d1512 in virtio_scsi_pop_req (s=s@entry=0x5555565937b8, 
    vq=vq@entry=0x55555659f420)
    at /work/armbru/qemu-kvm-rhel7/hw/scsi/virtio-scsi.c:116
#3  0x00005555557d1a3b in virtio_scsi_handle_cmd (vdev=0x5555565937b8, vq=
    0x55555659f420) at /work/armbru/qemu-kvm-rhel7/hw/scsi/virtio-scsi.c:357
#4  0x0000555555701d5e in qemu_iohandler_poll (pollfds=0x55555650ae00, 
    ret=ret@entry=6) at /work/armbru/qemu-kvm-rhel7/iohandler.c:143
#5  0x0000555555707756 in main_loop_wait (nonblocking=<optimized out>)
    at /work/armbru/qemu-kvm-rhel7/main-loop.c:465
#6  0x0000555555607b11 in main_loop () at /work/armbru/qemu-kvm-rhel7/vl.c:1984
#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /work/armbru/qemu-kvm-rhel7/vl.c:4343
(gdb) p sg[0]
$3 = {iov_base = 0x0, iov_len = 0}
(gdb) p len
$4 = 0
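
For context, the fatal message originates in frame #0: virtqueue_map_sg() cannot map the descriptor's guest address to host RAM. A hedged sketch of that check, modeled on qemu 1.5-era hw/virtio/virtio.c (not verbatim):

for (i = 0; i < num_sg; i++) {
    hwaddr len = sg[i].iov_len;
    sg[i].iov_base = cpu_physical_memory_map(addr[i], &len, is_write);
    if (sg[i].iov_base == NULL || len != sg[i].iov_len) {
        /* A NULL iov_base or truncated length means the guest address
         * did not map to RAM (e.g. it points at MMIO, or at nothing). */
        error_report("virtio: trying to map MMIO memory");
        exit(1);
    }
}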

Comment 11 Qunfang Zhang 2013-12-30 05:38:56 UTC
The same error happens when migrating a guest that is in the middle of installation. After migration, qemu-kvm quits on the destination host and prints:
qemu-kvm: virtio: trying to map MMIO memory

Hi, Markus

Is this the same issue as this bug? If I migrate a pre-installed image, there is no problem; the issue only happens when I migrate a guest that is in the middle of installation.

My command line:

/usr/libexec/qemu-kvm -M pc -cpu SandyBridge,hv_spinlocks=0x1fff,hv_relaxed,hv_vapic -enable-kvm -m 2G -smp 4,sockets=4,cores=1,threads=1 -name win8-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -usb -device usb-tablet,id=input1 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device usb-mouse,id=mouse -drive file=/mnt/installation.raw,if=none,id=drive-sata0-0-0,format=raw,cache=none,werror=stop,rerror=stop -device ide-drive,bus=ide.1,unit=0,drive=drive-sata0-0-0,id=sata0-0-0 -drive file=/mnt/en_windows_server_2012_r2_vl_x64_dvd_3319595.iso,if=none,media=cdrom,id=drive-sata0-0-1,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-sata0-0-1,id=sata0-0-1,bootindex=1 -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:8a,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc :10 -vga std -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -qmp tcp:0:5555,server,nowait

Host version:
kernel-3.10.0-64.el7.x86_64
qemu-kvm-1.5.3-30.el7.x86_64


Thanks,
Qunfang

Comment 12 Markus Armbruster 2014-01-13 14:57:03 UTC
I don't know whether this is the same bug.  Could well be.

Comment 13 Laszlo Ersek 2014-01-14 16:55:11 UTC
I reproduced the problem. It doesn't crash for me (I didn't really understand the crash anyway), but it does trigger the path that I thought *should* be triggered instead of crashing:

$ /usr/libexec/qemu-kvm -device virtio-scsi-pci,id=scsi0 -monitor stdio
QEMU 1.5.3 monitor - type 'help' for more information
(qemu) VNC server running on `127.0.0.1:5900'

(qemu) stop
(qemu) cont
(qemu) qemu-kvm: virtio: trying to map MMIO memory

Upstream qemu works correctly, nothing breaks.


I have some results. This is the command line to use:

x86_64-softmmu/qemu-system-x86_64 \
  -device virtio-scsi-pci,id=scsi0 \
  -monitor stdio \
  -bios .../bios.bin \
  -debugcon file:/dev/tty -global isa-debugcon.iobase=0x402

This command line interleaves dprintf()'s from SeaBIOS (the guest) and qemu's printf()'s.

When SeaBIOS starts (RHEL-7 vs. upstream SeaBIOS makes no difference), it tries to interrogate 256 virtio-scsi targets (I found this by adding a debug message to virtio_scsi_cmd), sending virtio-scsi requests. qemu, after waking up, goes through the following call chain:

virtio_scsi_handle_cmd()
  virtio_scsi_pop_req()
    virtqueue_pop()
      virtqueue_num_heads()
        vring_avail_idx()

vring_avail_idx() reads the "available index" virtqueue field from guest memory. This is where the guest communicates the last (= most recent) one of the new descriptor *chains* ("heads") it has put on the available ring.

qemu compares this value against the "last processed, remembered available index", and of course the difference gives the number of the new descriptor chains. qemu then proceeds to fetch each chain, by grabbing all of the constituent descriptors in each chain.
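
As a hedged sketch (again modeled on qemu 1.5-era hw/virtio/virtio.c, not verbatim), that comparison is roughly:

static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
{
    /* Unsigned 16-bit wraparound: the guest's avail->idx minus our last
     * processed index yields the number of new descriptor chains. */
    uint16_t num_heads = vring_avail_idx(vq) - idx;

    /* More new heads than ring entries means the guest state is bogus;
     * this is the "Guest moved used index" exit seen later in this bug. */
    if (num_heads > vq->vring.num) {
        error_report("Guest moved used index from %u to %u",
                     idx, vring_avail_idx(vq));
        exit(1);
    }
    return num_heads;
}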

I added some qemu debug messages:
- print the number of new heads (= desc chains) in virtqueue_num_heads(),
- print the *guest-physical* address of the "available index" virtqueue field, in vring_avail_idx().

So, after SeaBIOS iterates through all 256 targets, and fails to boot anything, it basically quiesces in the infinite loop in boot_fail() [src/boot.c]. It obviously submits no further virtio requests of any kind. The last set of debug messages pertaining to virtio-scsi communication are:

[seabios] Searching bootorder for: /pci@i0cf8/*@4/*@0/*@255,0
[seabios] virtio_scsi_cmd: enter, target=255 lun=0
[qemu] vring_avail_idx: pa=00000000000ed802
[qemu] virtqueue_num_heads: vq=0x7fe822e74820 num_heads=1
                            last_avail=255 next=256
[qemu] vring_avail_idx: pa=00000000000ed802
[qemu] virtqueue_num_heads: vq=0x7fe822e74820
                            num_heads=0 last_avail=256 next=256

There are two calls to virtqueue_num_heads() because virtio_scsi_handle_cmd() calls virtio_scsi_pop_req() in a loop, and we exit that loop when there are no further descriptor chains to process.

At this point, at the qemu monitor prompt, we can issue this command (note that this is the *very first* qemu monitor command, before "stop" or "cont"):

(qemu) x /uw 0x00000000000ed802

which reads the same "available index" (uint16) virtio-ring field from guest RAM. The responses vary; sometimes I get 268 (ie. 268-256==12), sometimes I get 0. Note that this holds for *all four* cases of

{ upstream qemu, RHEL-7 qemu } x { upstream SeaBIOS, RHEL-7 SeaBIOS }

Both cases of:
- "12 new desc chains" and
- "new absolute value of available index == 0"

are garbage of course.

In the first case, RHEL-7 qemu tries to consume those 12 new descriptor chains, and dies with either SIGSEGV or "trying to map MMIO memory" -- obviously the guest hasn't prepared any valid virtio requests.

In the second case, RHEL-7 qemu sees the incorrect jump from the last seen available_idx value of 256 to 0, and dies with

  qemu-system-x86_64: Guest moved used index from 256 to 0

So, the upstream qemu test case and the RHEL-7 qemu test case are identical in that the guest's available index contains garbage after SeaBIOS quiesces in the infinite loop (both upstream SeaBIOS and RHEL-7 SeaBIOS).

The difference is that when, at this point, I issue

(qemu) stop
(qemu) cont

then upstream qemu does *not* enter virtio_scsi_handle_cmd(), whereas RHEL-7 qemu *does* enter virtio_scsi_handle_cmd(). It's a spurious virtio event in RHEL-7 qemu, which causes RHEL-7 to look at the guest's garbage and choke on it. Upstream qemu simply doesn't try to pop the guest's garbage.

Comment 14 Laszlo Ersek 2014-01-14 17:43:35 UTC
When I pass ",ioeventfd=false" to "-device virtio-scsi-pci", RHEL-7 doesn't break. However, upstream doesn't break with the default ",ioeventfd=true" either.

Comment 15 Laszlo Ersek 2014-01-14 18:47:53 UTC
Upstream does have the bug. The difference with RHEL-7 is that upstream's default accelerator is TCG, while RHEL-7's is KVM. The problem only hits with -enable-kvm + ioeventfd=true.

For some reason, pausing and unpausing the guest elicits an eventfd notification covering the guest's virtio ring.

Comment 16 Laszlo Ersek 2014-01-15 00:28:08 UTC
This is a SeaBIOS bug.

virtio_scsi_setup() [src/hw/virtio-scsi.c]
  # for each "pci" virtio-scsi HBA:
  init_virtio_scsi(pci)
    vp_init_simple()
    vp_find_vq()
    vp_set_status(ACK | DRIVER | DRIVER_OK)
    # for each "i" in 0..255 inclusive:
      virtio_scsi_scan_target(i)
    free(vq)

Namely, when the HBA is scanned for possible targets in the 0..255 range (inclusive), and none is found (which is our use case here), then we simply call free(vq).

However, by that time we have passed the address of the virtio ring to the hypervisor, plus we set a status of (ACK | DRIVER | DRIVER_OK). This means that the virtio ring is *in use* by the hypervisor, and as soon as free(vq) is called, the virtio-spec has been violated.
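
Sketched as code, the tail of init_virtio_scsi() at that time looked roughly like this (hedged against SeaBIOS 1.7.2-era sources; exact signatures approximate):

static void init_virtio_scsi(struct pci_device *pci)
{
    struct vring_virtqueue *vq = NULL;
    u16 ioaddr = vp_init_simple(pci->bdf);
    if (vp_find_vq(ioaddr, 2, &vq) < 0)
        goto fail;
    vp_set_status(ioaddr, VIRTIO_CONFIG_S_ACKNOWLEDGE |
                  VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK);
    int i, tot;
    for (tot = 0, i = 0; i < 256; i++)
        tot += virtio_scsi_scan_target(pci, ioaddr, vq, i);
    if (!tot)
        goto fail;
    return;
fail:
    free(vq);    /* BUG: the ring is still live on the host side */
}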

In practice, after freeing the area underneath the virtio ring, SeaBIOS's internal memory manager can (and does in fact) repurpose the area, and the new owner writes some data into it. Here we branch into two different cases:

(a) When qemu uses TCG, or it uses KVM but with ioeventfd=false, nothing bad happens, seemingly. The bug is masked because:

- SeaBIOS has given up (internally) on the virtio-scsi HBA,

- hence it never kicks (by calling vring_kick()) the HBA, which in practice
  means we never write to the HBA's VIRTIO_PCI_QUEUE_NOTIFY ioport,

- hence qemu thinks that the HBA is simply not being used, and doesn't
  access the (now corrupted) virtio ring.

(b) However when qemu uses KVM with ioeventfd enabled, then KVM (the kernel) *does* immediately notice when the virtio ring is modified (in fact, trampled over). It signals readiness on the eventfd, and qemu tries to pick up the guest's changes. Since those changes are garbage, qemu is fully right to exit.

The solution is clearly to reset the device before freeing the ring.
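
In terms of the sketch above, the fix would be roughly (hedged) -- vp_reset() makes the device forget the ring address and drop DRIVER_OK, so the host stops watching the soon-to-be-freed memory:

fail:
    vp_reset(ioaddr);    /* quiesce the HBA before releasing the ring */
    free(vq);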

The bug was introduced in the original "virtio-scsi for SeaBIOS" commit, ie. c5c488f4. I'm listed as reviewer on that commit, so it serves me right to fix the bug.

Comment 17 Laszlo Ersek 2014-01-15 01:07:51 UTC
(In reply to Qunfang Zhang from comment #11)
> The same error happens when migrating a guest that is in the middle of
> installation. After migration, qemu-kvm quits on the destination host
> and prints:
> qemu-kvm: virtio: trying to map MMIO memory
> 
> Hi, Markus
> 
> Is this the same issue as this bug? If I migrate a pre-installed image,
> there is no problem; the issue only happens when I migrate a guest that
> is in the middle of installation.

You could be at some point during the installation when the guest kernel's virtio-scsi driver has not yet loaded or reset the device. Qemu likely thinks that the virtio ring is still in use, at the old address, but that area has been overwritten sometime after SeaBIOS passed control to the guest kernel. So the root cause could be the same, yes.

KVM seems to signal the overwritten ring area to qemu (via the eventfd) on vmstate change. Because that's the commonality between the "cont" monitor command and the incoming migration: vmstate change.

virtio_pci_vmstate_change(running=true)
  virtio_pci_start_ioeventfd()
    virtio_pci_set_host_notifier_internal(assign=true, set_handler=true)
      event_notifier_init(..., active=1)
        event_notifier_set()
      virtio_queue_set_host_notifier_fd_handler()
        event_notifier_set_handler(..., virtio_queue_host_notifier_read)
      memory_region_add_eventfd()

So: when the vmstate changes to "running", either due to "cont", or due to incoming migration, then the virtio machinery makes sure that the eventfd immediately fires after creating/registering it, *even without the guest kicking it*.
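
The pre-signaling boils down to plain eventfd(2) semantics; a minimal stand-alone sketch (Linux API, not qemu's actual EventNotifier code):

#include <sys/eventfd.h>

int make_active_notifier(void)
{
    int fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (fd < 0)
        return -1;
    /* "active=1": pre-signal, so the very first poll on fd fires
     * immediately, without any guest kick at all. */
    eventfd_write(fd, 1);
    return fd;
}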

See

commit ade80dc84527ae7418e9fcaf33e09574da0d2b29
Author: Michael S. Tsirkin <mst>
Date:   Wed Mar 17 13:08:13 2010 +0200

    virtio-pci: fill in notifier support

commit 25db9ebe15125deb32958c6df74996f745edf1f9
Author: Stefan Hajnoczi <stefanha.ibm.com>
Date:   Fri Dec 17 12:01:50 2010 +0000

    virtio-pci: Use ioeventfd for virtqueue notify

Comment 18 Laszlo Ersek 2014-01-15 01:17:09 UTC
The complete solution would be to reset all virtio devices
- after grub loads the kernel and the initrd using BIOS services and 
- before grub transfers control to the kernel.

Namely, after grub has transferred control, but the kernel has not yet reinitialized the virtio devices, all virtio rings are invalid. Normally, nothing would "kick" in the guest during this interval, therefore qemu wouldn't look at those rings. However a vmstate change or an incoming migration forces a kick (under KVM and when ioeventfd is enabled), and then qemu chokes on the bad guest state.

I think this is a "blind spot" in the VM lifecycle that we'll have to live with for now.

Comment 19 Laszlo Ersek 2014-01-15 01:50:24 UTC
Posted upstream patch for the report in comment 0:

http://news.gmane.org/find-root.php?message_id=1389750520-9778-1-git-send-email-lersek@redhat.com

Comment 20 Markus Armbruster 2014-01-15 09:24:55 UTC
Nice debugging work, Laszlo!

I agree there's a guest bug to be fixed.  But does QEMU need work,
too?

In comment#16 you wrote "Since those changes are garbage, qemu is
fully right to exit."  It's indeed okay for QEMU to throw a fatal
error when the guest screws something up that cannot be recovered
from.  Is this error really not recoverable?

The error message "virtio: trying to map MMIO memory" is useless for
non-developers.  Heck, it's close to useless for most developers!
Could it be improved?

SEGV is not an acceptable way to report a fatal error.  I observed
one (comment#10), but I can't reproduce it anymore.  *Shrug*

Comment 21 Laszlo Ersek 2014-01-15 10:35:23 UTC
(In reply to Markus Armbruster from comment #20)
> Nice debugging work, Laszlo!
> 
> I agree there's a guest bug to be fixed.  But does QEMU need work,
> too?
> 
> In comment#16 you wrote "Since those changes are garbage, qemu is
> fully right to exit."  It's indeed okay for QEMU to throw a fatal
> error when the guest screws something up that cannot be recovered
> from.  Is this error really not recoverable?

No, it's not recoverable. As far as qemu is concerned, the guest is making a virtio request (its "available index" has changed), and that request is invalid (either the index itself, or the descriptor chains advertised by the index).

(from comment #17)
> virtio_pci_vmstate_change(running=true)
>   virtio_pci_start_ioeventfd()
>     virtio_pci_set_host_notifier_internal(assign=true, set_handler=true)
>       event_notifier_init(..., active=1)
>         event_notifier_set()
>       virtio_queue_set_host_notifier_fd_handler()
>         event_notifier_set_handler(..., virtio_queue_host_notifier_read)
>       memory_region_add_eventfd()
> 
> So: when the vmstate changes to "running", either due to "cont", or due to
> incoming migration, then the virtio machinery makes sure that the eventfd
> immediately fires after creating/registering it, *even without the guest
> kicking it*.

Note the event_notifier_set() call -- virtio_pci_set_host_notifier_internal() passes "1" as "active" to event_notifier_init(). This is necessary because by the time we enable the ioeventfd in qemu, the guest might have pre-populated the virtio ring with requests, and we need to start processing those immediately, otherwise the guest might never progress afterwards -- it could be waiting for responses indefinitely without submitting further requests and kicking the host again (and of course the queue could be full so there might not even be *room* for further requests).

So the logic to look at the ring as soon as the ioeventfd is enabled seems sane (in any case I'm CC'ing Stefan), and the guest's data on the ring is indeed invalid. I'm not sure if this "popping from the ring" could be masked/delayed (as an exception) for stop/cont and migration. I think Stefan did consider them (see 25db9ebe). We'd risk a deadlock otherwise, see above.

> The error message "virtio: trying to map MMIO memory" is useless for
> non-developers.  Heck, it's close to useless for most developers!
> Could it be improved?

Yes, we could say "invalid virtio request from guest: trying to map MMIO memory". Because that's the immediate, technical problem with the request (when the corrupted "available index" itself happens to be valid) -- if a descriptor in the descriptor chain being processed points at a guest buffer that qemu knows to be non-RAM, then this is the error to report. We could at least make it clearer that the problem originates in the guest.
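
Concretely, applied to the (hedged) check sketched under comment 10, that would be something like:

if (sg[i].iov_base == NULL || len != sg[i].iov_len) {
    error_report("virtio: invalid guest request: "
                 "descriptor points at non-RAM (MMIO) memory");
    exit(1);
}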

Comment 28 Miroslav Rezanina 2014-02-05 11:42:05 UTC
Fix included in seabios-1.7.2.2-11.el7

Comment 30 huiqingding 2014-02-08 03:10:22 UTC
Reproduce this bug using the following version:
qemu-kvm-1.5.3-45.el7.x86_64
kernel-3.10.0-84.el7.x86_64
seabios-1.7.2.2-10.el7.x86_64

Steps to Reproduce:
1.Boot qemu-kvm with only "virtio-scsi-pci".
# /usr/libexec/qemu-kvm -device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0,indirect_desc=off,event_idx=off  -spice port=5830,disable-ticketing -vga qxl -monitor stdio -qmp tcp:0:4445,server,nowait
2.run "stop" inside QMP.
{"execute":"stop"}
3.run "cont" inside QMP.
{"execute":"cont"}

Actual results:
After step 3, qemu-kvm quits with the following error:
(qemu) qemu-kvm: virtio: trying to map MMIO memory

Verify this bug using the following version:
qemu-kvm-1.5.3-45.el7.x86_64
kernel-3.10.0-84.el7.x86_64
seabios-1.7.2.2-11.el7.x86_64

Steps to Verify:
1.Boot qemu-kvm with only "virtio-scsi-pci".
# /usr/libexec/qemu-kvm -device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0,indirect_desc=off,event_idx=off  -spice port=5830,disable-ticketing -vga qxl -monitor stdio -qmp tcp:0:4445,server,nowait
2.run "stop" inside QMP.
{"execute":"stop"}
3.run "cont" inside QMP.
{"execute":"cont"}

Actual results:
After step 3, qemu-kvm does not quit, and "info status" in HMP shows:
(qemu) info status
VM status: running

Based on the above results, I think this bug has been fixed.

Comment 32 Ludek Smid 2014-06-13 11:24:56 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.