Bug 1013418
Summary: | qemu-kvm with a virtio-scsi controller without devices attached quits after stop/cont in HMP/QMP | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jun Li <juli> | ||||
Component: | seabios | Assignee: | Laszlo Ersek <lersek> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 7.0 | CC: | acathrow, areis, armbru, hhuang, huding, juzhang, lersek, michen, qzhang, sluo, stefanha, virt-maint, xfu | ||||
Target Milestone: | rc | Keywords: | Reopened | ||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | seabios-1.7.2.2-11.el7 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2014-06-13 11:24:56 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Jun Li
2013-09-30 02:29:19 UTC
Add the Version-Release of component:
Version-Release number of selected component (if applicable):
seabios-1.7.2.2-3.el7.x86_64
qemu-kvm-rhev-1.5.3-7.el7.x86_64
3.10.0-29.el7.x86_64

Reproduced with upstream packages.

But since HMP will not be supported in RHEL7, I'm closing it as WONTFIX.

Created attachment 835576
virt-manager screenshot
Reproduced with upstream qemu and a locally built qemu-kvm-1.5.3-30.el7: crashes with -machine accel=kvm (default in RHEL), doesn't crash with accel=tcg (default upstream).

(gdb) bt
#0  virtqueue_map_sg (sg=0x55555672aa70, addr=0x555556724a70, num_sg=1, is_write=0)
    at /work/armbru/qemu-kvm-rhel7/hw/virtio/virtio.c:420
#1  0x00005555557d8a00 in virtqueue_pop (vq=vq@entry=0x55555659f420, elem=elem@entry=0x555556722a60)
    at /work/armbru/qemu-kvm-rhel7/hw/virtio/virtio.c:497
#2  0x00005555557d1512 in virtio_scsi_pop_req (s=s@entry=0x5555565937b8, vq=vq@entry=0x55555659f420)
    at /work/armbru/qemu-kvm-rhel7/hw/scsi/virtio-scsi.c:116
#3  0x00005555557d1a3b in virtio_scsi_handle_cmd (vdev=0x5555565937b8, vq=0x55555659f420)
    at /work/armbru/qemu-kvm-rhel7/hw/scsi/virtio-scsi.c:357
#4  0x0000555555701d5e in qemu_iohandler_poll (pollfds=0x55555650ae00, ret=ret@entry=6)
    at /work/armbru/qemu-kvm-rhel7/iohandler.c:143
#5  0x0000555555707756 in main_loop_wait (nonblocking=<optimized out>)
    at /work/armbru/qemu-kvm-rhel7/main-loop.c:465
#6  0x0000555555607b11 in main_loop () at /work/armbru/qemu-kvm-rhel7/vl.c:1984
#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /work/armbru/qemu-kvm-rhel7/vl.c:4343
(gdb) p sg[0]
$3 = {iov_base = 0x0, iov_len = 0}
(gdb) p len
$4 = 0

Same error happens when migrating a guest which is during installation. After migration, the qemu-kvm quit on the destination host and prompts:
qemu-kvm: virtio: trying to map MMIO memory

Hi, Markus

Is this the same issue as this bug? And If I migrate a pre-installed image, there's no problem. The issue only happens when I migrate a guest that is "during installation".
My command line:
/usr/libexec/qemu-kvm -M pc -cpu SandyBridge,hv_spinlocks=0x1fff,hv_relaxed,hv_vapic -enable-kvm -m 2G -smp 4,sockets=4,cores=1,threads=1 -name win8-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -usb -device usb-tablet,id=input1 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device usb-mouse,id=mouse -drive file=/mnt/installation.raw,if=none,id=drive-sata0-0-0,format=raw,cache=none,werror=stop,rerror=stop -device ide-drive,bus=ide.1,unit=0,drive=drive-sata0-0-0,id=sata0-0-0 -drive file=/mnt/en_windows_server_2012_r2_vl_x64_dvd_3319595.iso,if=none,media=cdrom,id=drive-sata0-0-1,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-sata0-0-1,id=sata0-0-1,bootindex=1 -netdev tap,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:8a,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc :10 -vga std -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -qmp tcp:0:5555,server,nowait

Host version:
kernel-3.10.0-64.el7.x86_64
qemu-kvm-1.5.3-30.el7.x86_64

Thanks,
Qunfang

I don't know whether this is the same bug. Could well be.

I reproduced the problem. It doesn't crash for me (I didn't really understand the crash anyway), but it does trigger the path that I thought *should* be triggered instead of crashing:

$ /usr/libexec/qemu-kvm -device virtio-scsi-pci,id=scsi0 -monitor stdio
QEMU 1.5.3 monitor - type 'help' for more information
(qemu) VNC server running on `127.0.0.1:5900'
(qemu) stop
(qemu) cont
(qemu) qemu-kvm: virtio: trying to map MMIO memory

Upstream qemu works correctly, nothing breaks.

I have some results.
This is the command line to use:

x86_64-softmmu/qemu-system-x86_64 \
  -device virtio-scsi-pci,id=scsi0 \
  -monitor stdio \
  -bios .../bios.bin \
  -debugcon file:/dev/tty -global isa-debugcon.iobase=0x402

This command line interleaves dprintf()'s from SeaBIOS (the guest) and qemu's printf()'s.

When SeaBIOS starts (RHEL-7 vs. upstream SeaBIOS makes no difference), it tries to interrogate 256 virtio-scsi targets (I found this by adding a debug message to virtio_scsi_cmd), sending virtio-scsi requests. qemu, after waking up, goes through the following call chain:

  virtio_scsi_handle_cmd()
    virtio_scsi_pop_req()
      virtqueue_pop()
        virtqueue_num_heads()
          vring_avail_idx()

vring_avail_idx() reads the "available index" virtqueue field from guest memory. This is where the guest communicates the last (= most recent) one of the new descriptor *chains* ("heads") it has put on the available ring. qemu compares this value against the "last processed, remembered available index", and the difference gives the number of new descriptor chains. qemu then proceeds to fetch each chain by grabbing all of its constituent descriptors.

I added some qemu debug messages:
- print the number of new heads (= desc chains) in virtqueue_num_heads(),
- print the *guest-physical* address of the "available index" virtqueue field, in vring_avail_idx().

So, after SeaBIOS iterates through all 256 targets and fails to boot anything, it basically quiesces in the infinite loop in boot_fail() [src/boot.c]. It obviously submits no further virtio requests of any kind.
The last set of debug messages pertaining to virtio-scsi communication are:

[seabios] Searching bootorder for: /pci@i0cf8/*@4/*@0/*@255,0
[seabios] virtio_scsi_cmd: enter, target=255 lun=0
[qemu] vring_avail_idx: pa=00000000000ed802
[qemu] virtqueue_num_heads: vq=0x7fe822e74820 num_heads=1 last_avail=255 next=256
[qemu] vring_avail_idx: pa=00000000000ed802
[qemu] virtqueue_num_heads: vq=0x7fe822e74820 num_heads=0 last_avail=256 next=256

There are two calls to virtqueue_num_heads() because virtio_scsi_handle_cmd() calls virtio_scsi_pop_req() in a loop, and we exit that loop when there are no further descriptor chains to process.

At this point, at the qemu monitor prompt, we can issue this command (note that this is the *very first* qemu monitor command, before "stop" or "cont"):

(qemu) x /uw 0x00000000000ed802

which reads the same "available index" (uint16) virtio-ring field from guest RAM. The responses vary; sometimes I get 268 (i.e. 268-256==12), sometimes I get 0. Note that this holds for *all four* cases of { upstream qemu, RHEL-7 qemu } x { upstream SeaBIOS, RHEL-7 SeaBIOS }.

Both cases of:
- "12 new desc chains" and
- "new absolute value of available index == 0"
are garbage, of course.

In the first case, RHEL-7 qemu tries to consume those 12 new descriptor chains, and dies with either SIGSEGV or "trying to map MMIO memory" -- obviously the guest hasn't prepared any valid virtio requests.

In the second case, RHEL-7 qemu sees the incorrect jump from the last seen available_idx value of 256 to 0, and dies with

qemu-system-x86_64: Guest moved used index from 256 to 0

So the upstream qemu test case and the RHEL-7 qemu test case are identical in that the guest's available index contains garbage after SeaBIOS quiesces in the infinite loop (both upstream SeaBIOS and RHEL-7 SeaBIOS).
The difference is that, when at this point I issue

(qemu) stop
(qemu) cont

then upstream qemu does *not* enter virtio_scsi_handle_cmd(), whereas RHEL-7 qemu *does* enter virtio_scsi_handle_cmd(). It's a spurious virtio event in RHEL-7 qemu, which causes RHEL-7 to look at the guest's garbage and choke on it. Upstream qemu simply doesn't try to pop the guest's garbage.

When I pass ",ioeventfd=false" to "-device virtio-scsi-pci", then RHEL-7 doesn't break. However, upstream doesn't break with the default ",ioeventfd=true" either.

Upstream does have the bug. The difference with RHEL-7 is that upstream's default accelerator is TCG, while RHEL-7's is KVM. The problem only hits with -enable-kvm + ioeventfd=true.

For some reason, pausing and unpausing the guest elicits an eventfd notification covering the guest's virtio ring.

This is a SeaBIOS bug.

virtio_scsi_setup() [src/hw/virtio-scsi.c]
  # for each "pci" virtio-scsi HBA:
    init_virtio_scsi(pci)
      vp_init_simple()
      vp_find_vq()
      vp_set_status(ACK | DRIVER | DRIVER_OK)
      # for each "i" in 0..255 inclusive:
        virtio_scsi_scan_target(i)
      free(vq)

Namely, when the HBA is scanned for possible targets in the 0..255 range (inclusive), and none is found (which is our use case here), we simply call free(vq). However, by that time we have passed the address of the virtio ring to the hypervisor, and we have set a status of (ACK | DRIVER | DRIVER_OK). This means that the virtio ring is *in use* by the hypervisor, and as soon as free(vq) is called, the virtio spec has been violated. In practice, after the area underneath the virtio ring is freed, SeaBIOS's internal memory manager can (and in fact does) repurpose the area, and the new owner writes some data into it.

Here we branch to two different cases:

(a) When qemu uses TCG, or it uses KVM but with ioeventfd=false, nothing bad happens, seemingly.
The bug is masked because:
- SeaBIOS has given up (internally) on the virtio-scsi HBA,
- hence it never kicks the HBA (by calling vring_kick()), which in practice means we never write to the HBA's VIRTIO_PCI_QUEUE_NOTIFY ioport,
- hence qemu thinks that the HBA is simply not being used, and doesn't access the (now corrupted) virtio ring.

(b) However, when qemu uses KVM with ioeventfd enabled, then KVM (the kernel) *does* immediately notice when the virtio ring is modified (in fact, trampled over). It signals readiness on the eventfd, and qemu tries to pick up the guest's changes. Since those changes are garbage, qemu is fully right to exit.

The solution is clearly to reset the device before freeing the ring.

The bug was introduced in the original "virtio-scsi for SeaBIOS" commit, i.e. c5c488f4. I'm listed as reviewer on that commit, so it serves me right to fix the bug.

(In reply to Qunfang Zhang from comment #11)
> Same error happens when migrating a guest which is during installation.
> After migration, the qemu-kvm quit on the destination host and prompts:
> qemu-kvm: virtio: trying to map MMIO memory
>
> Hi, Markus
>
> Is this the same issue as this bug? And If I migrate a pre-installed image,
> there's no problem. The issue only happens when I migrate a guest that is
> "during installation".

You could be at some point during the installation when the guest kernel's virtio-scsi driver has not yet loaded or reset the device. qemu likely thinks that the virtio ring is still in use, at the old address, but that area has been overwritten sometime after SeaBIOS passed control to the guest kernel. So the root cause could be the same, yes.

KVM seems to signal the overwritten ring area to qemu (via the eventfd) on vmstate change, because that's the commonality between the "cont" monitor command and the incoming migration: vmstate change.
The relevant qemu call chain:

  virtio_pci_vmstate_change(running=true)
    virtio_pci_start_ioeventfd()
      virtio_pci_set_host_notifier_internal(assign=true, set_handler=true)
        event_notifier_init(..., active=1)
          event_notifier_set()
        virtio_queue_set_host_notifier_fd_handler()
          event_notifier_set_handler(..., virtio_queue_host_notifier_read)
        memory_region_add_eventfd()

So: when the vmstate changes to "running", either due to "cont" or due to incoming migration, the virtio machinery makes sure that the eventfd fires immediately after creating/registering it, *even without the guest kicking it*. See

commit ade80dc84527ae7418e9fcaf33e09574da0d2b29
Author: Michael S. Tsirkin <mst>
Date:   Wed Mar 17 13:08:13 2010 +0200

    virtio-pci: fill in notifier support

commit 25db9ebe15125deb32958c6df74996f745edf1f9
Author: Stefan Hajnoczi <stefanha.ibm.com>
Date:   Fri Dec 17 12:01:50 2010 +0000

    virtio-pci: Use ioeventfd for virtqueue notify

The complete solution would be to reset all virtio devices
- after grub loads the kernel and the initrd using BIOS services, and
- before grub transfers control to the kernel.

Namely, after grub has transferred control, but before the kernel has reinitialized the virtio devices, all virtio rings are invalid. Normally, nothing would "kick" in the guest during this interval, therefore qemu wouldn't look at those rings. However, a vmstate change or an incoming migration forces a kick (under KVM and when ioeventfd is enabled), and then qemu chokes on the bad guest state. I think this is a "blind spot" in the VM lifecycle that we'll have to live with for now.

Posted upstream patch for the report in comment 0:
http://news.gmane.org/find-root.php?message_id=1389750520-9778-1-git-send-email-lersek@redhat.com

Nice debugging work, Laszlo!

I agree there's a guest bug to be fixed. But does QEMU need work, too?

In comment#16 you wrote "Since those changes are garbage, qemu is fully right to exit."
It's indeed okay for QEMU to throw a fatal error when the guest screws something up that cannot be recovered from. Is this error really not recoverable?

The error message "virtio: trying to map MMIO memory" is useless for non-developers. Heck, it's close to useless for most developers! Could it be improved?

SEGV is not an acceptable way to report a fatal error. I observed one (comment#10), but I can't reproduce it anymore. *Shrug*

(In reply to Markus Armbruster from comment #20)
> Nice debugging work, Laszlo!
>
> I agree there's a guest bug to be fixed. But does QEMU need work,
> too?
>
> In comment#16 you wrote "Since those changes are garbage, qemu is
> fully right to exit." It's indeed okay for QEMU to throw a fatal
> error when the guest screws something up that cannot be recovered
> from. Is this error really not recoverable?

No, it's not recoverable. As far as qemu is concerned, the guest is making a virtio request (its "available index" has changed), and that request is invalid (either the index itself, or the descriptor chains advertised by the index).

(from comment #17)
>   virtio_pci_vmstate_change(running=true)
>     virtio_pci_start_ioeventfd()
>       virtio_pci_set_host_notifier_internal(assign=true, set_handler=true)
>         event_notifier_init(..., active=1)
>           event_notifier_set()
>         virtio_queue_set_host_notifier_fd_handler()
>           event_notifier_set_handler(..., virtio_queue_host_notifier_read)
>         memory_region_add_eventfd()
>
> So: when the vmstate changes to "running", either due to "cont", or due to
> incoming migration, then the virtio machinery makes sure that the eventfd
> immediately fires after creating/registering it, *even without the guest
> kicking it*.

Note the event_notifier_set() call -- virtio_pci_set_host_notifier_internal() passes "1" as "active" to event_notifier_init().
Signaling the notifier right away is necessary because, by the time we enable the ioeventfd in qemu, the guest might have pre-populated the virtio ring with requests, and we need to start processing those immediately; otherwise the guest might never progress afterwards -- it could be waiting for responses indefinitely without submitting further requests and kicking the host again (and of course the queue could be full, so there might not even be *room* for further requests).

So the logic to look at the ring as soon as the ioeventfd is enabled seems sane (in any case I'm CC'ing Stefan), and the guest's data on the ring is indeed invalid. I'm not sure if this "popping from the ring" could be masked/delayed (as an exception) for stop/cont and migration. I think Stefan did consider them (see 25db9ebe). We'd risk a deadlock otherwise, see above.

> The error message "virtio: trying to map MMIO memory" is useless for
> non-developers. Heck, it's close to useless for most developers!
> Could it be improved?

Yes, we could say "invalid virtio request from guest: trying to map MMIO memory". That's the immediate, technical problem with the request (when the corrupted "available index" itself happens to be valid): if a descriptor in the descriptor chain being processed points at a guest buffer that qemu knows to be non-RAM, then this is the error to report. We could only make it clearer that the problem originates in the guest.

Fix included in seabios-1.7.2.2-11.el7

Reproduce this bug using the following version:
qemu-kvm-1.5.3-45.el7.x86_64
kernel-3.10.0-84.el7.x86_64
seabios-1.7.2.2-10.el7.x86_64

Steps to Reproduce:
1. Boot qemu-kvm with only "virtio-scsi-pci".
# /usr/libexec/qemu-kvm -device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0,indirect_desc=off,event_idx=off -spice port=5830,disable-ticketing -vga qxl -monitor stdio -qmp tcp:0:4445,server,nowait
2. Run "stop" inside QMP.
{"execute":"stop"}
3. Run "cont" inside QMP.
{"execute":"cont"}

Actual results:
After step 3, qemu-kvm quits with error information:
(qemu) qemu-kvm: virtio: trying to map MMIO memory

Verify this bug using the following version:
qemu-kvm-1.5.3-45.el7.x86_64
kernel-3.10.0-84.el7.x86_64
seabios-1.7.2.2-11.el7.x86_64

Steps to Verify:
1. Boot qemu-kvm with only "virtio-scsi-pci".
# /usr/libexec/qemu-kvm -device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0,indirect_desc=off,event_idx=off -spice port=5830,disable-ticketing -vga qxl -monitor stdio -qmp tcp:0:4445,server,nowait
2. Run "stop" inside QMP.
{"execute":"stop"}
3. Run "cont" inside QMP.
{"execute":"cont"}

Actual results:
After step 3, qemu-kvm does not quit, and "info status" in HMP shows:
(qemu) info status
VM status: running

Based on the above result, I think this bug has been fixed.

This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.