Bug 1343021

Summary: Core dump when quit from HMP after migration finished
Product: Red Hat Enterprise Linux 7 Reporter: yduan
Component: qemu-kvm-rhev Assignee: Fam Zheng <famz>
Status: CLOSED ERRATA QA Contact: FuXiangChun <xfu>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.3 CC: amit.shah, chayang, dgilbert, juzhang, knoel, stefanha, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.6.0-26.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-07 21:14:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description yduan 2016-06-06 10:51:15 UTC
Description of problem:
Start a migration while 'dd' is running in the guest. If 'quit' is issued in the HMP after the migration has completed but before 'dd' has finished, qemu-kvm dumps core.

Version-Release number of selected component (if applicable):
Host:
  kernel: 3.10.0-418.el7.x86_64
  qemu-kvm-rhev-2.6.0-4.el7.x86_64
  OVMF-20160419-2.git90bb4c5.el7.noarch
Guest:
  kernel: 3.10.0-327.22.1.el7.x86_64

How reproducible:
2/2

Steps to Reproduce:
1. Start the VM with the following command:
/usr/libexec/qemu-kvm \
 -S \
 -name 'rhel6.8-x86' \
 -machine q35,accel=kvm,usb=off,vmport=off \
 -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
 -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
 -m 4096 \
 -smp 4,maxcpus=4,cores=2,threads=2,sockets=1 \
 -cpu SandyBridge,enforce \
 -rtc base=localtime,clock=host,driftfix=slew \
 -nodefaults \
 -vga qxl \
 -device AC97,bus=pcie.0 \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20151214-111528-C6FB1EaX,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20151214-111528-C6FB1EaX,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idSWJ5gV \
 -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20151214-111528-C6FB1EaX,server,nowait \
 -device isa-serial,chardev=serial_id_serial0 \
 -debugcon file:q35.ovmf.log \
 -global isa-debugcon.iobase=0x402 \
 -device ich9-usb-ehci1,id=usb1,addr=1e.7,multifunction=on,bus=pcie.0 \
 -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1e.0,firstport=0,bus=pcie.0 \
 -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1e.2,firstport=2,bus=pcie.0 \
 -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1e.4,firstport=4,bus=pcie.0 \
 -device usb-tablet,id=usb-tablet1 \
 -netdev tap,id=netdev0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/ifdown_script \
 -device virtio-net-pci,mac=BA:BC:13:83:4F:BD,id=net0,netdev=netdev0,status=on,bus=pcie.0,disable-legacy=on,disable-modern=off \
 -drive file=/home/backup/RHEL-7.2-20151030.0-Server-x86_64-dvd1.iso,if=none,media=cdrom,id=drive_cd,readonly=on,format=raw \
 -device ide-cd,bus=ide.0,drive=drive_cd,id=device_cd,bootindex=1 \
 -device ioh3420,bus=pcie.0,id=root.0,slot=1 \
 -device x3130-upstream,bus=root.0,id=upstream1 \
 -device xio3130-downstream,bus=upstream1,id=downstream1,chassis=1 \
 -device xio3130-downstream,bus=upstream1,id=downstream2,chassis=2 \
 -device xio3130-downstream,bus=upstream1,id=downstream3,chassis=3 \
 -object iothread,id=iothread0 \
 -device virtio-scsi-pci,bus=downstream1,id=scsi_pci_bus0,iothread=iothread0,disable-legacy=on,disable-modern=off \
 -drive file=sysdisk.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device scsi-hd,drive=drive_sysdisk,bus=scsi_pci_bus0.0,id=device_sysdisk,bootindex=0 \
 -object iothread,id=iothread1 \
 -device virtio-scsi-pci,bus=downstream2,id=scsi_pci_bus1,iothread=iothread1,disable-legacy=on,disable-modern=off \
 -drive file=datadisk.qcow2,format=qcow2,id=drive_datadisk0,if=none,cache=none,aio=native,werror=stop,rerror=stop,bps=1024000,bps_rd=0,bps_wr=0,iops=1024000,iops_rd=0,iops_wr=0 \
 -device scsi-disk,drive=drive_datadisk0,bus=scsi_pci_bus1.0,id=device_datadisk0 \
 -object iothread,id=iothread2 \
 -enable-kvm \
 -monitor stdio \
 -spice port=5900,disable-ticketing \
 -qmp tcp:0:9999,server,nowait

2. Boot the guest on the destination host with the -incoming option.
3. Run 'dd if=/dev/sdb of=/dev/null iflag=direct bs=8k' in the source guest.
4. Migrate to the destination.
5. After the migration has completed and while 'dd' is still running, enter 'quit' in the destination HMP.
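Steps 4 and 5 can also be driven over QMP instead of HMP. The following is only a sketch of the JSON messages involved, using the standard QMP commands qmp_capabilities, migrate, query-migrate and quit. The destination URI is a placeholder, not a value from this report, and the helper names are made up for illustration. The resulting lines would be written to a QMP socket such as the one opened by '-qmp tcp:0:9999' above.

```python
import json


def qmp_command(name, **arguments):
    """Encode one QMP command as a single JSON line (bytes)."""
    message = {"execute": name}
    if arguments:
        message["arguments"] = arguments
    return (json.dumps(message) + "\r\n").encode("utf-8")


def source_commands(dest_uri):
    """Messages for the source QEMU: negotiate, migrate, then check status.

    dest_uri is an assumption; use whatever the destination's
    -incoming option is listening on.
    """
    return [
        qmp_command("qmp_capabilities"),
        qmp_command("migrate", uri=dest_uri),
        qmp_command("query-migrate"),  # repeat until status is "completed"
    ]


def destination_commands():
    """Messages for the destination QEMU once migration has completed."""
    return [
        qmp_command("qmp_capabilities"),
        qmp_command("quit"),
    ]


if __name__ == "__main__":
    # "tcp:DEST_HOST:4444" is a placeholder URI for illustration only.
    for line in source_commands("tcp:DEST_HOST:4444") + destination_commands():
        print(line.decode("utf-8").strip())
```

Each QMP session has to send qmp_capabilities first, which is why both lists start with it. A real client would also read the server greeting and each reply, which this sketch leaves out.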

Actual results:
Core dump.
(qemu) q
qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/hw/scsi/virtio-scsi.c:543: virtio_scsi_handle_cmd_req_prepare: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' failed.
mig.sh: line 50: 11469 Aborted                 (core dumped)

Expected results:
It should quit successfully.

Additional info:
(gdb) bt
#0  0x00007f10f4aa95f7 in raise () from /lib64/libc.so.6
#1  0x00007f10f4aaace8 in abort () from /lib64/libc.so.6
#2  0x00007f10f4aa2566 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f10f4aa2612 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f10fce507a3 in virtio_scsi_handle_cmd_req_prepare (req=0x7f1102866100, s=0x7f11023f6328)
    at /usr/src/debug/qemu-2.6.0/hw/scsi/virtio-scsi.c:543
#5  virtio_scsi_handle_cmd_vq (s=0x7f11023f6328, vq=0x7f11018c20f0)
    at /usr/src/debug/qemu-2.6.0/hw/scsi/virtio-scsi.c:577
#6  0x00007f10fd017f22 in aio_dispatch (ctx=ctx@entry=0x7f10ffc8fc80) at aio-posix.c:330
#7  0x00007f10fd018138 in aio_poll (ctx=0x7f10ffc8fc80, blocking=<optimized out>) at aio-posix.c:479
#8  0x00007f10fcedcb59 in iothread_run (opaque=0x7f10ffc7a6e0) at iothread.c:46
#9  0x00007f10f642cdc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f10f4b6a1cd in clone () from /lib64/libc.so.6

Comment 2 Stefan Hajnoczi 2016-09-07 15:34:08 UTC
virtio-scsi dataplane issue.  Assigning to Fam.

If this becomes urgent assign to me or Paolo, but this is unlikely to be urgent.

Comment 3 Ademar Reis 2016-09-07 15:43:21 UTC
(In reply to Stefan Hajnoczi from comment #2)
> virtio-scsi dataplane issue.  Assigning to Fam.
> 
> If this becomes urgent assign to me or Paolo, but this is unlikely to be
> urgent.

HMP usage is not supported in RHEL, but the coredump is scary, so it should be investigated.

Comment 4 Fam Zheng 2016-09-08 08:56:36 UTC
Can reproduce with upstream 2.7 too. It's not limited to HMP (migrating and then quitting from QMP also crashes), so it's a genuine bug.

The underlying problem is that during "quit" we remove the BDS from each BB but don't stop dataplane. Right after the aio_context_release() in blk_remove_all_bs(), the dataplane threads carry on and trip over the fact that the block backend has suddenly been detached from its image.

The obvious fix is to just stop dataplane threads before removing.
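The ordering problem can be shown with a small single-threaded toy model. This is not QEMU code: every class and function name below is made up for illustration. It also assumes that a block backend with no image reports the main-loop context, which is how the failed assertion in virtio-scsi.c is read here.

```python
class AioContext:
    """Stand-in for an event-loop context (main loop or an iothread)."""

    def __init__(self, name):
        self.name = name


MAIN_CTX = AioContext("main-loop")


class BlockBackend:
    """Toy backend: reports its image's context, or MAIN_CTX when empty."""

    def __init__(self, image_ctx):
        self.image_ctx = image_ctx  # None once the image has been removed

    def aio_context(self):
        return self.image_ctx if self.image_ctx is not None else MAIN_CTX

    def remove_image(self):
        self.image_ctx = None


class Dataplane:
    """Toy dataplane: handles requests in its own context while started."""

    def __init__(self, ctx, backend):
        self.ctx = ctx
        self.backend = backend
        self.started = True

    def stop(self):
        self.started = False

    def handle_request(self):
        if not self.started:
            return "not handled in iothread"
        # Mirrors the spirit of the failed assertion in the report.
        if self.backend.aio_context() is not self.ctx:
            raise AssertionError("backend is no longer in the iothread context")
        return "handled"


def make_dataplane():
    ctx = AioContext("iothread0")
    return Dataplane(ctx, BlockBackend(ctx))


def quit_without_stopping(dataplane):
    """Crashing order: detach the image while dataplane is still running."""
    dataplane.backend.remove_image()
    return dataplane.handle_request()


def quit_after_stopping(dataplane):
    """Proposed order: stop dataplane first, then detach the image."""
    dataplane.stop()
    dataplane.backend.remove_image()
    return dataplane.handle_request()
```

In this model quit_without_stopping() raises AssertionError, while quit_after_stopping() returns without touching the detached backend. The real race involves a separate iothread, so the model captures only the ordering, not the timing.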

Comment 5 Fam Zheng 2016-09-08 09:37:32 UTC
Proposed upstream fix:

https://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg01717.html

Comment 7 Miroslav Rezanina 2016-09-20 12:28:49 UTC
Fix included in qemu-kvm-rhev-2.6.0-26.el7

Comment 9 yduan 2016-09-21 09:51:43 UTC
Reproduced with qemu-kvm-rhev-2.6.0-4.el7.x86_64 and verified with qemu-kvm-rhev-2.6.0-26.el7.x86_64.
Steps are same as comment 0.

Comment 12 errata-xmlrpc 2016-11-07 21:14:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html