Bug 1461561

Summary: virtio-blk: drain block before cleanup missing
Product: Red Hat Enterprise Linux 7 Reporter: Michael S. Tsirkin <mst>
Component: qemu-kvm-rhevAssignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA QA Contact: aihua liang <aliang>
Severity: high Docs Contact:
Priority: high    
Version: 7.3CC: aliang, chayang, coli, juzhang, knoel, michen, mst, pbonzini, stefanha, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: QEMU 2.9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 04:43:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michael S. Tsirkin 2017-06-14 19:22:10 UTC
Description of problem:
the following problem was reported against QEMU 2.7:
https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02996.html

1. Customer commands Disk hot-unplug via Web-based application.
2. The application sends "device_del" that sometimes takes unpredictable time.
3. There are some inflight IOs
4. Customer does reboot or shutdown the guest OS
4. Qemu process generates segfault.

If a disk is unplugged with "device_del" and "drive_del" commands,
qemu does not generate problem. But if only "device_del" command are finished 
and
guest os reboots, qemu generates segfault.
I can reproduce that case easily with following commands
1. generate heavy IO on disk in Guest OS
2. run "device_del" on the qemu monitor
3. run "system_reboot" on the qemu monitor

According to AdressSanitizer, VirtQueue is freed by virtio_cleanup but 
dereferenced
by asynchronous request. I think asynchronous requests should be finished
before virtio_cleanup, so I added blk_drain() before virtio_cleanup.

Following is configure options I used to build qemu with AddressSanitizer.
./configure --target-list=x86_64-softmmu --extra-cflags="-fsanitize=address 
-fno-omit-frame-pointer" --enable-debug


Looking at code, it seems to affect potentially all of 7.0-7.3.

Comment 2 aihua liang 2017-06-16 07:26:58 UTC
I can reproduce it on kernel:3.10.0-514.el7.x86_64+qemu-kvm-rhev:qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64.

Test Steps:
 1.Start guest with qemu cmd:
   #cat blk_test.txt
    /usr/libexec/qemu-kvm \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20170614-233639-etu9X2zc,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20170614-233639-etu9X2zc,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idhq2DAN  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20170614-233639-etu9X2zc,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20170614-233639-etu9X2zc,path=/var/tmp/seabios-20170614-233639-etu9X2zc,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20170614-233639-etu9X2zc,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel73-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,addr=0x3 \
    -drive id=data,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/data_disk.img \
    -device virtio-blk-pci,id=data1,drive=data,bus=pci.0 \
    -device virtio-net-pci,mac=9a:43:44:45:46:47,id=idvMp6XX,vectors=4,netdev=id9qJxPT,bus=pci.0 \
    -netdev tap,id=id9qJxPT,vhost=on \
    -m 4096  \
    -smp 6,cores=2,threads=1,sockets=3  \
    -cpu host \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=d,menu=off,strict=off  \
    -no-shutdown \
    -enable-kvm \
    -monitor stdio \
    -spice ipv4,port=5000,disable-ticketing \

2. Run IO test in guest:
    (guest)# dd if=/dev/zero of=/dev/vdb bs=1M count=40000000

3. Delete vdb before IO test finished in step2.
    (qemu)device_del data1

4. Reboot guest before IO test finished.
    (qemu)system_reset

Test Result:
     blk_test.txt: line 36: 16879 Floating point exception(core dumped)

Comment 6 aihua liang 2017-06-20 07:30:32 UTC
Has verified, the problem has been resolved, so set bug's status to "Verified".
  
Test Version:
  kernel version:3.10.0-682.el7.x86_64
  qemu-kvm-rhev:qemu-kvm-rhev-2.9.0-10.el7.x86_64

Test Steps:
1. Start guest with qemu cmd:
     /usr/libexec/qemu-kvm \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20170614-233639-etu9X2zc,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20170614-233639-etu9X2zc,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idhq2DAN  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20170614-233639-etu9X2zc,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20170614-233639-etu9X2zc,path=/var/tmp/seabios-20170614-233639-etu9X2zc,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20170614-233639-etu9X2zc,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,addr=0x3 \
    -drive id=data,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/data_disk.img \
    -device virtio-blk-pci,id=data1,drive=data,bus=pci.0 \
    -device virtio-net-pci,mac=9a:43:44:45:46:47,id=idvMp6XX,vectors=4,netdev=id9qJxPT,bus=pci.0 \
    -netdev tap,id=id9qJxPT,vhost=on \
    -m 4096  \
    -smp 6,cores=2,threads=1,sockets=3  \
    -cpu host \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=d,menu=off,strict=off  \
    -no-shutdown \
    -enable-kvm \
    -monitor stdio \
    -spice ipv4,port=5000,disable-ticketing \

2. Run dd in guest.
   (guest)# dd if=/dev/zero of=/dev/vdb bs=1M count=40000000

. Delete vdb before IO test finished in step2.
    (qemu)device_del data1

4. Reboot guest before IO test finished.
    (qemu)system_reset

Test Result:
  Guest restart successfully.

Comment 8 errata-xmlrpc 2017-08-02 04:43:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392