Bug 1526423

Summary: QEMU hang with data plane enabled after some sg_write_same operations in guest
Product: Red Hat Enterprise Linux 7
Reporter: CongLi <coli>
Component: qemu-kvm-rhev
Assignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA
QA Contact: CongLi <coli>
Severity: medium
Docs Contact:
Priority: high
Version: 7.5
CC: aliang, chayang, juzhang, knoel, lmiksik, michen, mrezanin, mtessun, virt-maint, yhong
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.10.0-18.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-11 00:55:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description CongLi 2017-12-15 12:14:01 UTC
Description of problem:
QEMU hangs on guest shutdown after some sg_write_same operations when data plane is enabled.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-12.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a 1G raw image on an XFS file system
# qemu-img create -f raw /home/test.img 1G

2. Start qemu with a command-line like the following:
    -object iothread,id=iothread0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3,iothread=iothread0 \
    -drive id=drive1,if=none,format=raw,file=/home/kvm_autotest_root/images/rhel75-64-virtio-scsi.raw \
    -device scsi-hd,drive=drive1 \
    -drive file=/home/test.img,if=none,id=drive2,format=raw,werror=stop,rerror=stop,discard=on \
    -device scsi-hd,drive=drive2,logical_block_size=4096
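
For reference, the disk geometry implied by steps 1 and 2 can be worked out with a little shell arithmetic. This is only a sketch: the helper name logical_blocks is made up for illustration, and it assumes the guest sees the whole 1G image (qemu-img treats "1G" as 1 GiB) through the 4096-byte logical block size set on the second scsi-hd device.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU or sg3_utils):
# number of logical blocks for an image of $1 bytes with $2-byte blocks.
logical_blocks() {
    echo $(( $1 / $2 ))
}

image_bytes=$((1024 * 1024 * 1024))   # "1G" passed to qemu-img create
block_size=4096                       # logical_block_size on the scsi-hd device

echo "logical blocks: $(logical_blocks "$image_bytes" "$block_size")"
```

Under these assumptions the test disk has 262144 logical blocks, so valid LBAs run from 0 to 262143.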

3. Execute the following commands in guest:

# yes | head -n2048 > buf
# sg_write_same --in buf --num=32 --lba=80 /dev/sdb
# sg_write_same --in /dev/zero --num=96 --lba=0 /dev/sdb
# sg_write_same -U --in /dev/zero --num=16 --lba=0 /dev/sdb
# sg_write_same --in buf --num=65536 --lba=131074 /dev/sdb
# sg_write_same --in /dev/zero --num=65534 --lba=196608 /dev/sdb
# sg_write_same --in /dev/zero --num=0 --lba=128 /dev/sd
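
To make the numbers in step 3 easier to read, here is a small sketch that checks the size of the pattern file and the block range touched by each request. The helper names pattern_bytes and end_lba are made up for illustration. It assumes /dev/sdb is the 1G test disk with 4096-byte logical blocks (262144 blocks in total). The final --num=0 request is left out because its meaning depends on the device's WRITE SAME semantics.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Size in bytes of the file produced by `yes | head -n N`:
# every line is "y" plus a newline, i.e. 2 bytes.
pattern_bytes() {
    yes | head -n "$1" | wc -c | tr -d ' '
}

# Exclusive end LBA of a request starting at LBA $1 covering $2 blocks.
end_lba() {
    echo $(( $1 + $2 ))
}

total_blocks=262144   # assumption: 1 GiB image / 4096-byte logical blocks

echo "pattern size: $(pattern_bytes 2048) bytes"

# One "lba num" pair per line, taken from the first five commands in step 3.
while read -r lba num; do
    end=$(end_lba "$lba" "$num")
    if [ "$end" -le "$total_blocks" ]; then
        status="in range"
    else
        status="out of range"
    fi
    echo "lba=$lba num=$num -> end_lba=$end ($status)"
done <<EOF
80 32
0 96
0 16
131074 65536
196608 65534
EOF
```

The pattern file comes out at 4096 bytes, which matches logical_block_size=4096, and under the stated assumptions all five requests end at or before the last block of the disk.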

4. Shut down the guest via a shell command.

Actual results:
QEMU hangs in qemu_mutex_lock().

Expected results:
QEMU should quit successfully after the guest shuts down.

Additional info:
1. gdb info
(gdb) bt full
#0  0x00007f6614e2248d in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f6614e1dd96 in _L_lock_870 () at /lib64/libpthread.so.0
#2  0x00007f6614e1dc8f in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x000055cd62bdeb7f in qemu_mutex_lock (mutex=mutex@entry=0x55cd64ab1be0)
    at util/qemu-thread-posix.c:65
        err = <optimized out>
        __PRETTY_FUNCTION__ = "qemu_mutex_lock"
        __func__ = "qemu_mutex_lock"
#4  0x000055cd62bda7f9 in aio_context_acquire (ctx=ctx@entry=0x55cd64ab1b80)
    at util/async.c:502
#5  0x000055cd629d13d5 in iothread_stop_all () at iothread.c:305
        ctx = 0x55cd64ab1b80
        container = 0x55cd64ac6a80
        bs = 0x55cd64af4000
        it = {phase = BDRV_NEXT_BACKEND_ROOTS, blk = 0x55cd64a54500, bs = 0x0}
#6  0x000055cd628bd964 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4807
        i = <optimized out>
        snapshot = <optimized out>
        linux_boot = <optimized out>
        initrd_filename = <optimized out>
        kernel_filename = <optimized out>
        kernel_cmdline = <optimized out>
        boot_order = <optimized out>
        boot_once = 0x55cd649f0718 "c"
        cyls = <optimized out>
        heads = <optimized out>
        secs = <optimized out>
        translation = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        hda_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = <optimized out>
        olist = <optimized out>
        optind = 43
        optarg = 0x7fffc9948492 "scsi-hd,drive=drive2,logical_block_size=4096"
        loadvm = <optimized out>
        machine_class = 0x0
        cpu_model = <optimized out>
        vga_model = 0x7fffc99481c3 "cirrus"
        qtest_chrdev = <optimized out>
        qtest_log = <optimized out>
        pid_file = <optimized out>
        incoming = <optimized out>
        defconfig = <optimized out>
        userconfig = <optimized out>
        nographic = <optimized out>
        display_type = <optimized out>
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = <optimized out>
        vmstate_dump_file = <optimized out>
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7fffc99470e0}
        __func__ = "main"
        __FUNCTION__ = "main"
(gdb) 

2. strace info: 
strace: Process 8375 attached
futex(0x55cd64ab1be0, FUTEX_WAIT_PRIVATE, 2, NULL

3. QEMU quits successfully when data plane is not enabled.

Comment 2 Stefan Hajnoczi 2018-01-04 14:30:43 UTC
Thanks for the bug report. I have posted a QEMU patch upstream:
https://patchwork.ozlabs.org/patch/855622/

Comment 5 Miroslav Rezanina 2018-01-23 13:00:22 UTC
Fix included in qemu-kvm-rhev-2.10.0-18.el7

Comment 7 CongLi 2018-01-23 13:35:10 UTC
Reproduced this bug on:
qemu-kvm-rhev-2.10.0-17.el7.x86_64

Verified the fix on:
qemu-kvm-rhev-2.10.0-18.el7.x86_64

Steps are the same as in comment 0.

Thanks.

Comment 9 errata-xmlrpc 2018-04-11 00:55:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104