Bug 1319400 - live merge - qemu-kvm hangs in aio_bh_poll
Summary: live merge - qemu-kvm hangs in aio_bh_poll
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jeff Cody
QA Contact: Qianqian Zhu
URL:
Whiteboard:
Duplicates: 1349735 (view as bug list)
Depends On:
Blocks: 1329543 1346429
TreeView+ depends on / blocked
 
Reported: 2016-03-19 13:00 UTC by Markus Stockhausen
Modified: 2019-11-14 07:38 UTC (History)
21 users (show)

Fixed In Version: QEMU 2.4
Doc Type: Bug Fix
Doc Text:
During the block-stream job of a live merge operation, the QEMU process in some cases became unresponsive. This update improves the handling of block jobs, which ensures that QEMU stays responsive during block-stream jobs as expected.
Clone Of:
Clones: 1329543 (view as bug list)
Environment:
Last Closed: 2016-11-07 21:01:06 UTC
Target Upstream Version:
Embargoed:


Attachments
  VM CPU usage (43.73 KB, image/png), 2016-03-19 13:12 UTC, Markus Stockhausen
  Disk Write (44.22 KB, image/png), 2016-03-19 13:27 UTC, Markus Stockhausen
  Disk Read (38.08 KB, image/png), 2016-03-19 13:28 UTC, Markus Stockhausen
  Snap state (25.90 KB, image/png), 2016-03-19 13:47 UTC, Markus Stockhausen
  VDSM (902.64 KB, text/plain), 2016-03-19 14:20 UTC, Markus Stockhausen
  Engine log (458.33 KB, text/plain), 2016-03-19 14:51 UTC, Markus Stockhausen


Links
  Red Hat Product Errata RHBA-2016:2673 (normal, SHIPPED_LIVE): qemu-kvm-rhev bug fix and enhancement update, last updated 2016-11-08 01:06:13 UTC

Description Markus Stockhausen 2016-03-19 13:00:44 UTC
Description of problem:

Shortly after starting a live merge operation in oVirt on a CentOS-based node, the VM becomes unresponsive.

Version-Release number of selected component (if applicable):

OVirt 3.6.3
CentOS Linux release 7.2.1511 (Core)
QEMU emulator version 2.3.0 (qemu-kvm-ev-2.3.0-31.el7_2.7.1)

How reproducible/Steps to Reproduce:

Unknown

Additional info:

Attaching to the qemu-kvm process via gdb, we get:

(gdb) info threads
  Id   Target Id         Frame
  18   Thread 0x7f4e85188700 (LWP 7086) "qemu-kvm" 0x00007f4e8d7dba09 in syscall () from /lib64/libc.so.6
  17   Thread 0x7f4e83f73700 (LWP 7091) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  16   Thread 0x7f4e83772700 (LWP 7092) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  15   Thread 0x7f4e82f71700 (LWP 7093) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  14   Thread 0x7f4e82770700 (LWP 7094) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  13   Thread 0x7f4e81f6f700 (LWP 7095) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  12   Thread 0x7f4e8176e700 (LWP 7096) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  11   Thread 0x7f4e80f6d700 (LWP 7097) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  10   Thread 0x7f4e8076c700 (LWP 7098) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  9    Thread 0x7f4e7ff6b700 (LWP 7099) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  8    Thread 0x7f4e7f76a700 (LWP 7100) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  7    Thread 0x7f4e7ef69700 (LWP 7101) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  6    Thread 0x7f4e7e768700 (LWP 7102) "qemu-kvm" 0x00007f4e93e07f4d in __lll_lock_wait () from /lib64/libpthread.so.0
  5    Thread 0x7f1e7c3ff700 (LWP 7104) "qemu-kvm" 0x00007f4e93e056d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7f1e7a9ff700 (LWP 38391) "qemu-kvm" 0x00007f4e93e07870 in sem_timedwait () from /lib64/libpthread.so.0
  3    Thread 0x7f1e791fc700 (LWP 38454) "qemu-kvm" 0x00007f4e93e07870 in sem_timedwait () from /lib64/libpthread.so.0
  2    Thread 0x7f1e799fd700 (LWP 38455) "qemu-kvm" 0x00007f4e93e07870 in sem_timedwait () from /lib64/libpthread.so.0
* 1    Thread 0x7f4e951d3c40 (LWP 7085) "qemu-kvm" aio_bh_poll (ctx=ctx@entry=0x7f4e9749a840) at async.c:81

(gdb) thread 5
[Switching to thread 5 (Thread 0x7f1e7c3ff700 (LWP 7104))]
#0  0x00007f4e93e056d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f4e93e056d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f4e9554f5b9 in qemu_cond_wait (cond=cond@entry=0x7f4e974ed4d0, mutex=mutex@entry=0x7f4e974ed500) at util/qemu-thread-posix.c:132
#2  0x00007f4e954d1953 in vnc_worker_thread_loop (queue=queue@entry=0x7f4e974ed4d0) at ui/vnc-jobs.c:222
#3  0x00007f4e954d1d38 in vnc_worker_thread (arg=0x7f4e974ed4d0) at ui/vnc-jobs.c:323
#4  0x00007f4e93e01dc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f4e8d7e128d in clone () from /lib64/libc.so.6

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f4e951d3c40 (LWP 7085))]
#0  aio_bh_poll (ctx=ctx@entry=0x7f4e9749a840) at async.c:74
74              next = bh->next;
(gdb) bt full
#0  aio_bh_poll (ctx=ctx@entry=0x7f4e9749a840) at async.c:74
        bh = 0x7f4ea42163c0
        bhp = <optimized out>
        next = <optimized out>
        ret = 1
#1  0x00007f4e954e13b9 in aio_dispatch_clients (ctx=ctx@entry=0x7f4e9749a840, client_mask=client_mask@entry=10) at aio-posix.c:139
        node = <optimized out>
        progress = false
#2  0x00007f4e954e190b in aio_poll_clients (ctx=0x7f4e9749a840, blocking=blocking@entry=false, client_mask=client_mask@entry=10) at aio-posix.c:300
        node = 0x0
        was_dispatching = false
        i = <optimized out>
        ret = <optimized out>
        progress = false
        timeout = <optimized out>
        __PRETTY_FUNCTION__ = "aio_poll_clients"
#3  0x00007f4e954d5d44 in bdrv_aio_poll (blocking=false, ctx=<optimized out>) at block.c:6370
No locals.
#4  bdrv_drain_one (bs=bs@entry=0x7f4e97583400) at block.c:1995
        bs_busy = false
#5  0x00007f4e954d6f94 in bdrv_drain_all () at block.c:2041
        aio_context = 0x7f4e9749a840
        busy = true
        bs = 0x7f4e97583400
#6  0x00007f4e954dc7af in bdrv_close (bs=bs@entry=0x7f4e9c5d8000) at block.c:1905
        ban = <optimized out>
        ban_next = <optimized out>
#7  0x00007f4e954dca37 in bdrv_delete (bs=0x7f4e9c5d8000) at block.c:2210
No locals.
#8  bdrv_unref (bs=0x7f4e9c5d8000) at block.c:5694
No locals.
#9  0x00007f4e95514593 in mirror_exit (job=0x7f4e9e0a9fa0, opaque=0x7f4e9dfa1700) at block/mirror.c:392
        s = 0x7f4e9e0a9fa0
        data = 0x7f4e9dfa1700
        replace_aio_context = 0x0
#10 0x00007f4e954df5ec in block_job_defer_to_main_loop_bh (opaque=0x7f4ebde8d5d0) at blockjob.c:369
        data = 0x7f4ebde8d5d0
        aio_context = 0x7f4e9749a840
#11 0x00007f4e954d2344 in aio_bh_poll (ctx=ctx@entry=0x7f4e9749a840) at async.c:85
        bh = <optimized out>
        bhp = <optimized out>
        next = 0x7f4e9fe12010
---Type <return> to continue, or q <return> to quit---
        ret = 1
#12 0x00007f4e954e13b9 in aio_dispatch_clients (ctx=0x7f4e9749a840, client_mask=client_mask@entry=-1) at aio-posix.c:139
        node = <optimized out>
        progress = false
#13 0x00007f4e954e177a in aio_dispatch (ctx=<optimized out>) at aio-posix.c:194
No locals.
#14 0x00007f4e954d21be in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at async.c:219
        ctx = <optimized out>
#15 0x00007f4e937067aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
No symbol table info available.
#16 0x00007f4e954e0238 in glib_pollfds_poll () at main-loop.c:209
        context = 0x7f4e974bb140
        pfds = <optimized out>
#17 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:254
        ret = 1
        spin_counter = 1
#18 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:503
        ret = 1
        timeout = 4294967295
        timeout_ns = <optimized out>
#19 0x00007f4e952debae in main_loop () at vl.c:1818
        nonblocking = <optimized out>
        last_io = 1
#20 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4394
        i = <optimized out>
        snapshot = <optimized out>
        linux_boot = <optimized out>
        initrd_filename = <optimized out>
        kernel_filename = <optimized out>
        kernel_cmdline = <optimized out>
        boot_order = 0x7f4e955730ec "cad"
        boot_once = 0x0
        cyls = <optimized out>
        heads = <optimized out>
        secs = <optimized out>
        translation = <optimized out>
        hda_opts = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        icount_opts = <optimized out>
        olist = <optimized out>
        optind = 106
---Type <return> to continue, or q <return> to quit---
        optarg = 0x7f4e9753bca0 "pc-i440fx-rhel7.2.0"
        loadvm = <optimized out>
        machine_class = <optimized out>
        cpu_model = <optimized out>
        vga_model = 0x0
        qtest_chrdev = <optimized out>
        qtest_log = <optimized out>
        pid_file = <optimized out>
        incoming = <optimized out>
        show_vnc_port = <optimized out>
        defconfig = <optimized out>
        userconfig = 134
        log_mask = <optimized out>
        log_file = <optimized out>
        mem_trace = {malloc = 0x7f4e953cd310 <malloc_and_trace>, realloc = 0x7f4e953cd2f0 <realloc_and_trace>,
          free = 0x7f4e953cd2e0 <free_and_trace>, calloc = 0x0, try_malloc = 0x0, try_realloc = 0x0}
        trace_events = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = <optimized out>
        vmstate_dump_file = <optimized out>
        main_loop_err = 0x0
        __func__ = "main"

Having a look at perf, we get:

Samples: 106K of event 'cycles', Event count (approx.): 73029357197
  Children      Self  Command   Shared Object            Symbol            
-   90,19%    89,49%  qemu-kvm  qemu-kvm                 [.] aio_bh_poll   
   - aio_bh_poll                                                           
        0                                                                  
-   90,17%     0,00%  qemu-kvm  [unknown]                [.] 0000000000000000
   - 0                                                                       
      - thread_pool_get_aio_context                                          
        0x246c8d499028678b                                                   
-    9,57%     9,44%  qemu-kvm  qemu-kvm                 [.] aio_compute_timeout
     aio_compute_timeout                                                        
+    0,69%     0,02%  qemu-kvm  [kernel.kallsyms]        [k] apic_timer_interrupt
+    0,67%     0,00%  qemu-kvm  [kernel.kallsyms]        [k] smp_apic_timer_interrupt

Comment 1 Markus Stockhausen 2016-03-19 13:02:21 UTC
Start options

qemu      7085     1 99 Mär18 ?       1-19:36:49 /usr/libexec/qemu-kvm -name cishanar00 -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Westmere -m size=201326592k,slots=16,maxmem=4294967296k -realtime mlock=off -smp 12,maxcpus=96,sockets=16,cores=6,threads=1 -numa node,nodeid=0,cpus=0-11,mem=196608 -uuid 542affb0-7a2f-4297-84f4-728a315a6abd -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-2.1511.el7.centos.2.10,serial=C83278D1-8EB2-11E2-B97F-D48CB5BC5356,uuid=542affb0-7a2f-4297-84f4-728a315a6abd -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-cishanar00/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2016-03-18T14:16:13,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x6 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/dcd5456b-e50a-4fe2-9cc2-614a0764dfc1/c8633a81-7d50-4837-a7b5-5a696abace13,if=none,id=drive-virtio-disk0,format=qcow2,serial=dcd5456b-e50a-4fe2-9cc2-614a0764dfc1,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/531fa1b1-fbd0-42ca-9f6b-cf764c91f8a9/af580390-a0bf-48b3-94a1-3796f5708eba,if=none,id=drive-virtio-disk1,format=qcow2,serial=531fa1b1-fbd0-42ca-9f6b-cf764c91f8a9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/19644ad4-fe32-4b12-8bfb-d75a16fec85a/f20c7009-5c57-43f7-972f-3e6ce0c14fed,if=none,id=drive-virtio-disk2,format=qcow2,serial=19644ad4-fe32-4b12-8bfb-d75a16fec85a,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/f105f1cc-ab59-4c62-a465-e3ef33886c17/86ed6e0d-a68d-4d40-b3e4-47e65f1a4b14,if=none,id=drive-virtio-disk3,format=qcow2,serial=f105f1cc-ab59-4c62-a465-e3ef33886c17,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk3,id=virtio-disk3 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/f8ebfb39-2ac6-4b87-b193-4204d1854edc/bbd05dd8-c3bf-4d15-9317-73040e04abae,if=none,id=drive-virtio-disk4,format=qcow2,serial=f8ebfb39-2ac6-4b87-b193-4204d1854edc,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xc,drive=drive-virtio-disk4,id=virtio-disk4 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/4e708243-97c3-4279-a32a-cb463e5e43a7/e24aeb86-f621-46d5-8960-9890e4c61da5,if=none,id=drive-virtio-disk5,format=qcow2,serial=4e708243-97c3-4279-a32a-cb463e5e43a7,cache=none,werror=stop,rerror=stop,aio=threads -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xd,drive=drive-virtio-disk5,id=virtio-disk5 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/7e3906ad-7c6b-471d-8589-4cc74ba81b1f/522610e7-35cf-45fd-a0e5-b7d3d2cf8750,if=none,id=drive-virtio-disk6,format=qcow2,serial=7e3906ad-7c6b-471d-8589-4cc74ba81b1f,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xe,drive=drive-virtio-disk6,id=virtio-disk6 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/4d00064a-ee20-4f3c-aa48-9cd303af7d8d/f58ac0cf-4bf7-4778-b813-49a148492de7,if=none,id=drive-virtio-disk7,format=qcow2,serial=4d00064a-ee20-4f3c-aa48-9cd303af7d8d,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xf,drive=drive-virtio-disk7,id=virtio-disk7 -netdev tap,fd=37,id=hostnet0,vhost=on,vhostfd=38 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:ee:d3:35,bus=pci.0,addr=0x4 -netdev tap,fd=39,id=hostnet1,vhost=on,vhostfd=40 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:1a:4a:ee:d3:36,bus=pci.0,addr=0x5 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/542affb0-7a2f-4297-84f4-728a315a6abd.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/542affb0-7a2f-4297-84f4-728a315a6abd.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -device usb-tablet,id=input0 -vnc 192.168.11.42:9,share=ignore,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -msg timestamp=on

Comment 2 Markus Stockhausen 2016-03-19 13:03:18 UTC
top shows:

Tasks: 626 total,   4 running, 622 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3,1 us,  3,3 sy,  0,0 ni, 93,5 id,  0,1 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem : 26384976+total, 20887500+free, 46775796 used,  8198952 buff/cache
KiB Swap: 20971516 total, 20971516 free,        0 used. 21540172+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7085 qemu      20   0  0,189t 0,040t  12160 R 100,3 16,4   2617:44 qemu-kvm
 1615 root      15  -5 5868820  46256  10736 S  80,8  0,0 271:59.12 supervdsmServer
...

Comment 4 Markus Stockhausen 2016-03-19 13:08:45 UTC
OVirt engine log:

18.03.2016 14:53:21 VM cishanar00 is not responding.
18.03.2016 14:52:50 VM cishanar00 is not responding.
18.03.2016 14:52:20 VM cishanar00 is not responding.
18.03.2016 14:43:21 VDSM colovn03 command failed: Volume does not exist: ('af580390-a0bf-48b3-94a1-3796f5708eba',)
18.03.2016 14:36:22 VDSM colovn03 command failed: Volume does not exist: ('f58ac0cf-4bf7-4778-b813-49a148492de7',)
18.03.2016 14:33:32 Snapshot 'vor_update_20160122' deletion for VM 'cishanar00' was initiated by admin@internal.

Comment 5 Markus Stockhausen 2016-03-19 13:12:16 UTC
Created attachment 1138022 [details]
VM CPU usage

Comment 6 Markus Stockhausen 2016-03-19 13:13:03 UTC
The VM was started at 14:15. The live merge started at 14:33. CPU usage of the VM (qemu) on the oVirt node is attached.

Comment 7 Markus Stockhausen 2016-03-19 13:27:13 UTC
For some reason the stalled VM did extensive disk I/O. Graphs of the qemu process are attached.

Comment 8 Markus Stockhausen 2016-03-19 13:27:38 UTC
Created attachment 1138023 [details]
Disk Write

Comment 9 Markus Stockhausen 2016-03-19 13:28:02 UTC
Created attachment 1138024 [details]
Disk Read

Comment 10 Markus Stockhausen 2016-03-19 13:46:23 UTC
Before live merge we had the following disks/states:

Disk1: Snapshot yes
Disk2: Snapshot no (af580390-a0bf-48b3-94a1-3796f5708eba)
Disk3: Snapshot yes
Disk4: Snapshot yes
Disk5: Snapshot yes
Disk6: Snapshot yes
Disk7: Snapshot yes
Disk8: Snapshot no (f58ac0cf-4bf7-4778-b813-49a148492de7)

So the messages about missing volumes (see above) should be OK.

Currently all snapshot files (raw + qcow2) are still on disk: 8 raw files and 6 qcow2 files. Nevertheless, the oVirt web interface shows an illegal state for some of the snapshot files. See screenshot.

Comment 11 Markus Stockhausen 2016-03-19 13:47:41 UTC
Created attachment 1138026 [details]
Snap state

Comment 12 Markus Stockhausen 2016-03-19 14:19:34 UTC
Shortened VDSM log attached

Comment 13 Markus Stockhausen 2016-03-19 14:20:11 UTC
Created attachment 1138039 [details]
VDSM

Comment 14 Markus Stockhausen 2016-03-19 14:51:06 UTC
Created attachment 1138040 [details]
Engine log

Comment 15 Markus Stockhausen 2016-03-19 18:31:24 UTC
VM cishanar00 is running SLES11 SP4.

The log inside the VM shows a CPU soft lockup shortly after the start of the live merge:

Mar 18 14:34:45 cishanar00 kernel: [ 1072.248023] BUG: soft lockup - CPU#3 stuck for 22s! [PoolThread:4277]
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248026] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod joydev virtio_balloon virtio_net sg virtio_scsi ipv6_lib virtio_console rtc_cmos sr_mod cdrom i2c_piix4 floppy serio_raw button pcspkr container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore intel_agp intel_gtt usb_common scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Supported: Yes
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] CPU 3
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod joydev virtio_balloon virtio_net sg virtio_scsi ipv6_lib virtio_console rtc_cmos sr_mod cdrom i2c_piix4 floppy serio_raw button pcspkr container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore intel_agp intel_gtt usb_common scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Supported: Yes
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Pid: 4277, comm: PoolThread Not tainted 3.0.101-63-default #1 oVirt oVirt Node
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RIP: 0010:[<ffffffff8103d832>]  [<ffffffff8103d832>] flush_tlb_others_ipi+0x122/0x140
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RSP: 0018:ffff882e17ed78e8  EFLAGS: 00000246
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RAX: 0000000000000000 RBX: ffff882e17ed4af8 RCX: ffffffff81d88a98
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RDX: 0000000000000000 RSI: 000000000000007e RDI: ffffffff81d88a98
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RBP: ffffffff81d88a98 R08: 0000000000000001 R09: ffffffff81d88aa0
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8146fdee
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] R13: 0000000000000296 R14: 0000000000000002 R15: ffff88303ff99e00
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] FS:  00007f3e69112700(0000) GS:ffff88303e660000(0000) knlGS:0000000000000000
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] CR2: 00007f37b1b3c000 CR3: 0000002e548eb000 CR4: 00000000000006e0
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Process PoolThread (pid: 4277, threadinfo ffff882e17ed6000, task ffff882e17ed4240)
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Stack:
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  ffff882f40b6bc00 ffff882f18ee1280 00007f37b1b3c000 ffffea0095abc900
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  ffff8829bde561f8 8000000001d37225 ffff882c684279e0 ffffffff8103d984
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  0000000000000000 8000000001d37225 0000000000000000 ffffffff8113194b
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Call Trace:
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8103d984>] flush_tlb_page+0x54/0x90
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8113194b>] ptep_clear_flush+0xb/0x10
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff81122caa>] do_wp_page+0x31a/0x830
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff811242bd>] handle_pte_fault+0x1cd/0x230
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8146afbd>] do_page_fault+0x1fd/0x4c0
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff81467bf5>] async_page_fault+0x25/0x30
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff810fabd9>] file_read_actor+0x49/0x190
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff810fc893>] do_generic_file_read+0x363/0x490
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff810fd3ec>] generic_file_aio_read+0xfc/0x260
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffffa03bb5a9>] xfs_file_aio_read+0x1d9/0x3a0 [xfs]
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8115de37>] do_sync_read+0xd7/0x120
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8115e5e7>] vfs_read+0xc7/0x130
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8115e753>] sys_read+0x53/0xa0
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8146f3f2>] system_call_fastpath+0x16/0x1b
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<00007f3f0b626b9d>] 0x7f3f0b626b9c
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Code: 42 00 e9 55 ff ff ff 66 90 48 8b 05 a1 5c b9 00 41 8d b7 cf 00 00 00 48 89 ef ff 90 d0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 7c 96 b9 00 48 89 ef e8 40 83 22 00 85 c0 74 ec e9 66 ff
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Call Trace:
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8103d984>] flush_tlb_page+0x54/0x90
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8113194b>] ptep_clear_flush+0xb/0x10
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff81122caa>] do_wp_page+0x31a/0x830
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff811242bd>] handle_pte_fault+0x1cd/0x230
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8146afbd>] do_page_fault+0x1fd/0x4c0
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff81467bf5>] async_page_fault+0x25/0x30
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff810fabd9>] file_read_actor+0x49/0x190
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff810fc893>] do_generic_file_read+0x363/0x490
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff810fd3ec>] generic_file_aio_read+0xfc/0x260
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffffa03bb5a9>] xfs_file_aio_read+0x1d9/0x3a0 [xfs]
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8115de37>] do_sync_read+0xd7/0x120
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8115e5e7>] vfs_read+0xc7/0x130
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8115e753>] sys_read+0x53/0xa0
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<ffffffff8146f3f2>] system_call_fastpath+0x16/0x1b
Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052]  [<00007f3f0b626b9d>] 0x7f3f0b626b9c
<<< log ends here!

Comment 19 Qianqian Zhu 2016-03-23 07:12:50 UTC
I can hit this issue as well using qemu-kvm-rhev-2.3.0-31.el7.x86_64 and kernel-3.10.0-327.el7.x86_64. Same steps and results as in comment 18.

Comment 20 Markus Stockhausen 2016-04-09 18:03:20 UTC
Hi,

today we had a new maintenance window to repeat the snapshot merge. This time the process finished. Nevertheless, filesystems inside the VM could not be accessed as soon as the merge started; the VM was not usable during the merge operation, but it became responsive again after the merge finished.

From my point of view we have two issues here:

1) VM disk I/O hangs during the live merge. CPU usage of the VM rises to 100%.

2) The live merge itself might stall, i.e. the merge gets slower and slower.

Comment 21 Markus Stockhausen 2016-04-09 21:05:53 UTC
For case 1 (VM CPU hang) I can give additional info. The third time we reproduced the error (for whatever reason), only one CPU got stuck. Nevertheless it seems to hang for the same reason.


Stack either gives:

[ 1964.320037] BUG: soft lockup - CPU#7 stuck for 22s! [PoolThread:4049]
[ 1964.320042] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod ipv6_lib i2c_piix4 sg sr_mod virtio_net joydev virtio_console cdrom rtc_cmos virtio_scsi virtio_balloon floppy serio_raw pcspkr button container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod
[ 1964.320095] Supported: Yes
[ 1964.320097] CPU 7
[ 1964.320098] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod ipv6_lib i2c_piix4 sg sr_mod virtio_net joydev virtio_console cdrom rtc_cmos virtio_scsi virtio_balloon floppy serio_raw pcspkr button container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod
[ 1964.320119] Supported: Yes
[ 1964.320121]
[ 1964.320121] Pid: 4049, comm: PoolThread Not tainted 3.0.101-63-default #1 oVirt oVirt Node
[ 1964.320121] RIP: 0010:[<ffffffff81265b97>]  [<ffffffff81265b97>] __bitmap_empty+0x17/0x90
[ 1964.320121] RSP: 0018:ffff8817a5317b50  EFLAGS: 00000202
[ 1964.320121] RAX: 00000000000000bd RBX: ffff88179c4cacf8 RCX: ffffffff81d8ad98
[ 1964.320121] RDX: 0000000000000000 RSI: 000000000000007e RDI: ffffffff81d8ad98
[ 1964.320121] RBP: ffffffff81d8ad98 R08: 0000000000000001 R09: ffffffff81d8ada0
[ 1964.320121] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8146fdee
[ 1964.320121] R13: ffff88183f6109b0 R14: 0000000300000002 R15: ffff88303ff99e80
[ 1964.320121] FS:  00007fddd0264700(0000) GS:ffff88303ee20000(0000) knlGS:0000000000000000
[ 1964.320121] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1964.320121] CR2: 00007fdde097c000 CR3: 0000002fad376000 CR4: 00000000000006e0
[ 1964.320121] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1964.320121] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1964.320121] Process PoolThread (pid: 4049, threadinfo ffff8817a5316000, task ffff88179c4ca440)
[ 1964.320121] Stack:
[ 1964.320121]  ffffffff8103d840 ffff882fbfe2c320 ffff882fad2319c0 00007fdde097c000
[ 1964.320121]  ffffea009f60d578 ffff882fabb59870 8000000001d37225 ffff881751cf3be0
[ 1964.320121]  ffffffff8103d984 0000000000000000 8000000001d37225 0000000000000000
[ 1964.320121] Call Trace:
[ 1964.320121]  [<ffffffff8103d840>] flush_tlb_others_ipi+0x130/0x140
[ 1964.320121]  [<ffffffff8103d984>] flush_tlb_page+0x54/0x90
[ 1964.320121]  [<ffffffff8113194b>] ptep_clear_flush+0xb/0x10
[ 1964.320121]  [<ffffffff81122caa>] do_wp_page+0x31a/0x830
[ 1964.320121]  [<ffffffff811242bd>] handle_pte_fault+0x1cd/0x230
[ 1964.320121]  [<ffffffff8146afbd>] do_page_fault+0x1fd/0x4c0
[ 1964.320121]  [<ffffffff81467bf5>] async_page_fault+0x25/0x30
[ 1964.320121]  [<ffffffff81262f6d>] copy_user_generic_string+0x2d/0x40
[ 1964.320121]  [<ffffffff8117ed91>] seq_read+0x291/0x3b0
[ 1964.320121]  [<ffffffff811bbc47>] proc_reg_read+0x77/0xc0
[ 1964.320121]  [<ffffffff8115e5e7>] vfs_read+0xc7/0x130
[ 1964.320121]  [<ffffffff8115e753>] sys_read+0x53/0xa0
[ 1964.320121]  [<ffffffff8146f3f2>] system_call_fastpath+0x16/0x1b


Or gives:

[ 2300.320032] BUG: soft lockup - CPU#7 stuck for 23s! [PoolThread:4049]
[ 2300.320037] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod ipv6_lib i2c_piix4 sg sr_mod virtio_net joydev virtio_console cdrom rtc_cmos virtio_scsi virtio_balloon floppy serio_raw pcspkr button container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod
[ 2300.320082] Supported: Yes
[ 2300.320084] CPU 7
[ 2300.320085] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod ipv6_lib i2c_piix4 sg sr_mod virtio_net joydev virtio_console cdrom rtc_cmos virtio_scsi virtio_balloon floppy serio_raw pcspkr button container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod
[ 2300.320115] Supported: Yes
[ 2300.320116]
[ 2300.320118] Pid: 4049, comm: PoolThread Not tainted 3.0.101-63-default #1 oVirt oVirt Node
[ 2300.320118] RIP: 0010:[<ffffffff8103d832>]  [<ffffffff8103d832>] flush_tlb_others_ipi+0x122/0x140
[ 2300.320118] RSP: 0018:ffff8817a5317b58  EFLAGS: 00000246
[ 2300.320118] RAX: 0000000000000000 RBX: ffff88179c4cacf8 RCX: ffffffff81d8ad98
[ 2300.320118] RDX: 0000000000000000 RSI: 000000000000007e RDI: ffffffff81d8ad98
[ 2300.320118] RBP: ffffffff81d8ad98 R08: 0000000000000001 R09: ffffffff81d8ada0
[ 2300.320118] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8146fdee
[ 2300.320118] R13: ffff88183f6109b0 R14: 0000000300000002 R15: ffff88303ff99e80
[ 2300.320118] FS:  00007fddd0264700(0000) GS:ffff88303ee20000(0000) knlGS:0000000000000000
[ 2300.320118] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2300.320118] CR2: 00007fdde097c000 CR3: 0000002fad376000 CR4: 00000000000006e0
[ 2300.320118] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2300.320118] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2300.320118] Process PoolThread (pid: 4049, threadinfo ffff8817a5316000, task ffff88179c4ca440)
[ 2300.320118] Stack:
[ 2300.320118]  ffff882fbfe2c320 ffff882fad2319c0 00007fdde097c000 ffffea009f60d578
[ 2300.320118]  ffff882fabb59870 8000000001d37225 ffff881751cf3be0 ffffffff8103d984
[ 2300.320118]  0000000000000000 8000000001d37225 0000000000000000 ffffffff8113194b
[ 2300.320118] Call Trace:
[ 2300.320118]  [<ffffffff8103d984>] flush_tlb_page+0x54/0x90
[ 2300.320118]  [<ffffffff8113194b>] ptep_clear_flush+0xb/0x10
[ 2300.320118]  [<ffffffff81122caa>] do_wp_page+0x31a/0x830
[ 2300.320118]  [<ffffffff811242bd>] handle_pte_fault+0x1cd/0x230
[ 2300.320118]  [<ffffffff8146afbd>] do_page_fault+0x1fd/0x4c0
[ 2300.320118]  [<ffffffff81467bf5>] async_page_fault+0x25/0x30
[ 2300.320118]  [<ffffffff81262f6d>] copy_user_generic_string+0x2d/0x40
[ 2300.320118]  [<ffffffff8117ed91>] seq_read+0x291/0x3b0
[ 2300.320118]  [<ffffffff811bbc47>] proc_reg_read+0x77/0xc0
[ 2300.320118]  [<ffffffff8115e5e7>] vfs_read+0xc7/0x130
[ 2300.320118]  [<ffffffff8115e753>] sys_read+0x53/0xa0
[ 2300.320118]  [<ffffffff8146f3f2>] system_call_fastpath+0x16/0x1b
[ 2300.320118]  [<00007fde67829b9d>] 0x7fde67829b9c

Comment 22 Markus Stockhausen 2016-04-10 18:24:34 UTC
In the meantime we investigated further.

1) We upgraded the OS inside the VM to SLES 12 (based on kernel 3.12). 

2) As the VM has large memory (192 GB), we used oVirt NUMA pinning.

But the same problem occurred again: the VM stalls immediately after starting the live merge. The error gives:

[  304.340014] BUG: soft lockup - CPU#8 stuck for 22s! [sapstartsrv:4791]
[  304.340014] Modules linked in: binfmt_misc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache iscsi_ibft iscsi_boot_sysfs af_packet ppdev joydev xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel(X) aesni_intel aes_x86_64 virtio_net virtio_console lrw gf128mul parport_pc glue_helper ablk_helper cryptd virtio_balloon parport i2c_piix4 processor serio_raw pcspkr button dm_mod ext4 crc16 mbcache jbd2 hid_generic usbhid sr_mod cdrom ata_generic virtio_blk virtio_scsi ata_piix ahci cirrus libahci syscopyarea sysfillrect sysimgblt drm_kms_helper ttm uhci_hcd ehci_hcd usbcore drm libata virtio_pci virtio_ring virtio usb_common floppy sg scsi_mod autofs4
[  304.340014] Supported: Yes
[  304.340014] CPU: 8 PID: 4791 Comm: sapstartsrv Tainted: G              X 3.12.28-4-default #1
[  304.340014] Hardware name: oVirt oVirt Node, BIOS seabios-1.7.5-11.el7 04/01/2014
[  304.340014] task: ffff882fc18d8080 ti: ffff882fc3eee000 task.ti: ffff882fc3eee000
[  304.340014] RIP: 0010:[<ffffffff810c24e2>]  [<ffffffff810c24e2>] generic_exec_single+0x72/0x90
[  304.340014] RSP: 0018:ffff882fc3eefd08  EFLAGS: 00000202
[  304.340014] RAX: 000000000000007e RBX: 000000000000007e RCX: 0000000000000002
[  304.340014] RDX: 3fffffffffffffff RSI: 000000000000007e RDI: 0000000000000286
[  304.340014] RBP: ffff88303ee33d40 R08: ffffffff8160c3c8 R09: 0000000000000001
[  304.340014] R10: ffff882fc5185598 R11: ffffea00a72d9460 R12: 0000000000000002
[  304.340014] R13: 000000000000007e R14: ffffffff8160c3c8 R15: 0000000000000001
[  304.340014] FS:  00007f4630bb2700(0000) GS:ffff88303ee40000(0000) knlGS:0000000000000000
[  304.340014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  304.340014] CR2: 00007f46306ae034 CR3: 00000017c2ff2000 CR4: 00000000000006e0
[  304.340014] Stack:
[  304.340014]  0000000000000007 ffff882fc3eefda0 0000000000000008 ffffffff81b2ac80
[  304.340014]  ffffffff81049300 ffff8817c5788780 ffffffff810c25e5 ffffffff81297edf
[  304.340014]  ffffffff81b2ac80 ffff88303ee33d40 ffff88303ee33d40 ffffffff81049300
[  304.340014] Call Trace:
[  304.340014]  [<ffffffff810c25e5>] smp_call_function_single+0xe5/0x190
[  304.340014]  [<ffffffff8104949f>] native_flush_tlb_others+0x8f/0xa0
[  304.340014]  [<ffffffff81049658>] flush_tlb_mm_range+0x148/0x180
[  304.340014]  [<ffffffff8115af1e>] tlb_flush_mmu.part.68+0x2e/0x80
[  304.340014]  [<ffffffff8115b5a5>] tlb_finish_mmu+0x45/0x50
[  304.340014]  [<ffffffff81162da1>] unmap_region+0xb1/0xf0
[  304.340014]  [<ffffffff8116503e>] do_munmap+0x21e/0x3a0
[  304.340014]  [<ffffffff8121d721>] SyS_shmdt+0xb1/0x170
[  304.340014]  [<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[  304.340014]  [<00007f46349880c7>] 0x7f46349880c6
[  304.340014] Code: 48 89 c6 48 89 5d 08 4c 89 ef 48 89 2b 48 89 53 08 48 89 1a e8 d0 e9 44 00 4c 39 fd 74 1e 45 85 e4 75 08 eb 0c 0f 1f 40 00 f3 90 <f6> 43 20 01 75 f8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f7 ff

Comment 23 Markus Stockhausen 2016-04-12 18:35:59 UTC
Two backtraces added that give more detail about the __lll_lock_wait threads.

1)

(gdb) bt full
#0  0x00007fc5e9f27f4d in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fc5e9f23d02 in _L_lock_791 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00007fc5e9f23c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fc5eb66f4d9 in qemu_mutex_lock (mutex=mutex@entry=0x7fc5ebb286e0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
        err = <optimized out>
        __func__ = "qemu_mutex_lock"
#4  0x00007fc5eb4282bc in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.3.0/cpus.c:1120
No locals.
#5  0x00007fc5eb439285 in kvm_cpu_exec (cpu=cpu@entry=0x7fc5ed1c4000) at /usr/src/debug/qemu-2.3.0/kvm-all.c:1831
        run = 0x7fc5eb2e7000
        ret = <optimized out>
        run_ret = 0
#6  0x00007fc5eb426f0a in qemu_kvm_cpu_thread_fn (arg=0x7fc5ed1c4000) at /usr/src/debug/qemu-2.3.0/cpus.c:944
        cpu = 0x7fc5ed1c4000
        r = <optimized out>
#7  0x00007fc5e9f21dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8  0x00007fc5e390128d in clone () from /lib64/libc.so.6
No symbol table info available.

2) 

(gdb) bt full
#0  0x00007fc5e9f27f4d in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fc5e9f23d02 in _L_lock_791 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00007fc5e9f23c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00007fc5eb66f4d9 in qemu_mutex_lock (mutex=mutex@entry=0x7fc5ebb286e0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
        err = <optimized out>
        __func__ = "qemu_mutex_lock"
#4  0x00007fc5eb4282bc in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.3.0/cpus.c:1120
No locals.
#5  0x00007fc5eb439285 in kvm_cpu_exec (cpu=cpu@entry=0x7fc5ed222000) at /usr/src/debug/qemu-2.3.0/kvm-all.c:1831
        run = 0x7fc5eb2e4000
        ret = <optimized out>
        run_ret = -4
#6  0x00007fc5eb426f0a in qemu_kvm_cpu_thread_fn (arg=0x7fc5ed222000) at /usr/src/debug/qemu-2.3.0/cpus.c:944
        cpu = 0x7fc5ed222000
        r = <optimized out>
#7  0x00007fc5e9f21dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8  0x00007fc5e390128d in clone () from /lib64/libc.so.6
No symbol table info available.

Comment 24 Pavel Gashev 2016-04-13 13:09:31 UTC
Please note that the main loop tries to bdrv_drain_all() while there are other block commit jobs active.

Comment 25 Markus Stockhausen 2016-04-13 17:50:34 UTC
As advised, we deleted snapshots disk by disk. I can confirm that this mitigates the behaviour: the VM stays responsive even when doing disk I/O on the disk that we merge.

Seems to be a problem of parallelism.

I propose raising the severity of this bug to high because it makes the oVirt live merge feature nearly unusable.

Comment 29 Jeff Cody 2016-04-18 18:41:11 UTC
I was able to reproduce this bug locally on 7.2, using similar steps as comment #18.

When doing a bdrv_drain_all(), we need to pause running block jobs.  There is a fix upstream to enable this (git commits 751ebd7 and 69da3b0):


commit 751ebd76e654bd1e65da08ecf694325282b4cfcc
Author: Fam Zheng <famz>

    blockjob: Allow nested pause


commit 69da3b0b47c8f6016e9109fcfa608e9e7e99bc05
Author: Fam Zheng <famz>

    block: Pause block jobs in bdrv_drain_all


I have backports for these for 7.2.z; as these are in QEMU as of 2.4.0, there is nothing that needs to be done for RHEV 7.3.
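
For illustration, here is a simplified sketch of what those upstream commits do. This is an approximation of the approach, not the verbatim patch (the exact iteration over block devices and the locking differ between QEMU versions): every running block job is paused before the global drain and resumed once the drain has completed, so a completing job (mirror_exit -> bdrv_unref -> bdrv_close -> bdrv_drain_all, as in the backtrace above) no longer spins in aio_bh_poll while other jobs keep producing new work.

#include "block/block_int.h"   /* QEMU-internal headers */
#include "block/blockjob.h"

void bdrv_drain_all(void)
{
    BlockDriverState *bs = NULL;

    /* Pause all running block jobs before draining. */
    while ((bs = bdrv_next(bs)) != NULL) {
        AioContext *aio_context = bdrv_get_aio_context(bs);

        aio_context_acquire(aio_context);
        if (bs->job) {
            block_job_pause(bs->job);
        }
        aio_context_release(aio_context);
    }

    /* ... existing drain loop: poll each AioContext until no request
     * is in flight on any BlockDriverState ... */

    /* Resume the paused jobs once the drain has completed. */
    bs = NULL;
    while ((bs = bdrv_next(bs)) != NULL) {
        AioContext *aio_context = bdrv_get_aio_context(bs);

        aio_context_acquire(aio_context);
        if (bs->job) {
            block_job_resume(bs->job);
        }
        aio_context_release(aio_context);
    }
}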

Comment 32 Qianqian Zhu 2016-04-19 03:29:23 UTC
Verified with:


Steps:
1. launch guest:
qemu-kvm -cpu host -m 1024 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/home/RHEL-Server-7.3-64-virtio.raw,format=raw,if=none,id=drive-virtio-disk0,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :1 -monitor stdio -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x4 -qmp tcp:0:5555,server,nowait -drive file=/home/disk1,if=none,id=drive-virtio-disk1,format=qcow2,serial=531fa1b1-fbd0-42ca-9f6b-cf764c91f8a9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/home/disk2,if=none,id=drive-virtio-disk2,format=qcow2,serial=19644ad4-fe32-4b12-8bfb-d75a16fec85a,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/home/disk3,if=none,id=drive-virtio-disk3,format=qcow2,serial=f105f1cc-ab59-4c62-a465-e3ef33886c17,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk3,id=virtio-disk3 -drive file=/home/disk4,if=none,id=drive-virtio-disk4,format=qcow2,serial=f8ebfb39-2ac6-4b87-b193-4204d1854edc,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xc,drive=drive-virtio-disk4,id=virtio-disk4 -drive file=/home/disk5,if=none,id=drive-virtio-disk5,format=qcow2,serial=4e708243-97c3-4279-a32a-cb463e5e43a7,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xd,drive=drive-virtio-disk5,id=virtio-disk5 -drive file=/home/disk6,if=none,id=drive-virtio-disk6,format=qcow2,serial=7e3906ad-7c6b-471d-8589-4cc74ba81b1f,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xe,drive=drive-virtio-disk6,id=virtio-disk6 -drive file=/home/disk7,if=none,id=drive-virtio-disk7,format=qcow2,serial=4d00064a-ee20-4f3c-aa48-9cd303af7d8d,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xf,drive=drive-virtio-disk7,id=virtio-disk7

2. Generate snapshots with transaction:
{ "execute": "transaction", "arguments": { "actions": [ { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk0",
"snapshot-file": "/home/disk0-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk1", "snapshot-file": "/home/disk1-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk2", "snapshot-file": "/home/disk2-sn1", "format": "qcow2" } },{ 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk3",
"snapshot-file": "/home/disk3-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk4", "snapshot-file": "/home/disk4-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk5", "snapshot-file": "/home/disk5-sn1", "format": "qcow2" } },{ 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk6",
"snapshot-file": "/home/disk6-sn1", "format": "qcow2" } },{ 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk7",
"snapshot-file": "/home/disk7-sn1", "format": "qcow2" } }] } }

3. block stream simultaneously
{ "execute": "block-stream", "arguments": { "device": "drive-virtio-disk0", "on-error": "report" } }
{"return": {}}
{ "execute": "block-stream", "arguments": { "device": "drive-virtio-disk1", "on-error": "report" } }
{"return": {}}

Result:
qemu does not hang; info block gives correct information before the block jobs complete.

{"timestamp": {"seconds": 1461035747, "microseconds": 13224}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk1", "len": 2147483648, "offset": 2147483648, "speed": 0, "type": "stream"}}
{"timestamp": {"seconds": 1461035896, "microseconds": 512210}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "stream"}}

Executing info block before the drive-virtio-disk0 block job finished gives the result below:
(qemu) info block
drive-virtio-disk0: /home/disk0-sn1 (qcow2)
    Cache mode:       writeback
    Backing file:     /home/RHEL-Server-7.3-64-virtio.raw (chain depth: 1)

drive-virtio-disk1: /home/disk1-sn1 (qcow2)
    Cache mode:       writeback, direct
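
For reference, here is a minimal sketch (not part of the bug report; buffer size, reply handling and error handling are illustrative) of sending the same block-stream command programmatically over the TCP QMP socket opened in step 1 with "-qmp tcp:0:5555,server,nowait". QMP requires a qmp_capabilities handshake before it accepts any other command; the single-recv reply handling below is deliberately naive and only suitable for a quick manual test.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read whatever the monitor sends next (greeting banner, reply or event)
 * and print it; a real client would parse the JSON and match replies. */
static void read_reply(int fd)
{
    char buf[4096];
    ssize_t n = recv(fd, buf, sizeof(buf) - 1, 0);

    if (n > 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }
}

static void qmp_send(int fd, const char *json)
{
    send(fd, json, strlen(json), 0);
    read_reply(fd);
}

int main(void)
{
    struct sockaddr_in addr = { 0 };
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    addr.sin_family = AF_INET;
    addr.sin_port = htons(5555);                /* matches -qmp tcp:0:5555 */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    read_reply(fd);                             /* consume the QMP greeting */
    qmp_send(fd, "{\"execute\": \"qmp_capabilities\"}");
    qmp_send(fd, "{\"execute\": \"block-stream\", \"arguments\": "
                 "{\"device\": \"drive-virtio-disk0\", \"on-error\": \"report\"}}");

    close(fd);
    return 0;
}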

Comment 33 Qianqian Zhu 2016-04-19 03:31:58 UTC
(In reply to qianqianzhu from comment #32)
> Verified with:
qemu-img-rhev-2.3.0-31.el7_2.12.test.bz1319400.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.12.test.bz1319400.x86_64
kernel-3.10.0-327.el7.x86_64


Comment 38 Qianqian Zhu 2016-05-23 06:36:52 UTC
Verified with:
qemu-kvm-rhev-2.6.0-2.el7.x86_64
kernel-3.10.0-401.el7.x86_64

Steps:
1. Launch guest:
/usr/libexec/qemu-kvm -cpu host -m 1024 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/home/RHEL-Server-7.3-64-virtio.raw,format=raw,if=none,id=drive-virtio-disk0,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :1 -monitor stdio -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x4 -qmp tcp:0:5555,server,nowait -drive file=/home/disk1,if=none,id=drive-virtio-disk1,format=qcow2,serial=531fa1b1-fbd0-42ca-9f6b-cf764c91f8a9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/home/disk2,if=none,id=drive-virtio-disk2,format=qcow2,serial=19644ad4-fe32-4b12-8bfb-d75a16fec85a,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/home/disk3,if=none,id=drive-virtio-disk3,format=qcow2,serial=f105f1cc-ab59-4c62-a465-e3ef33886c17,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk3,id=virtio-disk3 -drive file=/home/disk0,if=none,id=drive-virtio-disk4,format=qcow2,serial=f8ebfb39-2ac6-4b87-b193-4204d1854edc,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xc,drive=drive-virtio-disk4,id=virtio-disk4

2. Generate snapshots with transaction:
{ "execute": "transaction", "arguments": { "actions": [ { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk0",
"snapshot-file": "/home/disk0-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk1", "snapshot-file": "/home/disk1-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk2", "snapshot-file": "/home/disk2-sn1", "format": "qcow2" } },{ 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk3",
"snapshot-file": "/home/disk3-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk4", "snapshot-file": "/home/disk4-sn1", "format": "qcow2" } }] } }

3. block stream simultaneously:
{ "execute": "block-stream", "arguments": { "device": "drive-virtio-disk0", "on-error": "report" } }
{"return": {}}
{ "execute": "block-stream", "arguments": { "device": "drive-virtio-disk1", "on-error": "report" } }
{"return": {}}

Results:
qemu does not hang; a later job can finish before a previous one, and info block gives correct information before all block jobs complete.

{"timestamp": {"seconds": 1463981812, "microseconds": 379183}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk1", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "stream"}}
{"timestamp": {"seconds": 1463981996, "microseconds": 325298}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "stream"}}

(qemu) info block
drive-virtio-disk0: /home/disk0-sn1 (qcow2)
    Cache mode:       writeback
    Backing file:     /home/RHEL-Server-7.3-64-virtio.raw (chain depth: 1)

drive-virtio-disk1: /home/disk1-sn1 (qcow2)
    Cache mode:       writeback, direct

Comment 40 Ala Hino 2016-07-17 11:40:40 UTC
*** Bug 1349735 has been marked as a duplicate of this bug. ***

Comment 41 Qianqian Zhu 2016-08-15 07:16:43 UTC
Moving to verified as per Comment 38.

Comment 43 errata-xmlrpc 2016-11-07 21:01:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

