Bug 1319400
Summary: live merge - qemu-kvm hangs in aio_bh_poll

Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Target Milestone: rc
Target Release: ---
Reporter: Markus Stockhausen <mst>
Assignee: Jeff Cody <jcody>
QA Contact: Qianqian Zhu <qizhu>
Docs Contact:
CC: acanan, ahino, alitke, amureini, bmcclain, chayang, gveitmic, huding, jherrman, juzhang, knoel, michal.skrivanek, mkalinin, mrezanin, pax, pezhang, qizhu, rmcswain, virt-maint, xfu, ykaul
Keywords: ZStream
Whiteboard:
Fixed In Version: QEMU 2.4
Doc Type: Bug Fix
Doc Text: During the block-stream job of a live merge operation, the QEMU process in some cases became unresponsive. This update improves the handling of block jobs, which ensures that QEMU stays responsive during block-stream jobs as expected.
Story Points: ---
Clone Of:
Clones: 1329543 (view as bug list)
Environment:
Last Closed: 2016-11-07 21:01:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1329543, 1346429
Attachments:
Description
Markus Stockhausen
2016-03-19 13:00:44 UTC
Start options qemu 7085 1 99 Mär18 ? 1-19:36:49 /usr/libexec/qemu-kvm -name cishanar00 -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Westmere -m size=201326592k,slots=16,maxmem=4294967296k -realtime mlock=off -smp 12,maxcpus=96,sockets=16,cores=6,threads=1 -numa node,nodeid=0,cpus=0-11,mem=196608 -uuid 542affb0-7a2f-4297-84f4-728a315a6abd -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-2.1511.el7.centos.2.10,serial=C83278D1-8EB2-11E2-B97F-D48CB5BC5356,uuid=542affb0-7a2f-4297-84f4-728a315a6abd -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-cishanar00/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2016-03-18T14:16:13,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x6 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/dcd5456b-e50a-4fe2-9cc2-614a0764dfc1/c8633a81-7d50-4837-a7b5-5a696abace13,if=none,id=drive-virtio-disk0,format=qcow2,serial=dcd5456b-e50a-4fe2-9cc2-614a0764dfc1,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/531fa1b1-fbd0-42ca-9f6b-cf764c91f8a9/af580390-a0bf-48b3-94a1-3796f5708eba,if=none,id=drive-virtio-disk1,format=qcow2,serial=531fa1b1-fbd0-42ca-9f6b-cf764c91f8a9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/19644ad4-fe32-4b12-8bfb-d75a16fec85a/f20c7009-5c57-43f7-972f-3e6ce0c14fed,if=none,id=drive-virtio-disk2,format=qcow2,serial=19644ad4-fe32-4b12-8bfb-d75a16fec85a,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/f105f1cc-ab59-4c62-a465-e3ef33886c17/86ed6e0d-a68d-4d40-b3e4-47e65f1a4b14,if=none,id=drive-virtio-disk3,format=qcow2,serial=f105f1cc-ab59-4c62-a465-e3ef33886c17,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk3,id=virtio-disk3 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/f8ebfb39-2ac6-4b87-b193-4204d1854edc/bbd05dd8-c3bf-4d15-9317-73040e04abae,if=none,id=drive-virtio-disk4,format=qcow2,serial=f8ebfb39-2ac6-4b87-b193-4204d1854edc,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xc,drive=drive-virtio-disk4,id=virtio-disk4 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/4e708243-97c3-4279-a32a-cb463e5e43a7/e24aeb86-f621-46d5-8960-9890e4c61da5,if=none,id=drive-virtio-disk5,format=qcow2,serial=4e708243-97c3-4279-a32a-cb463e5e43a7,cache=none,werror=stop,rerror=stop,aio=threads -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0xd,drive=drive-virtio-disk5,id=virtio-disk5 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/7e3906ad-7c6b-471d-8589-4cc74ba81b1f/522610e7-35cf-45fd-a0e5-b7d3d2cf8750,if=none,id=drive-virtio-disk6,format=qcow2,serial=7e3906ad-7c6b-471d-8589-4cc74ba81b1f,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xe,drive=drive-virtio-disk6,id=virtio-disk6 -drive file=/rhev/data-center/94ed7a19-fade-4bd6-83f2-2cbb2f730b95/c86d8409-4dd6-4e30-86dc-b5175a7ceb86/images/4d00064a-ee20-4f3c-aa48-9cd303af7d8d/f58ac0cf-4bf7-4778-b813-49a148492de7,if=none,id=drive-virtio-disk7,format=qcow2,serial=4d00064a-ee20-4f3c-aa48-9cd303af7d8d,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xf,drive=drive-virtio-disk7,id=virtio-disk7 -netdev tap,fd=37,id=hostnet0,vhost=on,vhostfd=38 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:ee:d3:35,bus=pci.0,addr=0x4 -netdev tap,fd=39,id=hostnet1,vhost=on,vhostfd=40 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:1a:4a:ee:d3:36,bus=pci.0,addr=0x5 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/542affb0-7a2f-4297-84f4-728a315a6abd.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/542affb0-7a2f-4297-84f4-728a315a6abd.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -device usb-tablet,id=input0 -vnc 192.168.11.42:9,share=ignore,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -msg timestamp=on top shows: Tasks: 626 total, 4 running, 622 sleeping, 0 stopped, 0 zombie %Cpu(s): 3,1 us, 3,3 sy, 0,0 ni, 93,5 id, 0,1 wa, 0,0 hi, 0,0 si, 0,0 st KiB Mem : 26384976+total, 20887500+free, 46775796 used, 8198952 buff/cache KiB Swap: 20971516 total, 20971516 free, 0 used. 21540172+avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7085 qemu 20 0 0,189t 0,040t 12160 R 100,3 16,4 2617:44 qemu-kvm 1615 root 15 -5 5868820 46256 10736 S 80,8 0,0 271:59.12 supervdsmServer ... OVirt engine log: 18.03.2016 14:53:21 VM cishanar00 is not responding. 18.03.2016 14:52:50 VM cishanar00 is not responding. 18.03.2016 14:52:20 VM cishanar00 is not responding. 18.03.2016 14:43:21 VDSM colovn03 command failed: Volume does not exist: ('af580390-a0bf-48b3-94a1-3796f5708eba',) 18.03.2016 14:36:22 VDSM colovn03 command failed: Volume does not exist: ('f58ac0cf-4bf7-4778-b813-49a148492de7',) 18.03.2016 14:33:32 Snapshot 'vor_update_20160122' deletion for VM 'cishanar00' was initiated by admin@internal. Created attachment 1138022 [details]
VM CPU usage
The VM was started at 14:15 and the live merge started at 14:33. CPU usage of the VM (qemu) on the oVirt node is attached. For some reason the stalled VM did extensive disk I/O; graphs of the qemu process are attached.
Created attachment 1138023 [details]
Disk Write
Created attachment 1138024 [details]
Disk Read
Before the live merge we had the following disk/snapshot states:

Disk1: snapshot yes
Disk2: snapshot no (af580390-a0bf-48b3-94a1-3796f5708eba)
Disk3: snapshot yes
Disk4: snapshot yes
Disk5: snapshot yes
Disk6: snapshot yes
Disk7: snapshot yes
Disk8: snapshot no (f58ac0cf-4bf7-4778-b813-49a148492de7)

So the messages about missing volumes (see above) should be OK. Currently all snapshot files (raw + qcow2) are still on disk: 8 raw files + 6 qcow2 files. Nevertheless, the oVirt web interface shows an illegal state for some of the snapshots. See screenshot.
Created attachment 1138026 [details]
Snap state
Shortened VDSM log attached.
Created attachment 1138039 [details]
VDSM
Created attachment 1138040 [details]
Engine log
VM cishanar00 is running SLES11 SP4. Log inside VM shows CPU softlockup shortly after start of live merge: Mar 18 14:34:45 cishanar00 kernel: [ 1072.248023] BUG: soft lockup - CPU#3 stuck for 22s! [PoolThread:4277] Mar 18 14:34:45 cishanar00 kernel: [ 1072.248026] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod joydev virtio_balloon virtio_net sg virtio_scsi ipv6_lib virtio_console rtc_cmos sr_mod cdrom i2c_piix4 floppy serio_raw button pcspkr container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore intel_agp intel_gtt usb_common scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Supported: Yes Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] CPU 3 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod joydev virtio_balloon virtio_net sg virtio_scsi ipv6_lib virtio_console rtc_cmos sr_mod cdrom i2c_piix4 floppy serio_raw button pcspkr container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore intel_agp intel_gtt usb_common scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Supported: Yes Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Pid: 4277, comm: PoolThread Not tainted 3.0.101-63-default #1 oVirt oVirt Node Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RIP: 0010:[<ffffffff8103d832>] [<ffffffff8103d832>] flush_tlb_others_ipi+0x122/0x140 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RSP: 0018:ffff882e17ed78e8 EFLAGS: 00000246 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RAX: 0000000000000000 RBX: ffff882e17ed4af8 RCX: ffffffff81d88a98 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RDX: 0000000000000000 RSI: 000000000000007e RDI: ffffffff81d88a98 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] RBP: ffffffff81d88a98 R08: 0000000000000001 R09: ffffffff81d88aa0 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8146fdee Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] R13: 0000000000000296 R14: 0000000000000002 R15: ffff88303ff99e00 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] FS: 00007f3e69112700(0000) GS:ffff88303e660000(0000) knlGS:0000000000000000 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] CR2: 00007f37b1b3c000 CR3: 0000002e548eb000 CR4: 00000000000006e0 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Process PoolThread (pid: 4277, threadinfo ffff882e17ed6000, task ffff882e17ed4240) Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Stack: Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] ffff882f40b6bc00 ffff882f18ee1280 00007f37b1b3c000 ffffea0095abc900 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] ffff8829bde561f8 
8000000001d37225 ffff882c684279e0 ffffffff8103d984 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] 0000000000000000 8000000001d37225 0000000000000000 ffffffff8113194b Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Call Trace: Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8103d984>] flush_tlb_page+0x54/0x90 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8113194b>] ptep_clear_flush+0xb/0x10 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff81122caa>] do_wp_page+0x31a/0x830 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff811242bd>] handle_pte_fault+0x1cd/0x230 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8146afbd>] do_page_fault+0x1fd/0x4c0 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff81467bf5>] async_page_fault+0x25/0x30 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff810fabd9>] file_read_actor+0x49/0x190 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff810fc893>] do_generic_file_read+0x363/0x490 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff810fd3ec>] generic_file_aio_read+0xfc/0x260 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffffa03bb5a9>] xfs_file_aio_read+0x1d9/0x3a0 [xfs] Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8115de37>] do_sync_read+0xd7/0x120 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8115e5e7>] vfs_read+0xc7/0x130 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8115e753>] sys_read+0x53/0xa0 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8146f3f2>] system_call_fastpath+0x16/0x1b Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<00007f3f0b626b9d>] 0x7f3f0b626b9c Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Code: 42 00 e9 55 ff ff ff 66 90 48 8b 05 a1 5c b9 00 41 8d b7 cf 00 00 00 48 89 ef ff 90 d0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 7c 96 b9 00 48 89 ef e8 40 83 22 00 85 c0 74 ec e9 66 ff Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] Call Trace: Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8103d984>] flush_tlb_page+0x54/0x90 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8113194b>] ptep_clear_flush+0xb/0x10 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff81122caa>] do_wp_page+0x31a/0x830 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff811242bd>] handle_pte_fault+0x1cd/0x230 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8146afbd>] do_page_fault+0x1fd/0x4c0 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff81467bf5>] async_page_fault+0x25/0x30 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff810fabd9>] file_read_actor+0x49/0x190 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff810fc893>] do_generic_file_read+0x363/0x490 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff810fd3ec>] generic_file_aio_read+0xfc/0x260 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffffa03bb5a9>] xfs_file_aio_read+0x1d9/0x3a0 [xfs] Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8115de37>] do_sync_read+0xd7/0x120 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8115e5e7>] vfs_read+0xc7/0x130 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8115e753>] sys_read+0x53/0xa0 Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<ffffffff8146f3f2>] system_call_fastpath+0x16/0x1b Mar 18 14:34:45 cishanar00 kernel: [ 1072.248052] [<00007f3f0b626b9d>] 0x7f3f0b626b9c <<< log ends here! 
I can hit this issue as well by using qemu-kvm-rhev-2.3.0-31.el7.x86_64&kernel-3.10.0-327.el7.x86_64. Same steps and results as per Comment 18. Hi, today we had a new maintenance window to repeat the snapshot merge. This time the process finished. Nevertheless filesystems inside the VM could not be accessed as soon as the merge started. VM was not usable during merge operation. But it became repsonsive after smerge finished. From my point of view we have two issues here: 1) VM disk I/O hangs during live merge. CPU of VM raises to 100%. 2) Live merge might stall. That means: merge gets slower and slower. For case 1 (VM CPU hang) I can give additional info. The third time (forwatever reason) we reproduced the error, only one CPU got stuck. Nevertheless it seems to hang because of the same reason. Stack either gives: [ 1964.320037] BUG: soft lockup - CPU#7 stuck for 22s! [PoolThread:4049] [ 1964.320042] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod ipv6_lib i2c_piix4 sg sr_mod virtio_net joydev virtio_console cdrom rtc_cmos virtio_scsi virtio_balloon floppy serio_raw pcspkr button container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod [ 1964.320095] Supported: Yes [ 1964.320097] CPU 7 [ 1964.320098] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod ipv6_lib i2c_piix4 sg sr_mod virtio_net joydev virtio_console cdrom rtc_cmos virtio_scsi virtio_balloon floppy serio_raw pcspkr button container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod [ 1964.320119] Supported: Yes [ 1964.320121] [ 1964.320121] Pid: 4049, comm: PoolThread Not tainted 3.0.101-63-default #1 oVirt oVirt Node [ 1964.320121] RIP: 0010:[<ffffffff81265b97>] [<ffffffff81265b97>] __bitmap_empty+0x17/0x90 [ 1964.320121] RSP: 0018:ffff8817a5317b50 EFLAGS: 00000202 [ 1964.320121] RAX: 00000000000000bd RBX: ffff88179c4cacf8 RCX: ffffffff81d8ad98 [ 1964.320121] RDX: 0000000000000000 RSI: 000000000000007e RDI: ffffffff81d8ad98 [ 1964.320121] RBP: ffffffff81d8ad98 R08: 0000000000000001 R09: ffffffff81d8ada0 [ 1964.320121] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8146fdee [ 1964.320121] R13: ffff88183f6109b0 R14: 0000000300000002 R15: ffff88303ff99e80 [ 1964.320121] FS: 00007fddd0264700(0000) GS:ffff88303ee20000(0000) knlGS:0000000000000000 [ 1964.320121] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1964.320121] CR2: 00007fdde097c000 CR3: 0000002fad376000 CR4: 00000000000006e0 [ 1964.320121] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1964.320121] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1964.320121] Process PoolThread (pid: 4049, threadinfo ffff8817a5316000, task ffff88179c4ca440) [ 1964.320121] Stack: [ 1964.320121] ffffffff8103d840 ffff882fbfe2c320 ffff882fad2319c0 00007fdde097c000 [ 1964.320121] ffffea009f60d578 ffff882fabb59870 8000000001d37225 ffff881751cf3be0 [ 1964.320121] ffffffff8103d984 0000000000000000 8000000001d37225 
0000000000000000 [ 1964.320121] Call Trace: [ 1964.320121] [<ffffffff8103d840>] flush_tlb_others_ipi+0x130/0x140 [ 1964.320121] [<ffffffff8103d984>] flush_tlb_page+0x54/0x90 [ 1964.320121] [<ffffffff8113194b>] ptep_clear_flush+0xb/0x10 [ 1964.320121] [<ffffffff81122caa>] do_wp_page+0x31a/0x830 [ 1964.320121] [<ffffffff811242bd>] handle_pte_fault+0x1cd/0x230 [ 1964.320121] [<ffffffff8146afbd>] do_page_fault+0x1fd/0x4c0 [ 1964.320121] [<ffffffff81467bf5>] async_page_fault+0x25/0x30 [ 1964.320121] [<ffffffff81262f6d>] copy_user_generic_string+0x2d/0x40 [ 1964.320121] [<ffffffff8117ed91>] seq_read+0x291/0x3b0 [ 1964.320121] [<ffffffff811bbc47>] proc_reg_read+0x77/0xc0 [ 1964.320121] [<ffffffff8115e5e7>] vfs_read+0xc7/0x130 [ 1964.320121] [<ffffffff8115e753>] sys_read+0x53/0xa0 [ 1964.320121] [<ffffffff8146f3f2>] system_call_fastpath+0x16/0x1b Or gives: [ 2300.320032] BUG: soft lockup - CPU#7 stuck for 23s! [PoolThread:4049] [ 2300.320037] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod ipv6_lib i2c_piix4 sg sr_mod virtio_net joydev virtio_console cdrom rtc_cmos virtio_scsi virtio_balloon floppy serio_raw pcspkr button container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod [ 2300.320082] Supported: Yes [ 2300.320084] CPU 7 [ 2300.320085] Modules linked in: edd nfs lockd fscache auth_rpcgss nfs_acl sunrpc mperf fuse xfs loop dm_mod ipv6_lib i2c_piix4 sg sr_mod virtio_net joydev virtio_console cdrom rtc_cmos virtio_scsi virtio_balloon floppy serio_raw pcspkr button container ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_core sysimgblt sysfillrect syscopyarea uhci_hcd ehci_hcd processor thermal_sys hwmon usbcore usb_common intel_agp intel_gtt scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix libata scsi_mod [ 2300.320115] Supported: Yes [ 2300.320116] [ 2300.320118] Pid: 4049, comm: PoolThread Not tainted 3.0.101-63-default #1 oVirt oVirt Node [ 2300.320118] RIP: 0010:[<ffffffff8103d832>] [<ffffffff8103d832>] flush_tlb_others_ipi+0x122/0x140 [ 2300.320118] RSP: 0018:ffff8817a5317b58 EFLAGS: 00000246 [ 2300.320118] RAX: 0000000000000000 RBX: ffff88179c4cacf8 RCX: ffffffff81d8ad98 [ 2300.320118] RDX: 0000000000000000 RSI: 000000000000007e RDI: ffffffff81d8ad98 [ 2300.320118] RBP: ffffffff81d8ad98 R08: 0000000000000001 R09: ffffffff81d8ada0 [ 2300.320118] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8146fdee [ 2300.320118] R13: ffff88183f6109b0 R14: 0000000300000002 R15: ffff88303ff99e80 [ 2300.320118] FS: 00007fddd0264700(0000) GS:ffff88303ee20000(0000) knlGS:0000000000000000 [ 2300.320118] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2300.320118] CR2: 00007fdde097c000 CR3: 0000002fad376000 CR4: 00000000000006e0 [ 2300.320118] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2300.320118] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 2300.320118] Process PoolThread (pid: 4049, threadinfo ffff8817a5316000, task ffff88179c4ca440) [ 2300.320118] Stack: [ 2300.320118] ffff882fbfe2c320 ffff882fad2319c0 00007fdde097c000 ffffea009f60d578 [ 2300.320118] ffff882fabb59870 8000000001d37225 ffff881751cf3be0 ffffffff8103d984 [ 2300.320118] 0000000000000000 
8000000001d37225 0000000000000000 ffffffff8113194b [ 2300.320118] Call Trace: [ 2300.320118] [<ffffffff8103d984>] flush_tlb_page+0x54/0x90 [ 2300.320118] [<ffffffff8113194b>] ptep_clear_flush+0xb/0x10 [ 2300.320118] [<ffffffff81122caa>] do_wp_page+0x31a/0x830 [ 2300.320118] [<ffffffff811242bd>] handle_pte_fault+0x1cd/0x230 [ 2300.320118] [<ffffffff8146afbd>] do_page_fault+0x1fd/0x4c0 [ 2300.320118] [<ffffffff81467bf5>] async_page_fault+0x25/0x30 [ 2300.320118] [<ffffffff81262f6d>] copy_user_generic_string+0x2d/0x40 [ 2300.320118] [<ffffffff8117ed91>] seq_read+0x291/0x3b0 [ 2300.320118] [<ffffffff811bbc47>] proc_reg_read+0x77/0xc0 [ 2300.320118] [<ffffffff8115e5e7>] vfs_read+0xc7/0x130 [ 2300.320118] [<ffffffff8115e753>] sys_read+0x53/0xa0 [ 2300.320118] [<ffffffff8146f3f2>] system_call_fastpath+0x16/0x1b [ 2300.320118] [<00007fde67829b9d>] 0x7fde67829b9c In between we investigated further. 1) We upgraded the OS inside the VM to SLES 12 (based on kernel 3.12). 2) As VM has large memory (192GB) we used OVirt NUMA pinning. But same problem again. VM stalls immediately after starting the live merge. Error gives: [ 304.340014] BUG: soft lockup - CPU#8 stuck for 22s! [sapstartsrv:4791] [ 304.340014] Modules linked in: binfmt_misc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache iscsi_ibft iscsi_boot_sysfs af_packet ppdev joydev xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel(X) aesni_intel aes_x86_64 virtio_net virtio_console lrw gf128mul parport_pc glue_helper ablk_helper cryptd virtio_balloon parport i2c_piix4 processor serio_raw pcspkr button dm_mod ext4 crc16 mbcache jbd2 hid_generic usbhid sr_mod cdrom ata_generic virtio_blk virtio_scsi ata_piix ahci cirrus libahci syscopyarea sysfillrect sysimgblt drm_kms_helper ttm uhci_hcd ehci_hcd usbcore drm libata virtio_pci virtio_ring virtio usb_common floppy sg scsi_mod autofs4 [ 304.340014] Supported: Yes [ 304.340014] CPU: 8 PID: 4791 Comm: sapstartsrv Tainted: G X 3.12.28-4-default #1 [ 304.340014] Hardware name: oVirt oVirt Node, BIOS seabios-1.7.5-11.el7 04/01/2014 [ 304.340014] task: ffff882fc18d8080 ti: ffff882fc3eee000 task.ti: ffff882fc3eee000 [ 304.340014] RIP: 0010:[<ffffffff810c24e2>] [<ffffffff810c24e2>] generic_exec_single+0x72/0x90 [ 304.340014] RSP: 0018:ffff882fc3eefd08 EFLAGS: 00000202 [ 304.340014] RAX: 000000000000007e RBX: 000000000000007e RCX: 0000000000000002 [ 304.340014] RDX: 3fffffffffffffff RSI: 000000000000007e RDI: 0000000000000286 [ 304.340014] RBP: ffff88303ee33d40 R08: ffffffff8160c3c8 R09: 0000000000000001 [ 304.340014] R10: ffff882fc5185598 R11: ffffea00a72d9460 R12: 0000000000000002 [ 304.340014] R13: 000000000000007e R14: ffffffff8160c3c8 R15: 0000000000000001 [ 304.340014] FS: 00007f4630bb2700(0000) GS:ffff88303ee40000(0000) knlGS:0000000000000000 [ 304.340014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 304.340014] CR2: 00007f46306ae034 CR3: 00000017c2ff2000 CR4: 00000000000006e0 [ 304.340014] Stack: [ 304.340014] 0000000000000007 ffff882fc3eefda0 0000000000000008 ffffffff81b2ac80 [ 304.340014] ffffffff81049300 ffff8817c5788780 ffffffff810c25e5 ffffffff81297edf [ 304.340014] ffffffff81b2ac80 ffff88303ee33d40 ffff88303ee33d40 ffffffff81049300 [ 304.340014] Call Trace: [ 304.340014] [<ffffffff810c25e5>] smp_call_function_single+0xe5/0x190 [ 304.340014] [<ffffffff8104949f>] native_flush_tlb_others+0x8f/0xa0 [ 304.340014] [<ffffffff81049658>] flush_tlb_mm_range+0x148/0x180 [ 304.340014] [<ffffffff8115af1e>] tlb_flush_mmu.part.68+0x2e/0x80 [ 304.340014] 
[<ffffffff8115b5a5>] tlb_finish_mmu+0x45/0x50 [ 304.340014] [<ffffffff81162da1>] unmap_region+0xb1/0xf0 [ 304.340014] [<ffffffff8116503e>] do_munmap+0x21e/0x3a0 [ 304.340014] [<ffffffff8121d721>] SyS_shmdt+0xb1/0x170 [ 304.340014] [<ffffffff81519329>] system_call_fastpath+0x16/0x1b [ 304.340014] [<00007f46349880c7>] 0x7f46349880c6 [ 304.340014] Code: 48 89 c6 48 89 5d 08 4c 89 ef 48 89 2b 48 89 53 08 48 89 1a e8 d0 e9 44 00 4c 39 fd 74 1e 45 85 e4 75 08 eb 0c 0f 1f 40 00 f3 90 <f6> 43 20 01 75 f8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f7 ff 2 backtraces added that give more details about the __lll_lock_wait threads. 1) (gdb) bt full #0 0x00007fc5e9f27f4d in __lll_lock_wait () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fc5e9f23d02 in _L_lock_791 () from /lib64/libpthread.so.0 No symbol table info available. #2 0x00007fc5e9f23c08 in pthread_mutex_lock () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00007fc5eb66f4d9 in qemu_mutex_lock (mutex=mutex@entry=0x7fc5ebb286e0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73 err = <optimized out> __func__ = "qemu_mutex_lock" #4 0x00007fc5eb4282bc in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.3.0/cpus.c:1120 No locals. #5 0x00007fc5eb439285 in kvm_cpu_exec (cpu=cpu@entry=0x7fc5ed1c4000) at /usr/src/debug/qemu-2.3.0/kvm-all.c:1831 run = 0x7fc5eb2e7000 ret = <optimized out> run_ret = 0 #6 0x00007fc5eb426f0a in qemu_kvm_cpu_thread_fn (arg=0x7fc5ed1c4000) at /usr/src/debug/qemu-2.3.0/cpus.c:944 cpu = 0x7fc5ed1c4000 r = <optimized out> #7 0x00007fc5e9f21dc5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #8 0x00007fc5e390128d in clone () from /lib64/libc.so.6 No symbol table info available. 2) (gdb) bt full #0 0x00007fc5e9f27f4d in __lll_lock_wait () from /lib64/libpthread.so.0 No symbol table info available. #1 0x00007fc5e9f23d02 in _L_lock_791 () from /lib64/libpthread.so.0 No symbol table info available. #2 0x00007fc5e9f23c08 in pthread_mutex_lock () from /lib64/libpthread.so.0 No symbol table info available. #3 0x00007fc5eb66f4d9 in qemu_mutex_lock (mutex=mutex@entry=0x7fc5ebb286e0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73 err = <optimized out> __func__ = "qemu_mutex_lock" #4 0x00007fc5eb4282bc in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.3.0/cpus.c:1120 No locals. #5 0x00007fc5eb439285 in kvm_cpu_exec (cpu=cpu@entry=0x7fc5ed222000) at /usr/src/debug/qemu-2.3.0/kvm-all.c:1831 run = 0x7fc5eb2e4000 ret = <optimized out> run_ret = -4 #6 0x00007fc5eb426f0a in qemu_kvm_cpu_thread_fn (arg=0x7fc5ed222000) at /usr/src/debug/qemu-2.3.0/cpus.c:944 cpu = 0x7fc5ed222000 r = <optimized out> #7 0x00007fc5e9f21dc5 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #8 0x00007fc5e390128d in clone () from /lib64/libc.so.6 No symbol table info available. Please note that the main loop tries to bdrv_drain_all() while there are other block commit jobs active. As advised we deleted snapshots disk by disk. I can confirm that this does mitigate the behaviour. The VM stays responsive even when doing disk IO onto the disk that we merge. Seems to be a problem of parallelism. I presume to raise the criticality of this bug to high because it makes OVirt live merge feature nearly unusable. I was able to reproduce this bug locally on 7.2, using similar steps as comment #18. When doing a bdrv_drain_all(), we need to pause running block jobs. 
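For illustration, here is a minimal, self-contained sketch of that idea: give each block job a nestable pause counter and have the drain path pause every job before waiting for outstanding requests, resuming them afterwards. The types, names and the drain loop below are simplified stand-ins rather than QEMU's actual internals.

/*
 * Simplified model only: a nestable pause counter per job, and a
 * drain that pauses all jobs before waiting for pending requests.
 * Real QEMU iterates BlockDriverState/AioContext structures instead.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct BlockJobModel {
    const char *device;
    int pause_count;        /* > 0 means the job must not issue new I/O */
    int pending_requests;   /* stand-in for this job's in-flight requests */
} BlockJobModel;

static void job_pause(BlockJobModel *job)
{
    job->pause_count++;
}

static void job_resume(BlockJobModel *job)
{
    assert(job->pause_count > 0);
    if (--job->pause_count == 0) {
        printf("%s: job running again\n", job->device);
    }
}

static bool job_is_paused(const BlockJobModel *job)
{
    return job->pause_count > 0;
}

/* drain_all: wait for all in-flight requests with every job paused,
 * so no job can keep generating new work while we wait. */
static void drain_all(BlockJobModel *jobs, int njobs)
{
    for (int i = 0; i < njobs; i++) {
        job_pause(&jobs[i]);
    }

    bool busy = true;
    while (busy) {                        /* the "poll until idle" loop */
        busy = false;
        for (int i = 0; i < njobs; i++) {
            if (jobs[i].pending_requests > 0) {
                jobs[i].pending_requests--;   /* one request completes */
                busy = true;
            }
        }
    }

    for (int i = 0; i < njobs; i++) {
        job_resume(&jobs[i]);
    }
}

int main(void)
{
    BlockJobModel jobs[] = {
        { "drive-virtio-disk0", 0, 3 },
        { "drive-virtio-disk1", 0, 5 },
    };
    int njobs = (int)(sizeof(jobs) / sizeof(jobs[0]));

    drain_all(jobs, njobs);   /* e.g. on behalf of a snapshot transaction */

    for (int i = 0; i < njobs; i++) {
        assert(!job_is_paused(&jobs[i]));
        assert(jobs[i].pending_requests == 0);
    }
    printf("drain finished, all jobs resumed\n");
    return 0;
}

The counter has to nest because several callers may want the same job paused at once; the job only runs again after every pause has been matched by a resume.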
There is a fix upstream to enable this (git commits 751ebd7 and 69da3b0): commit 751ebd76e654bd1e65da08ecf694325282b4cfcc Author: Fam Zheng <famz> blockjob: Allow nested pause commit 69da3b0b47c8f6016e9109fcfa608e9e7e99bc05 Author: Fam Zheng <famz> block: Pause block jobs in bdrv_drain_all I have backports for these for 7.2.z; as these are in QEMU as of 2.4.0, there is nothing that needs to be done for RHEV 7.3. Verified with: Steps: 1. launch guest: qemu-kvm -cpu host -m 1024 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/home/RHEL-Server-7.3-64-virtio.raw,format=raw,if=none,id=drive-virtio-disk0,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :1 -monitor stdio -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x4 -qmp tcp:0:5555,server,nowait -drive file=/home/disk1,if=none,id=drive-virtio-disk1,format=qcow2,serial=531fa1b1-fbd0-42ca-9f6b-cf764c91f8a9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/home/disk2,if=none,id=drive-virtio-disk2,format=qcow2,serial=19644ad4-fe32-4b12-8bfb-d75a16fec85a,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/home/disk3,if=none,id=drive-virtio-disk3,format=qcow2,serial=f105f1cc-ab59-4c62-a465-e3ef33886c17,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk3,id=virtio-disk3 -drive file=/home/disk4,if=none,id=drive-virtio-disk4,format=qcow2,serial=f8ebfb39-2ac6-4b87-b193-4204d1854edc,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xc,drive=drive-virtio-disk4,id=virtio-disk4 -drive file=/home/disk5,if=none,id=drive-virtio-disk5,format=qcow2,serial=4e708243-97c3-4279-a32a-cb463e5e43a7,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xd,drive=drive-virtio-disk5,id=virtio-disk5 -drive file=/home/disk6,if=none,id=drive-virtio-disk6,format=qcow2,serial=7e3906ad-7c6b-471d-8589-4cc74ba81b1f,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xe,drive=drive-virtio-disk6,id=virtio-disk6 -drive file=/home/disk7,if=none,id=drive-virtio-disk7,format=qcow2,serial=4d00064a-ee20-4f3c-aa48-9cd303af7d8d,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xf,drive=drive-virtio-disk7,id=virtio-disk7 2. 
Generate snapshots with transaction: { "execute": "transaction", "arguments": { "actions": [ { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk0", "snapshot-file": "/home/disk0-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk1", "snapshot-file": "/home/disk1-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk2", "snapshot-file": "/home/disk2-sn1", "format": "qcow2" } },{ 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk3", "snapshot-file": "/home/disk3-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk4", "snapshot-file": "/home/disk4-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk5", "snapshot-file": "/home/disk5-sn1", "format": "qcow2" } },{ 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk6", "snapshot-file": "/home/disk6-sn1", "format": "qcow2" } },{ 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk7", "snapshot-file": "/home/disk7-sn1", "format": "qcow2" } }] } } 3. block stream simultaneously { "execute": "block-stream", "arguments": { "device": "drive-virtio-disk0", "on-error": "report" } } {"return": {}} { "execute": "block-stream", "arguments": { "device": "drive-virtio-disk1", "on-error": "report" } } {"return": {}} Result: qemu wont hang, info block could give correct information before block job compeleted. {"timestamp": {"seconds": 1461035747, "microseconds": 13224}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk1", "len": 2147483648, "offset": 2147483648, "speed": 0, "type": "stream"}} {"timestamp": {"seconds": 1461035896, "microseconds": 512210}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "stream"}} Here execute info block before drive-virtio-disk0 block job finished, got below result: (qemu) info block drive-virtio-disk0: /home/disk0-sn1 (qcow2) Cache mode: writeback Backing file: /home/RHEL-Server-7.3-64-virtio.raw (chain depth: 1) drive-virtio-disk1: /home/disk1-sn1 (qcow2) Cache mode: writeback, direct (In reply to qianqianzhu from comment #32) > Verified with: qemu-img-rhev-2.3.0-31.el7_2.12.test.bz1319400.x86_64 qemu-kvm-rhev-2.3.0-31.el7_2.12.test.bz1319400.x86_64 kernel-3.10.0-327.el7.x86_64 > Steps: > 1. 
launch guest: > qemu-kvm -cpu host -m 1024 -device > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive > file=/home/RHEL-Server-7.3-64-virtio.raw,format=raw,if=none,id=drive-virtio- > disk0,werror=stop,rerror=stop -device > virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :1 -monitor > stdio -netdev tap,id=hostnet0,vhost=on -device > virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0, > addr=0x4 -qmp tcp:0:5555,server,nowait -drive > file=/home/disk1,if=none,id=drive-virtio-disk1,format=qcow2,serial=531fa1b1- > fbd0-42ca-9f6b-cf764c91f8a9,cache=none,werror=stop,rerror=stop,aio=threads > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1, > id=virtio-disk1 -drive > file=/home/disk2,if=none,id=drive-virtio-disk2,format=qcow2,serial=19644ad4- > fe32-4b12-8bfb-d75a16fec85a,cache=none,werror=stop,rerror=stop,aio=threads > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2, > id=virtio-disk2 -drive > file=/home/disk3,if=none,id=drive-virtio-disk3,format=qcow2,serial=f105f1cc- > ab59-4c62-a465-e3ef33886c17,cache=none,werror=stop,rerror=stop,aio=threads > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk3, > id=virtio-disk3 -drive > file=/home/disk4,if=none,id=drive-virtio-disk4,format=qcow2,serial=f8ebfb39- > 2ac6-4b87-b193-4204d1854edc,cache=none,werror=stop,rerror=stop,aio=threads > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0xc,drive=drive-virtio-disk4, > id=virtio-disk4 -drive > file=/home/disk5,if=none,id=drive-virtio-disk5,format=qcow2,serial=4e708243- > 97c3-4279-a32a-cb463e5e43a7,cache=none,werror=stop,rerror=stop,aio=threads > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0xd,drive=drive-virtio-disk5, > id=virtio-disk5 -drive > file=/home/disk6,if=none,id=drive-virtio-disk6,format=qcow2,serial=7e3906ad- > 7c6b-471d-8589-4cc74ba81b1f,cache=none,werror=stop,rerror=stop,aio=threads > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0xe,drive=drive-virtio-disk6, > id=virtio-disk6 -drive > file=/home/disk7,if=none,id=drive-virtio-disk7,format=qcow2,serial=4d00064a- > ee20-4f3c-aa48-9cd303af7d8d,cache=none,werror=stop,rerror=stop,aio=threads > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0xf,drive=drive-virtio-disk7, > id=virtio-disk7 > > 2. Generate snapshots with transaction: > { "execute": "transaction", "arguments": { "actions": [ { 'type': > 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk0", > "snapshot-file": "/home/disk0-sn1", "format": "qcow2" } }, { 'type': > 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk1", > "snapshot-file": "/home/disk1-sn1", "format": "qcow2" } }, { 'type': > 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk2", > "snapshot-file": "/home/disk2-sn1", "format": "qcow2" } },{ 'type': > 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk3", > "snapshot-file": "/home/disk3-sn1", "format": "qcow2" } }, { 'type': > 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk4", > "snapshot-file": "/home/disk4-sn1", "format": "qcow2" } }, { 'type': > 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk5", > "snapshot-file": "/home/disk5-sn1", "format": "qcow2" } },{ 'type': > 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk6", > "snapshot-file": "/home/disk6-sn1", "format": "qcow2" } },{ 'type': > 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk7", > "snapshot-file": "/home/disk7-sn1", "format": "qcow2" } }] } } > > 3. 
block stream simultaneously > { "execute": "block-stream", "arguments": { "device": "drive-virtio-disk0", > "on-error": "report" } } > {"return": {}} > { "execute": "block-stream", "arguments": { "device": "drive-virtio-disk1", > "on-error": "report" } } > {"return": {}} > > Result: > qemu wont hang, info block could give correct information before block job > compeleted. > > {"timestamp": {"seconds": 1461035747, "microseconds": 13224}, "event": > "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk1", "len": > 2147483648, "offset": 2147483648, "speed": 0, "type": "stream"}} > {"timestamp": {"seconds": 1461035896, "microseconds": 512210}, "event": > "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": > 21474836480, "offset": 21474836480, "speed": 0, "type": "stream"}} > > Here execute info block before drive-virtio-disk0 block job finished, got > below result: > (qemu) info block > drive-virtio-disk0: /home/disk0-sn1 (qcow2) > Cache mode: writeback > Backing file: /home/RHEL-Server-7.3-64-virtio.raw (chain depth: 1) > > drive-virtio-disk1: /home/disk1-sn1 (qcow2) > Cache mode: writeback, direct Verified with: qemu-kvm-rhev-2.6.0-2.el7.x86_64 kernel-3.10.0-401.el7.x86_64 Steps: 1. Launch guest: /usr/libexec/qemu-kvm -cpu host -m 1024 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/home/RHEL-Server-7.3-64-virtio.raw,format=raw,if=none,id=drive-virtio-disk0,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -vnc :1 -monitor stdio -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x4 -qmp tcp:0:5555,server,nowait -drive file=/home/disk1,if=none,id=drive-virtio-disk1,format=qcow2,serial=531fa1b1-fbd0-42ca-9f6b-cf764c91f8a9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/home/disk2,if=none,id=drive-virtio-disk2,format=qcow2,serial=19644ad4-fe32-4b12-8bfb-d75a16fec85a,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/home/disk3,if=none,id=drive-virtio-disk3,format=qcow2,serial=f105f1cc-ab59-4c62-a465-e3ef33886c17,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xb,drive=drive-virtio-disk3,id=virtio-disk3 -drive file=/home/disk0,if=none,id=drive-virtio-disk4,format=qcow2,serial=f8ebfb39-2ac6-4b87-b193-4204d1854edc,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xc,drive=drive-virtio-disk4,id=virtio-disk4 2. Generate snapshots with transaction: { "execute": "transaction", "arguments": { "actions": [ { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk0", "snapshot-file": "/home/disk0-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk1", "snapshot-file": "/home/disk1-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk2", "snapshot-file": "/home/disk2-sn1", "format": "qcow2" } },{ 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk3", "snapshot-file": "/home/disk3-sn1", "format": "qcow2" } }, { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "drive-virtio-disk4", "snapshot-file": "/home/disk4-sn1", "format": "qcow2" } }] } } 3. 
block stream simultaneously:
{ "execute": "block-stream", "arguments": { "device": "drive-virtio-disk0", "on-error": "report" } }
{"return": {}}
{ "execute": "block-stream", "arguments": { "device": "drive-virtio-disk1", "on-error": "report" } }
{"return": {}}

Results:
QEMU does not hang, a later job can finish before a previous one, and "info block" gives correct information before all block jobs complete.

{"timestamp": {"seconds": 1463981812, "microseconds": 379183}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk1", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "stream"}}
{"timestamp": {"seconds": 1463981996, "microseconds": 325298}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "stream"}}

(qemu) info block
drive-virtio-disk0: /home/disk0-sn1 (qcow2)
    Cache mode: writeback
    Backing file: /home/RHEL-Server-7.3-64-virtio.raw (chain depth: 1)

drive-virtio-disk1: /home/disk1-sn1 (qcow2)
    Cache mode: writeback, direct

*** Bug 1349735 has been marked as a duplicate of this bug. ***

Moving to verified as per Comment 38.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html