Bug 1202704
Summary: libvirt doesn't deal with the block job info correctly when fail to do active blockcommit
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.1
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Shanzhi Yu <shyu>
Assignee: Peter Krempa <pkrempa>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dyuan, eblake, mst, mzhan, pkrempa, rbalakri, xuzhang, yanyang
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-1.2.15-1.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-11-19 06:20:36 UTC

Description
Shanzhi Yu
2015-03-17 09:24:26 UTC
Created attachment 1002725 [details]
script to reproduce this bug
Run the script multiple times; it always hits the following error:

+ virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed

Then check the block job info.

# virsh blockjob testvm3 vda
Block Commit: [100 %]

# virsh blockjob testvm3 vda --pivot
error: Requested operation is not valid: pivot of disk 'vda' requires an active copy job

# virsh dumpxml testvm3 |grep disk -A 14
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/c/../b/../a/a'/>
  <backingStore type='network' index='1'>
    <format type='qcow2'/>
    <source protocol='gluster' name='gluster-vol1/r7-qcow2.img'>
      <host name='10.66.5.38'/>
    </source>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>

# virsh snapshot-create-as testvm3 --no-metadata --disk-only
error: internal error: unable to execute QEMU command 'transaction': Device 'drive-virtio-disk0' is busy: block device is in use by block job: commit

# virsh blockjob testvm3 vda --abort

# virsh snapshot-create-as testvm3 --no-metadata --disk-only
Domain snapshot 1426566021 created

The following upstream series should fix the problem:

commit 634285f9c1c29ebad271d34463be5ca24cbf7159
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 1 19:17:11 2015 +0200

    qemu: Refactor qemuDomainBlockJobAbort()

    Change few variable names and refactor the code flow. As an additional
    bonus the function now fails if the event state is not as expected.

commit 8a609afb6f2a0c28ee0985a48f04bd018f46bcc1
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 1 19:00:20 2015 +0200

    qemu: drivePivot: Fix assumption when 'block-job-complete' fails

    QEMU does not abandon the mirror. The job carries on in the synchronised
    phase and it might be either pivoted again or cancelled. The commit hints
    that the described behavior was happening in a downstream version.

    If the command returns false there are two possible options:
    1) qemu did not reach the point where it would ask the block job to pivot
    2) pivotting failed in the actual qemu coroutine

    If either of those would happen we return failure and reset the condition
    that waits for the block job to complete. This makes the API fail but in
    case where qemu would actually abandon the mirror the fact is notified via
    the event and handled asynchronously.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202704

commit 065a81082d55f9c521b8be954d35229a633b7f92
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 1 11:45:35 2015 +0200

    qemu: blockPull: Refactor the rest of qemuDomainBlockJobImpl

    Since it now handles only block pull code paths we can refactor it and
    remove tons of cruft.

commit cfc0a3d4ce6b1075e50c3fa3ef4f1e29445b72f3
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 1 10:40:06 2015 +0200

    qemu: blockjob: Separate qemuDomainBlockJobAbort from qemuDomainBlockJobImpl

    Sacrifice a few lines of code in favor of the code being more readable.

commit 1344a74ef2ad683f0bac06131c8417a828e55511
Author: Peter Krempa <pkrempa>
Date:   Wed Apr 1 09:47:04 2015 +0200

    qemu: blockjob: Split qemuDomainBlockJobSetSpeed from qemuDomainBlockJobImpl

    qemuDomainBlockJobImpl become an unmaintainable mess over the years of
    adding new stuff to it. This patch starts splitting up individual
    functions from it until it can be killed entirely.

    In bulk this will add lines of code rather than delete them but it will be
    traded for maintainability.

commit 7db64d6b0ae23798e9162588052818b54f62574b
Author: Peter Krempa <pkrempa>
Date:   Tue Mar 31 17:13:21 2015 +0200

    qemu: monitor: Extract handling of JSON block job error codes

    My intention is to split qemuMonitorJSONBlockJob() into simpler separate
    functions for every block job type. Since the error handling code is the
    same for all block jobs, this patch extracts the code into a separate
    function that will later be reused in more places.

    With the new helper qemuMonitorJSONErrorIsClass we can save a few function
    calls as we can extract the error object once.

commit 72613b18ac91add555688acc109f4171bdef4061
Author: Peter Krempa <pkrempa>
Date:   Thu Apr 9 11:26:43 2015 +0200

    qemu: monitor: json: Refactor error code class checker

    Split out the function that checks the actual error class string into a
    separate helper as it will be useful later and refactor
    qemuMonitorJSONHasError to return bool type and remove few useless checks.

    Basically virJSONValueObjectHasKey are useless here since the next call to
    virJSONValueObjectGet is checking the return value again (which can't fail
    at that point). By removing the first check we save a function call.

v1.2.14-143-g634285f

Peter,
Blockcommit with option pivot still returns error when using the libvirt built with v1.2.14-143-g634285f. It seems that the issue is NOT fixed.

libvirt Version
libvirt-1.2.15-1.el7.v1.2.14.143.g634285f.x86_64

#virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed

Fortunately, the info got from blockjob is right and pivot can be done
# virsh blockjob testvm3 vda
Active Block Commit: [100 %]

#virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/d/../c/c'/>
  <backingStore type='file' index='1'>
    <format type='qcow2'/>
    <source file='/tmp/images/d/../c/../b/b'/>
    <backingStore type='file' index='2'>
      <format type='qcow2'/>
      <source file='/tmp/images/d/../c/../b/../a/a'/>
      <backingStore type='network' index='3'>
        <format type='qcow2'/>
        <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
          <host name='10.66.4.164' transport='rdma'/>
        </source>
        <backingStore/>
      </backingStore>
    </backingStore>
  </backingStore>
  <mirror type='file' job='active-commit' ready='yes'>
    <format type='qcow2'/>
    <source file='/tmp/images/d/../c/../b/b'/>
  </mirror>
  <target dev='vda' bus='virtio'/>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>

# virsh blockjob testvm3 vda --pivot

# virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/d/../c/../b/b'/>
  <backingStore type='file' index='1'>
    <format type='qcow2'/>
    <source file='/tmp/images/d/../c/../b/../a/a'/>
    <backingStore type='network' index='2'>
      <format type='qcow2'/>
      <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
        <host name='10.66.4.164' transport='rdma'/>
      </source>
      <backingStore/>
    </backingStore>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>

(In reply to yangyang from comment #4)
> Peter,
> Blockcommit with option pivot still returns error when using the libvirt
> built with v1.2.14-143-g634285f. It seems that the issue is NOT fixed.

This is tracked under bz#1197592

>
> libvirt Version
> libvirt-1.2.15-1.el7.v1.2.14.143.g634285f.x86_64
>
> #virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
> Block Commit: [100 %]error: failed to pivot job for disk vda
> error: internal error: unable to execute QEMU command 'block-job-complete':
> The active block job for device 'drive-virtio-disk0' cannot be completed
>
> Fortunately, the info got from blockjob is right and pivot can be done
> # virsh blockjob testvm3 vda
> Active Block Commit: [100 %]

This is actually correct. After the failed pivot, previous libvirt version would think that the block job has failed and you would not be able to pivot again.

Verified on libvirt-1.2.15-2.el7.x86_64

1. do blockcommit by running blockcommit script for several times

# virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed

# virsh blockjob testvm3 vda
Active Block Commit: [100 %]

# virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/d/../c/../b/../a/a'/>
  <backingStore type='network' index='1'>
    <format type='qcow2'/>
    <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
      <host name='10.66.4.164' transport='rdma'/>
    </source>
    <backingStore/>
  </backingStore>
  <mirror type='network' job='active-commit' ready='yes'>
    <format type='qcow2'/>
    <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
      <host name='10.66.4.164' transport='rdma'/>
    </source>
  </mirror>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>

test bandwidth
# virsh blockjob testvm3 vda 10

[root@rhel7_test yy]# virsh blockjob testvm3 vda
Active Block Commit: [100 %] Bandwidth limit: 10485760 bytes/s (10.000 MiB/s)

test blockjobAbort
# virsh blockjob testvm3 vda --abort --async

[root@rhel7_test yy]# virsh blockjob testvm3 vda
No current block job for vda

# virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed

# virsh dumpxml testvm3 | grep disk -a20
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/d/../c/../b/b'/>
  <backingStore type='file' index='1'>
    <format type='qcow2'/>
    <source file='/tmp/images/d/../c/../b/../a/a'/>
    <backingStore type='network' index='2'>
      <format type='qcow2'/>
      <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
        <host name='10.66.4.164' transport='rdma'/>
      </source>
      <backingStore/>
    </backingStore>
  </backingStore>
  <mirror type='file' job='active-commit' ready='yes'>
    <format type='qcow2'/>
    <source file='/tmp/images/d/../c/../b/../a/a'/>
  </mirror>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>

test blockjobPivot
# virsh blockjob testvm3 vda --pivot

[root@rhel7_test yy]# virsh blockjob testvm3 vda
No current block job for vda

# virsh dumpxml testvm3 | grep disk -a20
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/d/../c/../b/../a/a'/>
  <backingStore type='network' index='1'>
    <format type='qcow2'/>
    <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
      <host name='10.66.4.164' transport='rdma'/>
    </source>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>

2. do blockpull by running blockpull script

Formatting 'a', fmt=qcow2 size=5368709120 backing_file='gluster+rdma://10.66.4.164/gluster-vol1/rhel7.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
Formatting 'b', fmt=qcow2 size=5368709120 backing_file='../a/a' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
Formatting 'c', fmt=qcow2 size=5368709120 backing_file='../b/b' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
Formatting 'd', fmt=qcow2 size=5368709120 backing_file='../c/c' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
Domain testvm3 created from /dev/stdin

  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/d/d'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/tmp/images/d/../c/c'/>
        <backingStore type='file' index='2'>
          <format type='qcow2'/>
          <source file='/tmp/images/d/../c/../b/b'/>
          <backingStore type='file' index='3'>
            <format type='qcow2'/>
            <source file='/tmp/images/d/../c/../b/../a/a'/>
            <backingStore type='network' index='4'>
              <format type='qcow2'/>
              <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
                <host name='10.66.4.164' transport='rdma'/>
              </source>
              <backingStore/>
            </backingStore>
          </backingStore>
        </backingStore>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>

Block Pull: [100 %]
Pull complete
Block Pull: [100 %]
Pull complete
Block Pull: [100 %]
Pull complete

# virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/d/d'/>
  <backingStore type='network' index='1'>
    <format type='qcow2'/>
    <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
      <host name='10.66.4.164' transport='rdma'/>
    </source>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>

As blockjob can be aborted and pivoted when block-job-complete fails and blockpull also works fine, mark it as verified.

Created attachment 1030390 [details]
script for testing blockcommit
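
The blockcommit verification in this report checks by eye that the disk still carries a <mirror job='active-commit' ready='yes'> element after 'block-job-complete' fails. A minimal Python sketch of the same check follows. It is not the attached script: the helper name mirror_state is hypothetical, and the XML is a trimmed copy of the virsh dumpxml output quoted in this report.

```python
import xml.etree.ElementTree as ET

# Disk element trimmed from the `virsh dumpxml testvm3` output in this report.
DISK_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/d/../c/../b/b'/>
  <mirror type='file' job='active-commit' ready='yes'>
    <format type='qcow2'/>
    <source file='/tmp/images/d/../c/../b/../a/a'/>
  </mirror>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def mirror_state(disk_xml, dev):
    """Return (job, ready) for the <mirror> of disk `dev`, or None.

    None means either the <disk> is not the requested device or it has
    no <mirror> child, i.e. libvirt does not track a commit/copy job.
    """
    disk = ET.fromstring(disk_xml)
    target = disk.find("target")
    if target is None or target.get("dev") != dev:
        return None
    mirror = disk.find("mirror")
    if mirror is None:
        return None
    return mirror.get("job"), mirror.get("ready") == "yes"


if __name__ == "__main__":
    print(mirror_state(DISK_XML, "vda"))
```

With the fixed libvirt, a result of ('active-commit', True) after a failed pivot means the job is still there and 'virsh blockjob testvm3 vda --pivot' or '--abort' can be tried again.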
Created attachment 1030391 [details]
script for testing blockpull
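
The blockpull verification compares the nested <backingStore> elements before and after the pull: four layers before, only the gluster base image after. The sketch below flattens that nesting into a list so the comparison can be done mechanically. It is only an illustration and not the attached script: backing_chain is a hypothetical helper, and the sample XML is trimmed from the final dumpxml output in this report.

```python
import xml.etree.ElementTree as ET

# Disk element trimmed from the dumpxml output taken after the block pull.
AFTER_PULL = """
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/d/d'/>
  <backingStore type='network' index='1'>
    <format type='qcow2'/>
    <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
      <host name='10.66.4.164' transport='rdma'/>
    </source>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def backing_chain(disk_xml):
    """List the backing images of a <disk>, nearest layer first.

    File-backed layers are reported by their 'file' attribute, network
    layers by their 'name' attribute. An empty <backingStore/> (no
    <source> child) marks the end of the chain.
    """
    disk = ET.fromstring(disk_xml)
    chain = []
    node = disk.find("backingStore")
    while node is not None:
        source = node.find("source")
        if source is None:
            break
        chain.append(source.get("file") or source.get("name"))
        node = node.find("backingStore")
    return chain


if __name__ == "__main__":
    print(backing_chain(AFTER_PULL))
```

A chain of length one after the pull corresponds to the result shown in the verification, where /tmp/images/d/d sits directly on the gluster image.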
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html