Bug 1202704
| Summary: | libvirt doesn't deal with the block job info correctly when fail to do active blockcommit | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Shanzhi Yu <shyu> |
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.1 | CC: | dyuan, eblake, mst, mzhan, pkrempa, rbalakri, xuzhang, yanyang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-1.2.15-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-11-19 06:20:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Shanzhi Yu
2015-03-17 09:24:26 UTC
Created attachment 1002725 [details]
script to reproduce this bug
Run the script multiple times; it always ends up hitting the following error:
+ virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed
Then check the block job info.
# virsh blockjob testvm3 vda
Block Commit: [100 %]
# virsh blockjob testvm3 vda --pivot
error: Requested operation is not valid: pivot of disk 'vda' requires an active copy job
# virsh dumpxml testvm3 |grep disk -A 14
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/images/c/../b/../a/a'/>
<backingStore type='network' index='1'>
<format type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/r7-qcow2.img'>
<host name='10.66.5.38'/>
</source>
<backingStore/>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
# virsh snapshot-create-as testvm3 --no-metadata --disk-only
error: internal error: unable to execute QEMU command 'transaction': Device 'drive-virtio-disk0' is busy: block device is in use by block job: commit
# virsh blockjob testvm3 vda --abort
# virsh snapshot-create-as testvm3 --no-metadata --disk-only
Domain snapshot 1426566021 created
The following upstream series should fix the problem:
commit 634285f9c1c29ebad271d34463be5ca24cbf7159
Author: Peter Krempa <pkrempa>
Date: Wed Apr 1 19:17:11 2015 +0200
qemu: Refactor qemuDomainBlockJobAbort()
Change few variable names and refactor the code flow. As an additional
bonus the function now fails if the event state is not as expected.
commit 8a609afb6f2a0c28ee0985a48f04bd018f46bcc1
Author: Peter Krempa <pkrempa>
Date: Wed Apr 1 19:00:20 2015 +0200
qemu: drivePivot: Fix assumption when 'block-job-complete' fails
QEMU does not abandon the mirror. The job carries on in the synchronised
phase and it might be either pivoted again or cancelled. The commit
hints that the described behavior was happening in a downstream version.
If the command returns false there are two possible options:
1) qemu did not reach the point where it would ask the block job to
pivot
2) pivotting failed in the actual qemu coroutine
If either of those would happen we return failure and reset the
condition that waits for the block job to complete. This makes the API
fail but in case where qemu would actually abandon the mirror the fact
is notified via the event and handled asynchronously.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202704
commit 065a81082d55f9c521b8be954d35229a633b7f92
Author: Peter Krempa <pkrempa>
Date: Wed Apr 1 11:45:35 2015 +0200
qemu: blockPull: Refactor the rest of qemuDomainBlockJobImpl
Since it now handles only block pull code paths we can refactor it and
remove tons of cruft.
commit cfc0a3d4ce6b1075e50c3fa3ef4f1e29445b72f3
Author: Peter Krempa <pkrempa>
Date: Wed Apr 1 10:40:06 2015 +0200
qemu: blockjob: Separate qemuDomainBlockJobAbort from qemuDomainBlockJobImpl
Sacrifice a few lines of code in favor of the code being more readable.
commit 1344a74ef2ad683f0bac06131c8417a828e55511
Author: Peter Krempa <pkrempa>
Date: Wed Apr 1 09:47:04 2015 +0200
qemu: blockjob: Split qemuDomainBlockJobSetSpeed from qemuDomainBlockJobImpl
qemuDomainBlockJobImpl become an unmaintainable mess over the years of
adding new stuff to it. This patch starts splitting up individual
functions from it until it can be killed entirely.
In bulk this will add lines of code rather than delete them but it will
be traded for maintainability.
commit 7db64d6b0ae23798e9162588052818b54f62574b
Author: Peter Krempa <pkrempa>
Date: Tue Mar 31 17:13:21 2015 +0200
qemu: monitor: Extract handling of JSON block job error codes
My intention is to split qemuMonitorJSONBlockJob() into simpler separate
functions for every block job type. Since the error handling code is the
same for all block jobs, this patch extracts the code into a separate
function that will later be reused in more places.
With the new helper qemuMonitorJSONErrorIsClass we can save a few
function calls as we can extract the error object once.
commit 72613b18ac91add555688acc109f4171bdef4061
Author: Peter Krempa <pkrempa>
Date: Thu Apr 9 11:26:43 2015 +0200
qemu: monitor: json: Refactor error code class checker
Split out the function that checks the actual error class string into a
separate helper as it will be useful later and refactor
qemuMonitorJSONHasError to return bool type and remove few useless
checks.
Basically virJSONValueObjectHasKey are useless here since the next call
to virJSONValueObjectGet is checking the return value again (which can't
fail at that point). By removing the first check we save a function
call.
v1.2.14-143-g634285f
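The drivePivot commit message above says that after a failed 'block-job-complete' the job stays in the synchronised phase and can be pivoted again or cancelled. A minimal sketch of a wrapper built on that behaviour follows. The names `pivot_or_abort` and `VIRSH`, the retry count and the delay are hypothetical and chosen for illustration; only the `virsh blockjob DOMAIN DISK --pivot` and `--abort` invocations are taken from this report.

```shell
#!/bin/sh
# Hypothetical helper, not part of libvirt or virsh.
# VIRSH can be overridden so the logic can be exercised with a stub.
VIRSH=${VIRSH:-virsh}

# pivot_or_abort DOMAIN DISK [TRIES] [DELAY_SECONDS]
# Retries the pivot a few times. If every attempt fails, the job is
# aborted so the disk is no longer held busy by the commit job.
# Returns 0 on pivot, 1 if the job was aborted, 2 if the abort failed too.
pivot_or_abort() {
    domain=$1
    disk=$2
    tries=${3:-3}
    delay=${4:-1}

    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if $VIRSH blockjob "$domain" "$disk" --pivot; then
            echo "pivoted after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done

    # All pivot attempts failed: cancel the job instead.
    $VIRSH blockjob "$domain" "$disk" --abort || return 2
    echo "pivot failed, job aborted"
    return 1
}
```

Called as `pivot_or_abort testvm3 vda`, this would try the pivot up to three times before aborting. Whether aborting is the right fallback depends on the caller; aborting an active commit leaves the original top image in place.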
Peter,
Blockcommit with option pivot still returns error when using the libvirt built with v1.2.14-143-g634285f. It seems that the issue is NOT fixed.
libvirt Version
libvirt-1.2.15-1.el7.v1.2.14.143.g634285f.x86_64
#virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed
Fortunately, the info got from blockjob is right and pivot can be done
# virsh blockjob testvm3 vda
Active Block Commit: [100 %]
#virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/images/d/../c/c'/>
<backingStore type='file' index='1'>
<format type='qcow2'/>
<source file='/tmp/images/d/../c/../b/b'/>
<backingStore type='file' index='2'>
<format type='qcow2'/>
<source file='/tmp/images/d/../c/../b/../a/a'/>
<backingStore type='network' index='3'>
<format type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
<host name='10.66.4.164' transport='rdma'/>
</source>
<backingStore/>
</backingStore>
</backingStore>
</backingStore>
<mirror type='file' job='active-commit' ready='yes'>
<format type='qcow2'/>
<source file='/tmp/images/d/../c/../b/b'/>
</mirror>
<target dev='vda' bus='virtio'/>
<boot order='1'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
# virsh blockjob testvm3 vda --pivot
# virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/images/d/../c/../b/b'/>
<backingStore type='file' index='1'>
<format type='qcow2'/>
<source file='/tmp/images/d/../c/../b/../a/a'/>
<backingStore type='network' index='2'>
<format type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
<host name='10.66.4.164' transport='rdma'/>
</source>
<backingStore/>
</backingStore>
</backingStore>
<target dev='vda' bus='virtio'/>
<boot order='1'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
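The dumpxml output above shows a `<mirror ... job='active-commit' ready='yes'>` element while the job is waiting to be pivoted, and no `<mirror>` element once the pivot has succeeded. A rough sketch for checking this from a script follows. `mirror_ready` and `mirror_job` are hypothetical helper names, and the grep/sed patterns assume the single-quoted attribute formatting seen in this report; a real XML parser would be more robust.

```shell
#!/bin/sh
# Hypothetical helpers; both read domain XML on stdin.

# Exit status 0 if any <mirror> element carries ready='yes'.
mirror_ready() {
    grep -q "<mirror [^>]*ready='yes'"
}

# Print the job attribute of the first <mirror> element, if any.
mirror_job() {
    sed -n "s/.*<mirror [^>]*job='\([^']*\)'.*/\1/p" | head -n 1
}
```

For example, `virsh dumpxml testvm3 | mirror_ready && virsh blockjob testvm3 vda --pivot` would only attempt the pivot once the mirror is reported as ready. Note that these helpers look at the whole domain XML, so on a guest with several disks they do not tell which disk the mirror belongs to.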
(In reply to yangyang from comment #4)
> Peter,
> Blockcommit with option pivot still returns error when using the libvirt
> built with v1.2.14-143-g634285f. It seems that the issue is NOT fixed.

This is tracked under bz#1197592

>
> libvirt Version
> libvirt-1.2.15-1.el7.v1.2.14.143.g634285f.x86_64
>
> #virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
> Block Commit: [100 %]error: failed to pivot job for disk vda
> error: internal error: unable to execute QEMU command 'block-job-complete':
> The active block job for device 'drive-virtio-disk0' cannot be completed
>
> Fortunately, the info got from blockjob is right and pivot can be done
> # virsh blockjob testvm3 vda
> Active Block Commit: [100 %]

This is actually correct. After the failed pivot, previous libvirt version would think that the block job has failed and you would not be able to pivot again.

Verified on libvirt-1.2.15-2.el7.x86_64
1. Do blockcommit by running the blockcommit script several times
# virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed
# virsh blockjob testvm3 vda
Active Block Commit: [100 %]
# virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/images/d/../c/../b/../a/a'/>
<backingStore type='network' index='1'>
<format type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
<host name='10.66.4.164' transport='rdma'/>
</source>
<backingStore/>
</backingStore>
<mirror type='network' job='active-commit' ready='yes'>
<format type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
<host name='10.66.4.164' transport='rdma'/>
</source>
</mirror>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
test bandwidth
# virsh blockjob testvm3 vda 10
[root@rhel7_test yy]# virsh blockjob testvm3 vda
Active Block Commit: [100 %] Bandwidth limit: 10485760 bytes/s (10.000 MiB/s)
test blockjobAbort
# virsh blockjob testvm3 vda --abort --async
[root@rhel7_test yy]# virsh blockjob testvm3 vda
No current block job for vda
# virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed
# virsh dumpxml testvm3 | grep disk -a20
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/images/d/../c/../b/b'/>
<backingStore type='file' index='1'>
<format type='qcow2'/>
<source file='/tmp/images/d/../c/../b/../a/a'/>
<backingStore type='network' index='2'>
<format type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
<host name='10.66.4.164' transport='rdma'/>
</source>
<backingStore/>
</backingStore>
</backingStore>
<mirror type='file' job='active-commit' ready='yes'>
<format type='qcow2'/>
<source file='/tmp/images/d/../c/../b/../a/a'/>
</mirror>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
test blockjobPivot
# virsh blockjob testvm3 vda --pivot
[root@rhel7_test yy]# virsh blockjob testvm3 vda
No current block job for vda
# virsh dumpxml testvm3 | grep disk -a20
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/images/d/../c/../b/../a/a'/>
<backingStore type='network' index='1'>
<format type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
<host name='10.66.4.164' transport='rdma'/>
</source>
<backingStore/>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
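In the bandwidth test above, `virsh blockjob testvm3 vda 10` led to a reported limit of 10485760 bytes/s (10.000 MiB/s), which matches the argument being taken as MiB/s. The conversion can be checked with a one-line helper; `mib_to_bytes` is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a bandwidth in MiB/s to bytes/s.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 10   # prints 10485760
```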
2. Do blockpull by running the blockpull script
Formatting 'a', fmt=qcow2 size=5368709120 backing_file='gluster+rdma://10.66.4.164/gluster-vol1/rhel7.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
Formatting 'b', fmt=qcow2 size=5368709120 backing_file='../a/a' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
Formatting 'c', fmt=qcow2 size=5368709120 backing_file='../b/b' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
Formatting 'd', fmt=qcow2 size=5368709120 backing_file='../c/c' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
Domain testvm3 created from /dev/stdin
<memory unit='KiB'>1048576</memory>
<currentMemory unit='KiB'>1048576</currentMemory>
<vcpu placement='static'>1</vcpu>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
<boot dev='hd'/>
</os>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/images/d/d'/>
<backingStore type='file' index='1'>
<format type='qcow2'/>
<source file='/tmp/images/d/../c/c'/>
<backingStore type='file' index='2'>
<format type='qcow2'/>
<source file='/tmp/images/d/../c/../b/b'/>
<backingStore type='file' index='3'>
<format type='qcow2'/>
<source file='/tmp/images/d/../c/../b/../a/a'/>
<backingStore type='network' index='4'>
<format type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
<host name='10.66.4.164' transport='rdma'/>
</source>
<backingStore/>
</backingStore>
</backingStore>
</backingStore>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
<controller type='usb' index='0'>
<alias name='usb0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
</controller>
<controller type='pci' index='0' model='pci-root'>
<alias name='pci.0'/>
</controller>
<input type='mouse' bus='ps2'/>
<input type='keyboard' bus='ps2'/>
<graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
<listen type='address' address='127.0.0.1'/>
</graphics>
<video>
<model type='cirrus' vram='16384' heads='1'/>
<alias name='video0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
Block Pull: [100 %]
Pull complete
Block Pull: [100 %]
Pull complete
Block Pull: [100 %]
Pull complete
# virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/tmp/images/d/d'/>
<backingStore type='network' index='1'>
<format type='qcow2'/>
<source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
<host name='10.66.4.164' transport='rdma'/>
</source>
<backingStore/>
</backingStore>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
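One way to confirm that the pulls collapsed the chain is to count the non-empty `<backingStore>` elements before and after: four in the XML printed when the domain was created, one in the final dumpxml above. A small sketch follows. `backing_depth` is a hypothetical helper name, and the pattern assumes the attribute layout shown in this report and a guest with a single disk.

```shell
#!/bin/sh
# Hypothetical helper: count backing layers in domain XML read on stdin.
# The terminating <backingStore/> element has no type attribute and is
# therefore not counted.
backing_depth() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so the exit status is neutralised here.
    grep -c "<backingStore type=" || true
}
```

Used as `virsh dumpxml testvm3 | backing_depth`, this would be expected to report 4 before the pulls and 1 afterwards for the chain in this test.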
Since the block job can still be aborted or pivoted after block-job-complete fails, and blockpull also works fine, I am marking this bug as verified.
Created attachment 1030390 [details]
script for testing blockcommit
Created attachment 1030391 [details]
script for testing blockpull
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-2202.html