Bug 1197592
Summary: | blockcopy always failed when with option "--pivot" | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Shanzhi Yu <shyu> |
Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | Yehuda Zimmerman <yzimmerm> |
Priority: | high | ||
Version: | 7.1 | CC: | dyuan, eblake, gsun, jherrman, mzhan, nerijus, pkrempa, rbalakri, t.rohde, xuzhang, yanyang, yzimmerm |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-1.3.2-1.el7 | Doc Type: | Release Note |
Doc Text: | "blockcopy" with "--pivot" option no longer fails. Previously, "blockcopy" always failed when the "--pivot" option was specified. With this release, the _libvirt_ package was updated to prevent this issue. "blockcopy" can now be used with the "--pivot" option. | ||
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2016-11-03 18:13:59 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1288337, 1305606, 1313485 |
Description
Shanzhi Yu
2015-03-02 07:37:29 UTC
I'm trying to figure out what is going wrong here. In my local reproduction test case, at one point I ran out of disk space, which results in qemu aborting the job with BLOCK_JOB_ERROR, but libvirt is not looking to receive that error. Is your test setup something that could be failing due to low disk space, as that would give the same symptoms I saw, or is it something else?

Your libvirt is new enough that you could use 'virsh qemu-monitor-event $dom --pretty --loop' in a second window to see if any unexpected qemu events are being ignored by libvirt when they shouldn't be (I'm also trying that locally).

(In reply to Eric Blake from comment #2)
> I'm trying to figure out what is going wrong here. In my local reproduction
> test case, at one point I ran out of disk space, which results in qemu
> aborting the job with BLOCK_JOB_ERROR, but libvirt is not looking to receive
> that error. Is your test setup something that could be failing due to low
> disk space, as that would give the same symptoms I saw, or is it something
> else?

No, I am sure there is enough disk space, and there is nothing unusual about the setup.

(In reply to Eric Blake from comment #3)
> Your libvirt is new enough that you could use 'virsh qemu-monitor-event $dom
> --pretty --loop' in a second window to see if any unexpected qemu events are
> being ignored by libvirt when they shouldn't be (I'm also trying that
> locally).

OK then, I prepared an XML file, r7.xml, and a qcow2 image with an OS installed.
Then I ran the test below.

Terminal I:
# for i in {1..10};do virsh create r7.xml; sleep 1;virsh blockcopy r7 vda /var/lib/libvirt/images/r7.clone --pivot --verbose --wait;virsh destroy r7;rm -fr /var/lib/libvirt/images/r7.clone;sleep 1; done
Domain r7 created from r7.xml
Block Copy: [100 %]
Successfully pivoted
Domain r7 destroyed
Domain r7 created from r7.xml
Block Copy: [100 %]
Successfully pivoted
Domain r7 destroyed
Domain r7 created from r7.xml
Block Copy: [100 %]
Successfully pivoted
Domain r7 destroyed
Domain r7 created from r7.xml
Block Copy: [100 %]
Successfully pivoted
Domain r7 destroyed
Domain r7 created from r7.xml
Block Copy: [100 %]
Successfully pivoted
Domain r7 destroyed
Domain r7 created from r7.xml
Block Copy: [100 %]
Successfully pivoted
Domain r7 destroyed
Domain r7 created from r7.xml
error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed
Domain r7 destroyed
Domain r7 created from r7.xml
error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed
Domain r7 destroyed
Domain r7 created from r7.xml
error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed
Domain r7 destroyed
Domain r7 created from r7.xml
Block Copy: [100 %]
Successfully pivoted
Domain r7 destroyed

Terminal II:
# virsh qemu-monitor-event r7 --pretty --loop
event BLOCK_JOB_READY at 1429169902.165004 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event BLOCK_JOB_COMPLETED at 1429169902.405104 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event SHUTDOWN at 1429169902.412935 for domain r7: <null>
event RESUME at 1429169919.016565 for domain r7: <null>
event BLOCK_JOB_READY at 1429169923.843333 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event BLOCK_JOB_COMPLETED at 1429169924.069432 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event SHUTDOWN at 1429169924.077232 for domain r7: <null>
event RESUME at 1429169925.666298 for domain r7: <null>
event BLOCK_JOB_READY at 1429169930.603544 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event BLOCK_JOB_COMPLETED at 1429169930.725262 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event SHUTDOWN at 1429169930.734993 for domain r7: <null>
event RESUME at 1429169932.309771 for domain r7: <null>
event BLOCK_JOB_READY at 1429169937.299802 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event BLOCK_JOB_COMPLETED at 1429169937.377559 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event SHUTDOWN at 1429169937.390576 for domain r7: <null>
event RESUME at 1429169938.933651 for domain r7: <null>
event BLOCK_JOB_READY at 1429169943.904885 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event BLOCK_JOB_COMPLETED at 1429169943.991670 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event SHUTDOWN at 1429169943.999019 for domain r7: <null>
event RESUME at 1429169945.563332 for domain r7: <null>
event BLOCK_JOB_READY at 1429169950.353163 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event BLOCK_JOB_COMPLETED at 1429169950.635630 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event SHUTDOWN at 1429169950.644113 for domain r7: <null>
event RESUME at 1429169952.194842 for domain r7: <null>
event BLOCK_JOB_READY at 1429169957.123371 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event BLOCK_JOB_COMPLETED at 1429169957.251146 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event SHUTDOWN at 1429169957.258473 for domain r7: <null>
event RESUME at 1429169958.809844 for domain r7: <null>
event SHUTDOWN at 1429169959.860556 for domain r7: <null>
event RESUME at 1429169961.723367 for domain r7: <null>
event SHUTDOWN at 1429169962.775693 for domain r7: <null>
event RESUME at 1429169964.432833 for domain r7: <null>
event SHUTDOWN at 1429169965.502135 for domain r7: <null>
event RESUME at 1429169967.374067 for domain r7: <null>
event BLOCK_JOB_READY at 1429169972.260246 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event BLOCK_JOB_COMPLETED at 1429169972.433841 for domain r7: { "device": "drive-virtio-disk0", "len": 1223557120, "offset": 1223557120, "speed": 0, "type": "mirror" }
event SHUTDOWN at 1429169972.441158 for domain r7: <null>
^Cevent loop interrupted
events received: 37

Hi Eric,
I always hit the error below even though I split the blockcopy into two phases.
"error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed"
First I do the blockcopy, then I finish the block job with "blockjob --pivot", but I still hit the error.

Steps:
1. Prepare an XML file and a storage file with a RHEL 7 OS installed.
2. Run blockcopy in a loop.
# j=0;for i in {1..100};do virsh create r7.xml;sleep 20;virsh blockcopy r7 vda /var/lib/libvirt/images/r7-$i.cp --verbose --wait;virsh blockjob r7 vda --pivot; a=$?; if [[ $a -eq 0 ]]; then let j=j+1 ;fi; virsh destroy r7 ;rm -fr /var/lib/libvirt/images/r7-$i.cp;done ; echo $j;
Domain r7 created from r7.xml
Block Copy: [100 %]
Now in mirroring phase
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed
Domain r7 destroyed
Domain r7 created from r7.xml
...
...
Block Copy: [100 %]
Now in mirroring phase
Domain r7 destroyed
83

So there are 17 failed cases in this test.

But if I wait a few seconds after the mirroring phase is reached and then finish the blockcopy job with --pivot, I no longer hit the error. This is the same script as in step 2, with "sleep 5" added before "virsh blockjob r7 vda --pivot".

j=0;for i in {1..100};do virsh create r7.xml;sleep 20;virsh blockcopy r7 vda /var/lib/libvirt/images/r7-$i.cp --verbose --wait;sleep 5;virsh blockjob r7 vda --pivot; a=$?; if [[ $a -eq 0 ]]; then let j=j+1 ;fi; virsh destroy r7 ;rm -fr /var/lib/libvirt/images/r7-$i.cp;done ; echo $j;
Domain r7 created from r7.xml
Block Copy: [100 %]
Now in mirroring phase
...
...
Block Copy: [100 %]
Now in mirroring phase
Domain r7 destroyed
100

All 100 cases pass in this test.

Is this bug related to qemu bug 1130489 - qemu do not close src image immediately after mirroring done? Should this bug get a fix ASAP?

BTW, I did the above testing with upstream qemu and libvirt:
# virsh --version
1.3.0
# qemu-kvm --version
QEMU emulator version 2.3.0 (qemu-2.3.0-10.fc22), Copyright (c) 2003-2008 Fabrice Bellard

The problem lies in the fact that qemu does not switch internally to the synchronised phase as soon as the cursors hit 100% (info.cur == info.end), but it might take a while. In qemu 2.2 a new field was introduced in the query-block-jobs output that notes when the block job is ready.
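The gap between 100% progress and actual readiness can also be checked by hand, instead of relying on a fixed "sleep 5". What follows is a minimal sketch, assuming the job list is fetched with something like `virsh qemu-monitor-command r7 --pretty '{"execute":"query-block-jobs"}'` and that qemu is 2.2 or newer so the reply carries the "ready" field. The helper names `job_is_ready`, `wait_for_ready`, and `fake_probe`, the retry counts, and the sample reply are all made up for illustration; this is not the logic of the virsh fix discussed in this bug.

```shell
# Hypothetical helper: succeeds if the query-block-jobs reply read from
# stdin contains "ready": true. A naive text match that assumes one job.
job_is_ready() {
    grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical helper: call PROBE (a command that prints a
# query-block-jobs reply) up to TRIES times, DELAY seconds apart.
wait_for_ready() {
    _probe=$1 _tries=$2 _delay=$3 _i=0
    while [ "$_i" -lt "$_tries" ]; do
        if "$_probe" | job_is_ready; then
            return 0
        fi
        sleep "$_delay"
        _i=$((_i + 1))
    done
    return 1
}

# Stand-in for the real monitor query so the sketch runs without a guest:
# the made-up job sits at offset == len but only turns ready on call 3.
# The call counter lives in a file because the probe runs in a pipeline.
state=$(mktemp)
echo 0 > "$state"
fake_probe() {
    _n=$(( $(cat "$state") + 1 ))
    echo "$_n" > "$state"
    if [ "$_n" -ge 3 ]; then _ready=true; else _ready=false; fi
    printf '{"return":[{"device":"drive-virtio-disk0","len":1223557120,"offset":1223557120,"ready":%s}]}\n' "$_ready"
}

if wait_for_ready fake_probe 10 0; then
    echo "job ready after $(cat "$state") polls; pivot now"
else
    echo "job never became ready; do not pivot"
fi
rm -f "$state"
```

In a real loop the probe would wrap the monitor query and the success branch would run "virsh blockjob r7 vda --pivot". Polling with a bound keeps the wait short when the job turns ready quickly and still fails cleanly when it never does.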
Since libvirt assumed that the job had switched to the synchronised phase once it hit 100%, even before the event was received, it allowed the block-job-complete command to be sent to qemu without reporting the proper error. Additionally, virsh needs a fix so that it waits for the event before attempting to pivot.

I've posted a series that should fix this problem along with others, so I'm reassigning the bug to myself.

I ran a simple test with the patch series "[libvirt] [PATCH 00/13] Improve virsh block job handling":
# git describe
v1.2.17-130-ga864149
Following the steps in comment 5, it works fine. Thanks.

Fixed upstream:

commit faa143912381aa48e33839b194b32cc14d574589
Author: Peter Krempa <pkrempa>
Date: Mon Jul 13 17:04:49 2015 +0200

virsh: Refactor block job waiting in cmdBlockCopy

Similarly to the refactor of cmdBlockCommit in a previous commit this does the same change for cmdBlockCopy.

commit 7408403560f7d054da75acaab855a95c51a92e2b
Author: Peter Krempa <pkrempa>
Date: Mon Jul 13 17:04:49 2015 +0200

virsh: Refactor block job waiting in cmdBlockCommit

Reuse the vshBlockJobWait infrastructure to refactor cmdBlockCommit to use the common code. This additionally fixes a bug when working with new qemus, where when doing an active commit with --pivot the pivoting would fail, since qemu reaches 100% completion but the job doesn't switch to synchronized phase right away.

commit 2e7827636476fdf976f17cd234b636973dedffc0
Author: Peter Krempa <pkrempa>
Date: Mon Jul 13 17:04:49 2015 +0200

virsh: Refactor block job waiting in cmdBlockPull

Introduce helper function that will provide logic for waiting for block job completion so the 3 open coded places can be unified and improved. This patch introduces the whole logic and uses it to fix cmdBlockJobPull. The vshBlockJobWait function provides common logic for block job waiting that should be robust enough to work across all previous versions of libvirt. Since virsh allows passing user-provided strings as paths of block devices we can't reliably use block job events for detection of block job states so the function contains a great deal of fallback logic.

commit eae59247c59aa02147b2b4a50177e8e877fdb218
Author: Peter Krempa <pkrempa>
Date: Wed Jul 15 15:11:02 2015 +0200

qemu: Update state of block job to READY only if it actually is ready

Few parts of the code looked at the current progress of and assumed that a two phase blockjob is in the _READY state as soon as the progress reached 100% (info.cur == info.end). In current versions of qemu this assumption is invalid and qemu exposes a new flag 'ready' in the query-block-jobs output that is set to true if the job is actually finished. This patch adds internal data handling for reading the 'ready' flag and acting appropriately as long as the flag is present. While this still doesn't fix the virsh client problem with two phase block jobs and the --pivot option, it at least improves the error message:

$ virsh blockcommit --wait --verbose vm vda --base vda[1] --active --pivot
Block commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed

to

$ virsh blockcommit --wait --verbose VM vda --base vda[1] --active --pivot
Block commit: [100 %]error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

v1.2.17-142-gfaa1439

Peter,
Regarding the problem that virsh still reports an error when doing blockcopy (or blockcommit) with the --pivot option, can it be fixed on the libvirt side? If yes, do you need a new bug to track it?
# virsh blockcopy rhel7.0 vda /tmp/f.img --wait --verbose --pivot
Block Copy: [100 %]error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet
# virsh blockcommit rhel7.0 vda --active --wait --verbose --pivot
Block Copy: [100 %]error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

The output above looks like patch "qemu: Update state of block job to READY only if it actually is ready" was applied, but none of the others were. Since this requires a client-side fix, did you update the client machine too?

The libvirt and libvirt-client packages I used are the latest versions. I still have the problem.
# rpm -qa | grep libvirt
libvirt-1.2.17-8.el7.x86_64
libvirt-client-1.2.17-8.el7.x86_64

Peter,
According to comment #11, #12, and #13, I am reassigning it. Please feel free to correct me if I'm wrong.

blockcommit completes successfully if I change the virtual disk cache mode from "Hypervisor default" to "none". Otherwise, almost every time with a bigger disk I get:
Block commit: [100 %]error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

Even with cache mode "none", blockcommit sometimes fails when the VM has a PostgreSQL database running. Adding --quiesce to virsh snapshot-create-as and installing qemu-guest-agent helped.

Another few upstream commits finally fix the issue:

commit 86c4df83b913dd73b79caeed2038291374384dc5
Author: Michael Chapman <mike.org>
Date: Wed Jan 27 13:24:54 2016 +1100

virsh: improve waiting for block job readiness

After a block job hits 100%, we only need to apply a timeout waiting for a block job event if exactly one of the BLOCK_JOB or BLOCK_JOB_2 callbacks were able to be registered. If neither callback could be registered, there's clearly no need for a timeout. If both callbacks were registered, then we're guaranteed to eventually get one of the events.
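The three cases listed in this commit message reduce to a single condition. A minimal sketch in shell, where `needs_timeout` and its 0/1 arguments are hypothetical stand-ins for whether each of the two callbacks could be registered; it illustrates the rule as stated in the message, not the actual virsh C code:

```shell
# needs_timeout REGISTERED_1 REGISTERED_2
# Each argument is 1 if the corresponding event callback could be
# registered and 0 otherwise. Succeeds only when exactly one was.
needs_timeout() {
    [ $(( $1 + $2 )) -eq 1 ]
}

for pair in "0 0" "1 0" "0 1" "1 1"; do
    # Intentional word splitting: each pair holds the two flags.
    set -- $pair
    if needs_timeout "$1" "$2"; then
        echo "BLOCK_JOB=$1 BLOCK_JOB_2=$2 -> apply timeout"
    else
        echo "BLOCK_JOB=$1 BLOCK_JOB_2=$2 -> no timeout"
    fi
done
```

Only the two mixed rows ask for a timeout; with neither or both callbacks registered the sketch reports that none is needed, matching the reasoning in the message.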
The path being used by virsh must be exactly the source path or target device in the domain's disk definition, and these are the respective strings sent back in these two events.

Signed-off-by: Michael Chapman <mike.org>

commit 8fa216bbb40df33e7fce5d727aa3dc334480878a
Author: Michael Chapman <mike.org>
Date: Wed Jan 27 13:24:53 2016 +1100

virsh: ensure SIGINT action is reset on all errors

If virTimeMillisNow() fails, the SIGINT action must be reset back to its previous state.

Signed-off-by: Michael Chapman <mike.org>

commit 15dee2ef24f2f19f6dcd30d997b81c8a14582361
Author: Michael Chapman <mike.org>
Date: Wed Jan 27 13:24:52 2016 +1100

virsh: be consistent with style of loop exit

When waiting for a block job, the various statuses (COMPLETED, READY, CANCELED, etc.) should all be treated consistently by having the loop be exited with "break". Use "goto cleanup" for the error cases only, when no block job status is available.

Signed-off-by: Michael Chapman <mike.org>

commit 704dfd6b0fafe7eafca93a03793389239f8ab869
Author: Michael Chapman <mike.org>
Date: Wed Jan 27 13:24:51 2016 +1100

virsh: avoid unnecessary progress updates

There is no need to call virshPrintJobProgress() unless the block job's cur or end cursors have changed since the last iteration.

Signed-off-by: Michael Chapman <mike.org>

v1.3.1-87-g86c4df8

Verified with libvirt-1.3.3-2.el7.x86_64

1. test blockcopy

Prepare an XML file that uses a gluster-backed disk as the source:
<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2' io='threads' ioeventfd='on' event_idx='off'/>
  <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
    <host name='10.66.4.164'/>
  </source>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>

# virsh create vm5.xml
Domain vm5 created from vm5.xml

1.1 test blockcopy & pivot job

# virsh blockcopy vm5 vda /tmp/vm5.copy --wait --verbose --pivot
Block Copy: [100 %]
Successfully pivoted

# virsh dumpxml vm5 | grep disk -a6
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' io='threads' ioeventfd='on' event_idx='off'/>
  <source file='/tmp/vm5.copy'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>

1.2 test blockcopy & blockdev & pivot job

# virsh destroy vm5
Domain vm5 destroyed
# virsh create vm5.xml
Domain vm5 created from vm5.xml
# virsh blockcopy vm5 vda /dev/sdl --blockdev --wait --verbose --pivot
Block Copy: [100 %]
Successfully pivoted

<disk type='block' device='disk'>
  <driver name='qemu' type='qcow2' io='threads' ioeventfd='on' event_idx='off'/>
  <source dev='/dev/sdl'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <boot order='1'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>

1.3 test blockcopy & cancel job

# virsh blockcopy vm5 vda /tmp/vm5.copy --wait --verbose --pivot
Block Copy: [ 16 %]^C
Copy aborted

2. test blockcommit

Prepare a running VM with the following XML:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/RHEL-7.2-20151008.0.qcow2'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>

# virsh snapshot-create-as vm1 s1 --disk-only --quiesce
Domain snapshot s1 created
# virsh snapshot-create-as vm1 s2 --disk-only --quiesce
Domain snapshot s2 created
# virsh snapshot-create-as vm1 s3 --disk-only --quiesce
Domain snapshot s3 created

2.1 test blockcommit & pivot job

# virsh blockcommit vm1 vda --active --wait --verbose --pivot --shallow
Block commit: [100 %]
Successfully pivoted

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/RHEL-7.2-20151008.0.s2'/>
  <backingStore type='file' index='1'>
    <format type='qcow2'/>
    <source file='/var/lib/libvirt/images/RHEL-7.2-20151008.0.s1'/>
    <backingStore type='file' index='2'>
      <format type='qcow2'/>
      <source file='/var/lib/libvirt/images/RHEL-7.2-20151008.0.qcow2'/>
      <backingStore/>
    </backingStore>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>

2.2 test blockcommit & abort job

# virsh blockcommit vm1 vda --active --wait --verbose --pivot --timeout 1
Block commit: [ 88 %]
Commit aborted

blockcopy and blockcommit work well, so I am moving this bug to verified status.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html