Bug 1135169
Summary: | blockcopy job was cancel by "CTRL+C" while it show there still be one block job in background | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Shanzhi Yu <shyu> |
Component: | libvirt | Assignee: | Erik Skultety <eskultet> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 7.1 | CC: | dyuan, eblake, eskultet, jdenemar, jsc, mzhan, nerijus, rbalakri, xuzhang, yanyang, zpeng |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-1.2.8-10.el7 | Doc Type: | Bug Fix |
Doc Text: |
Cause: A blockcopy/blockcommit job is aborted, either with CTRL+C or with an abort issued through a virsh command
Consequence: The blockcopy/blockcommit job reports that it was aborted successfully, but the cleanup routine is skipped and the reference to the active block job is not destroyed, so any further block job call returns an error stating that the disk is still in an active block job
Fix: A check for another status (VIR_DOMAIN_BLOCK_JOB_CANCELED) was added, so the cleanup routine is executed in this case as well
Result: All blockjobs can be aborted successfully
|
Last Closed: | 2015-03-05 07:43:25 UTC | Type: | Bug |
Embargoed: |
Description
Shanzhi Yu
2014-08-29 02:27:10 UTC
The issue is reproduced on virsh command blockcommit.

steps:
1. Do blockcommit and then cancel it by ctrl+c, timeout or abort before the job is completed
# virsh blockcommit test1 hda --top /var/lib/libvirt/images/test1.s4 --shallow --active --wait --verbose --async
Block Commit: [ 63 %]^C    ---- press ctrl+c
Commit aborted

OR abort the job by using the virsh command blockjob
# virsh blockjob test1 hda --abort

At the same time the commit job will return with the following messages
Block Commit: [100 %]
Now in synchronized phase

OR do blockcommit with timeout

2. Check block job info
# virsh blockjob test1 hda

3. Abort the job again
# virsh blockjob test1 hda --abort
error: Requested operation is not valid: another job on disk 'hda' is still being ended

4. Check the xml
# virsh dumpxml test1 | grep disk -a6
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/lib/libvirt/images/test1.s3'/>
  <mirror type='file' job='active-commit' ready='abort'>
    <format type='qcow2'/>
    <source file='/var/lib/libvirt/images/test1.s2'/>
  </mirror>
  <target dev='hda' bus='ide'/>
  <alias name='ide0-0-0'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

This issue is reproduced even if the blockcopy job was not cancelled, but finished successfully. This is a big problem, because in order to get rid of the phantom block job, and to make the domain persistent again, it has to be restarted entirely.

Linux host 3.16-3-amd64 #1 SMP Debian 3.16.5-1 (2014-10-10) x86_64 GNU/Linux
libvirt-bin: Installed: 1.2.9-3
qemu-kvm: Installed: 2.1+dfsg-5+b1

Steps to reproduce:
1. Making guest transient
# virsh undefine guest

2. Start blockcopy
# virsh blockcopy guest hda /vm/guest-copy.qcow2 --wait --verbose --finish
Output:
Block Copy: [100 %]
Successfully copied

3. Making domain persistent again
# virsh define guest.xml
Output:
error: Failed to define domain from guest.xml
error: block copy still active: domain has active block job

4.
Checking active jobs
# virsh blockjob guest hda --info
Output:
No current block job for hda

5. Trying to abort job for guest
# virsh blockjob guest hda --abort
Output:
error: Requested operation is not valid: another job on disk 'hda' is still being ended

6. Check guest XML
virsh dumpxml guest > guest.xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/vm/guest.qcow2'/>
  <backingStore/>
  <mirror type='file' file='/vm/guest-copy.qcow2' format='qcow2' job='copy' ready='abort'>
    <format type='qcow2'/>
    <source file='/vm/guest-copy.qcow2'/>
  </mirror>
  <target dev='hda' bus='ide'/>
  <alias name='ide0-0-0'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

Additional info:
I have six guests running, pulling blockcopy backups every night. The behaviour described above only affects the same two or three guests, sometimes not even immediately (meaning the 1st or 2nd nightly blockcopy might leave the guest without an active block job, but not the one after). This seems rather erratic.

More info:
This behaviour did not occur with libvirt-bin 1.2.4-3 / qemu-kvm 2.0.0+dfsg-6 (before dist-upgrade ..)

Fixed upstream:

commit 35ce5abcdeef51fdde89983a3f1650ba6904ff34
Author: Erik Skultety <eskultet>
Date:   Thu Nov 27 10:17:44 2014 +0100

    qemu: fix block{commit,copy} abort handling

    When a block{commit,copy} job was aborted on a domain, block job handler
    did not process it correctly, leaving a phantom job in the background.
    Any further calls to any blockjob causes "block <jobtype> still active"
    error. This patch fixes the blockjob handler so that it checks not only
    for VIR_DOMAIN_BLOCK_JOB_FAILED status, but VIR_DOMAIN_BLOCK_JOB_CANCELED
    status as well, followed by our existing cleanup routine.

v1.2.10-209-g35ce5ab

Comment 3 is incorrect, the patch was not upstream yet...
But it is now upstream as v1.2.10-218-g8e23e0e:

commit 8e23e0e977fbcc4a7880e187a63c509d6e6879c6
Author: Erik Skultety <eskultet>
Date:   Thu Nov 27 13:29:42 2014 +0100

    qemu: fix block{commit,copy} abort handling

    When a block{commit,copy} job was aborted on a domain, block job handler
    did not process it correctly, leaving a phantom job in the background.
    Any further calls to any blockjob causes "block <jobtype> still active"
    error. This patch fixes the blockjob handler so that it checks not only
    for VIR_DOMAIN_BLOCK_JOB_FAILED status, but VIR_DOMAIN_BLOCK_JOB_CANCELED
    status as well, followed by our existing cleanup routine.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1135169

    Signed-off-by: Jiri Denemark <jdenemar>

I will verify this bug after testing with the blockcopy, blockcommit, and blockpull commands. All these commands can be cancelled correctly with libvirt-1.2.8-10.el7.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html

I have a similar problem with RH 7.2 beta:
# virsh snapshot-create-as --domain rasa sn1 --diskspec vda,file=/var/lib/libvirt/images/rasa-sn1.qcow2 --disk-only --atomic --no-metadata

Now copy the original, no longer updated /var/lib/libvirt/images/rasa.qcow2 to another place.
# virsh blockcommit rasa vda --active --verbose --pivot
Block commit: [100 %]error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

# virsh domblklist rasa
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/rasa-sn1.qcow2

If blockcommit had succeeded, it would be now:
vda        /var/lib/libvirt/images/rasa.qcow2

Now both files rasa.qcow2 and rasa-sn1.qcow2 are written to, and
# virsh blockjob rasa vda
Active Block Commit: [100 %]

But trying to virsh blockcommit rasa vda --active --verbose --pivot once more:
error: block copy still active: disk 'vda' already in active block job

How do I make rasa.qcow2 the only active vda? Now both the original rasa.qcow2 and snapshot rasa-sn1.qcow2 are updated. |
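
For readers following the thread, the logic change described in the upstream commit message (run the cleanup routine on VIR_DOMAIN_BLOCK_JOB_CANCELED as well as on VIR_DOMAIN_BLOCK_JOB_FAILED) can be sketched roughly as follows. This is a simplified Python model, not libvirt's actual C code: the Disk class, the handle_block_job_event function, and the fixed switch are hypothetical names invented for illustration, and the enum only mirrors the VIR_DOMAIN_BLOCK_JOB_* status names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BlockJobStatus(Enum):
    # Stand-in for libvirt's VIR_DOMAIN_BLOCK_JOB_* statuses (illustrative
    # only; this is not the libvirt API).
    COMPLETED = 0
    FAILED = 1
    CANCELED = 2
    READY = 3


@dataclass
class Disk:
    # Hypothetical, simplified disk state. `mirror` models the <mirror>
    # element that stays in the domain XML while a copy/commit job is active.
    source: str
    mirror: Optional[str] = None


def handle_block_job_event(disk: Disk, status: BlockJobStatus,
                           fixed: bool = True) -> Disk:
    """Apply a block job event to the disk state.

    With fixed=False only FAILED triggers cleanup, which models the bug:
    a cancelled job leaves the mirror reference (the phantom job) behind.
    With fixed=True, CANCELED triggers the same cleanup.
    """
    cleanup_statuses = {BlockJobStatus.FAILED}
    if fixed:
        cleanup_statuses.add(BlockJobStatus.CANCELED)

    if status in cleanup_statuses:
        # Cleanup routine: drop the reference to the aborted job's target.
        disk.mirror = None
    return disk
```

Calling handle_block_job_event(disk, BlockJobStatus.CANCELED, fixed=False) on a disk with a mirror leaves disk.mirror set, which corresponds to the leftover <mirror ... ready='abort'> element in the dumps above; with the default fixed=True the same event clears it.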