Bug 1135169 - blockcopy job was cancelled by "CTRL+C" but a block job is still shown as active in the background
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Erik Skultety
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2014-08-28 22:27 EDT by Shanzhi Yu
Modified: 2015-10-21 09:48 EDT
CC List: 11 users

See Also:
Fixed In Version: libvirt-1.2.8-10.el7
Doc Type: Bug Fix
Doc Text:
Cause: A blockcopy/blockcommit job is aborted, either with CTRL+C or with an abort via a virsh command.
Consequence: The blockcopy/blockcommit job reports that it was aborted successfully, but the cleanup routine is skipped and the reference to the active block job is not destroyed, so any further block job call returns an error stating that the disk is still in an active block job.
Fix: A check for another flag (VIR_DOMAIN_BLOCK_JOB_CANCELED) was added, so the cleanup routine is executed in this case as well.
Result: All block jobs can be aborted successfully.
Last Closed: 2015-03-05 02:43:25 EST
Type: Bug


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2015:0323
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: libvirt security, bug fix, and enhancement update
Last Updated: 2015-03-05 07:10:54 EST

Description Shanzhi Yu 2014-08-28 22:27:10 EDT
Description of problem:

The blockcopy job was cancelled with "CTRL+C", but libvirt still reports an active block job in the background.

Version-Release number of selected component (if applicable):

libvirt-1.2.7-2.el7.x86_64
qemu-kvm-rhev-2.1.0-2.el7.x86_64


How reproducible:

100%

Steps to Reproduce:

1. Prepare a transient guest
# virsh list --transient
 Id    Name                           State
----------------------------------------------------
 4     rhel6                          running


2. Run a blockcopy job with --wait and --verbose, then cancel it before the copy finishes, using "CTRL+C", the "timeout" option, or "blockjob --abort"

# virsh blockcopy rhel6 vda /var/lib/libvirt/images/copy.img --verbose  --wait
Block Copy: [ 32 %]^C
Copy aborted

3. Check the block job info

# virsh blockjob rhel6 vda


4. Run the block copy again

# virsh blockcopy rhel6 vda /var/lib/libvirt/images/copy.img --verbose  --wait
error: block copy still active: disk 'vda' already in active block job

# virsh dumpxml rhel6 |grep mirror -A 3
      <mirror type='file' file='/var/lib/libvirt/images/copy.img' format='qcow2' job='copy' ready='abort'>
        <format type='qcow2'/>
        <source file='/var/lib/libvirt/images/copy.img'/>
      </mirror>
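
The leftover <mirror> element with ready='abort' is the visible trace of the phantom job in the domain XML. The following is a minimal sketch, not part of libvirt, of how such leftovers could be detected in XML saved from "virsh dumpxml". The function name stale_mirrors, the sample document, and the placeholder source path /path/to/original.img are assumptions made for illustration; only the <mirror> element is taken from the output above.

```python
import xml.etree.ElementTree as ET

# Sample domain XML. The <mirror> element is copied from the dumpxml output
# above; the surrounding structure and the disk source path are hypothetical.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/path/to/original.img'/>
      <mirror type='file' file='/var/lib/libvirt/images/copy.img' format='qcow2' job='copy' ready='abort'>
        <format type='qcow2'/>
        <source file='/var/lib/libvirt/images/copy.img'/>
      </mirror>
      <target dev='vda'/>
    </disk>
  </devices>
</domain>
"""


def stale_mirrors(domain_xml):
    """Return (target_dev, job) for every disk whose <mirror> has ready='abort'."""
    root = ET.fromstring(domain_xml.strip())
    found = []
    for disk in root.iter("disk"):
        mirror = disk.find("mirror")
        target = disk.find("target")
        if mirror is not None and mirror.get("ready") == "abort":
            dev = target.get("dev") if target is not None else None
            found.append((dev, mirror.get("job")))
    return found


if __name__ == "__main__":
    print(stale_mirrors(SAMPLE_XML))
```

For a running guest, the same function could be given the text produced by "virsh dumpxml rhel6" instead of the sample string.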


Actual results:


Expected results:

The blockcopy job should be cancellable in its first phase (copying data from the source).

Additional info:
Comment 1 yangyang 2014-08-29 04:49:43 EDT
The issue also reproduces with the virsh blockcommit command.
Steps:
1. Run blockcommit, then cancel it with Ctrl+C, a timeout, or an abort before the job completes

# virsh blockcommit test1 hda --top /var/lib/libvirt/images/test1.s4 --shallow --active --wait --verbose --async
Block Commit: [ 63 %]^C      ---- press ctrl+c  
Commit aborted

OR abort the job with the virsh blockjob command:
 # virsh blockjob test1 hda --abort
 At the same time, the commit job returns with the following messages:
 Block Commit: [100 %]
 Now in synchronized phase

OR run blockcommit with a timeout

2. Check block job info

# virsh blockjob test1 hda

3. Abort the job again
# virsh blockjob test1 hda --abort
error: Requested operation is not valid: another job on disk 'hda' is still being ended

4. Check the XML
# virsh dumpxml test1 | grep disk -a6
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/test1.s3'/>
      <mirror type='file' job='active-commit' ready='abort'>
        <format type='qcow2'/>
        <source file='/var/lib/libvirt/images/test1.s2'/>
      </mirror>
      <target dev='hda' bus='ide'/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
Comment 2 Jan Schumacher 2014-10-30 13:25:13 EDT
The issue also reproduces when the blockcopy job is not cancelled but finishes successfully. This is a serious problem: the only way to get rid of the phantom block job and make the domain persistent again is to restart the domain entirely.


Linux host 3.16-3-amd64 #1 SMP Debian 3.16.5-1 (2014-10-10) x86_64 GNU/Linux

libvirt-bin:
  Installed: 1.2.9-3

qemu-kvm:
  Installed: 2.1+dfsg-5+b1


Steps to reproduce:

1. Making guest transient
# virsh undefine guest


2. Start blockcopy
# virsh blockcopy guest hda /vm/guest-copy.qcow2 --wait --verbose --finish

Output:

Block Copy: [100 %]
Successfully copied


3. Making domain persistent again
# virsh define guest.xml

Output:

error: Failed to define domain from guest.xml
error: block copy still active: domain has active block job


4. Checking active jobs
# virsh blockjob guest hda --info

Output:

No current block job for hda


5. Trying to abort job for guest
# virsh blockjob guest hda --abort

Output:

error: Requested operation is not valid: another job on disk 'hda' is still being ended


6. Check guest XML
# virsh dumpxml guest > guest.xml

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/vm/guest.qcow2'/>
      <backingStore/>
      <mirror type='file' file='/vm/guest-copy.qcow2' format='qcow2' job='copy' ready='abort'>
        <format type='qcow2'/>
        <source file='/vm/guest-copy.qcow2'/>
      </mirror>
      <target dev='hda' bus='ide'/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>




Additional info: I have six guests running, pulling blockcopy backups every night. The behaviour described above only affects the same two or three guests, sometimes not even immediately (meaning the 1st or 2nd nightly blockcopy might leave the guest without an active block job, but not the one after). This seems rather erratic.

More info: This behaviour did not occur with libvirt-bin 1.2.4-3 / qemu-kvm 2.0.0+dfsg-6 (before dist-upgrade ..)
Comment 3 Erik Skultety 2014-11-27 07:50:27 EST
Fixed upstream:

commit 35ce5abcdeef51fdde89983a3f1650ba6904ff34
Author: Erik Skultety <eskultet@redhat.com>
Date:   Thu Nov 27 10:17:44 2014 +0100

    qemu: fix block{commit,copy} abort handling
    
    When a block{commit,copy} job was aborted on a domain, block job handler
    did not process it correctly, leaving a phantom job in the background.
    Any further calls to any blockjob causes "block <jobtype> still active"
    error. This patch fixes the blockjob handler so that it checks not only
    for VIR_DOMAIN_BLOCK_JOB_FAILED status, but VIR_DOMAIN_BLOCK_JOB_CANCELED
    status as well, followed by our existing cleanup routine.

v1.2.10-209-g35ce5ab
Comment 4 Jiri Denemark 2014-12-01 04:13:37 EST
Comment 3 is incorrect: the patch was not upstream yet. It is now upstream as v1.2.10-218-g8e23e0e:

commit 8e23e0e977fbcc4a7880e187a63c509d6e6879c6
Author: Erik Skultety <eskultet@redhat.com>
Date:   Thu Nov 27 13:29:42 2014 +0100

    qemu: fix block{commit,copy} abort handling
    
    When a block{commit,copy} job was aborted on a domain, block job handler
    did not process it correctly, leaving a phantom job in the background.
    Any further calls to any blockjob causes "block <jobtype> still active"
    error. This patch fixes the blockjob handler so that it checks not only
    for VIR_DOMAIN_BLOCK_JOB_FAILED status, but VIR_DOMAIN_BLOCK_JOB_CANCELED
    status as well, followed by our existing cleanup routine.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1135169
    
    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
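
The logic described in the commit message can be sketched as a toy model. This is not the libvirt code (which is written in C); the Disk class, the handle_block_job_event function, and the patched switch are hypothetical names chosen for illustration. The status names follow libvirt's VIR_DOMAIN_BLOCK_JOB_* constants, but the numeric values below are arbitrary.

```python
from enum import Enum, auto


class BlockJobStatus(Enum):
    # Names follow libvirt's VIR_DOMAIN_BLOCK_JOB_* constants; the numeric
    # values here are arbitrary and are not libvirt's.
    COMPLETED = auto()
    FAILED = auto()
    CANCELED = auto()
    READY = auto()


class Disk:
    """Hypothetical stand-in for the per-disk state that libvirt tracks."""

    def __init__(self, name, mirror):
        self.name = name
        self.mirror = mirror      # destination of the in-flight copy/commit
        self.job_active = True    # what later blockjob calls would check


def handle_block_job_event(disk, status, patched=True):
    """Run the cleanup routine for the statuses the handler checks."""
    cleanup_statuses = {BlockJobStatus.FAILED}
    if patched:
        # The fix described above: treat CANCELED like FAILED.
        cleanup_statuses.add(BlockJobStatus.CANCELED)
    if status in cleanup_statuses:
        disk.mirror = None
        disk.job_active = False


if __name__ == "__main__":
    for patched in (False, True):
        disk = Disk("vda", "/var/lib/libvirt/images/copy.img")
        handle_block_job_event(disk, BlockJobStatus.CANCELED, patched=patched)
        print(patched, disk.job_active, disk.mirror)
```

With patched=False the cancelled job leaves job_active set and the mirror in place, which corresponds to the "already in active block job" error seen in the reports; with patched=True both are cleared.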
Comment 7 Shanzhi Yu 2014-12-04 04:08:11 EST
Verified after testing the blockcopy, blockcommit, and blockpull commands with libvirt-1.2.8-10.el7.x86_64.
All three commands can be cancelled correctly.
Comment 9 errata-xmlrpc 2015-03-05 02:43:25 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html
Comment 10 Nerijus Baliūnas 2015-10-21 09:48:29 EDT
I have a similar problem with RH 7.2 beta:

# virsh snapshot-create-as --domain rasa sn1 --diskspec vda,file=/var/lib/libvirt/images/rasa-sn1.qcow2 --disk-only --atomic --no-metadata

Now copy the original /var/lib/libvirt/images/rasa.qcow2, which is no longer being written to, to another location.

# virsh blockcommit rasa vda --active --verbose --pivot
Block commit: [100 %]error: failed to pivot job for disk vda
error: block copy still active: disk 'vda' not ready for pivot yet

# virsh domblklist rasa
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/rasa-sn1.qcow2

If blockcommit had succeeded, it would now be:
vda        /var/lib/libvirt/images/rasa.qcow2

Now both files rasa.qcow2 and rasa-sn1.qcow2 are written to, and
# virsh blockjob rasa vda
Active Block Commit: [100 %]

But running virsh blockcommit rasa vda --active --verbose --pivot once more fails:
error: block copy still active: disk 'vda' already in active block job

How do I make rasa.qcow2 the only active vda?
Now both the original rasa.qcow2 and snapshot rasa-sn1.qcow2 are updated.
