Bug 1202704 - libvirt doesn't deal with the block job info correctly when fail to do active blockcommit
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assignee: Peter Krempa
QA Contact: Virtualization Bugs
Reported: 2015-03-17 09:24 UTC by Shanzhi Yu
Modified: 2015-11-19 06:20 UTC (History)

Fixed In Version: libvirt-1.2.15-1.el7
Doc Type: Bug Fix
Last Closed: 2015-11-19 06:20:36 UTC
Target Upstream Version:


Attachments
script to reproduce this bug (1.91 KB, application/x-shellscript), 2015-03-17 09:28 UTC, Shanzhi Yu
script for testing blockcommit (2.84 KB, application/x-shellscript), 2015-05-27 08:01 UTC, Yang Yang
script for testing blockpull (1.40 KB, application/x-shellscript), 2015-05-27 08:02 UTC, Yang Yang


Links
System: Red Hat Product Errata
ID: RHBA-2015:2202
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libvirt bug fix and enhancement update
Last Updated: 2015-11-19 08:17:58 UTC

Description Shanzhi Yu 2015-03-17 09:24:26 UTC
Description of problem:

libvirt does not handle the block job info correctly when an active blockcommit fails.

How reproducible:

always

Additional info:


This bug is used to track the issues described in https://bugzilla.redhat.com/show_bug.cgi?id=1199036#c11

Comment 1 Shanzhi Yu 2015-03-17 09:28:45 UTC
Created attachment 1002725 [details]
script to reproduce this bug

Comment 2 Shanzhi Yu 2015-03-17 09:32:41 UTC
Running the script multiple times always hits the following error:

+ virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed

Then check the block job info.

# virsh blockjob testvm3 vda 
Block Commit: [100 %]

# virsh blockjob testvm3 vda --pivot 
error: Requested operation is not valid: pivot of disk 'vda' requires an active copy job

# virsh dumpxml testvm3 |grep disk -A 14
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/c/../b/../a/a'/>
      <backingStore type='network' index='1'>
        <format type='qcow2'/>
        <source protocol='gluster' name='gluster-vol1/r7-qcow2.img'>
          <host name='10.66.5.38'/>
        </source>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

# virsh snapshot-create-as testvm3 --no-metadata  --disk-only 
error: internal error: unable to execute QEMU command 'transaction': Device 'drive-virtio-disk0' is busy: block device is in use by block job: commit

# virsh blockjob testvm3 vda --abort 

# virsh snapshot-create-as testvm3 --no-metadata  --disk-only 
Domain snapshot 1426566021 created
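
The blockjob output above reports the job as "Block Commit", while the builds tested later in this report print "Active Block Commit" for the same situation. A script can tell the two apart by reading the job label. The following is a minimal Python sketch and not part of libvirt or virsh: the function names are hypothetical, and the regular expression assumes exactly the output format quoted in this bug.

```python
import re

# Matches lines such as "Active Block Commit: [100 %]" as printed in this report.
# Assumption: the job label is everything before the first colon.
_JOB_RE = re.compile(r"^(?P<label>[A-Za-z ]+): \[\s*(?P<pct>\d+)\s*%\s*\]")


def parse_blockjob(line):
    """Return (label, percent) for a job line, or None when no job is running."""
    line = line.strip()
    if line.startswith("No current block job"):
        return None
    match = _JOB_RE.match(line)
    if match is None:
        raise ValueError("unrecognised blockjob output: %r" % line)
    return match.group("label"), int(match.group("pct"))


def is_active_commit_at_100(line):
    """True when the line reports an active block commit at 100 %."""
    return parse_blockjob(line) == ("Active Block Commit", 100)
```

Reaching 100 % is only a rough hint. Whether the job is ready to pivot is stated by the ready attribute of the <mirror> element in the domain XML.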

Comment 3 Peter Krempa 2015-04-14 08:31:07 UTC
The following upstream series should fix the problem:

commit 634285f9c1c29ebad271d34463be5ca24cbf7159
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Apr 1 19:17:11 2015 +0200

    qemu: Refactor qemuDomainBlockJobAbort()
    
    Change few variable names and refactor the code flow. As an additional
    bonus the function now fails if the event state is not as expected.

commit 8a609afb6f2a0c28ee0985a48f04bd018f46bcc1
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Apr 1 19:00:20 2015 +0200

    qemu: drivePivot: Fix assumption when 'block-job-complete' fails
    
    QEMU does not abandon the mirror. The job carries on in the synchronised
    phase and it might be either pivoted again or cancelled. The commit
    hints that the described behavior was happening in a downstream version.
    
    If the command returns false there are two possible options:
    1) qemu did not reach the point where it would ask the block job to
    pivot
    2) pivotting failed in the actual qemu coroutine
    
    If either of those would happen we return failure and reset the
    condition that waits for the block job to complete. This makes the API
    fail but in case where qemu would actually abandon the mirror the fact
    is notified via the event and handled asynchronously.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202704

commit 065a81082d55f9c521b8be954d35229a633b7f92
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Apr 1 11:45:35 2015 +0200

    qemu: blockPull: Refactor the rest of qemuDomainBlockJobImpl
    
    Since it now handles only block pull code paths we can refactor it and
    remove tons of cruft.

commit cfc0a3d4ce6b1075e50c3fa3ef4f1e29445b72f3
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Apr 1 10:40:06 2015 +0200

    qemu: blockjob: Separate qemuDomainBlockJobAbort from qemuDomainBlockJobImpl
    
    Sacrifice a few lines of code in favor of the code being more readable.

commit 1344a74ef2ad683f0bac06131c8417a828e55511
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Wed Apr 1 09:47:04 2015 +0200

    qemu: blockjob: Split qemuDomainBlockJobSetSpeed from qemuDomainBlockJobImpl
    
    qemuDomainBlockJobImpl become an unmaintainable mess over the years of
    adding new stuff to it. This patch starts splitting up individual
    functions from it until it can be killed entirely.
    
    In bulk this will add lines of code rather than delete them but it will
    be traded for maintainability.

commit 7db64d6b0ae23798e9162588052818b54f62574b
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Tue Mar 31 17:13:21 2015 +0200

    qemu: monitor: Extract handling of JSON block job error codes
    
    My intention is to split qemuMonitorJSONBlockJob() into simpler separate
    functions for every block job type. Since the error handling code is the
    same for all block jobs, this patch extracts the code into a separate
    function that will later be reused in more places.
    
    With the new helper qemuMonitorJSONErrorIsClass we can save a few
    function calls as we can extract the error object once.

commit 72613b18ac91add555688acc109f4171bdef4061
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Thu Apr 9 11:26:43 2015 +0200

    qemu: monitor: json: Refactor error code class checker
    
    Split out the function that checks the actual error class string into a
    separate helper as it will be useful later and refactor
    qemuMonitorJSONHasError to return bool type and remove few useless
    checks.
    
    Basically virJSONValueObjectHasKey are useless here since the next call
    to virJSONValueObjectGet is checking the return value again (which can't
    fail at that point). By removing the first check we save a function
    call.

v1.2.14-143-g634285f
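
The behaviour change described in commit 8a609afb can be illustrated with a toy model. This is a hedged Python sketch, not libvirt's actual C code: the class, its attributes, and the keep_job_on_failure switch are all hypothetical and exist only to contrast the old assumption (a failed 'block-job-complete' means the mirror is gone) with the new one (the job stays in the synchronised phase and can be pivoted again or cancelled).

```python
class ToyMirrorJob:
    """Hypothetical stand-in for the per-disk mirror bookkeeping."""

    def __init__(self):
        self.ready = True      # job has reached the synchronised phase
        self.finished = False  # set once the pivot has actually happened

    def pivot(self, complete_cmd, keep_job_on_failure=True):
        """Try to pivot.

        complete_cmd() stands in for issuing 'block-job-complete' and
        returns True on success, False on failure.
        """
        if not self.ready:
            raise RuntimeError("pivot requires a ready job")
        if complete_cmd():
            self.ready = False
            self.finished = True
            return True
        if not keep_job_on_failure:
            # Old assumption: a failed command means the mirror was abandoned.
            self.ready = False
        # New assumption: report the failure but leave the job state alone.
        return False
```

With keep_job_on_failure=True a failed pivot leaves the job ready, so a second pivot() call can succeed. With keep_job_on_failure=False the second call raises, which mirrors the "requires an active copy job" error seen in comment 2.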

Comment 4 Yang Yang 2015-04-15 10:00:30 UTC
Peter,
Blockcommit with the --pivot option still returns an error when using libvirt built from v1.2.14-143-g634285f. It seems that the issue is NOT fixed.

libvirt Version
libvirt-1.2.15-1.el7.v1.2.14.143.g634285f.x86_64

# virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed

Fortunately, the info reported by blockjob is correct and the pivot can be done:
# virsh blockjob testvm3 vda 
Active Block Commit: [100 %]

# virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/d/../c/c'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/tmp/images/d/../c/../b/b'/>
        <backingStore type='file' index='2'>
          <format type='qcow2'/>
          <source file='/tmp/images/d/../c/../b/../a/a'/>
          <backingStore type='network' index='3'>
            <format type='qcow2'/>
            <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
              <host name='10.66.4.164' transport='rdma'/>
            </source>
            <backingStore/>
          </backingStore>
        </backingStore>
      </backingStore>
      <mirror type='file' job='active-commit' ready='yes'>
        <format type='qcow2'/>
        <source file='/tmp/images/d/../c/../b/b'/>
      </mirror>
      <target dev='vda' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

# virsh blockjob testvm3 vda --pivot

# virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/d/../c/../b/b'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/tmp/images/d/../c/../b/../a/a'/>
        <backingStore type='network' index='2'>
          <format type='qcow2'/>
          <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
            <host name='10.66.4.164' transport='rdma'/>
          </source>
          <backingStore/>
        </backingStore>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

Comment 5 Peter Krempa 2015-04-22 13:07:25 UTC
(In reply to yangyang from comment #4)
> Peter,
> Blockcommit with option pivot still returns error when using the libvirt
> built with v1.2.14-143-g634285f. It seems that the issue is NOT fixed.

This is tracked under bz#1197592
> 
> libvirt Version
> libvirt-1.2.15-1.el7.v1.2.14.143.g634285f.x86_64
> 
> #virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
> Block Commit: [100 %]error: failed to pivot job for disk vda
> error: internal error: unable to execute QEMU command 'block-job-complete':
> The active block job for device 'drive-virtio-disk0' cannot be completed
> 
> Fortunately, the info got from blockjob is right and pivot can be done
> # virsh blockjob testvm3 vda 
> Active Block Commit: [100 %]

This is actually correct. After a failed pivot, previous libvirt versions assumed that the block job had failed, so you could not pivot again.
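
Whether the job survived a failed pivot can also be read from the domain XML: the disk keeps its <mirror> element with ready='yes', as in the dump in comment 4. Below is a minimal Python sketch using only the standard library. The helper name mirror_state is hypothetical, and the XML is a shortened copy of the disk definition from comment 4.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <disk> element shown in comment 4.
DISK_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/tmp/images/d/../c/c'/>
  <mirror type='file' job='active-commit' ready='yes'>
    <format type='qcow2'/>
    <source file='/tmp/images/d/../c/../b/b'/>
  </mirror>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def mirror_state(disk_xml):
    """Return (job, ready) from the <mirror> child, or None if there is none."""
    disk = ET.fromstring(disk_xml)
    mirror = disk.find("mirror")
    if mirror is None:
        return None
    return mirror.get("job"), mirror.get("ready") == "yes"


print(mirror_state(DISK_XML))
```

A return value of None corresponds to the state after a successful pivot or abort, where the <mirror> element is no longer present.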

Comment 7 Yang Yang 2015-05-27 07:59:05 UTC
Verified on libvirt-1.2.15-2.el7.x86_64

1. Do blockcommit by running the blockcommit script several times

# virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed

# virsh blockjob testvm3 vda
Active Block Commit: [100 %]

# virsh dumpxml testvm3 | grep disk -a10
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/d/../c/../b/../a/a'/>
      <backingStore type='network' index='1'>
        <format type='qcow2'/>
        <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
          <host name='10.66.4.164' transport='rdma'/>
        </source>
        <backingStore/>
      </backingStore>
      <mirror type='network' job='active-commit' ready='yes'>
        <format type='qcow2'/>
        <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
          <host name='10.66.4.164' transport='rdma'/>
        </source>
      </mirror>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

test bandwidth
# virsh blockjob testvm3 vda 10

[root@rhel7_test yy]# virsh blockjob testvm3 vda 
Active Block Commit: [100 %]    Bandwidth limit: 10485760 bytes/s (10.000 MiB/s)

test blockjobAbort
# virsh blockjob testvm3 vda --abort --async

[root@rhel7_test yy]# virsh blockjob testvm3 vda 
No current block job for vda

# virsh -k0 blockcommit --active --verbose testvm3 vda --shallow --pivot
Block Commit: [100 %]error: failed to pivot job for disk vda
error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed

# virsh dumpxml testvm3 | grep disk -a20
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/d/../c/../b/b'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/tmp/images/d/../c/../b/../a/a'/>
        <backingStore type='network' index='2'>
          <format type='qcow2'/>
          <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
            <host name='10.66.4.164' transport='rdma'/>
          </source>
          <backingStore/>
        </backingStore>
      </backingStore>
      <mirror type='file' job='active-commit' ready='yes'>
        <format type='qcow2'/>
        <source file='/tmp/images/d/../c/../b/../a/a'/>
      </mirror>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

test blockjobPivot
# virsh blockjob testvm3 vda --pivot 

[root@rhel7_test yy]# virsh blockjob testvm3 vda 
No current block job for vda

# virsh dumpxml testvm3 | grep disk -a20
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/d/../c/../b/../a/a'/>
      <backingStore type='network' index='1'>
        <format type='qcow2'/>
        <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
          <host name='10.66.4.164' transport='rdma'/>
        </source>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

2. Do blockpull by running the blockpull script
Formatting 'a', fmt=qcow2 size=5368709120 backing_file='gluster+rdma://10.66.4.164/gluster-vol1/rhel7.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
Formatting 'b', fmt=qcow2 size=5368709120 backing_file='../a/a' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
Formatting 'c', fmt=qcow2 size=5368709120 backing_file='../b/b' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
Formatting 'd', fmt=qcow2 size=5368709120 backing_file='../c/c' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
Domain testvm3 created from /dev/stdin

  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/d/d'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/tmp/images/d/../c/c'/>
        <backingStore type='file' index='2'>
          <format type='qcow2'/>
          <source file='/tmp/images/d/../c/../b/b'/>
          <backingStore type='file' index='3'>
            <format type='qcow2'/>
            <source file='/tmp/images/d/../c/../b/../a/a'/>
            <backingStore type='network' index='4'>
              <format type='qcow2'/>
              <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
                <host name='10.66.4.164' transport='rdma'/>
              </source>
              <backingStore/>
            </backingStore>
          </backingStore>
        </backingStore>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
Block Pull: [100 %]
Pull complete
Block Pull: [100 %]
Pull complete
Block Pull: [100 %]
Pull complete

# virsh dumpxml testvm3 | grep disk -a10

<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/images/d/d'/>
      <backingStore type='network' index='1'>
        <format type='qcow2'/>
        <source protocol='gluster' name='gluster-vol1/rhel7.qcow2'>
          <host name='10.66.4.164' transport='rdma'/>
        </source>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>

Since the block job can still be aborted or pivoted after block-job-complete fails, and blockpull also works fine, marking this bug as verified.
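
The recovery path verified above (retry the pivot, otherwise abort the job) could be scripted roughly as follows. This is a sketch under stated assumptions rather than a recommended procedure: the function names and the retry count are hypothetical, and only the `virsh blockjob DOMAIN DISK --pivot` and `--abort` invocations are taken from this report. The command runner is passed in as a parameter so the logic can be exercised without a running domain.

```python
import subprocess


def run_virsh(args):
    """Run virsh with the given arguments and return its exit status."""
    return subprocess.run(["virsh"] + list(args)).returncode


def pivot_or_abort(domain, disk, attempts=3, run=run_virsh):
    """Retry the pivot a few times; abort the job if it never succeeds.

    Returns "pivoted" or "aborted". The retry count is an arbitrary choice.
    """
    for _ in range(attempts):
        if run(["blockjob", domain, disk, "--pivot"]) == 0:
            return "pivoted"
    # The pivot never succeeded, so cancel the job to free the disk again.
    if run(["blockjob", domain, disk, "--abort"]) != 0:
        raise RuntimeError("could not abort block job on %s" % disk)
    return "aborted"
```

For the domain used in this report the call would be pivot_or_abort("testvm3", "vda"). Aborting leaves the disk on its current image, after which operations such as snapshot creation are no longer blocked by the job.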

Comment 8 Yang Yang 2015-05-27 08:01:57 UTC
Created attachment 1030390 [details]
script for testing blockcommit

Comment 9 Yang Yang 2015-05-27 08:02:29 UTC
Created attachment 1030391 [details]
script for testing blockpull

Comment 11 errata-xmlrpc 2015-11-19 06:20:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html

