Bug 1236496

Summary: Libvirt doesn't keep disk mirror state synchronization with qemu
Product: Red Hat Enterprise Linux 7
Reporter: Shanzhi Yu <shyu>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 7.2
CC: dyuan, mzhan, pkrempa, rbalakri, xuzhang, yanyang
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-1.2.17-1.el7
Doc Type: Bug Fix
Last Closed: 2015-11-19 06:42:02 UTC
Type: Bug

Description Shanzhi Yu 2015-06-29 08:51:03 UTC
Description of problem:

Libvirt doesn't keep the disk mirror state (disk->mirrorState) in sync with qemu.

Version-Release number of selected component (if applicable):

libvirt-1.2.16-1.el7.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Trigger a failed blockcopy job, as below:

# virsh blockcopy r7 vda /var/lib/libvirt/images/r7.s1 --granularity 4096 --verbose --wait
Block Copy: [ 0 %]
Now in mirroring phase

# virsh blockjob r7 vda
No current block job for vda

# virsh dumpxml r7|grep mirror


2. Query block job status

# virsh qemu-monitor-command r7 '{"execute":"query-block-jobs"}'
{"return":[],"id":"libvirt-31"}


3. Try to abort the block job with "--abort"

# virsh blockjob r7 vda --abort
error: Requested operation is not valid: another job on disk 'vda' is still being ended


Actual results:

In step 2, qemu reports no block job, yet in step 3 libvirt still claims that a block job on the disk is being ended when "--abort" is used.
It seems disk->mirrorState is not synchronized with qemu before the abort is attempted.

Expected results:


Additional info:

I used to be able to reproduce this on upstream, but now I can't. The check that produces the error in step 3 is:

$ sed -n 16360,16368p src/qemu/qemu_driver.c 

    if (!(device = qemuDiskPathToAlias(vm, path, &idx)))
        goto endjob;
    disk = vm->def->disks[idx];

    if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_NONE &&
        disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_READY) {
        virReportError(VIR_ERR_OPERATION_INVALID,
                       _("another job on disk '%s' is still being ended"),

Comment 2 Peter Krempa 2015-06-29 12:03:10 UTC
This is already fixed upstream by:

commit b247d47f397f02dde622a78715c06658bb587003
Author: Jiri Denemark <jdenemar>
Date:   Tue May 19 08:44:16 2015 +0200

    qemu: Don't mess with disk->mirrorState
    
    This patch reverts commit 76c61cdca20c106960af033e5d0f5da70177af0f.
    
    VIR_DOMAIN_DISK_MIRROR_STATE_ABORT says we asked for a block job to be
    aborted rather than saying it was aborted. Let's just use
    VIR_DOMAIN_DISK_MIRROR_STATE_NONE consistently whenever a block job
    finishes since no caller depends on VIR_DOMAIN_DISK_MIRROR_STATE_ABORT
    (anymore) to check whether a block job failed or it was cancelled.
    
    Signed-off-by: Jiri Denemark <jdenemar>

v1.2.16-247-gb247d47

Comment 4 Yang Yang 2015-07-06 03:51:10 UTC
I can reproduce it with libvirt-1.2.16-1.el7.x86_64. 
Verified with libvirt-1.2.17-1.el7.x86_64

Steps
1. Prepare a transient domain and start a block copy
# virsh blockcopy simple vda /tmp/simple.copy --granularity 4096 --wait --verbose
Block Copy: [  0 %]error: Block Copy unexpectedly failed

2. Query block job status
# virsh blockjob simple vda
No current block job for vda
# virsh qemu-monitor-command simple '{"execute":"query-block-jobs"}'
{"return":[],"id":"libvirt-101"}

3. Abort the block job
# virsh blockjob simple vda --abort
error: Requested operation is not valid: No active operation on device: drive-virtio-disk0

Comment 6 errata-xmlrpc 2015-11-19 06:42:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html