Bug 1235004

Summary: blockcommit on gluster can't be restarted after the previous job fails due to network connectivity loss
Product: Red Hat Enterprise Linux 7
Reporter: Peter Krempa <pkrempa>
Component: qemu-kvm-rhev
Assignee: Jeff Cody <jcody>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.2
CC: dyuan, huding, juzhang, knoel, mzhan, pzhang, rbalakri, virt-bugs, virt-maint, xfu, xuzhang
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1227551
Environment:
Last Closed: 2017-01-31 02:00:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1227551

Description Peter Krempa 2015-06-23 17:49:24 UTC
I've added comments in square brackets to explain what's happening on the libvirt<->qemu interface.

+++ This bug was initially created as a clone of Bug #1227551 +++

Description of problem:
If the network connection is broken during blockcommit, libvirt reports wrong information about the result of the blockcommit, and a subsequent blockcommit then fails.

Version-Release number of selected component (if applicable):
libvirt-1.2.15-2.el7.x86_64
qemu-kvm-rhev-2.3.0-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a healthy guest with its base image on gluster.
# virsh dumpxml gluster | grep disk -A 9
    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source protocol='gluster' name='gluster-vol1/r7q2.img'>
        <host name='$server_IP'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

[the initial disk is based on gluster]

2. Create snapshots for this guest.
# for i in {1..3}; do virsh snapshot-create-as gluster s$i --disk-only --diskspec vda,file=/tmp/s$i ; done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created

[three snapshots on the *local filesystem* are created]


# virsh snapshot-list gluster
 Name                 Creation Time             State
------------------------------------------------------------
 s1                   2015-06-01 16:42:46 +0800 disk-snapshot
 s2                   2015-06-01 16:43:59 +0800 disk-snapshot
 s3                   2015-06-01 16:44:12 +0800 disk-snapshot
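
To confirm that the active layer of vda now points at the last local snapshot rather than the gluster image, the disk source can be read back from libvirt. A minimal sketch, assuming the domain name "gluster" and the target "vda" from the steps above; the helper name active_image is hypothetical and not part of virsh.

```shell
# Hypothetical helper: print the current source of a disk target
# as listed by "virsh domblklist".
active_image() {
    # $1: domain name, $2: disk target (e.g. vda)
    virsh domblklist "$1" | awk -v t="$2" '$1 == t { print $2 }'
}

# Usage: active_image gluster vda
```

After the loop in step 2 this should print /tmp/s3, since each snapshot-create-as call makes the newly created file the active layer.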

3. Do blockcommit.
In terminal 1 (start the blockcommit):
# virsh blockcommit gluster vda --active --verbose --wait
Block Commit: [30 %]

[this is active layer block commit]

In terminal 2 (break the network connection to the gluster server):
# iptables -A OUTPUT -d $server_IP -j DROP
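
Before waiting for the job to fail, it can help to confirm that the DROP rule is actually installed. A sketch of small wrappers around the rule from this step; the function names are hypothetical, and the server address is passed as an argument in place of $server_IP.

```shell
# Hypothetical wrappers around the iptables rule used in this report.
# $1 is the gluster server address.
block_gluster()   { iptables -A OUTPUT -d "$1" -j DROP; }
unblock_gluster() { iptables -D OUTPUT -d "$1" -j DROP; }

# "iptables -C" checks whether a matching rule exists and
# exits 0 if it does, non-zero otherwise.
gluster_blocked() { iptables -C OUTPUT -d "$1" -j DROP 2>/dev/null; }

# Usage (as root):
#   block_gluster "$server_IP" && gluster_blocked "$server_IP" && echo "blocked"
```

Keeping the append and delete forms side by side makes it harder to leave the rule behind after the test.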

4. Check the result a few minutes later.
In terminal 1 (blockcommit reports that it finished):
# virsh blockcommit gluster vda --active --verbose --wait
Block Commit: [100 %]
Now in synchronized phase

[The above output is due to a bug in virsh. This is being fixed in bug 1227551]

# virsh blockjob gluster vda --info
No current block job for vda

[The above command calls query-block-jobs.]
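
To cross-check what QEMU itself reports, independently of libvirt's job tracking, the same QMP command can be sent through virsh's monitor passthrough. A minimal sketch, assuming the domain name "gluster" from this report; the helper name qemu_block_jobs is hypothetical.

```shell
# Hypothetical helper: ask QEMU directly for its list of block jobs,
# bypassing libvirt's own job tracking.
qemu_block_jobs() {
    # $1: domain name (the report uses "gluster")
    virsh qemu-monitor-command "$1" --pretty '{"execute":"query-block-jobs"}'
}

# Usage: qemu_block_jobs gluster
# An empty "return" array means QEMU has no block job on any device.
```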

5. Restore the network connection and run blockcommit again.
# iptables -D OUTPUT -d $server_IP -j DROP

# virsh blockcommit gluster vda --active --verbose --wait
error: internal error: unable to execute QEMU command 'block-commit': Error (Operation not permitted) flushing drive
[At this point libvirt does not think there is a pending block job; otherwise the error would be different. The event therefore propagated successfully and cleared the pending block job flag in libvirt.]
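
When re-running this reproducer, the failure in step 5 can be told apart from other blockcommit errors by its message. A sketch, assuming the error text shown above stays stable across versions; the helper name and the labels it prints are hypothetical.

```shell
# Hypothetical helper: retry the active-layer commit and classify the result.
retry_blockcommit() {
    # $1: domain name, $2: disk target
    if out=$(virsh blockcommit "$1" "$2" --active --verbose --wait 2>&1); then
        status=0
    else
        status=$?
    fi
    case "$out" in
        *"flushing drive"*)
            # Message seen in this report when block-commit fails on flush.
            echo "flush-error"
            return 2
            ;;
    esac
    if [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "other-failure"
        return 1
    fi
}

# Usage: retry_blockcommit gluster vda
```

Matching on the message before looking at the exit status is a deliberate choice here, so that the flush error is recognized regardless of how virsh reports the overall result.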


Actual results:
In step 4, it reports that the blockcommit finished and entered the mirror phase, but it actually did not.
In step 5, blockcommit fails to run again after the network connection is restored.

Expected results:
In step 4, correct information about the result of the blockcommit is given.
[Handled by bug 1227551]
In step 5, blockcommit succeeds after the network connection is restored.

Comment 2 Ademar Reis 2015-07-10 20:34:13 UTC
May be related to Bug 1171261