Bug 896507

Summary: 3.1 - engine: live snapshot fails due to race on multiple move of disks (live storage migration)
Product: Red Hat Enterprise Linux 6 Reporter: Chris Pelland <cpelland>
Component: vdsm Assignee: Eduardo Warszawski <ewarszaw>
Status: CLOSED DUPLICATE QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: high    
Version: 6.3 CC: abaron, aburden, amureini, bazulay, dron, dyasny, fsimonce, hateya, iheim, ilvovsky, lpeer, Rhev-m-bugs, sgrinber, thildred, yeylon, ykaplan, ykaul, zdover
Target Milestone: rc Keywords: ZStream
Target Release: 6.4   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage,
Fixed In Version: vdsm-4.10.2-1.2.el6 Doc Type: Bug Fix
Doc Text:
Previously, live snapshotting failed because of a race condition that existed when you tried to move virtual machine disks between storage domains. This error occurred when a host interrupted the block volume creation process between the lvcreate step and the lvchange step. A patch has been introduced which adds an init tag in lvcreate so that other hosts can identify logical volumes engaged in the creation process as "partial". When identified as "partial", hosts ignore these logical volumes. This eliminates the race condition that caused live snapshotting to fail.
Story Points: ---
Clone Of: 876558 Environment:
Last Closed: 2013-02-07 15:09:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 876558    
Bug Blocks:    
Attachments:
Description Flags
logs none
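
The Doc Text above describes the fix as tagging a logical volume at lvcreate time so that other hosts treat it as "partial" until the lvchange step completes. The following is a minimal in-memory sketch of that idea, not vdsm code: it runs no LVM commands, and the tag name INIT_TAG, the final tag "FINAL_TAG_EXAMPLE", and the helper functions are hypothetical names chosen only for illustration.

```python
# Hypothetical tag name: this report does not state the tag the patch uses.
INIT_TAG = "VOL_INITIALIZING"


def create_volume(lvs, name):
    """Model of the lvcreate step: the LV appears already carrying the init tag."""
    lv = {"name": name, "tags": {INIT_TAG}}
    lvs.append(lv)
    return lv


def finish_volume(lv, final_tags):
    """Model of the lvchange step: apply the final tags and drop the init tag."""
    lv["tags"] = (lv["tags"] | set(final_tags)) - {INIT_TAG}


def visible_volumes(lvs):
    """What another host should see: LVs still tagged as initializing are skipped."""
    return [lv["name"] for lv in lvs if INIT_TAG not in lv["tags"]]


# A second host scanning between the two steps ignores the partial volume.
lvs = []
lv = create_volume(lvs, "vol1")
seen_during_creation = visible_volumes(lvs)

finish_volume(lv, ["FINAL_TAG_EXAMPLE"])
seen_after_creation = visible_volumes(lvs)
```

Because the tag is attached in the same step that creates the volume, this model has no window in which a scanning host sees a volume that exists but is not yet marked as partial.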

Comment 4 Dafna Ron 2013-01-29 14:00:26 UTC
tested on vdsm-4.10.2-1.2.el6.x86_64

we are failing to create snapshot with: 

Thread-4533::ERROR::2013-01-29 13:22:10,610::libvirtvm::2197::vm.Vm::(diskReplicateFinish) vmId=`cc5c2485-e7ac-4094-9d30-b2c819b8430b`::Unable to stop the replication for the drive: vda
Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 2194, in diskReplicateFinish
    self._dom.blockJobAbort(srcDrive.name, blockJobFlags)
  File "/usr/share/vdsm/libvirtvm.py", line 515, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 512, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirtError: internal error unable to execute QEMU command '__com.redhat_drive-reopen': Could not open '/rhev/data-center/f1c4c67b-0647-40c7-975a-837777656129/a68a5097-3032-4e99-bc43-fdebf4f46df2/images/83dd02e0-2e19-4179-827d-7d49cdedccb5/3b6d4f67-594a-4736-87e0-aed140a8ef5c': Operation not permitted


full logs will be attached

Comment 5 Dafna Ron 2013-01-29 14:01:02 UTC
Created attachment 689846
logs

Comment 6 Dafna Ron 2013-01-29 14:03:28 UTC
I reproduced this by creating 30 VMs (using a pool and detaching the VMs).
The VMs were on 2 iSCSI storage domains, and I live migrated all the disks (selecting each disk -> move) to a 3rd domain.

moving back to devel with all logs

Comment 7 Ayal Baron 2013-02-07 12:41:08 UTC
(In reply to comment #4)
> tested on vdsm-4.10.2-1.2.el6.x86_64
> 
> we are failing to create snapshot with: 
> 
> Thread-4533::ERROR::2013-01-29
> 13:22:10,610::libvirtvm::2197::vm.Vm::(diskReplicateFinish)
> vmId=`cc5c2485-e7ac-4094-9d30-b2c819b8430b`::Unable to stop the replication
> for the drive: vda
> Traceback (most recent call last):
>   File "/usr/share/vdsm/libvirtvm.py", line 2194, in diskReplicateFinish
>     self._dom.blockJobAbort(srcDrive.name, blockJobFlags)
>   File "/usr/share/vdsm/libvirtvm.py", line 515, in f
>     ret = attr(*args, **kwargs)
>   File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line
> 83, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 512, in
> blockJobAbort
>     if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed',
> dom=self)
> libvirtError: internal error unable to execute QEMU command
> '__com.redhat_drive-reopen': Could not open
> '/rhev/data-center/f1c4c67b-0647-40c7-975a-837777656129/a68a5097-3032-4e99-
> bc43-fdebf4f46df2/images/83dd02e0-2e19-4179-827d-7d49cded
> ccb5/3b6d4f67-594a-4736-87e0-aed140a8ef5c': Operation not permitted
> 
> 
> full logs will be attached

Fede, isn't this issue a dup of the libvirt bug?

Comment 8 Federico Simoncelli 2013-02-07 14:54:32 UTC
(In reply to comment #7)
> (In reply to comment #4)
> > libvirtError: internal error unable to execute QEMU command
> > '__com.redhat_drive-reopen': Could not open
> > '/rhev/data-center/f1c4c67b-0647-40c7-975a-837777656129/a68a5097-3032-4e99-
> > bc43-fdebf4f46df2/images/83dd02e0-2e19-4179-827d-7d49cded
> > ccb5/3b6d4f67-594a-4736-87e0-aed140a8ef5c': Operation not permitted
> > 
> > 
> > full logs will be attached
> 
> Fede, isn't this issue a dup of the libvirt bug?

Yes, it looks like a duplicate of bug 903248.

Comment 9 Ayal Baron 2013-02-07 15:09:14 UTC

*** This bug has been marked as a duplicate of bug 903248 ***