Bug 968351

Summary: engine: target domain is reported in problem on host where vm is running during createVolume and engine continues with LSM (sending cloneImageStructure)
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engine Assignee: Ayal Baron <abaron>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: acanan, acathrow, amureini, iheim, jkt, lpeer, Rhev-m-bugs, scohen, yeylon
Target Milestone: --- Flags: scohen: needinfo+
Target Release: 3.3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dafna Ron 2013-05-29 14:09:37 UTC
Created attachment 754404
logs

Description of problem:

On iSCSI storage with two hosts, I started an LSM (live storage migration) operation for a VM running on the HSM host.
During the createVolume step, I blocked connectivity to the target domain from the HSM host only.

The domain is reported as being in problem, yet the engine still sends cloneImageGroupStructure, and the LSM then fails with a "Drive replication error".

The task then cannot be cleared: the cleanup fails with ArrayIndexOutOfBoundsException.

Version-Release number of selected component (if applicable):

sf17.2

How reproducible:

100%

Steps to Reproduce:
1. Create two domains on two different storage servers.
2. Create a VM from a template and run it on the HSM host.
3. Start LSM for the VM's disk.
4. During the createVolume task, block connectivity to the target domain from the HSM host only, using iptables.
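
The report does not give the exact iptables rule used in step 4. The following is a minimal sketch of one way to build such a rule, assuming the target domain is reached through a single storage server IP and that dropping outgoing packets to it from the HSM host is enough to reproduce. The helper name `block_target_rule` and the address `192.0.2.10` are placeholders, not values from this environment.

```python
import ipaddress


def block_target_rule(storage_ip, action="-A"):
    """Build an iptables argument list that drops outgoing traffic to storage_ip.

    action="-A" appends the rule (block), action="-D" deletes it (unblock).
    This helper is hypothetical; the report does not state the rule that was used.
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    # Validate the address early; ip_address raises ValueError on malformed input.
    ipaddress.ip_address(storage_ip)
    return ["iptables", action, "OUTPUT", "-d", storage_ip, "-j", "DROP"]


# Placeholder address from the documentation range, not the real storage server.
block = block_target_rule("192.0.2.10")
unblock = block_target_rule("192.0.2.10", action="-D")
print(" ".join(block))
print(" ".join(unblock))
```

The resulting lists can be passed to `subprocess.run(rule, check=True)` as root on the HSM host only, so that the other host keeps its connection to the target domain.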

Actual results:

Even though the domain is reported in problem before the createVolume task has finished, the engine continues to the next LSM step. The operation then fails in vdsm with "Drive replication error", and clearing the task fails with ArrayIndexOutOfBoundsException.

Expected results:

If the target domain is reported as problematic, I think LSM should roll back after the createVolume step.
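
The expected behavior can be illustrated with a small decision sketch. This is not engine code (the engine is Java, and its real classes are not shown in this report); `DomainStatus` and `next_lsm_step` are hypothetical names used only to show the requested check between the createVolume step and the next step.

```python
from enum import Enum


class DomainStatus(Enum):
    """Hypothetical view of the target domain as seen from the host running the VM."""
    ACTIVE = "active"
    IN_PROBLEM = "in_problem"


def next_lsm_step(create_volume_ok, target_status):
    """Decide what to do once the createVolume task has ended.

    Returns "rollback" if createVolume failed or the target domain is
    reported in problem, otherwise the name of the next step.
    """
    if not create_volume_ok or target_status is DomainStatus.IN_PROBLEM:
        return "rollback"
    return "cloneImageGroupStructure"


# The situation in this report: createVolume ended, target domain in problem.
print(next_lsm_step(True, DomainStatus.IN_PROBLEM))
```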

Additional info: logs


createSnapshot is sent: 

2013-05-29 16:48:26,312 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (pool-4-thread-49) [38521dc8] START, CreateSnapshotVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = 3.2, storageDomainId = 38755249-4bb3-4841-bf5b-05f4a521514d, imageGroupId = a4d4637b-9470-4373-99e3-476e7b13aabc, imageSizeInBytes = 16106127360, volumeFormat = COW, newImageId = 5c00cb56-21d3-48df-b58b-48e70414f62a, newImageDescription = , imageId = c72c5377-7b03-4b8f-9540-d402be2ea8bf, sourceImageGroupId = a4d4637b-9470-4373-99e3-476e7b13aabc), log id: afe42a8


domain is reported as problematic on the host: 

2013-05-29 16:49:06,474 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-47) domain 7414f930-bbdb-4ec6-8132-4640cbb3c722:tiger-01 in problem. vds: cougar02

cloneImageGroupStructure is sent: 

2013-05-29 16:50:52,118 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CloneImageGroupStructureVDSCommand] (pool-4-thread-48) [24e0e6dc] START, CloneImageGroupStructureVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 38755249-4bb3-4841-bf5b-05f4a521514d, imageGroupId = a4d4637b-9470-4373-99e3-476e7b13aabc, dstDomainId = 7414f930-bbdb-4ec6-8132-4640cbb3c722), log id: 22eb354a

LSM fails: 

2013-05-29 16:51:13,134 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-48) [24e0e6dc] Failed in VmReplicateDiskStartVDS method
2013-05-29 16:51:13,135 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-48) [24e0e6dc] Error code unexpected and error message VDSGenericException: VDSErrorException: Failed to VmReplicateDiskStartVDS, error = Drive replication error
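
To spot this sequence in a larger engine.log, the ordering can be checked with a short script. This is a rough sketch, assuming each log entry sits on a single line and follows the format of the entries quoted in this report. The function name and regular expressions are illustrative, and the script does not track a domain recovering after the warning.

```python
import re
from datetime import datetime

TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
# Matches e.g. "... WARN  [...] (...) domain <uuid>:<name> in problem. vds: <host>"
PROBLEM_RE = re.compile(
    TS + r" WARN .*? domain (?P<domain>[0-9a-f-]{36}):\S+ in problem\. vds: (?P<vds>\S+)"
)
# Matches START entries of commands that carry a dstDomainId argument.
START_RE = re.compile(
    TS + r" INFO .*? START, (?P<cmd>\w+)\(.*?dstDomainId = (?P<domain>[0-9a-f-]{36})"
)


def parse_ts(text):
    """Parse an engine.log timestamp such as '2013-05-29 16:49:06,474'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def commands_sent_to_problem_domains(lines):
    """Return (command, domain id, seconds since first warning) for every
    START entry whose dstDomainId was already reported in problem."""
    problem_since = {}  # domain id -> time of the first "in problem" warning
    hits = []
    for line in lines:
        m = PROBLEM_RE.search(line)
        if m:
            problem_since.setdefault(m["domain"], parse_ts(m["ts"]))
            continue
        m = START_RE.search(line)
        if m and m["domain"] in problem_since:
            delay = parse_ts(m["ts"]) - problem_since[m["domain"]]
            hits.append((m["cmd"], m["domain"], delay.total_seconds()))
    return hits
```

Applied to the warning and the CloneImageGroupStructure entry quoted above, it reports CloneImageGroupStructureVDSCommand being sent to domain 7414f930-bbdb-4ec6-8132-4640cbb3c722 about 105.6 seconds after the domain was first reported in problem.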

Comment 6 Itamar Heim 2014-01-21 22:28:33 UTC
Closing - RHEV 3.3 Released

Comment 7 Itamar Heim 2014-01-21 22:28:37 UTC
Closing - RHEV 3.3 Released

Comment 8 Itamar Heim 2014-01-21 22:31:31 UTC
Closing - RHEV 3.3 Released