Bug 968294
Summary: | engine: create of thin copy vm fails because of "Storage domain does not exist" error from vdsm during GetImageInfoVDSCommand and vm get stuck in image locked | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> | ||||
Component: | ovirt-engine | Assignee: | Maor <mlipchuk> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.2.0 | CC: | abaron, acathrow, dron, iheim, jkt, lpeer, Rhev-m-bugs, scohen, yeylon | ||||
Target Milestone: | --- | Keywords: | Regression | ||||
Target Release: | 3.3.0 | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | storage | ||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2014-01-21 22:19:21 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
Dafna, why is this a regression?

Allon, the problem is that the addVm calls are indeed made with the active master domain: masterDomainId = 38755249-4bb3-4841-bf5b-05f4a521514d, yet getVolumeInfo is called with the faulty domain. vdsm fails correctly, since the GetInfo command is sent with a domain which is reported as faulty: 7414f930-bbdb-4ec6-8132-4640cbb3c722

2013-05-29 12:19:51,583 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-47) Domain 7414f930-bbdb-4ec6-8132-4640cbb3c722:tiger-01 was reported by all hosts in status UP as problematic. Moving the domain to NonOperational.

AddVm is called 3 times by the user (several minutes apart) and keeps failing on the same thing:

2013-05-29 14:19:02,617 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (pool-4-thread-48) [619f7ff3] IrsBroker::getImageInfo::Failed getting image info imageId = f82a0d58-0791-4137-b1e6-22a8794acd2a does not exist on domainName = tiger-01 , domainId = 7414f930-bbdb-4ec6-8132-4640cbb3c722, error code: StorageDomainDoesNotExist, message: Storage domain does not exist: ()
2013-05-29 14:05:40,916 INFO [org.ovirt.engine.core.bll.AddVmCommand] (pool-4-thread-48) [72c2c7cb] Running command: AddVmCommand internal: false. Entities affected : ID: 066a4468-2023-4baa-b7a4-625c4d9a5ba0 Type: VdsGroups, ID: 8241801a-fd55-480c-b92f-3926eb935368 Type: VmTemplate, ID: 38755249-4bb3-4841-bf5b-05f4a521514d Type: Storage
...
2013-05-29 14:06:44,352 INFO [org.ovirt.engine.core.bll.AddVmCommand] (pool-4-thread-44) [53bee9fa] Running command: AddVmCommand internal: false. Entities affected : ID: 066a4468-2023-4baa-b7a4-625c4d9a5ba0 Type: VdsGroups, ID: 8241801a-fd55-480c-b92f-3926eb935368 Type: VmTemplate, ID: 38755249-4bb3-4841-bf5b-05f4a521514d Type: Storage
...
2013-05-29 14:19:02,204 INFO [org.ovirt.engine.core.bll.AddVmCommand] (pool-4-thread-48) [619f7ff3] Running command: AddVmCommand internal: false. Entities affected : ID: 066a4468-2023-4baa-b7a4-625c4d9a5ba0 Type: VdsGroups, ID: 8241801a-fd55-480c-b92f-3926eb935368 Type: VmTemplate, ID: 38755249-4bb3-4841-bf5b-05f4a521514d Type: Storage

It's a regression because I remember testing this scenario on 3.1, when the multiple domains feature came out, and we were able to create the vm on the active domain.

1. A new CDA should be added for validating the storage domain (Bug https://bugzilla.redhat.com/show_bug.cgi?id=975053)
2. getImageInfo has been removed in commit 2575a223515a4f984157e8017e272cdd5ac98db0 and a new compensation has been added to disk at 32783a9f41c150b07c1146c1336fd87bd122956c

It could be that this will not reproduce after 2 has been merged.

The image should not stay in image locked after the commits (described in comment 5) have been merged.

After a failure in create vm from template (thin) with a blocked data domain that contains the image, the image is deleted from the system and there are no disks in 'LOCKED' state.
Verified on RHEVM3.3 - IS5: rhevm-3.3.0-0.7.master.el6ev.noarch

Closing - RHEV 3.3 Released
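The validation idea from the comments (a canDoAction-style check on the storage domain before the command runs) can be illustrated with a small sketch. This is not the engine's code: the class and function names below are hypothetical, the image IDs are placeholders, and only the two domain IDs are taken from the logs in this bug. It assumes the engine knows each template copy's domain and each domain's status.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class DomainStatus(Enum):
    ACTIVE = "Active"
    INACTIVE = "Inactive"


@dataclass(frozen=True)
class TemplateImageCopy:
    """One copy of a template disk on one storage domain (hypothetical model)."""
    image_id: str
    domain_id: str


# Domain IDs as they appear in the logs of this bug.
MASTER = "38755249-4bb3-4841-bf5b-05f4a521514d"
FAULTY = "7414f930-bbdb-4ec6-8132-4640cbb3c722"


def pick_source_copy(
    copies: List[TemplateImageCopy],
    status: Dict[str, DomainStatus],
    target_domain_id: str,
) -> Optional[TemplateImageCopy]:
    """Return the template copy on the target domain, but only if that domain is active."""
    if status.get(target_domain_id) is not DomainStatus.ACTIVE:
        return None
    for copy in copies:
        if copy.domain_id == target_domain_id:
            return copy
    return None


def can_create_thin_vm(
    copies: List[TemplateImageCopy],
    status: Dict[str, DomainStatus],
    target_domain_id: str,
) -> Tuple[bool, str]:
    """Fail early with a reason instead of sending a request for an inactive domain."""
    if status.get(target_domain_id) is not DomainStatus.ACTIVE:
        return False, "target storage domain is not active"
    if pick_source_copy(copies, status, target_domain_id) is None:
        return False, "template has no copy on the target storage domain"
    return True, ""
```

With one copy on each domain and the faulty domain marked inactive, the check passes for the master domain and is rejected for the faulty one before any storage call is made.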
Created attachment 754318: logs

Description of problem:
I tried creating a thin copy vm when one of the domains holding the template copy is inactive. GetImageInfoVDSCommand fails in vdsm with a "Storage domain does not exist" error, and the vm gets stuck in image locked.

Version-Release number of selected component (if applicable):
sf7.2
vdsm-4.10.2-22.0.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create two iscsi storage domains located on two different storage servers
2. create a template and copy the template to both domains
3. block connectivity to the non-master domain using iptables from all hosts
4. once the domain becomes inactive, try to create a new thin copy vm on the active domain
***use vdsm-4.10.2-22.0.el6ev.x86_64***

Actual results:
We get an error from vdsm during GetImageInfoVDSCommand and the vm gets stuck in image locked.

Expected results:
Even if there is a failure in vdsm, the engine should still release the lock and remove the vm (since the vm is based on a template, there is no reason to keep the vm).
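The expected behaviour (release the lock and remove the vm when the storage call fails) can be sketched as follows. This is a minimal illustration, not the engine's compensation mechanism: `VdsmError`, `image_lock`, `add_thin_vm` and the status strings are all hypothetical names, and the vm table is modelled as a plain dictionary.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class VdsmError(Exception):
    """Stand-in for a failure returned by the storage layer (hypothetical)."""


@contextmanager
def image_lock(vms: Dict[str, str], vm_id: str) -> Iterator[None]:
    """Mark the vm as locked while it is created; undo the creation on any failure."""
    vms[vm_id] = "IMAGE_LOCKED"
    try:
        yield
    except Exception:
        # Compensation: the vm is based on a template, so nothing is worth keeping.
        vms.pop(vm_id, None)
        raise
    else:
        # Creation succeeded: release the lock.
        vms[vm_id] = "DOWN"


def add_thin_vm(vms: Dict[str, str], vm_id: str, get_image_info: Callable[[], None]) -> None:
    """Create a thin vm; get_image_info stands for the storage call that may fail."""
    with image_lock(vms, vm_id):
        get_image_info()
```

If `get_image_info` raises, the error still reaches the caller, but the vm entry is gone instead of remaining in the locked state.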
Additional info: logs

2013-05-29 14:19:02,617 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (pool-4-thread-48) [619f7ff3] IrsBroker::getImageInfo::Failed getting image info imageId = f82a0d58-0791-4137-b1e6-22a8794acd2a does not exist on domainName = tiger-01 , domainId = 7414f930-bbdb-4ec6-8132-4640cbb3c722, error code: StorageDomainDoesNotExist, message: Storage domain does not exist: ()
2013-05-29 14:19:02,617 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-48) [619f7ff3] Command org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand return value OneImageInfoReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=358, mMessage=Storage domain does not exist: ()]]
2013-05-29 14:19:02,617 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (pool-4-thread-48) [619f7ff3] FINISH, GetImageInfoVDSCommand, log id: 662f6e39
2013-05-29 14:19:02,617 ERROR [org.ovirt.engine.core.bll.CreateSnapshotFromTemplateCommand] (pool-4-thread-48) [619f7ff3] Command org.ovirt.engine.core.bll.CreateSnapshotFromTemplateCommand throw Vdc Bll exception. With error message VdcBLLException:
2013-05-29 14:19:02,620 ERROR [org.ovirt.engine.core.bll.CreateSnapshotFromTemplateCommand] (pool-4-thread-48) [619f7ff3] Transaction rolled-back for command: org.ovirt.engine.core.bll.CreateSnapshotFromTemplateCommand.
2013-05-29 14:19:02,620 ERROR [org.ovirt.engine.core.bll.AddVmCommand] (pool-4-thread-48) [619f7ff3] Command org.ovirt.engine.core.bll.AddVmCommand throw Vdc Bll exception. With error message VdcBLLException: RESOURCE_MANAGER_VM_SNAPSHOT_MISSMATCH
2013-05-29 14:19:02,621 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-48) Unable to get value of property: glusterVolume for class org.ovirt.engine.core.bll.AddVmCommand
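When going through a longer engine.log for this failure, it can help to pull the failing domain and error code out of each entry. The sketch below assumes the layout seen in the entries quoted in this bug: timestamp, level, logger in square brackets, thread in parentheses, an optional bracketed hex token (treated here as a correlation ID, which is an assumption), then the message. The regular expressions are written for these samples only and are not an official log format definition.

```python
import re
from typing import Dict, Optional

# Layout inferred from the entries in this report; not an official format.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?:\[(?P<correlation_id>[0-9a-f]+)\]\s+)?"
    r"(?P<message>.*)$"
)

# Matches the "domainId = <uuid>, error code: <name>" part of a getImageInfo failure.
FAILURE_RE = re.compile(
    r"domainId = (?P<domain_id>[0-9a-f-]{36}), error code: (?P<error_code>\w+)"
)

SAMPLE = (
    "2013-05-29 14:19:02,617 ERROR "
    "[org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] "
    "(pool-4-thread-48) [619f7ff3] IrsBroker::getImageInfo::Failed getting image info "
    "imageId = f82a0d58-0791-4137-b1e6-22a8794acd2a does not exist on "
    "domainName = tiger-01 , domainId = 7414f930-bbdb-4ec6-8132-4640cbb3c722, "
    "error code: StorageDomainDoesNotExist, message: Storage domain does not exist: ()"
)


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one log entry into fields; return None if the layout does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields: Dict[str, Optional[str]] = match.groupdict()
    failure = FAILURE_RE.search(fields["message"] or "")
    fields["domain_id"] = failure.group("domain_id") if failure else None
    fields["error_code"] = failure.group("error_code") if failure else None
    return fields
```

Calling `parse_line(SAMPLE)` yields the level, the bracketed token `619f7ff3`, the domain ID and the error code as separate fields, so entries belonging to one AddVm attempt can be grouped by that token.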