Bug 1019295

Summary: [Engine] storage domain exists only in DB and cannot be reached
Product: Red Hat Enterprise Virtualization Manager
Reporter: Raz Tamir <ratamir>
Component: ovirt-engine
Assignee: Liron Aravot <laravot>
Status: CLOSED DUPLICATE
QA Contact: Aharon Canan <acanan>
Severity: high
Docs Contact:
Priority: low
Version: 3.3.0
CC: bazulay, ecohen, gklein, iheim, lpeer, lsurette, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon
Target Milestone: ---
Flags: scohen: Triaged+
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-19 14:57:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
engine log (flags: none)

Description Raz Tamir 2013-10-15 12:32:12 UTC
Created attachment 812485: engine log

Description of problem:
If the only existing host is removed immediately after a storage domain is removed, the storage domain is not removed from the DB and can no longer be reached.
From engine.log:
2013-10-15 14:49:28,735 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (ajp-/127.0.0.1:8702-5) START, GetVGInfoVDSCommand(HostName = sd, HostId = c6077130-61d1-486b-8670-9c72d7db521f, VGID=wsENXE-WbrQ-unsn-8Ody-vyMM-nZnG-OS5U2K), log id: 12ccd9d4
2013-10-15 14:49:28,750 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (ajp-/127.0.0.1:8702-5) Failed in GetVGInfoVDS method
2013-10-15 14:49:28,750 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (ajp-/127.0.0.1:8702-5) Error code VolumeGroupDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to GetVGInfoVDS, error = Volume Group does not exist: ('vg_uuid: wsENXE-WbrQ-unsn-8Ody-vyMM-nZnG-OS5U2K',)
2013-10-15 14:49:28,750 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (ajp-/127.0.0.1:8702-5) Command org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand return value 
 
OneVGReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=506, mMessage=Volume Group does not exist: ('vg_uuid: wsENXE-WbrQ-unsn-8Ody-vyMM-nZnG-OS5U2K',)]]



Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.24.master.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
Setup with 1 iSCSI DC, 1 host, 1 storage domain.

1. In the DC tab, select the active DC and, in its Storage tab, click Maintenance.
2. Remove the DC.
3. From the Storage tab, remove the domain and, immediately after that, remove the only host that exists.

Actual results:
A message appears: Failed to remove storage domain

Expected results:
Removing the host should not be possible while the domain is in the removal phase.



Additional info:

Comment 2 Liron Aravot 2014-10-19 14:57:23 UTC
The scenario described above seems incorrect; adding the correct scenario for proper documentation:
1. A master domain is deactivated; the master role isn't moved to another domain. [A]

2. An attempt is made to remove the domain. [B]

3. While RemoveStorageDomain is executing, the host used for the operation is moved to maintenance. [C]

4. RemoveStorageDomain fails; the domain can't be removed. [D]


So I'm splitting this bug into 3 different issues:
1. When FormatStorageDomain fails with certain errors, such as StorageDomainDoesNotExist, the engine should ignore the failure.

2. We shouldn't be able to move a host to maintenance while actions are being executed on it.

3. Failure to disconnect from the storage server shouldn't fail RemoveStorageDomainCommand.


Issue 3, which is the main issue here, was handled in bug 1001626, so I'm closing this one as a duplicate.
I'll open separate bugs for the other two issues.
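
For illustration only, the three issues listed above can be modelled in a few lines of Python. This is a toy sketch, not the engine's code and not the fix from bug 1001626: Host, StorageError, remove_storage_domain, IGNORABLE_FORMAT_ERRORS and the running_actions counter are all hypothetical names, and the "DB" is just a set of domain IDs.

```python
"""Toy model of the three follow-up issues; every name here is hypothetical."""


class StorageError(Exception):
    """Error reported by the host, carrying a symbolic error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


# Assumption: codes meaning "the storage is already gone", so removal may proceed.
IGNORABLE_FORMAT_ERRORS = {"StorageDomainDoesNotExist"}


class Host:
    def __init__(self):
        self.running_actions = 0
        self.status = "Up"

    def move_to_maintenance(self):
        # Issue 2: refuse maintenance while actions are running on the host.
        if self.running_actions > 0:
            raise RuntimeError("host is busy; retry once running actions finish")
        self.status = "Maintenance"


def remove_storage_domain(host, db, domain_id, format_domain, disconnect):
    """Remove a domain from db, tolerating failures that do not change the outcome."""
    host.running_actions += 1
    warnings = []
    try:
        try:
            format_domain(domain_id)
        except StorageError as err:
            # Issue 1: a domain that no longer exists needs no formatting.
            if err.code not in IGNORABLE_FORMAT_ERRORS:
                raise
            warnings.append(f"format skipped: {err.code}")
        try:
            disconnect()
        except StorageError as err:
            # Issue 3: a failed disconnect should not fail the removal.
            warnings.append(f"disconnect failed: {err.code}")
        db.discard(domain_id)
    finally:
        # Always release the host, even when the removal itself fails.
        host.running_actions -= 1
    return warnings
```

In this model, a format failure with StorageDomainDoesNotExist and a failed disconnect both end up as warnings and the domain is still dropped from the set, while any other format error propagates and leaves the record in place.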


----------------------------------------------------------
[A] 2013-10-15 14:07:54,610 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-5-thread-44) [c2b1451] START, DeactivateStorageDomainVDSCommand( storagePoolId = 1ebeb7d1-c77b-40ec-b94c-5e55157b9eeb, ignoreFailoverLimit = false, storageDomainId = 41a9aaca-2b24-4aeb-b67a-0401a297dd2f, masterDomainId = 00000000-0000-0000-0000-000000000000, masterVersion = 1), log id: 7182328b
2013-10-15 14:07:54,679 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeactivateStorageDomainVDSCommand] (pool-5-thread-44) [c2b1451] FINISH, DeactivateStorageDomainVDSCommand, log id: 7182328b

[B] An attempt is made to remove the domain:
2013-10-15 14:08:55,662 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-6) Lock Acquired to object EngineLock [exclusiveLocks= key: 41a9aaca-2b24-4aeb-b67a-0401a297dd2f value: STORAGE

[C] While RemoveStorageDomain is executing, the host is moved to maintenance:
2013-10-15 14:09:00,361 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-45) [ccdd9cf] Correlation ID: ccdd9cf, Job ID: d7adfa42-a6b4-452e-8863-55a6b5bb7f28, Call Stack: null, Custom Event ID: -1, Message: Host gold-vdsc.qa.lab.tlv.redhat.com was switched to Maintenance mode by admin@internal.

This causes RemoveStorageDomain to fail:
2013-10-15 14:09:04,318 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] (pool-5-thread-44) [270ce6f3] vdsManager::disposing
2013-10-15 14:09:04,333 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-6) Command DisconnectStorageServerVDS execution failed. Exception: VDSNetworkException: java.net.SocketException: Socket closed
2013-10-15 14:09:04,333 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-6) FINISH, DisconnectStorageServerVDSCommand, log id: 1486d8fc
2013-10-15 14:09:04,333 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-6) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.SocketException: Socket closed (Failed with error VDS_NETWORK_ERROR and code 5022)

[D]
2013-10-15 14:10:35,627 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-4) START, FormatStorageDomainVDSCommand(HostName = g, HostId = dc4a559b-704f-4c75-8227-bf7a66a42ce1, storageDomainId=41a9aaca-2b24-4aeb-b67a-0401a297dd2f), log id: 67d924f9
2013-10-15 14:10:43,514 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-4) Failed in FormatStorageDomainVDS method
2013-10-15 14:10:43,515 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-4) Error code StorageDomainDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('41a9aaca-2b24-4aeb-b67a-0401a297dd2f',)
----------------------------------------------------------
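
To rebuild a timeline like [A] to [D] from a full engine.log, the ERROR entries can be picked out with a short script. The sketch below assumes the line layout seen in the excerpts above (timestamp, level, logger in square brackets, thread in parentheses, then the message); parse_line and errors_by_command are hypothetical helper names, and lines that do not match the layout are skipped.

```python
import re

# Assumed layout, inferred from the excerpts in this bug:
# "<date> <time>,<ms> <LEVEL> [<logger>] (<thread>) <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def errors_by_command(lines):
    """Yield (timestamp, short logger name, message) for each ERROR entry."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "ERROR":
            yield entry["ts"], entry["logger"].rsplit(".", 1)[-1], entry["message"]
```

Passing an open file object for engine.log as lines yields the errors in file order, which is enough to see that the DisconnectStorageServerVDSCommand failure precedes the RemoveStorageDomainCommand failure.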

*** This bug has been marked as a duplicate of bug 1001626 ***