Bug 966037

Summary: [engine-backend] in a case of missing device, the domain is inaccessible but engine reports it as up
Product: Red Hat Enterprise Virtualization Manager
Reporter: Elad <ebenahar>
Component: ovirt-engine
Assignee: Liron Aravot <laravot>
Status: CLOSED WORKSFORME
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: abaron, acanan, acathrow, amureini, ebenahar, hateya, iheim, jkt, lpeer, Rhev-m-bugs, scohen, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.4.0
Hardware: x86_64
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-10 16:38:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1078909, 1142926
Attachments:
Description Flags
logs none

Description Elad 2013-05-22 10:38:02 UTC
Created attachment 751654
logs

Description of problem:

Although the host cannot perform connectStorageServer, the engine still reports the domain as 'up'.

Version-Release number of selected component (if applicable):

rhevm-3.2.0-10.26.rc.el6ev.noarch
vdsm-4.10.2-19.0.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce: 
With one host and one iSCSI domain:
1. Try to extend the domain and, during the extension, remove the device (PV) being used for the extension from the host with:

'multipath -f 1elad1313678616'

2. vdsm will fail to extend the domain and then will fail in connectStorageServer:


2013-05-22 13:15:01,794 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-4-thread-49) START, ConnectStoragePoolVDSCommand(HostName = nott-vds1, HostId = 61ada6ee-b58a-11e2-b34e-001a4a169734, storagePoolId = e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, vds_spm_id = 1, masterDomainId = da07317a-eaa1-4cf8-aaae-ac41c4b1fd87, masterVersion = 1), log id: 2e09ca40
2013-05-22 13:15:03,214 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (QuartzScheduler_Worker-77) No string for UNASSIGNED type. Use default Log
2013-05-22 13:15:04,218 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=304, mMessage=Cannot find master domain: 'spUUID=e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, msdUUID=da07317a-eaa1-4cf8-aaae-ac41c4b1fd87']]
2013-05-22 13:15:04,218 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) HostName = nott-vds1
2013-05-22 13:15:04,218 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-49) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, msdUUID=da07317a-eaa1-4cf8-aaae-ac41c4b1fd87'
2013-05-22 13:15:04,218 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-4-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 2e09ca40
2013-05-22 13:15:04,219 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-4-thread-49) Could not connect host nott-vds1 to pool iscsi

3. The pool becomes non-responsive and the host non-operational
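
When checking whether a run reproduces the failure in step 2, the relevant records can be pulled out of engine.log mechanically. The following is only a rough sketch, not part of the engine or vdsm: the function name find_failures and the three regular expressions are made up for illustration and are modelled purely on the excerpt quoted in step 2. It assumes each log record sits on a single line and that the status record and the ERROR record of one command share the same thread name (pool-4-thread-49 in the excerpt).

```python
import re

# Patterns derived from the engine.log excerpt in this report (assumptions,
# not a documented log format).
THREAD_RE = re.compile(r"\] \(([^)]+)\) ")                    # "(pool-4-thread-49)"
STATUS_RE = re.compile(r"mCode=(\d+), mMessage=(.*?)\]\]")    # vdsm status payload
FAILED_RE = re.compile(r"Command (\w+) execution failed")     # engine ERROR summary


def find_failures(lines):
    """Return a list of (thread, command, code, message) for failed VDS commands.

    The most recent status record seen on a thread is attached to the next
    "execution failed" record on that same thread. code and message are None
    when no status record preceded the failure.
    """
    last_status = {}  # thread name -> (code, message)
    failures = []
    for line in lines:
        thread_match = THREAD_RE.search(line)
        if thread_match is None:
            continue  # not a log record in the expected shape
        thread = thread_match.group(1)

        status = STATUS_RE.search(line)
        if status:
            last_status[thread] = (int(status.group(1)), status.group(2))
            continue

        failed = FAILED_RE.search(line)
        if failed:
            code, message = last_status.pop(thread, (None, None))
            failures.append((thread, failed.group(1), code, message))
    return failures
```

Run over the excerpt above (with wrapped records rejoined), this would report one failure for ConnectStoragePoolVDS carrying code 304 and the "Cannot find master domain" message.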


Actual results:
Engine reports that the domain is up even though there are no active hosts in the pool and the pool is non-responsive. There is nothing the user can do to remove the damaged domain.

Expected results:
The domain should become unknown
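
The expectation can be stated as a simple rule: when no host in the pool is up, nothing is monitoring the domain, so its last reported status should not be shown as current. A minimal sketch of that rule follows. It is not engine code; the function name and the status strings are hypothetical and chosen only to illustrate the expected behaviour.

```python
def expected_domain_status(host_statuses, reported_status):
    """Status the domain is expected to show, per this report.

    host_statuses   -- iterable of host status strings for the pool
                       (hypothetical values such as "up", "non_operational")
    reported_status -- the domain status the engine currently holds
    """
    # With no host up, no host can vouch for the domain, so report "unknown".
    if not any(status == "up" for status in host_statuses):
        return "unknown"
    return reported_status
```

In the scenario described here, the only host is non-operational, so the rule would yield "unknown" rather than "up".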

Additional info: logs

Comment 1 Liron Aravot 2013-12-05 09:43:44 UTC
Elad, the attached logs don't match the ones you quoted.
Please try to reproduce; that shouldn't happen.
If it does, please attach the correct logs.

Comment 2 Elad 2013-12-10 16:38:19 UTC
No reproduction so far, checked on 3.2.5. Closing for now as WORKSFORME, will re-open if necessary.