Bug 966037 - [engine-backend] in a case of missing device, the domain is inaccessible but engine reports it as up
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64 Unspecified
Priority: unspecified Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assigned To: Liron Aravot
QA Contact: Elad
Whiteboard: storage
Keywords: Triaged
Depends On:
Blocks: rhev3.4beta 1142926
Reported: 2013-05-22 06:38 EDT by Elad
Modified: 2016-02-10 12:38 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-10 11:38:19 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
logs (1.04 MB, application/x-gzip)
2013-05-22 06:38 EDT, Elad

Description Elad 2013-05-22 06:38:02 EDT
Created attachment 751654
logs

Description of problem:

Although the host cannot perform connectStorageServer, the engine still reports the domain as 'up'.

Version-Release number of selected component (if applicable):

rhevm-3.2.0-10.26.rc.el6ev.noarch
vdsm-4.10.2-19.0.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce: 
On 1 host and one iscsi domain:
1. Start extending the domain and, while the extension is in progress, remove the device (PV) that the domain is being extended with from the host, using:

'multipath -f 1elad1313678616'

2. vdsm will fail to extend the domain and then will fail in connectStorageServer:


2013-05-22 13:15:01,794 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-4-thread-49) START, ConnectStoragePoolVDSCommand(HostName = nott-vds1, HostId = 61ada6ee-b58a-11e2-b34e-001a4a169734, storagePoolId = e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, vds_spm_id = 1, masterDomainId = da07317a-eaa1-4cf8-aaae-ac41c4b1fd87, masterVersion = 1), log id: 2e09ca40
2013-05-22 13:15:03,214 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (QuartzScheduler_Worker-77) No string for UNASSIGNED type. Use default Log
2013-05-22 13:15:04,218 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=304, mMessage=Cannot find master domain: 'spUUID=e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, msdUUID=da07317a-eaa1-4cf8-aaae-ac41c4b1fd87']]
2013-05-22 13:15:04,218 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) HostName = nott-vds1
2013-05-22 13:15:04,218 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-49) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, msdUUID=da07317a-eaa1-4cf8-aaae-ac41c4b1fd87'
2013-05-22 13:15:04,218 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-4-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 2e09ca40
2013-05-22 13:15:04,219 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-4-thread-49) Could not connect host nott-vds1 to pool iscsi

3. The pool becomes non-responsive and the host non-operational
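
For anyone trying to confirm the same failure in their own engine.log, the status codes can be pulled out with a short script. This is only a minimal sketch: the regular expression is modelled on the StatusForXmlRpc fragment quoted in step 2, the function name find_vdsm_errors is made up for illustration, and treating mCode=0 as success is an assumption.

```python
import re

# Pattern modelled on the "StatusForXmlRpc [mCode=..., mMessage=...]]" fragment
# quoted in the log excerpt above; other log formats may need a different regex.
STATUS_RE = re.compile(r"mCode=(\d+), mMessage=(.*?)\]\]")


def find_vdsm_errors(lines):
    """Return (code, message) pairs for every non-zero mCode in the given lines.

    Assumption: mCode=0 means the command succeeded.
    """
    errors = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match and int(match.group(1)) != 0:
            errors.append((int(match.group(1)), match.group(2)))
    return errors


# Sample line copied from the excerpt in step 2.
sample = [
    " StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=304, "
    "mMessage=Cannot find master domain: "
    "'spUUID=e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, "
    "msdUUID=da07317a-eaa1-4cf8-aaae-ac41c4b1fd87']]",
]

for code, message in find_vdsm_errors(sample):
    print(code, message)
```

To scan a real file, pass an open file object instead of the sample list, for example find_vdsm_errors(open("engine.log")), where the path is whatever your installation uses.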


Actual results:
The engine reports that the domain is up even though there are no active hosts in the pool and the pool is non-responsive. There is nothing the user can do to remove the damaged domain.

Expected results:
The domain should become unknown
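
The expected behaviour could be expressed roughly as the rule below. This is a sketch of the reporter's expectation only, not the engine's actual code: DomainStatus, expected_domain_status and its parameters are hypothetical names chosen for illustration.

```python
from enum import Enum


class DomainStatus(Enum):
    # Hypothetical status values used only for this illustration.
    ACTIVE = "Active"
    UNKNOWN = "Unknown"


def expected_domain_status(reported, pool_responsive, active_hosts):
    """Return the status the reporter expects the engine to show.

    If the pool is non-responsive or no host is active, nothing can vouch
    for the domain, so its status should fall back to UNKNOWN.
    """
    if not pool_responsive or active_hosts == 0:
        return DomainStatus.UNKNOWN
    return reported


# Scenario from this report: pool non-responsive, no active hosts.
print(expected_domain_status(DomainStatus.ACTIVE, pool_responsive=False, active_hosts=0))
```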

Additional info: logs
Comment 1 Liron Aravot 2013-12-05 04:43:44 EST
Elad, the attached logs don't match the ones you quoted.
Please try to reproduce; that shouldn't happen.
If it does, please attach the correct logs.
Comment 2 Elad 2013-12-10 11:38:19 EST
No reproduction so far, checked on 3.2.5. Closing for now as WORKSFORME, will re-open if necessary.
