Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1148803

Summary: Import File or Block Storage Domain should be locked in memory
Product: Red Hat Enterprise Virtualization Manager
Reporter: Ori Gofen <ogofen>
Component: ovirt-engine
Assignee: Maor <mlipchuk>
Status: CLOSED CURRENTRELEASE
QA Contact: Ori Gofen <ogofen>
Severity: medium
Priority: unspecified
Version: 3.5.0
CC: acanan, amureini, ecohen, gklein, iheim, lpeer, lsurette, ogofen, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version: org.ovirt.engine-root-3.5.0-23
Doc Type: Bug Fix
Type: Bug
oVirt Team: Storage
Attachments: vdsm+engine logs + images

Description Ori Gofen 2014-10-02 12:12:19 UTC
Created attachment 943347 [details]
vdsm+engine logs + images

Description of problem:

Destroying a file storage domain and, while the destroy is in progress, importing it back is allowed by the oVirt engine. This results in some kind of deadlock and a pop-up error window explaining (see image):

"error while executing action: internal engine error"

engine log:
2014-10-02 14:47:19,091 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Failed in DetachStorageDomainVDS method
2014-10-02 14:47:19,091 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Command org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand return value
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=309, mMessage=Unknown pool id, pool not connected: (u'00000000-0000-0000-0000-000000000000',)]]
2014-10-02 14:47:19,092 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Could not force detach domain 0c59e07d-d1f9-4e1c-9949-da4c84627d92 on pool 00000002-0002-0002-0002-000000000008. error: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to DetachStorageDomainVDS, error = Unknown pool id, pool not connected: (u'00000000-0000-0000-0000-000000000000',), code = 309
2014-10-02 14:47:19,092 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] FINISH, DetachStorageDomainVDSCommand, log id: 5d7757ac
2014-10-02 14:47:19,092 WARN  [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Detaching Storage Domain 0c59e07d-d1f9-4e1c-9949-da4c84627d92 from its previous storage pool 00000000-0000-0000-0000-000000000000 has failed. The meta data of the Storage Domain might still indicate that it is attached to a different Storage Pool.
2014-10-02 14:47:19,094 ERROR [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Command org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand throw Vdc Bll exception. With error message VdcBLLException: null (Failed with error ENGINE and code 5001)
2014-10-02 14:47:19,173 INFO  [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Command [id=85feaabf-0420-4a29-8804-e48951da47c8]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: storagePoolId = 00000002-0002-0002-0002-000000000008, storageId = 0c59e07d-d1f9-4e1c-9949-da4c84627d92.
2014-10-02 14:47:19,192 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Correlation ID: 77f4a4c0, Job ID: 08ede1d4-2d85-44ec-9caa-27f25d93c04b, Call Stack: null, Custom Event ID: -1, Message: Failed to attach Storage Domain nfs_2 to Data Center Default. (User: admin)

vdsm log:

Thread-43990::ERROR::2014-10-02 14:47:02,345::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 0c59e07d-d1f9-4e1c-9949-da4c84627d92 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'0c59e07d-d1f9-4e1c-9949-da4c84627d92',)
Thread-43990::ERROR::2014-10-02 14:47:02,356::task::866::Storage.TaskManager.Task::(_setError) Task=`56106a85-0aeb-46ae-8936-e960a85a242a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 817, in detachStorageDomain
    pool.detachSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 956, in detachSD
    dom = sdCache.produce(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'0c59e07d-d1f9-4e1c-9949-da4c84627d92',)
Thread-43990::DEBUG::2014-10-02 14:47:02,372::task::885::Storage.TaskManager.Task::(_run) Task=`56106a85-0aeb-46ae-8936-e960a85a242a`::Task._run: 56106a85-0aeb-46ae-8936-e960a85a242a (u'0c59e07d-d1f9-4e1c-9949-da4c84627d92', u'00000002-0002-0002-0002-000000000008', True, 0) {} failed - stopping task

After importing the domain, every operation we attempt on it fails.

Version-Release number of selected component (if applicable):
vt4

How reproducible:
100%

Steps to Reproduce:
1. Have a file storage domain which is not the master
2. Move the domain to maintenance; while it is deactivating, destroy the domain
3. Import the domain back into the system
4. Try to activate or remove it

Actual results:
A deadlock prevents any further action on the domain.

Expected results:
This operation should be blocked.

Additional info:

Comment 1 Maor 2014-10-19 12:51:04 UTC
Ori, are you working with JSON-RPC? Can you please try to reproduce this without JSON-RPC on the host?

Comment 2 Ori Gofen 2014-10-20 12:34:46 UTC
(In reply to Maor from comment #1)
> Ori, are you working with JSON-RPC? Can you please try to reproduce this
> without JSON-RPC on the host?

Maor, yes. I couldn't reproduce this without JSON-RPC enabled.

Comment 4 Allon Mureinik 2014-10-21 12:01:39 UTC
I very much doubt this issue is indeed a JSON-RPC regression, nor can I find any evidence of one in the logs, although it could be that moving to JSON-RPC has uncovered a dormant bug in the storage code.

AddExistingFileStorageDomainCommand does not take ANY locks (like the rest of the commands in this hierarchy), even though it should be mutually exclusive with destroying/deleting the domain.

We have to fix this first; only once the commands take the proper locks can this be moved to infra, if the problem persists.
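As an illustration of the fix described above, here is a minimal Python sketch (hypothetical names, not actual engine code) of per-domain exclusive locking: a second flow on the same storage domain UUID fails fast with a validation-style rejection instead of deadlocking.

```python
import threading


class DomainLockManager:
    """Sketch of per-storage-domain exclusive locks. Assumption: each
    engine flow (import, destroy) tries to take the lock for the domain
    UUID it operates on before doing any work."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()  # UUIDs of domains currently locked

    def try_acquire(self, sd_uuid):
        # Non-blocking: a concurrent flow on the same domain is rejected
        # immediately rather than waiting, which avoids the deadlock.
        with self._guard:
            if sd_uuid in self._held:
                return False
            self._held.add(sd_uuid)
            return True

    def release(self, sd_uuid):
        with self._guard:
            self._held.discard(sd_uuid)


locks = DomainLockManager()
SD = "0c59e07d-d1f9-4e1c-9949-da4c84627d92"

assert locks.try_acquire(SD)      # destroy flow takes the lock
assert not locks.try_acquire(SD)  # concurrent import is rejected
locks.release(SD)
assert locks.try_acquire(SD)      # import can proceed once destroy finishes
```

The non-blocking `try_acquire` mirrors how a command-level validation failure (a CDA message) surfaces to the user instead of hanging the operation.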

Comment 5 Allon Mureinik 2014-11-24 17:54:15 UTC
Maor, this bug is just about adding a lock between the two flows.
Why isn't this solved yet?

Comment 6 Allon Mureinik 2014-11-24 17:54:51 UTC
(In reply to Allon Mureinik from comment #5)
> Maor, this bug is just about adding a lock between the two flows.
> Why isn't this solved yet?
(This is needed regardless of any potential JSONRPC problems)

Comment 7 Maor 2014-11-25 13:06:05 UTC
(In reply to Allon Mureinik from comment #6)
> (In reply to Allon Mureinik from comment #5)
> > Maor, this bug is just about adding a lock between the two flows.
> > Why isn't this solved yet?
> (This is needed regardless of any potential JSONRPC problems)

The original failure here is a JSON-RPC one.
The import of a file storage domain uses a connection path which can't be locked; the storage connection can only be locked when it is being created. That lock is already taken in AddStorageServerConnection, so if we import both storage domains at the same time we should be blocked.

A block storage domain does not use AddStorageServerConnection, so the appropriate locks should be added, as published in the patches.
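The distinction above can be sketched as follows: a hypothetical helper (not actual engine code) that decides which resource an import flow locks on. File-domain imports are already serialized by the storage-connection lock taken in AddStorageServerConnection, while block-domain imports have no connection-creation step and must lock the domain UUID directly.

```python
def import_lock_key(domain_type, sd_uuid, connection_id=None):
    """Hypothetical illustration: return the (resource_type, id) pair the
    import flow should lock on."""
    if domain_type == "file" and connection_id is not None:
        # File domains: the connection lock from AddStorageServerConnection
        # already serializes concurrent imports of the same domain.
        return ("STORAGE_CONNECTION", connection_id)
    # Block domains (and file domains with no connection step): lock the
    # storage domain UUID itself.
    return ("STORAGE_DOMAIN", sd_uuid)


# Two setups importing the same file domain contend on the same connection:
assert import_lock_key("file", "sd-1", "conn-A") == ("STORAGE_CONNECTION", "conn-A")
# A block-domain import locks the domain UUID directly:
assert import_lock_key("block", "sd-1") == ("STORAGE_DOMAIN", "sd-1")
```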

Comment 8 Maor 2014-11-25 16:31:31 UTC
Since the original JSON exception is not related to this fix (it was a separate JSON issue which has already been fixed), I'm changing the summary title to reflect the fix Allon referenced in his comment.

the following scenarios should be supported:
1. Importing the same File Storage domain from two different setups at the same time - Should be blocked from one setup
2. Importing the same Block Storage domain from two different setups at the same time - Should be blocked from one setup
3. a. Open the import block storage domain dialog and connect to a target with an existing storage domain
   b. Pick a storage domain in both dialogs
   c. Import the storage domain from one setup and activate it on the new Data Center
   d. Change the name of the storage domain
   e. Try to import the storage domain
The expected result is that we are blocked by a CDA (CanDoAction) message.

Comment 9 Ori Gofen 2014-12-14 13:40:22 UTC
Verified on 13.1.

Comment 10 Allon Mureinik 2015-02-16 19:11:29 UTC
RHEV-M 3.5.0 has been released, closing this bug.

Comment 11 Allon Mureinik 2015-02-16 19:11:31 UTC
RHEV-M 3.5.0 has been released, closing this bug.