Bug 1148803
| Field | Value |
|---|---|
| Summary: | Import File or Block Storage Domain should be locked in the memory |
| Product: | Red Hat Enterprise Virtualization Manager |
| Reporter: | Ori Gofen <ogofen> |
| Component: | ovirt-engine |
| Assignee: | Maor <mlipchuk> |
| Status: | CLOSED CURRENTRELEASE |
| QA Contact: | Ori Gofen <ogofen> |
| Severity: | medium |
| Priority: | unspecified |
| Version: | 3.5.0 |
| CC: | acanan, amureini, ecohen, gklein, iheim, lpeer, lsurette, ogofen, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon |
| Target Milestone: | --- |
| Target Release: | 3.5.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | storage |
| Fixed In Version: | org.ovirt.engine-root-3.5.0-23 |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | Storage |
| Cloudforms Team: | --- |
Ori, are you working with JSON RPC? Can you please try to reproduce this without JSON RPC on the host?

(In reply to Maor from comment #1)
> Ori, are you working with JSON RPC? Can you please try to reproduce this
> without JSON RPC on the host?

Maor, yes. I couldn't reproduce this without JSON RPC enabled.

I very much doubt this issue is indeed a JSON RPC regression, nor can I find any evidence of it in the logs - although it could be that moving to JSON RPC has uncovered a dormant bug in the storage code.
AddExistingFileStorageDomainCommand does not take ANY locks (like the rest of the commands in this hierarchy), even though it should be mutually exclusive with destroying/deleting the domain. We have to fix this first; only then, once the commands take the proper locks, can this be moved to infra if the problem persists.

Maor, this bug is just about adding a lock between the two flows. Why isn't this solved yet?

(In reply to Allon Mureinik from comment #5)
> Maor, this bug is just about adding a lock between the two flows.
> Why isn't this solved yet?
(This is needed regardless of any potential JSON RPC problems)

(In reply to Allon Mureinik from comment #6)
> (In reply to Allon Mureinik from comment #5)
> > Maor, this bug is just about adding a lock between the two flows.
> > Why isn't this solved yet?
> (This is needed regardless of any potential JSON RPC problems)

The original failure here is with JSON RPC. The import of a File Storage Domain uses a connection path which cannot be locked; the Storage Connection can only be locked when it is being created. That lock is already taken in AddStorageServerConnection, so if we import both Storage Domains at the same time we should be blocked. A Block Storage Domain does not use AddStorageServerConnection, so the appropriate locks should be added, as published in the patches. Since the original JSON exception is not related to the fix, this was a JSON issue which was already fixed.
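The mutual exclusion described above (an import command and a destroy/delete command contending for the same storage domain) can be illustrated with a minimal sketch. This is a hypothetical in-memory lock registry in Python, not the engine's actual Java EngineLock API; all names here are illustrative:

```python
import threading

# In-memory registry of exclusive locks keyed by (group, entity id),
# loosely modeled on an engine-side in-memory lock manager.
_locks = {}
_registry_mutex = threading.Lock()

def acquire_exclusive(group, entity_id):
    """Try to take an exclusive lock; return True on success,
    False if another flow already holds it."""
    key = (group, entity_id)
    with _registry_mutex:
        if key in _locks:
            return False  # e.g. a destroy flow already holds the lock
        _locks[key] = True
        return True

def release(group, entity_id):
    """Release a previously acquired lock (no-op if not held)."""
    with _registry_mutex:
        _locks.pop((group, entity_id), None)

# A destroy flow and an import flow contend for the same key,
# so whichever starts second is refused instead of deadlocking.
sd_id = "0c59e07d-d1f9-4e1c-9949-da4c84627d92"
assert acquire_exclusive("STORAGE", sd_id)      # destroy flow starts
assert not acquire_exclusive("STORAGE", sd_id)  # concurrent import is refused
release("STORAGE", sd_id)                       # destroy flow finishes
assert acquire_exclusive("STORAGE", sd_id)      # import can now proceed
```

The point of the fix is that a command which did not register any lock (as AddExistingFileStorageDomainCommand did not) never hits the refusal path, which is how the two flows could interleave.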
I'm changing the summary title to reflect the fix Allon referenced in his comment. The following scenarios should be supported:

1. Importing the same File Storage Domain from two different setups at the same time - should be blocked on one setup.
2. Importing the same Block Storage Domain from two different setups at the same time - should be blocked on one setup.
3. a. On both setups, open the import Block Storage Domain dialog and connect to a target with an existing Storage Domain.
   b. Pick the Storage Domain in both dialogs.
   c. On one setup, import the Storage Domain and activate it on the new Data Center.
   d. Change the name of the Storage Domain.
   e. Try to import the Storage Domain.
   The expected result is that we will be blocked by a CDA (CanDoAction) message.

verified on 13.1

RHEV-M 3.5.0 has been released, closing this bug.
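Scenario 3 hinges on the domain being identified by its UUID rather than its name, so renaming it cannot bypass the check. A hypothetical CanDoAction-style validation (illustrative names only, not the engine's real code) might look like:

```python
def can_import_domain(sd_id, attached_domain_ids):
    """Hypothetical CDA-style check: refuse to import a storage domain
    whose UUID is already attached to this setup. The UUID, not the
    display name, is the key, so renaming the domain changes nothing."""
    if sd_id in attached_domain_ids:
        return (False, "storage domain already exists in the setup")
    return (True, None)

# The domain was imported and then renamed; its UUID is unchanged.
attached = {"0c59e07d-d1f9-4e1c-9949-da4c84627d92"}
ok, reason = can_import_domain("0c59e07d-d1f9-4e1c-9949-da4c84627d92", attached)
assert not ok  # re-import of the renamed but already-attached domain is blocked
```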
Created attachment 943347 [details]
vdsm+engine logs + images

Description of problem:
Destroying a File Storage Domain and, while doing so, importing it back is allowed by the oVirt engine. It results in some kind of deadlock and a pop-up error window stating (see image): "error while executing action: internal engine error"

engine log:

```
2014-10-02 14:47:19,091 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Failed in DetachStorageDomainVDS method
2014-10-02 14:47:19,091 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Command org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=309, mMessage=Unknown pool id, pool not connected: (u'00000000-0000-0000-0000-000000000000',)]]
2014-10-02 14:47:19,092 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Could not force detach domain 0c59e07d-d1f9-4e1c-9949-da4c84627d92 on pool 00000002-0002-0002-0002-000000000008. error: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to DetachStorageDomainVDS, error = Unknown pool id, pool not connected: (u'00000000-0000-0000-0000-000000000000',), code = 309
2014-10-02 14:47:19,092 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DetachStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] FINISH, DetachStorageDomainVDSCommand, log id: 5d7757ac
2014-10-02 14:47:19,092 WARN [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Detaching Storage Domain 0c59e07d-d1f9-4e1c-9949-da4c84627d92 from its previous storage pool 00000000-0000-0000-0000-000000000000 has failed. The meta data of the Storage Domain might still indicate that it is attached to a different Storage Pool.
2014-10-02 14:47:19,094 ERROR [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Command org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand throw Vdc Bll exception. With error message VdcBLLException: null (Failed with error ENGINE and code 5001)
2014-10-02 14:47:19,173 INFO [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Command [id=85feaabf-0420-4a29-8804-e48951da47c8]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: storagePoolId = 00000002-0002-0002-0002-000000000008, storageId = 0c59e07d-d1f9-4e1c-9949-da4c84627d92.
2014-10-02 14:47:19,192 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-11) [77f4a4c0] Correlation ID: 77f4a4c0, Job ID: 08ede1d4-2d85-44ec-9caa-27f25d93c04b, Call Stack: null, Custom Event ID: -1, Message: Failed to attach Storage Domain nfs_2 to Data Center Default. (User: admin)
```

vdsm log:

```
Thread-43990::ERROR::2014-10-02 14:47:02,345::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 0c59e07d-d1f9-4e1c-9949-da4c84627d92 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'0c59e07d-d1f9-4e1c-9949-da4c84627d92',)
Thread-43990::ERROR::2014-10-02 14:47:02,356::task::866::Storage.TaskManager.Task::(_setError) Task=`56106a85-0aeb-46ae-8936-e960a85a242a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 817, in detachStorageDomain
    pool.detachSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 956, in detachSD
    dom = sdCache.produce(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 112, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'0c59e07d-d1f9-4e1c-9949-da4c84627d92',)
Thread-43990::DEBUG::2014-10-02 14:47:02,372::task::885::Storage.TaskManager.Task::(_run) Task=`56106a85-0aeb-46ae-8936-e960a85a242a`::Task._run: 56106a85-0aeb-46ae-8936-e960a85a242a (u'0c59e07d-d1f9-4e1c-9949-da4c84627d92', u'00000002-0002-0002-0002-000000000008', True, 0) {} failed - stopping task
```

After importing the domain, every operation we attempt fails.

Version-Release number of selected component (if applicable):
vt4

How reproducible:
100%

Steps to Reproduce:
1. Have a file domain which is not the master.
2. Put the domain into maintenance; while doing so, destroy the domain.
3. Import the domain back into the system.
4. Try to activate it or remove it.

Actual results:
A deadlock prevents every action on the domain.

Expected results:
This operation should be blocked.

Additional info: