Bug 1011569 - Cannot remove an iscsi storage connection not attached to any storage domain
Summary: Cannot remove an iscsi storage connection not attached to any storage domain
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-restapi
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.3.0
Assignee: Daniel Erez
QA Contact: Katarzyna Jachim
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2013-09-24 14:46 UTC by Katarzyna Jachim
Modified: 2016-02-10 17:00 UTC (History)

Fixed In Version: is18
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:
abaron: Triaged+


Attachments
vdsm.log + engine.log + server.log + db dump (1.96 MB, application/x-compressed-tar)
2013-09-24 14:48 UTC, Katarzyna Jachim


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 19753 0 None None None Never
oVirt gerrit 19806 0 None None None Never

Description Katarzyna Jachim 2013-09-24 14:46:42 UTC
Description of problem:
After an automated test failed, I cleaned my RHEVM setup (removed all storage domains, clusters, etc.) but forgot to clean up the orphaned storage connections. I re-ran the test, and it failed because of the old connections, so I cleaned the RHEVM setup again and tried to remove the orphaned connections. The removal failed with the following error:

CALL:
DELETE https://kj-rh33.rhev.lab.eng.brq.redhat.com/api/storageconnections/b001aac3-3066-4eaf-aa70-a5d217ce678e

RESPONSE:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><fault><reason>Operation Failed</reason><detail>[Cannot remove Storage Connection. Storage connection parameters are used by the following storage domains : .]</detail></fault>

As you can see, the list of storage domains in the message above is empty, and in fact there is no storage domain in my setup (checked with the GUI, the REST API and in the db).
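
For anyone scripting this cleanup, here is a minimal sketch (not part of the original report) of how the fault body above could be checked for an empty domain list. It assumes the detail text always has the shape shown in the response; the helper name blocking_domains is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Fault body copied from the RESPONSE above.
FAULT_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    "<fault><reason>Operation Failed</reason>"
    "<detail>[Cannot remove Storage Connection. Storage connection parameters "
    "are used by the following storage domains : .]</detail></fault>"
)

# Assumption: the domain list always follows this exact phrase.
MARKER = "used by the following storage domains :"


def blocking_domains(fault_xml: str) -> List[str]:
    """Return the storage domain names listed in a fault detail (hypothetical helper)."""
    root = ET.fromstring(fault_xml.encode("utf-8"))
    detail = root.findtext("detail", default="")
    if MARKER not in detail:
        return []
    tail = detail.split(MARKER, 1)[1]
    # Drop the trailing ".]" that closes the message.
    tail = tail.strip().rstrip("]").rstrip(".")
    return [name.strip() for name in tail.split(",") if name.strip()]


if __name__ == "__main__":
    print(blocking_domains(FAULT_XML))
```

An empty list together with an "Operation Failed" reason is the inconsistent state described in this report: the engine refuses the removal but names no domain.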


Version-Release number of selected component (if applicable): is15


How reproducible: happened once


Steps to Reproduce:


Actual results:
Removal of the storage connection fails.


Expected results:
It should be possible to remove the connection.


Additional info:

Comment 1 Katarzyna Jachim 2013-09-24 14:48:19 UTC
Created attachment 802271
vdsm.log + engine.log + server.log + db dump

Comment 2 Ayal Baron 2013-09-24 21:31:30 UTC
Do you have direct LUNs that may be using this connection?

Comment 3 Alissa 2013-09-25 08:02:02 UTC
Looking at the db dump, it seems that there are still leftovers mentioning this connection related to a lun in the luns-connections table:

lun_storage_server_connection_map (lun_id, storage_server_connection) FROM stdin;
1kjachim02	b001aac3-3066-4eaf-aa70-a5d217ce678e

And the LUN 1kjachim02 also still exists in the luns table, with a volumeGroupId.
A volumeGroupId indicates that a storage domain is using the LUN (and, in turn, the connection), even though in this setup, which was probably not cleaned properly, the storage domain was deleted without its LUNs being cleaned up.
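
The check described above can be sketched roughly as follows. This is only an illustration against an in-memory SQLite stand-in, not the real engine schema: the map table and its columns are taken from the dump excerpt, while the volume_group_id column name, the "vg-placeholder" value and the helper name luns_using_connection are assumptions.

```python
import sqlite3

CONNECTION_ID = "b001aac3-3066-4eaf-aa70-a5d217ce678e"

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-in tables; the real engine schema has more columns.
    CREATE TABLE luns (lun_id TEXT PRIMARY KEY, volume_group_id TEXT);
    CREATE TABLE lun_storage_server_connection_map (
        lun_id TEXT, storage_server_connection TEXT
    );
    -- 'vg-placeholder' is a made-up value; the dump's actual id is not quoted here.
    INSERT INTO luns VALUES ('1kjachim02', 'vg-placeholder');
    INSERT INTO lun_storage_server_connection_map
        VALUES ('1kjachim02', 'b001aac3-3066-4eaf-aa70-a5d217ce678e');
    """
)


def luns_using_connection(db, connection_id):
    """List (lun_id, volume_group_id) pairs that still reference a connection."""
    rows = db.execute(
        """
        SELECT l.lun_id, l.volume_group_id
        FROM lun_storage_server_connection_map AS m
        JOIN luns AS l ON l.lun_id = m.lun_id
        WHERE m.storage_server_connection = ?
        ORDER BY l.lun_id
        """,
        (connection_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for lun_id, vg in luns_using_connection(conn, CONNECTION_ID):
        print(lun_id, vg)
```

Any row returned with a non-empty volume group id is a LUN that still ties the connection to a (possibly already deleted) storage domain.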


The engine log has this (mention of domain sd_288968_2):
2013-09-24 13:31:46,236 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-1) Lock Acquired to object EngineLock [exclusiveLocks= key: null value: STORAGE_CONNECTION
key: b001aac3-3066-4eaf-aa70-a5d217ce678e value: STORAGE_CONNECTION
, sharedLocks= ]
2013-09-24 13:31:46,238 WARN  [org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-1) CanDoAction of action RemoveStorageServerConnection failed. Reasons:VAR__ACTION__REMOVE,VAR__TYPE__STORAGE__CONNECTION,$domainNames sd_288968_2,sd_288968_2,ACTION_TYPE_FAILED_STORAGE_CONNECTION_BELONGS_TO_SEVERAL_STORAGE_DOMAINS
2013-09-24 13:31:46,239 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-1) Lock freed to object EngineLock [exclusiveLocks= key: null value: STORAGE_CONNECTION
key: b001aac3-3066-4eaf-aa70-a5d217ce678e value: STORAGE_CONNECTION
, sharedLocks= ]
2013-09-24 13:31:46,297 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp-/127.0.0.1:8702-1) Operation Failed: [Cannot remove Storage Connection. Storage connection parameters are used by the following storage domains : sd_288968_2,sd_288968_2.]
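
To pull the domain names out of a CanDoAction warning like the one above, something along these lines might work. It is a heuristic sketch, not an engine utility: it assumes that message keys are all upper case, that variables start with "$", and that any other token continues the previous variable's comma-separated value. The function name parse_can_do_action is hypothetical.

```python
import re

# WARN line copied from the engine log excerpt above.
LINE = (
    "2013-09-24 13:31:46,238 WARN  "
    "[org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand] "
    "(ajp-/127.0.0.1:8702-1) CanDoAction of action RemoveStorageServerConnection "
    "failed. Reasons:VAR__ACTION__REMOVE,VAR__TYPE__STORAGE__CONNECTION,"
    "$domainNames sd_288968_2,sd_288968_2,"
    "ACTION_TYPE_FAILED_STORAGE_CONNECTION_BELONGS_TO_SEVERAL_STORAGE_DOMAINS"
)

PATTERN = re.compile(r"CanDoAction of action (\w+) failed\. Reasons:(.*)$")
KEY = re.compile(r"^[A-Z][A-Z0-9_]*$")  # heuristic: message keys are upper case


def parse_can_do_action(line):
    """Split a CanDoAction failure line into (action, message keys, variables)."""
    match = PATTERN.search(line)
    if match is None:
        return None
    action, reasons = match.group(1), match.group(2)
    keys, variables, current = [], {}, None
    for token in reasons.split(","):
        token = token.strip()
        if token.startswith("$"):
            # "$name value" starts a new variable.
            name, _, value = token[1:].partition(" ")
            current = name
            variables[name] = [value] if value else []
        elif KEY.match(token):
            keys.append(token)
            current = None
        elif current is not None and token:
            # Anything else continues the current variable's value list.
            variables[current].append(token)
    return action, keys, variables


if __name__ == "__main__":
    action, keys, variables = parse_can_do_action(LINE)
    print(action, keys[-1], sorted(set(variables["domainNames"])))
```

Deduplicating the names makes it visible that the same domain, sd_288968_2, is reported twice for this connection.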

According to the audit log, there was an attempt to remove this domain but it failed, so the removal may not have been clean and may have left the LUNs undeleted:

3ae219c3-9be6-4f69-8834-c2c326bd0970	RemoveStorageDomain	Removing Storage Domain sd_288968_2 from Data Center <UNKNOWN>	FAILED

Comment 4 Ayal Baron 2013-09-29 07:20:12 UTC
(In reply to Alissa from comment #3)

How can the user get out of this state?

Comment 5 Alissa 2013-09-29 08:02:02 UTC
I am not sure this is a typical use case. This is a test environment and I don't know how it was cleaned, or by which method the entities were deleted.

Having said that, I think that removal of a storage domain should be atomic if it isn't already.
If removal of a storage domain consists of several deletions from several db tables, it should be either all (commit) or nothing (rollback).
That way, there will be no leftovers in the db and the user will not get into this kind of situation.
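
As a rough illustration of the all-or-nothing behaviour suggested here, the sketch below uses an in-memory SQLite database. The tables are simplified stand-ins rather than the engine schema, and the names make_db, remove_domain and counts, as well as the ids 'sd1', 'lun1' and 'conn1', are hypothetical.

```python
import sqlite3


def make_db():
    """Create a toy database with one domain, one LUN and one map row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE storage_domains (id TEXT PRIMARY KEY);
        CREATE TABLE luns (lun_id TEXT PRIMARY KEY, domain_id TEXT);
        CREATE TABLE lun_map (lun_id TEXT, connection_id TEXT);
        INSERT INTO storage_domains VALUES ('sd1');
        INSERT INTO luns VALUES ('lun1', 'sd1');
        INSERT INTO lun_map VALUES ('lun1', 'conn1');
        """
    )
    return conn


def remove_domain(conn, domain_id, fail_midway=False):
    """Delete a domain and its LUN rows in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("DELETE FROM storage_domains WHERE id = ?", (domain_id,))
        if fail_midway:
            raise RuntimeError("simulated failure between deletions")
        conn.execute(
            "DELETE FROM lun_map WHERE lun_id IN "
            "(SELECT lun_id FROM luns WHERE domain_id = ?)",
            (domain_id,),
        )
        conn.execute("DELETE FROM luns WHERE domain_id = ?", (domain_id,))


def counts(conn):
    """Return row counts for (storage_domains, luns, lun_map)."""
    return tuple(
        conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in ("storage_domains", "luns", "lun_map")
    )


if __name__ == "__main__":
    db = make_db()
    try:
        remove_domain(db, "sd1", fail_midway=True)
    except RuntimeError:
        pass
    print("after failed removal:", counts(db))
    remove_domain(db, "sd1")
    print("after clean removal:", counts(db))
```

With the deletions wrapped in one transaction, a failure partway through leaves every table as it was, so no LUN or map row outlives its domain.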

Comment 6 Daniel Erez 2013-09-29 09:31:01 UTC
Hi Katarzyna,
Which API calls did you use to clean the environment? Specifically, how did you remove the sd_288968_2 storage domain?

Comment 7 Katarzyna Jachim 2013-10-01 12:51:30 UTC
from engine.log:

2013-09-24 13:35:42,151 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) START, FormatStorageDomainVDSCommand(HostName = 10.34.63.216, HostId = 44cfe251-31ba-49c1-9980-e266562d016b, storageDomainId=5c4578e2-b901-47e2-a167-bbd0457203e4), log id: 4ccc57f

I killed my test at 2013-09-24 15:33:20,261 (I can also attach the test log if you want) and tried to clean my RHEV-M setup manually: I removed everything that could be removed via the GUI and then tried to remove the remaining storage connections via the REST API.

Comment 8 Daniel Erez 2013-10-01 13:02:11 UTC
(In reply to Katarzyna Jachim from comment #7)

- Yes, please attach the test log.
- Have you used remove or force remove/destroy from the GUI?

Comment 10 Katarzyna Jachim 2013-10-23 13:56:26 UTC
I don't have an exact scenario for verification, but I haven't seen it in the newest versions, so I assume it is fixed.

Comment 11 Itamar Heim 2014-01-21 22:24:20 UTC
Closing - RHEV 3.3 Released

Comment 12 Itamar Heim 2014-01-21 22:25:11 UTC
Closing - RHEV 3.3 Released

Comment 13 Itamar Heim 2014-01-21 22:28:46 UTC
Closing - RHEV 3.3 Released

