Bug 1011569 - Cannot remove an iscsi storage connection not attached to any storage domain
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-restapi
Version: 3.3.0
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Daniel Erez
QA Contact: Katarzyna Jachim
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-09-24 10:46 EDT by Katarzyna Jachim
Modified: 2016-02-10 12:00 EST

See Also:
Fixed In Version: is18
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
abaron: Triaged+


Attachments
vdsm.log + engine.log + server.log + db dump (1.96 MB, application/x-compressed-tar)
2013-09-24 10:48 EDT, Katarzyna Jachim


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 19753 None None None Never
oVirt gerrit 19806 None None None Never

Description Katarzyna Jachim 2013-09-24 10:46:42 EDT
Description of problem:
After an automated test failed, I cleaned my RHEVM setup (removed all storage domains, clusters, etc.) but forgot to remove the orphaned storage connections. I re-ran the test and it failed because of the old connections, so I cleaned the RHEVM setup again and tried to remove the orphaned connections. The removal failed with the following error:

CALL:
DELETE https://kj-rh33.rhev.lab.eng.brq.redhat.com/api/storageconnections/b001aac3-3066-4eaf-aa70-a5d217ce678e

RESPONSE:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><fault><reason>Operation Failed</reason><detail>[Cannot remove Storage Connection. Storage connection parameters are used by the following storage domains : .]</detail></fault>

As you can see, the list of storage domains above is empty, and there is in fact no storage domain in my setup (checked with the GUI, the REST API, and in the db).
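
The empty list in the fault can be detected programmatically. The following is a minimal sketch, not part of the engine or of any client library: `parse_fault` is a hypothetical helper, and the regular expression assumes the detail text keeps the exact wording shown in the response above.

```python
import re
import xml.etree.ElementTree as ET

# Fault body copied from the RESPONSE above.
SAMPLE_FAULT = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    "<fault><reason>Operation Failed</reason>"
    "<detail>[Cannot remove Storage Connection. Storage connection parameters "
    "are used by the following storage domains : .]</detail></fault>"
)


def parse_fault(xml_text):
    """Return (reason, detail, domain_names) from a <fault> document.

    Hypothetical helper: the domain names are pulled out of the detail
    message with a regex that assumes the wording seen in this bug.
    """
    # Parse from bytes so the encoding declaration is handled by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    reason = root.findtext("reason", default="")
    detail = root.findtext("detail", default="")
    match = re.search(r"storage domains\s*:\s*(.*?)\.\]", detail)
    names = []
    if match:
        names = [n.strip() for n in match.group(1).split(",") if n.strip()]
    return reason, detail, names


reason, detail, names = parse_fault(SAMPLE_FAULT)
print(reason, names)
```

An empty `names` list together with a "Cannot remove" reason is the symptom reported here: the engine refuses the removal but cannot name a blocking storage domain in the REST response.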


Version-Release number of selected component (if applicable): is15


How reproducible: happened once


Steps to Reproduce:


Actual results:
Removal of the storage connection fails.


Expected results:
It should be possible to remove the connection.


Additional info:
Comment 1 Katarzyna Jachim 2013-09-24 10:48:19 EDT
Created attachment 802271
vdsm.log + engine.log + server.log + db dump
Comment 2 Ayal Baron 2013-09-24 17:31:30 EDT
Do you have direct LUNs that may be using this connection?
Comment 3 Alissa 2013-09-25 04:02:02 EDT
Looking at the db dump, there are still leftovers that tie this connection to a lun in the lun-to-connection map table:

lun_storage_server_connection_map (lun_id, storage_server_connection) FROM stdin;
1kjachim02	b001aac3-3066-4eaf-aa70-a5d217ce678e

The lun 1kjachim02 also still exists in the luns table, with a volumeGroupId.
A volumeGroupId indicates that a storage domain is using the lun (and therefore the connection), even though in this probably unclean setup the storage domain was deleted without cleaning up its luns.
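
The check described above can be sketched as a query. This is an illustration only: it uses an in-memory SQLite database as a stand-in for the engine db, the `luns` column names (`lun_id`, `volume_group_id`) are assumptions rather than the real schema, and the volume group value is a made-up placeholder. Only the map table name, its two columns, and the lun and connection ids come from the dump.

```python
import sqlite3


def find_lun_references(conn, connection_id):
    """Return (lun_id, volume_group_id) rows that still reference a connection."""
    query = """
        SELECT m.lun_id, l.volume_group_id
        FROM lun_storage_server_connection_map AS m
        LEFT JOIN luns AS l ON l.lun_id = m.lun_id
        WHERE m.storage_server_connection = ?
    """
    return conn.execute(query, (connection_id,)).fetchall()


# Stand-in database reproducing the leftover rows from the dump.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE luns (lun_id TEXT PRIMARY KEY, volume_group_id TEXT)")
conn.execute(
    "CREATE TABLE lun_storage_server_connection_map "
    "(lun_id TEXT, storage_server_connection TEXT)"
)
# "placeholder-vg-id" is hypothetical; the real value is not quoted in this bug.
conn.execute("INSERT INTO luns VALUES (?, ?)", ("1kjachim02", "placeholder-vg-id"))
conn.execute(
    "INSERT INTO lun_storage_server_connection_map VALUES (?, ?)",
    ("1kjachim02", "b001aac3-3066-4eaf-aa70-a5d217ce678e"),
)

rows = find_lun_references(conn, "b001aac3-3066-4eaf-aa70-a5d217ce678e")
print(rows)
```

A non-empty result with a non-null volume group id is the condition that makes the connection look as if it is still in use by a storage domain.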


The engine log has this (mention of domain sd_288968_2):
2013-09-24 13:31:46,236 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-1) Lock Acquired to object EngineLock [exclusiveLocks= key: null value: STORAGE_CONNECTION
key: b001aac3-3066-4eaf-aa70-a5d217ce678e value: STORAGE_CONNECTION
, sharedLocks= ]
2013-09-24 13:31:46,238 WARN  [org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-1) CanDoAction of action RemoveStorageServerConnection failed. Reasons:VAR__ACTION__REMOVE,VAR__TYPE__STORAGE__CONNECTION,$domainNames sd_288968_2,sd_288968_2,ACTION_TYPE_FAILED_STORAGE_CONNECTION_BELONGS_TO_SEVERAL_STORAGE_DOMAINS
2013-09-24 13:31:46,239 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-1) Lock freed to object EngineLock [exclusiveLocks= key: null value: STORAGE_CONNECTION
key: b001aac3-3066-4eaf-aa70-a5d217ce678e value: STORAGE_CONNECTION
, sharedLocks= ]
2013-09-24 13:31:46,297 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp-/127.0.0.1:8702-1) Operation Failed: [Cannot remove Storage Connection. Storage connection parameters are used by the following storage domains : sd_288968_2,sd_288968_2.]
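
The domain names can be pulled out of the CanDoAction WARN line with a small parser. This is a rough sketch under stated assumptions: `domain_names_from_reasons` is a hypothetical helper, and it relies on the heuristic that message keys such as `ACTION_TYPE_...` are all upper case while domain names are not.

```python
import re

# Heuristic: engine message keys are upper-case identifiers.
MESSAGE_KEY = re.compile(r"^[A-Z][A-Z0-9_]*$")

WARN_LINE = (
    "CanDoAction of action RemoveStorageServerConnection failed. "
    "Reasons:VAR__ACTION__REMOVE,VAR__TYPE__STORAGE__CONNECTION,"
    "$domainNames sd_288968_2,sd_288968_2,"
    "ACTION_TYPE_FAILED_STORAGE_CONNECTION_BELONGS_TO_SEVERAL_STORAGE_DOMAINS"
)


def domain_names_from_reasons(log_line):
    """Return the names listed after '$domainNames ' in a CanDoAction line."""
    marker = "$domainNames "
    start = log_line.find(marker)
    if start == -1:
        return []
    names = []
    for token in log_line[start + len(marker):].split(","):
        token = token.strip()
        # Stop at the first message key that follows the name list.
        if not token or MESSAGE_KEY.match(token):
            break
        names.append(token)
    return names


found = domain_names_from_reasons(WARN_LINE)
print(found, sorted(set(found)))
```

Applied to the line above, the helper makes it easy to see that the same domain, sd_288968_2, is listed twice.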

According to the audit log, there was an attempt to remove this domain, but it failed. That may have resulted in an unclean removal that left the luns undeleted:

3ae219c3-9be6-4f69-8834-c2c326bd0970	RemoveStorageDomain	Removing Storage Domain sd_288968_2 from Data Center <UNKNOWN>	FAILED
Comment 4 Ayal Baron 2013-09-29 03:20:12 EDT
(In reply to Alissa from comment #3)
> Looking at the db dump, it seems that there are still leftovers mentioning
> this connection related to a lun in the luns-connections table:
> 
> lun_storage_server_connection_map (lun_id, storage_server_connection) FROM
> stdin;
> 1kjachim02	b001aac3-3066-4eaf-aa70-a5d217ce678e
> 
> And the lun 1kjachim02 also still exists in the luns table, with a
> volumeGroupId. 
> volumeGroupId is an indication of the fact that storage domain is using the
> lun (and respectively - the connection) even that in this probably not clean
> setup the storage domain was deleted without cleanup of its luns.
> 
> 
> The engine log has this (mention of domain sd_288968_2):
> 2013-09-24 13:31:46,236 INFO 
> [org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand]
> (ajp-/127.0.0.1:8702-1) Lock Acquired to object EngineLock [exclusiveLocks=
> key: null value: STORAGE_CONNECTION
> key: b001aac3-3066-4eaf-aa70-a5d217ce678e value: STORAGE_CONNECTION
> , sharedLocks= ]
> 2013-09-24 13:31:46,238 WARN 
> [org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand]
> (ajp-/127.0.0.1:8702-1) CanDoAction of action RemoveStorageServerConnection
> failed.
> Reasons:VAR__ACTION__REMOVE,VAR__TYPE__STORAGE__CONNECTION,$domainNames
> sd_288968_2,sd_288968_2,
> ACTION_TYPE_FAILED_STORAGE_CONNECTION_BELONGS_TO_SEVERAL_STORAGE_DOMAINS
> 2013-09-24 13:31:46,239 INFO 
> [org.ovirt.engine.core.bll.storage.RemoveStorageServerConnectionCommand]
> (ajp-/127.0.0.1:8702-1) Lock freed to object EngineLock [exclusiveLocks=
> key: null value: STORAGE_CONNECTION
> key: b001aac3-3066-4eaf-aa70-a5d217ce678e value: STORAGE_CONNECTION
> , sharedLocks= ]
> 2013-09-24 13:31:46,297 ERROR
> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource]
> (ajp-/127.0.0.1:8702-1) Operation Failed: [Cannot remove Storage Connection.
> Storage connection parameters are used by the following storage domains :
> sd_288968_2,sd_288968_2.]
> 
> According to the audit log, there was an attempt to remove this domain but
> it failed - so this might cause not a clean removal that left the luns not
> deleted:
> 
> 3ae219c3-9be6-4f69-8834-c2c326bd0970	RemoveStorageDomain	Removing Storage
> Domain sd_288968_2 from Data Center <UNKNOWN>	FAILED

How can the user get out of this state?
Comment 5 Alissa 2013-09-29 04:02:02 EDT
I am not sure this is a typical user scenario. This is a test environment, and I don't know how it was cleaned or by which method the entities were deleted.

Having said that, I think that removal of a storage domain should be atomic, if it isn't already.
If removing a storage domain involves several deletions from several db tables, it should be all (commit) or nothing (rollback).
That way, no leftovers remain in the db and the user cannot get into this kind of situation.
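
The all-or-nothing behaviour suggested above can be illustrated with a single transaction. This is a sketch only, not the engine's implementation: it uses an in-memory SQLite database, and the `storage_domains` table, the `domain_id` column, the `sd-1` id, and the `fail_midway` switch are all hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a tiny stand-in schema with one domain, one lun, one mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE storage_domains (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE luns (lun_id TEXT PRIMARY KEY, domain_id TEXT)")
    conn.execute(
        "CREATE TABLE lun_storage_server_connection_map "
        "(lun_id TEXT, storage_server_connection TEXT)"
    )
    conn.execute("INSERT INTO storage_domains VALUES ('sd-1', 'sd_288968_2')")
    conn.execute("INSERT INTO luns VALUES ('1kjachim02', 'sd-1')")
    conn.execute(
        "INSERT INTO lun_storage_server_connection_map "
        "VALUES ('1kjachim02', 'b001aac3-3066-4eaf-aa70-a5d217ce678e')"
    )
    conn.commit()
    return conn


def remove_domain(conn, domain_id, fail_midway=False):
    """Delete a domain and its lun rows in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "DELETE FROM lun_storage_server_connection_map WHERE lun_id IN "
                "(SELECT lun_id FROM luns WHERE domain_id = ?)",
                (domain_id,),
            )
            conn.execute("DELETE FROM luns WHERE domain_id = ?", (domain_id,))
            if fail_midway:
                raise RuntimeError("simulated failure before the last delete")
            conn.execute("DELETE FROM storage_domains WHERE id = ?", (domain_id,))
        return True
    except RuntimeError:
        return False


def row_counts(conn):
    """Row counts for the three demo tables, in a fixed order."""
    tables = ("storage_domains", "luns", "lun_storage_server_connection_map")
    return [conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables]


demo = make_demo_db()
failed = remove_domain(demo, "sd-1", fail_midway=True)
counts_after_failure = row_counts(demo)
succeeded = remove_domain(demo, "sd-1")
counts_after_success = row_counts(demo)
print(failed, counts_after_failure, succeeded, counts_after_success)
```

When the simulated failure occurs, the two deletes that already ran are rolled back, so no table is left partially cleaned; only the successful run removes rows, and it removes them from all three tables.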
Comment 6 Daniel Erez 2013-09-29 05:31:01 EDT
Hi Katarzyna,
Which API calls did you use to clean the environment? Specifically, how did you remove the sd_288968_2 storage domain?
Comment 7 Katarzyna Jachim 2013-10-01 08:51:30 EDT
from engine.log:

2013-09-24 13:35:42,151 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-11) START, FormatStorageDomainVDSCommand(HostName = 10.34.63.216, HostId = 44cfe251-31ba-49c1-9980-e266562d016b, storageDomainId=5c4578e2-b901-47e2-a167-bbd0457203e4), log id: 4ccc57f

I killed my test at 2013-09-24 15:33:20,261 (I can also attach the test log if you want) and tried to clean my RHEV-M setup manually: I removed everything that could be removed via the GUI and then tried to remove the remaining storage connections via the REST API.
Comment 8 Daniel Erez 2013-10-01 09:02:11 EDT
(In reply to Katarzyna Jachim from comment #7)
> from engine.log:
> 
> 2013-09-24 13:35:42,151 INFO 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand]
> (ajp-/127.0.0.1:8702-11) START, FormatStorageDomainVDSCommand(HostName =
> 10.34.63.216, HostId = 44cfe251-31ba-49c1-9980-e266562d016b,
> storageDomainId=5c4578e2-b901-47e2-a167-bbd0457203e4), log id: 4ccc57f
> 
> I killed my test at 2013-09-24 15:33:20,261 (if you want, I may attach also
> test log) and tried to clean my RHEV-M setup manually, so removed everything
> which is possible via GUI and then tried to remove left storage connections
> via REST API.

- Yes, please attach the test log.
- Have you used remove or force remove/destroy from the GUI?
Comment 10 Katarzyna Jachim 2013-10-23 09:56:26 EDT
I don't have an exact scenario for verification, but I haven't seen it in the newest versions, so I assume it is fixed.
Comment 11 Itamar Heim 2014-01-21 17:24:20 EST
Closing - RHEV 3.3 Released
Comment 12 Itamar Heim 2014-01-21 17:25:11 EST
Closing - RHEV 3.3 Released
Comment 13 Itamar Heim 2014-01-21 17:28:46 EST
Closing - RHEV 3.3 Released
