Bug 1622072 - OpenStack didn't remove volume on instance deletion
Summary: OpenStack didn't remove volume on instance deletion
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Francois Palin
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1827413 1827416 1827419 1827420
 
Reported: 2018-08-24 11:34 UTC by Pablo Iranzo Gómez
Modified: 2023-03-21 18:58 UTC
CC: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1827413
Environment:
Last Closed: 2021-07-07 10:38:34 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1834659 0 None None None 2019-06-28 15:39:10 UTC
OpenStack gerrit 669674 0 None MERGED Add retry to cinder API calls related to volume detach 2020-10-17 21:11:59 UTC
Red Hat Issue Tracker OSP-3138 0 None None None 2022-08-23 18:48:40 UTC
Red Hat Knowledge Base (Solution) 3076931 0 None None None 2018-08-24 11:34:24 UTC

Description Pablo Iranzo Gómez 2018-08-24 11:34:25 UTC
Description of problem:

Hi
On instance deletion, the backing volume is not removed.

This sounds like https://bugzilla.redhat.com/show_bug.cgi?id=1198169, which was closed for OSP6, and matches what is described in the kbase article: https://access.redhat.com/solutions/3076931.


Version-Release number of selected component (if applicable):

openstack-cinder-9.1.4-12.el7ost.noarch

How reproducible:

When we deploy a non-ephemeral instance (i.e. creating a new volume) and select "YES" for "Delete Volume on Instance delete", it does not work properly: if we delete the instance, the volume is not removed. The status remains "In-use" and "Attached to None on /dev/vda" (a reproduction sketch follows the example below).
An example: 
abcfa1db-1748-4f04-9a29-128cf22efcc5	- 	130GiB 	In-use 	- 	Attached to None on /dev/vda
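For reference, a minimal reproduction sketch, assuming a Newton-era python-novaclient and keystoneauth1; the auth settings, flavor name, image and network IDs below are placeholders, not values from the affected cloud:

# Reproduction sketch (assumption: python-novaclient + keystoneauth1 are
# available; auth values, flavor, image and network IDs are placeholders).
from keystoneauth1 import loading, session
from novaclient import client as nova_client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://controller:5000/v3',
    username='admin', password='secret', project_name='admin',
    user_domain_name='Default', project_domain_name='Default')
nova = nova_client.Client('2.1', session=session.Session(auth=auth))

# Boot from a new 130 GiB volume and ask Nova to delete it with the instance,
# i.e. "Delete Volume on Instance delete: Yes" in Horizon.
server = nova.servers.create(
    name='bfv-test',
    image=None,
    flavor=nova.flavors.find(name='m1.small'),
    nics=[{'net-id': 'NETWORK_UUID'}],
    block_device_mapping_v2=[{
        'boot_index': 0,
        'uuid': 'IMAGE_UUID',
        'source_type': 'image',
        'destination_type': 'volume',
        'volume_size': 130,
        'delete_on_termination': True,
    }])

# ... wait for the server to become ACTIVE, then delete it ...
nova.servers.delete(server)

# Expected: the boot volume is removed together with the instance.
# Observed: the volume stays "In-use" and shows "Attached to None on /dev/vda".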

Comment 4 Gorka Eguileor 2018-08-24 14:15:26 UTC
I see errors in the volume logs caused by a missing default volume type named HBSDVSPG200. We should check the configuration to see whether this is expected, which is possible since these errors are happening on the Cinder API.

With the logs at INFO level it is hard to tell precisely what is going on, but everything points to Nova ignoring an error on the terminate connection call (so the volume is still attached) and then trying to delete the volume, which cannot be deleted since it is still attached.

The error that Nova is ignoring is Cinder timing out at the API service on what I assume is the terminate connection call, but we cannot know why, since there are no log entries on the Volume service during the minute the API waits before timing out.

We would need DEBUG log levels on the Cinder services to tell what is going on with the terminate connection.
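To make the suspected sequence concrete, here is a simplified, purely illustrative sketch of that ordering problem; this is not the actual Nova code, and cinder.terminate_connection / cinder.detach / cinder.delete merely stand in for the Cinder API calls Nova makes during instance cleanup:

import logging

LOG = logging.getLogger(__name__)

def cleanup_volume(cinder, volume_id, connector):
    try:
        # cinder-api proxies this to cinder-volume over RPC; if cinder-volume
        # never answers, the call times out after about a minute and raises.
        cinder.terminate_connection(volume_id, connector)
        cinder.detach(volume_id)
    except Exception:
        # If the failure is only logged and swallowed, Cinder still considers
        # the volume attached ("in-use")...
        LOG.warning("Ignoring error while detaching volume %s", volume_id)

    # ...so this delete is rejected, because attached volumes cannot be
    # deleted, and the volume is leaked as "Attached to None on /dev/vda".
    cinder.delete(volume_id)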

Comment 6 Lee Yarwood 2018-08-30 11:41:36 UTC
(In reply to Gorka Eguileor from comment #4)
> I see errors in the volume logs caused by a missing default volume type
> named HBSDVSPG200. We should check the configuration to see whether this
> is expected, which is possible since these errors are happening on the
> Cinder API.
> 
> With the logs at INFO level it is hard to tell precisely what is going on,
> but everything points to Nova ignoring an error on the terminate connection
> call (so the volume is still attached) and then trying to delete the
> volume, which cannot be deleted since it is still attached.

I can see the os-terminate_connection failures due to RPC timeouts to c-vol in the c-api logs, but I can't match them up to anything on the n-cpu side. Most of these appear to be successful anyway, AFAICT.

Looking at the n-cpu code in Newton, I can see how failures in os-terminate_connection could result in this behaviour, since we wouldn't then call Cinder to actually detach the volume from the server, but there's zero evidence of this happening in the logs.
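(As an aside: the upstream change later linked to this bug, "Add retry to cinder API calls related to volume detach", addresses exactly this class of failure by retrying the detach-side Cinder calls instead of giving up on the first timeout. A minimal, hedged sketch of that pattern, not the merged patch itself:)

import time

def call_with_retries(func, *args, retries=3, delay=5, **kwargs):
    """Retry a flaky Cinder API call a few times before giving up.

    Illustrative only; the retry count, delay and the exceptions caught
    would need to match the real client behaviour.
    """
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise  # surface the failure instead of silently leaking the volume
            time.sleep(delay)

# e.g. call_with_retries(cinder.terminate_connection, volume_id, connector)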

> The error that Nova is ignoring is Cinder timing out at the API service on
> what I assume is the terminate connection call, but we cannot know why,
> since there are no log entries on the Volume service during the minute the
> API waits before timing out.
> 
> We would need DEBUG log levels on the Cinder services to tell what is
> going on with the terminate connection.

Pablo, can we get DEBUG logs from Nova and Cinder, along with an example instance UUID so I can trace this? The example UUID in c#0 isn't present anywhere in the sosreports.

Comment 10 Alex Stupnikov 2018-12-10 08:44:30 UTC
Hello. May I ask you to update this bug and let me know whether there is anything support can provide for you? BR, Alex.

