Description of problem:

Hi. On instance deletion, the backing volume is not removed. This sounded like https://bugzilla.redhat.com/show_bug.cgi?id=1198169, which was closed for OSP6, and like what is described in the kbase article: https://access.redhat.com/solutions/3076931.

Version-Release number of selected component (if applicable):
openstack-cinder-9.1.4-12.el7ost.noarch

How reproducible:
When we deploy a non-ephemeral instance (i.e. creating a new volume) and select "Yes" for "Delete Volume on Instance Delete", it does not work properly: if we delete the instance, the volume is not removed. Its status remains "In-use" and "Attached to None on /dev/vda". An example:

abcfa1db-1748-4f04-9a29-128cf22efcc5 - 130GiB - In-use - Attached to None on /dev/vda
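For reference, this is roughly how we drive the reproduction, shown here as a sketch with openstacksdk's cloud layer rather than Horizon. The cloud name, image, flavor, network names and the 60-second wait are placeholders for our environment, not values taken from this report.

# Reproduction sketch (assumptions: clouds.yaml entry "overcloud", placeholder
# image/flavor/network names). Boots from a new volume with delete-on-terminate,
# deletes the server, then checks whether the backing volume survived.
import time
import openstack

conn = openstack.connect(cloud='overcloud')

server = conn.create_server(
    name='bz-repro',
    image='rhel-7',            # placeholder image name
    flavor='m1.medium',        # placeholder flavor name
    network='private',         # placeholder network name
    boot_from_volume=True,
    volume_size=130,
    terminate_volume=True,     # "Delete Volume on Instance Delete = Yes"
    wait=True,
)

# Record the volume(s) attached to the server before deleting it.
attached = [v for v in conn.list_volumes()
            if any(a.get('server_id') == server.id for a in v['attachments'])]

conn.delete_server(server.id, wait=True)
time.sleep(60)  # volume deletion happens asynchronously after the server is gone

for vol in attached:
    refreshed = conn.get_volume(vol['id'])
    if refreshed is None:
        print('%s: deleted as expected' % vol['id'])
    else:
        # On the affected system this still shows "in-use", attached to None.
        print('%s: still present, status=%s' % (vol['id'], refreshed['status']))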
I see errors in the volume logs caused by a missing default volume type named HBSDVSPG200. We should check the configuration to see if this is expected, which is possible since these errors are happening on the Cinder API.

With the logs at INFO level it is hard to tell precisely what is going on, but everything points to Nova ignoring an error on the terminate-connection call (so the volume is still attached) and then trying to delete the volume, which cannot be deleted while it is still attached.

The error that Nova is ignoring is Cinder timing out at the API service on what I assume is the terminate-connection call, but we cannot know why, since there are no log entries from the Volume service during the minute the API waits before timing out. We would need DEBUG log level on the Cinder services to tell what is going on with the terminate connection.
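To make the suspected sequence concrete, here is a small illustrative sketch of the pattern described above. This is not Nova's actual code; the stub client and exception names are made up. It only shows why swallowing the terminate-connection failure leaves the volume attached and makes the subsequent delete a no-op.

# Illustrative only: a stub "Cinder client" mimicking the observed behaviour
# (terminate_connection times out, delete refuses while the volume is attached).
import logging

LOG = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


class APITimeout(Exception):
    """Stand-in for the timeout seen at the Cinder API while waiting on c-vol."""


class VolumeStillAttached(Exception):
    """Stand-in for the error Cinder returns when deleting an in-use volume."""


class StubCinder:
    def terminate_connection(self, volume_id, connector):
        # Simulates the c-api call timing out before c-vol ever answers.
        raise APITimeout('timed out waiting for the volume service')

    def delete_volume(self, volume_id):
        # The attachment was never cleaned up, so the volume is still in-use.
        raise VolumeStillAttached('volume %s is still attached' % volume_id)


def cleanup_instance_volume(cinder, volume_id):
    try:
        cinder.terminate_connection(volume_id, connector={'host': 'compute-0'})
    except Exception:
        # The suspected problem: the failure is logged and ignored, so Cinder
        # never detaches the volume.
        LOG.exception('Ignoring terminate_connection failure for %s', volume_id)
    try:
        # delete_on_termination path: fails because the volume is still in-use,
        # and the volume is simply left behind.
        cinder.delete_volume(volume_id)
    except VolumeStillAttached:
        LOG.exception('Volume %s left in-use after instance delete', volume_id)


if __name__ == '__main__':
    cleanup_instance_volume(StubCinder(), 'abcfa1db-1748-4f04-9a29-128cf22efcc5')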
(In reply to Gorka Eguileor from comment #4)
> I see errors in the volume logs caused by a missing default volume type
> named HBSDVSPG200. We should check the configuration to see if this is
> expected, which is possible since these errors are happening on the
> Cinder API.
>
> With the logs at INFO level it is hard to tell precisely what is going on,
> but everything points to Nova ignoring an error on the terminate-connection
> call (so the volume is still attached) and then trying to delete the
> volume, which cannot be deleted while it is still attached.

I can see the os-terminate_connection failures due to RPC timeouts to c-vol in the c-api logs, but I can't match them up to anything on the n-cpu side. Most of these appear to be successful anyway, AFAICT. Looking at the n-cpu code in Newton, I can see how failures in os-terminate_connection could result in this behaviour, as we wouldn't then call Cinder to actually detach the volume from the server, but there's zero evidence of this happening in the logs.

> The error that Nova is ignoring is Cinder timing out at the API service on
> what I assume is the terminate-connection call, but we cannot know why,
> since there are no log entries from the Volume service during the minute
> the API waits before timing out.
>
> We would need DEBUG log level on the Cinder services to tell what is going
> on with the terminate connection.

Pablo, can we get DEBUG logs from Nova and Cinder, along with an example instance UUID so I can trace this? The example UUID in c#0 isn't present anywhere in the sosreports.
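For when the logs arrive, this is roughly how I'd correlate the os-terminate_connection calls with a given instance/volume UUID across the sosreports. The log paths below are the usual RHOSP defaults and are an assumption; adjust them to wherever the sosreport unpacks.

# Rough log-correlation helper: prints every line mentioning the given UUID or
# the os-terminate_connection action, with file and line number, so the c-api
# timeouts can be lined up against the n-cpu side.
import sys

LOG_FILES = [
    'var/log/cinder/api.log',
    'var/log/cinder/volume.log',
    'var/log/nova/nova-compute.log',
]


def trace(uuid):
    for path in LOG_FILES:
        try:
            with open(path, errors='replace') as fh:
                for lineno, line in enumerate(fh, 1):
                    if uuid in line or 'os-terminate_connection' in line:
                        print('%s:%d: %s' % (path, lineno, line.rstrip()))
        except OSError as exc:
            print('skipping %s: %s' % (path, exc), file=sys.stderr)


if __name__ == '__main__':
    trace(sys.argv[1])  # pass the instance or volume UUID to trace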
Hello. May I ask you to update this bug and let me know whether there is anything support can provide for you? BR, Alex.