Bug 1847958 - [OSP 17] n-api should reject user requests to detach volumes from instances hosted on down computes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 17.0
Assignee: Lee Yarwood
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Duplicates: 1873563
Depends On:
Blocks:
 
Reported: 2020-06-17 12:41 UTC by David Vallee Delisle
Modified: 2024-10-01 16:39 UTC
CC List: 12 users

Fixed In Version: openstack-nova-23.0.3-0.20210908140341.e39bbdc.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:10:46 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1909120 0 None None None 2020-12-23 14:53:58 UTC
OpenStack gerrit 768351 0 None NEW Add regression test for bug #1909120 2021-01-10 07:32:45 UTC
OpenStack gerrit 768352 0 None NEW api: Reject requests to detach a volume when the compute is down 2021-01-10 07:32:45 UTC
Red Hat Issue Tracker OSP-5984 0 None None None 2021-11-18 14:36:37 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:11:51 UTC

Description David Vallee Delisle 2020-06-17 12:41:37 UTC
Description of problem:
When a tenant tries to remove their overcloud stack and one of the computes is down, the volume attachment resource can't be deleted because the RPC request is never consumed.

Nova should realize that the compute is down before sending this RPC request and possibly react differently instead of timing out.

Version-Release number of selected component (if applicable):
openstack-nova-common-14.1.0-22.el7ost.noarch

How reproducible:
All the time

Steps to Reproduce:
1. Spawn a stack with an instance attached to a volume
2. Shutdown the compute
3. Delete the stack

Actual results:
Stack fails to delete

Expected results:
Stack should delete

Additional info:

Comment 1 Artom Lifshitz 2020-06-19 15:09:01 UTC
Detaching a volume when the instance's compute host is down is expected to result in an error, so that part is normal.

We discussed this on our bug triage call, and our suggestion is to use the local instance delete functionality. This allows you to delete an instance even if the compute host is down. Do not attempt to detach any volumes from it; just delete the instance. We're not sure how this can be achieved via Heat, but that would be our suggestion for a workaround. I'm going to aggressively close this as NOTABUG (without prejudice), but feel free to re-open it, either to pass it to DFG:DF for input around Heat, or if it does turn out that there's something actionable for the Nova team here.
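
For reference, a minimal sketch of the suggested workaround using openstacksdk (the cloud name and server UUID below are placeholders and the use of openstacksdk is illustrative; the point is simply to delete the server without detaching volumes first):

~~~
# Delete the server directly instead of detaching its volumes first; if the
# compute host is down, Nova's local-delete path cleans up from the API side.
import openstack

conn = openstack.connect(cloud="overcloud")  # placeholder cloud name

server_id = "<instance-uuid>"  # placeholder
conn.compute.delete_server(server_id)
~~~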

Comment 2 Artom Lifshitz 2020-07-06 14:37:38 UTC
I was asked in an email to clarify what I meant by local delete. It's part of the regular delete workflow. It triggers when the controller can't reach the instance's compute host (or the instance has no compute host because it's shelved offloaded). You can see the code here [1] as part of the _delete() method. In other words, nothing special needs to be done from the user's point of view - just delete the instance as you normally would, and if the compute host is down, Nova will do the right thing.

[1] https://opendev.org/openstack/nova/src/tag/newton-eol/nova/compute/api.py#L1808-L1811
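
Paraphrased, the decision at that point looks roughly like this (a simplified sketch; the names mirror Nova's internals, but this is not the literal code at [1]):

~~~
def delete_instance(context, instance, service, servicegroup_api,
                    compute_rpcapi, local_delete):
    """Delete via the compute if its service is up, otherwise delete locally."""
    if instance.host and servicegroup_api.service_is_up(service):
        # Normal path: RPC-cast the terminate down to the hosting nova-compute.
        compute_rpcapi.terminate_instance(context, instance)
    else:
        # Host down, or no host because the instance is shelved offloaded:
        # clean up ports and volume attachments from the API side and remove
        # the instance record without talking to the dead compute.
        local_delete(context, instance)
~~~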

Comment 3 David Vallee Delisle 2020-08-17 17:37:10 UTC
Reopening this bug passively because I still have some questions here:

Reproduced this on OSP13.

So we have two different behaviors here.
1) deleting a stack hosted by a down compute node
- When we delete the stack instead of deleting the server, heat deletes the volume attachment first, by sending a DELETE to the nova-api [1]
- nova-api casts the detach onto the compute's queue and tells Cinder "OK, it's done, I deleted it" without any validation (this path is sketched after the log excerpts below).
- This results in Cinder polling the volume attachment API every 2 seconds and marking the volume as "deleting"; at some point Heat times out and it's mad, and now we need all kinds of manual manipulation to fix the situation.

2) deleting a VM hosted on a down compute node.
- When we delete the server instead of deleting just the volume attachment, we behave differently: we notice that the nova-compute service is down, tell Cinder to remove the attachment [2], and run the local delete routine.

When detaching a volume, shouldn't nova know that the compute is down and act accordingly instead?

Thanks,

DVD

[1]
~~~
../nova/nova-api.log:2020-08-17 17:11:32.883 19 DEBUG nova.compute.api [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] [instance: b0025a0e-8840-43af-b17f-b41d30182f4f] Fetching instance by UUID get /usr/lib/python2.7/site-packages/nova/compute/api.py:2391
../nova/nova-api.log:2020-08-17 17:11:32.895 19 DEBUG oslo_concurrency.lockutils [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] Lock "5afc2764-9b62-411a-8278-0e979ae996c5" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
../nova/nova-api.log:2020-08-17 17:11:32.895 19 DEBUG oslo_concurrency.lockutils [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] Lock "5afc2764-9b62-411a-8278-0e979ae996c5" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
../nova/nova-api.log:2020-08-17 17:11:32.939 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] REQ: curl -g -i -X GET http://172.17.0.19:8776/v3/9ebe06a47c334a17b6269f77010b883a/volumes/e03cafa8-16d2-4465-b299-09e4130aac09 -H "X-Service-Token: {SHA1}e987b01e170291c3131e099356c190b51da246cb" -H "User-Agent: python-cinderclient" -H "X-OpenStack-Request-ID: req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}68f26937487515f24e6466a578d5ee2f0f1fabc8" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:372
../nova/nova-api.log:2020-08-17 17:11:33.494 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] RESP: [200] Date: Mon, 17 Aug 2020 17:11:32 GMT Server: Apache x-compute-request-id: req-8141408c-78e9-4a59-9402-11f7113b3f98 OpenStack-API-Version: volume 3.0 Vary: OpenStack-API-Version,Accept-Encoding x-openstack-request-id: req-8141408c-78e9-4a59-9402-11f7113b3f98 Content-Encoding: gzip Content-Length: 618 Content-Type: application/json
../nova/nova-api.log:2020-08-17 17:11:33.494 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] GET call to cinderv3 for http://172.17.0.19:8776/v3/9ebe06a47c334a17b6269f77010b883a/volumes/e03cafa8-16d2-4465-b299-09e4130aac09 used request id req-8141408c-78e9-4a59-9402-11f7113b3f98 request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:722
../nova/nova-api.log:2020-08-17 17:11:33.507 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] REQ: curl -g -i -X POST http://172.17.0.19:8776/v3/9ebe06a47c334a17b6269f77010b883a/volumes/e03cafa8-16d2-4465-b299-09e4130aac09/action -H "X-Service-Token: {SHA1}e987b01e170291c3131e099356c190b51da246cb" -H "Accept: application/json" -H "User-Agent: python-cinderclient" -H "X-Auth-Token: {SHA1}68f26937487515f24e6466a578d5ee2f0f1fabc8" -H "Content-Type: application/json" -H "X-OpenStack-Request-ID: req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80" -d '{"os-begin_detaching": null}' _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:372
../nova/nova-api.log:2020-08-17 17:11:34.050 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] RESP: [202] Date: Mon, 17 Aug 2020 17:11:33 GMT Server: Apache x-compute-request-id: req-9b8a0910-a7bb-456d-88a9-2d059b235533 OpenStack-API-Version: volume 3.0 Vary: OpenStack-API-Version x-openstack-request-id: req-9b8a0910-a7bb-456d-88a9-2d059b235533 Content-Length: 0 Content-Type: application/json _http_log_response /usr/lib/python2.7/site-packages/keystoneauth1/session.py:419
../nova/nova-api.log:2020-08-17 17:11:34.050 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] POST call to cinderv3 for http://172.17.0.19:8776/v3/9ebe06a47c334a17b6269f77010b883a/volumes/e03cafa8-16d2-4465-b299-09e4130aac09/action used request id req-9b8a0910-a7bb-456d-88a9-2d059b235533 request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:722
../nova/nova-api.log:2020-08-17 17:11:34.061 19 INFO nova.api.openstack.requestlog [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] 172.17.0.12 "DELETE /v2.1/servers/b0025a0e-8840-43af-b17f-b41d30182f4f/os-volume_attachments/e03cafa8-16d2-4465-b299-09e4130aac09" status: 202 len: 0 microversion: 2.1 time: 1.437265
~~~

[2]
~~~
nova-api.log:2020-08-17 16:43:08.543 3227 DEBUG nova.api.openstack.wsgi [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Calling method '<bound method ServersController.delete of <nova.api.openstack.compute.servers.ServersController object at 0x7f55def7a5d0>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:636
nova-api.log:2020-08-17 16:43:08.543 3227 DEBUG nova.compute.api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] [instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] Fetching instance by UUID get /usr/lib/python2.7/site-packages/nova/compute/api.py:2223
nova-api.log:2020-08-17 16:43:08.602 3227 DEBUG nova.compute.api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] [instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] Going to try to terminate instance delete /usr/lib/python2.7/site-packages/nova/compute/api.py:2056
nova-api.log:2020-08-17 16:43:08.662 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Attempting to reserve resources for project e2e5b97cb65743d5883ff8640fb3a29f and user 7d6250472d844e2f919a12cc34494e53. Deltas: {'instances': -1, 'ram': -4096, 'cores': -2} reserve /usr/lib/python2.7/site-packages/nova/quota.py:510
nova-api.log:2020-08-17 16:43:08.665 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Quota limits for project e2e5b97cb65743d5883ff8640fb3a29f: {'project_id': u'e2e5b97cb65743d5883ff8640fb3a29f'} reserve /usr/lib/python2.7/site-packages/nova/quota.py:519
nova-api.log:2020-08-17 16:43:08.666 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Getting quotas for project e2e5b97cb65743d5883ff8640fb3a29f. Resources: ['instances', 'ram', 'cores'] _get_quotas /usr/lib/python2.7/site-packages/nova/quota.py:369
nova-api.log:2020-08-17 16:43:08.669 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Quotas for project e2e5b97cb65743d5883ff8640fb3a29f after resource sync: {'instances': 10, 'ram': 51200, 'cores': 20} reserve /usr/lib/python2.7/site-packages/nova/quota.py:525
nova-api.log:2020-08-17 16:43:08.670 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Getting quotas for user 7d6250472d844e2f919a12cc34494e53 and project e2e5b97cb65743d5883ff8640fb3a29f. Resources: ['instances', 'ram', 'cores'] _get_quotas /usr/lib/python2.7/site-packages/nova/quota.py:361
nova-api.log:2020-08-17 16:43:08.676 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Quotas for project e2e5b97cb65743d5883ff8640fb3a29f and user 7d6250472d844e2f919a12cc34494e53 after resource sync: {'instances': 10, 'ram': 51200, 'cores': 20} reserve /usr/lib/python2.7/site-packages/nova/quota.py:533
nova-api.log:2020-08-17 16:43:08.686 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Created reservations ['2e5cbecf-aeba-461e-88a1-638aacc7af0c', '8210e834-56ff-4a47-85cb-4fc810711946', 'b9fb5bbe-d3d2-4990-80c3-4bb6226dfcb8'] reserve /usr/lib/python2.7/site-packages/nova/quota.py:1377
nova-api.log:2020-08-17 16:43:08.690 3227 DEBUG nova.servicegroup.drivers.db [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Seems service nova-compute on host ess10z9-scpu-0.gsslab.rdu2.redhat.com is down. Last heartbeat was 2020-08-17 16:39:30. Elapsed time is 218.690463 is_up /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:81
nova-api.log:2020-08-17 16:43:08.690 3227 WARNING nova.compute.api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] [instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] instance's host ess10z9-scpu-0.gsslab.rdu2.redhat.com is down, deleting from database
nova-api.log:2020-08-17 16:43:08.691 3227 DEBUG nova.network.neutronv2.api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] [instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] deallocate_for_instance() deallocate_for_instance /usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py:1171
nova-api.log:2020-08-17 16:43:08.868 3227 DEBUG neutronclient.v2_0.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] GET call to neutron for http://172.17.0.11:9696/v2.0/ports.json?device_id=84e86ae4-929c-4d0b-8ec7-54e29d5aa471 used request id req-f34d8463-c5fb-421b-927a-ef3a54b5fe46 _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:128
nova-api.log:2020-08-17 16:43:08.875 3227 DEBUG neutronclient.v2_0.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] GET call to neutron for http://172.17.0.11:9696/v2.0/extensions.json used request id req-2faaf9c9-4ab6-45d4-b6c1-4ffa7b025e46 _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:128
nova-api.log:2020-08-17 16:43:09.406 3227 DEBUG neutronclient.v2_0.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] DELETE call to neutron for http://172.17.0.11:9696/v2.0/ports/cd74eec6-f098-4bff-8321-ffdb65603742.json used request id req-6fe54be9-54ef-4e5f-9281-49f0151216f0 _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:128
nova-api.log:2020-08-17 16:43:09.417 3227 DEBUG nova.network.base_api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] [instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] Updating instance_info_cache with network_info: [] update_instance_cache_with_nw_info /usr/lib/python2.7/site-packages/nova/network/base_api.py:43
nova-api.log:2020-08-17 16:43:09.434 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] REQ: curl -g -i -X POST http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1/action -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}2a014b83af84c48a76b3c23ce31ee867259f829f" -d '{"os-terminate_connection": {"connector": {"initiator": "iqn.1994-05.com.redhat:a63d1c42ce1e", "ip": "172.17.0.18", "platform": "x86_64", "host": "ess10z9-scpu-0.gsslab.rdu2.redhat.com", "do_local_attach": false, "os_type": "linux2", "multipath": false}}}' _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:342
nova-api.log:2020-08-17 16:43:11.310 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] RESP: [202] Content-Type: text/html; charset=UTF-8 Content-Length: 0 X-Openstack-Request-Id: req-34e0ca3e-6b88-47eb-9b3f-4552dd17d8f6 Date: Mon, 17 Aug 2020 16:43:11 GMT
nova-api.log:2020-08-17 16:43:11.311 3227 DEBUG cinderclient.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] POST call to cinderv2 for http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1/action used request id req-34e0ca3e-6b88-47eb-9b3f-4552dd17d8f6 _log_request_id /usr/lib/python2.7/site-packages/cinderclient/client.py:95
nova-api.log:2020-08-17 16:43:11.314 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] REQ: curl -g -i -X GET http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1 -H "User-Agent: python-cinderclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}2a014b83af84c48a76b3c23ce31ee867259f829f" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:342
nova-api.log:2020-08-17 16:43:11.538 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] RESP: [200] X-Compute-Request-Id: req-1b7b070b-1ee2-4a66-93eb-1bf276860d71 Content-Type: application/json Content-Length: 1380 X-Openstack-Request-Id: req-1b7b070b-1ee2-4a66-93eb-1bf276860d71 Date: Mon, 17 Aug 2020 16:43:11 GMT
nova-api.log:2020-08-17 16:43:11.538 3227 DEBUG cinderclient.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] GET call to cinderv2 for http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1 used request id req-1b7b070b-1ee2-4a66-93eb-1bf276860d71 _log_request_id /usr/lib/python2.7/site-packages/cinderclient/client.py:95
nova-api.log:2020-08-17 16:43:11.539 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] REQ: curl -g -i -X POST http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1/action -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}2a014b83af84c48a76b3c23ce31ee867259f829f" -d '{"os-detach": {"attachment_id": null}}' _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:342
nova-api.log:2020-08-17 16:43:14.036 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] RESP: [202] Content-Type: text/html; charset=UTF-8 Content-Length: 0 X-Openstack-Request-Id: req-959f48aa-2781-4d9d-a4aa-174729ca9f86 Date: Mon, 17 Aug 2020 16:43:14 GMT
nova-api.log:2020-08-17 16:43:14.037 3227 DEBUG cinderclient.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] POST call to cinderv2 for http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1/action used request id req-959f48aa-2781-4d9d-a4aa-174729ca9f86 _log_request_id /usr/lib/python2.7/site-packages/cinderclient/client.py:95
nova-api.log:2020-08-17 16:43:14.166 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Committed reservations [u'2e5cbecf-aeba-461e-88a1-638aacc7af0c', u'8210e834-56ff-4a47-85cb-4fc810711946', u'b9fb5bbe-d3d2-4990-80c3-4bb6226dfcb8'] commit /usr/lib/python2.7/site-packages/nova/quota.py:1403
nova-api.log:2020-08-17 16:43:14.167 3227 INFO nova.osapi_compute.wsgi.server [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] 172.17.0.22 "DELETE /v2.1/servers/84e86ae4-929c-4d0b-8ec7-54e29d5aa471 HTTP/1.1" status: 204 len: 317 time: 5.6299770
~~~
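
For comparison, the volume-attachment DELETE path from the logs in [1] boils down to roughly this (a simplified sketch, not the actual nova code):

~~~
def delete_volume_attachment(context, instance, volume,
                             cinder_api, compute_rpcapi):
    # n-api tells Cinder the detach is starting ("os-begin_detaching" above)...
    cinder_api.begin_detaching(context, volume['id'])
    # ...then fire-and-forget casts to the instance's nova-compute and returns
    # 202 immediately, without checking whether that compute is even up. With
    # the host down the message is never consumed, Cinder keeps polling the
    # attachment, and Heat eventually times out.
    compute_rpcapi.detach_volume(context, instance=instance, volume=volume)
~~~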

Comment 4 Lee Yarwood 2020-08-17 21:23:14 UTC
(In reply to David Vallee Delisle from comment #3)
> Reopening this bug passively because I still have some questions here:
> 
> Reproduced this on OSP13.
> 
> So we have two different behaviors here.
> 1) deleting a stack hosted by a down compute node
> - When we delete the stack instead of deleting the server, heat deletes the
> volume attachment first, by sending a DELETE to the nova-api [1]

^ This confused me at first, as "the attachments API" is a Cinder API, but here you're talking about os-volume_attachments within n-api: Heat is attempting to detach the volume rather than delete the entire instance, as covered below.

> - nova-api posts on the compute's Q and we tell Cinder: Ok it's done, I
> deleted it, without any validation.
> - This results in cinder poking the volume attachment api every 2 seconds,
> marking the volume as "deleting" and at some point, heat times out and it's
> mad, now we need to do all kind of manipulation to fix this situation.

So to me this is a bug: we shouldn't be casting to a dead n-cpu and returning 202 from n-api [3] when we time out.

We should either return 500 from n-api because n-cpu is down *or* handle the attachment delete locally in n-api.

> 2) deleting a VM hosted on a down compute node.
> - When we delete the server instead of deleting just the volume attachment,
> we work differently. We notice that the nova-compute service is down and we
> just tell cinder to remove the attachment [2] and run the local delete
> routine.
> 
> When detaching a volume, shouldn't nova know that the compute is down and
> act accordingly instead?

Yup agreed, I'll take a look at this shortly.

[3] https://github.com/openstack/nova/blob/2cc051137692752983f2577e2d259a02be893f75/nova/api/openstack/compute/volumes.py#L501-L545
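
A minimal sketch of the kind of check being discussed, i.e. fail fast in n-api instead of casting to a dead compute (the helper names and the choice of a 409 response are assumptions for illustration, not the final patch; see the linked Gerrit change for what was actually merged):

~~~
from webob import exc


def detach_volume_or_reject(context, instance, volume, service,
                            servicegroup_api, compute_api):
    if not servicegroup_api.service_is_up(service):
        # Reject up front rather than casting into a queue nobody consumes.
        # Whether this should be a 409 or a 500 was still open at this point;
        # 409 is assumed here purely for illustration.
        raise exc.HTTPConflict(
            explanation="Cannot detach volume %s: compute host %s is down"
                        % (volume['id'], instance.host))
    compute_api.detach_volume(context, instance, volume)
~~~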

Comment 5 Lee Yarwood 2020-08-17 22:07:54 UTC
My bad, n-api actually returns 500 correctly in this case as shown by the following func test failure:

Traceback (most recent call last):
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/api/openstack/wsgi.py", line 671, in wrapped
    return f(*args, **kwargs)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/api/openstack/compute/volumes.py", line 536, in delete
    self.compute_api.detach_volume(context, instance, volume)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/compute/api.py", line 227, in inner
    return function(self, context, instance, *args, **kwargs)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/compute/api.py", line 154, in inner
    return f(self, context, instance, *args, **kw)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/compute/api.py", line 4727, in detach_volume
    self.compute_rpcapi.detach_volume(context, instance=instance,
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/compute/rpcapi.py", line 655, in detach_volume
    cctxt.cast(ctxt, 'detach_volume',
  File "/home/lyarwood/redhat/devel/sr
    c/openstack/nova/.tox/functional/lib/python3.8/site-packages/fixtures/_fixtures/monkeypatch.py", line 89, in avoid_get
    return captured_method(*args, **kwargs)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/.tox/functional/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", line 175, in call
    self.transport._send(self.target, msg_ctxt, msg,
  File "/home/lyarwood/redhat/devel/src/openstack/nova/.tox/functional/lib/python3.8/site-packages/oslo_messaging/transport.py", line 123, in _send
    return self._driver.send(target, ctxt, message,
  File "/home/lyarwood/redhat/devel/src/openstack/nova/.tox/functional/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_fake.py", line 222, in send
    return self._send(target, ctxt, message, wait_for_reply, timeout,
  File "/home/lyarwood/redhat/devel/src/openstack/nova/.tox/functional/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_fake.py", line 213, in _send
    raise oslo_messaging.MessagingTimeout(
oslo_messaging.exceptions.MessagingTimeout: No reply on topic compute
2020-08-17 23:03:45,457 INFO [nova.api.openstack.wsgi] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_messaging.exceptions.MessagingTimeout'>
2020-08-17 23:03:45,459 INFO [nova.api.openstack.requestlog] 127.0.0.1 "DELETE /v2.1/servers/e6792086-d564-4f2b-82fa-46578a3b4687/os-volume_attachments/6ca404f3-d844-4169-bb96-bc792f37de98" status: 500 len: 216 microversion: 2.87 time: 60.029035

I'm still on the fence about even attempting to cast down if we know the compute is down, tbh; I'll bring this up during triage later in the week.

Comment 6 Lee Yarwood 2020-08-26 19:57:15 UTC
Apologies I forgot to update the bug after triage last Friday.

n-api in c#5 returned a 500 due to the use of the CastAsCallFixture in my functional test, so please ignore that.
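
For context, the "cast as call" idea looks roughly like this (a sketch, not nova's actual CastAsCallFixture): every RPC cast is monkeypatched into a synchronous call, which is why the functional test surfaced a MessagingTimeout that a real fire-and-forget cast would never report back to the API:

~~~
import fixtures
from oslo_messaging.rpc import client as rpc_client


class CastAsCallSketch(fixtures.Fixture):
    def _setUp(self):
        # Point _BaseCallContext.cast at .call so casts block and raise.
        self.useFixture(fixtures.MonkeyPatch(
            'oslo_messaging.rpc.client._BaseCallContext.cast',
            rpc_client._BaseCallContext.call))
~~~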

It was agreed on the call that I'd use this bug to track a bugfix into Nova to have n-api return 500 if the n-cpu hosting the instance is down.

I've also opened the following RFE to introduce a new /servers/{server_id}/os-volume_attachments/{volume_id}/action API that can force detach volumes while a compute is marked as down:

[OSP 17][RFE] Introduce a force detach volume API
https://bugzilla.redhat.com/show_bug.cgi?id=1872882

This admin-only API should only be used once the compute host has been fenced and will do a local detach in the API, removing the host volume attachments in Cinder.
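
Purely as an illustration of how such an admin-only action might be invoked (the action name and request body below are hypothetical; the RFE above defines the real semantics, if and when it lands):

~~~
import requests


def force_detach(endpoint, token, server_id, volume_id):
    # Hypothetical action body; "force_detach" is not an existing Nova action.
    url = ("%s/servers/%s/os-volume_attachments/%s/action"
           % (endpoint, server_id, volume_id))
    resp = requests.post(
        url,
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        json={"force_detach": None},
    )
    resp.raise_for_status()
~~~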

Marking this bug as triaged; I'll hopefully find time to push something for this shortly.

Comment 8 Lee Yarwood 2020-09-11 13:07:02 UTC
*** Bug 1873563 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2022-09-21 12:10:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

