Description of problem:
When a tenant tries to remove their overcloud stack and one of the computes is down, the volume detachment resource can't be deleted because the RPC request is never consumed. Nova should realize that the compute is down before sending this RPC request and possibly react differently instead of timing out.

Version-Release number of selected component (if applicable):
openstack-nova-common-14.1.0-22.el7ost.noarch

How reproducible:
All the time

Steps to Reproduce:
1. Spawn a stack with an instance attached to a volume
2. Shut down the compute
3. Delete the stack

Actual results:
Stack fails to delete

Expected results:
Stack should delete

Additional info:
Detaching a volume when the instance's compute host is down is expected to result in an error, so that part is normal. We discussed this on our bug triage call, and our suggestion is to use the local instance delete functionality, which allows you to delete an instance even when its compute host is down: don't attempt to detach any volumes from it; just delete the instance. We're not sure how this can be achieved via Heat, but that would be our suggested workaround.

I'm going to aggressively close this as NOTABUG (without prejudice), but feel free to re-open it, either to pass it to DFG:DF for input around Heat, or if it does turn out that there's something actionable for the Nova team here.
I was asked in an email to clarify what I meant by local delete. It's part of the regular delete workflow. It triggers when the controller can't reach the instance's compute host (or the instance has no compute host because it's shelved offloaded). You can see the code here [1] as part of the _delete() method. In other words, nothing special needs to be done from the user's point of view - just delete the instance as you normally would, and if the compute host is down, Nova will do the right thing.

[1] https://opendev.org/openstack/nova/src/tag/newton-eol/nova/compute/api.py#L1808-L1811
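To make the above concrete, here is a minimal, self-contained sketch of the decision the linked _delete() code makes. The class and function names are illustrative stand-ins, not the real Nova objects:

~~~
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    uuid: str
    host: Optional[str]  # compute host name, or None when shelved offloaded


class ServiceGroup:
    """Stand-in for nova.servicegroup: reports whether a compute is alive."""

    def __init__(self, live_hosts):
        self._live = set(live_hosts)

    def service_is_up(self, host):
        return host in self._live


def delete_instance(instance, servicegroup, cast_to_compute, local_delete):
    # If the instance has no host, or its compute service is down, Nova
    # performs a "local delete": the API node cleans up the records itself
    # instead of casting an RPC that the dead compute will never consume.
    if not instance.host or not servicegroup.service_is_up(instance.host):
        local_delete(instance)
    else:
        cast_to_compute(instance)


if __name__ == "__main__":
    # Example: the compute hosting the instance is down, so the local path runs.
    sg = ServiceGroup(live_hosts=["compute-1"])
    vm = Instance(uuid="b0025a0e-8840-43af-b17f-b41d30182f4f", host="compute-0")
    delete_instance(
        vm, sg,
        cast_to_compute=lambda i: print("RPC cast terminate_instance", i.uuid),
        local_delete=lambda i: print("local delete from the API node", i.uuid),
    )
~~~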
Reopening this bug passively because I still have some questions here:

Reproduced this on OSP13.

So we have two different behaviors here.

1) deleting a stack hosted by a down compute node
- When we delete the stack instead of deleting the server, heat deletes the volume attachment first, by sending a DELETE to the nova-api [1]
- nova-api posts on the compute's Q and we tell Cinder: Ok it's done, I deleted it, without any validation.
- This results in cinder poking the volume attachment api every 2 seconds, marking the volume as "deleting" and at some point, heat times out and it's mad, now we need to do all kind of manipulation to fix this situation.

2) deleting a VM hosted on a down compute node.
- When we delete the server instead of deleting just the volume attachment, we work differently. We notice that the nova-compute service is down and we just tell cinder to remove the attachment [2] and run the local delete routine.

When detaching a volume, shouldn't nova know that the compute is down and act accordingly instead?

Thanks,
DVD

[1]
~~~
../nova/nova-api.log:2020-08-17 17:11:32.883 19 DEBUG nova.compute.api [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] [instance: b0025a0e-8840-43af-b17f-b41d30182f4f] Fetching instance by UUID get /usr/lib/python2.7/site-packages/nova/compute/api.py:2391
../nova/nova-api.log:2020-08-17 17:11:32.895 19 DEBUG oslo_concurrency.lockutils [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] Lock "5afc2764-9b62-411a-8278-0e979ae996c5" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
../nova/nova-api.log:2020-08-17 17:11:32.895 19 DEBUG oslo_concurrency.lockutils [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] Lock "5afc2764-9b62-411a-8278-0e979ae996c5" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
../nova/nova-api.log:2020-08-17 17:11:32.939 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] REQ: curl -g -i -X GET http://172.17.0.19:8776/v3/9ebe06a47c334a17b6269f77010b883a/volumes/e03cafa8-16d2-4465-b299-09e4130aac09 -H "X-Service-Token: {SHA1}e987b01e170291c3131e099356c190b51da246cb" -H "User-Agent: python-cinderclient" -H "X-OpenStack-Request-ID: req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}68f26937487515f24e6466a578d5ee2f0f1fabc8" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:372
../nova/nova-api.log:2020-08-17 17:11:33.494 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] RESP: [200] Date: Mon, 17 Aug 2020 17:11:32 GMT Server: Apache x-compute-request-id: req-8141408c-78e9-4a59-9402-11f7113b3f98 OpenStack-API-Version: volume 3.0 Vary: OpenStack-API-Version,Accept-Encoding x-openstack-request-id: req-8141408c-78e9-4a59-9402-11f7113b3f98 Content-Encoding: gzip Content-Length: 618 Content-Type: application/json
../nova/nova-api.log:2020-08-17 17:11:33.494 19 DEBUG cinderclient.v3.client
[req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] GET call to cinderv3 for http://172.17.0.19:8776/v3/9ebe06a47c334a17b6269f77010b883a/volumes/e03cafa8-16d2-4465-b299-09e4130aac09 used request id req-8141408c-78e9-4a59-9402-11f7113b3f98 request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:722 ../nova/nova-api.log:2020-08-17 17:11:33.507 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] REQ: curl -g -i -X POST http://172.17.0.19:8776/v3/9ebe06a47c334a17b6269f77010b883a/volumes/e03cafa8-16d2-4465-b299-09e4130aac09/action -H "X-Service-Token: {SHA1}e987b01e170291c3131e099356c190b51da246cb" -H "Accept: application/json" -H "User-Agent: python-cinderclient" -H "X-Auth-Token: {SHA1}68f26937487515f24e6466a578d5ee2f0f1fabc8" -H "Content-Type: application/json" -H "X-OpenStack-Request-ID: req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80" -d '{"os-begin_detaching": null}' _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:372 ../nova/nova-api.log:2020-08-17 17:11:34.050 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] RESP: [202] Date: Mon, 17 Aug 2020 17:11:33 GMT Server: Apache x-compute-request-id: req-9b8a0910-a7bb-456d-88a9-2d059b235533 OpenStack-API-Version: volume 3.0 Vary: OpenStack-API-Version x-openstack-request-id: req-9b8a0910-a7bb-456d-88a9-2d059b235533 Content-Length: 0 Content-Type: application/json _http_log_response /usr/lib/python2.7/site-packages/keystoneauth1/session.py:419 ../nova/nova-api.log:2020-08-17 17:11:34.050 19 DEBUG cinderclient.v3.client [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] POST call to cinderv3 for http://172.17.0.19:8776/v3/9ebe06a47c334a17b6269f77010b883a/volumes/e03cafa8-16d2-4465-b299-09e4130aac09/action used request id req-9b8a0910-a7bb-456d-88a9-2d059b235533 request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:722 ../nova/nova-api.log:2020-08-17 17:11:34.061 19 INFO nova.api.openstack.requestlog [req-dfb0dba7-97be-4d81-b224-5b43c9d9ad80 d209c969f40f4e7a919bd86d165d0b7f 9ebe06a47c334a17b6269f77010b883a - default default] 172.17.0.12 "DELETE /v2.1/servers/b0025a0e-8840-43af-b17f-b41d30182f4f/os-volume_attachments/e03cafa8-16d2-4465-b299-09e4130aac09" status: 202 len: 0 microversion: 2.1 time: 1.437265 ~~~ [2] ~~~ nova-api.log:2020-08-17 16:43:08.543 3227 DEBUG nova.api.openstack.wsgi [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Calling method '<bound method ServersController.delete of <nova.api.openstack.compute.servers.ServersController object at 0x7f55def7a5d0>>' _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:636 nova-api.log:2020-08-17 16:43:08.543 3227 DEBUG nova.compute.api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] [instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] Fetching instance by UUID get /usr/lib/python2.7/site-packages/nova/compute/api.py:2223 nova-api.log:2020-08-17 16:43:08.602 3227 DEBUG nova.compute.api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] 
[instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] Going to try to terminate instance delete /usr/lib/python2.7/site-packages/nova/compute/api.py:2056 nova-api.log:2020-08-17 16:43:08.662 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Attempting to reserve resources for project e2e5b97cb65743d5883ff8640fb3a29f and user 7d6250472d844e2f919a12cc34494e53. Deltas: {'instances': -1, 'ram': -4096, 'cores': -2} reserve /usr/lib/python2.7/site-packages/nova/quota.py:510 nova-api.log:2020-08-17 16:43:08.665 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Quota limits for project e2e5b97cb65743d5883ff8640fb3a29f: {'project_id': u'e2e5b97cb65743d5883ff8640fb3a29f'} reserve /usr/lib/python2.7/site-packages/nova/quota.py:519 nova-api.log:2020-08-17 16:43:08.666 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Getting quotas for project e2e5b97cb65743d5883ff8640fb3a29f. Resources: ['instances', 'ram', 'cores'] _get_quotas /usr/lib/python2.7/site-packages/nova/quota.py:369 nova-api.log:2020-08-17 16:43:08.669 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Quotas for project e2e5b97cb65743d5883ff8640fb3a29f after resource sync: {'instances': 10, 'ram': 51200, 'cores': 20} reserve /usr/lib/python2.7/site-packages/nova/quota.py:525 nova-api.log:2020-08-17 16:43:08.670 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Getting quotas for user 7d6250472d844e2f919a12cc34494e53 and project e2e5b97cb65743d5883ff8640fb3a29f. Resources: ['instances', 'ram', 'cores'] _get_quotas /usr/lib/python2.7/site-packages/nova/quota.py:361 nova-api.log:2020-08-17 16:43:08.676 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Quotas for project e2e5b97cb65743d5883ff8640fb3a29f and user 7d6250472d844e2f919a12cc34494e53 after resource sync: {'instances': 10, 'ram': 51200, 'cores': 20} reserve /usr/lib/python2.7/site-packages/nova/quota.py:533 nova-api.log:2020-08-17 16:43:08.686 3227 DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Created reservations ['2e5cbecf-aeba-461e-88a1-638aacc7af0c', '8210e834-56ff-4a47-85cb-4fc810711946', 'b9fb5bbe-d3d2-4990-80c3-4bb6226dfcb8'] reserve /usr/lib/python2.7/site-packages/nova/quota.py:1377 nova-api.log:2020-08-17 16:43:08.690 3227 DEBUG nova.servicegroup.drivers.db [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Seems service nova-compute on host ess10z9-scpu-0.gsslab.rdu2.redhat.com is down. Last heartbeat was 2020-08-17 16:39:30. 
Elapsed time is 218.690463 is_up /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:81 nova-api.log:2020-08-17 16:43:08.690 3227 WARNING nova.compute.api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] [instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] instance's host ess10z9-scpu-0.gsslab.rdu2.redhat.com is down, deleting from database nova-api.log:2020-08-17 16:43:08.691 3227 DEBUG nova.network.neutronv2.api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] [instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] deallocate_for_instance() deallocate_for_instance /usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py:1171 nova-api.log:2020-08-17 16:43:08.868 3227 DEBUG neutronclient.v2_0.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] GET call to neutron for http://172.17.0.11:9696/v2.0/ports.json?device_id=84e86ae4-929c-4d0b-8ec7-54e29d5aa471 used request id req-f34d8463-c5fb-421b-927a-ef3a54b5fe46 _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:128 nova-api.log:2020-08-17 16:43:08.875 3227 DEBUG neutronclient.v2_0.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] GET call to neutron for http://172.17.0.11:9696/v2.0/extensions.json used request id req-2faaf9c9-4ab6-45d4-b6c1-4ffa7b025e46 _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:128 nova-api.log:2020-08-17 16:43:09.406 3227 DEBUG neutronclient.v2_0.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] DELETE call to neutron for http://172.17.0.11:9696/v2.0/ports/cd74eec6-f098-4bff-8321-ffdb65603742.json used request id req-6fe54be9-54ef-4e5f-9281-49f0151216f0 _append_request_id /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:128 nova-api.log:2020-08-17 16:43:09.417 3227 DEBUG nova.network.base_api [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] [instance: 84e86ae4-929c-4d0b-8ec7-54e29d5aa471] Updating instance_info_cache with network_info: [] update_instance_cache_with_nw_info /usr/lib/python2.7/site-packages/nova/network/base_api.py:43 nova-api.log:2020-08-17 16:43:09.434 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] REQ: curl -g -i -X POST http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1/action -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}2a014b83af84c48a76b3c23ce31ee867259f829f" -d '{"os-terminate_connection": {"connector": {"initiator": "iqn.1994-05.com.redhat:a63d1c42ce1e", "ip": "172.17.0.18", "platform": "x86_64", "host": "ess10z9-scpu-0.gsslab.rdu2.redhat.com", "do_local_attach": false, "os_type": "linux2", "multipath": false}}}' _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:342 nova-api.log:2020-08-17 16:43:11.310 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] RESP: 
[202] Content-Type: text/html; charset=UTF-8 Content-Length: 0 X-Openstack-Request-Id: req-34e0ca3e-6b88-47eb-9b3f-4552dd17d8f6 Date: Mon, 17 Aug 2020 16:43:11 GMT nova-api.log:2020-08-17 16:43:11.311 3227 DEBUG cinderclient.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] POST call to cinderv2 for http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1/action used request id req-34e0ca3e-6b88-47eb-9b3f-4552dd17d8f6 _log_request_id /usr/lib/python2.7/site-packages/cinderclient/client.py:95 nova-api.log:2020-08-17 16:43:11.314 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] REQ: curl -g -i -X GET http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1 -H "User-Agent: python-cinderclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}2a014b83af84c48a76b3c23ce31ee867259f829f" _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:342 nova-api.log:2020-08-17 16:43:11.538 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] RESP: [200] X-Compute-Request-Id: req-1b7b070b-1ee2-4a66-93eb-1bf276860d71 Content-Type: application/json Content-Length: 1380 X-Openstack-Request-Id: req-1b7b070b-1ee2-4a66-93eb-1bf276860d71 Date: Mon, 17 Aug 2020 16:43:11 GMT nova-api.log:2020-08-17 16:43:11.538 3227 DEBUG cinderclient.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] GET call to cinderv2 for http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1 used request id req-1b7b070b-1ee2-4a66-93eb-1bf276860d71 _log_request_id /usr/lib/python2.7/site-packages/cinderclient/client.py:95 nova-api.log:2020-08-17 16:43:11.539 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] REQ: curl -g -i -X POST http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1/action -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}2a014b83af84c48a76b3c23ce31ee867259f829f" -d '{"os-detach": {"attachment_id": null}}' _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:342 nova-api.log:2020-08-17 16:43:14.036 3227 DEBUG cinderclient.v2.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] RESP: [202] Content-Type: text/html; charset=UTF-8 Content-Length: 0 X-Openstack-Request-Id: req-959f48aa-2781-4d9d-a4aa-174729ca9f86 Date: Mon, 17 Aug 2020 16:43:14 GMT nova-api.log:2020-08-17 16:43:14.037 3227 DEBUG cinderclient.client [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] POST call to cinderv2 for http://172.17.0.11:8776/v2/e2e5b97cb65743d5883ff8640fb3a29f/volumes/3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1/action used request id req-959f48aa-2781-4d9d-a4aa-174729ca9f86 _log_request_id /usr/lib/python2.7/site-packages/cinderclient/client.py:95 nova-api.log:2020-08-17 16:43:14.166 3227 
DEBUG nova.quota [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] Committed reservations [u'2e5cbecf-aeba-461e-88a1-638aacc7af0c', u'8210e834-56ff-4a47-85cb-4fc810711946', u'b9fb5bbe-d3d2-4990-80c3-4bb6226dfcb8'] commit /usr/lib/python2.7/site-packages/nova/quota.py:1403 nova-api.log:2020-08-17 16:43:14.167 3227 INFO nova.osapi_compute.wsgi.server [req-31af21ca-2bb0-47cf-ba00-ff87993b5f2b 7d6250472d844e2f919a12cc34494e53 e2e5b97cb65743d5883ff8640fb3a29f - default default] 172.17.0.22 "DELETE /v2.1/servers/84e86ae4-929c-4d0b-8ec7-54e29d5aa471 HTTP/1.1" status: 204 len: 317 time: 5.6299770 ~~~
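For contrast with the server-delete path, here is a rough, illustrative sketch of what trace [1] above amounts to inside nova-api when only the volume attachment is deleted; the helper names are assumptions, not the actual Nova code. The key point is that the cast toward the dead compute is fire-and-forget, so the API answers 202 while the detach never completes and Heat/Cinder are left polling.

~~~
from types import SimpleNamespace


class FakeCinder:
    """Stand-in for the Cinder client used by nova-api in this sketch."""

    def begin_detaching(self, volume_id):
        print(f"cinder: volume {volume_id} -> detaching")


class FakeRPC:
    """Stand-in for nova's compute RPC API; cast() expects no reply."""

    def cast(self, host, method, **kwargs):
        print(f"rpc: cast {method} to {host} (queued, no reply expected)")


def delete_volume_attachment(cinder, rpc, instance, volume_id):
    # Roughly the behaviour seen in trace [1]:
    cinder.begin_detaching(volume_id)               # volume moves to "detaching"
    rpc.cast(instance.host, "detach_volume",        # never consumed if the
             instance=instance, volume_id=volume_id)  # compute host is down
    return 202                                      # nova-api answers immediately


instance = SimpleNamespace(host="compute-0")  # compute-0 is down in this scenario
status = delete_volume_attachment(
    FakeCinder(), FakeRPC(), instance,
    volume_id="e03cafa8-16d2-4465-b299-09e4130aac09")
print("nova-api returns", status)
~~~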
(In reply to David Vallee Delisle from comment #3)
> Reopening this bug passively because I still have some questions here:
> 
> Reproduced this on OSP13.
> 
> So we have two different behaviors here.
> 1) deleting a stack hosted by a down compute node
> - When we delete the stack instead of deleting the server, heat deletes the
> volume attachment first, by sending a DELETE to the nova-api [1]

^ This confused me, as the attachments API is a Cinder API, but here you're talking about os-volume_attachments within n-api: Heat is attempting to detach the volume instead of deleting the entire instance, as covered below.

> - nova-api posts on the compute's Q and we tell Cinder: Ok it's done, I
> deleted it, without any validation.
> - This results in cinder poking the volume attachment api every 2 seconds,
> marking the volume as "deleting" and at some point, heat times out and it's
> mad, now we need to do all kind of manipulation to fix this situation.

So to me this is a bug: we shouldn't be casting to a dead n-cpu and returning 202 from n-api [3] when we time out. We should either return a 500 from n-api because n-cpu is down *or* handle the attachment delete locally in n-api.

> 2) deleting a VM hosted on a down compute node.
> - When we delete the server instead of deleting just the volume attachment,
> we work differently. We notice that the nova-compute service is down and we
> just tell cinder to remove the attachment [2] and run the local delete
> routine.
> 
> When detaching a volume, shouldn't nova know that the compute is down and
> act accordingly instead?

Yup, agreed. I'll take a look at this shortly.

[3] https://github.com/openstack/nova/blob/2cc051137692752983f2577e2d259a02be893f75/nova/api/openstack/compute/volumes.py#L501-L545
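A minimal sketch of the check being suggested here, with assumed helper names rather than an actual Nova patch: consult the servicegroup API before casting, and fail fast instead of returning 202 when the compute is known to be down.

~~~
class ComputeHostDown(Exception):
    """In this sketch the API layer would map this to a 500 (or 409)."""


def detach_volume(instance, servicegroup, rpc, volume_id):
    # Refuse to queue an RPC that a dead compute will never consume.
    if not servicegroup.service_is_up(instance.host):
        raise ComputeHostDown(
            f"compute host {instance.host} is down, cannot detach {volume_id}")
    rpc.cast(instance.host, "detach_volume",
             instance=instance, volume_id=volume_id)
~~~

The alternative mentioned above, handling the attachment delete locally in n-api, would branch to a Cinder-only cleanup in the ComputeHostDown case instead of raising.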
My bad, n-api actually returns 500 correctly in this case as shown by the following func test failure:

~~~
Traceback (most recent call last):
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/api/openstack/wsgi.py", line 671, in wrapped
    return f(*args, **kwargs)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/api/openstack/compute/volumes.py", line 536, in delete
    self.compute_api.detach_volume(context, instance, volume)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/compute/api.py", line 227, in inner
    return function(self, context, instance, *args, **kwargs)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/compute/api.py", line 154, in inner
    return f(self, context, instance, *args, **kw)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/compute/api.py", line 4727, in detach_volume
    self.compute_rpcapi.detach_volume(context, instance=instance,
  File "/home/lyarwood/redhat/devel/src/openstack/nova/nova/compute/rpcapi.py", line 655, in detach_volume
    cctxt.cast(ctxt, 'detach_volume',
  File "/home/lyarwood/redhat/devel/src/openstack/nova/.tox/functional/lib/python3.8/site-packages/fixtures/_fixtures/monkeypatch.py", line 89, in avoid_get
    return captured_method(*args, **kwargs)
  File "/home/lyarwood/redhat/devel/src/openstack/nova/.tox/functional/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", line 175, in call
    self.transport._send(self.target, msg_ctxt, msg,
  File "/home/lyarwood/redhat/devel/src/openstack/nova/.tox/functional/lib/python3.8/site-packages/oslo_messaging/transport.py", line 123, in _send
    return self._driver.send(target, ctxt, message,
  File "/home/lyarwood/redhat/devel/src/openstack/nova/.tox/functional/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_fake.py", line 222, in send
    return self._send(target, ctxt, message, wait_for_reply, timeout,
  File "/home/lyarwood/redhat/devel/src/openstack/nova/.tox/functional/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_fake.py", line 213, in _send
    raise oslo_messaging.MessagingTimeout(
oslo_messaging.exceptions.MessagingTimeout: No reply on topic compute

2020-08-17 23:03:45,457 INFO [nova.api.openstack.wsgi] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. <class 'oslo_messaging.exceptions.MessagingTimeout'>
2020-08-17 23:03:45,459 INFO [nova.api.openstack.requestlog] 127.0.0.1 "DELETE /v2.1/servers/e6792086-d564-4f2b-82fa-46578a3b4687/os-volume_attachments/6ca404f3-d844-4169-bb96-bc792f37de98" status: 500 len: 216 microversion: 2.87 time: 60.029035
~~~

I'm still on the fence about even attempting to cast down if we know the compute is down tbh; I'll bring this up during triage later in the week.
Apologies, I forgot to update the bug after triage last Friday.

n-api in c#5 returned a 500 due to the use of the CastAsCallFixture in my functional test, so please ignore that.

It was agreed on the call that I'd use this bug to track a bugfix in Nova to have n-api return 500 if the n-cpu hosting the instance is down.

I've also opened the following RFE to introduce a new /servers/{server_id}/os-volume_attachments/{volume_id}/action API that can force detach volumes while a compute is marked as down:

[OSP 17][RFE] Introduce a force detach volume API
https://bugzilla.redhat.com/show_bug.cgi?id=1872882

This admin-only API should only be used once the compute host has been fenced; it will do a local detach in the API, removing the host volume attachments in Cinder.

Marking this bug as triaged. I'll hopefully find time to push something for this shortly.
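Purely as an illustration of how the proposed action API might be called, here is a sketch using the URL path from the RFE; the "force_detach" action name, request body, and token handling are assumptions, since the API had not been designed at the time of this comment.

~~~
# Hypothetical illustration of the force-detach action proposed in the RFE.
# Only the URL path comes from the RFE text; the action name, body and
# behaviour are assumptions, not a real Nova API.

import requests

NOVA = "http://nova-api.example.com:8774/v2.1"     # placeholder endpoint
TOKEN = "gAAAA..."                                  # admin-scoped token (placeholder)

server_id = "84e86ae4-929c-4d0b-8ec7-54e29d5aa471"
volume_id = "3b9d15a0-5b67-4e84-9fa1-e6af7cb79bd1"

resp = requests.post(
    f"{NOVA}/servers/{server_id}/os-volume_attachments/{volume_id}/action",
    headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
    json={"force_detach": None},  # assumed action name
)
# Expected behaviour per the RFE: admin-only, used only after the compute
# host has been fenced; nova-api detaches locally and removes the host
# volume attachments in Cinder.
print(resp.status_code)
~~~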
*** Bug 1873563 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543