Description of problem:

When we create a stack too large for the number of resources available, Cinder leaves the volume in "attaching":

~~~
2018-03-29 18:16:57Z [dvd-iperf.resourcegroup]: CREATE_FAILED BadRequest: resources.resourcegroup.resources[0].resources.instance_1: Invalid volume: Invalid input received: Invalid volume: Volume status must be available to reserve, but the status is attaching. (HTTP 400) (Request-ID: req-bf16393c-d4cc-4cb1-bedc-cc8
2018-03-29 18:16:57Z [dvd-iperf]: CREATE_FAILED Resource CREATE failed: BadRequest: resources.resourcegroup.resources[0].resources.instance_1: Invalid volume: Invalid input received: Invalid volume: Volume status must be available to reserve, but the status is attaching. (HTTP 400)
~~~

I'm not sure if it's a Heat problem or a Cinder problem.

Version-Release number of selected component (if applicable):
openstack-cinder-11.0.1-2

How reproducible:
All the time

Steps to Reproduce:
1. Create a stack too big for the available resources
2. Wait

Actual results:

~~~
2018-03-29 18:16:57Z [dvd-iperf.resourcegroup]: CREATE_FAILED BadRequest: resources.resourcegroup.resources[0].resources.instance_1: Invalid volume: Invalid input received: Invalid volume: Volume status must be available to reserve, but the status is attaching. (HTTP 400) (Request-ID: req-bf16393c-d4cc-4cb1-bedc-cc8
2018-03-29 18:16:57Z [dvd-iperf]: CREATE_FAILED Resource CREATE failed: BadRequest: resources.resourcegroup.resources[0].resources.instance_1: Invalid volume: Invalid input received: Invalid volume: Volume status must be available to reserve, but the status is attaching. (HTTP 400)
~~~

Expected results:
Stack creation should fail with a "not enough resources available" error, and the volumes should either be deleted or be left available.

Additional info:

~~~
2018-03-28 18:04:46.978 47 INFO heat.engine.resource [req-dec914ce-7f07-4384-ace9-b61dc3efc25f - admin - default default] CREATE: Server "instance_1" [45347f28-571a-4375-bacc-277b140caedb] Stack "iperf-resourcegroup-wup7ms3m5snx-0-aiao3t3ahq4j" [cce02bae-a737-4a31-b96d-f761f1e4a095]
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource Traceback (most recent call last):
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 831, in _action_recorder
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource     yield
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 939, in _do_action
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource     yield self.action_handler_task(action, args=handler_args)
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 351, in wrapper
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource     step = next(subtask)
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 890, in action_handler_task
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource     done = check(handler_data)
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/nova/server.py", line 869, in check_create_complete
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource     check = self.client_plugin()._check_active(server_id)
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/clients/os/nova.py", line 238, in _check_active
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource     'code': fault.get('code', _('Unknown'))
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource ResourceInError: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2018-03-28 18:04:46.978 47 ERROR heat.engine.resource
2018-03-28 18:04:53.175 47 INFO heat.engine.scheduler [req-dec914ce-7f07-4384-ace9-b61dc3efc25f - admin - default default] Task pause timed out
2018-03-28 18:04:59.890 47 INFO heat.engine.resource [req-dec914ce-7f07-4384-ace9-b61dc3efc25f - admin - default default] CREATE: Server "instance_1" [45347f28-571a-4375-bacc-277b140caedb] Stack "iperf-resourcegroup-wup7ms3m5snx-0-aiao3t3ahq4j" [cce02bae-a737-4a31-b96d-f761f1e4a095]
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource Traceback (most recent call last):
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 831, in _action_recorder
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     yield
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 939, in _do_action
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     yield self.action_handler_task(action, args=handler_args)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 334, in wrapper
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     step = next(subtask)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 884, in action_handler_task
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     handler_data = handler(*args)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/nova/server.py", line 859, in handle_create
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     admin_pass=admin_pass)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 1403, in create
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     **boot_kwargs)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 802, in _boot
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     return_raw=return_raw, **kwargs)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/novaclient/base.py", line 361, in _create
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     resp, body = self.api.client.post(url, body=body)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 294, in post
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     return self.request(url, 'POST', **kwargs)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 83, in request
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource     raise exceptions.from_response(resp, body, url, method)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource BadRequest: Invalid volume: Invalid input received: Invalid volume: Volume status must be available to reserve, but the status is attaching. (HTTP 400) (Request-ID: req-a0c71ed3-8f06-4d2e-bac0-da94797fd664) (HTTP 400) (Request-ID: req-3681d1e3-4c90-4ed8-9434-9dbc3650a04c)
2018-03-28 18:04:59.890 47 ERROR heat.engine.resource
~~~
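For what it's worth, a hypothetical reproducer along these lines (illustrative only: the flavor, image, network names, the ResourceGroup count and the credentials are assumptions about the environment, and python-heatclient/keystoneauth1 is just one way to drive it):

~~~
from keystoneauth1.identity import v3
from keystoneauth1 import session
from heatclient import client as heat_client

# Placeholder endpoint and credentials for the affected environment.
auth = v3.Password(auth_url='http://controller:5000/v3',
                   username='admin', password='secret',
                   project_name='admin',
                   user_domain_name='Default',
                   project_domain_name='Default')
sess = session.Session(auth=auth)
heat = heat_client.Client('1', session=sess)

# Boot-from-volume servers in a ResourceGroup whose count exceeds what the
# compute hosts can schedule; names, count and sizes are illustrative only.
template = {
    'heat_template_version': '2016-10-14',
    'resources': {
        'resourcegroup': {
            'type': 'OS::Heat::ResourceGroup',
            'properties': {
                'count': 50,
                'resource_def': {
                    'type': 'OS::Nova::Server',
                    'properties': {
                        'flavor': 'm1.large',
                        'image': 'rhel7',
                        'networks': [{'network': 'private'}],
                        # each server makes Cinder create and reserve a boot volume
                        'block_device_mapping_v2': [{
                            'image': 'rhel7',
                            'volume_size': 10,
                            'delete_on_termination': False,
                        }],
                    },
                },
            },
        },
    },
}

heat.stacks.create(stack_name='dvd-iperf', template=template)
# Then wait: the stack goes CREATE_FAILED with "No valid host was found" and the
# boot volumes of the failed servers end up stuck in "attaching".
~~~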
The exception doesn't seem to indicate a Cinder issue: it looks like someone is trying to reserve a volume for attaching twice, and not failing gracefully when Cinder complains on the second call that the volume is already reserved. Cinder is mostly just following orders here, so the root-cause analysis should focus on the Heat template first, then on Nova, and look into Cinder only as the last possible culprit.
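To make the "reserved twice" reading concrete, here is a rough sketch of the call Cinder is rejecting. This is not the actual Nova/Heat code path, just the same os-reserve action issued twice via python-cinderclient; VOLUME_ID is a placeholder and `sess` is a keystoneauth1 session as in the reproducer sketch above:

~~~
from cinderclient import client as cinder_client
from cinderclient import exceptions as cinder_exc

# `sess` is a keystoneauth1 session as in the reproducer sketch above;
# VOLUME_ID stands in for one of the stack's boot volumes.
cinder = cinder_client.Client('3', session=sess)
VOLUME_ID = 'REPLACE-WITH-A-VOLUME-UUID'

volume = cinder.volumes.get(VOLUME_ID)
cinder.volumes.reserve(volume)       # first reserve: status goes to "attaching"
try:
    cinder.volumes.reserve(volume)   # second reserve: Cinder refuses with HTTP 400
except cinder_exc.BadRequest as exc:
    # "Invalid volume: Volume status must be available to reserve,
    #  but the status is attaching. (HTTP 400)"
    print(exc)
~~~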
When Heat sees a Nova server go into an ERROR state while trying to create it (in this case because of "Message: No valid host was found. There are not enough hosts available., Code: 500"), it deletes the server and tries again. The second attempt fails because the server is gone but Cinder still thinks the volume is attaching. Arguably this is a bug in Cinder/Nova: the server shouldn't disappear until everything it was doing has been cleaned up. Heat could also do better by returning the first failure; it already does so when every attempt fails by going into an ERROR state, but not when a non-retryable error (a BadRequest in this case) occurs on a later attempt.
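Purely as an illustration of that last point (none of these names exist in Heat; they just model "went to ERROR, may be retried" versus "non-retryable 400"), the difference is between surfacing the last error and surfacing the first one:

~~~
class RetryableError(Exception):
    """Stands in for "server went to ERROR state"; Heat deletes it and retries."""

class NonRetryableError(Exception):
    """Stands in for a 4xx such as the BadRequest above; retrying cannot help."""


def create_with_retries(create_server, delete_server, max_attempts=3):
    first_failure = None
    for _ in range(max_attempts):
        try:
            return create_server()
        except RetryableError as exc:
            first_failure = first_failure or exc
            delete_server()              # clean up the ERROR'd server, then retry
        except NonRetryableError:
            # Raising this one hides the real cause; re-raise the original
            # scheduling failure ("No valid host was found") when there is one.
            if first_failure is not None:
                raise first_failure
            raise
    raise first_failure
~~~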
Reassigning to Nova, since according to Zane's investigation Nova proceeds with the instance deletion without raising an error even though the volume is not left in the right state.
It's pretty easily reproducible; let me know if you want to see this in a live session.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2332