Description of problem: I tried to deploy an overcloud using the following hooks: resource_registry: resources: "*NodesPostDeployment": "*_Step1": hooks: [pre-create, pre-update] "*_Step2": hooks: [pre-create, pre-update] "*_Step3": hooks: [pre-create, pre-update] "*_Step4": hooks: [pre-create, pre-update] I then cleared 3 hooks which resulted in a CREATE_FAILED status on the 3rd resource: heat hook-clear --pre-create overcloud ObjectStorageNodesPostDeployment/StorageDeployment_Step1 heat hook-clear --pre-create overcloud ObjectStorageNodesPostDeployment/StorageRingbuilderDeployment_Step2 heat hook-clear --pre-create overcloud ControllerNodesPostDeployment/ControllerDeploymentLoadBalancer_Step1 heat resource-list -n 5 overcloud | grep _Step | StorageDeployment_Step1 | cd31de94-2fac-4b6f-95b0-243fcca27553 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-04-29T06:49:18Z | ObjectStorageNodesPostDeployment | | StorageRingbuilderDeployment_Step2 | f485dccc-8eab-48f2-b74a-923f920799f2 | OS::Heat::StructuredDeployments | CREATE_COMPLETE | 2015-04-29T06:49:18Z | ObjectStorageNodesPostDeployment | | ControllerDeploymentLoadBalancer_Step1 | | OS::Heat::StructuredDeployments | CREATE_FAILED | 2015-04-29T06:50:03Z | ControllerNodesPostDeployment | When running stack-list you can see that the stack is still in CREATE_IN_PROGRESS state: heat stack-list +--------------------------------------+------------+--------------------+----------------------+ | id | stack_name | stack_status | creation_time | +--------------------------------------+------------+--------------------+----------------------+ | 1da41b7b-e6b4-42ec-9586-1f1d95dfa3e3 | overcloud | CREATE_IN_PROGRESS | 2015-04-29T06:46:51Z | +--------------------------------------+------------+--------------------+----------------------+ The stack creation should be FAILED if one of the resources in it is in failed state. How reproducible: 100% Steps to reproduce: 1. Create a file called breakpoint.yaml which has the contents from the description above. 2. Edit the deployment script /bin/instack-deploy-overcloud and add "-e breakpoint.yaml" to the heat stack-create command 3. To see that the breakpoint was reached, run the command "heat resource-list overcloud" to find the resource id of the ObjectStorageNodesPostDeployment resource, and then run event-list on it. For example: heat event-list b647a518-ee71-48f4-8f82-9c8a3d8445b2 4. Clear the breakpoints with the same "heat hook-clear" commands as in the description. 5. Make a recursive resource-list to see that there is a resource in FAILED state: heat resource-list -n 5 overcloud | grep _Step 6. See that the stack's state is not FAILED: heat stack-list
Stack creation doesn't fail when you don't use breakpoints, so this is not just a bug of the wrong status being displayed...
When a resource fails, we don't stop processing any other resources that are in progress immediately (we don't start any new ones though), since ideally we'd like them to complete successfully so they don't need to be replaced in a future update. At the moment we wait for up to 4 minutes before stopping anything in progress, so you may well see a different result after 4 minutes. (Ideally we wouldn't wait for hooks, but the code that controls the timeout currently has no way of knowing that a hook is what's being processed.)