Bug 1459984

Summary: request backport of "Only recreate CHECK FAILED resources in ResourceGroup"
Product: Red Hat OpenStack Reporter: Zane Bitter <zbitter>
Component: openstack-heat Assignee: Zane Bitter <zbitter>
Status: CLOSED ERRATA QA Contact: Amit Ugol <augol>
Severity: high Docs Contact:
Priority: high    
Version: 10.0 (Newton) CC: augol, dcadzow, lmiccini, mburns, mlopes, pablo.iranzo, rhel-osp-director-maint, rrasouli, sbaker, shardy, srevivo, zbitter
Target Milestone: z1 Keywords: FeatureBackport, Triaged, ZStream
Target Release: 11.0 (Ocata)   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-heat-8.0.0-9.el7ost Doc Type: Bug Fix
Doc Text:
Cause: If a resource in a nested stack was marked CHECK_FAILED during a "stack check" operation, causing the parent resource to also enter the CHECK_FAILED state, the parent resource would be replaced by any subsequent stack update.
Consequence: If a single resource in a nested stack failed the "stack check", then the entire nested stack would be replaced on any subsequent stack update, instead of just the failed resource.
Fix: The parent resource is *not* replaced if it is in the CHECK_FAILED state and the underlying nested stack is also in the CHECK_FAILED state.
Result: Users can still replace the entire nested stack by using the "mark unhealthy" command to put the parent resource in the CHECK_FAILED state. If the parent instead enters CHECK_FAILED because of a failed "stack check" operation, then only the failed nested resource (and not the parent resource and its entire nested stack) is replaced on a subsequent stack update.
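The rule described in the Doc Text can be sketched as a small decision function. This is only an illustration of the stated behaviour, not the actual Heat implementation: the function name `parent_is_replaced` and the plain-string state arguments are hypothetical, and the sketch models only the CHECK_FAILED case.

```python
CHECK_FAILED = "CHECK_FAILED"


def parent_is_replaced(parent_state: str, nested_stack_state: str) -> bool:
    """Hypothetical model of the CHECK_FAILED rule from the Doc Text.

    Returns True when a stack update would replace the parent resource
    (and therefore its whole nested stack) because of a check failure.
    """
    if parent_state != CHECK_FAILED:
        # The rule only concerns parents that are in CHECK_FAILED.
        return False
    # If the nested stack is CHECK_FAILED as well, the failure came from a
    # "stack check" on a member, so the parent is kept and only the failed
    # member is replaced. Otherwise the parent was marked unhealthy directly.
    return nested_stack_state != CHECK_FAILED


# Failed "stack check": parent and nested stack are both CHECK_FAILED.
print(parent_is_replaced("CHECK_FAILED", "CHECK_FAILED"))     # False
# "mark unhealthy" on the parent: nested stack itself is not CHECK_FAILED.
print(parent_is_replaced("CHECK_FAILED", "CREATE_COMPLETE"))  # True
```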
Story Points: ---
Clone Of: 1459854 Environment:
Last Closed: 2017-07-19 17:04:31 UTC Type: Bug
Bug Depends On: 1459854    
Bug Blocks:    

Description Zane Bitter 2017-06-08 18:17:55 UTC
+++ This bug was initially created as a clone of Bug #1459854 +++

Description of problem:

Removing one server and then updating the stack causes all servers to be re-created.

This has been fixed upstream:

https://review.openstack.org/#/c/445662/9

and backported to newton:

https://review.openstack.org/#/c/452947/


Version-Release number of selected component (if applicable):

openstack-heat-api-7.0.2-1.el7ost.noarch 
openstack-heat-engine-7.0.2-1.el7ost.noarch 
openstack-heat-common-7.0.2-1.el7ost.noarch

How reproducible:

always

Steps to Reproduce:

1. Create a stack

$ nova list
host-0.example.com  192.168.0.5
host-1.example.com  192.168.0.6

The hosts are:

  - in an OS::Heat::ResourceGroup
  - in a ServerGroup with an anti-affinity policy


2. Delete one node
$ nova delete host-0.example.com
$ nova list
host-1.example.com

Stack check correctly detects the missing node
$ heat action-check mystack  

3. Run stack-update to recover the missing node
$ heat stack-update -x mystack


Actual results:

Two more nodes are created, one with a conflicting name (the previously existing one).

$ nova list
host-1.example.com 192.168.0.6
host-0.example.com 192.168.0.11
host-1.example.com 192.168.0.12
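The conflicting name in the listing above can be spotted mechanically. The following is a minimal sketch that assumes the abbreviated "name address" format used in this report (real `nova list` output is a table and would need different parsing); the helper name `duplicate_names` is hypothetical.

```python
from collections import Counter


def duplicate_names(listing: str) -> list:
    """Return server names that appear more than once in the listing."""
    # First whitespace-separated field of each non-empty line is the name.
    names = [line.split()[0] for line in listing.splitlines() if line.strip()]
    return sorted(name for name, count in Counter(names).items() if count > 1)


listing = """\
host-1.example.com 192.168.0.6
host-0.example.com 192.168.0.11
host-1.example.com 192.168.0.12
"""

print(duplicate_names(listing))  # ['host-1.example.com']
```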


logs:
2017-06-08 07:59:10Z [shift.openshift_infra_nodes.1]: CHECK_COMPLETE  Stack CHECK completed successfully. 'CHECK' not fully supported (see resources)  # SoftwareConfig & co
2017-06-08 07:59:10Z [shift.openshift_infra_nodes.1]: CHECK_COMPLETE  state changed
2017-06-08 07:59:10Z [shift.openshift_infra_nodes.0]: CHECK_FAILED  ['NotFound: resources[0].resources.root_volume: Volume 96187160-3c71-47de-98c9-2e3693e90003 could not be found. (HTTP 404) (Request-ID: req-696f3b71-1a4e-4984-a7cb-9d29b31157ad)']. 'CHECK' not fully supported (see resources)
2017-06-08 07:59:10Z [shift.openshift_infra_nodes]: CHECK_FAILED  Resource CHECK failed: ["['NotFound: resources[0].resources.root_volume: Volume 96187160-3c71-47de-98c9-2e3693e90003 could not be found. (HTTP 404) (Request-ID: req-696f3b71-1a4e-4984-a7cb-9d29b31157ad)']. 'CHECK' not fully supported (see resources)"]. 'C
2017-06-08 07:59:11Z [shift.openshift_infra_nodes]: CHECK_FAILED  ["['NotFound: resources.openshift_infra_nodes.resources[0].resources.root_volume: Volume 96187160-3c71-47de-98c9-2e3693e90003 could not be found. (HTTP 404) (Request-ID: req-696f3b71-1a4e-4984-a7cb-9d29b31157ad)']. 'CHECK' not fully supported (see resourc
2017-06-08 07:59:19Z [shift]: CHECK_FAILED  Resource CHECK failed: ['["[\'NotFound: resources.openshift_infra_nodes.resources[0].resources.root_volume: Volume 96187160-3c71-47de-98c9-2e3693e90003 could not be found. (HTTP 404) (Request-ID: req-696f3b71-1a4e-4984-a7cb-9d29b31157ad)\']. \'CHECK\' not
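To see how the failure propagates from the single group member up to the top-level stack, the event lines above can be filtered by status. This is a rough sketch that assumes the "timestamp [resource]: STATUS reason" layout shown in the log excerpt; the function name `failed_resources` is hypothetical and the reason text in the sample lines is abbreviated.

```python
import re

# timestamp (date and time), [resource name], status word, free-form reason
EVENT_RE = re.compile(r"^(\S+ \S+) \[([^\]]+)\]: (\w+)\s+(.*)$")


def failed_resources(lines):
    """Return resource names with a CHECK_FAILED event, in first-seen order."""
    failed = []
    for line in lines:
        match = EVENT_RE.match(line)
        if match and match.group(3) == "CHECK_FAILED":
            name = match.group(2)
            if name not in failed:
                failed.append(name)
    return failed


events = [
    "2017-06-08 07:59:10Z [shift.openshift_infra_nodes.1]: CHECK_COMPLETE  state changed",
    "2017-06-08 07:59:10Z [shift.openshift_infra_nodes.0]: CHECK_FAILED  (reason abbreviated)",
    "2017-06-08 07:59:10Z [shift.openshift_infra_nodes]: CHECK_FAILED  (reason abbreviated)",
    "2017-06-08 07:59:11Z [shift.openshift_infra_nodes]: CHECK_FAILED  (reason abbreviated)",
    "2017-06-08 07:59:19Z [shift]: CHECK_FAILED  (reason abbreviated)",
]

for name in failed_resources(events):
    print(name)
```

Only member 0 failed its own check; the group resource and the top-level stack are listed solely because the failure propagated upward, which is the situation the fix distinguishes from a parent that was marked unhealthy directly.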

Expected results:

The stack is recovered to its original state: only the missing server is recreated.

Comment 3 Ronnie Rasouli 2017-07-18 14:16:43 UTC
Verified:

Used this template:

heat_template_version: 2013-05-23

description: Hot Template to deploy 2 servers

resources:
  my_indexed_group:
    type: OS::Heat::ResourceGroup
    properties:
      count: 2
      resource_def:
        type: OS::Nova::Server
        properties:
          # create a unique name for each server
          #  using its index in the group
          name: my_server_%index%
          image: heat_cirros_image
          flavor: m1.small
          networks:
            - network: heat-net

openstack stack create -t myserver mystack

Deleted 1 server using nova.

# openstack stack check mystack

The stack list shows 1 check failure.

Then reran the stack update:
# heat stack-update -x mystack

2 servers have been created

Comment 5 errata-xmlrpc 2017-07-19 17:04:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1779