Bug 1042199 - [RFE][heat]: Troubleshooting: pause on error
Summary: [RFE][heat]: Troubleshooting: pause on error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ga
Target Release: 6.0 (Juno)
Assignee: RHOS Maint
QA Contact:
URL: https://blueprints.launchpad.net/heat...
Whiteboard: upstream_milestone_juno-rc1 upstream_...
Depends On:
Blocks:
Reported: 2013-12-12 21:21 UTC by RHOS Integration
Modified: 2016-04-27 01:35 UTC

Fixed In Version: openstack-heat-2014.2.1-5.el7ost
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-09 20:04:55 UTC
Target Upstream Version:



Description RHOS Integration 2013-12-12 21:21:04 UTC
Cloned from launchpad blueprint https://blueprints.launchpad.net/heat/+spec/troubleshooting-low-level-control.

Description:

Problem:
When a Heat template is deployed and an error occurs, the VMs are rolled back and deleted.  Server logs can help to determine the problem, but we often need to log into the VMs being deployed to debug the scripts and the environment.  This blueprint proposes pausing the template deployment at the point of error so that the user can inspect the partial stack to determine the problem.
 
Proposed support:
Command line + API:

   - stack-create: add a debugging option
     - validation of template: pause and point to the exact failure in the template (better message, suggested solution), then continue the stack create once the template has been corrected
     - error during deployment: the Heat engine pauses the deployment, leaving all resources/components as they are.  The failed template is shown in the PAUSED_ERROR state

   - stack-show: to inspect the template, show the current state of each resource and component
     - successfully deployed
     - error with message
     - not yet deployed

   - log collection: collect logs as they become available

   - stack-continue: new option to continue a paused deployment
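
The pause/inspect/continue flow proposed above can be sketched as a small state machine. This is only an illustration of the idea, not Heat code: the Stack and Resource classes, the deploy/show/continue_ method names and the CREATE_IN_PROGRESS / CREATE_COMPLETE status strings are hypothetical. Only PAUSED_ERROR and the three per-resource states come from this blueprint.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class ResourceState(Enum):
    # The three states the blueprint wants stack-show to report.
    NOT_DEPLOYED = "not yet deployed"
    DEPLOYED = "successfully deployed"
    ERROR = "error"


@dataclass
class Resource:
    # Hypothetical stand-in for a template resource.
    name: str
    create: Callable[[], None]  # raises an exception on failure
    state: ResourceState = ResourceState.NOT_DEPLOYED
    message: str = ""


@dataclass
class Stack:
    # Hypothetical stand-in for a stack created with the debugging option.
    resources: List[Resource]
    status: str = "CREATE_IN_PROGRESS"

    def deploy(self) -> str:
        """Create resources in order; on the first error, pause instead of rolling back."""
        for res in self.resources:
            if res.state is ResourceState.DEPLOYED:
                continue  # created in an earlier run, leave as is
            try:
                res.create()
            except Exception as exc:
                res.state = ResourceState.ERROR
                res.message = str(exc)
                self.status = "PAUSED_ERROR"
                return self.status
            res.state = ResourceState.DEPLOYED
            res.message = ""
        self.status = "CREATE_COMPLETE"
        return self.status

    def show(self) -> Dict[str, str]:
        """Report the per-resource state, as stack-show would."""
        report = {}
        for res in self.resources:
            text = res.state.value
            if res.state is ResourceState.ERROR:
                text = f"{text}: {res.message}"
            report[res.name] = text
        return report

    def continue_(self) -> str:
        """Resume a paused deployment, retrying the failed resource first."""
        return self.deploy()
```

In this sketch, a failing resource leaves the stack in PAUSED_ERROR with earlier resources untouched; after the user fixes the problem, continue_() retries the failed resource and then proceeds with the ones not yet deployed.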

 
Related blueprint:  
Use stack-update to attempt recovery of failed create or update
https://blueprints.launchpad.net/heat/+spec/retry-failed-update
 
Concern:
How to handle concurrency when pausing deployment:
   - do nothing: sufficient in many cases
   - serialize deployment during debugging: gives deterministic behavior, but is not guaranteed to reproduce the error
   - trace/replay deployment
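
To make the "serialize deployment during debugging" option concrete, here is a minimal sketch of how a dependency graph could be scheduled either in parallel batches or one resource at a time. It is an assumption-laden illustration, not how the Heat engine schedules work: the deployment_batches function, its serialize flag and the resource names are hypothetical, and it relies on graphlib from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def deployment_batches(deps: Dict[str, Set[str]], serialize: bool = False) -> List[List[str]]:
    """Return the order in which resources would be created.

    deps maps each resource name to the set of names it depends on.
    With serialize=False, each batch holds resources that could be
    created concurrently. With serialize=True, every batch holds a
    single resource, so the order is the same on every run.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        # Sort the ready set so the serialized order is deterministic.
        ready = sorted(sorter.get_ready())
        if serialize:
            batches.extend([name] for name in ready)
        else:
            batches.append(ready)
        sorter.done(*ready)
    return batches
```

The serialized order is repeatable, which helps when stepping through a deployment, but it also removes the timing interleavings of the parallel case, which is why an error that depends on concurrency may not reappear.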

Specification URL (additional information):

None

Comment 2 Scott Lewis 2015-02-09 20:04:55 UTC
This bug has been closed as a part of the RHEL-OSP 6 general availability release. For details, see https://rhn.redhat.com/errata/rhel7-rhos-6-errata.html

