Bug 1189502 - [RFE][sahara]: Clean up clusters that are in non-final state for a long time
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-sahara
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 8.0 (Liberty)
Assigned To: Elise Gafford
QA Contact: Luigi Toscano
URL: https://blueprints.launchpad.net/saha...
Whiteboard: upstream_milestone_kilo-3 upstream_de...
Keywords: FutureFeature
Depends On: 1233159
Reported: 2015-02-05 08:47 EST by RHOS Integration
Modified: 2016-04-07 17:00 EDT
CC List: 8 users
Fixed In Version: openstack-sahara-3.0.0-3.el7ost
Doc Type: Enhancement
Doc Text:
With this update, configuration settings now exist to set timeouts, after which clusters which have failed to reach the 'Active' state will be automatically deleted.
Last Closed: 2016-04-07 17:00:28 EDT
External Trackers:
Tracker: Red Hat Product Errata
ID: RHEA-2016:0603
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenStack Platform 8 Enhancement Advisory
Last Updated: 2016-04-07 20:53:53 EDT

Description RHOS Integration 2015-02-05 08:47:15 EST
Cloned from launchpad blueprint https://blueprints.launchpad.net/sahara/+spec/periodic-cleanup.

Description:

A sahara cluster can currently become stuck for various reasons (e.g. if the sahara service is restarted during provisioning, or if neutron fails to assign a floating IP). Such clusters can hold resources for a long time. This can happen in any tenant, and it is hard to check for these conditions manually.

Proposed solution: delete a sahara cluster that is in a non-final state if it has not been updated for a long time.
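
As a rough illustration of the proposed selection rule, the sketch below picks out clusters that are in a non-final state and have not been updated within a given interval. It is not Sahara's implementation: the `Cluster` class, the `stale_clusters` function and the exact contents of `FINAL_STATES` are hypothetical names and assumptions made only for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: the set of states treated as "final" is chosen for illustration only.
FINAL_STATES = {"Active", "Error", "Deleting"}


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster record."""
    name: str
    status: str
    updated_at: datetime


def stale_clusters(clusters, now, max_age):
    """Return clusters in a non-final state whose last update is older than max_age."""
    cutoff = now - max_age
    return [
        c for c in clusters
        if c.status not in FINAL_STATES and c.updated_at < cutoff
    ]
```

A periodic task could call `stale_clusters` with the current time and the configured interval, then request deletion of each returned cluster.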

Specification URL (additional information):

http://specs.openstack.org/openstack/sahara-specs/specs/kilo/periodic-cleanup.html
Comment 5 Elise Gafford 2015-06-26 14:41:39 EDT
Hi Keith,

The feature "[RFE][sahara]: Clean up clusters that are in non-final state for a long time" [1] (documented upstream at [2]) is broken. Luigi found an error in security context creation [3][4]: at present, the job attempts to clean up all old clusters, but it requires a delegated trust id to do so, and such trust ids are only created for transient clusters.

In order to properly fix this issue, I have signed on to implement a new feature (creating delegated trusts on long-running clusters, to be deleted after provisioning is complete [5]). I will complete this task; it's relatively simple technically. However, there is a question of timeline.

The original RFE was, effectively, never completed successfully upstream. At this point, in order to finish it for RHOS 7, we would be implementing a new, security-related feature, potentially without significant time for upstream review. As the RFE feature is a convenience feature, this seems to me an unnecessary risk; the prudent course would be to let the security change go through its full upstream process before backporting it to stable/kilo and to RHOS 7 (likely in the first point release).

However, if we think the feature is important enough for RHOS 7 that it warrants rapid implementation and backport, I'm happy to mark this as a blocker and do that; just looking for your opinion given the new data.

Thanks,
Ethan

[1]: RFE: https://bugzilla.redhat.com/show_bug.cgi?id=1189502
[2]: Spec: http://specs.openstack.org/openstack/sahara-specs/specs/kilo/periodic-cleanup.html
[3]: Bug on RFE: https://bugzilla.redhat.com/show_bug.cgi?id=1233159
[4]: Upstream bug: https://bugs.launchpad.net/sahara/+bug/1468722
[5]: Feature required to finish RFE: https://blueprints.launchpad.net/sahara/+spec/cluster-creation-with-trust
Comment 7 Elise Gafford 2015-09-09 12:38:33 EDT
Agreed not to backport this feature to RHOS 7, as it is only repairable via addition of a fairly major security feature. Pushing to RHOS 8.
Comment 11 Luigi Toscano 2016-03-04 12:29:54 EST
If cleanup_time_for_incomplete_clusters is set to 1 (i.e. 1 hour) and cluster provisioning is forcibly interrupted (by restarting the -engine daemon while the cluster is in the initialization phase), the periodic cluster cleanup process is triggered once the cleanup_time_for_incomplete_clusters interval has elapsed, and the cluster in a non-final state ("Spawning", "Waiting" or "Preparing") is removed.

Verified on:
openstack-sahara-api-3.0.1-1.el7ost.noarch
openstack-sahara-common-3.0.1-1.el7ost.noarch
openstack-sahara-engine-3.0.1-1.el7ost.noarch
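
For reference, a minimal sketch of how the setting used in this verification could be read and turned into a time interval. The option name cleanup_time_for_incomplete_clusters and its unit (hours) come from the comment above; placing it in the `[DEFAULT]` section, treating 0 or an absent value as "cleanup disabled", and the `cleanup_threshold` helper itself are assumptions made for this example, not a statement of how Sahara parses its configuration.

```python
import configparser
from datetime import timedelta

# Sample configuration text; the [DEFAULT] section placement is an assumption.
SAMPLE_CONF = """
[DEFAULT]
cleanup_time_for_incomplete_clusters = 1
"""


def cleanup_threshold(conf_text):
    """Return the cleanup interval as a timedelta, or None if cleanup is off.

    Assumption: a value of 0 (or a missing option) means cleanup is disabled.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    hours = parser["DEFAULT"].getint(
        "cleanup_time_for_incomplete_clusters", fallback=0
    )
    return timedelta(hours=hours) if hours > 0 else None
```

With the sample text above, `cleanup_threshold(SAMPLE_CONF)` yields a one-hour interval, matching the value used in the verification.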
Comment 13 errata-xmlrpc 2016-04-07 17:00:28 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html
