1189502 – [RFE][sahara]: Clean up clusters that are in non-final state for a long time

Summary: [RFE][sahara]: Clean up clusters that are in non-final state for a long time

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-sahara
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	beta
Target Release:	8.0 (Liberty)
Assignee:	Elise Gafford
QA Contact:	Luigi Toscano
Docs Contact:
URL:	https://blueprints.launchpad.net/saha...
Whiteboard:	upstream_milestone_kilo-3 upstream_de...
Depends On:	1233159
Blocks:

Reported:	2015-02-05 13:47 UTC by RHOS Integration
Modified:	2016-04-07 21:00 UTC
CC List:	8 users
Fixed In Version:	openstack-sahara-3.0.0-3.el7ost
Doc Type:	Enhancement
Doc Text:	With this update, configuration settings now exist to set timeouts, after which clusters which have failed to reach the 'Active' state will be automatically deleted.
Clone Of:
Environment:
Last Closed:	2016-04-07 21:00:28 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2016:0603	0	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 8 Enhancement Advisory	2016-04-08 00:53:53 UTC

Description RHOS Integration 2015-02-05 13:47:15 UTC

Cloned from launchpad blueprint https://blueprints.launchpad.net/sahara/+spec/periodic-cleanup.

Description:

Currently a sahara cluster can become stuck for various reasons (e.g. if the sahara service was restarted during provisioning, or neutron failed to assign a floating IP). A stuck cluster can hold resources for a long time. This can happen in different tenants, and it is hard to check for such conditions manually.

Proposed solution: delete a sahara cluster that is in a non-final state if it has not been updated for a long time.

Specification URL (additional information):

http://specs.openstack.org/openstack/sahara-specs/specs/kilo/periodic-cleanup.html
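
The proposed rule (delete a cluster in a non-final state once it has gone too long without an update) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the sahara implementation: the function name `find_stale_clusters`, the dictionary layout of a cluster, and the set of statuses treated as final are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: statuses treated as final for this sketch; the real list may differ.
FINAL_STATES = {"Active", "Error", "Deleting"}


def find_stale_clusters(clusters, now, cleanup_hours):
    """Return names of non-final clusters not updated for cleanup_hours."""
    if cleanup_hours <= 0:
        # Assumption: a non-positive timeout means cleanup is disabled.
        return []
    cutoff = now - timedelta(hours=cleanup_hours)
    return [
        c["name"]
        for c in clusters
        if c["status"] not in FINAL_STATES and c["updated_at"] <= cutoff
    ]


# Hypothetical sample data: one stuck cluster, one recent, one finished.
now = datetime(2016, 3, 4, 17, 0, tzinfo=timezone.utc)
clusters = [
    {"name": "stuck", "status": "Spawning", "updated_at": now - timedelta(hours=2)},
    {"name": "fresh", "status": "Waiting", "updated_at": now - timedelta(minutes=10)},
    {"name": "done", "status": "Active", "updated_at": now - timedelta(days=3)},
]
stale = find_stale_clusters(clusters, now, cleanup_hours=1)
print(stale)
```

Only the cluster that is both in a non-final state and older than the cutoff is selected; a cluster that reached a final state is left alone regardless of its age.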

Comment 5 Elise Gafford 2015-06-26 18:41:39 UTC

Hi Keith,

The feature "[RFE][sahara]: Clean up clusters that are in non-final state for a long time" [1] (documented upstream at [2]), is bugged. Luigi found that there is an error on security context creation [3][4]: at present, the job attempts to clean up all old clusters, but requires a delegated trust id in order to do so. Such trust ids are only created for transient clusters. 

In order to properly fix this issue, I have signed on to implement a new feature (creating delegated trusts on long-running clusters, to be deleted after provisioning is complete [5]). I will complete this task; it's relatively simple technically. However, there is a question of timeline.

The original RFE was, effectively, never completed successfully upstream. At this point, in order to finish it for RHOS 7, we'll be implementing a new, security-related feature, potentially without significant time for upstream review. As the RFE feature is a convenience feature, it seems to me that this is an unnecessary risk, and that the prudent course would be to let the security change go through its full upstream process before backporting to stable/kilo and to RHOS 7 (likely in the first point release.)

However, if we think the feature is important enough for RHOS 7 that it warrants rapid implementation and backport, I'm happy to mark this as a blocker and do that; just looking for your opinion given the new data.

Thanks,
Ethan

[1]: RFE: https://bugzilla.redhat.com/show_bug.cgi?id=1189502
[2]: Spec: http://specs.openstack.org/openstack/sahara-specs/specs/kilo/periodic-cleanup.html
[3]: Bug on RFE: https://bugzilla.redhat.com/show_bug.cgi?id=1233159
[4]: Upstream bug: https://bugs.launchpad.net/sahara/+bug/1468722
[5]: Feature required to finish RFE: https://blueprints.launchpad.net/sahara/+spec/cluster-creation-with-trust

Comment 7 Elise Gafford 2015-09-09 16:38:33 UTC

Agreed not to backport this feature to RHOS 7, as it is only repairable via addition of a fairly major security feature. Pushing to RHOS 8.

Comment 11 Luigi Toscano 2016-03-04 17:29:54 UTC

If cleanup_time_for_incomplete_clusters is set to 1 (== 1 hour), and cluster provisioning is forcibly interrupted (by restarting the -engine daemon when the cluster is in the initialization phase), the periodic cluster cleanup process is triggered after cleanup_time_for_incomplete_clusters time and the cluster in non-final state ("Spawning", "Waiting" or "Preparing") is removed.

Verified on:
openstack-sahara-api-3.0.1-1.el7ost.noarch
openstack-sahara-common-3.0.1-1.el7ost.noarch
openstack-sahara-engine-3.0.1-1.el7ost.noarch
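
For reference, the setting used in this verification can be read as a number of hours and converted to a threshold in seconds. The snippet below is a minimal sketch using Python's standard `configparser`; the sample file content and the placement of the option under `[DEFAULT]` are assumptions for illustration, and sahara itself does not read its configuration this way.

```python
import configparser

# Hypothetical sahara.conf fragment; the [DEFAULT] placement is an assumption.
SAMPLE_CONF = """
[DEFAULT]
cleanup_time_for_incomplete_clusters = 1
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)

# The option value is interpreted as hours (1 == 1 hour, as in the comment above).
hours = int(parser.defaults()["cleanup_time_for_incomplete_clusters"])
threshold_seconds = hours * 3600
print(threshold_seconds)
```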

Comment 13 errata-xmlrpc 2016-04-07 21:00:28 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html
