1124662 – [RFE][sahara]: [EDP] Support job execution cancellation in the client

Summary: [RFE][sahara]: [EDP] Support job execution cancellation in the client

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	RFEs
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	RHOS Maint
QA Contact:
Docs Contact:
URL:	https://blueprints.launchpad.net/saha...
Whiteboard:	upstream_milestone_none upstream_defi...
Depends On:
Blocks:

Reported:	2014-07-30 04:01 UTC by RHOS Integration
Modified:	2015-03-19 17:26 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-03-19 17:04:04 UTC
Target Upstream Version:
Embargoed:

Description RHOS Integration 2014-07-30 04:01:11 UTC

Cloned from launchpad blueprint https://blueprints.launchpad.net/sahara/+spec/edp-job-cancel.

Description:

Currently, the Oozie and Spark EDP engines support job execution cancellation, and the v1.1 REST API exposes a job execution cancellation endpoint in addition to job execution deletion (which removes the job execution record from the Sahara database).

However, the client exposes only job execution deletion and provides no way to cancel a job execution.  Consequently, the UI exposes only "delete" as well.  Running "delete" removes the job execution from Sahara but does not stop the job. This is problematic for a few reasons:

* the user may think the job has been stopped when it has not
* even if the "delete" operation in the client is extended to do "cancel then delete", it will remove the job execution from Sahara, thereby breaking the relaunch capability in the UI.  Relaunch may not be useful for ephemeral clusters (although it might be if there is a delay between job completion and cluster termination), but it is useful for long-running clusters
* without cancellation, a user who realizes that a job is configured incorrectly cannot stop it and relaunch it; they must wait for it to complete

The proposal is the following:
* the client should support cancellation as an additional operation
* the UI should support the cancellation operation in addition to deletion
* the semantics of deletion on a non-terminated job should be decided: really remove the record and leave the job running, or cancel then delete?  (The drafter supports cancel then delete.)
* there was discussion about whether "cancel" should be in the v2 API, but that discussion should be separate.  Cancellation is in the v1.1 API currently and the client should support it.  v2 discussions can happen later.
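
The difference between the two operations, and the "cancel then delete" option for non-terminated jobs, can be sketched roughly as below. This is only an in-memory illustration under stated assumptions: the class names, method names, and status strings are hypothetical and are not the Sahara client or REST API.

```python
from dataclasses import dataclass, field

# Hypothetical status names, chosen for illustration only.
TERMINAL = {"SUCCEEDED", "KILLED", "FAILED"}


@dataclass
class JobExecution:
    id: str
    status: str = "RUNNING"


@dataclass
class FakeJobExecutionStore:
    """In-memory stand-in for job execution records (hypothetical)."""
    records: dict = field(default_factory=dict)

    def add(self, job_execution):
        self.records[job_execution.id] = job_execution

    def cancel(self, je_id):
        # Stop the job but keep its record, so it can still be relaunched.
        job_execution = self.records[je_id]
        if job_execution.status not in TERMINAL:
            job_execution.status = "KILLED"
        return job_execution

    def delete(self, je_id, cancel_first=True):
        # "Cancel then delete": avoid leaving a job running with no record.
        if cancel_first:
            self.cancel(je_id)
        return self.records.pop(je_id)
```

With this shape, `cancel` leaves the record in place (so a relaunch from the UI remains possible), while `delete` removes it; passing `cancel_first=False` would model the other option under discussion, where the record is removed and the job keeps running.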

(spec in progress)

Specification URL (additional information):

None
