Cloned from the Launchpad blueprint https://blueprints.launchpad.net/sahara/+spec/edp-job-cancel.
Currently, the Oozie and Spark EDP engines support job execution cancellation, and the v1.1 REST API exposes a job execution cancellation endpoint in addition to job execution deletion (which removes the job execution record from the Sahara database).
However, the client exposes only job execution deletion and provides no way to cancel a job execution. Consequently, the UI exposes only "delete" as well. Running "delete" removes the job execution from Sahara, but it does not stop the job. This is problematic for a few reasons:
* the user may believe the job has been stopped when it is in fact still running
* even if the "delete" operation in the client is extended to do "cancel then delete", this will remove the job execution from Sahara, thereby breaking the relaunch capability in the UI. Relaunch may not be useful for ephemeral clusters (although it might be if there is a delay between job completion and cluster termination), but it is useful for long-running clusters
* without cancellation, a user who realizes that a job is configured incorrectly cannot stop it and relaunch it; they must wait for it to complete
The proposal is the following:
* the client should support cancellation as an additional operation
* the UI should support the cancellation operation in addition to deletion
* the semantics of deletion on a non-terminated job should be decided -- really remove the record and leave the job running, or cancel then delete? (the drafter favors cancel then delete)
* there was discussion about whether "cancel" should be in the v2 API, but that discussion should be kept separate. Cancel is in the v1 API currently and the client should support it. v2 discussions can happen later.
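The proposed client behavior can be illustrated with a minimal sketch. Everything in it is hypothetical and for illustration only: the class names (JobExecutionClient, FakeSaharaBackend), the request paths and HTTP methods, the flat "status" field, and the set of terminal statuses are assumptions, not the actual python-saharaclient or Sahara REST interfaces. It shows "cancel" as a separate operation that keeps the record, and "delete" following the cancel-then-delete semantics that the drafter favors.

```python
# Hypothetical sketch: none of these names are taken from python-saharaclient.

# Assumed set of statuses that mean a job execution is no longer running.
TERMINATED_STATUSES = {"DONEWITHERROR", "FAILED", "KILLED", "SUCCEEDED"}


class FakeSaharaBackend:
    """In-memory stand-in for the REST API so the sketch runs offline."""

    def __init__(self):
        self.job_executions = {}  # job execution id -> status string
        self.calls = []           # record of (method, path) requests

    def request(self, method, path):
        self.calls.append((method, path))
        parts = path.strip("/").split("/")
        job_id = parts[1]
        if method == "GET" and len(parts) == 3 and parts[2] == "cancel":
            # Cancelling stops a running job but keeps its record.
            if self.job_executions[job_id] not in TERMINATED_STATUSES:
                self.job_executions[job_id] = "KILLED"
            return {"id": job_id, "status": self.job_executions[job_id]}
        if method == "GET":
            return {"id": job_id, "status": self.job_executions[job_id]}
        if method == "DELETE":
            # Deleting only removes the record.
            del self.job_executions[job_id]
            return None
        raise ValueError("unsupported request: %s %s" % (method, path))


class JobExecutionClient:
    """Hypothetical client exposing cancel alongside delete."""

    def __init__(self, transport):
        self.transport = transport

    def get(self, job_id):
        return self.transport.request("GET", "/job-executions/%s" % job_id)

    def cancel(self, job_id):
        # Assumed path and method for the cancellation endpoint.
        return self.transport.request(
            "GET", "/job-executions/%s/cancel" % job_id)

    def delete(self, job_id, cancel_first=True):
        # "Cancel then delete": stop a non-terminated job before its
        # record is removed, so no job keeps running unnoticed.
        if cancel_first:
            status = self.get(job_id)["status"]
            if status not in TERMINATED_STATUSES:
                self.cancel(job_id)
        self.transport.request("DELETE", "/job-executions/%s" % job_id)


if __name__ == "__main__":
    backend = FakeSaharaBackend()
    backend.job_executions["je-1"] = "RUNNING"
    client = JobExecutionClient(backend)
    client.cancel("je-1")
    # The record survives cancellation, so relaunch remains possible.
    print(backend.job_executions)
```

With a separate cancel operation, the job execution record stays in Sahara after the job is stopped, which preserves the relaunch capability; delete with cancel_first=True covers the case where the user wants the record gone and the job stopped.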
(spec in progress)