1641668 – [RFE] Add ability for heketi server to automatically resolve stale or failed operations

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	heketi
Sub Component:
Version:	ocs-3.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	OCS 3.11.1
Assignee:	John Mulligan
QA Contact:	Prasanth
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1634745 1635736 1636409 1636477 OCS-3.11.1-devel-triage-done 1644143 1644171

Reported:	2018-10-22 13:29 UTC by John Mulligan
Modified:	2019-02-08 07:32 UTC
CC List:	12 users
Fixed In Version:	heketi-8.0.0-4.el7rhgs
Doc Type:	Enhancement
Doc Text:	Previously, an operation interrupted by a server restart or failure was not immediately recoverable and left a stale operation in the heketi database. These stale and failed operations recorded system state that had to be resolved manually. With this fix, Heketi supports an automatic cleanup feature that can also be activated offline or on demand.
Clone Of:
Environment:
Last Closed:	2019-02-07 03:41:00 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2019:0286	0	None	None	None	2019-02-07 03:41:26 UTC

Description John Mulligan 2018-10-22 13:29:37 UTC

Description of problem:

When Heketi is terminated while working on an operation (for example, volume create), or is unable to roll back an operation at the time of failure (for example, a volume cannot be created in or removed from gluster), it leaves the operation behind in the heketi db.

Leaving this data behind is useful because it enables the server to resolve the issues in the future. However, the feature that uses this data to resolve the issues later has not yet been implemented. This bug provides the overall goal of using these db entries to "automatically" reconcile them with the backend storage subsystem.



Additional info:

The feature set can be broken down into a rough list of items:

* Ability to revert/clean up stale or failed pending operations
** Covered operations: volume create/delete/expand, block volume create/delete
** Not covered in this release: device or node replace
* Automatic cleanup of operations on heketi server start
* Periodic cleanup of operations (on a long-running server)
* Commands to:
** Manually kick off cleanup of all operations
** Manually kick off cleanup of a specific operation (by id)
** List operations and their statuses

Items in the above list may be broken down into individual BZs for tracking purposes.
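
To make the requested behavior concrete, here is a rough sketch of the cleanup logic in Python. It is illustrative only and is not heketi code: the names PendingOp, OpStatus, OperationCleaner, the kind strings, the "new" status, and the dict standing in for the heketi db are all assumptions made for this example. The rollback callable stands in for whatever would actually undo the operation on the storage backend.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class OpStatus(Enum):
    # Hypothetical statuses; "stale" and "failed" mirror the wording of this bug.
    NEW = "new"        # recorded while the operation is still in progress
    STALE = "stale"    # the server stopped before the operation finished
    FAILED = "failed"  # a rollback was attempted and did not complete


@dataclass
class PendingOp:
    op_id: str
    kind: str          # hypothetical kind string, e.g. "volume-create"
    status: OpStatus


# Operation kinds listed as covered; device or node replace is left out.
CLEANABLE_KINDS = {
    "volume-create", "volume-delete", "volume-expand",
    "block-volume-create", "block-volume-delete",
}


class OperationCleaner:
    """Sketch of cleanup over a dict that stands in for the heketi db."""

    def __init__(self, db: Dict[str, PendingOp],
                 rollback: Callable[[PendingOp], bool]):
        self.db = db
        self.rollback = rollback  # returns True if the backend was reconciled

    def list_operations(self) -> List[Tuple[str, str, str]]:
        # "List operations and their statuses"
        return sorted((op.op_id, op.kind, op.status.value)
                      for op in self.db.values())

    def clean_one(self, op_id: str) -> bool:
        # "Cleanup of a specific operation (by id)"
        op = self.db.get(op_id)
        if op is None or op.kind not in CLEANABLE_KINDS:
            return False
        if op.status not in (OpStatus.STALE, OpStatus.FAILED):
            return False  # never touch an operation that may still be running
        if self.rollback(op):
            del self.db[op_id]  # reconciled: drop the db entry
            return True
        op.status = OpStatus.FAILED  # keep the entry so a later pass can retry
        return False

    def clean_all(self) -> List[str]:
        # "Cleanup of all operations"; sorted() copies the keys, so deleting
        # entries during the pass is safe.
        return [op_id for op_id in sorted(self.db) if self.clean_one(op_id)]
```

Under these assumptions, a server could call clean_all() once at start and again on a timer for the periodic case, while the manual commands would map onto clean_all(), clean_one(id), and list_operations(). Entries whose rollback fails stay in the db marked as failed so that a later pass can retry them.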

Comment 13 Anjana KD 2019-01-07 07:12:47 UTC

Updated doc text. Kindly verify for technical accuracy.

Comment 15 Anjana KD 2019-01-08 03:33:25 UTC

Thank you for pointing that out.

Comment 24 errata-xmlrpc 2019-02-07 03:41:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0286
