Bug 1074632

Summary: RFE: Manage storage node snapshots
Product: [Other] RHQ Project
Reporter: John Sanda <jsanda>
Component: Plugins, Storage Node
Assignee: Thomas Heute <theute>
Status: ON_QA
Severity: unspecified
Priority: unspecified
Version: 4.9
CC: hrupp, jshaughn, theute
Target Milestone: GA
Target Release: RHQ 4.13
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Clones: 1074633
Bug Blocks: 1118104, 1074633, 1099024

Description John Sanda 2014-03-10 17:28:07 UTC
Description of problem:
Snapshots are generated weekly during scheduled maintenance and when nodes are (un)deployed. A snapshot consists of hard links to SSTable files; consequently, it takes up little disk space. But when an SSTable is deleted during compaction, its space will not be reclaimed if the SSTable is included in a snapshot. This can add up over time. There is currently nothing in place for managing snapshots. Here are a few possible options:

1) Move snapshots older than X to a specified location

2) Move all snapshots to a specified location

3) Delete snapshots older than X

4) Move N snapshots (from oldest to youngest) to a specified location

5) Delete N snapshots (from oldest to youngest)
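
As an illustration of option 3, here is a minimal sketch of deleting snapshots older than a cutoff, assuming the default Cassandra data layout <dataDir>/<keyspace>/<table>/snapshots/<tag> (the class and method names are made up for the example):

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.TimeUnit;

public class SnapshotCleaner {

    /**
     * Deletes snapshot directories older than maxAgeDays under a Cassandra
     * data directory laid out as <dataDir>/<keyspace>/<table>/snapshots/<tag>.
     */
    public static void deleteSnapshotsOlderThan(Path dataDir, int maxAgeDays) throws IOException {
        long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(maxAgeDays);
        try (DirectoryStream<Path> keyspaces = Files.newDirectoryStream(dataDir)) {
            for (Path keyspace : keyspaces) {
                if (!Files.isDirectory(keyspace)) continue;
                try (DirectoryStream<Path> tables = Files.newDirectoryStream(keyspace)) {
                    for (Path table : tables) {
                        Path snapshots = table.resolve("snapshots");
                        if (!Files.isDirectory(snapshots)) continue;
                        try (DirectoryStream<Path> tags = Files.newDirectoryStream(snapshots)) {
                            for (Path tag : tags) {
                                // The tag directory's mtime approximates the snapshot's age.
                                if (Files.getLastModifiedTime(tag).toMillis() < cutoff) {
                                    deleteRecursively(tag);
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    // Removes a snapshot directory and everything underneath it.
    private static void deleteRecursively(Path dir) throws IOException {
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }
            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException e) throws IOException {
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}

The "move" variants (options 1, 2, and 4) would follow the same traversal but relocate the tag directory instead of deleting it.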

This could be done as a recurring operation. We could also introduce some new metrics to monitor snapshot disk usage, similar to what we already have for the data directories. If the disk usage exceeds a threshold, we can fire an alert and perform one of the above actions. This is another good step toward storage node disk management.
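
To sketch what such a metric might measure: because snapshot files are hard links, a naive sum of file sizes overstates usage; only files whose link count has dropped to 1 (i.e., the live SSTable was already removed by compaction) actually pin otherwise-reclaimable space. A rough illustration (hypothetical class name; relies on the POSIX-only "unix:nlink" attribute):

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class SnapshotUsage {

    /**
     * Sums the size of files under a snapshots directory whose hard-link
     * count is 1, meaning the snapshot is the only remaining reference and
     * deleting it would reclaim that space.
     */
    public static long reclaimableBytes(Path snapshotsDir) throws IOException {
        final long[] total = {0};
        Files.walkFileTree(snapshotsDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                // "unix:nlink" is only available on POSIX platforms.
                int links = ((Number) Files.getAttribute(file, "unix:nlink")).intValue();
                if (links == 1) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }
}

A value like this could feed the same threshold/alert machinery we already have for the data directories.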


Comment 1 Heiko W. Rupp 2014-05-08 14:43:00 UTC
Bump the target version now that 4.11 is out.

Comment 2 Libor Zoubek 2014-07-14 15:28:39 UTC
Based on discussion with John:

 - I'll create Storage Cluster Settings (under Topology/StorageNodes/Settings) that will provide an interface to the snapshot management options above
 - I'll implement periodic snapshots as scheduled resource operations
 - when these settings change, we'll reschedule the operations on all storage nodes (see the sketch below)
 - the operations also need to be scheduled when a new node joins the cluster
 - by default, automatic snapshots will be completely disabled
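
To make the rescheduling step concrete, here is a rough sketch of the server-side logic. All types and method names are hypothetical placeholders, not the actual RHQ API:

public class SnapshotScheduleSync {

    /** Hypothetical view of the cluster-wide snapshot settings. */
    public static class SnapshotSettings {
        public boolean enabled;        // disabled by default
        public String cronExpression;  // e.g. a weekly schedule
        public int retentionDays;      // delete snapshots older than this
    }

    /** Hypothetical handle to a storage node's scheduled operations. */
    public interface StorageNode {
        void cancelOperation(String operationName);
        void scheduleOperation(String operationName, String cron);
    }

    /**
     * Re-applies the cluster settings to every storage node: cancel any
     * existing schedule, then (re)create it only if snapshots are enabled.
     * The same method would run when a new node joins the cluster.
     */
    public static void reschedule(Iterable<StorageNode> nodes, SnapshotSettings settings) {
        for (StorageNode node : nodes) {
            node.cancelOperation("takeSnapshot");
            if (settings.enabled) {
                node.scheduleOperation("takeSnapshot", settings.cronExpression);
            }
        }
    }
}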

Comment 3 Libor Zoubek 2014-07-24 09:47:24 UTC
First part merged to master:

commit fe30df29d11d85bff76efca6e26302e4b6f96429
Merge: 181e5e7 2ac7010
Author: jsanda <jsanda>
Date:   Wed Jul 23 09:34:08 2014 -0400

    Merge pull request #95 from lzoubek/bugs/1074632

    Bug 1074632 - RFE: Manage storage node snapshots

Comment 4 Libor Zoubek 2014-07-25 13:33:22 UTC
This commit also fixes tests that did not pass on Jenkins:

https://github.com/rhq-project/rhq/commit/2de223b66109636a8b073e70e510f35cd6b70238