1074632 – RFE: Manage storage node snapshots

Summary: RFE: Manage storage node snapshots

Keywords:
Status:	ON_QA
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Plugins, Storage Node
Sub Component:
Version:	4.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	GA
Target Release:	RHQ 4.13
Assignee:	Thomas Heute
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1118104 1074633 1099024

Reported:	2014-03-10 17:28 UTC by John Sanda
Modified:	2022-03-31 04:27 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Clones:	1074633
Environment:
Last Closed:
Embargoed:

Description John Sanda 2014-03-10 17:28:07 UTC

Description of problem:
Snapshots are generated weekly during scheduled maintenance and when nodes are (un)deployed. A snapshot consists of hard links to SSTable files, so it initially takes up little disk space. But when an SSTable is deleted during compaction, its space is not reclaimed if the SSTable is still included in a snapshot. This can add up over time. There is currently nothing in place for managing snapshots. Here are a few possible options:

1) Move snapshots older than X to a specified location

2) Move all snapshots to a specified location

3) Delete snapshots older than X

4) Move N snapshots (from oldest to youngest) to a specified location

5) Delete N snapshots (from oldest to youngest)

This could be done as a recurring operation. We could also introduce some new metrics to monitor snapshot disk usage, similar to what we already have for the data directories. If the disk usage exceeds a threshold, we can fire an alert and perform one of the above actions. This is another good step toward providing storage node disk management.
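The selection logic behind the options above (older than X, oldest N) and the threshold check can be sketched as follows. This is only an illustration, not RHQ code: the `Snapshot` record, its `size_bytes` field, and all function names are hypothetical, and how the disk usage of a snapshot would actually be measured is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record describing one snapshot on a storage node."""
    name: str
    created: datetime
    size_bytes: int  # assumed to come from a disk usage metric


def select_older_than(snapshots: List[Snapshot], max_age: timedelta,
                      now: datetime) -> List[Snapshot]:
    """Options 1 and 3: snapshots older than X, oldest first."""
    cutoff = now - max_age
    return sorted((s for s in snapshots if s.created < cutoff),
                  key=lambda s: s.created)


def select_oldest(snapshots: List[Snapshot], count: int) -> List[Snapshot]:
    """Options 4 and 5: the N oldest snapshots, oldest first."""
    return sorted(snapshots, key=lambda s: s.created)[:max(count, 0)]


def exceeds_threshold(snapshots: List[Snapshot], threshold_bytes: int) -> bool:
    """True if total snapshot disk usage is above the alert threshold."""
    return sum(s.size_bytes for s in snapshots) > threshold_bytes
```

The selected snapshots would then be either moved to the specified location or deleted, depending on the configured option.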

Comment 1 Heiko W. Rupp 2014-05-08 14:43:00 UTC

Bump the target version now that 4.11 is out.

Comment 2 Libor Zoubek 2014-07-14 15:28:39 UTC

Based on discussion with John:

 - I'll create Storage Cluster Settings (under Topology/StorageNodes/Settings) that will provide an interface to the snapshot management options above
 - I'll implement periodic snapshots as scheduled resource operations
 - when these settings change, we'll reschedule the operations on all storage nodes
 - we also need to schedule the operations when a new node joins the cluster
 - by default, automatic snapshots will be completely disabled
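
The plan above could be modeled roughly as follows. This is a sketch under stated assumptions, not the merged implementation: `SnapshotSettings`, `SnapshotScheduler`, and the `interval_hours` field are hypothetical names, and a real version would schedule resource operations on the storage nodes rather than keep an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class SnapshotSettings:
    """Hypothetical cluster-wide snapshot settings."""
    enabled: bool = False      # automatic snapshots are disabled by default
    interval_hours: int = 168  # assumed weekly interval, for illustration only


class SnapshotScheduler:
    """Keeps one snapshot schedule per storage node in sync with the settings."""

    def __init__(self) -> None:
        self.settings = SnapshotSettings()
        self.nodes: Set[str] = set()
        self.schedules: Dict[str, int] = {}  # node address -> interval in hours

    def _apply(self, node: str) -> None:
        # (Re)schedule the operation on one node, or remove it when disabled.
        if self.settings.enabled:
            self.schedules[node] = self.settings.interval_hours
        else:
            self.schedules.pop(node, None)

    def node_joined(self, node: str) -> None:
        # A new node picks up the current cluster settings.
        self.nodes.add(node)
        self._apply(node)

    def update_settings(self, settings: SnapshotSettings) -> None:
        # A settings change reschedules the operation on all storage nodes.
        self.settings = settings
        for node in self.nodes:
            self._apply(node)
```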

Comment 3 Libor Zoubek 2014-07-24 09:47:24 UTC

First part merged to master:

commit fe30df29d11d85bff76efca6e26302e4b6f96429
Merge: 181e5e7 2ac7010
Author: jsanda <jsanda>
Date:   Wed Jul 23 09:34:08 2014 -0400

    Merge pull request #95 from lzoubek/bugs/1074632

    Bug 1074632 - RFE: Manage storage node snapshots

Comment 4 Libor Zoubek 2014-07-25 13:33:22 UTC

This commit also fixes tests that did not pass on Jenkins:

https://github.com/rhq-project/rhq/commit/2de223b66109636a8b073e70e510f35cd6b70238

Comment 5 Libor Zoubek 2014-08-07 19:21:44 UTC

Additional commit fixing tests:

https://github.com/rhq-project/rhq/commit/0bef943a102c2bd13f42964fa35c8dd651ca9790#diff-d41d8cd98f00b204e9800998ecf8427e


2nd (server-side) part in master:

https://github.com/rhq-project/rhq/commit/6640eabc6735327f8261f0fed23afd942d0ce801#diff-d41d8cd98f00b204e9800998ecf8427e
