Bug 1074632

Summary: RFE: Manage storage node snapshots
Product: [Other] RHQ Project
Reporter: John Sanda <jsanda>
Component: Plugins, Storage Node
Assignee: Thomas Heute <theute>
Status: ON_QA
Severity: unspecified
Priority: unspecified
Version: 4.9
CC: hrupp, jshaughn, theute
Target Milestone: GA
Target Release: RHQ 4.13
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Clones: 1074633
Bug Blocks: 1118104, 1074633, 1099024

Description John Sanda 2014-03-10 17:28:07 UTC
Description of problem:
Snapshots are generated weekly during scheduled maintenance and when nodes are (un)deployed. A snapshot consists of hard links to SSTable files; consequently, it takes up little disk space. But when an SSTable is deleted during compaction, its space will not be reclaimed if the SSTable is included in a snapshot. This can add up over time. There is currently nothing in place for managing snapshots. Here are a few possible options:

1) Move snapshots older than X to a specified location

2) Move all snapshots to a specified location

3) Delete snapshots older than X

4) Move N snapshots (from oldest to youngest) to a specified location

5) Delete N snapshots (from oldest to youngest)
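
As an illustration of option 3, here is a minimal sketch of deleting snapshots older than a cutoff, assuming the default Cassandra data layout <dataDir>/<keyspace>/<table>/snapshots/<tag> (the class and method names are made up for the example):

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.TimeUnit;

public class SnapshotCleaner {

    /**
     * Deletes snapshot directories older than maxAgeDays under a Cassandra
     * data directory laid out as <dataDir>/<keyspace>/<table>/snapshots/<tag>.
     */
    public static void deleteSnapshotsOlderThan(Path dataDir, int maxAgeDays) throws IOException {
        long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(maxAgeDays);
        try (DirectoryStream<Path> keyspaces = Files.newDirectoryStream(dataDir)) {
            for (Path keyspace : keyspaces) {
                if (!Files.isDirectory(keyspace)) continue;
                try (DirectoryStream<Path> tables = Files.newDirectoryStream(keyspace)) {
                    for (Path table : tables) {
                        Path snapshots = table.resolve("snapshots");
                        if (!Files.isDirectory(snapshots)) continue;
                        try (DirectoryStream<Path> tags = Files.newDirectoryStream(snapshots)) {
                            for (Path tag : tags) {
                                // The tag directory's mtime approximates the snapshot's age.
                                if (Files.getLastModifiedTime(tag).toMillis() < cutoff) {
                                    deleteRecursively(tag);
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    // Removes a snapshot directory and everything underneath it.
    private static void deleteRecursively(Path dir) throws IOException {
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }
            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException e) throws IOException {
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}

The "move" variants (options 1, 2, and 4) would follow the same traversal but relocate the tag directory instead of deleting it.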

This could be done as a recurring operation. We could also introduce some new metrics to monitor snapshot disk usage, similar to what we already have for the data directories. If the disk usage exceeds a threshold, we can fire an alert and perform one of the above actions. This is another good step toward storage node disk management.
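
To sketch what such a metric might measure: because snapshot files are hard links, a naive sum of file sizes overstates usage; only files whose link count has dropped to 1 (i.e., the live SSTable was already removed by compaction) actually pin otherwise-reclaimable space. A rough illustration (hypothetical class name; relies on the POSIX-only "unix:nlink" attribute):

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class SnapshotUsage {

    /**
     * Sums the size of files under a snapshots directory whose hard-link
     * count is 1, meaning the snapshot is the only remaining reference and
     * deleting it would reclaim that space.
     */
    public static long reclaimableBytes(Path snapshotsDir) throws IOException {
        final long[] total = {0};
        Files.walkFileTree(snapshotsDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                // "unix:nlink" is only available on POSIX platforms.
                int links = ((Number) Files.getAttribute(file, "unix:nlink")).intValue();
                if (links == 1) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }
}

A value like this could feed the same threshold/alert machinery we already have for the data directories.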


Comment 1 Heiko W. Rupp 2014-05-08 14:43:00 UTC
Bump the target version now that 4.11 is out.

Comment 2 Libor Zoubek 2014-07-14 15:28:39 UTC
Based on discussion with John:

 - I'll create Storage Cluster Settings (under Topology/StorageNodes/Settings) that will provide an interface to the snapshot management options above
 - I'll implement periodic snapshots as scheduled resource operations
 - when these settings change, we'll reschedule the operations on all storage nodes (see the sketch below)
 - the operations also need to be scheduled when a new node joins the cluster
 - by default, automatic snapshots will be completely disabled
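
To make the rescheduling step concrete, here is a rough sketch of the server-side logic. All types and method names are hypothetical placeholders, not the actual RHQ API:

public class SnapshotScheduleSync {

    /** Hypothetical view of the cluster-wide snapshot settings. */
    public static class SnapshotSettings {
        public boolean enabled;        // disabled by default
        public String cronExpression;  // e.g. a weekly schedule
        public int retentionDays;      // delete snapshots older than this
    }

    /** Hypothetical handle to a storage node's scheduled operations. */
    public interface StorageNode {
        void cancelOperation(String operationName);
        void scheduleOperation(String operationName, String cron);
    }

    /**
     * Re-applies the cluster settings to every storage node: cancel any
     * existing schedule, then (re)create it only if snapshots are enabled.
     * The same method would run when a new node joins the cluster.
     */
    public static void reschedule(Iterable<StorageNode> nodes, SnapshotSettings settings) {
        for (StorageNode node : nodes) {
            node.cancelOperation("takeSnapshot");
            if (settings.enabled) {
                node.scheduleOperation("takeSnapshot", settings.cronExpression);
            }
        }
    }
}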

Comment 3 Libor Zoubek 2014-07-24 09:47:24 UTC
First part merged to master:

commit fe30df29d11d85bff76efca6e26302e4b6f96429
Merge: 181e5e7 2ac7010
Author: jsanda <jsanda>
Date:   Wed Jul 23 09:34:08 2014 -0400

    Merge pull request #95 from lzoubek/bugs/1074632

    Bug 1074632 - RFE: Manage storage node snapshots

Comment 4 Libor Zoubek 2014-07-25 13:33:22 UTC
This commit also fixes tests that did not pass on Jenkins:

https://github.com/rhq-project/rhq/commit/2de223b66109636a8b073e70e510f35cd6b70238