Description of problem:

Snapshots are generated weekly during scheduled maintenance and when nodes are (un)deployed. A snapshot consists of hard links to SSTable files; consequently, it takes up little disk space at first. But when an SSTable is deleted during compaction, its space will not be reclaimed if the SSTable is included in a snapshot. This can add up over time. There is currently nothing in place for managing snapshots. Here are a few possible options:

1) Move snapshots older than X to a specified location
2) Move all snapshots to a specified location
3) Delete snapshots older than X
4) Move N snapshots (from oldest to youngest) to a specified location
5) Delete N snapshots (from oldest to youngest)

This could be done as a recurring operation. We could also introduce some new metrics to monitor snapshot disk usage, similar to what we already have for the data directories. If the disk usage exceeds a threshold, we can fire an alert and perform one of the above actions. This is another good step we should take toward providing storage node disk management.
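To make the selection rules behind options 1 and 3 (older than X) and options 4 and 5 (oldest N) concrete, here is a minimal sketch in Python. It is only an illustration, not RHQ code: the `Snapshot` record and the `older_than` and `oldest_n` helpers are hypothetical names, and it assumes each snapshot has a known creation time. Whether the selected snapshots are then moved or deleted is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record for one snapshot directory (not an RHQ type)."""
    name: str
    created: datetime


def older_than(snapshots: Iterable[Snapshot], max_age: timedelta,
               now: datetime) -> List[Snapshot]:
    """Select snapshots created before (now - max_age), oldest first."""
    cutoff = now - max_age
    return sorted((s for s in snapshots if s.created < cutoff),
                  key=lambda s: s.created)


def oldest_n(snapshots: Iterable[Snapshot], n: int) -> List[Snapshot]:
    """Select at most n snapshots, ordered from oldest to youngest."""
    return sorted(snapshots, key=lambda s: s.created)[:max(n, 0)]
```

Both helpers return the selection ordered from oldest to youngest, so the same result can feed either a "move" or a "delete" action.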
Bump the target version now that 4.11 is out.
Based on discussion with John:

- I'll create Storage Cluster Settings (under Topology/StorageNodes/Settings) that will provide an interface to the snapshot management options above
- I'll implement periodic snapshots as scheduled resource operations
- when these settings change, we'll reschedule the operations on all storage nodes
- we need to schedule the operations when a new node joins the cluster
- by default, automatic snapshots will be completely disabled
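The scheduling rules in this plan can be sketched roughly as follows. This is a simplified model in Python, not the RHQ implementation: `SnapshotSettings`, `SnapshotScheduler`, and the `interval_hours` field are hypothetical names, and a plain dictionary stands in for the real scheduled resource operations.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class SnapshotSettings:
    """Hypothetical cluster-wide settings; automatic snapshots are off by default."""
    enabled: bool = False
    interval_hours: int = 168  # assumed weekly interval, for illustration only


class SnapshotScheduler:
    """Toy model: tracks which nodes have a snapshot operation scheduled."""

    def __init__(self) -> None:
        self.settings = SnapshotSettings()
        self.nodes: Set[str] = set()
        self.schedules: Dict[str, int] = {}  # node address -> interval in hours

    def _apply(self, node: str) -> None:
        # Replace any existing schedule so the node reflects current settings.
        self.schedules.pop(node, None)
        if self.settings.enabled:
            self.schedules[node] = self.settings.interval_hours

    def node_joined(self, node: str) -> None:
        """A new node joining the cluster gets the current schedule."""
        self.nodes.add(node)
        self._apply(node)

    def update_settings(self, settings: SnapshotSettings) -> None:
        """A settings change reschedules the operation on every known node."""
        self.settings = settings
        for node in self.nodes:
            self._apply(node)
```

Keeping a single `_apply` step means a settings change and a node join go through the same path, so a node that joins later cannot end up with a stale schedule.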
First part merged to master:

commit fe30df29d11d85bff76efca6e26302e4b6f96429
Merge: 181e5e7 2ac7010
Author: jsanda <jsanda>
Date:   Wed Jul 23 09:34:08 2014 -0400

    Merge pull request #95 from lzoubek/bugs/1074632

    Bug 1074632 - RFE: Manage storage node snapshots
This commit also fixes tests that did not pass on Jenkins:
https://github.com/rhq-project/rhq/commit/2de223b66109636a8b073e70e510f35cd6b70238
Additional commit fixing tests:
https://github.com/rhq-project/rhq/commit/0bef943a102c2bd13f42964fa35c8dd651ca9790#diff-d41d8cd98f00b204e9800998ecf8427e

Second (server-side) part in master:
https://github.com/rhq-project/rhq/commit/6640eabc6735327f8261f0fed23afd942d0ce801#diff-d41d8cd98f00b204e9800998ecf8427e