Bug 1074633

Summary: RFE: Manage storage node snapshots
Product: [JBoss] JBoss Operations Network Reporter: John Sanda <jsanda>
Component: Storage Node Assignee: Libor Zoubek <lzoubek>
Status: CLOSED CURRENTRELEASE QA Contact: Armine Hovsepyan <ahovsepy>
Severity: medium Docs Contact:
Priority: unspecified    
Version: JON 3.2 CC: ahovsepy, bkramer, hrupp, jkremser, loleary, lzoubek, mfoley, stephan.vollmer, theute
Target Milestone: ER05   
Target Release: JON 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Snapshots were generated weekly during scheduled maintenance and whenever nodes were (un)deployed. A snapshot consists of hard links to SSTable files, which in themselves consume very little disk space. However, when an SSTable was deleted during compaction, its space was not reclaimed if the SSTable was included in a snapshot. This behavior caused SSTable data to build up over time to unacceptable levels. The issue also revealed that there was no mechanism to manage snapshots through the UI. Multiple fixes are now included in the product. To address servers running out of disk space on storage nodes, snapshots are no longer part of the weekly scheduled job and are disabled by default. When enabled, snapshots can now be copied to a different location with more available storage space. Server-side capability now exists to manage snapshots for the storage cluster. New system settings have been introduced, which can be updated through the Storage Administration pages. Users can also enable and disable snapshot management for the storage cluster and set a cron expression to run the management task regularly.
Story Points: ---
Clone Of: 1074632
: 1099024 Environment:
Last Closed: 2014-12-11 14:01:53 UTC Type: Enhancement
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1074632    
Bug Blocks: 1099024, 1118105, 1141885, 1141886, 1142356    

Description John Sanda 2014-03-10 17:29:39 UTC
+++ This bug was initially created as a clone of Bug #1074632 +++

Description of problem:
Snapshots are generated weekly during scheduled maintenance and when nodes are (un)deployed. A snapshot consists of hard links to SSTable files; consequently, it takes up little disk space on its own. But when an SSTable is deleted during compaction, its space will not be reclaimed if the SSTable is included in a snapshot. This can add up over time. There is currently nothing in place for managing snapshots. Here are a few possible options:

1) Move snapshots older than X to a specified location

2) Move all snapshots to a specified location

3) Delete snapshots older than X

4) Move N snapshots (from oldest to youngest) to a specified location

5) Delete N snapshots (from oldest to youngest)

This could be done as a recurring operation. We could also introduce some new metrics to monitor snapshot disk usage, similar to what we already have for the data directories. If the disk usage exceeds a threshold, we can fire an alert and perform one of the above actions. This is another good step we should take toward providing storage node disk management.
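
The disk-usage metric suggested above could be approximated as follows. This is only a minimal sketch, not code from RHQ: the function names, the directory layout, and the file name are hypothetical. It assumes a POSIX-style file system where a hard-link count of 1 on a snapshot file means the live SSTable has already been deleted, so the snapshot alone is keeping that data on disk.

```python
import os
import tempfile


def snapshot_only_bytes(snapshot_dir):
    """Sum the sizes of files under snapshot_dir that have no other hard link.

    A link count of 1 means the live file is gone (for example after
    compaction), so only the snapshot still pins the data on disk.
    """
    total = 0
    for root, _dirs, files in os.walk(snapshot_dir):
        for name in files:
            info = os.stat(os.path.join(root, name))
            if info.st_nlink == 1:
                total += info.st_size
    return total


def exceeds_threshold(snapshot_dir, threshold_bytes):
    """Return True when snapshot-only data is larger than threshold_bytes."""
    return snapshot_only_bytes(snapshot_dir) > threshold_bytes


def demo():
    """Create a live file plus a hard-linked snapshot copy, then delete the
    live file. Returns snapshot-only usage before and after the deletion."""
    with tempfile.TemporaryDirectory() as base:
        data_dir = os.path.join(base, "data")
        snap_dir = os.path.join(base, "snapshots", "weekly")  # hypothetical layout
        os.makedirs(data_dir)
        os.makedirs(snap_dir)
        live = os.path.join(data_dir, "table-1-Data.db")  # hypothetical file name
        with open(live, "wb") as handle:
            handle.write(b"x" * 4096)
        os.link(live, os.path.join(snap_dir, "table-1-Data.db"))
        before = snapshot_only_bytes(snap_dir)
        os.remove(live)  # stands in for compaction deleting the SSTable
        after = snapshot_only_bytes(snap_dir)
        return before, after
```

While the live file exists the snapshot costs nothing extra; once the live file is removed, the full file size is attributable to the snapshot. A check such as `exceeds_threshold` could then decide when to fire an alert or run one of the options listed above.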

Comment 1 Larry O'Leary 2014-07-03 20:31:03 UTC
Note that snapshots also appear to be created during JBoss ON server start. Is this expected?

Comment 2 Libor Zoubek 2014-08-07 19:23:52 UTC
Upstream work done; see the comments on Bug 1074632 for cherry-picking.

Comment 3 Jirka Kremser 2014-08-11 16:58:54 UTC
cherry-picking commits
6640eabc6735327f8261f0
0bef943a102c2bd13f4296
2de223b66109636a8b073e
and
2ac701032cf7b05de8a285


branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/a2a567746
time:    2014-08-11 18:56:54 +0200
commit:  a2a56774602a65677b4ef36bbecea7c19137ca7c
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots

         2nd piece of impl above BZ. This patch adds server side
         capability to manage snapshots for storage cluster. This
         basically means that we regularly run takeSnapshot operation
         with several parameters - so user can decide whether to keep
         all of them, keep last or move older ones to specified
         location.

         6 new private/readonly system settings have been introduced -
         those settings can be updated only via Storage admin pages.

         User can enable/disable snapshots management of storage cluster
         and set cron expression to run management task regularly
         (additional 4 settings are basically parameters for
         takeSnapshot operation introduced within previous commit for
         this BZ). When Cluster Setting is saved in UI we re-schedule 
         takeSnapshot operations on all StorageNode resources in
         inventory.

         Snapshot related code was removed from 
         StorageNodeManagerBean#runClusterMaintenance()

         (cherry picked from commit
         6640eabc6735327f8261f0fed23afd942d0ce801) Signed-off-by: Jirka
         Kremser <jkremser>
         


branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/b9a429f70
time:    2014-08-11 18:56:54 +0200
commit:  b9a429f70973e757f2e89c13421c24bf03cafa80
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots

         next attempt to fix storageNode takeSnapshots itests

         (cherry picked from commit
         0bef943a102c2bd13f42964fa35c8dd651ca9790) Signed-off-by: Jirka
         Kremser <jkremser>
         


branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/1da0e2100
time:    2014-08-11 18:56:54 +0200
commit:  1da0e2100b8336d5d4cdf298bbaf618392e44b2c
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots

         StorageNodeComponentItest - output more details when assertion
         fails

         (cherry picked from commit
         2de223b66109636a8b073e70e510f35cd6b70238) Signed-off-by: Jirka
         Kremser <jkremser>
         


branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/b0ab71681
time:    2014-08-11 18:56:54 +0200
commit:  b0ab71681d20379c0c98c04d9c252aa5eab3a1fe
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots

         Added "Take Snapshot" operation to StorageNode resource. This
         operation takes several other parameters and it can either move
         or delete older snapshots. Basically it allows to
         - keep N latest snapshots and move/delete older ones
         - keep snapshots not older than N days and move/delete the
         older ones

         (cherry picked from commit
         2ac701032cf7b05de8a285210de619a5f62e5191) Signed-off-by: Jirka
         Kremser <jkremser>
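
The two retention rules described in this commit message (keep the N latest snapshots, or keep snapshots not older than N days) can be illustrated with a short sketch. It is not the plugin's implementation: `Snapshot`, `select_old_snapshots`, and the parameter names `keep_last` and `max_age_days` are all hypothetical, and the sketch only selects the snapshots that would be moved or deleted without touching the file system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Snapshot:
    name: str
    created: datetime


def select_old_snapshots(snapshots, now, keep_last=None, max_age_days=None):
    """Return the snapshots that fall outside the retention rule, oldest first.

    Exactly one of keep_last (keep the N newest) or max_age_days
    (keep everything newer than N days) must be given.
    """
    if (keep_last is None) == (max_age_days is None):
        raise ValueError("specify exactly one of keep_last or max_age_days")
    # Sort by creation time; the name breaks ties deterministically.
    ordered = sorted(snapshots, key=lambda s: (s.created, s.name))
    if keep_last is not None:
        if keep_last < 0:
            raise ValueError("keep_last must not be negative")
        cutoff_index = max(len(ordered) - keep_last, 0)
        return ordered[:cutoff_index]
    cutoff = now - timedelta(days=max_age_days)
    return [s for s in ordered if s.created < cutoff]
```

The caller would then either move or delete each returned snapshot, which keeps the selection rule separate from the action taken on older snapshots.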

Comment 4 Libor Zoubek 2014-08-12 12:13:07 UTC
Additional commit went to master & release branch

master
commit 1c43d4db43f575c22ecf37d3bdb67e4ae449b613
Author: Libor Zoubek <lzoubek>
Date:   Tue Aug 12 13:26:35 2014 +0200

    Bug 1074633 - RFE: Manage storage node snapshots
    
    Improve UX - disable several fields in Cluster Settings based on other
    field's selection when not relevant.



release branch
commit 0cf44decd9d4e569b168b4634c5e4abbf5b2572e
Author: Libor Zoubek <lzoubek>
Date:   Tue Aug 12 13:26:35 2014 +0200

    Bug 1074633 - RFE: Manage storage node snapshots
    
    Improve UX - disable several fields in Cluster Settings based on other
    field's selection when not relevant.
    
    (cherry picked from commit 1c43d4db43f575c22ecf37d3bdb67e4ae449b613)
    Signed-off-by: Libor Zoubek <lzoubek>

Comment 5 Simeon Pinder 2014-08-19 23:50:41 UTC
Moving to ON_QA as available to test in the following brew build:

https://brewweb.devel.redhat.com//buildinfo?buildID=379025

Comment 9 Libor Zoubek 2014-09-25 08:00:09 UTC
additional commit fixing test failure

branch:  master
link:    https://github.com/rhq-project/rhq/commit/aa02fec5a
time:    2014-09-25 09:59:04 +0200
commit:  aa02fec5a20eb492bb911f5117f5eb1b020582d2
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots
         Fix NPE that occurred in 
         org.rhq.enterprise.server.discovery.DiscoveryBossBeanTest.testAutoImportStorageNode

Comment 10 Libor Zoubek 2014-09-25 20:17:44 UTC
branch:  master
link:    https://github.com/rhq-project/rhq/commit/41802d12c
time:    2014-09-25 22:17:02 +0200
commit:  41802d12c152863f271c0aa8ad00902d62c3bbf9
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots
         avoid NPE

Comment 11 Libor Zoubek 2014-09-26 07:59:22 UTC
test failure fixed in master - cherry-picked to release branch

branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/8e5616c2b
time:    2014-09-26 09:57:46 +0200
commit:  8e5616c2be3f8b9d5ef215ead8c27e94b1083303
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots
         avoid NPE
         (cherry picked from commit
         41802d12c152863f271c0aa8ad00902d62c3bbf9) Signed-off-by: Libor
         Zoubek <lzoubek>
         


branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/26d8e06ce
time:    2014-09-26 09:57:38 +0200
commit:  26d8e06ced4836f70eb87c3bf385dfc066a0a0a4
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots

         Fix NPE that occurred in 
         org.rhq.enterprise.server.discovery.DiscoveryBossBeanTest.testAutoImportStorageNode

         (cherry picked from commit
         aa02fec5a20eb492bb911f5117f5eb1b020582d2) Signed-off-by: Libor
         Zoubek <lzoubek>

Comment 12 Simeon Pinder 2014-10-01 21:33:38 UTC
Moving to ON_QA as available for test with build:
https://brewweb.devel.redhat.com/buildinfo?buildID=388959

Comment 13 Armine Hovsepyan 2014-10-06 14:54:15 UTC
moving to ER05 to verify with #1141885

Comment 15 Libor Zoubek 2014-10-14 15:21:12 UTC
fixed tests

branch:  master
link:    https://github.com/rhq-project/rhq/commit/56c84f19c
time:    2014-10-14 17:18:48 +0200
commit:  56c84f19c4e4a39de0d0aea67f4ccdcf905c0b28
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots
         Fix StorageNodeComponentItest - basically our tests were
         running too fast. Failures happened because the
         lastModified file attribute is stored in seconds - so our tests
         tended to generate several snapshots within the same second and
         that led to incorrect behavior of the tested "takeSnapshot"
         operation. Some console logging has also been added.


branch:  release/jon3.3.x
link:    https://github.com/rhq-project/rhq/commit/084594a8f
time:    2014-10-14 17:19:47 +0200
commit:  084594a8fdd9de051a0dedb21889077cc9990e02
author:  Libor Zoubek - lzoubek
message: Bug 1074633 - RFE: Manage storage node snapshots
         Fix StorageNodeComponentItest - basically our tests were
         running too fast. Failures happened because the
         lastModified file attribute is stored in seconds - so our tests
         tended to generate several snapshots within the same second and
         that led to incorrect behavior of the tested "takeSnapshot"
         operation. Some console logging has also been added.
         (cherry picked from commit
         56c84f19c4e4a39de0d0aea67f4ccdcf905c0b28) Signed-off-by: Libor
         Zoubek <lzoubek>
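
The timing problem described in this commit message can be reproduced in isolation. The sketch below is not the change made in the commit, which only states that the tests were fixed. It merely shows why whole-second timestamps make "oldest first" ambiguous, and one possible tie-break by name. The function names and snapshot names are hypothetical, and the tie-break assumes names happen to sort in creation order.

```python
def has_same_second_ties(entries):
    """Return True if two (name, mtime) entries share the same whole second."""
    seconds = [int(mtime) for _name, mtime in entries]
    return len(seconds) != len(set(seconds))


def order_oldest_first(entries):
    """Sort (name, mtime) pairs by whole-second mtime, then by name.

    Truncating to seconds mimics a lastModified value stored with
    one-second granularity; the name makes the order deterministic
    when several snapshots land in the same second.
    """
    return sorted(entries, key=lambda entry: (int(entry[1]), entry[0]))
```

With entries created 0.8 seconds apart inside the same second, `has_same_second_ties` reports a collision, and ordering by timestamp alone would depend on input order rather than on actual age.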

Comment 16 Armine Hovsepyan 2014-10-22 09:48:02 UTC
All related bugs are verified; the RFE is qualified. Thank you.