Bug 962768

Summary: Cassandra Migration - Migration estimation for large db is 5 times more than actual migration process
Product: [JBoss] JBoss Operations Network
Reporter: Armine Hovsepyan <ahovsepy>
Component: Installer
Assignee: Stefan Negrea <snegrea>
Status: CLOSED WONTFIX
QA Contact: Mike Foley <mfoley>
Severity: medium
Priority: medium
Version: JON 3.2
CC: hrupp, jshaughn, mfoley, snegrea
Target Release: JON 3.3.0
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-08-28 17:59:49 UTC
Type: Bug

Description Armine Hovsepyan 2013-05-14 12:10:33 UTC
Description of problem:
Cassandra Migration - the migration time estimate for a large database is about 5 times the actual migration time.

Version-Release number of selected component (if applicable):
Jenkins build 215

How reproducible:
2 of 2

Steps to Reproduce:
1. Prepare an environment with a large data set in PostgreSQL (in my case, 17,500,000 rows in rhq_meas_data_num_rxx).
2. Run the migration JAR.
  
Actual results:
The estimate is 101 minutes; the actual run takes 21 minutes.

Expected results:
The estimate is reasonably accurate (within 10% of the actual time).
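For reference, the gap between the reported figures and the 10% criterion can be checked with a few lines of Python. The numbers are the ones from this report; the helper names are hypothetical and not part of the migrator.

```python
def overestimate_ratio(estimated_min: float, actual_min: float) -> float:
    """Return how many times longer the estimate is than the actual run."""
    return estimated_min / actual_min


def within_tolerance(estimated_min: float, actual_min: float,
                     tolerance: float = 0.10) -> bool:
    """True if the estimate is within `tolerance` (relative) of the actual time."""
    return abs(estimated_min - actual_min) / actual_min <= tolerance


# Figures from this report: 101-minute estimate, 21-minute actual run.
ratio = overestimate_ratio(101, 21)
print(f"estimate is {ratio:.1f}x the actual time")  # estimate is 4.8x the actual time
print(within_tolerance(101, 21))                    # False
```

With these figures, an estimate would need to fall between roughly 18.9 and 23.1 minutes to meet the expected result.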

Additional info:
Estimation and migration timings can be found in Jenkins, Migrator_Run builds 49 and 50.

Comment 1 Heiko W. Rupp 2013-08-24 18:17:31 UTC
Stefan, is there a way to improve the estimate without making the estimation itself run too long?

Comment 3 Jay Shaughnessy 2014-08-26 14:01:17 UTC
Flagging Stefan to answer the question above and triage. Currently assigned, but with no target.

Comment 4 Stefan Negrea 2014-08-28 17:59:49 UTC
The code uses a linear approximation, with padding, based on migrating a sample of the actual data. The goal of the approximation is to give an upper bound for the migration time. There is no way to account for changes in network speed or other environment parameters, so a conservative estimate covers adverse external factors.
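The approach described above can be sketched as follows, assuming the estimate is computed by timing the migration of a sample of rows, scaling that rate linearly to the full row count, and multiplying by a padding factor. The function name, the sample figures, and the 1.5 padding factor are hypothetical; this report does not give the migrator's actual sample size or padding.

```python
def estimate_migration_seconds(sample_rows: int, sample_seconds: float,
                               total_rows: int, padding: float = 1.5) -> float:
    """Upper-bound estimate: linear extrapolation from a timed sample, then padded.

    `padding` is a hypothetical safety factor (>= 1.0) meant to absorb network
    slowdowns and other environment effects that the sample cannot predict.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    if padding < 1.0:
        raise ValueError("padding must be >= 1.0 to keep the estimate an upper bound")
    seconds_per_row = sample_seconds / sample_rows   # rate observed on the sample
    return seconds_per_row * total_rows * padding    # linear scale-up, then padding


# Hypothetical sample: 100,000 rows migrated in 12 seconds, extrapolated
# to the 17,500,000 rows used in the reproduction steps.
estimate = estimate_migration_seconds(100_000, 12.0, 17_500_000)
print(f"{estimate / 60:.1f} minutes")  # 52.5 minutes
```

Because the padding multiplies the whole extrapolation, an environment that runs at or faster than the sampled rate finishes well under the estimate. Any extra overestimation then comes from the sample itself running slower than the full migration.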

The estimate is an upper bound for the migration time; as long as the migration completes in less time, this feature works as expected.