Description of problem:
Cassandra Migration - The migration time estimate for a large database is about 5 times longer than the actual migration.

Version-Release number of selected component (if applicable):
Jenkins build 215

How reproducible:
2 of 2

Steps to Reproduce:
1. Prepare an environment with a large amount of data in Postgres (in my example, 17500000 rows in rhq_meas_data_num_rxx)
2. Run the migration jar

Actual results:
The estimate is 101 min; the actual run takes 21 min.

Expected results:
The estimate is reasonably accurate (within 10%).

Additional info:
Estimation and migration timings can be found in Jenkins, in builds 49 and 50 of Migrator_Run.
Stefan, is there a way to improve the estimate without making the estimation itself run too long?
Flagging Stefan to answer the question above and to triage. The bug is currently assigned but has no target.
The code uses a linear approximation with padding, based on migrating a sample of the actual data. The goal of the approximation is to give an upper bound on the migration time. There is no way to account for changes in network speed or other environment parameters, so a conservative estimate covers for adverse external factors. Because the estimate is an upper bound, the feature works as expected as long as the migration completes in less time.
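For anyone following along, here is a rough sketch of what a padded linear estimate looks like. This is not the migrator's actual code: the function name, its parameters, and the padding factor of 1.5 are all assumptions made for illustration, and the sample timing in the usage note is made up.

```python
def estimate_migration_seconds(total_rows, sample_rows, sample_seconds,
                               padding_factor=1.5):
    """Extrapolate a sample migration linearly, then pad the result.

    All names and the default padding_factor are hypothetical; the real
    migrator may sample and pad differently.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    if padding_factor < 1.0:
        raise ValueError("padding_factor below 1.0 would not give an upper bound")

    # Linear step: assume every row costs the same as the sampled rows did.
    linear_seconds = (total_rows * sample_seconds) / sample_rows

    # Padding step: inflate the linear figure to allow for slower network
    # or environment conditions during the real run.
    return linear_seconds * padding_factor


# Illustrative numbers only: 100000 sampled rows migrated in 10 seconds.
estimate = estimate_migration_seconds(17_500_000, 100_000, 10.0)
print(f"estimated upper bound: {estimate / 60:.1f} min")
```

With these made-up inputs the linear part comes to 1750 seconds and the padded estimate to 2625 seconds. The padding is what makes the figure deliberately pessimistic, so a larger factor widens the gap between the estimate and the actual run time.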