Bug 1445498 - Hourly metrics_## tables grow filling up the VMDB filesystem when real-time purges fail
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: GA
Target Release: 5.9.0
Assignee: Jillian Tullo
QA Contact: Tasos Papaioannou
URL:
Whiteboard: c&u
Depends On:
Blocks: 1462358 1465086
Reported: 2017-04-25 19:39 UTC by Thomas Hennessy
Modified: 2020-06-11 13:41 UTC
CC List: 14 users

Fixed In Version: 5.9.0.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1462358 1465086 (view as bug list)
Environment:
Last Closed: 2018-03-06 15:10:25 UTC
Category: ---
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:



Description Thomas Hennessy 2017-04-25 19:39:15 UTC
Description of problem: When C&U real-time records are not purged consistently, the number of unpurged records grows so large that the hourly metrics_## tables only grow in size, eventually filling the entire VMDB filesystem.


Version-Release number of selected component (if applicable): 5.7.1.3


How reproducible: Needs a test environment simulating a multi-thousand-VM environment where C&U real-time data is captured over several days. In this specific customer case, there are about 5k VM instances, each collecting C&U data. Purging is failing because the purge message times out at its 600-second timeout value.

This is a VMware environment with 5k VMs, so about 5x10^3 * 1.8x10^2 => 900,000 real-time rows are expected to be captured per hour.
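
The rows-per-hour estimate can be reproduced with a short calculation. This is only a sketch: the 20-second sample interval is an assumption chosen to match the 1.8x10^2 samples per VM per hour used in the estimate, and the variable names are illustrative.

```shell
#!/bin/bash
# Back-of-the-envelope estimate of real-time rows captured per hour.
vm_count=5000            # "about 5k VMs" from the report
sample_interval_s=20     # ASSUMPTION: one real-time sample every 20 seconds
samples_per_vm_per_hour=$(( 3600 / sample_interval_s ))   # 180 samples
rows_per_hour=$(( vm_count * samples_per_vm_per_hour ))

echo "rows per hour: $rows_per_hour"   # prints "rows per hour: 900000"
```

Changing vm_count shows how quickly the hourly tables fill in larger environments if purging falls behind.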


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
The original database maintenance scripts had a provision to change from REINDEX of the hourly metrics_## tables to TRUNCATE, but the comments from the original scripts are not preserved in the current scripts.

I think it might be a good idea to change from REINDEX to always TRUNCATE the tables after 23 hours, to avoid the problems which we know will surface if the VMDB filesystem is allowed to fill.

Comment 3 Thomas Hennessy 2017-04-25 19:45:25 UTC
proposed modified script should look like:

++++++++++++++++++++++++++++++++++++++++++++++

#!/bin/bash

# Load the appliance environment from /etc/default/evm.
source /etc/default/evm
LOGFILE=/var/www/miq/vmdb/log/hourly_continuous_pg_maint_stdout.log
# Target the hourly table for the upcoming UTC hour (metrics_00 .. metrics_23).
TABLE_NAME="metrics_$(date -u +"%H" --date='+1 hours')"

echo "current time is $(date) -> target for TRUNCATE TABLE is '$TABLE_NAME' table" >> "$LOGFILE"
# Empty the table; psql output and errors are appended to the same log.
psql -U postgres vmdb_production -a -e -c "TRUNCATE TABLE $TABLE_NAME" >> "$LOGFILE" 2>&1
echo "TRUNCATE TABLE $TABLE_NAME completed at $(date)" >> "$LOGFILE"
echo "=================" >> "$LOGFILE"

++++++++++++++++++++++++++++++++++++++++++++++
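
To see which table the script above would target at a given time, the table-name computation can be isolated into a small helper. This is a sketch, not part of the proposed script: next_hour_table is a hypothetical function name, it takes a UTC epoch timestamp so the result is reproducible, and it relies on GNU date, as the script itself does. Assuming the script runs once per hour, the table for the upcoming hour holds rows written about 23 hours earlier, which matches the "TRUNCATE after 23 hours" proposal.

```shell
#!/bin/bash
# Hypothetical helper: print the hourly metrics table for the hour
# following the given UTC epoch timestamp (metrics_00 .. metrics_23).
next_hour_table() {
    local epoch=$1
    printf 'metrics_%s\n' "$(date -u -d "@$(( epoch + 3600 ))" +%H)"
}

next_hour_table 0                  # 1970-01-01 00:00 UTC -> metrics_01
next_hour_table 82800              # 1970-01-01 23:00 UTC -> metrics_00 (wraps around)
next_hour_table "$(date -u +%s)"   # table the script would truncate right now
```

The wrap-around case at 23:00 UTC is the reason the hour is computed with date rather than by adding 1 to the current hour.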

Comment 5 Thomas Hennessy 2017-05-17 14:57:03 UTC
The BZ is open because it represents a failure in the product to consistently remove real-time C&U tuples from the real-time tables. This is not the only customer who has reported this problem with the latest CFME 4.2 code, so the bug still exists.

The case is closed because I provided the customer with a workaround that removes his exposure to this failure, which was filling his filesystem because we fail to remove real-time tuples. So *his* problem is addressed while *the product problem* persists.

Comment 6 Greg Blomquist 2017-05-18 18:45:02 UTC
I'm dropping the priority here because it looks like there's a KB-style workaround. That can give GT's team time to look over the possible options here.

Comment 7 Jillian Tullo 2017-06-06 13:29:24 UTC
BZ: https://github.com/ManageIQ/manageiq/pull/15312

Comment 10 Tasos Papaioannou 2017-10-09 20:40:06 UTC
Verified on 5.9.0.1.

