Bug 1126105
| Summary: | RHQ storage determineMostRecentRawDataSinceLastShutdown sub-optimal | | |
|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | Elias Ross <genman> |
| Component: | Core Server, Storage Node | Assignee: | John Sanda <jsanda> |
| Status: | CLOSED WONTFIX | QA Contact: | Mike Foley <mfoley> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.12 | CC: | hrupp, jsanda |
| Target Milestone: | --- | | |
| Target Release: | RHQ 4.13 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-09-11 16:00:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1133605 | | |
Description
Elias Ross
2014-08-01 21:48:25 UTC

The metrics_cache_index table does not contain any metric data. It stores schedule ids and two or three different timestamps, and it replaces the metrics_index table used prior to RHQ 4.12.

The concern Elias raises about the size of a partition is very valid. It was a concern with the former metrics_index table and is even more so with the new metrics_cache_index table, so I think we need to consider changes that cap the size of the rows in the index table. Let me explain with an example for aggregating data from the current time slice; the example applies to 4.12 as well as earlier versions of RHQ. Suppose we have N schedules with data to be aggregated. The index partition (in metrics_cache_index, or in metrics_index before 4.12) will contain N rows, and we load all of those rows in a single query. As N gets really big, that query can create hot spots on a node and makes us more susceptible to read timeouts.

Since schedule ids are monotonically increasing integers, we can easily implement paging to reduce the likelihood of read timeouts. That alone does not address the issue of partitions being big, i.e., really wide rows. We could break up the single partition into multiple partitions where schedule id offsets are part of the partition key. This means more reads during aggregation, but I think it can effectively prevent the problems that Elias is experiencing.

I am closing this because the determineMostRecentRawDataSinceLastShutdown method has been removed as part of the work for bug 1114202.
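For reference, a minimal sketch of the paging idea described in the comment, assuming a simplified index schema and the DataStax Java driver. The table layout, column names, keyspace, and page size here are illustrative assumptions, not RHQ's actual schema:

```java
import java.util.Date;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class IndexPager {

    // Hypothetical, simplified index table for this sketch:
    //   CREATE TABLE metrics_index (
    //       bucket text,
    //       time_slice timestamp,
    //       schedule_id int,
    //       PRIMARY KEY ((bucket, time_slice), schedule_id)
    //   );
    private static final int PAGE_SIZE = 1000; // assumed; tune to avoid read timeouts

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("rhq"); // keyspace name assumed

        // Page within the partition by the schedule_id clustering column.
        PreparedStatement page = session.prepare(
            "SELECT schedule_id FROM metrics_index " +
            "WHERE bucket = ? AND time_slice = ? AND schedule_id > ? " +
            "LIMIT " + PAGE_SIZE);

        Date timeSlice = new Date(); // the time slice being aggregated
        int lastId = 0;              // schedule ids are positive, monotonically increasing

        while (true) {
            List<Row> rows = session.execute(page.bind("raw", timeSlice, lastId)).all();
            for (Row row : rows) {
                int scheduleId = row.getInt("schedule_id");
                // ... aggregate the raw data for scheduleId here ...
                lastId = scheduleId; // rows arrive in clustering order, so this is the max seen
            }
            if (rows.size() < PAGE_SIZE) {
                break; // final (partial) page; the partition is exhausted
            }
        }
        cluster.close();
    }
}
```

Each query touches at most PAGE_SIZE rows, so a huge partition no longer has to be read in one shot; the partition itself is still wide, which is what the second idea addresses.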
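And a sketch of splitting the index into multiple partitions keyed by a schedule id offset. Again, the schema, NUM_PARTITIONS, and the modulo routing are assumptions for illustration, not the design RHQ ultimately shipped:

```java
import java.util.Date;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class PartitionedIndex {

    // Hypothetical schema with a schedule-id-derived offset folded into the partition key:
    //   CREATE TABLE metrics_index (
    //       bucket text,
    //       time_slice timestamp,
    //       partition int,
    //       schedule_id int,
    //       PRIMARY KEY ((bucket, time_slice, partition), schedule_id)
    //   );
    private static final int NUM_PARTITIONS = 4; // assumed; caps each partition at roughly N/4 rows

    private final Session session;
    private final PreparedStatement insert;
    private final PreparedStatement select;

    public PartitionedIndex(Session session) {
        this.session = session;
        insert = session.prepare(
            "INSERT INTO metrics_index (bucket, time_slice, partition, schedule_id) " +
            "VALUES (?, ?, ?, ?)");
        select = session.prepare(
            "SELECT schedule_id FROM metrics_index " +
            "WHERE bucket = ? AND time_slice = ? AND partition = ?");
    }

    // Called when raw data is inserted: route the schedule id to one of the partitions.
    public void update(String bucket, Date timeSlice, int scheduleId) {
        int partition = scheduleId % NUM_PARTITIONS;
        session.execute(insert.bind(bucket, timeSlice, partition, scheduleId));
    }

    // Called during aggregation: NUM_PARTITIONS smaller reads instead of one giant read.
    public void aggregate(String bucket, Date timeSlice) {
        for (int p = 0; p < NUM_PARTITIONS; p++) {
            ResultSet rs = session.execute(select.bind(bucket, timeSlice, p));
            for (Row row : rs) {
                int scheduleId = row.getInt("schedule_id");
                // ... aggregate the raw data for scheduleId here ...
            }
        }
    }
}
```

The trade-off is exactly the one noted in the comment: more reads per time slice in exchange for partitions of bounded width. The two ideas also compose, since each smaller partition can still be paged by schedule_id as in the first sketch.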
The metrics_cache_index table does not contain any metric data. It stores schedule ids and two or three different timestamps. It is a replacement of the metrics_index table from pre-4.12. The concern Elias raises about the size of a partition is very valid. It was a concern with the former metrics_index table and even more so with the new metrics_cache_index table. I think we might need to consider some changes to cap the size of the rows in the index table. Let me explain with an example for aggregating data from the current time slice. This example is applicable to both 4.12 as well earlier versions of RHQ. Suppose we have N schedules with data to be aggregated. The index partition (from metrics_cache_index or from metrics_index) will contain N rows. We load all of those rows in a single query. As N gets really big, it can create hot spots on a node and make us more susceptible to read timeouts. Since schedule ids are monotonically increasing integers, we can easily implement paging to reduce the likelihood of read timeouts. That does not address the issue of partitions being big, i.e., really wide rows. We could break up the single partition into multiple partitions where schedule id offsets are part of the partition key. This means more reads during aggregation but I think it can effectively prevent the problems that Elias is experiencing. I am closing this because the determineMostRecentRawDataSinceLastShutdown method has been removed as part of the work for bug 1114202. |