Red Hat Bugzilla – Bug 536048
Metric avg wrong for groups
Last modified: 2015-02-01 18:25:36 EST
20:31:06 < ghinkle> there is another problem in metrics... the aggregation of group data
20:32:47 < ghinkle> if i have metric A scheduled for 5 minutes on one resource and 10 minutes for another and then look at them in a group where the buckets are say 10 minutes surrounding all data points, the average displayed will be 6.667 instead of 7.5
The avg and sum values for groups are wrong.
In particular, it makes no sense that if one value is collected 10 times in a timespan and the other only once, the first one gets roughly 90% of the weight in the avg and the other roughly 10%.
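To make the arithmetic concrete, here is a minimal sketch of the two ways of averaging. The sample values (5 and 10) are an assumption chosen to reproduce the 6.667 and 7.5 figures quoted above; the function names are hypothetical and are not taken from the RHQ code base.

```python
from statistics import mean

# Hypothetical samples falling into one 10-minute bucket. The values are
# assumed, picked so the results match the 6.667 vs 7.5 figures in this bug.
samples = {
    "resource_a": [5.0, 5.0],  # 5-minute schedule: two data points in the bucket
    "resource_b": [10.0],      # 10-minute schedule: one data point in the bucket
}


def pooled_average(samples):
    """Average over all raw data points; a resource with more points weighs more."""
    points = [value for values in samples.values() for value in values]
    return mean(points)


def per_resource_average(samples):
    """Average each resource first, then average those, so each resource counts once."""
    return mean(mean(values) for values in samples.values() if values)


print(round(pooled_average(samples), 3))        # 6.667
print(round(per_resource_average(samples), 3))  # 7.5
```

Pooling the raw points lets the more frequently collected resource dominate, while averaging per resource first gives every group member equal weight regardless of its schedule.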
Joe, could you investigate what's required to address this issue?
(10:00:22 AM) ghinkle: ccrouch, i recommend we push rhq-43 out of the next release
We had similar problems in JON 1.4
I think the "workaround" would be to not have different schedules for the same metric across resources in the group. I'm sure there are reasons for doing it otherwise, but it seems to me the more common use case would be to collect the same metric with the same collection schedule across resources.
mazz, in general that sounds like a good strategy, but it's nearly impossible to prevent that in practice. the issue lies in the fact that with a ManyToMany relationship between Resources and ResourceGroups, a resource (and, thus, its metrics) may actually be in many, many different groups.
pragmatically speaking, your suggested workaround would then scale up to requiring that all metrics of a particular type must be moved in tandem - i.e., disallow individual schedules and only allow updates at the metric template-level...because if we didn't do it that way the logic would simply be too complex to get correct across all current group permutations in the system.
i think this is fixable, just not fun / simple.
maybe we check and if we see two or more different collection interval values, we just put a yellow bar at the top of the graphs and say, "these values may be inaccurate due to the differing intervals across resources"
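A rough sketch of what such a check might look like, assuming the collection intervals of the group members are already available as a simple mapping. The function names and the data shape are hypothetical and do not reflect the actual RHQ API.

```python
# Warning text taken from the suggestion above.
WARNING = "these values may be inaccurate due to the differing intervals across resources"


def has_mixed_intervals(intervals_by_resource):
    """Return True if the group members use two or more distinct collection intervals."""
    return len(set(intervals_by_resource.values())) > 1


def graph_warning(intervals_by_resource):
    """Return the warning text for the graph header, or None if all intervals match."""
    if has_mixed_intervals(intervals_by_resource):
        return WARNING
    return None


# Hypothetical group: intervals in seconds, keyed by resource name.
print(graph_warning({"resource_a": 300, "resource_b": 600}))
print(graph_warning({"resource_a": 300, "resource_b": 300}))
```

This only flags the condition; it does not correct the aggregated values themselves.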
This bug was previously known as http://jira.rhq-project.org/browse/RHQ-43
Mass move to component = Monitoring