Bug 1212164 - Chargeback: CPU Total Field reporting incorrect values.
Summary: Chargeback: CPU Total Field reporting incorrect values.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.3.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.4.0
Assignee: Jason Frey
QA Contact: Nandini Chandra
URL:
Whiteboard:
Duplicates: 1004057
Depends On:
Blocks:
Reported: 2015-04-15 17:09 UTC by Josh Carter
Modified: 2019-07-11 08:57 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
In previous versions of CloudForms Management Engine, the CPU Total field in Chargeback reports displayed incorrect values because of gaps in the rollups of "realtime" data collected from systems. This was fixed in two ways: storage capture now runs every 60 minutes, which avoids gaps in metrics_rollups, and "available" values are no longer derived when there is no record of usage values having been collected. In the latest version of CloudForms Management Engine, the CPU Total field in Chargeback reports displays correct values.
Clone Of:
Environment:
Last Closed: 2015-06-16 12:58:32 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2015:1100 | 0 | normal | SHIPPED_LIVE | CFME 5.4.0 bug fixes, and enhancement update | 2015-06-16 16:28:42 UTC

Comment 12 Jason Frey 2015-04-23 17:56:45 UTC
Is this a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1038869 ?  In that ticket we are seeing the "wrong" value for total CPUs in various places.  However, the backend database is correct.  One issue there is in the UI where the summary and other places report vcpus instead of total cores.  In addition, there is a problem with chargeback using the wrong value.

Comment 13 Jason Frey 2015-04-23 18:02:09 UTC
One interesting thing I found in that other ticket is that our VMware appliance is configured with 4 CPUs @ 1 core each = 4 logical CPUs, but our RHEV appliance is configured with 1 CPU @ 4 cores = 4 logical CPUs.  If chargeback is looking at CPUs, then different numbers are used.  I think chargeback should be looking at the logical CPUs column.
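
To make the arithmetic in this comment concrete, here is a minimal Ruby sketch. The `CpuLayout` struct and its field names are hypothetical and are not the CFME schema; they only show why counting CPUs (sockets) gives different numbers for the two appliances while counting logical CPUs gives the same number.

```ruby
# Hypothetical struct for illustration only; not the CFME hardware model.
CpuLayout = Struct.new(:sockets, :cores_per_socket) do
  # Logical CPUs are the product of sockets and cores per socket.
  def logical_cpus
    sockets * cores_per_socket
  end
end

vmware = CpuLayout.new(4, 1) # 4 CPUs @ 1 core each
rhev   = CpuLayout.new(1, 4) # 1 CPU @ 4 cores

puts "VMware: sockets=#{vmware.sockets} logical=#{vmware.logical_cpus}"
puts "RHEV:   sockets=#{rhev.sockets} logical=#{rhev.logical_cpus}"
```

A report keyed on `sockets` would treat the two appliances differently, whereas one keyed on `logical_cpus` would treat them the same.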

Comment 14 Greg Blomquist 2015-04-24 21:17:31 UTC
There are really four things going on here.

1) The customer has not configured the Metrics authentication for the RHEVM provider.  Therefore absolutely no realtime compute metrics are being collected for the RHEVM VMs.  (I know this is non-obvious given that there are hourly rollups, but there's more.)

2) Storage C&U has an out-of-the-box default configuration to collect Storage metrics every 2 hours.  *AND*, when Storage metrics are collected, it attempts to update an hourly rollup record for each VM on that Storage.  Here's the kicker: Storage C&U and Compute C&U happen on independent schedules.  This means that when the Storage C&U gets ready to update the hourly rollup record for each VM, it looks for an hourly row to update.  If there's no hourly row yet, it creates one, assuming that Compute C&U will just come along later and update that same row.

3) Anytime an hourly record is created for a VM, a set of "derived" fields are calculated.  Among those derived fields are:  
  * cpu_usagemhz_rate_average
  * derived_vm_numvcpus
  * derived_memory_used
  * derived_memory_available
When looking at the metric rollup data in comment #4, it's clear that cpu_usagemhz_rate_average and derived_memory_used are nil, while derived_vm_numvcpus and derived_memory_available are non-nil.  This is because the cpu_usagemhz_rate_average and derived_memory_used values come from Compute metrics, while derived_vm_numvcpus and derived_memory_available are static values that come from inventory collection (refresh).

4) Finally, the chargeback report is configured to charge for "allocated" memory and CPU.

Taking this all into account, we can explain everything happening here:

--> There are VM hourly rollups every two hours for RHEVM VMs because:
    - there are no RHEVM VM realtime metrics because it's not configured, and
    - the Storage C&U, running every 2 hours, has created rollups for VMs without realtime metrics but with static memory and CPU values collected from inventory (refresh).

--> The chargeback report shows values because it is going off of hourly rollups for VMs and looking at CPU and memory allocated to the VMs.

The underlying question is: What *should* happen?
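
The behavior described in points 2 and 3 can be modelled with a small Ruby sketch. Everything here except the four column names is hypothetical (`HourlyRollup`, `RollupStore`, `storage_capture`, and the shape of the `vm` hash); it is a simplified model of the described behavior, not the CFME code.

```ruby
# Simplified, hypothetical model of an hourly rollup row.
HourlyRollup = Struct.new(:vm_name, :hour,
                          :cpu_usagemhz_rate_average, :derived_memory_used,
                          :derived_vm_numvcpus, :derived_memory_available)

class RollupStore
  def initialize
    @rows = {}
  end

  # Return the hourly row for a VM, creating it if no collector has yet.
  def find_or_create(vm_name, hour)
    @rows[[vm_name, hour]] ||= HourlyRollup.new(vm_name, hour)
  end
end

# Storage C&U path: creates the row if it is missing, then fills in the
# derived columns that are available from inventory (refresh).
def storage_capture(store, vm, hour)
  row = store.find_or_create(vm[:name], hour)
  row.derived_vm_numvcpus      = vm[:numvcpus]
  row.derived_memory_available = vm[:memory_mb]
  # The usage columns would be filled by Compute C&U, which never runs
  # when Metrics authentication is not configured, so they stay nil.
  row
end

store = RollupStore.new
vm    = { name: "rhevm-vm", numvcpus: 4, memory_mb: 8192 }
row   = storage_capture(store, vm, 0)

p row.cpu_usagemhz_rate_average # nil
p row.derived_vm_numvcpus       # 4
```

A chargeback report that charges for allocated CPU and memory would find non-nil values in such a row even though no usage was ever collected.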

Comment 15 Jason Frey 2015-04-27 14:05:38 UTC
Spoke to Oleg, and he had a great idea.  If we bump Storage C&U up from collecting every two hours to every hour, then we don't have any gaps, and we automatically get the derived values hourly.
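
As a rough illustration of the gap argument, the following Ruby sketch lists which hours in a window would have no hourly row if only Storage C&U were creating rows. The helper names `rollup_hours` and `gaps` are hypothetical, and the model assumes the collector runs exactly on the hour starting at hour 0.

```ruby
# Hours (0-based) in which a collector running every `interval_hours`
# touches an hourly rollup row within a window of `window_hours`.
def rollup_hours(interval_hours, window_hours)
  (0...window_hours).step(interval_hours).to_a
end

# Hours in the window for which no hourly row was created.
def gaps(interval_hours, window_hours)
  (0...window_hours).to_a - rollup_hours(interval_hours, window_hours)
end

p gaps(2, 6) # [1, 3, 5]
p gaps(1, 6) # []
```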

Comment 16 Jason Frey 2015-04-30 21:30:08 UTC
https://github.com/ManageIQ/manageiq/pull/2815

Comment 17 CFME Bot 2015-05-01 18:06:17 UTC
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/435a5b58d56f194922b1b3d818e065141ad6f95c

commit 435a5b58d56f194922b1b3d818e065141ad6f95c
Author:     Jason Frey <jfrey>
AuthorDate: Thu Apr 30 15:34:16 2015 -0400
Commit:     Jason Frey <jfrey>
CommitDate: Fri May 1 12:59:34 2015 -0400

    Do not derive "available" values if we don't have any usage values.
    
    The lack of cpu or mem usage values implies that the target being
    collected is either off, or not configured for collection.  In both
    cases, collecting "allocated" values does not make sense.  If off, the
    target will not be given those resources, so they are not really
    available.  If not configured for collection, then we should not be
    doing the derivation at all.
    
    The circumstance for this situation occurs when normal C&U for a target
    is not enabled, but storage C&U still occurs.  When storage C&U comes
    along it calls process_derived_columns, but some of those derived columns
    should not be calculated in that state.  If the normal C&U were to come
    along later, then it would fill in the missing details.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1038869
    https://bugzilla.redhat.com/show_bug.cgi?id=1212164

 vmdb/app/models/metric/common.rb           |   2 +
 vmdb/app/models/metric/processing.rb       |  15 ++-
 vmdb/spec/factories/metric.rb              |  21 +---
 vmdb/spec/factories/metric_rollup.rb       |  20 ++++
 vmdb/spec/models/metric/processing_spec.rb | 180 +++++++++++++++++++++++++++++
 vmdb/spec/models/metric_spec.rb            |   6 +-
 6 files changed, 223 insertions(+), 21 deletions(-)
 create mode 100644 vmdb/spec/factories/metric_rollup.rb
 create mode 100644 vmdb/spec/models/metric/processing_spec.rb
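
The idea in this commit message can be sketched in Ruby as follows. This is not the code from the commit: `derive_columns`, the plain-hash row, and the `inventory` keys are hypothetical, and the sketch assumes the simplest reading of the commit title, namely that the "available" columns are skipped when no usage value at all is present.

```ruby
# Hypothetical, simplified stand-in for the derivation step.
# `row` is a hash of rollup columns; `inventory` holds static values
# obtained from refresh.
def derive_columns(row, inventory)
  usage_columns = [row[:cpu_usagemhz_rate_average], row[:derived_memory_used]]
  has_usage     = usage_columns.any? { |value| !value.nil? }

  # Without usage values the target is either off or not configured for
  # collection, so leave the "available" columns unset.
  return row unless has_usage

  row.merge(
    derived_vm_numvcpus:      inventory[:numvcpus],
    derived_memory_available: inventory[:memory_mb]
  )
end

inventory = { numvcpus: 4, memory_mb: 8192 }

without_usage = derive_columns({ cpu_usagemhz_rate_average: nil, derived_memory_used: nil }, inventory)
with_usage    = derive_columns({ cpu_usagemhz_rate_average: 1200.0, derived_memory_used: 2048.0 }, inventory)

p without_usage[:derived_vm_numvcpus] # nil
p with_usage[:derived_vm_numvcpus]    # 4
```

With a guard of this kind, a row created by storage C&U alone would carry no allocated CPU or memory, so a chargeback report based on allocation would have nothing to charge for that hour.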

Comment 18 CFME Bot 2015-05-01 18:06:22 UTC
New commit detected on manageiq/master:
https://github.com/ManageIQ/manageiq/commit/8abba9ae65cc87696f7c45399ceb2dae100cd610

commit 8abba9ae65cc87696f7c45399ceb2dae100cd610
Author:     Jason Frey <jfrey>
AuthorDate: Thu Apr 30 15:27:23 2015 -0400
Commit:     Jason Frey <jfrey>
CommitDate: Thu Apr 30 17:27:37 2015 -0400

    Change storage capture to 60m to avoid leaving gaps in metrics_rollups.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1038869
    https://bugzilla.redhat.com/show_bug.cgi?id=1212164

 vmdb/config/vmdb.tmpl.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comment 20 CFME Bot 2015-05-06 17:26:31 UTC
New commit detected on cfme/5.3.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=0da46e4efcb3cce468b9f32ab92609187858b207

commit 0da46e4efcb3cce468b9f32ab92609187858b207
Author:     Jason Frey <jfrey>
AuthorDate: Thu Apr 30 15:34:16 2015 -0400
Commit:     Jason Frey <jfrey>
CommitDate: Wed May 6 12:46:14 2015 -0400

    Do not derive "available" values if we don't have any usage values.
    
    The lack of cpu or mem usage values implies that the target being
    collected is either off, or not configured for collection.  In both
    cases, collecting "allocated" values does not make sense.  If off, the
    target will not be given those resources, so they are not really
    available.  If not configured for collection, then we should not be
    doing the derivation at all.
    
    The circumstance for this situation occurs when normal C&U for a target
    is not enabled, but storage C&U still occurs.  When storage C&U comes
    along it calls process_derived_columns, but some of those derived columns
    should not be calculated in that state.  If the normal C&U were to come
    along later, then it would fill in the missing details.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1038869
    https://bugzilla.redhat.com/show_bug.cgi?id=1212164
    https://bugzilla.redhat.com/show_bug.cgi?id=1219144

 vmdb/app/models/metric/common.rb           |   2 +
 vmdb/app/models/metric/processing.rb       |  15 ++-
 vmdb/spec/factories/metric.rb              |  21 +---
 vmdb/spec/factories/metric_rollup.rb       |  20 ++++
 vmdb/spec/models/metric/processing_spec.rb | 180 +++++++++++++++++++++++++++++
 vmdb/spec/models/metric_spec.rb            |   6 +-
 6 files changed, 223 insertions(+), 21 deletions(-)
 create mode 100644 vmdb/spec/factories/metric_rollup.rb
 create mode 100644 vmdb/spec/models/metric/processing_spec.rb

Comment 21 CFME Bot 2015-05-06 17:26:47 UTC
New commit detected on cfme/5.3.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=5398fb75e9779e88967a2b8dea803f90e03c05fa

commit 5398fb75e9779e88967a2b8dea803f90e03c05fa
Author:     Jason Frey <jfrey>
AuthorDate: Thu Apr 30 15:27:23 2015 -0400
Commit:     Jason Frey <jfrey>
CommitDate: Wed May 6 12:46:02 2015 -0400

    Change storage capture to 60m to avoid leaving gaps in metrics_rollups.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1038869
    https://bugzilla.redhat.com/show_bug.cgi?id=1212164
    https://bugzilla.redhat.com/show_bug.cgi?id=1219144

 vmdb/config/vmdb.tmpl.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comment 23 Nandini Chandra 2015-06-05 05:01:41 UTC
Verified that there are no gaps in metrics_rollups when a VM is powered off.

Verified in 5.4.0.2.

Comment 24 Nandini Chandra 2015-06-09 10:46:41 UTC
*** Bug 1004057 has been marked as a duplicate of this bug. ***

Comment 26 errata-xmlrpc 2015-06-16 12:58:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1100.html

