Bug 1475034

Summary: Metrics chart reporting 74000 Millicores for an app running on a node with only 8 cores
Product: OpenShift Container Platform Reporter: Eric Jones <erjones>
Component: Hawkular Assignee: Solly Ross <sross>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Junqi Zhao <juzhao>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.3.1 CC: aos-bugs, erjones, mrichter, mwringe, pweil, sross
Target Milestone: ---   
Target Release: 3.3.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-03 13:43:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Eric Jones 2017-07-25 21:51:43 UTC
Description of problem:
An application with several replicas that had been running just fine suddenly has metrics reporting significantly more cores than is possible (the node has 8 cores, but the app reported 74,000 millicores).


Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.3.1.11

Additional info:
Attaching files shortly

Comment 2 Matt Wringe 2017-07-26 17:50:01 UTC
@sross: it looks like Heapster is using 15s for its interval, and I believe at this interval we can sometimes get strange CPU usage results back. Is this something we have seen before? A very large CPU spike that is clearly nonsense.
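
(For scale, assuming the usual rate calculation over a single 15s window: 74,000 millicores is 74 full cores, i.e. roughly 74 × 15 ≈ 1,110 CPU-seconds attributed to one window, while an 8-core node can only accumulate 8 × 15 = 120 CPU-seconds in that time, so the value has to be an artifact of the calculation rather than a real burst.)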

Comment 3 Solly Ross 2017-07-28 19:24:29 UTC
those logs do not look like a healthy Heapster :-/

I'd try switching to an interval of 30s, as well as checking what the summary endpoint says, and seeing what happens if you switch to the summary source (`--source=kubernetes.summary_api:...` instead of `--source=kubernetes:...`).
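
A minimal sketch of those two changes, assuming the stock heapster replication controller (copy the master URL and kubelet options from whatever the current `--source=kubernetes:...` argument already uses):

    --source=kubernetes.summary_api:${MASTER_URL}?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250
    --metric_resolution=30s

The summary source reads each kubelet's `/stats/summary` endpoint (port 10250 by default), so fetching that URL directly on the affected node, with a token that can read node stats, is a quick way to check whether the raw cAdvisor numbers already look wrong before Heapster aggregates them.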

We've seen spikes like that due to bad (non-monotonically increasing) CPU metrics and overflow, or occasionally due to bad metrics coming from Kubelet/cAdvisor, but I thought we'd fixed most of those issues.
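
To make the overflow failure mode concrete, here is a minimal sketch (not Heapster's actual code) of what happens when a cumulative CPU counter goes backwards and the delta is taken with unsigned arithmetic; the wrapped value turns into an absurdly large rate:

    // Minimal sketch (not Heapster's actual code): cumulative CPU time is
    // reported in nanoseconds; usage is derived as (delta / window).
    package main

    import "fmt"

    func main() {
        var prev uint64 = 9_000_000_000 // cumulative CPU ns at the previous scrape
        var cur uint64 = 8_500_000_000  // counter went backwards (restart / bad sample)
        window := 15.0                  // seconds between scrapes

        delta := cur - prev // unsigned subtraction wraps instead of going negative
        millicores := float64(delta) / 1e6 / window // ns -> CPU-ms per second of window

        fmt.Printf("reported usage: %.0f millicores\n", millicores)
    }

The bad-sample case from the Kubelet/cAdvisor side looks similar: any window where the reported cumulative value jumps by more than the window can actually hold produces a rate the node can't physically sustain.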