Bug 1475034 - Metrics chart reporting 74000 Millicores for an app running on a node with only 8 cores
Status: CLOSED INSUFFICIENT_DATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Metrics
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.3.1
Assigned To: Solly Ross
QA Contact: Junqi Zhao
Docs Contact:
Depends On:
Blocks:
Reported: 2017-07-25 17:51 EDT by Eric Jones
Modified: 2017-11-03 09:43 EDT
CC: 6 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-03 09:43:28 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Eric Jones 2017-07-25 17:51:43 EDT
Description of problem:
An application with several replicas that had been running fine suddenly has metrics reporting significantly more CPU than is possible: the node has 8 cores (8,000 millicores), but the app was reported at 74,000 millicores.


Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.3.1.11

Additional info:
Attaching files shortly
Comment 2 Matt Wringe 2017-07-26 13:50:01 EDT
@sross: it looks like Heapster is using 15s for its interval, and I believe at this interval we can sometimes get strange CPU usage results back. Is this something we have seen before, i.e. a very large CPU spike that is clearly nonsense?
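
For reference, one way to confirm the interval Heapster is actually running with (the object name and namespace here are assumptions based on a stock openshift-infra metrics deployment, so adjust as needed):

    # Inspect the Heapster replication controller and look for the resolution
    # flag it was started with.
    oc get rc heapster -n openshift-infra -o yaml | grep -i resolution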
Comment 3 Solly Ross 2017-07-28 15:24:29 EDT
those logs do not look like a healthy Heapster :-/

I'd try switching to an interval of 30s, as well as checking what the summary endpoint says and what happens if you switch to the summary source (`--source=kubernetes.summary_api:...` instead of `--source=kubernetes:...`).
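
Roughly, those two checks would look like this (a sketch only: the node name, port, token handling, and source query parameters are assumptions and will need adjusting for this cluster):

    # Ask the kubelet's summary endpoint directly and compare its CPU numbers
    # with what Heapster/Hawkular report for the same pod.
    curl -sk -H "Authorization: Bearer $TOKEN" https://<node>:10250/stats/summary

    # Heapster pointed at the summary source with a 30s resolution.
    heapster \
      --source=kubernetes.summary_api:https://kubernetes.default.svc?useServiceAccount=true \
      --metric_resolution=30s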

We've seen spikes like that due to bad (non-monotonically increasing) CPU metrics and overflow, or occasionally due to bad metrics coming from Kubelet/cAdvisor, but I thought we'd fixed most of those issues.
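
To make the overflow point concrete, a purely hypothetical example (the numbers are illustrative, not taken from this cluster): if a cumulative CPU usage counter in nanoseconds steps backwards by 1,000,000 ns between two scrapes and the delta is computed as an unsigned 64-bit value, it wraps around to roughly 1.8e19 ns of "usage":

    # Apparent cores over a 15 s window after an unsigned 64-bit wraparound:
    # about 1.2 billion cores, i.e. nonsense rather than real usage.
    echo '(2^64 - 1000000) / (15 * 10^9)' | bc

What actually shows up in a chart then depends on how that bad sample gets aggregated downstream, but the mechanism is the same.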
