1391996 – Openshift Metrics Heapster pod restarting when Openshift metrics configured to monitor many pods ( in this specific case 15k )

Keywords:
Status:	CLOSED DUPLICATE of bug 1465532
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Hawkular
Sub Component:
Version:	3.3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.7.0
Assignee:	Matt Wringe
QA Contact:	Peng Li
Docs Contact:
URL:
Whiteboard:	aos-scalability-34
Depends On:
Blocks:

Reported:	2016-11-04 14:47 UTC by Elvir Kuric
Modified:	2017-08-04 15:47 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-08-04 15:47:56 UTC
Target Upstream Version:
Embargoed:

Attachments:
metrics pods logs (2.55 MB, application/x-7z-compressed), 2016-11-04 14:47 UTC, Elvir Kuric

Description Elvir Kuric 2016-11-04 14:47:25 UTC

Created attachment 1217416
metrics pods logs

Description of problem:

Running oc get pods in the openshift-infra project gives the following output:

# oc get pods
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-mp5gn   1/1       Running     0          2d
hawkular-cassandra-2-z8rl0   1/1       Running     2          2d
hawkular-metrics-2z5so       1/1       Running     0          2d
hawkular-metrics-5srpo       1/1       Running     0          2d
heapster-0npf8               1/1       Running     18         2d
metrics-deployer-op83i       0/1       Completed   0          3d


The output shows that the heapster pod restarted 18 times in 2 days, for an unknown reason.
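
To spot this kind of restart loop without reading the table by eye, the listing can be filtered on the RESTARTS column. The following is only a minimal sketch: flag_restarts is a hypothetical helper name, not part of oc, and the threshold of 5 used below is an arbitrary assumption.

```shell
#!/bin/sh
# flag_restarts THRESHOLD (hypothetical helper, not an oc subcommand):
# read `oc get pods` table output on stdin and print "NAME RESTARTS"
# for every pod whose restart count exceeds THRESHOLD.
flag_restarts() {
    awk -v limit="$1" '
        NR == 1 {
            # Locate the RESTARTS column from the header row.
            for (i = 1; i <= NF; i++) if ($i == "RESTARTS") col = i
            next
        }
        # Compare numerically; skip everything if no RESTARTS column was found.
        col && ($col + 0) > (limit + 0) { print $1, $col }
    '
}
```

Usage: oc get pods -n openshift-infra | flag_restarts 5
Applied to the listing above, this prints only "heapster-0npf8 18".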


Version-Release number of selected component (if applicable):
OpenShift:
# rpm -qa | grep atomic
atomic-openshift-dockerregistry-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-pod-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-clients-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-node-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-tests-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-clients-redistributable-3.3.1.1-1.git.0.629a1d8.el7.x86_64
tuned-profiles-atomic-openshift-node-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-master-3.3.1.1-1.git.0.629a1d8.el7.x86_64
tuned-profiles-atomic-2.7.1-3.el7.noarch
atomic-openshift-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-sdn-ovs-3.3.1.1-1.git.0.629a1d8.el7.x86_64

The metrics images are version 3.3.

How reproducible:

I saw this issue when OpenShift metrics was configured to monitor 15k pods across 220 nodes.


Actual results:
The heapster pod restarts repeatedly.

Expected results:


Additional info:
Log files for heapster, hawkular, and cassandra are attached to this BZ.

Comment 3 Stefan Negrea 2017-08-04 15:47:56 UTC


*** This bug has been marked as a duplicate of bug 1465532 ***