Bug 1391996

Summary: OpenShift Metrics Heapster pod restarts when OpenShift Metrics is configured to monitor many pods (15k in this specific case)
Product: OpenShift Container Platform Reporter: Elvir Kuric <ekuric>
Component: Hawkular Assignee: Matt Wringe <mwringe>
Status: CLOSED DUPLICATE QA Contact: Peng Li <penli>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.3.1 CC: aos-bugs, jeder, pweil, snegrea, tstclair
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: aos-scalability-34
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-04 15:47:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
metrics pods logs none

Description Elvir Kuric 2016-11-04 14:47:25 UTC
Created attachment 1217416
metrics pods logs

Description of problem:

Running oc get pods in the openshift-infra project gives the following output:

# oc get pods
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-mp5gn   1/1       Running     0          2d
hawkular-cassandra-2-z8rl0   1/1       Running     2          2d
hawkular-metrics-2z5so       1/1       Running     0          2d
hawkular-metrics-5srpo       1/1       Running     0          2d
heapster-0npf8               1/1       Running     18         2d
metrics-deployer-op83i       0/1       Completed   0          3d


This output shows that the heapster pod was restarted 18 times in 2 days, for an unknown reason.
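The restart counts in the listing above can also be checked programmatically. The following is a minimal sketch, not part of the original report: it parses text in the column layout shown above (NAME, READY, STATUS, RESTARTS, AGE) and reports pods whose restart count exceeds a threshold. The function name pods_over_restart_threshold and the threshold of 5 are illustrative assumptions, and the parser assumes whitespace-separated columns with a plain integer in the RESTARTS column.

```python
def pods_over_restart_threshold(table, threshold=5):
    """Return (pod name, restart count) pairs whose RESTARTS value exceeds threshold.

    Assumes the whitespace-separated column layout shown in this report;
    this is an illustrative helper, not an OpenShift API.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_col = header.index("NAME")
    restarts_col = header.index("RESTARTS")
    flagged = []
    for line in lines[1:]:
        fields = line.split()
        restarts = int(fields[restarts_col])
        if restarts > threshold:
            flagged.append((fields[name_col], restarts))
    return flagged


# Pod listing copied from the description of this bug.
SAMPLE = """\
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-mp5gn   1/1       Running     0          2d
hawkular-cassandra-2-z8rl0   1/1       Running     2          2d
hawkular-metrics-2z5so       1/1       Running     0          2d
hawkular-metrics-5srpo       1/1       Running     0          2d
heapster-0npf8               1/1       Running     18         2d
metrics-deployer-op83i       0/1       Completed   0          3d
"""

for name, restarts in pods_over_restart_threshold(SAMPLE):
    print(f"{name}: {restarts} restarts")
```

With the listing from this report and the assumed threshold of 5, only heapster-0npf8 is flagged; hawkular-cassandra-2-z8rl0, with 2 restarts, stays below the threshold.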


Version-Release number of selected component (if applicable):
OpenShift:
# rpm -qa | grep atomic
atomic-openshift-dockerregistry-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-pod-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-clients-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-node-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-tests-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-clients-redistributable-3.3.1.1-1.git.0.629a1d8.el7.x86_64
tuned-profiles-atomic-openshift-node-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-master-3.3.1.1-1.git.0.629a1d8.el7.x86_64
tuned-profiles-atomic-2.7.1-3.el7.noarch
atomic-openshift-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-sdn-ovs-3.3.1.1-1.git.0.629a1d8.el7.x86_64

and metrics images v3.3

How reproducible:

I saw this issue when OpenShift Metrics was configured to monitor 15k pods across 220 nodes.


Actual results:
The heapster pod restarts repeatedly (18 restarts within 2 days).

Expected results:


Additional info:
log files for heapster / hawkular / cassandra attached to BZ

Comment 3 Stefan Negrea 2017-08-04 15:47:56 UTC

*** This bug has been marked as a duplicate of bug 1465532 ***