Bug 1391996 - Openshift Metrics Heapster pod restarting when Openshift metrics configured to monitor many pods (in this specific case 15k)
Keywords:
Status: CLOSED DUPLICATE of bug 1465532
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Matt Wringe
QA Contact: Peng Li
URL:
Whiteboard: aos-scalability-34
Depends On:
Blocks:
Reported: 2016-11-04 14:47 UTC by Elvir Kuric
Modified: 2017-08-04 15:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-04 15:47:56 UTC
Target Upstream Version:
Embargoed:


Attachments
metrics pods logs (2.55 MB, application/x-7z-compressed)
2016-11-04 14:47 UTC, Elvir Kuric

Description Elvir Kuric 2016-11-04 14:47:25 UTC
Created attachment 1217416
metrics pods logs

Description of problem:

Running oc get pods in the openshift-infra project gives the following output:

# oc get pods
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-mp5gn   1/1       Running     0          2d
hawkular-cassandra-2-z8rl0   1/1       Running     2          2d
hawkular-metrics-2z5so       1/1       Running     0          2d
hawkular-metrics-5srpo       1/1       Running     0          2d
heapster-0npf8               1/1       Running     18         2d
metrics-deployer-op83i       0/1       Completed   0          3d


This shows that the heapster pod was restarted 18 times in 2 days, for an unknown reason.
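To make the restart counts easier to spot, the pod listing above can be filtered for pods with a non-zero RESTARTS value. The following is a minimal sketch, not part of the original report: it embeds the listing from this bug as sample input, and the variable names pods_output and restarted are illustrative.

```shell
#!/bin/sh
# Sample input: the `oc get pods` listing copied from this report.
pods_output='NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-mp5gn   1/1       Running     0          2d
hawkular-cassandra-2-z8rl0   1/1       Running     2          2d
hawkular-metrics-2z5so       1/1       Running     0          2d
hawkular-metrics-5srpo       1/1       Running     0          2d
heapster-0npf8               1/1       Running     18         2d
metrics-deployer-op83i       0/1       Completed   0          3d'

# Skip the header row (NR > 1) and print name and restart count
# for every pod whose RESTARTS column (field 4) is greater than zero.
restarted=$(printf '%s\n' "$pods_output" | awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }')

printf '%s\n' "$restarted"
```

On a live cluster the same filter could presumably be fed from oc get pods -n openshift-infra instead of the embedded sample. The usual next step for finding the last termination reason would be oc describe pod and oc logs --previous on the heapster pod; this report does not state what those showed.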


Version-Release number of selected component (if applicable):
OpenShift:
# rpm -qa | grep atomic
atomic-openshift-dockerregistry-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-pod-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-clients-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-node-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-tests-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-clients-redistributable-3.3.1.1-1.git.0.629a1d8.el7.x86_64
tuned-profiles-atomic-openshift-node-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-master-3.3.1.1-1.git.0.629a1d8.el7.x86_64
tuned-profiles-atomic-2.7.1-3.el7.noarch
atomic-openshift-3.3.1.1-1.git.0.629a1d8.el7.x86_64
atomic-openshift-sdn-ovs-3.3.1.1-1.git.0.629a1d8.el7.x86_64

The metrics images are v3.3.

How reproducible:

I saw this issue when OpenShift metrics was configured to monitor 15k pods across 220 nodes.


Actual results:
The heapster pod restarts repeatedly (18 restarts in 2 days).

Expected results:


Additional info:
Log files for the heapster, hawkular, and cassandra pods are attached to this BZ.

Comment 3 Stefan Negrea 2017-08-04 15:47:56 UTC

*** This bug has been marked as a duplicate of bug 1465532 ***

