Bug 1318726 - Deploying a new pod after metrics is running stops metrics collection [NEEDINFO]
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Wesley Hearn
QA Contact: chunchen
Depends On:
Blocks: OSOPS_V3
Reported: 2016-03-17 11:30 EDT by Wesley Hearn
Modified: 2016-09-29 22:16 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-12 12:33:23 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
whearn: needinfo? (gshipley)


Attachments
log from the heapster and hawkular pods. (11.79 KB, text/plain)
2016-03-17 11:30 EDT, Wesley Hearn
heapster log with verbose logging (1.88 MB, text/plain)
2016-03-17 15:25 EDT, Wesley Hearn


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2016:1064
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update
Last Updated: 2016-05-12 16:19:17 EDT

Description Wesley Hearn 2016-03-17 11:30:59 EDT
Created attachment 1137422
log from the heapster and hawkular pods.

Description of problem:
Metrics stop working after deploying a new pod

Version-Release number of selected component (if applicable):
3.1.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy a 3.1.1.6 cluster with 3.1.1 metrics
2. Deploy a pod
3. Deploy metrics
4. Notice metrics are working
5. Deploy a new pod

Actual results:
Metrics stop working

Expected results:
Metrics to keep working

Additional info:
Comment 1 Matt Wringe 2016-03-17 11:32:48 EDT
Is this related to a specific pod, or will any pod reproduce this issue?
Comment 2 Wesley Hearn 2016-03-17 11:33:52 EDT
Any pod will reproduce it from what I have seen.
Comment 3 Matt Wringe 2016-03-17 15:02:15 EDT
I cannot reproduce this. I used OSE v3.1.1.6 with the metrics components at version 3.1.1 and deployed the hello-openshift pod as a test pod.

Is there anything else you can tell us about your setup? Knowing which deployment options you used when deploying the metrics components may help.

The logs don't show any errors that would explain this.

@chunchen are you able to reproduce this?
Comment 4 Wesley Hearn 2016-03-17 15:25 EDT
Created attachment 1137485
heapster log with verbose logging
Comment 7 Xia Zhao 2016-03-18 06:56:57 EDT
Progress update: I have the OSE 3.1 environment installed and the logging components deployed there, but I couldn't get the fluentd pod running. I need some more time to adjust the environment before trying to reproduce this issue.

Thanks,
Xia
Comment 8 Matt Wringe 2016-03-18 09:04:52 EDT
@Xia this has nothing to do with logging components. Why are you trying to test with logging here?
Comment 9 Xia Zhao 2016-03-20 22:55:39 EDT
@mwringe, oops, my mistake. I mixed things up when setting up the environments earlier. I will switch to deploying the metrics components on OSE 3.1 there.
Comment 10 Xia Zhao 2016-03-21 05:23:39 EDT
@mwringe, I could not reproduce it. Here are my test steps:
1. Deployed metrics stacks in metrics namespace on OSE 3.1.1.6 --> worked fine
2. Deployed logging stacks in logging namespace --> Metrics are available for logging stacks
3. Deployed new pods into metrics namespace:
oc new-app --docker-image=docker.io/chunyunchen/java-mainclass:2.2.94-SNAPSHOT -n metrics
--> Metrics are available for the java pod inside metrics namespace

Tested with latest images pulled from brew registry:
openshift3/metrics-deployer    d3b5bd02c6ad
openshift3/metrics-hawkular-metrics    0d825e62d05a
openshift3/metrics-heapster    9a6aa3a55a44
openshift3/metrics-cassandra    2f9af4d01e97
Comment 13 Matt Wringe 2016-03-21 10:09:24 EDT
Just to keep track of a few observations here:

1) It doesn't necessarily look like deploying a new pod causes metrics collection to stop completely; it looks more like an intermittent failure to gather metrics.

2) Even when it's functioning, the graphs are not right. The most recent values are all zero or near zero even when we are getting metrics. This appears to be caused by a time synchronization issue.
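
A rough way to tell these two cases apart, sketched in Python. This assumes the samples for one metric have already been exported as (timestamp, value) pairs; the function names, the sample data, and the 10-second interval are all hypothetical and not taken from heapster or Hawkular.

```python
from typing import List, Tuple

# (unix timestamp in seconds, metric value); a hypothetical export format
Sample = Tuple[float, float]


def find_gaps(samples: List[Sample], interval: float,
              tolerance: float = 1.5) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of consecutive samples that are
    further apart than tolerance * interval, i.e. missed collections."""
    ordered = sorted(samples)
    gaps = []
    for (t_prev, _), (t_next, _) in zip(ordered, ordered[1:]):
        if t_next - t_prev > tolerance * interval:
            gaps.append((t_prev, t_next))
    return gaps


def trailing_near_zero(samples: List[Sample], count: int = 3,
                       threshold: float = 0.01) -> bool:
    """Return True if the most recent `count` samples are all at or
    below `threshold` in absolute value."""
    tail = sorted(samples)[-count:]
    return len(tail) == count and all(abs(v) <= threshold for _, v in tail)


# Made-up data: one missed stretch between t=20 and t=60,
# then values that drop to zero at the end of the series.
samples = [(0, 120.0), (10, 118.0), (20, 121.0),
           (60, 119.0), (70, 0.0), (80, 0.0), (90, 0.0)]

gaps = find_gaps(samples, interval=10)
recent_zero = trailing_near_zero(samples)
```

Gaps in the series would point at intermittent collection failures (observation 1), while a gap-free series whose latest values are near zero would be more consistent with the time synchronization problem (observation 2).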
Comment 14 Xia Zhao 2016-03-21 23:17:55 EDT
@mwringe, by saying the issue reproduced, I meant that I saw metrics stop working properly after a new pod was deployed to the same project. And you are right: the router in the default namespace went to Pending because the node label region=primary had somehow gone missing. I'm working on fixing this and will see how metrics behave afterwards.
BTW, I saw good metrics charts with exact stats displayed on the web console before deploying the camel-spring pod into the metrics project, and the metrics service URL https://hawkular-metrics.0318-gtf.qe.rhcloud.com/hawkular/metrics was also accessible at that time.
Comment 15 Xia Zhao 2016-03-22 01:41:56 EDT
@mwringe, after fixing the router issue, the metrics components kept working fine in my OSE 3.1 environment. The CPU and memory stats are visible in the web console UI. So the original issue, "metrics stopped working after new pod is deployed", is still not reproduced.
Comment 18 Matt Wringe 2016-03-22 15:42:35 EDT
@jdyson: any ideas about what might be causing this? I can't tell whether something is wrong with heapster or with something else in the system.
Comment 24 Xia Zhao 2016-04-10 21:55:27 EDT
Verified this as not reproduced according to my comment in #15.
Comment 26 errata-xmlrpc 2016-05-12 12:33:23 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064
