Created attachment 1137422 [details]
log from the heapster and hawkular pods.

Description of problem:
Metrics stop working after deploying a new pod

Version-Release number of selected component (if applicable):
3.1.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy a 3.1.1.6 cluster with 3.1.1 metrics
2. Deploy a pod
3. Deploy metrics
4. Notice metrics are working
5. Deploy a new pod

Actual results:
Metrics stop working

Expected results:
Metrics to keep working

Additional info:
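For anyone trying to reproduce this, the steps above can be sketched roughly as follows. This is an assumption-laden sketch, not the reporter's exact commands: the hostname, project, and image are placeholders, and the metrics deployment is abbreviated to the standard OSE 3.1 metrics-deployer flow.

```shell
# Hypothetical reproduction sketch (placeholders, not from the report).

# Deploy the metrics stack via the 3.1 metrics deployer template
# (flow abbreviated; see the OSE 3.1 metrics docs for the full setup).
oc project openshift-infra
oc process metrics-deployer-template \
    -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com \
    | oc create -f -

# Confirm metrics are working (CPU/memory graphs on the pod page),
# then deploy any new pod; per the report, metrics then stop.
oc new-app --docker-image=docker.io/openshift/hello-openshift
```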
Is this related to any pod? Or will any pod reproduce this issue?
Any pod will reproduce it from what I have seen.
I cannot reproduce. Using OSE v3.1.1.6 and metrics components at version 3.1.1, I deployed the hello-openshift pod as a test pod. Is there anything else you can tell us about your setup? The deployment options you used when deploying the metrics components may help. The logs don't indicate any errors that would point to a cause. @chunchen, are you able to reproduce this?
Created attachment 1137485 [details]
heapster log with verbose logging
Progress update: I've got the OSE 3.1 env installed and the logging components deployed there, but I couldn't get the fluentd pod running. I need some more time to adjust the environment and then try to reproduce this issue. Thanks, Xia
@Xia this has nothing to do with logging components. Why are you trying to test with logging here?
@mwringe, Oops, my mistake. I mixed things up when setting up environments earlier. I will switch to deploying the metrics components on OSE 3.1 there.
@mwringe, I didn't get it reproduced. Here are my test steps:
1. Deployed the metrics stack in the "metrics" namespace on OSE 3.1.1.6 --> worked fine
2. Deployed the logging stack in the "logging" namespace --> metrics are available for the logging stack
3. Deployed a new pod into the "metrics" namespace:
   oc new-app --docker-image=docker.io/chunyunchen/java-mainclass:2.2.94-SNAPSHOT -n metrics
   --> metrics are available for the java pod inside the "metrics" namespace

Tested with the latest images pulled from the brew registry:
openshift3/metrics-deployer         d3b5bd02c6ad
openshift3/metrics-hawkular-metrics 0d825e62d05a
openshift3/metrics-heapster         9a6aa3a55a44
openshift3/metrics-cassandra        2f9af4d01e97
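Beyond the console graphs, one way to double-check whether metrics are actually being written is to query the Hawkular Metrics REST API on the exposed route. A sketch, with assumptions: the route hostname is a placeholder, and the Hawkular-Tenant header is set to the project name, as the OpenShift metrics integration does.

```shell
# Query Hawkular Metrics directly for metric definitions in a project.
# Hostname is a placeholder; the tenant is the project (namespace) name.
TOKEN=$(oc whoami -t)
curl -k \
  -H "Authorization: Bearer $TOKEN" \
  -H "Hawkular-Tenant: metrics" \
  "https://hawkular-metrics.example.com/hawkular/metrics/metrics"
```

If new datapoints stop appearing here after a pod is deployed, the problem is on the collection (heapster) side rather than in the console UI.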
Just to keep track of a few observations here:
1) It doesn't necessarily look like deploying a new pod causes metrics collection to stop completely; it looks more like an intermittent failure to gather metrics.
2) Even when it is functioning, the graphs are not right. The most recent values are all zero or near zero even when we are getting metrics. This appears to be caused by a time-synchronization issue.
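The near-zero "most recent" datapoints in observation 2 are consistent with clock skew between hosts. A quick sanity check, sketched here with placeholder node names, is to compare clocks and NTP sync state across the cluster:

```shell
# Compare UTC epoch time and NTP state on each host (names are
# placeholders); more than a few seconds of divergence will skew
# the "latest" buckets in the metrics graphs.
for node in master.example.com node1.example.com node2.example.com; do
  echo "== $node =="
  ssh "$node" 'date -u +%s; ntpstat || true'
done
```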
@mwringe, by saying the issue was reproduced, I meant that I saw metrics stop working after a new pod was deployed to the same project. And you are right: the router in the default namespace went to Pending because the node label region=primary was somehow missing. I'm working on fixing this and will see how metrics behave afterwards. BTW, I saw correct metrics charts with exact stats displayed on the web console before deploying the camel-spring pod into the metrics project, and the metrics service URL https://hawkular-metrics.0318-gtf.qe.rhcloud.com/hawkular/metrics was also accessible at that time.
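The Pending router described above is typically a scheduling problem: the router deployment's nodeSelector (region=primary) matches no node. A sketch of the fix, with a placeholder node name:

```shell
# Re-apply the missing label so the router's nodeSelector matches a node.
oc label node node1.example.com region=primary

# Confirm the router pod gets scheduled and becomes Running.
oc get pods -n default -o wide
```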
@mwringe, after fixing the router issue, the metrics components continued working fine in my OSE 3.1 environment. The CPU and memory stats are visible in the web console UI. So the original issue, "metrics stopped working after a new pod is deployed", is still not reproduced.
@jdyson: any ideas about what might be causing this? I can't tell whether something is wrong with heapster or with something else in the system.
Verified this as not reproduced according to my comment in #15.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1064
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days