Bug 1696249

Summary:	pod metrics are available on CLI but not available in the UI
Product:	OpenShift Container Platform	Reporter:	Shivkumar Ople <sople>
Component:	Hawkular	Assignee:	Jan Martiska <jmartisk>
Status:	CLOSED ERRATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	3.11.0	CC:	aos-bugs, jmartisk, vlaad
Target Milestone:	---
Target Release:	3.11.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-06-26 09:08:06 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Shivkumar Ople 2019-04-04 12:14:46 UTC

Description of problem:

All three pods in the openshift-infra project are running, but pod metrics are not visible in the UI. From the CLI, metrics are available when checked with the "oc adm top" command.

Version-Release number of selected component (if applicable):

How reproducible:
always

Steps to Reproduce:
1. Install metrics in OCP
2. Check whether the liveness and readiness probes are failing and whether the metrics are visible in the UI

Actual results:
Probes are failing and metrics not available in UI

Expected results:

Probes should not fail and metrics should be available on UI

Additional info:

The project events are showing that readiness and liveness probes are failing but the scripts for liveness and readiness probes inside the pods seem to be executed successfully.
(ex: /opt/hawkular/scripts/hawkular-metrics-liveness.py)

The following events were captured from the openshift-infra project:

11:18:00 AM hawkular-metrics-xr4g8 Pod Warning Unhealthy Readiness probe failed: The MetricService is not yet in the STARTED state [STARTING]. We need to wait until its in the STARTED state.
11:17:45 AM hawkular-metrics-xr4g8 Pod Warning Unhealthy Readiness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>. This may be due to Hawkular Metrics not being ready yet. Will try again.
2 times in the last 4 minutes
11:17:38 AM hawkular-cassandra-1-4ml4c Pod Warning Unhealthy Readiness probe failed: Could not get the Cassandra status. This may mean that the Cassandra instance is not up yet. Will try again nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
11:17:38 AM hawkular-metrics-xr4g8 Pod Warning Unhealthy Liveness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>. Traceback (most recent call last): File "/opt/hawkular/scripts/hawkular-metrics-liveness.py", line 48, in <module> if int(uptime) < int(timeout): ValueError: invalid literal for int() with base 10: ''
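
The traceback in the last event shows int() being called on an empty string, which is what one would expect if no uptime value could be read while the status endpoint was refusing connections. A minimal sketch of a more defensive conversion follows. Both function names are hypothetical, and the rule "an unreadable uptime counts as still starting" is an assumption; none of this is taken from hawkular-metrics-liveness.py beyond the comparison shown in the traceback.

```python
def parse_uptime(raw):
    """Return the uptime as an int, or None when no usable value was read.

    Hypothetical helper for illustration; not code from the real probe script.
    """
    text = (raw or "").strip()
    try:
        return int(text)
    except ValueError:
        # int('') raises ValueError, which is the failure seen in the event.
        return None


def within_startup_grace(raw_uptime, timeout):
    """Return True while the process is still inside its startup grace period."""
    uptime = parse_uptime(raw_uptime)
    if uptime is None:
        # Assumption: an unreadable uptime is treated as "still starting".
        return True
    return uptime < int(timeout)
```

With this shape, an empty uptime no longer ends the probe with an unhandled exception; whether it should count as a pass or a failure is a policy decision for the probe author.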

Comment 1 Jan Martiska 2019-04-05 06:04:52 UTC

Are we sure the hawkular-metrics and hawkular-cassandra pods did not transition to successfully running later? These probe failures are normal during startup of the pods. Once the pods start up, these warnings should stop occurring. I see these are all within 22 seconds; it would only be a concern if they continued to appear for more than a few minutes. Does "oc get pod -n openshift-infra" reveal that the pods are in Ready state?

Comment 2 Shivkumar Ople 2019-04-05 12:25:22 UTC

Hello,


Are we sure the hawkular-metrics and hawkular-cassandra pods did not transition to successfully running later?

-- Those were already running (probes are failing after successful pod start).

Does "oc get pod -n openshift-infra" reveal that the pods are in Ready state?

-- Yes, pods from the openshift-infra project are in running state.

Thank you!

Comment 3 Jan Martiska 2019-04-08 12:13:03 UTC

Then I don't understand - if the probes are failing after successful pod start, how can the pods be in running state? Do you mean that the readiness probe is failing, so they are Running, but not Ready? Could they share the output of "oc get pods -n openshift-infra" and perhaps "oc get events -n openshift-infra" so I can see what the exact status of the pods is? I need to see if the pods are getting restarted periodically, which would happen when a probe reaches its failure threshold.
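
The difference between Running and Ready, and the restart count, can also be read from the JSON form of the pod list. A rough sketch, assuming the output of "oc get pods -n openshift-infra -o json" has already been parsed into a dict; summarize_pods is a hypothetical helper name, while the fields it reads (status.phase, status.conditions, status.containerStatuses[].restartCount) are those of the Kubernetes Pod object.

```python
def summarize_pods(pod_list):
    """Yield (name, phase, ready, restarts) for each pod in a parsed pod list."""
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # A pod is Ready only if its "Ready" condition has status "True".
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        # Restarts are counted per container; sum them for the whole pod.
        restarts = sum(
            cs.get("restartCount", 0)
            for cs in status.get("containerStatuses", [])
        )
        yield pod["metadata"]["name"], status.get("phase"), ready, restarts
```

A pod whose readiness probe keeps failing would show phase "Running" with ready False; repeated liveness failures would show up as a growing restart count.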

Comment 4 Shivkumar Ople 2019-04-08 12:50:45 UTC

Hello Jan,

Below are the results. oc get events is empty at the moment.

Here is the output:


# oc get pods -n openshift-infra
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-klx5p      1/1       Running     0          13d
hawkular-metrics-pkfz2          1/1       Running     0          13d
hawkular-metrics-schema-n7n8l   0/1       Completed   0          24d
heapster-b6pds                  1/1       Running     0          17d

# oc get events -n openshift-infra
No resources found.

Thank you

Comment 5 Jan Martiska 2019-04-08 13:18:55 UTC

From the "oc get pods" output it is apparent that the pods are running fine. If probes were failing, the pods would not be Running and Ready. There aren't any restarts either.

So if they are having trouble seeing the metrics in the console, could it be because the console is misconfigured? Is the console showing an error message such as "An error occurred getting metrics."? This happens when there is a value for metricsPublicURL but the metrics can't be found there. Do they have a metrics URL configured in the web console config?

oc get openshiftwebconsoleconfig instance -o json -n openshift-web-console

In the resulting JSON document, there should be a value at the path /spec/config/clusterInfo/metricsPublicURL, and that value should point at the hawkular-metrics route. The host of the route can be found by running "oc get route hawkular-metrics -n openshift-infra" (and https:// must be prepended to it).
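
The check described above can be sketched as follows, assuming the JSON from the command has been parsed into a dict and the route host has been read separately from the "oc get route" output. metrics_url_matches_route is a hypothetical helper name. The comparison only looks at the scheme and host, because the comment does not say what URL path, if any, follows the host.

```python
from urllib.parse import urlsplit


def metrics_url_matches_route(console_config, route_host):
    """Return True if metricsPublicURL is an https URL on the given route host."""
    url = (
        console_config.get("spec", {})
        .get("config", {})
        .get("clusterInfo", {})
        .get("metricsPublicURL")
    )
    if not url:
        # No metrics URL is configured at the expected path.
        return False
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname == route_host
```

A False result covers both a missing value and a value that points somewhere other than the hawkular-metrics route; printing the configured URL next to the route host shows which case applies.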

Another possible explanation for the error might be that Heapster is not collecting metrics about pods for some reason. Can they check the Heapster logs for errors? Heapster is a pod running in the openshift-infra namespace.

Comment 19 Jan Martiska 2019-04-23 06:35:35 UTC

PR: https://github.com/openshift/openshift-ansible/pull/11520

Comment 21 Junqi Zhao 2019-06-14 08:16:47 UTC

# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.11.119-1.git.0.c9a8ebf.el7.noarch
openshift-ansible-playbooks-3.11.119-1.git.0.c9a8ebf.el7.noarch
openshift-ansible-3.11.119-1.git.0.c9a8ebf.el7.noarch
openshift-ansible-roles-3.11.119-1.git.0.c9a8ebf.el7.noarch

With openshift_metrics_heapster_standalone set to false (with a lowercase f), the installation is successful.

openshift_metrics_heapster_standalone=false
openshift_metrics_install_metrics=true
openshift_metrics_cassandra_storage_type=dynamic

Comment 23 errata-xmlrpc 2019-06-26 09:08:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605