Bug 1695903

Summary: Could not monitor Elasticsearch with Prometheus with OCP 3.11
Product: OpenShift Container Platform Reporter: hgomes
Component: Logging Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: anpicker, aos-bugs, erooth, jcantril, mloibl, pkrupa, rmeggins, surbania
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The oauth-proxy was not passing the user's token.
Consequence: Elasticsearch did not have a token to evaluate whether a user could retrieve metrics.
Fix: Add the proper switch to the oauth-proxy.
Result: Users with the proper role can retrieve metrics.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-26 09:07:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description hgomes 2019-04-03 21:22:04 UTC
Description of problem:

This might not even be a bug, but I would like to understand whether it should proceed as an RFE instead.

Elasticsearch and Prometheus are installed with the standard installation of OpenShift. I expect that:
- Prometheus can access the Elasticsearch metrics
- Prometheus is configured to scrape them
- alerts are configured (e.g. the disk will be full within one week)
- dashboards are present in Grafana

For instance, the project https://github.com/justwatchcom/elasticsearch_exporter provides what we expect. We also expect this for Fluentd and Kibana.

From the customer's perspective:
~~~
I consider this a bug. What is your roadmap to fix this bug? We and our clients need the solution now. Can you provide us guidance and advice to work on this topic? How could we provide our solution upstream?
~~~


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Frederic Branczyk 2019-04-04 07:55:06 UTC
Each component is responsible for shipping its own monitoring, so I am reassigning this to the logging component. As far as I am aware, though, the team has shipped scraping and alerting for 4.1, but I'd prefer that they confirm that.

Comment 2 Jeff Cantrill 2019-04-12 19:29:46 UTC
Metrics are available in 3.11, though while investigating another issue I discovered that we are unable to pull them through our proxy because of a missing switch. I will use this bz to fix that. I believe there may be a second issue, however, which we corrected in 4.x: even if you provide the correct service account, certificate verification will fail unless you ignore who signed the certs, because logging creates its own certs and builds its own truststore.

To set up:

1. Define the service account in your inventory file (openshift_prometheus_namespace, openshift_logging_elasticsearch_prometheus_sa), which will be bound to this role: prometheus-metrics-viewer
2. Deploy logging using the 3.11 fix that will be associated with this bz
3. Retrieve metrics with: curl -k https://<logging-es-prometheus service>/_prometheus/metrics -H "Authorization: Bearer $sa_token"
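
For anyone scripting the check in step 3 rather than running curl by hand, the same request can be sketched in Python. This is only a minimal sketch under stated assumptions: the function names, the example host, and the SA_TOKEN environment variable are hypothetical, and certificate verification is disabled to mirror `curl -k` because logging signs its own certs.

```python
import os
import ssl
import urllib.request


def build_metrics_request(host: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Elasticsearch /_prometheus/metrics endpoint."""
    url = f"https://{host}/_prometheus/metrics"
    # The header name must be followed directly by the colon: "Authorization: Bearer <token>".
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_metrics(host: str, token: str, timeout: float = 10.0) -> str:
    """Fetch the metrics text, skipping certificate verification like `curl -k`."""
    context = ssl.create_default_context()
    # check_hostname has to be turned off before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_metrics_request(host, token)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical host and environment variable; substitute the real
    # logging-es-prometheus service address and the service account token.
    host = os.environ.get("ES_METRICS_HOST", "logging-es-prometheus.example.svc")
    print(fetch_metrics(host, os.environ["SA_TOKEN"]))
```

If the service account is not bound to prometheus-metrics-viewer, or the proxy fix is not deployed, the request would be expected to fail with an HTTP error rather than return metrics text.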

I defer to the monitoring team on how to configure Prometheus, as I'm unfamiliar with that end. The documentation we presently have regarding metrics is at [1].

[1] https://github.com/openshift/origin-aggregated-logging/blob/master/docs/metrics.md#elasticsearch

Comment 6 Anping Li 2019-06-13 07:01:20 UTC
The metrics can be fetched using the token of the service account system:serviceaccount:openshift-monitoring:prometheus-k8s. @Frederic, you are correct: to display the Elasticsearch metrics and apply the rules, you need to provide rules files to Prometheus. That happens automatically in 4.x.

Comment 8 Anping Li 2019-06-14 09:46:51 UTC
I'd like to close this bug, as Elasticsearch can expose the metrics via token. For further requirements, such as displaying the metrics in Prometheus, please work around it yourself or file an RFE bug.

Comment 10 errata-xmlrpc 2019-06-26 09:07:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605