Bug 1832125 - Couldn't find ES metrics in prometheus server: x509: certificate isn't valid for elasticsearch-metrics.openshift-logging.svc
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Periklis Tsirakidis
QA Contact: Qiaoling Tang
URL:
Whiteboard:
Depends On:
Blocks: 1833438
 
Reported: 2020-05-06 07:02 UTC by Qiaoling Tang
Modified: 2020-07-13 17:35 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The container port for metrics was not exposed in the proxy container. Consequence: Prometheus cannot reach ES through the proxy to scrape the metrics. Fix: Expose the metrics port in the proxy container. Result: Metrics can be scraped again.
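
A minimal sketch of the kind of pod-spec change this fix describes: declaring the metrics target port on the proxy container so that the service's named targetPort resolves to the proxy (which presents the elasticsearch-metrics serving certificate) rather than to the ES container. The container name and port number are assumptions for illustration, not the actual PR diff:

  # Hypothetical excerpt of the elasticsearch pod template (names/numbers assumed):
  containers:
  - name: proxy                # assumed name of the proxy sidecar
    ports:
    - name: restapi            # must match spec.ports[].targetPort of the elasticsearch-metrics service
      containerPort: 60000     # assumed; the scrape error in the description targets pod port 60000
      protocol: TCP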
Clone Of:
Environment:
Last Closed: 2020-07-13 17:35:12 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub: openshift elasticsearch-operator pull 354 (closed): "Bug 1832125: Expose es-metrics svc target port in proxy container" (last updated 2020-10-16 01:41:59 UTC)
- Red Hat Product Errata: RHBA-2020:2409 (last updated 2020-07-13 17:35:32 UTC)

Description Qiaoling Tang 2020-05-06 07:02:16 UTC
Description of problem:
Deploy logging 4.5, then log in to the Prometheus console and check the target status: all the ES targets are down with this error message:
Get https://10.128.2.51:60000/_prometheus/metrics: x509: certificate is valid for localhost, elasticsearch, elasticsearch.cluster.local, elasticsearch.openshift-logging.svc, elasticsearch.openshift-logging.svc.cluster.local, not elasticsearch-metrics.openshift-logging.svc
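
To confirm which names the certificate on that endpoint actually covers, the serving certificate can be dumped directly (a debugging sketch; the pod name and port are taken from the error above, and it assumes openssl is available in the image):

$ oc exec -n openshift-logging elasticsearch-cdm-3zthl4gs-1-7f49bbb8-zbxh2 -- \
    bash -c 'echo | openssl s_client -connect localhost:60000 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"'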

$ oc get servicemonitor  monitor-elasticsearch-cluster -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2020-05-06T06:34:53Z"
  generation: 1
  labels:
    cluster-name: elasticsearch
    scrape-metrics: enabled
 <--snip--->
  name: monitor-elasticsearch-cluster
  namespace: openshift-logging
  ownerReferences:
  - apiVersion: logging.openshift.io/v1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 91b9c83b-4e86-484b-a60d-7c0731923259
  resourceVersion: "166813"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/monitor-elasticsearch-cluster
  uid: c9c83a26-248a-4817-8f38-7c94cf31b6d7
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    path: /_prometheus/metrics
    port: elasticsearch
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      serverName: elasticsearch-metrics.openshift-logging.svc
  jobLabel: monitor-elasticsearch
  namespaceSelector:
    matchNames:
    - openshift-logging
  selector:
    matchLabels:
      cluster-name: elasticsearch
      scrape-metrics: enabled
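
Per spec.selector above, this ServiceMonitor matches any service in openshift-logging labeled cluster-name=elasticsearch and scrape-metrics=enabled, and Prometheus verifies the serving certificate of the matched service's endpoints against tlsConfig.serverName. To list the matched service(s), a standard label query:

$ oc get svc -n openshift-logging -l cluster-name=elasticsearch,scrape-metrics=enabled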

$ oc get svc elasticsearch-metrics -oyaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: elasticsearch-metrics
    service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1588723584
    service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1588723584
  creationTimestamp: "2020-05-06T06:34:39Z"
  labels:
    cluster-name: elasticsearch
    scrape-metrics: enabled
  managedFields:
 <--snip--->  
 
  name: elasticsearch-metrics
  namespace: openshift-logging
  ownerReferences:
  - apiVersion: logging.openshift.io/v1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 91b9c83b-4e86-484b-a60d-7c0731923259
  resourceVersion: "166512"
  selfLink: /api/v1/namespaces/openshift-logging/services/elasticsearch-metrics
  uid: 81985480-d8db-43cf-9093-dec67b01f7a7
spec:
  clusterIP: 172.30.7.71
  ports:
  - name: elasticsearch
    port: 60001
    protocol: TCP
    targetPort: restapi
  selector:
    cluster-name: elasticsearch
    es-node-client: "true"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
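
Note the named targetPort above: Kubernetes resolves `restapi` against the containerPort names declared by the pod's containers, so the certificate Prometheus sees is the one served by whichever container declares that name; per the fix description, the proxy container was not exposing it. How the name resolves can be checked against the endpoints and the pod spec (pod name copied from the description; the jsonpath query is a sketch):

$ oc get endpoints elasticsearch-metrics -n openshift-logging -o yaml
$ oc get pod elasticsearch-cdm-3zthl4gs-1-7f49bbb8-zbxh2 -n openshift-logging \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.ports[*].name}{"\n"}{end}'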


I can get ES metrics by running `oc exec elasticsearch-cdm-3zthl4gs-1-7f49bbb8-zbxh2 -- es_util --query=_prometheus/metrics`.
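
To reproduce what Prometheus sees (unlike the in-pod es_util query above, which does not verify the metrics service name), the metrics service can be scraped with the service CA; a sketch assuming curl is available in the image and a service account with permission to read the metrics:

$ TOKEN=$(oc sa get-token prometheus-k8s -n openshift-monitoring)
$ oc exec -n openshift-logging elasticsearch-cdm-3zthl4gs-1-7f49bbb8-zbxh2 -- \
    curl -s -H "Authorization: Bearer $TOKEN" \
    --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt \
    https://elasticsearch-metrics.openshift-logging.svc:60001/_prometheus/metrics

Before the fix this fails with the same x509 error, because the endpoint presents the ES certificate rather than the elasticsearch-metrics serving certificate.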


Version-Release number of selected component (if applicable):
logging images are from 4.5.0-0.ci-2020-05-05-220426
manifests are copied from the master branch
cluster version: 4.5.0-0.nightly-2020-05-04-113741 

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 4.5.
2. Log in to the Prometheus console, go to the Status --> Targets page, and check the `openshift-logging/monitor-elasticsearch-cluster` target.

Actual results:
The Prometheus server can't collect the ES cluster's metrics.

Expected results:
The ES metrics can be found in the Prometheus server.

Additional info:
All the fluentd targets are up, and the fluentd metrics can be found in the Prometheus server.

Comment 5 Qiaoling Tang 2020-05-15 06:47:28 UTC
Verified with images from 4.5.0-0.ci-2020-05-14-224329.

Comment 6 errata-xmlrpc 2020-07-13 17:35:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

