Bug 1832125

Summary: Couldn't find ES metrics in prometheus server: x509: certificate isn't valid for elasticsearch-metrics.openshift-logging.svc
Product: OpenShift Container Platform Reporter: Qiaoling Tang <qitang>
Component: Logging Assignee: Periklis Tsirakidis <periklis>
Status: CLOSED ERRATA QA Contact: Qiaoling Tang <qitang>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.5 CC: anli, aos-bugs, ewolinet, periklis
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The container port for metrics is not exposed in the proxy container. Consequence: Prometheus cannot reach Elasticsearch through the proxy to scrape the metrics. Fix: Expose the metrics port in the proxy container. Result: Metrics can be scraped again.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-13 17:35:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1833438    

Description Qiaoling Tang 2020-05-06 07:02:16 UTC
Description of problem:
Deploy logging 4.5, then log in to the Prometheus console and check the target status. All the ES targets are down with the following error message:
Get https://10.128.2.51:60000/_prometheus/metrics: x509: certificate is valid for localhost, elasticsearch, elasticsearch.cluster.local, elasticsearch.openshift-logging.svc, elasticsearch.openshift-logging.svc.cluster.local, not elasticsearch-metrics.openshift-logging.svc
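
For reference, the names actually covered by the certificate served on that endpoint can be inspected directly. This is only a sketch: it assumes a shell with network access to the pod IP and the openssl client available; the address and port are taken from the error message above.

$ openssl s_client -connect 10.128.2.51:60000 \
    -servername elasticsearch-metrics.openshift-logging.svc </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'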

$ oc get servicemonitor  monitor-elasticsearch-cluster -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2020-05-06T06:34:53Z"
  generation: 1
  labels:
    cluster-name: elasticsearch
    scrape-metrics: enabled
 <--snip--->
  name: monitor-elasticsearch-cluster
  namespace: openshift-logging
  ownerReferences:
  - apiVersion: logging.openshift.io/v1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 91b9c83b-4e86-484b-a60d-7c0731923259
  resourceVersion: "166813"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-logging/servicemonitors/monitor-elasticsearch-cluster
  uid: c9c83a26-248a-4817-8f38-7c94cf31b6d7
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    path: /_prometheus/metrics
    port: elasticsearch
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      serverName: elasticsearch-metrics.openshift-logging.svc
  jobLabel: monitor-elasticsearch
  namespaceSelector:
    matchNames:
    - openshift-logging
  selector:
    matchLabels:
      cluster-name: elasticsearch
      scrape-metrics: enabled

$ oc get svc elasticsearch-metrics -oyaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: elasticsearch-metrics
    service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1588723584
    service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1588723584
  creationTimestamp: "2020-05-06T06:34:39Z"
  labels:
    cluster-name: elasticsearch
    scrape-metrics: enabled
  managedFields:
 <--snip--->  
 
  name: elasticsearch-metrics
  namespace: openshift-logging
  ownerReferences:
  - apiVersion: logging.openshift.io/v1
    controller: true
    kind: Elasticsearch
    name: elasticsearch
    uid: 91b9c83b-4e86-484b-a60d-7c0731923259
  resourceVersion: "166512"
  selfLink: /api/v1/namespaces/openshift-logging/services/elasticsearch-metrics
  uid: 81985480-d8db-43cf-9093-dec67b01f7a7
spec:
  clusterIP: 172.30.7.71
  ports:
  - name: elasticsearch
    port: 60001
    protocol: TCP
    targetPort: restapi
  selector:
    cluster-name: elasticsearch
    es-node-client: "true"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}


I can get ES metrics by executing the command `oc exec elasticsearch-cdm-3zthl4gs-1-7f49bbb8-zbxh2 -- es_util --query=_prometheus/metrics`
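
If only the certificate name check is failing, scraping the same proxy port Prometheus uses, but with verification skipped, should still return the metrics. A rough illustration, assuming the proxy accepts a bearer token from a user with sufficient RBAC (as the ServiceMonitor's bearerTokenFile suggests) and that curl is available in the container:

$ oc exec elasticsearch-cdm-3zthl4gs-1-7f49bbb8-zbxh2 -- \
    curl -sk -H "Authorization: Bearer $(oc whoami -t)" \
    https://localhost:60000/_prometheus/metrics | head -n 5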


Version-Release number of selected component (if applicable):
logging images are from 4.5.0-0.ci-2020-05-05-220426
manifests are copied from the master branch
cluster version: 4.5.0-0.nightly-2020-05-04-113741 

How reproducible:
Always

Steps to Reproduce:
1. deploy logging 4.5
2. Log in to the Prometheus console, go to the Status --> Targets page, and check the `openshift-logging/monitor-elasticsearch-cluster` target.

Actual results:
The Prometheus server can't collect the ES cluster's metrics.

Expected results:
The ES metrics can be found in the Prometheus server.

Additional info:
All the fluentd targets are up and the fluentd metrics can be found in the Prometheus server.
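
Per the Doc Text above, the cause is that the proxy container does not expose a container port for metrics, so the elasticsearch-metrics service ends up pointing at an endpoint that presents the Elasticsearch certificate rather than a certificate valid for the elasticsearch-metrics service name. With a fixed build, the wiring can be spot-checked roughly as follows; the container name "proxy" is an assumption for illustration and is not taken from this report:

$ oc -n openshift-logging get pod elasticsearch-cdm-3zthl4gs-1-7f49bbb8-zbxh2 \
    -o jsonpath='{.spec.containers[?(@.name=="proxy")].ports}'
$ oc -n openshift-logging get svc elasticsearch-metrics -o jsonpath='{.spec.ports}'

The expectation is that the service's targetPort resolves to a port exposed by the proxy container rather than to the Elasticsearch REST port, so that the certificate presented matches the elasticsearch-metrics service name.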

Comment 5 Qiaoling Tang 2020-05-15 06:47:28 UTC
Verified with images from 4.5.0-0.ci-2020-05-14-224329.

Comment 6 errata-xmlrpc 2020-07-13 17:35:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409