Expose the 3.9 logging ES endpoints via Prometheus metrics: testing is blocked
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/93814bd617f633e613118b710b7fa33ff975c994
bug 1537857. Fix retrieving prometheus metrics

https://github.com/openshift/openshift-ansible/commit/fbdfa66f06abd8c026c1c14e292505c002d46dfe
Merge pull request #6903 from jcantrill/1537857_fix_logging_prometheus

bug 1537857. Fix retrieving prometheus metrics
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/058f366072505f39a5c5aad3faa98cf7e47eaddd
bug 1537857. Fix retrieving prometheus metrics

https://github.com/openshift/origin-aggregated-logging/commit/6c5c6a5ab12414b30d390e874b681616ff6f9aa6
Merge pull request #920 from jcantrill/1537857_fix_prometheus

Automatic merge from submit-queue.

bug 1537857. Fix retrieving prometheus metrics

This PR:
* bumps openshift-elasticsearch-plugin to fix retrieving metrics
* bumps the prometheus exporter to fix an issue related to SG and the plugin

Depends on:
* https://github.com/fabric8io/openshift-elasticsearch-plugin/pull/119
* https://github.com/fvvanholl/elasticsearch-prometheus-exporter/pull/82
* https://github.com/openshift/openshift-ansible/pull/6903
Need a new ES image to test; the current latest image is v3.9.0-0.38.0.0.
Tested with logging-elasticsearch:v3.9.0-0.39.0.0:

sh-4.2$ env | grep -i ver
OSE_ES_VER=2.4.4.21
RECOVER_EXPECTED_NODES=1
ES_VER=2.4.4
RECOVER_AFTER_NODES=1
PROMETHEUS_EXPORTER_VER=2.4.4.1
ES_CLOUD_K8S_VER=2.4.4.01
RECOVER_AFTER_TIME=5m
JAVA_VER=1.8.0

# oc get po -n logging -o wide | grep logging-es
logging-es-data-master-m69jkv20-1-v7csk       2/2   Running   0   2h   10.129.0.56   172.16.120.91
logging-es-ops-data-master-aymsaqly-1-lfz9c   2/2   Running   0   1h   10.129.0.57   172.16.120.91

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n openshift-metrics)" https://10.129.0.57:4443/_prometheus/metrics -I
HTTP/1.1 403 Forbidden
Set-Cookie: _oauth_proxy=; Path=/; Domain=10.129.0.57; Expires=Wed, 07 Feb 2018 07:04:49 GMT; HttpOnly; Secure
Date: Wed, 07 Feb 2018 08:04:49 GMT
Content-Type: text/html; charset=utf-8

Change back to MODIFIED.
Is 'prometheus' the same user identified as the metrics user for the proxy and the ES container? Sample from my local:

- name: PROMETHEUS_USER
  value: system:serviceaccount:logging:aggregated-logging-elasticsearch

proxy:
- -client-id=system:serviceaccount:logging:aggregated-logging-elasticsearch
- -basic-auth-password=41m0z4jgrIssHG7U

Please note the addition of the password to the proxy.

Looking at the build log for the image, it is missing the required plugin. It has openshift-elasticsearch-plugin-2.4.4.17__redhat_1-3.el7 (http://download-node-02.eng.bos.redhat.com/brewroot/work/tasks/5812/15225812/x86_64.log) and it needs openshift-elasticsearch-plugin-2.4.4.21.
See the attached ES DC file:

- name: PROMETHEUS_USER
  value: system:serviceaccount:prometheus:prometheus

- -client-id=system:serviceaccount:prometheus:prometheus

Since prometheus is deployed to openshift-metrics by default, the above settings should be system:serviceaccount:openshift-metrics:prometheus. It seems our code always sets the prometheus project name to 'prometheus'. Even after changing the above settings, I still get 403 Forbidden as in Comment 5.

openshift-elasticsearch-plugin-2.4.4.17__redhat_1-3.el7 is still in v3.9.0-0.41.0.0; will test this defect once the ES image packages openshift-elasticsearch-plugin-2.4.4.21.
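The hardcoded namespace above suggests the expected service-account name should be derived from wherever prometheus is actually deployed. A minimal shell sketch of that derivation (the variable names here are illustrative, not the actual ansible variables):

```shell
# Illustrative only: build the expected PROMETHEUS_USER / -client-id value
# from the namespace prometheus is deployed to, instead of hardcoding it.
PROMETHEUS_NAMESPACE="openshift-metrics"   # assumed default; any namespace works
PROMETHEUS_SA="prometheus"
PROMETHEUS_USER="system:serviceaccount:${PROMETHEUS_NAMESPACE}:${PROMETHEUS_SA}"
echo "${PROMETHEUS_USER}"
# → system:serviceaccount:openshift-metrics:prometheus
```

If the SA name is computed this way, deploying prometheus to a non-default namespace would still produce a matching -client-id and PROMETHEUS_USER.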
Created attachment 1393033 [details] es dc info
Using

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n openshift-metrics)" https://${pod_ip}:4443/_prometheus/metrics

the request is redirected to the "Sign in with an OpenShift account" page and the ES metrics are not shown, so it seems we still have authentication problems; see the attached file.

$ rpm -qa | grep elasticsearch-plugin
openshift-elasticsearch-plugin-2.4.4.21__redhat_1-1.el7.noarch

Image: logging-elasticsearch/images/v3.9.0-0.47.0.0
Created attachment 1399146 [details] issue is not fixed
Did you redeploy the image using openshift-ansible? The changes require the referenced version of the plugin, and the DC must include what is templated at https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_elasticsearch/templates/2.x/es.j2#L143, which is password auth between the oauth proxy and Elasticsearch.
I have been working on your environment, and one thing I noted is that you are using the defaults but the expected user is in the openshift-metrics namespace. The default is defined here [1]. If we intend to deploy metrics in the 'openshift-metrics' namespace, we should at minimum change the default. The ripple of changes here requires:

* Elasticsearch DC edit to modify PROMETHEUS_USER
* Elasticsearch DC edit to modify -client-id in the proxy
* Edit the logging-elasticsearch secret passwd.yml entry to have the correct username
* Grant the role the correct permissions, which is verifiable by [2] ('oc policy who-can' is currently broken)

I have been unable to modify your environment to fix the issue and recommend redeploying with the desired SA. The result is the following: metrics can only be retrieved with username/password when hitting Elasticsearch directly. You would need to use the username/password in the passwd.yml, which is not the desired way to access. The desired manner is to use a Bearer token and go through the proxy, via the service or directly to the pod.

[1] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_elasticsearch/defaults/main.yml#L45
[2] # curl -k -XPOST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" -H "Accept: application/json, */*" https://${SERVER}/apis/authorization.openshift.io/v1/namespaces/logging/localresourceaccessreviews -d '{"kind":"LocalResourceAccessReview","apiVersion":"authorization.openshift.io/v1","namespace":"","verb":"view","resourceAPIGroup":"metrics.openshift.io","resourceAPIVersion":"","resource":"prometheus","resourceName":"","path":"","isNonResourceURL":false,"content":null}'
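For reference, a sketch of what the first two DC edits listed above might look like when the SA lives in openshift-metrics. This is an illustrative fragment only, not a complete DeploymentConfig, and assumes the default SA name 'prometheus':

```yaml
# Elasticsearch container env (illustrative fragment):
- name: PROMETHEUS_USER
  value: system:serviceaccount:openshift-metrics:prometheus
# ...and in the oauth-proxy container args:
- -client-id=system:serviceaccount:openshift-metrics:prometheus
```

The passwd.yml username in the logging-elasticsearch secret would then need to match the same system:serviceaccount string.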
(In reply to Jeff Cantrill from comment #16)
> I have been working on your environment and one thing I did note is that
> you are using the defaults but the expected user is in the openshift-metrics
> namespace. The default is defined here [1]. If we intend to deploy metrics
> in the 'openshift-metrics' namespace, we should at minimum change the
> default.

Yes, as I mentioned in Comment 7, prometheus is deployed to openshift-metrics by default. We need to change our code, since the user can deploy prometheus to any preferred namespace. It seems openshift_prometheus_namespace did not get the correct namespace.

Moving it to Assigned.
Even with prometheus deployed to the 'prometheus' namespace, command [1] returns "system:serviceaccount:openshift-metrics:prometheus"; I guess openshift_prometheus_namespace did not get the correct namespace.

[1] $ curl -k -XPOST -H "Authorization: Bearer `oc whoami -t`" -H "Content-Type: application/json" -H "Accept: application/json, */*" https://host-8-244-77.host.centralci.eng.rdu2.redhat.com:8443/apis/authorization.openshift.io/v1/namespaces/logging/localresourceaccessreviews -d '{"kind":"LocalResourceAccessReview","apiVersion":"authorization.openshift.io/v1","namespace":"","verb":"view","resourceAPIGroup":"metrics.openshift.io","resourceAPIVersion":"","resource":"prometheus","resourceName":"","path":"","isNonResourceURL":false,"content":null}'
{
  "kind": "ResourceAccessReviewResponse",
  "apiVersion": "authorization.openshift.io/v1",
  "namespace": "logging",
  "users": [
    "admin",
    "juzhao",
    "system:admin",
    "system:serviceaccount:kube-system:clusterrole-aggregation-controller",
    "system:serviceaccount:openshift-metrics:prometheus"
  ],
  "groups": [
    "system:cluster-admins",
    "system:masters"
  ],
  "evalutionError": ""
}

[2] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_elasticsearch/defaults/main.yml#L45
passwd.yml and the ES DC have the same username and passwd:

# oc rsh logging-es-data-master-86bziich-1-wtpjr
Defaulting container name to elasticsearch.
Use 'oc describe pod/logging-es-data-master-86bziich-1-wtpjr -n logging' to see all of the containers in this pod.
sh-4.2$ cat /etc/elasticsearch/secret/passwd.yml
"system:serviceaccount:prometheus:prometheus":
  passwd: "aWppUENJV2tyTmpta3VXUA=="

# oc get dc logging-es-data-master-86bziich -o yaml | grep client-id -A 5
- -client-id=system:serviceaccount:prometheus:prometheus
- -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
- -cookie-secret=VllhWnlZa25qdnAzQ3NYaA==
- -basic-auth-password=aWppUENJV2tyTmpta3VXUA==
- -upstream=https://localhost:9200
- '-openshift-sar={"namespace": "logging", "verb": "view", "resource": "prometheus",
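Note that the -basic-auth-password value in the DC above is the literal base64 string from passwd.yml rather than a decoded value. Decoding that exact string locally (a quick check, assuming the value really is plain base64):

```shell
# Decode the base64 password value seen in passwd.yml / the proxy args above.
echo "aWppUENJV2tyTmpta3VXUA==" | base64 -d
# → ijiPCIWkrNjmkuWP
```

If the proxy expects the decoded password while the secret stores the encoded form (or vice versa), basic auth between the proxy and Elasticsearch would fail even though the two values "match" as strings.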
Additional PR to fix the ansible changes, where we try to keep the password so we don't redeploy the ES image: https://github.com/openshift/openshift-ansible/pull/7294
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/af9c8cd48ad7d74206385e2530221796f55e3096
bug 1537857. Additional logging proxy metrics fixes

https://github.com/openshift/openshift-ansible/commit/f123167eb3442e789d44b23fc9b5c92c73e444e3
Merge pull request #7294 from jcantrill/1537857_part2

Automatic merge from submit-queue.

bug 1537857. Additional logging proxy metrics fixes

This PR provides additional fixes to:
* set the password correctly by properly decoding
* modify the default prometheus namespace if one isn't provided

ref: https://bugzilla.redhat.com/show_bug.cgi?id=1537857
The change has been merged into the latest build.
Tested against openshift-ansible-3.9.1-1; could show the ES prometheus metrics via the REST API, see the attached file.

env:
# openshift version
openshift v3.9.1
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.9.1-1.git.0.9862628.el7.noarch
openshift-ansible-roles-3.9.1-1.git.0.9862628.el7.noarch
openshift-ansible-3.9.1-1.git.0.9862628.el7.noarch
openshift-ansible-playbooks-3.9.1-1.git.0.9862628.el7.noarch
Created attachment 1401667 [details] could show ES metrics output by API
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489