Bug 1537857 - [3.9]ES prometheus metrics interface returns nothing, can not show the ES prometheus metrics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Jeff Cantrill
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1510320
Blocks:
 
Reported: 2018-01-24 02:07 UTC by Junqi Zhao
Modified: 2018-03-28 14:22 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A security fix to the openshift-elasticsearch-plugin caused metrics requests to be rejected because the oauth-proxy does not pass the bearer token.
Consequence:
Fix: Modify the plugin to accept username/password from the oauth-proxy and deploy with a randomly generated password.
Result: Data and metrics are correctly secured and can be retrieved based on authorization.
Clone Of: 1510320
Environment:
Last Closed: 2018-03-28 14:21:18 UTC
Target Upstream Version:
Embargoed:


Attachments
es dc info (6.12 KB, text/plain)
2018-02-08 07:18 UTC, Junqi Zhao
issue is not fixed (3.22 KB, text/plain)
2018-02-22 05:33 UTC, Junqi Zhao
could show ES metrics output by API (110.58 KB, text/plain)
2018-02-28 07:52 UTC, Junqi Zhao


Links
System ID Private Priority Status Summary Last Updated
Github fabric8io openshift-elasticsearch-plugin pull 119 0 None closed bug 1537857. Remove user header for requests without tokens and 2021-01-23 11:12:30 UTC
Github openshift openshift-ansible pull 7294 0 None closed bug 1537857. Additional logging proxy metrics fixes 2021-01-23 11:11:49 UTC
Github openshift origin-aggregated-logging pull 920 0 None closed bug 1537857. Fix retrieving prometheus metrics 2021-01-23 11:12:30 UTC
Github openshift openshift-ansible pull 6903 0 None None None 2020-09-11 08:12:53 UTC
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:22:06 UTC

Comment 1 Junqi Zhao 2018-01-24 04:04:14 UTC
Testing of "Expose the 3.9 logging ES endpoints by Prometheus metrics" is blocked by this bug.

Comment 2 openshift-github-bot 2018-02-02 02:41:36 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/93814bd617f633e613118b710b7fa33ff975c994
bug 1537857. Fix retrieving prometheus metrics

https://github.com/openshift/openshift-ansible/commit/fbdfa66f06abd8c026c1c14e292505c002d46dfe
Merge pull request #6903 from jcantrill/1537857_fix_logging_prometheus

bug 1537857. Fix retrieving prometheus metrics

Comment 3 openshift-github-bot 2018-02-06 00:03:58 UTC
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/058f366072505f39a5c5aad3faa98cf7e47eaddd
bug 1537857. Fix retrieving prometheus metrics

https://github.com/openshift/origin-aggregated-logging/commit/6c5c6a5ab12414b30d390e874b681616ff6f9aa6
Merge pull request #920 from jcantrill/1537857_fix_prometheus

Automatic merge from submit-queue.

bug 1537857. Fix retrieving prometheus metrics

This PR:

* bumps openshift-elasticsearch-plugin to fix retrieving metrics
* bumps the prometheus exporter to fix issue related to SG and plugin

Depends on: 
* https://github.com/fabric8io/openshift-elasticsearch-plugin/pull/119
* https://github.com/fvvanholl/elasticsearch-prometheus-exporter/pull/82
* https://github.com/openshift/openshift-ansible/pull/6903

Comment 4 Junqi Zhao 2018-02-06 05:59:20 UTC
A new ES image is needed for testing; the current latest image is v3.9.0-0.38.0.0.

Comment 5 Junqi Zhao 2018-02-07 08:05:18 UTC
Tested with logging-elasticsearch:v3.9.0-0.39.0.0,
sh-4.2$ env | grep -i ver
OSE_ES_VER=2.4.4.21
RECOVER_EXPECTED_NODES=1
ES_VER=2.4.4
RECOVER_AFTER_NODES=1
PROMETHEUS_EXPORTER_VER=2.4.4.1
ES_CLOUD_K8S_VER=2.4.4.01
RECOVER_AFTER_TIME=5m
JAVA_VER=1.8.0

# oc get po -n logging -o wide | grep logging-es
logging-es-data-master-m69jkv20-1-v7csk       2/2       Running   0          2h        10.129.0.56   172.16.120.91
logging-es-ops-data-master-aymsaqly-1-lfz9c   2/2       Running   0          1h        10.129.0.57   172.16.120.91

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n openshift-metrics)" https://10.129.0.57:4443/_prometheus/metrics -I
HTTP/1.1 403 Forbidden
Set-Cookie: _oauth_proxy=; Path=/; Domain=10.129.0.57; Expires=Wed, 07 Feb 2018 07:04:49 GMT; HttpOnly; Secure
Date: Wed, 07 Feb 2018 08:04:49 GMT
Content-Type: text/html; charset=utf-8

Change back to MODIFIED

Comment 6 Jeff Cantrill 2018-02-07 14:44:41 UTC
Is 'prometheus' the same user that is identified as the metrics user for the proxy and the ES container?

Sample from my local environment:

name: PROMETHEUS_USER
value: system:serviceaccount:logging:aggregated-logging-elasticsearch


proxy:
- -client-id=system:serviceaccount:logging:aggregated-logging-elasticsearch
- -basic-auth-password=41m0z4jgrIssHG7U

Please note the addition of the password to the proxy.

According to the build log, the image is missing the required plugin version. It has:

openshift-elasticsearch-plugin-2.4.4.17__redhat_1-3.el7
http://download-node-02.eng.bos.redhat.com/brewroot/work/tasks/5812/15225812/x86_64.log

and it needs:

openshift-elasticsearch-plugin-2.4.4.21

Comment 7 Junqi Zhao 2018-02-08 07:17:56 UTC
See the attached ES DC file:
 
        - name: PROMETHEUS_USER
          value: system:serviceaccount:prometheus:prometheus

        - -client-id=system:serviceaccount:prometheus:prometheus

Since prometheus is deployed to openshift-metrics by default, the above settings should be

system:serviceaccount:openshift-metrics:prometheus

It seems our code always sets the prometheus project name to prometheus.

Even after changing the above settings, I still get 403 Forbidden, as in Comment 5.


openshift-elasticsearch-plugin-2.4.4.17__redhat_1-3.el7 is still in v3.9.0-0.41.0.0. I will test this defect once the ES image packages openshift-elasticsearch-plugin-2.4.4.21.
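
For reference, the user name that PROMETHEUS_USER and -client-id need to carry can be derived from the namespace prometheus is actually deployed to. The following is only a minimal sketch; the function name sa_user is hypothetical and is not part of openshift-ansible.

```shell
#!/bin/sh
# Hypothetical helper (not from openshift-ansible): build the OpenShift
# service account user name from a namespace and a service account name.
sa_user() {
    # $1 = namespace, $2 = service account name
    printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# User name expected when prometheus runs in the openshift-metrics namespace.
sa_user openshift-metrics prometheus
```

Calling it with "prometheus prometheus" instead yields the value currently found in the DC, which makes the namespace mismatch easy to spot.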

Comment 8 Junqi Zhao 2018-02-08 07:18:22 UTC
Created attachment 1393033
es dc info

Comment 12 Junqi Zhao 2018-02-22 05:33:11 UTC
Using
# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n openshift-metrics)" https://${pod_ip}:4443/_prometheus/metrics
the request is redirected to the "Sign in with an OpenShift account" flow and the ES metrics are not shown. It seems we still have authentication problems; see the attached file.

$ rpm -qa | grep elasticsearch-plugin
openshift-elasticsearch-plugin-2.4.4.21__redhat_1-1.el7.noarch

logging-elasticsearch/images/v3.9.0-0.47.0.0

Comment 13 Junqi Zhao 2018-02-22 05:33:58 UTC
Created attachment 1399146
issue is not fixed

Comment 14 Jeff Cantrill 2018-02-22 15:56:18 UTC
Did you redeploy the image using openshift-ansible? The changes require the referenced version of the plugin, and the DC must have: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_elasticsearch/templates/2.x/es.j2#L143

which is password auth between the oauth proxy and elasticsearch.

Comment 16 Jeff Cantrill 2018-02-23 20:42:29 UTC
I have been working on your environment, and one thing I noted is that you are using the defaults, but the expected user is in the openshift-metrics namespace.  The default is defined here [1].  If we intend to deploy metrics in the 'openshift-metrics' namespace, we should at minimum change the default.

The ripple of changes here requires:

* Elasticsearch DC edit to modify PROMETHEUS_USER
* Elasticsearch DC edit to modify -client-id in the proxy
* Edit the logging-elasticsearch secret passwd.yml entry to have the correct username
* Grant the role the correct permissions, which can be verified with [2] ('oc policy who-can' is currently broken)

I have been unable to modify your environment to fix the issue and recommend redeploying with the desired SA.  The result is the following:

Metrics can only be retrieved with username/password when hitting Elasticsearch directly. You would need to use the username/password from passwd.yml, which is not the desired way to access them.  The desired manner is to use a Bearer token and go through the proxy, either via the service or directly to the pod.


[1] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_elasticsearch/defaults/main.yml#L45
[2] # curl -k -XPOST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" -H "Accept: application/json, */*" https://${SERVER}/apis/authorization.openshift.io/v1/namespaces/logging/localresourceaccessreviews -d '{"kind":"LocalResourceAccessReview","apiVersion":"authorization.openshift.io/v1","namespace":"","verb":"view","resourceAPIGroup":"metrics.openshift.io","resourceAPIVersion":"","resource":"prometheus","resourceName":"","path":"","isNonResourceURL":false,"content":null}'
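
The first three items in the list above all have to carry the same user name. The sketch below shows one way to compare them once the values have been read out of the DC and the secret by hand. The function name check_metrics_user and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which of the three settings disagree with
# the expected metrics user. Returns non-zero if any of them differ.
#   $1 = expected user
#   $2 = PROMETHEUS_USER value from the Elasticsearch DC
#   $3 = -client-id value from the proxy container
#   $4 = user name key in the passwd.yml entry of the secret
check_metrics_user() {
    expected=$1
    status=0
    [ "$2" = "$expected" ] || { printf '%s\n' "PROMETHEUS_USER mismatch: $2"; status=1; }
    [ "$3" = "$expected" ] || { printf '%s\n' "-client-id mismatch: $3"; status=1; }
    [ "$4" = "$expected" ] || { printf '%s\n' "passwd.yml mismatch: $4"; status=1; }
    return "$status"
}

# Example with the values seen in this environment: all three still use
# the default namespace rather than openshift-metrics.
check_metrics_user \
    system:serviceaccount:openshift-metrics:prometheus \
    system:serviceaccount:prometheus:prometheus \
    system:serviceaccount:prometheus:prometheus \
    system:serviceaccount:prometheus:prometheus \
    || echo "settings need to be edited"
```

The helper only compares strings that are passed in; it does not query the cluster, so the values would still have to be collected with oc first.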

Comment 17 Junqi Zhao 2018-02-24 02:09:49 UTC
(In reply to Jeff Cantrill from comment #16)
> I have been working on your environment and one thing I did note is that you
> are using the defaults but the expected user is in the openshift-metrics
> namespace.  The default is defined here [1].  If we intend to deploy metrics
> in the 'openshift-metrics' namespace, we should at minimum change the
> default.

Yes, as I mentioned in Comment 7, prometheus is deployed to openshift-metrics by default. We need to change our code, since users can deploy prometheus to any namespace they prefer. It seems openshift_prometheus_namespace did not get the correct namespace.

Move it to Assigned

Comment 18 Junqi Zhao 2018-02-24 04:26:12 UTC
Even with prometheus deployed to the prometheus namespace, command [1] returns "system:serviceaccount:openshift-metrics:prometheus":

I guess openshift_prometheus_namespace did not get the correct namespace.


[1]$ curl -k -XPOST -H "Authorization: Bearer `oc whoami -t`" -H "Content-Type: application/json" -H "Accept: application/json, */*" https://host-8-244-77.host.centralci.eng.rdu2.redhat.com:8443/apis/authorization.openshift.io/v1/namespaces/logging/localresourceaccessreviews -d '{"kind":"LocalResourceAccessReview","apiVersion":"authorization.openshift.io/v1","namespace":"","verb":"view","resourceAPIGroup":"metrics.openshift.io","resourceAPIVersion":"","resource":"prometheus","resourceName":"","path":"","isNonResourceURL":false,"content":null}'
{
  "kind": "ResourceAccessReviewResponse",
  "apiVersion": "authorization.openshift.io/v1",
  "namespace": "logging",
  "users": [
    "admin",
    "juzhao",
    "system:admin",
    "system:serviceaccount:kube-system:clusterrole-aggregation-controller",
    "system:serviceaccount:openshift-metrics:prometheus"
  ],
  "groups": [
    "system:cluster-admins",
    "system:masters"
  ],
  "evalutionError": ""
}


[2] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_elasticsearch/defaults/main.yml#L45
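
To check a response like the one above for a particular user without reading it by eye, a plain text match is enough for a quick test. This is a rough sketch: the function name user_can_view is hypothetical, the response is an abbreviated copy of the output above, and the match does not parse the JSON.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given name appears as a complete
# quoted string anywhere in the response body. This is a text match, not
# a JSON parse, so it would also match a group with the same name.
#   $1 = response body, $2 = user name
user_can_view() {
    printf '%s\n' "$1" | grep -qF "\"$2\""
}

# Abbreviated copy of the response shown above.
response='{
  "kind": "ResourceAccessReviewResponse",
  "users": [
    "system:admin",
    "system:serviceaccount:openshift-metrics:prometheus"
  ]
}'

if user_can_view "$response" "system:serviceaccount:openshift-metrics:prometheus"; then
    echo "allowed"
else
    echo "not listed"
fi
```

Running the same check for system:serviceaccount:prometheus:prometheus fails against this response, which is the mismatch described in this comment.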

Comment 19 Junqi Zhao 2018-02-24 04:31:10 UTC
passwd.yml and the ES DC have the same username and password:
# oc rsh logging-es-data-master-86bziich-1-wtpjr
Defaulting container name to elasticsearch.
Use 'oc describe pod/logging-es-data-master-86bziich-1-wtpjr -n logging' to see all of the containers in this pod.
sh-4.2$ cat /etc/elasticsearch/secret/passwd.yml
"system:serviceaccount:prometheus:prometheus":
  passwd: "aWppUENJV2tyTmpta3VXUA=="

# oc get dc logging-es-data-master-86bziich -o yaml | grep client-id -A 5
        - -client-id=system:serviceaccount:prometheus:prometheus
        - -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
        - -cookie-secret=VllhWnlZa25qdnAzQ3NYaA==
        - -basic-auth-password=aWppUENJV2tyTmpta3VXUA==
        - -upstream=https://localhost:9200
        - '-openshift-sar={"namespace": "logging", "verb": "view", "resource": "prometheus",
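
The comparison above can also be scripted. The sketch below extracts the password from each snippet and compares the two values as plain strings. The function names are hypothetical, and the sed patterns assume the exact line layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helpers; the patterns assume the layout shown above.

# Read passwd.yml content on stdin and print the first passwd value.
passwd_from_yaml() {
    sed -n 's/^[[:space:]]*passwd:[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' | head -n 1
}

# Read the DC YAML on stdin and print the -basic-auth-password value.
passwd_from_dc() {
    sed -n 's/^[[:space:]]*-[[:space:]]*-basic-auth-password=\(.*\)$/\1/p' | head -n 1
}

# Sample input copied from the output above.
yaml_pw=$(printf '%s\n' \
    '"system:serviceaccount:prometheus:prometheus":' \
    '  passwd: "aWppUENJV2tyTmpta3VXUA=="' | passwd_from_yaml)
dc_pw=$(printf '%s\n' \
    '        - -basic-auth-password=aWppUENJV2tyTmpta3VXUA==' | passwd_from_dc)

if [ "$yaml_pw" = "$dc_pw" ]; then
    echo "passwords match"
else
    echo "passwords differ"
fi
```

In a live environment the two inputs would come from the cat and oc get dc commands shown above. Whether the value is expected to be stored encoded or decoded in each place is not checked here.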

Comment 20 Jeff Cantrill 2018-02-26 21:41:00 UTC
Additional PR to fix the ansible changes, where we try to keep the existing password so that we don't redeploy the ES image:

https://github.com/openshift/openshift-ansible/pull/7294

Comment 21 openshift-github-bot 2018-02-27 07:05:56 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/af9c8cd48ad7d74206385e2530221796f55e3096
bug 1537857. Additional logging proxy metrics fixes

https://github.com/openshift/openshift-ansible/commit/f123167eb3442e789d44b23fc9b5c92c73e444e3
Merge pull request #7294 from jcantrill/1537857_part2

Automatic merge from submit-queue.

bug 1537857. Additional logging proxy metrics fixes

This PR provides additional fixes to:

* set the password correctly by properly decoding
* modifying the default prometheus namespace if one isnt provided

ref: https://bugzilla.redhat.com/show_bug.cgi?id=1537857

Comment 22 Xiaoli Tian 2018-02-28 07:42:09 UTC
The change has been merged into the latest build.

Comment 23 Junqi Zhao 2018-02-28 07:51:17 UTC
Tested against openshift-ansible-3.9.1-1. The ES prometheus metrics can now be shown via the REST API; see the attached file.

env:
# openshift version
openshift v3.9.1
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16


# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.9.1-1.git.0.9862628.el7.noarch
openshift-ansible-roles-3.9.1-1.git.0.9862628.el7.noarch
openshift-ansible-3.9.1-1.git.0.9862628.el7.noarch
openshift-ansible-playbooks-3.9.1-1.git.0.9862628.el7.noarch

Comment 24 Junqi Zhao 2018-02-28 07:52:28 UTC
Created attachment 1401667
could show ES metrics output by API

Comment 27 errata-xmlrpc 2018-03-28 14:21:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

