Expose the 3.9 logging ES endpoints via Prometheus metrics: testing is blocked
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/93814bd617f633e613118b710b7fa33ff975c994
bug 1537857. Fix retrieving prometheus metrics

https://github.com/openshift/openshift-ansible/commit/fbdfa66f06abd8c026c1c14e292505c002d46dfe
Merge pull request #6903 from jcantrill/1537857_fix_logging_prometheus

bug 1537857. Fix retrieving prometheus metrics
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/058f366072505f39a5c5aad3faa98cf7e47eaddd
bug 1537857. Fix retrieving prometheus metrics

https://github.com/openshift/origin-aggregated-logging/commit/6c5c6a5ab12414b30d390e874b681616ff6f9aa6
Merge pull request #920 from jcantrill/1537857_fix_prometheus

Automatic merge from submit-queue.

bug 1537857. Fix retrieving prometheus metrics

This PR:
* bumps openshift-elasticsearch-plugin to fix retrieving metrics
* bumps the prometheus exporter to fix an issue related to SG and the plugin

Depends on:
* https://github.com/fabric8io/openshift-elasticsearch-plugin/pull/119
* https://github.com/fvvanholl/elasticsearch-prometheus-exporter/pull/82
* https://github.com/openshift/openshift-ansible/pull/6903
Need a new ES image to test; the current latest image is v3.9.0-0.38.0.0.
Tested with logging-elasticsearch:v3.9.0-0.39.0.0:

sh-4.2$ env | grep -i ver
OSE_ES_VER=2.4.4.21
RECOVER_EXPECTED_NODES=1
ES_VER=2.4.4
RECOVER_AFTER_NODES=1
PROMETHEUS_EXPORTER_VER=2.4.4.1
ES_CLOUD_K8S_VER=2.4.4.01
RECOVER_AFTER_TIME=5m
JAVA_VER=1.8.0

# oc get po -n logging -o wide | grep logging-es
logging-es-data-master-m69jkv20-1-v7csk       2/2   Running   0   2h   10.129.0.56   172.16.120.91
logging-es-ops-data-master-aymsaqly-1-lfz9c   2/2   Running   0   1h   10.129.0.57   172.16.120.91

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n openshift-metrics)" https://10.129.0.57:4443/_prometheus/metrics -I
HTTP/1.1 403 Forbidden
Set-Cookie: _oauth_proxy=; Path=/; Domain=10.129.0.57; Expires=Wed, 07 Feb 2018 07:04:49 GMT; HttpOnly; Secure
Date: Wed, 07 Feb 2018 08:04:49 GMT
Content-Type: text/html; charset=utf-8

Change back to MODIFIED.
Is 'prometheus' the same user identified as the metrics user for the proxy and the ES container? Sample from my local:

- name: PROMETHEUS_USER
  value: system:serviceaccount:logging:aggregated-logging-elasticsearch

proxy:
- -client-id=system:serviceaccount:logging:aggregated-logging-elasticsearch
- -basic-auth-password=41m0z4jgrIssHG7U

Please note the addition of the password to the proxy.

Looking at the build log for the image, it is missing the required plugin. It has openshift-elasticsearch-plugin-2.4.4.17__redhat_1-3.el7 (http://download-node-02.eng.bos.redhat.com/brewroot/work/tasks/5812/15225812/x86_64.log) and it needs openshift-elasticsearch-plugin-2.4.4.21.
See the attached ES DC file:

- name: PROMETHEUS_USER
  value: system:serviceaccount:prometheus:prometheus

- -client-id=system:serviceaccount:prometheus:prometheus

Since prometheus is deployed to openshift-metrics by default, the above settings should be system:serviceaccount:openshift-metrics:prometheus. It seems our code always sets the prometheus project name to 'prometheus'. Even after changing the above settings, I still get 403 Forbidden as in Comment 5.

openshift-elasticsearch-plugin-2.4.4.17__redhat_1-3.el7 is still in v3.9.0-0.41.0.0; will test this defect once the ES image packages openshift-elasticsearch-plugin-2.4.4.21.
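The hardcoded namespace above suggests the expected service-account name should be derived from wherever prometheus is actually deployed. A minimal shell sketch of that derivation (the variable names here are illustrative, not the actual ansible variables):

```shell
# Illustrative only: build the expected PROMETHEUS_USER / -client-id value
# from the namespace prometheus is deployed to, instead of hardcoding it.
PROMETHEUS_NAMESPACE="openshift-metrics"   # assumed default; any namespace works
PROMETHEUS_SA="prometheus"
PROMETHEUS_USER="system:serviceaccount:${PROMETHEUS_NAMESPACE}:${PROMETHEUS_SA}"
echo "${PROMETHEUS_USER}"
# → system:serviceaccount:openshift-metrics:prometheus
```

If the SA name is computed this way, deploying prometheus to a non-default namespace would still produce a matching -client-id and PROMETHEUS_USER.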
Created attachment 1393033 [details] es dc info
Using

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n openshift-metrics)" https://${pod_ip}:4443/_prometheus/metrics

the request is redirected to the "Sign in with an OpenShift account" page and the ES metrics are not shown, so it seems we still have authentication problems; see the attached file.

$ rpm -qa | grep elasticsearch-plugin
openshift-elasticsearch-plugin-2.4.4.21__redhat_1-1.el7.noarch

Image: logging-elasticsearch/images/v3.9.0-0.47.0.0
Created attachment 1399146 [details] issue is not fixed
Did you redeploy the image using openshift-ansible? The changes require the referenced version of the plugin, and the DC must include what is templated at https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_elasticsearch/templates/2.x/es.j2#L143, which is password auth between the oauth proxy and Elasticsearch.
I have been working on your environment, and one thing I noted is that you are using the defaults but the expected user is in the openshift-metrics namespace. The default is defined here [1]. If we intend to deploy metrics in the 'openshift-metrics' namespace, we should at minimum change the default. The ripple of changes here requires:

* Elasticsearch DC edit to modify PROMETHEUS_USER
* Elasticsearch DC edit to modify -client-id in the proxy
* Edit the logging-elasticsearch secret passwd.yml entry to have the correct username
* Grant the role the correct permissions, which is verifiable by [2] ('oc policy who-can' is currently broken)

I have been unable to modify your environment to fix the issue and recommend redeploying with the desired SA. The result is the following: metrics can only be retrieved with username/password when hitting Elasticsearch directly. You would need to use the username/password in the passwd.yml, which is not the desired way to access. The desired manner is to use a Bearer token and go through the proxy, via the service or directly to the pod.

[1] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_elasticsearch/defaults/main.yml#L45
[2] # curl -k -XPOST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" -H "Accept: application/json, */*" https://${SERVER}/apis/authorization.openshift.io/v1/namespaces/logging/localresourceaccessreviews -d '{"kind":"LocalResourceAccessReview","apiVersion":"authorization.openshift.io/v1","namespace":"","verb":"view","resourceAPIGroup":"metrics.openshift.io","resourceAPIVersion":"","resource":"prometheus","resourceName":"","path":"","isNonResourceURL":false,"content":null}'
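For reference, a sketch of what the first two DC edits listed above might look like when the SA lives in openshift-metrics. This is an illustrative fragment only, not a complete DeploymentConfig, and assumes the default SA name 'prometheus':

```yaml
# Elasticsearch container env (illustrative fragment):
- name: PROMETHEUS_USER
  value: system:serviceaccount:openshift-metrics:prometheus
# ...and in the oauth-proxy container args:
- -client-id=system:serviceaccount:openshift-metrics:prometheus
```

The passwd.yml username in the logging-elasticsearch secret would then need to match the same system:serviceaccount string.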
(In reply to Jeff Cantrill from comment #16)
> I have been working on your environment and one thing I did note is that
> you are using the defaults but the expected user is in the openshift-metrics
> namespace. The default is defined here [1]. If we intend to deploy metrics
> in the 'openshift-metrics' namespace, we should at minimum change the
> default.

Yes, as I mentioned in Comment 7, prometheus is deployed to openshift-metrics by default. We need to change our code, since the user can deploy prometheus to any preferred namespace. It seems openshift_prometheus_namespace did not get the correct namespace.

Moving it to Assigned.
Even with prometheus deployed to the 'prometheus' namespace, command [1] returns "system:serviceaccount:openshift-metrics:prometheus"; I guess openshift_prometheus_namespace did not get the correct namespace.

[1] $ curl -k -XPOST -H "Authorization: Bearer `oc whoami -t`" -H "Content-Type: application/json" -H "Accept: application/json, */*" https://host-8-244-77.host.centralci.eng.rdu2.redhat.com:8443/apis/authorization.openshift.io/v1/namespaces/logging/localresourceaccessreviews -d '{"kind":"LocalResourceAccessReview","apiVersion":"authorization.openshift.io/v1","namespace":"","verb":"view","resourceAPIGroup":"metrics.openshift.io","resourceAPIVersion":"","resource":"prometheus","resourceName":"","path":"","isNonResourceURL":false,"content":null}'
{
  "kind": "ResourceAccessReviewResponse",
  "apiVersion": "authorization.openshift.io/v1",
  "namespace": "logging",
  "users": [
    "admin",
    "juzhao",
    "system:admin",
    "system:serviceaccount:kube-system:clusterrole-aggregation-controller",
    "system:serviceaccount:openshift-metrics:prometheus"
  ],
  "groups": [
    "system:cluster-admins",
    "system:masters"
  ],
  "evalutionError": ""
}

[2] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_elasticsearch/defaults/main.yml#L45
passwd.yml and the ES DC have the same username and passwd:

# oc rsh logging-es-data-master-86bziich-1-wtpjr
Defaulting container name to elasticsearch.
Use 'oc describe pod/logging-es-data-master-86bziich-1-wtpjr -n logging' to see all of the containers in this pod.
sh-4.2$ cat /etc/elasticsearch/secret/passwd.yml
"system:serviceaccount:prometheus:prometheus":
  passwd: "aWppUENJV2tyTmpta3VXUA=="

# oc get dc logging-es-data-master-86bziich -o yaml | grep client-id -A 5
- -client-id=system:serviceaccount:prometheus:prometheus
- -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
- -cookie-secret=VllhWnlZa25qdnAzQ3NYaA==
- -basic-auth-password=aWppUENJV2tyTmpta3VXUA==
- -upstream=https://localhost:9200
- '-openshift-sar={"namespace": "logging", "verb": "view", "resource": "prometheus",
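Note that the -basic-auth-password value in the DC above is the literal base64 string from passwd.yml rather than a decoded value. Decoding that exact string locally (a quick check, assuming the value really is plain base64):

```shell
# Decode the base64 password value seen in passwd.yml / the proxy args above.
echo "aWppUENJV2tyTmpta3VXUA==" | base64 -d
# → ijiPCIWkrNjmkuWP
```

If the proxy expects the decoded password while the secret stores the encoded form (or vice versa), basic auth between the proxy and Elasticsearch would fail even though the two values "match" as strings.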
Additional PR to fix the ansible changes, where we try to keep the password so we don't redeploy the ES image: https://github.com/openshift/openshift-ansible/pull/7294
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/af9c8cd48ad7d74206385e2530221796f55e3096
bug 1537857. Additional logging proxy metrics fixes

https://github.com/openshift/openshift-ansible/commit/f123167eb3442e789d44b23fc9b5c92c73e444e3
Merge pull request #7294 from jcantrill/1537857_part2

Automatic merge from submit-queue.

bug 1537857. Additional logging proxy metrics fixes

This PR provides additional fixes to:
* set the password correctly by properly decoding
* modify the default prometheus namespace if one isn't provided

ref: https://bugzilla.redhat.com/show_bug.cgi?id=1537857
The change has been merged into the latest build.
Tested against openshift-ansible-3.9.1-1; could show the ES prometheus metrics via the REST API, see the attached file.

env:
# openshift version
openshift v3.9.1
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.9.1-1.git.0.9862628.el7.noarch
openshift-ansible-roles-3.9.1-1.git.0.9862628.el7.noarch
openshift-ansible-3.9.1-1.git.0.9862628.el7.noarch
openshift-ansible-playbooks-3.9.1-1.git.0.9862628.el7.noarch
Created attachment 1401667 [details] could show ES metrics output by API
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489