Created attachment 1348852 [details]
logging environment dump

Description of problem:
The ES prometheus metrics interface returns nothing; the ES prometheus metrics cannot be shown by the command:

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n prometheus)" https://{ES_POD_IP}:4443/_prometheus/metrics

It had metrics output several days ago, see https://bugzilla.redhat.com/show_bug.cgi?id=1508756

********************************************************************************************
# oc get sa prometheus -n prometheus
NAME         SECRETS   AGE
prometheus   3         1h

# oc get po -o wide | grep logging-es
logging-es-data-master-y9dyzqoc-1-xrsxk       2/2   Running   0   34m   10.128.0.35   172.16.120.101
logging-es-ops-data-master-2jvix8yq-1-bbqhp   2/2   Running   0   33m   10.128.0.37   172.16.120.101

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n prometheus)" https://10.128.0.35:4443/_prometheus/metrics
Note: nothing returned

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n prometheus)" https://10.128.0.37:4443/_prometheus/metrics
Note: nothing returned

The HTTP head info is 401 Unauthorized:

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n prometheus)" https://10.128.0.35:4443/_prometheus/metrics -I
HTTP/1.1 401 Unauthorized
Content-Length: 0
Content-Type: text/plain; charset=UTF-8
Gap-Auth: system:serviceaccount:prometheus:prometheus
Gap-Upstream-Address: localhost:9200
Date: Tue, 07 Nov 2017 08:09:07 GMT

Version-Release number of selected component (if applicable):
All logging component versions are v3.7.0-0.196.0.0

# openshift version
openshift v3.7.0-0.196.0
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

How reproducible:
Always

Steps to Reproduce:
1. Inventory file: see the [Additional info] part; deploy prometheus first
2. Deploy Logging 3.7
3. Access the ES prometheus metrics interface by:
# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n prometheus)" https://{ES_POD_IP}:4443/_prometheus/metrics

Actual results:
The ES prometheus metrics cannot be shown by the interface

Expected results:
The ES prometheus metrics can be shown

Additional info:
[OSEv3:children]
masters
etcd

[masters]
${MASTER} openshift_public_hostname=${MASTER}

[etcd]
${ETCD} openshift_public_hostname=${ETCD}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise

# Logging
openshift_logging_install_logging=true
openshift_logging_kibana_hostname=kibana.apps.${SUB_DOMAIN}
openshift_logging_kibana_ops_hostname=kibana-ops.apps.${SUB_DOMAIN}
public_master_url=https://${MASTER}:${PORT}
openshift_logging_image_prefix=${IMAGE_PREFIX}
openshift_logging_image_version=v3.7
openshift_logging_namespace=logging
openshift_logging_elasticsearch_memory_limit=1Gi
openshift_logging_es_ops_memory_limit=512Mi
openshift_logging_use_ops=true

# prometheus
openshift_prometheus_state=present
openshift_prometheus_namespace=prometheus
openshift_prometheus_replicas=1
openshift_prometheus_node_selector={'role': 'node'}
openshift_prometheus_image_prefix=${IMAGE_PREFIX}
openshift_prometheus_image_version=v3.7
Based on the logs I do not see the prometheus-exporter plugin being installed in the ES nodes:

$ cat logging-20171107_030309/es/logs/logging-es-data-master-y9dyzqoc-1-xrsxk-elasticsearch.log | grep plugins
2017-11-07 07:11:20 INFO plugins:180 - [Baron Brimstone] modules [], plugins [search-guard-ssl, search-guard2], sites []

$ cat logging-20171107_030309/es/logs/logging-es-ops-data-master-2jvix8yq-1-bbqhp-elasticsearch.log | grep plugins
2017-11-07 07:12:11 INFO plugins:180 - [Reeva Payge] modules [], plugins [search-guard-ssl, search-guard2], sites []

Why is this plugin not installed?
Can you:

* Attach the DC for Elasticsearch. It should include the oauth-proxy as a sidecar container
* Inspect the image to get the env vars.

According to #c0 I would expect to see the plugin as noted in #c1. The build logs for that version [1] show the plugin added to the image. There appears to be some mismatch here.

[1] http://download-node-02.eng.bos.redhat.com/brewroot/packages/logging-elasticsearch-docker/v3.7.0/0.196.0.0/data/logs/x86_64-build.log
[2] https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14478884
I've verified the image has the prometheus plugin
(In reply to Jeff Cantrill from comment #2)
> Can you:
>
> * Attach the DC for Elasticsearch. It should include the oauth-proxy as a
> sidecar container

The Elasticsearch DC info is already in the attached file "logging environment dump".
Can you provide the output of these additional commands:

* oc exec $PODNAME -- es_util --query=_nodes/plugins?pretty
* oc exec $PODNAME -- es_acl get --doc=roles
* oc exec $PODNAME -- es_acl get --doc=rolesmapping

I also saw this in the logs, which I'm not certain means you are being denied by the oauth-proxy:

2017/11/07 07:19:29 oauthproxy.go:657: 10.129.0.1:54156 Cookie "_oauth_proxy" not present
Moving to 3.7.x as this is not a blocker
Created attachment 1349680 [details]
Request info
(In reply to Jeff Cantrill from comment #5)
> Can you provide the additional commands:
>
> * oc exec $PODNAME -- es_util --query=_nodes/plugins?pretty
> * oc exec $PODNAME -- es_acl get --doc=roles
> * oc exec $PODNAME -- es_acl get --doc=rolesmapping

See the attached file "Request info".

> I also so this in the logs which I'm not certain if this means you are being
> denied by the oauthproxy:
>
> 2017/11/07 07:19:29 oauthproxy.go:657: 10.129.0.1:54156 Cookie
> "_oauth_proxy" not present

I also saw this log when the ES prometheus metrics could be shown a few days ago.
I think it's related to the ES image. I used the following images, which can show the ES prometheus metrics; with them the OpenShift ElasticSearch Plugin version is 2.4.4.16--redhat-1, see the attached file "use_es_image_v3.7.0-0.188.0.0". When this issue was reported, the OpenShift ElasticSearch Plugin version was 2.4.4.17--redhat-1.

logging-curator/images/v3.7.0-0.188.0.0
logging-auth-proxy/images/v3.7.0-0.188.0.0
logging-fluentd/images/v3.7.0-0.188.0.0
logging-elasticsearch/images/v3.7.0-0.188.0.0
logging-kibana/images/v3.7.0-0.188.0.0
Created attachment 1349681 [details]
use_es_image_v3.7.0-0.188.0.0, it can show ES prometheus metrics
I think this issue may be related to a recent security change we made. Can you please confirm whether the following resolves it; I think we will require an image change to fix:

1. oc rsh $ES_POD
2. cd to /opt/app/src/sgconfig
3. edit the sgconfig.yml to set remoteIpHeader to 'x-forwarded-user'
4. run 'es_seed_acl'

Try to curl again. This will require us to modify the default proxy header the kibana auth proxy uses so that it matches the value from the oauth-proxy.

[1] https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/sgconfig/sg_config.yml#L6
@pweil - I have a fix ready to go, just need to rebuild the 3.7 puddle with the new package, then I can rebuild the image. That is, if there is still time to get this into 3.7.0
Created attachment 1350269 [details]
still the same issue after applying the workaround from Comment 13
fixed in logging-elasticsearch:v3.7.6-1
Created attachment 1351381 [details]
curl returns "Sign in with an OpenShift account"
I notice there is this error message in the proxy log:

2017/11/07 08:00:38 oauthproxy.go:657: 10.129.0.1:33566 Cookie "_oauth_proxy" not present

I wonder if something has changed, and now the client is required to submit the _oauth_proxy cookie with the request?

How did you deploy with prometheus support? I have a plain v3.7 deployment - it does not have the prometheus sa nor the prometheus namespace.
Use ansible to deploy prometheus; the default namespace for prometheus is openshift-metrics.

# Inventory file
[OSEv3:children]
masters
etcd

[masters]
${MASTER} openshift_public_hostname=${MASTER}

[etcd]
${ETCD} openshift_public_hostname=${ETCD}

[OSEv3:vars]
ansible_ssh_user=root
# upload your libra.pem to ansible host
ansible_ssh_private_key_file="${libra.pem_path}"
deployment_type=openshift-enterprise

# prometheus
openshift_prometheus_state=present
# default namespace is openshift-metrics, you can deploy in your preferred
# namespace, or don't set this parameter if you want to use default namespace
openshift_prometheus_namespace=${NAME_SPACE}
# label your node where you want to deploy prometheus, for example:
# openshift_prometheus_node_selector={'role': 'node'}
openshift_prometheus_node_selector={${KEY}: ${VALUE}}
openshift_prometheus_image_prefix=${IMAGE_PREFIX}
openshift_prometheus_image_version=v3.7
(In reply to Rich Megginson from comment #22)
> I notice there is this error message in the proxy log:
>
> 2017/11/07 08:00:38 oauthproxy.go:657: 10.129.0.1:33566 Cookie
> "_oauth_proxy" not present
>
> I wonder if something has changed, and now the client is required to submit
> the _oauth_proxy cookie with the request?

I also saw this log when the ES prometheus metrics could be shown a few days ago.
I deployed openshift + logging + prometheus with this added to my inventory:

[OSEv3:vars]
...
openshift_hosted_prometheus_deploy=True
openshift_prometheus_namespace=openshift-metrics
openshift_prometheus_state=present
openshift_prometheus_node_selector={'region': 'infra'}
openshift_prometheus_image_prefix=private.registry/openshift3/
openshift_prometheus_image_version=v3.7

I edited the es dc to enable logging for requests:

oc edit dc logging-es-data-master-1tlcuva8

        - -pass-user-headers
        - -request-logging
        image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/oauth-proxy:v3.7

After restart and requesting, I see this:

2017/11/14 04:42:04 oauthproxy.go:657: 10.128.0.1:58364 Cookie "_oauth_proxy" not present
10.128.0.1 - system:serviceaccount:openshift-metrics:prometheus [14/Nov/2017:04:42:04 +0000] 10.128.0.19:4443 HEAD localhost:9200 "/_prometheus/metrics" HTTP/1.1 "curl/7.29.0" 401 0 0.001

That is, the full username being logged by the proxy is system:serviceaccount:openshift-metrics:prometheus - I wonder:

- if that is passed through to ES
- if we need to add that username to the searchguard acls
- if that "@cluster.local" will be different depending on where the request came from
- how to make searchguard flexible enough to accommodate ^^^
Searchguard will use the name passed by the x-forwarded-for header to evaluate permissions. In this case, if we are receiving 'system:serviceaccount:openshift-metrics:prometheus' but are not configured for it, then there will be a mismatch and a failure.

@Simo, I don't remember seeing this during my testing; do you have any understanding of why we might see '@cluster.local' as part of the SA name?
X-Forwarded-For headers contain the IP addresses of the proxy client. Are you thinking of X-Forwarded-User / X-Forwarded-Email?

As far as I can see in the code, we get @anything only in session.Email, which is then exposed via X-Forwarded-Email.
(In reply to Simo Sorce from comment #27)
> X-Forwarded-For headers contain IP addresses of the proxy client. Are you
> thinking of X-Forwarded-User / X-forwarded-Email ?
>
> As far as I can see in the code we get @anything only in the session.Email
> which is then exposed via X-forwarded-Email.

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n prometheus)" https://10.128.0.35:4443/_prometheus/metrics -I
HTTP/1.1 401 Unauthorized
Content-Length: 0
Content-Type: text/plain; charset=UTF-8
Gap-Auth: system:serviceaccount:prometheus:prometheus
Gap-Upstream-Address: localhost:9200
Date: Tue, 07 Nov 2017 08:09:07 GMT

@simo We need to find out who is adding the "@cluster.local" to the username system:serviceaccount:prometheus:prometheus. Is it coming directly from OpenShift in the response to the auth request? Is oauth-proxy adding it? This is critical for logging access control because we need to grant access to the exact username in this case. If it will always be "system:serviceaccount:prometheus:prometheus" from now on, that's fine, but the "@" implies that perhaps the domain could be different somehow?

@junqi Can you confirm that with a previous version of logging/oauth-proxy, when you used the curl command:

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n prometheus)" https://10.128.0.35:4443/_prometheus/metrics -I

you had a value like this:

Gap-Auth: system:serviceaccount:prometheus:prometheus

without the "@cluster.local" suffix?
@Rich,

It does not matter that the Gap-Auth header has that value, as SG uses 'x-forwarded-user' to evaluate the role [1]. If this test is against the version of the openshift-elasticsearch-plugin that has the username fix, it is possible we have some conflict that we did not account for.

[1] https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/sgconfig/sg_config.yml#L35
(In reply to Jeff Cantrill from comment #29)
> @Rich,
>
> It does not matter the Gap-Auth header has that value as SG uses
> 'x-forwarded-user' to evaluate the role [1]. If this test is against the
> version of the openshift-elasticsearch-plugin that has the username fix, it
> is possible we have some conflict that we did not account for
>
> [1]
> https://github.com/openshift/origin-aggregated-logging/blob/master/
> elasticsearch/sgconfig/sg_config.yml#L35

We cannot trust the x-forwarded-user header. If we did, then I could curl to Elasticsearch, not via the proxy, specify "admin" as the username, and gain full admin access.

If we must use the x-forwarded-user header, if that is the only way we can get the username to use in Elasticsearch, then we need some way to trust the client, e.g. the proxy must add some sort of secret to the request so that Elasticsearch can absolutely verify that x-forwarded-user was added by the proxy and no other client.
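To make the spoofing concern above concrete, here is a small illustrative simulation in Python (hypothetical code only, not the actual openshift-elasticsearch-plugin): a backend that grants identity based solely on x-forwarded-user will accept whatever a direct client claims.

```python
# Illustrative simulation only -- NOT the real openshift-elasticsearch-plugin.
# It shows why a backend must not trust x-forwarded-user from arbitrary clients.

def authorize_naive(headers):
    """Return the identity from x-forwarded-user alone (unsafe)."""
    user = headers.get("x-forwarded-user")
    if user is None:
        return None  # no identity header at all -> deny
    return user  # blindly trusted, whoever sent it

# A request that legitimately passed through the oauth-proxy:
via_proxy = {"x-forwarded-user": "system:serviceaccount:prometheus:prometheus"}

# A request sent straight to ES on port 9200, bypassing the proxy,
# with a forged header claiming the admin identity:
forged = {"x-forwarded-user": "admin"}

assert authorize_naive(via_proxy) == "system:serviceaccount:prometheus:prometheus"
assert authorize_naive(forged) == "admin"  # attacker is treated as admin
```

The forged request is indistinguishable from the legitimate one, which is exactly why some proxy-to-backend secret is needed.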
Rich, I see in the oauth-proxy code that GAP-Auth is set to session.Email if that value is not null, otherwise it is set to session.User. So it seems GAP-Auth defaults to the email if available.

If you need just the user, you should set the -pass-user-headers option and instead look for the X-Forwarded-User header.
Rich, also: if you cannot trust X-Forwarded-User you cannot trust GAP-Auth either ... so there is something I am missing here ...
(In reply to Rich Megginson from comment #28)
> @junqi Can you confirm that with a previous version of logging/oauth-proxy,
> when you used the curl command:
>
> # curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n
> prometheus)" https://10.128.0.35:4443/_prometheus/metrics -I
>
> That you had a value like this?
>
> Gap-Auth: system:serviceaccount:prometheus:prometheus
>
> Without the "@cluster.local" suffix?

Using the previous version of logging/oauth-proxy, I get the page source of "Sign in with an OpenShift account" with the following commands:

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n openshift-metrics)" https://{ES_POD}:4443/_prometheus/metrics

and

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n openshift-metrics)" https://10.128.0.26:4443/_prometheus/metrics -I
HTTP/1.1 403 Forbidden
Set-Cookie: _oauth_proxy=; Path=/; Domain=10.128.0.26; Expires=Thu, 16 Nov 2017 01:32:30 GMT; HttpOnly; Secure
Date: Thu, 16 Nov 2017 02:32:30 GMT
Content-Type: text/html; charset=utf-8

I don't know what the actual result is when the ES prometheus metrics interface can be accessed. Maybe it is related to other prometheus images.
(In reply to Junqi Zhao from comment #34)
> (In reply to Rich Megginson from comment #28)
> > @junqi Can you confirm that with a previous version of logging/oauth-proxy,
> > when you used the curl command:
> >
> > # curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n
> > prometheus)" https://10.128.0.35:4443/_prometheus/metrics -I
> >
> > That you had a value like this?
> >
> > Gap-Auth: system:serviceaccount:prometheus:prometheus
> >
> > Without the "@cluster.local" suffix?
>
> Use the previous version of logging/oauth-proxy,
> get the page source of "Sign in with an OpenShift account" by using the
> following commands
> #curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n
> openshift-metrics)" https://{ES_POD}:4443/_prometheus/metrics

It could show the ES prometheus metrics before, see the attached file "use_es_image_v3.7.0-0.188.0.0". I am confused why it cannot show the ES prometheus metrics now.
(In reply to Junqi Zhao from comment #35)
> (In reply to Junqi Zhao from comment #34)
> > Use the previous version of logging/oauth-proxy,
> > get the page source of "Sign in with an OpenShift account" by using the
> > following commands
> > #curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n
> > openshift-metrics)" https://{ES_POD}:4443/_prometheus/metrics
>
> It can show ES prometheus metrics before, see the attached file
> use_es_image_v3.7.0-0.188.0.0, I am confused why it can not show ES
> prometheus metrics now

Because we changed elasticsearch not to trust a username without a token, and it doesn't appear that the oauth-proxy is passing through the token.

There is a command-line option (https://github.com/openshift/oauth-proxy#command-line-options):

-pass-access-token: pass OAuth access_token to upstream via X-Forwarded-Access-Token header

but this does not work - I do not see this header, not sure why. Perhaps it is only passed if the actual authentication is done in the proxy, not if the user is just passing through a token already obtained elsewhere.

I think we need to figure out if it is possible for the oauth-proxy to pass through, unchanged, the "Authorization: Bearer $token" sent via curl. If that is not possible, then we need to change the openshift-elasticsearch-plugin to trust the oauth-proxy, so that we can use a plain username that we get from the oauth-proxy, but require a token/cert if the request is not coming from the oauth-proxy.
Rich, looking very carefully at the code, it indeed turns out that if you use a bearer token for auth, then oauth-proxy, which uses standard k8s clients, will remove and not propagate the bearer token (the removal happens in the vendored code that handles auth). This seems intentional, but indeed it also seems to fly in the face of the -pass-access-token option. This option is only used when an actual *oauth* authentication is performed and a bearer token is obtained as a result of the oauth process.

I think one way to authenticate requests coming from the proxy is to set the -pass-basic-auth flag as well as -basic-auth-password with a password shared only between ES and oauth-proxy; then, ignoring the user name (it is set to the user oauth-proxy authenticated), trust the headers only if this password matches. Clients can send any headers directly, but they will not be able to send you the secret (basic auth password) shared only between oauth-proxy and ES.

HTH.
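The shared-secret scheme above could be modeled roughly as follows (an illustrative Python sketch, not actual oauth-proxy or plugin code; the secret value and function name are made up): the backend honors x-forwarded-user only when the request also carries the basic-auth password that only the proxy knows.

```python
# Illustrative sketch of the proposed trust model -- hypothetical code.
import base64

# Shared only between oauth-proxy (-basic-auth-password) and ES; made-up value.
SHARED_SECRET = "proxy-to-es-secret"

def authorize_with_shared_secret(headers):
    """Trust x-forwarded-user only if the shared basic-auth password matches."""
    auth = headers.get("authorization", "")
    if not auth.startswith("Basic "):
        return None  # no proxy credential -> deny
    decoded = base64.b64decode(auth[len("Basic "):]).decode()
    # With -pass-basic-auth the proxy sends "<authenticated-user>:<password>";
    # per the proposal, the username part is ignored and only the password
    # proves the request came through the proxy.
    _user, _, password = decoded.partition(":")
    if password != SHARED_SECRET:
        return None  # header did not come from the trusted proxy -> deny
    return headers.get("x-forwarded-user")

# Request forwarded by the proxy, carrying the shared secret:
cred = base64.b64encode(b"prometheus:proxy-to-es-secret").decode()
via_proxy = {
    "authorization": "Basic " + cred,
    "x-forwarded-user": "system:serviceaccount:prometheus:prometheus",
}

# Direct request forging the header but lacking the secret:
forged = {"x-forwarded-user": "admin"}

assert authorize_with_shared_secret(via_proxy) == "system:serviceaccount:prometheus:prometheus"
assert authorize_with_shared_secret(forged) is None
```

Unlike the naive check, the forged request is rejected because the attacker cannot supply the proxy-only password.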
@Rich Will we merge the code to 3.9 when this defect is fixed?
(In reply to Junqi Zhao from comment #38)
> @Rich
> Will we merge the code to 3.9 when this defect is fixed?

Yes, same as every other bug. Is there some reason that you think this bug is different?
(In reply to Rich Megginson from comment #39)
> Yes, same as every other bug. Is there some reason that you think this bug
> is different?

No; since this defect will be fixed in 3.7.z, and 3.9.0 has the same problem, I am wondering whether we should clone this defect for 3.9.0.
https://github.com/openshift/origin-aggregated-logging/pull/928
The issue is not fixed; it was changed to ON_QA by errata, changing back to MODIFIED now.
Tested with logging-elasticsearch-v3.7.37-1; the issue is not fixed. It was changed to ON_QA by errata, changing back to MODIFIED now.

# oc get po -o wide | grep logging-es
logging-es-data-master-krur34v5-1-jbgfq   2/2   Running   0   3m   10.129.0.52   ip-172-18-2-123.ec2.internal

The html body still shows "Sign in with an OpenShift account", so the issue is not fixed.

# curl -k -H "Authorization: Bearer $(oc sa get-token prometheus -n openshift-metrics)" https://10.129.0.52:4443/_prometheus/metrics
********************snipped***********************************
<body>
<div class="signin center">
  <form method="GET" action="/oauth/start">
    <input type="hidden" name="rd" value="/_prometheus/metrics">
    <button type="submit" class="btn">Sign in with an OpenShift account</button><br/>
  </form>
</div>
<script>
if (window.location.hash) {
  (function() {
    var inputs = document.getElementsByName('rd');
    for (var i = 0; i < inputs.length; i++) {
      inputs[i].value += window.location.hash;
    }
  })();
}
</script>
<footer>
Secured with <a href="https://github.com/openshift/oauth-proxy#oauth2_proxy">OpenShift oauth-proxy</a> version 2.2.1-alpha
</footer>
</body>
********************snipped***********************************
This is fixed with the merge of https://github.com/openshift/openshift-ansible/pull/7307#issuecomment-372416884 Requires a new ansible build.
The issue is fixed; the ES prometheus metrics can be shown now, see the attached file.

openshift-ansible version: openshift-ansible-3.7.42-1.git.0.427f18c.el7.noarch

# openshift version
openshift v3.7.42
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

Images:
logging-curator/images/v3.7.42-2
logging-elasticsearch/images/v3.7.42-2
logging-kibana/images/v3.7.42-2
logging-fluentd/images/v3.7.42-2
logging-auth-proxy/images/v3.7.42-2
Created attachment 1415002 [details]
can show the ES metrics now
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0636