Description of problem:

this testcase shows how a user can see metrics of namespaces for which it has no rights:

================
oc get users
NAME    UID                                    FULL NAME   IDENTITIES
user1   0d96654a-3437-4d04-b677-d6b96ea46bf6               my_htpasswd_provider:user1
user2   47f6e3d6-8a82-4de3-9b22-a3caf1c0c4d2               my_htpasswd_provider:user2

oc login -u user2 -p secret12
Login successful.
You don't have any projects. You can try to create a new project, by running
    oc new-project <projectname>

oc login -u user1 -p secret12
Login successful.
You don't have any projects. You can try to create a new project, by running
    oc new-project <projectname>

I have two namespaces with the example prometheus app deployed:

oc get pods -n ns1
NAME                                      READY   STATUS    RESTARTS   AGE
prometheus-example-app-79697bd67f-f4z4c   1/1     Running   0          24h

oc get pods -n ns2
NAME                                      READY   STATUS    RESTARTS   AGE
prometheus-example-app-79697bd67f-8tcnt   1/1     Running   0          24h

I will give the view role to user1 for ns1 and to user2 for ns2

oc policy add-role-to-user view user1 -n ns1
oc policy add-role-to-user view user2 -n ns2

Now I login as user1:

oc login -u user1 -p secret12
Login successful.
You have one project on this server: "ns1"
Using project "ns1".

TOKEN=`oc whoami -t`
curl -X GET -k "https://prometheus-user-workload-openshift-user-workload-monitoring.apps.gparente46latest.emeashift.support/api/v1/query?query=up" -H "Authorization: Bearer $TOKEN"
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"up","endpoint":"web","instance":"10.131.0.7:8080","job":"prometheus-example-app","namespace":"ns1","pod":"prometheus-example-app-79697bd67f-f4z4c","service":"prometheus-example-app"},"value":[1609417416.487,"1"]}]}}

I see only the ns1 pod information.

I login as user2:

oc login -u user2 -p secret12
Login successful.
You have one project on this server: "ns2"
Using project "ns2".

TOKEN=`oc whoami -t`
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"up","endpoint":"web","instance":"10.131.0.7:8080","job":"prometheus-example-app","namespace":"ns1","pod":"prometheus-example-app-79697bd67f-f4z4c","service":"prometheus-example-app"},"value":[1609417495.911,"1"]}]}}

I can see the metrics of ns1 pod as well.
=========================

users should query thanos receiver instead of prometheus users workloads.

Even if the prometheus user route is not exposed, we can query metrics of any namespace using prometheus endpoint like this:

==============
oc adm policy add-role-to-user admin user2 -n ns2
oc adm policy add-role-to-user admin user1 -n ns1
oc login -u user2 -p secret12
oc run curl --image=curlimages/curl --command -- sleep 3600
oc rsh curl
curl -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://prometheus-user-workload.openshift-user-workload-monitoring.svc.cluster.local:9091/api/v1/query?query=up
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"up","endpoint":"web","instance":"10.131.0.7:8080","job":"prometheus-example-app","namespace":"ns1","pod":"prometheus-example-app-79697bd67f-f4z4c","service":"prometheus-example-app"},"value":[1609948256.49,"1"]}]}}
=============

user2 can see ns1 metrics and user2 has no rights on ns1 namespace.

Version-Release number of selected component (if applicable): 4.6

How reproducible: always
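A quick way to check a response like the ones in this description for cross-namespace leakage is to list the namespace labels it contains and compare them with the namespace the user is allowed to see. The following is only a minimal sketch: `leaked_namespaces` is a hypothetical helper (not part of oc or Prometheus), and it assumes compact JSON as shown above and a grep that supports `-o`.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an oc or Prometheus feature):
# reads a Prometheus JSON query response on stdin and prints every
# "namespace" label value that differs from the allowed namespace ($1).
leaked_namespaces() {
    allowed="$1"
    grep -o '"namespace":"[^"]*"' \
        | cut -d'"' -f4 \
        | sort -u \
        | grep -v -x -- "$allowed" || true   # empty output means no leak
}
```

For example, saving the response that user2 received to a file and running `leaked_namespaces ns2 < response.json` would print `ns1`, which is the leak this report describes.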
reproduced the issue on payload 4.7.0-0.nightly-2021-01-14-014511
Hello Simon Pasquier, Yesterday I had a TAM regular conference call with the NEC OCP team and I've confirmed their concern as below, would you please take a look and triage this sooner ? Feedback from NEC: ~~~ For now, unprivileged users can call the Prometheus API for metrics, but they can also view metrics for other users. This should be seen as a kind of security issue, and NEC believes it needs to be fixed sooner. Does RH have an ETA? One of NEC's end customers, a major Japanese bank, wants to use this feature to monitor their workloads, but due to this type of security issue, it cannot be put into production. This is because compliance requires strict security requirements. ~~~ I am grateful for your help and support. Thank you, BR, Masaki
verified with payload 4.7.0-0.nightly-2021-02-03-165316
login with kubeadmin
create testuser-0
create one project ns1 and deploy prometheus-example-app
get token of user testuser-0

#oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc:9091/metrics'
Forbidden (user=testuser-0, verb=get, resource=, subresource=)

#oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc.cluster.local:9091/api/v1/query?query=up'
404 page not found

#oc adm policy add-cluster-role-to-user cluster-monitoring-operator testuser-0
#oc adm policy add-role-to-user view testuser-0 -n ns1

#oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc:9091/metrics'
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.8409e-05
go_gc_duration_seconds{quantile="0.25"} 5.938e-05
--------------------
#oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc.cluster.local:9091/api/v1/query?query=up'
404 page not found
Hi, It seems that Comment 9's verification is wrong. The problem was reported for the Prometheus running in openshift-user-workload-monitoring, but this verification tested the Prometheus running in openshift-monitoring. Why did Red Hat verify in a different way from Comment 1?
@Masaki Thanks for the sharp eyes. Assigning back to QA for additional verification.
@Masaki and @Simon I don't think there is anything wrong with the verification; I did verify the Prometheus running in openshift-user-workload-monitoring. I checked the svc https://prometheus-user-workload.openshift-user-workload-monitoring.svc:9091 under different situations with different tokens. The part 'oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H' may be confusing, but the pod here doesn't matter at all.
Hi Hongyan-san, Thank you for commenting. > It many be confusing for the part 'oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H', the pod here doesn't matter at all. Yes, the above is right. But I have another question. The problem this bugzilla reported is that user1 can see metrics of an application deployed by user2. But in your verification there is only one user (testuser-0). So how did you check whether the original problem was fixed?
And is 'go_gc_duration_seconds' really a metric of your application? I think it is a metric of prometheus-k8s-0 itself. So your verification checks whether user1 can see metrics of prometheus-k8s-0 itself. It does not check whether user1 can see metrics of an application deployed by user2.
And I'm still wondering: "Why did Red Hat verify in a different way from Comment 1?".
Test with payload:4.7.0-0.nightly-2021-02-17-130606 login with cluster-admin, create two projects and deploy prometheus-example-app respectively #oc new-project ns1 #oc get pod -n ns1 NAME READY STATUS RESTARTS AGE prometheus-example-app-7c887b8bb-kc6xh 1/1 Running 0 19s #oc new-project ns2 #oc get pod -n ns2 NAME READY STATUS RESTARTS AGE prometheus-example-app-7c887b8bb-kvbfw 1/1 Running 0 8m24s #oc policy add-role-to-user admin testuser-1 -n ns1 #oc policy add-role-to-user admin testuser-2 -n ns2 ------------------------------------------------------------------- #oc login -u testuser-1 -p secret Login successful. You have one project on this server: "ns1" ---------- #oc login -u testuser-2 -p secret Login successful. You have one project on this server: "ns2" -----get token of testuser-2 #toke=`oc whoami -t` #oc run curl --image=curlimages/curl --command -- sleep 3600 #oc rsh curl #curl -k -H "Authorization: Bearer $token https://prometheus-user-workload.openshift-user-workload-monitoring.svc.cluster.local:9091/api/v1/query?query=up The above curl command get nothing ------------------------------------------------------------------- login with cluster-admin #oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc.cluster.local:9091/api/v1/query?query=up' #oc -n openshift-user-workload-monitoring exec -c prometheus prometheus-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc.cluster.local:9091/api/v1/query?query=up' Both the above commands get message '404 page not found'
Update to comment 17:
#oc login -u testuser-2 -p secret
Login successful. You have one project on this server: "ns2"
-----get token of testuser-2
#token=`oc whoami -t`
#oc run curl --image=curlimages/curl --command -- sleep 3600
#oc rsh curl
#curl -k -H "Authorization: Bearer $token" https://prometheus-user-workload.openshift-user-workload-monitoring.svc.cluster.local:9091/api/v1/query?query=up
Get message "404 page not found"

This is expected according to the fix described in the doc text field: authenticated requests without cluster-wide permission on /metrics that try to query the /api/v1/query and /api/v1/query_range endpoints get a 404 status code.
Thank you for testing, but unfortunately the verification was still not enough... Yes, you checked if user1 cannot see metrics of application deployed by user2? However, how about whether user1 can see metrics of application deployed by *user1*? As far as we tested with OCP4.7-rc.0, user1 cannot see even metrics of application deployed by user1... $ ./oc -n openshift-user-workload-monitoring exec -c prometheus prometheus-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc:9091/metrics' Forbidden (user=user1, verb=get, resource=, subresource=) $ ./oc -n openshift-user-workload-monitoring exec -c prometheus prometheus-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc:9091/api/v1/query?query=up' 404 page not found It's a regression. You verified that the original bug was fixed, but you should verify whether other related functions still can work correctly or not.
Dear Simon, As I mentioned at Comment 19, it seems there is a regression. Could you check it?
testuser-2 can't see metrics of project ns2, even when given an admin/view role; the Forbidden message is expected, because the admin and view roles have no rule like this:
- nonResourceURLs:
  - /metrics
  verbs:
  - get

# oc get clusterroles view -oyaml|grep -i nonresource -A6
# oc get clusterroles admin -oyaml|grep -i nonresource -A6
# oc get clusterroles cluster-monitoring-operator -oyaml|grep -i nonresource -A6
- nonResourceURLs:
  - /metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:

#oc policy add-role-to-user admin testuser-2 -n ns2
$./oc -n openshift-user-workload-monitoring exec -c prometheus prometheus-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc:9091/metrics'
Forbidden (user=user1, verb=get, resource=, subresource=)

#oc adm policy add-cluster-role-to-user cluster-monitoring-operator testuser-2
oc -n openshift-user-workload-monitoring exec -c prometheus prometheus-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-user-workload.openshift-user-workload-monitoring.svc:9091/metrics'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.0286e-05
go_gc_duration_seconds{quantile="0.25"} 0.000156758
go_gc_duration_seconds{quantile="0.5"} 0.00018472
go_gc_duration_seconds{quantile="0.75"} 0.000216007
go_gc_duration_seconds{quantile="1"} 0.000617816
go_gc_duration_seconds_sum 0.013435326
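The role check in this comment (piping `oc get clusterroles ... -oyaml` into grep) can be wrapped into a small yes/no test. This is a sketch only: `grants_metrics_url` is a hypothetical helper, it assumes the YAML layout shown in this comment, and it checks only that `/metrics` is listed under `nonResourceURLs`, not which verbs are granted.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an oc subcommand): reads a
# ClusterRole in YAML on stdin and exits 0 if some rule lists /metrics
# under nonResourceURLs, 1 otherwise. Verbs are not inspected.
grants_metrics_url() {
    awk '
        /nonResourceURLs:/ { in_urls = 1; next }
        # A list entry "- /metrics" directly below nonResourceURLs.
        in_urls && /^[ \t]*-[ \t]*\/metrics[ \t]*$/ { found = 1 }
        # Leave the list on any non-entry line or on the next "key:" line.
        in_urls && (!/^[ \t]*-[ \t]/ || /:[ \t]*$/) { in_urls = 0 }
        END { exit (found ? 0 : 1) }
    '
}
```

A possible use, mirroring the commands above, would be `oc get clusterroles view -oyaml | grants_metrics_url && echo "grants /metrics"`, which according to this comment should print nothing for `view` and `admin`.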
We did verify all the related functions, and comment 9 includes the test steps. There is no regression here.
??? The regression I pointed out is that user became unable to see even metrics of application deployed by user itself. Where did you test this? > testuser-2 can't see metrics of project ns2, If this is right, this is a regression, isn't this? And I pointed out that 'go_gc_duration_seconds' is not a metric of your application. It is a metric of prometheus-k8s-0 itself. Why do you stick to check it? Non-admin user don't need to check a metric of prometheus-k8s-0 itself. They want to check a metric of their application. What kind of test you want to do? If user cannot see even metrics of application deployed by user itself, user-workload monitoring is no longer useful, isn't it?
(In reply to Masaki Hatada from comment #14)
> Hi Hongyan-san,
>
> Thank you for commenting.
>
> > It many be confusing for the part 'oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H', the pod here doesn't matter at all.
>
> Yes, the above is right.
>
> But I have another question.
>
> What the problem this bugzilla reported is that user1 can see metrics of
> application deployed by user2.
> But in your verification, only there is only one user(testuser-0). So how
> did you check if the original problem was fixed?

I created only one user because the user can't query data about prometheus-example-app even in its own project, that is, the project where the user has the admin/view role. It's unnecessary to create two users, because /api/v1/query is forbidden in any situation.
(In reply to Masaki Hatada from comment #23)
> ???
>
> The regression I pointed out is that user became unable to see even metrics
> of application deployed by user itself.
> Where did you test this?
>
> > testuser-2 can't see metrics of project ns2,
>
> If this is right, this is a regression, isn't this?
>
> And I pointed out that 'go_gc_duration_seconds' is not a metric of your
> application. It is a metric of prometheus-k8s-0 itself.
> Why do you stick to check it?
> Non-admin user don't need to check a metric of prometheus-k8s-0 itself. They
> want to check a metric of their application.
>
> What kind of test you want to do?
> If user cannot see even metrics of application deployed by user itself,
> user-workload monitoring is no longer useful, isn't it?

A user being unable to see even the metrics of an application deployed by that user is not a regression.
testuser-2 can't see metrics of project ns2 from prometheus-user-workload.
Where did you test this? You can only see metrics from thanos-querier.
With the following command you can get metrics of the user app. We have tested this part: a user can only query metrics of their own app and can't query metrics of other users' apps. This part has nothing to do with the bug.
#oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?query=query=up'
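Later test results in this thread pass a `namespace` parameter to the thanos-querier endpoint on port 9092 and get "Bad Request" without it. A minimal sketch of building such a URL follows; `thanos_query_url` is a hypothetical helper, the host and port are copied from the command above, and the query is inserted without URL encoding, so it is only suitable for simple queries such as `up`.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): prints the
# thanos-querier query URL for a simple PromQL query ($1) scoped to one
# namespace ($2). No URL encoding is applied to the query.
thanos_query_url() {
    query="$1"
    namespace="$2"
    if [ -z "$query" ] || [ -z "$namespace" ]; then
        echo "usage: thanos_query_url QUERY NAMESPACE" >&2
        return 1
    fi
    printf 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?query=%s&namespace=%s\n' \
        "$query" "$namespace"
}
```

The result could then be passed to curl in the same way as in this comment, for example `curl -k -H "Authorization: Bearer $token" "$(thanos_query_url up ns2)"`.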
(In reply to hongyan li from comment #26) > With the following command you can get metrics and user-app, and this part > we have tested a user can only query metrics about his app and can't query > metrics of other users' app. This part has nothing with the bug. > #oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k > -H "Authorization: Bearer $token" > 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/ > query?query=query=up' Hummm, It's a new information... Ok, we will do test. But, could you include this in your verification result? It's the most important information for this verification.
And, if your opinion(we need to access thanos-querier to get metrics of user applications) is right, https://access.redhat.com/solutions/5151831 should be updated. Who will update this?
Test results for thanos-querier

login with cluster-admin, create two projects and deploy prometheus-example-app respectively
#oc new-project ns1
#oc get pod -n ns1
NAME                                     READY   STATUS    RESTARTS   AGE
prometheus-example-app-7c887b8bb-kc6xh   1/1     Running   0          19s
#oc new-project ns2
#oc get pod -n ns2
NAME                                     READY   STATUS    RESTARTS   AGE
prometheus-example-app-7c887b8bb-kvbfw   1/1     Running   0          8m24s
#oc policy add-role-to-user admin testuser-1 -n ns1
#oc policy add-role-to-user admin testuser-2 -n ns2
-------------------------------------------------------------------
#oc login -u testuser-1 -p secret
Login successful. You have one project on this server: "ns1"
----------
#oc login -u testuser-2 -p secret
Login successful. You have one project on this server: "ns2"
-----get token of testuser-2
#token=`oc whoami -t`
-------------------------------------------------------------------
login with cluster-admin

$ oc -n openshift-user-workload-monitoring exec -c prometheus prometheus-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?query=up&namespace=ns2'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   363  100   363    0     0  12517      0 --:--:-- --:--:-- --:--:-- 12517
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"up","endpoint":"web","instance":"10.129.2.42:8080","job":"prometheus-example-app","namespace":"ns2","pod":"prometheus-example-app-7c887b8bb-kvbfw","prometheus":"openshift-user-workload-monitoring/user-workload","service":"prometheus-example-app"},"value":[1613628023.118,"1"]}]}}

$ oc -n openshift-user-workload-monitoring exec -c prometheus prometheus-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?query=up&namespace=ns1'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    67  100    67    0     0   2310      0 --:--:-- --:--:-- --:--:--  2310
Forbidden (user=testuser-2, verb=get, resource=pods, subresource=)

$ oc -n openshift-user-workload-monitoring exec -c prometheus prometheus-user-workload-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?query=up'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    56  100    56    0     0   1142      0 --:--:-- --:--:-- --:--:--  1142
Bad Request. The request or configuration is malformed.
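The response bodies seen across this thread fall into a few recognizable cases. As a rough aid for scripting such checks, the sketch below maps a response body to a short label; `classify_response` and its labels are hypothetical, and the patterns are simply the literal messages quoted in these comments.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): prints a short
# label for a response body ($1), based on the messages seen in this thread.
classify_response() {
    case "$1" in
        *'"status":"success"'*) echo "ok" ;;          # JSON query result
        *'404 page not found'*) echo "not-found" ;;   # query API not served on this port
        Forbidden*)             echo "forbidden" ;;   # token lacks the needed permission
        *'Bad Request'*)        echo "bad-request" ;; # e.g. namespace parameter missing
        *)                      echo "unknown" ;;
    esac
}
```

With this, the three thanos-querier calls above would be labeled `ok` (namespace=ns2), `forbidden` (namespace=ns1) and `bad-request` (no namespace parameter).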
(In reply to Masaki Hatada from comment #28)
> And, if your opinion(we need to access thanos-querier to get metrics of user
> applications) is right, https://access.redhat.com/solutions/5151831 should
> be updated.
> Who will update this?

I don't know who should update this. I am asking in the Slack channel and will post here if I get an answer.
(In reply to Masaki Hatada from comment #28) > And, if your opinion(we need to access thanos-querier to get metrics of user > applications) is right, https://access.redhat.com/solutions/5151831 should > be updated. > Who will update this? Yes I confirm that to access user metrics, you need to go through the Thanos querier API. The prometheus-user-workload.openshift-user-workload-monitoring.svc service only exists to expose Prometheus internal metrics. I'll follow up the person that wrote the article to remove the confusion. Thanks for your patience.
Thank you for updating and testing multiple times. We got the same result as Comment 29. Our concerns are gone. $ ./oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?query=up&namespace=ns1' {"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"up","endpoint":"web","instance":"10.130.0.5:8080","job":"prometheus-example-app","namespace":"ns1","pod":"prometheus-example-app-7f8f8b8d4-cqcxt","prometheus":"openshift-user-workload-monitoring/user-workload","service":"prometheus-example-app"},"value":[1613630574.467,"1"]}]}} $ ./oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9092/api/v1/query?query=up&namespace=ns2' Forbidden (user=user1, verb=get, resource=pods, subresource=) > I'll follow up the person that wrote the article to remove the confusion. We are looking forward to the update of this.
Hi, I will update the article to show that the query has to be done using the thanos-querier service or its exposed route. Regards, German.
@German and @Masaki I find I have access and have drafted a solution, I don't know the update process, you can delete it if it doesn't make sense. https://access.redhat.com/solutions/5151831/moderation
@Hongyan, thanks a lot. I cannot see your former link. I will contact you in irc or slack. regards, German.
Thanks a lot to Hongyan. The article has been modified with the right information.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633