Description of problem:
The metrics endpoint for the Scheduler is not protected by RBAC.

Version-Release number of selected component (if applicable):
OCP 4.5

How reproducible:
Consistently

Steps to Reproduce:
oc project openshift-kube-scheduler
POD=$(oc get pods -l app=openshift-kube-scheduler -o jsonpath='{.items[0].metadata.name}')
PORT=$(oc get pod $POD -o jsonpath='{.spec.containers[0].livenessProbe.httpGet.port}')

# Should return 403 Forbidden
oc rsh ${POD} curl https://localhost:${PORT}/metrics -k

# Create a service account to test RBAC
oc create sa permission-test-sa

# Should return 403 Forbidden
SA_TOKEN=$(oc sa get-token permission-test-sa)
oc rsh ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $SA_TOKEN" -k

# As cluster admin, should succeed
CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
oc rsh ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k

# Cleanup
oc delete sa permission-test-sa

Actual results:
Metrics are returned. See private attachment.

Expected results:
403 Forbidden

Additional info:
This test maps to CIS Kube item 1.4.1. OCP 4.5 fails this test.
Credit to Khaled Janania for finding this.
By contrast, attempting the same thing for the Controller Manager returns 403 Forbidden.

oc project openshift-kube-controller-manager
POD=$(oc get pods -n openshift-kube-controller-manager -l app=kube-controller-manager -o jsonpath='{.items[0].metadata.name}')
PORT=$(oc get pods -n openshift-kube-controller-manager -l app=kube-controller-manager -o jsonpath='{.items[0].spec.containers[0].ports[0].hostPort}')

# Should return 403 Forbidden
oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -k

# Create a service account to test RBAC
oc create -n openshift-kube-controller-manager sa permission-test-sa

# Should return 403 Forbidden
SA_TOKEN=$(oc sa -n openshift-kube-controller-manager get-token permission-test-sa)
oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $SA_TOKEN" -k

# As cluster admin, should succeed
CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k

# Cleanup
oc delete -n openshift-kube-controller-manager sa permission-test-sa
Created attachment 1722676 [details] controller-sucess-output
Created attachment 1722678 [details] controller-unauthorized-output
Need more time to properly evaluate the solution for the issue
I see from the following that the scheme is set to HTTP for livenessProbe and readinessProbe for the scheduler.

oc -n openshift-kube-scheduler get cm kube-scheduler-pod -o json | jq -r '.data."pod.yaml"' | jq '.spec.containers'

"livenessProbe": {
  "httpGet": {
    "path": "healthz",
    "port": 10251,
    "scheme": "HTTP"
  },
  "initialDelaySeconds": 45
},
"readinessProbe": {
  "httpGet": {
    "path": "healthz",
    "port": 10251,
    "scheme": "HTTP"

The scheme for these for Controller manager is set to HTTPS.

oc -n openshift-kube-controller-manager get cm kube-controller-manager-pod -o json | jq -r '.data."pod.yaml"' | jq '.spec.containers'

"livenessProbe": {
  "httpGet": {
    "path": "healthz",
    "port": 10357,
    "scheme": "HTTPS"
  },
  "initialDelaySeconds": 45,
  "timeoutSeconds": 10
},
"readinessProbe": {
  "httpGet": {
    "path": "healthz",
    "port": 10357,
    "scheme": "HTTPS"
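To compare the two components without reading the full container spec, the probe schemes can be pulled out directly. This is a minimal sketch, assuming jq is available; `probe_schemes` is a hypothetical helper name, and it expects the pod manifest JSON (the `pod.yaml` key of the configmap) on stdin.

```shell
#!/usr/bin/env bash
# Sketch only: probe_schemes is a hypothetical helper, not part of oc.
# Reads a pod manifest as JSON on stdin and prints one line per container
# with the scheme of its liveness and readiness probes ("none" if unset).
probe_schemes() {
  jq -r '.spec.containers[]
         | "\(.name) liveness=\(.livenessProbe.httpGet.scheme // "none") readiness=\(.readinessProbe.httpGet.scheme // "none")"'
}

# Usage against a live cluster (same configmaps as queried above):
#   oc -n openshift-kube-scheduler get cm kube-scheduler-pod -o json \
#     | jq -r '.data."pod.yaml"' | probe_schemes
#   oc -n openshift-kube-controller-manager get cm kube-controller-manager-pod -o json \
#     | jq -r '.data."pod.yaml"' | probe_schemes
```

A container that prints "none" has no explicit scheme in its manifest; Kubernetes treats an omitted scheme as HTTP.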
The /metrics endpoint is registered for both HTTP and HTTPS. Checking the HTTPS endpoint:

```
POD=$(oc get pods -n openshift-kube-scheduler -l app=openshift-kube-scheduler -o jsonpath='{.items[0].metadata.name}')
PORT=$(oc get pods -n openshift-kube-scheduler -l app=openshift-kube-scheduler -o jsonpath='{.items[0].spec.containers[0].ports[0].hostPort}')
CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
oc rsh -n openshift-kube-scheduler ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k
```

I am able to get the metrics. Prometheus collects the metrics over HTTPS, so it is safe to disable HTTP unless Prometheus needs anonymous access. kube-controller-manager, on the other hand, does not provide metrics over HTTP, so the insecure bits in kube-scheduler need to be disabled. However, their initialization is hardcoded: https://github.com/openshift/kubernetes/blob/36083e429212a2e46c7243942748a258eb714b61/cmd/kube-scheduler/app/options/options.go#L92-L101. I did not find a way to disable registration of the insecure bits through options or component config. Code changes are required.
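One way to confirm whether the insecure listener is still active is to request /metrics over plain HTTP and look only at the status code. This is a sketch, assuming the insecure port is 10251 as shown in the probe configuration earlier in this thread and that curl is present in the scheduler container; `insecure_metrics_status` and `classify_insecure` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; 10251 is the insecure
# scheduler port seen in the 4.5 probe configuration.

# Print the HTTP status code returned by the plain-HTTP /metrics endpoint.
# curl prints 000 when no HTTP response was received (e.g. port closed).
insecure_metrics_status() {
  local pod=$1 port=${2:-10251}
  oc rsh -n openshift-kube-scheduler "$pod" \
    curl "http://localhost:${port}/metrics" -s -o /dev/null -w '%{http_code}'
}

# Turn a status code into a short verdict.
classify_insecure() {
  case $1 in
    000) echo "closed: no HTTP response on the insecure port" ;;
    200) echo "open: metrics served without authentication" ;;
    *)   echo "responding: HTTP $1" ;;
  esac
}

# Usage:
#   classify_insecure "$(insecure_metrics_status "$POD")"
```

Once insecure serving is disabled, the expected verdict would be "closed"; on an affected cluster it would presumably be "open".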
More time is needed to discuss the right way to fix this upstream first.
https://github.com/kubernetes/kubernetes/pull/96345 will not make it for 1.20. We can still pick it up as a partial upstream PR and make sure the changes are properly tested in our environment. The PR passes all upstream tests except for the clusterloader2 test, which still collects the metrics through the insecure port.
Upstream PR is still getting reviews, it will take a bit more time to have it merged.
Upstream PR is still under review.
We disabled insecure serving in the operator. The remaining work is to ensure kube-scheduler does not serve on the insecure port at all, which is happening in k8s 1.22.
The upstream kubernetes/kubernetes PR merged.
https://github.com/openshift/cluster-kube-scheduler-operator/pull/316 was merged in January. Moving to MODIFIED to allow QE to test it.
Hi, Just floating back to see if there is any new information. Do we have any update on this? Thank you.
Still waiting for the rebase
https://github.com/openshift/kubernetes/pull/1087 merged
Verified with the build below and I see that the metrics endpoint for kube-scheduler is protected by RBAC. Below are the steps I followed to verify on a 4.10 cluster.

# oc project openshift-kube-scheduler
# POD=$(oc get pods -l app=openshift-kube-scheduler -o jsonpath='{.items[0].metadata.name}')
# PORT=$(oc get pod $POD -o jsonpath='{.spec.containers[0].livenessProbe.httpGet.port}')
# oc rsh ${POD} curl https://localhost:${PORT}/metrics -k
Returns forbidden error

# oc create sa permission-test-sa
# SA_TOKEN=$(oc sa get-token permission-test-sa)
# oc rsh ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $SA_TOKEN" -k
Returns forbidden error

# CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
# oc rsh ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k
Returns metrics

#cleanup
oc delete sa permission-test-sa

I see from the following that the scheme is set to HTTPS for livenessProbe and readinessProbe for the scheduler.

oc -n openshift-kube-scheduler get cm kube-scheduler-pod -o json | jq -r '.data."pod.yaml"' | jq '.spec.containers'

"livenessProbe": {
  "httpGet": {
    "path": "healthz",
    "port": 10259,
    "scheme": "HTTPS"
  },
  "initialDelaySeconds": 45
},
"readinessProbe": {
  "httpGet": {
    "path": "healthz",
    "port": 10259,
    "scheme": "HTTPS"
  },
  "initialDelaySeconds": 45

Tried a similar check for KCM as well and I see that it works fine:
===================================================================
# oc project openshift-kube-controller-manager
# POD=$(oc get pods -n openshift-kube-controller-manager -l app=kube-controller-manager -o jsonpath='{.items[0].metadata.name}')
# PORT=$(oc get pods -n openshift-kube-controller-manager -l app=kube-controller-manager -o jsonpath='{.items[0].spec.containers[0].ports[0].hostPort}')
# oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -k
Returns forbidden error

# oc create -n openshift-kube-controller-manager sa permission-test-sa
# SA_TOKEN=$(oc sa -n openshift-kube-controller-manager get-token permission-test-sa)
# oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $SA_TOKEN" -k
Returns 403 forbidden error

# CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
# oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k
Returns metrics

Scheme for kube-controller-manager is set to HTTPS:
===================================================
oc -n openshift-kube-controller-manager get cm kube-controller-manager-pod -o json | jq -r '.data."pod.yaml"' | jq '.spec.containers'

"livenessProbe": {
  "httpGet": {
    "path": "healthz",
    "port": 10357,
    "scheme": "HTTPS"
  },
  "initialDelaySeconds": 45,
  "timeoutSeconds": 10
},
"readinessProbe": {
  "httpGet": {
    "path": "healthz",
    "port": 10357,
    "scheme": "HTTPS"
  },
  "initialDelaySeconds": 10,
  "timeoutSeconds": 10

Tried to reproduce the same on a 4.5 cluster and I see that kube-scheduler always returned metrics, while kube-controller-manager works as expected. Also, the scheme for kube-scheduler is set to HTTP and the scheme for kube-controller-manager is set to HTTPS.

Based on the above, moving the bug to verified state.
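The three scheduler checks above can be scripted so that each identity is compared against its expected status code instead of reading the curl output by eye. This is a minimal sketch: `check_matrix` and `fetch_scheduler` are hypothetical helper names, and `POD`, `PORT`, `SA_TOKEN`, and `CLUSTER_ADMIN_TOKEN` are assumed to be set as in the verification steps.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Print the status code of the scheduler /metrics endpoint for one identity:
# anonymous, service-account, or cluster-admin.
fetch_scheduler() {
  local token=""
  case $1 in
    service-account) token=$SA_TOKEN ;;
    cluster-admin)   token=$CLUSTER_ADMIN_TOKEN ;;
  esac
  if [ -n "$token" ]; then
    oc rsh -n openshift-kube-scheduler "$POD" \
      curl "https://localhost:${PORT}/metrics" -k -s -o /dev/null \
      -w '%{http_code}' -H "Authorization: Bearer ${token}"
  else
    oc rsh -n openshift-kube-scheduler "$POD" \
      curl "https://localhost:${PORT}/metrics" -k -s -o /dev/null \
      -w '%{http_code}'
  fi
}

# check_matrix FETCH: call "FETCH <identity>" for each identity and compare
# the printed status code with the expected one. Returns non-zero on any
# mismatch.
check_matrix() {
  local fetch=$1 failures=0 who expected actual
  while read -r who expected; do
    # stdin is redirected so the fetch command cannot swallow the table below
    actual=$("$fetch" "$who" < /dev/null)
    if [ "$actual" = "$expected" ]; then
      echo "PASS ${who}: ${actual}"
    else
      echo "FAIL ${who}: expected ${expected}, got ${actual}"
      failures=$((failures + 1))
    fi
  done <<'EOF'
anonymous 403
service-account 403
cluster-admin 200
EOF
  [ "$failures" -eq 0 ]
}

# Usage:
#   check_matrix fetch_scheduler
```

Passing the fetch function as an argument keeps the comparison logic independent of the cluster, so the same table could be reused for kube-controller-manager with a second fetch function.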
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056