Bug 1889488 - The metrics endpoint for the Scheduler is not protected by RBAC
Summary: The metrics endpoint for the Scheduler is not protected by RBAC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
Docs Contact: Michael Burke
URL:
Whiteboard:
Depends On:
Blocks: 2008733
 
Reported: 2020-10-19 18:45 UTC by Kirsten Newcomer
Modified: 2022-03-10 16:02 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: Removed functionality
Doc Text:
In {product-title} 4.10, code for serving insecure metrics is removed from the `kube-scheduler` code base. Metrics are now served only through a secure server. Bug fixes and support are provided through the end of a future life cycle, after which no new feature enhancements are made.
Clone Of:
Environment:
Last Closed: 2022-03-10 16:02:33 UTC
Target Upstream Version:


Attachments
controller-unauthorized-output (175 bytes, text/plain)
2020-10-19 19:08 UTC, Kirsten Newcomer


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubernetes kubernetes pull 96345 | 0 | None | Merged | refactor: disable insecure serving in kube-scheduler | 2022-01-27 14:06:21 UTC
Github | openshift cluster-kube-scheduler-operator pull 316 | 0 | None | Merged | bug 1889488: Have probes listen to secure ports | 2022-01-27 14:06:17 UTC
Github | openshift kubernetes pull 1087 | 0 | None | Merged | Bug 2033751: Kube 1.23.0 rebase | 2022-01-27 14:06:14 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:02:50 UTC

Internal Links: 1897630

Description Kirsten Newcomer 2020-10-19 18:45:38 UTC
Description of problem:
The metrics endpoint for the Scheduler is not protected by RBAC

Version-Release number of selected component (if applicable):
OCP 4.5

How reproducible:
Consistently

Steps to Reproduce:
oc project openshift-kube-scheduler
POD=$(oc get pods -l app=openshift-kube-scheduler -o jsonpath='{.items[0].metadata.name}')
PORT=$(oc get pod $POD -o jsonpath='{.spec.containers[0].livenessProbe.httpGet.port}')
# Should return 403 Forbidden
oc rsh ${POD} curl https://localhost:${PORT}/metrics -k
 
# Create a service account to test RBAC
oc create sa permission-test-sa
 
# Should return 403 Forbidden
SA_TOKEN=$(oc sa get-token permission-test-sa)
oc rsh ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $SA_TOKEN" -k
 
# As cluster admin, should succeed
CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
oc rsh ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k
 
# Cleanup
oc delete sa permission-test-sa
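
The steps above print the whole response body, which makes a 403 hard to tell apart from a metrics dump at a glance. A minimal sketch that reports only the HTTP status code, assuming POD and PORT are set as in the steps above. The function names metrics_status and check_status are hypothetical helpers, not part of oc:

```shell
# Print only the HTTP status code of a /metrics request made from inside the pod.
# $1 (optional) is a bearer token; POD and PORT must already be set.
metrics_status() {
  if [ -n "${1:-}" ]; then
    oc rsh "${POD}" curl "https://localhost:${PORT}/metrics" -k -s -o /dev/null \
      -w '%{http_code}' -H "Authorization: Bearer $1" | tr -d '\r'
  else
    oc rsh "${POD}" curl "https://localhost:${PORT}/metrics" -k -s -o /dev/null \
      -w '%{http_code}' | tr -d '\r'
  fi
}

# Compare an observed status code with the expected one.
# $1 = label, $2 = expected code, $3 = observed code
check_status() {
  if [ "$3" = "$2" ]; then
    echo "PASS $1: $3"
  else
    echo "FAIL $1: expected $2, got $3"
    return 1
  fi
}
```

For example, check_status anonymous 403 "$(metrics_status)" covers the unauthenticated case, and check_status service-account 403 "$(metrics_status "$SA_TOKEN")" covers the service account case.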


Actual results:
Metrics are returned. See private attachment.


Expected results:
403 Forbidden


Additional info:
This test maps to CIS Kube item 1.4.1. OCP 4.5 fails this test.
Credit to Khaled Janania for finding this.

Comment 2 Kirsten Newcomer 2020-10-19 18:55:21 UTC
By contrast, attempting the same thing for the Controller Manager returns 403 Forbidden.

oc project openshift-kube-controller-manager
POD=$(oc get pods -n openshift-kube-controller-manager -l app=kube-controller-manager -o jsonpath='{.items[0].metadata.name}')
PORT=$(oc get pods -n openshift-kube-controller-manager -l app=kube-controller-manager -o jsonpath='{.items[0].spec.containers[0].ports[0].hostPort}')

# Should return 403 Forbidden
oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -k
 
# Create a service account to test RBAC
oc create -n openshift-kube-controller-manager sa permission-test-sa
 
# Should return 403 Forbidden
SA_TOKEN=$(oc sa -n openshift-kube-controller-manager get-token permission-test-sa)
oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $SA_TOKEN" -k
 
# As cluster admin, should succeed
CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k
 
# Cleanup
oc delete -n openshift-kube-controller-manager sa permission-test-sa

Comment 3 Kirsten Newcomer 2020-10-19 19:05:08 UTC
Created attachment 1722676
controller-sucess-output

Comment 4 Kirsten Newcomer 2020-10-19 19:08:04 UTC
Created attachment 1722678
controller-unauthorized-output

Comment 10 Jan Chaloupka 2020-10-23 11:07:26 UTC
Need more time to properly evaluate the solution for the issue

Comment 12 Kirsten Newcomer 2020-10-23 20:44:41 UTC
I see from the following that the scheme is set to HTTP for livenessProbe and readinessProbe for the scheduler. 

oc -n openshift-kube-scheduler get cm kube-scheduler-pod -o json | jq -r '.data."pod.yaml"' | jq '.spec.containers'


 "livenessProbe": {
      "httpGet": {
        "path": "healthz",
        "port": 10251,
        "scheme": "HTTP"
      },
      "initialDelaySeconds": 45
    },
    "readinessProbe": {
      "httpGet": {
        "path": "healthz",
        "port": 10251,
        "scheme": "HTTP"

The scheme for these for Controller manager is set to HTTPS.

oc -n openshift-kube-controller-manager get cm kube-controller-manager-pod -o json | jq -r '.data."pod.yaml"' | jq '.spec.containers'

    "livenessProbe": {
      "httpGet": {
        "path": "healthz",
        "port": 10357,
        "scheme": "HTTPS"
      },
      "initialDelaySeconds": 45,
      "timeoutSeconds": 10
    },
    "readinessProbe": {
      "httpGet": {
        "path": "healthz",
        "port": 10357,
        "scheme": "HTTPS"

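The comparison above can be condensed to one line per probe. This is only a sketch: probe_schemes and flag_insecure_probes are hypothetical helper names, and it assumes the pod.yaml key holds JSON, as the jq pipeline above implies.

```shell
# Read a pod manifest as JSON on stdin and print
# "<container> <probe> <scheme> <port>" for every probe that is defined.
probe_schemes() {
  jq -r '.spec.containers[]
    | .name as $n
    | (["livenessProbe", .livenessProbe], ["readinessProbe", .readinessProbe])
    | select(.[1] != null)
    | "\($n) \(.[0]) \(.[1].httpGet.scheme) \(.[1].httpGet.port)"'
}

# Read the lines printed by probe_schemes and mark every probe whose
# scheme is not HTTPS. Returns 1 if at least one such probe was seen.
flag_insecure_probes() {
  insecure_found=0
  while read -r container probe scheme port; do
    if [ "$scheme" = "HTTPS" ]; then
      echo "ok ${container} ${probe} ${scheme}:${port}"
    else
      echo "insecure ${container} ${probe} ${scheme}:${port}"
      insecure_found=1
    fi
  done
  return "$insecure_found"
}
```

Usage would be along the lines of: oc -n openshift-kube-scheduler get cm kube-scheduler-pod -o json | jq -r '.data."pod.yaml"' | probe_schemes | flag_insecure_probes
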
Comment 16 Jan Chaloupka 2020-11-05 13:31:43 UTC
The /metrics endpoint is registered for both http and https.

Checking https endpoint:

```
POD=$(oc get pods -n openshift-kube-scheduler -l app=openshift-kube-scheduler -o jsonpath='{.items[0].metadata.name}')
PORT=$(oc get pods -n openshift-kube-scheduler -l app=openshift-kube-scheduler -o jsonpath='{.items[0].spec.containers[0].ports[0].hostPort}')
CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
oc rsh -n openshift-kube-scheduler ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k
```

I am able to get the metrics.

Prometheus collects the metrics through https, so it is safe to disable http unless Prometheus needs anonymous access. On the other hand, kube-controller-manager does not provide metrics through http.

So the insecure bits need to be disabled. However, their initialization is hardcoded: https://github.com/openshift/kubernetes/blob/36083e429212a2e46c7243942748a258eb714b61/cmd/kube-scheduler/app/options/options.go#L92-L101. I did not find a way to disable registering the insecure bits through options or component config. Code changes are required.
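
A quick way to confirm whether the http side is still serving is to request /metrics over plain HTTP on the insecure port (10251 in the probe output from comment 12). A rough sketch follows. The helper names are hypothetical, and it assumes `oc rsh` passes through the exit code of the remote curl, which I have not verified on every version:

```shell
# Interpret curl's exit code for a plain-HTTP request to the insecure port.
# curl exits 0 when it received an HTTP response and 7 when it could not connect.
insecure_port_verdict() {
  case "$1" in
    0) echo "insecure port is serving" ;;
    7) echo "insecure port is closed" ;;
    *) echo "inconclusive (curl exit code $1)" ;;
  esac
}

# $1 = scheduler pod name, $2 = insecure port (assumed 10251)
check_insecure_port() {
  curl_rc=0
  oc rsh -n openshift-kube-scheduler "$1" \
    curl "http://localhost:$2/metrics" -s -o /dev/null || curl_rc=$?
  insecure_port_verdict "$curl_rc"
}
```

With the POD variable from above, check_insecure_port "${POD}" 10251 should report the port as closed once insecure serving is disabled.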

Comment 17 Jan Chaloupka 2020-11-13 12:07:22 UTC
More time is needed to discuss the right way to fix this upstream first.

Comment 18 Jan Chaloupka 2020-11-18 14:19:59 UTC
https://github.com/kubernetes/kubernetes/pull/96345 will not make it for 1.20. However, we can still pick the PR as a partial upstream PR and make sure the changes are properly tested in our environment. The PR passes all upstream tests except the clusterloader2 test, which still collects the metrics through the insecure port.

Comment 19 Jan Chaloupka 2021-01-15 11:10:32 UTC
The upstream PR is still getting reviews; it will take a bit more time to have it merged.

Comment 23 Jan Chaloupka 2021-05-20 14:08:05 UTC
Upstream PR is still under review.

Comment 24 Maciej Szulik 2021-06-08 12:03:47 UTC
We disabled insecure serving in the operator. The remaining work is to ensure kube-scheduler does not serve insecurely at all, which is happening in k8s 1.22.

Comment 28 Jan Chaloupka 2021-09-15 10:34:44 UTC
The upstream kubernetes/kubernetes PR merged.

Comment 29 Jan Chaloupka 2021-09-15 10:37:02 UTC
https://github.com/openshift/cluster-kube-scheduler-operator/pull/316 was merged in January. Moving to MODIFIED to allow QE to test it.

Comment 32 Nirali Dabhi 2021-11-12 15:19:01 UTC
Hi,

Just floating back to see if there is any new information.  Do we have any update on this?

Thank you.

Comment 33 Jan Chaloupka 2021-11-25 15:27:12 UTC
Still waiting for the rebase

Comment 39 Jan Chaloupka 2022-01-20 22:12:29 UTC
https://github.com/openshift/kubernetes/pull/1087 merged

Comment 41 RamaKasturi 2022-01-24 09:57:17 UTC
Verified with the build below, and I see that the metrics endpoint for kube-scheduler is protected by RBAC. Below are the steps I followed to verify on a 4.10 cluster.

# oc project openshift-kube-scheduler

# POD=$(oc get pods -l app=openshift-kube-scheduler -o jsonpath='{.items[0].metadata.name}')
# PORT=$(oc get pod $POD -o jsonpath='{.spec.containers[0].livenessProbe.httpGet.port}')
# oc rsh ${POD} curl https://localhost:${PORT}/metrics -k
Returns forbidden error
# oc create sa permission-test-sa
# SA_TOKEN=$(oc sa get-token permission-test-sa)
# oc rsh ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $SA_TOKEN" -k
Returns forbidden error
# CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
# oc rsh ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k
Returns metrics

#cleanup
oc delete sa permission-test-sa


I see from the following that the scheme is set to HTTPS for livenessProbe and readinessProbe for the scheduler

oc -n openshift-kube-scheduler get cm kube-scheduler-pod -o json | jq -r '.data."pod.yaml"' | jq '.spec.containers'

"livenessProbe": {
      "httpGet": {
        "path": "healthz",
        "port": 10259,
        "scheme": "HTTPS"
      },
      "initialDelaySeconds": 45
    },
    "readinessProbe": {
      "httpGet": {
        "path": "healthz",
        "port": 10259,
        "scheme": "HTTPS"
      },
      "initialDelaySeconds": 45


Tried a similar check for KCM as well, and I see that it works fine:
===================================================================
# oc project openshift-kube-controller-manager
# POD=$(oc get pods -n openshift-kube-controller-manager -l app=kube-controller-manager -o jsonpath='{.items[0].metadata.name}')
# PORT=$(oc get pods -n openshift-kube-controller-manager -l app=kube-controller-manager -o jsonpath='{.items[0].spec.containers[0].ports[0].hostPort}')
# oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -k
Returns forbidden error
# oc create -n openshift-kube-controller-manager sa permission-test-sa
# SA_TOKEN=$(oc sa -n openshift-kube-controller-manager get-token permission-test-sa)
# oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $SA_TOKEN" -k
Returns 403 forbidden error
# CLUSTER_ADMIN_TOKEN=$(oc whoami -t)
# oc rsh -n openshift-kube-controller-manager ${POD} curl https://localhost:${PORT}/metrics -H "Authorization: Bearer $CLUSTER_ADMIN_TOKEN" -k
Returns metrics

scheme for kube-controller-manager is set to https:
===========================================
oc -n openshift-kube-controller-manager get cm kube-controller-manager-pod -o json | jq -r '.data."pod.yaml"' | jq '.spec.containers'

"livenessProbe": {
      "httpGet": {
        "path": "healthz",
        "port": 10357,
        "scheme": "HTTPS"
      },
      "initialDelaySeconds": 45,
      "timeoutSeconds": 10
    },
    "readinessProbe": {
      "httpGet": {
        "path": "healthz",
        "port": 10357,
        "scheme": "HTTPS"
      },
      "initialDelaySeconds": 10,
      "timeoutSeconds": 10


Tried to reproduce the same with a 4.5 cluster, and I see that kube-scheduler was always returning metrics while kube-controller-manager works as expected. Also, the scheme for kube-scheduler is set to HTTP and for kube-controller-manager to HTTPS.

Based on the above, moving the bug to verified state.

Comment 44 errata-xmlrpc 2022-03-10 16:02:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

