Description of problem:

We found that, for some reason, the golang debug net/http/pprof endpoint is exposed on the master nodes of an OCP cluster. For example, get the IP of a node in OCP and you can then query the debug path:

    curl <node ip>:10251/debug/pprof/goroutine?debug=1

This can potentially be called from arbitrary points in the cluster, although that depends on the environment and on whether worker pods can reach the masters.

After listing the listening ports and their processes on a node, we believe we have narrowed it down to the `kube-scheduler` process:

https://github.com/kubernetes/kubernetes/blob/ea0764452222146c47ec826977f49d7001b0ea8c/staging/src/k8s.io/apiserver/pkg/server/routes/profiling.go#L30

It is enabled by default here:

https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/apis/config/v1beta1/defaults.go#L168

We believe it only affects OCP, because minikube does not behave the same way in our testing:

    $ curl -k https://localhost:10259/debug/pprof/goroutine
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {
      },
      "status": "Failure",
      "message": "forbidden: User \"system:anonymous\" cannot get path \"/debug/pprof/goroutine\"",
      "reason": "Forbidden",
      "details": {
      },
      "code": 403
    }

I'm assuming that endpoint is protected via kube-rbac or something similar. This looks like a bug similar to the one affecting the kubelet, https://github.com/kubernetes/kubernetes/issues/81023, and to what is described in the article https://mmcloughlin.com/posts/your-pprof-is-showing

Expected results:

Although you still need local access to the cluster first, and the endpoint is of limited use, we feel it should be closed down if possible, or protected by something like RBAC, to limit information gathering. We would expect behavior similar to k8s.
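For anyone who wants to repeat the check across several nodes, the curl calls above could be scripted roughly as follows. This is only a sketch using the Python standard library; the function names `classify_status` and `probe_pprof` and the verdict strings are made up for illustration, and the mapping of status codes to verdicts is an assumption based on the two responses shown in this report (a plain 200 on the insecure port, a 403 on the secured one).

```python
import ssl
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code to a rough verdict (hypothetical labels)."""
    if status == 200:
        return "exposed"      # profile data served without authentication
    if status in (401, 403):
        return "protected"    # e.g. the 403 Forbidden seen on minikube
    return "unknown"


def probe_pprof(base_url, timeout=5.0):
    """Request <base_url>/debug/pprof/goroutine?debug=1 and classify the reply."""
    url = base_url.rstrip("/") + "/debug/pprof/goroutine?debug=1"
    # Certificate verification is disabled to mirror `curl -k`.
    # Assumption: this is acceptable only for a manual, one-off check.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies arrive as exceptions; the code is still useful.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return "unreachable"
```

It would be called as `probe_pprof("http://<node ip>:10251")` for the insecure port or `probe_pprof("https://<node ip>:10259")` for the secure one, with the node IP filled in by hand.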
https://github.com/kubernetes/kubernetes/pull/96345/files#diff-d2ca723d710873ea0fa67bb8b79cbe4f6921355fb07368c842819499051c7c53L36 removes insecure bits from the kube-scheduler. We just need to wait until the change gets into 4.10 through a rebase. Once done, we can backport it to 4.9 as well. See https://bugzilla.redhat.com/show_bug.cgi?id=1889488 for more details.
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
As the rule here is to first make the changes to the master branch and only then backport, I am still waiting for the next 4.10 rebase. Once done, I will backport the necessary changes to 4.9.
The LifecycleStale keyword was removed because the needinfo? flag was reset. The bug assignee was notified.
I am waiting for the v1.23 kubernetes rebase in https://github.com/openshift/kubernetes/pull/1087.
The insecure port 10251 got removed.
The LifecycleStale keyword was removed because the bug moved to QE. The bug assignee was notified.
Tested with the build below. kube-scheduler no longer serves on port 10251; it now serves on the secure port 10259.

    [knarra@knarra ~]$ oc get clusterversion
    NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.10.0-rc.0   True        False         106m    Cluster version is 4.10.0-rc.0

    name: kube-scheduler
    ports:
    - containerPort: 10259
      hostPort: 10259
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: healthz
        port: 10259
        scheme: HTTPS

I do still see port 10251 in the output of `oc get pod <pod_name> -n openshift-kube-scheduler`, but when I checked with dev, this is what they said:

- It is only a simple bash check. The 10251 is no longer used anywhere. I will remove it from the spec in 4.11. No need for a bug report.
- A PR to remove it in 4.11 has already been filed: https://github.com/openshift/cluster-kube-scheduler-operator/pull/41

Based on the above, moving the bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056