Description of problem:
Metrics endpoint for monitor-sdn is not using TLS to encrypt traffic.

Version-Release number of selected component (if applicable):
4.4 (possibly also earlier versions)

How reproducible:
Always

Steps to Reproduce:
1. Start a cluster
2. Go to the Prometheus UI
3. Check the connection scheme for this component

Actual results:
Metrics are exposed over an HTTP connection

Expected results:
Metrics are exposed over an HTTPS connection

Additional info:
The API server operator ServiceMonitor definition can be used as a template for how to fix this issue:
https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/manifests/0000_90_openshift-apiserver-operator_03_servicemonitor.yaml
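For illustration, a minimal ServiceMonitor sketch modeled on the linked openshift-apiserver-operator manifest; the name, namespace, port, serverName, and label selector below are placeholders, not the actual SDN manifests:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitor-sdn                # placeholder name
  namespace: openshift-sdn
spec:
  endpoints:
  - interval: 30s
    port: metrics                  # placeholder port name
    scheme: https                  # scrape over TLS instead of plain HTTP
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      serverName: sdn.openshift-sdn.svc  # must match the serving cert's SAN
  selector:
    matchLabels:
      app: sdn                     # placeholder selector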
Can you check this please? Based on what we saw on the same bug filed against the router, this may be a spurious report.
To be precise, this bug is about openshift-sdn/monitor-sdn component.
Created attachment 1667571 [details] Screenshot for openshift-sdn/monitor-sdn component
Reproduced the issue in 4.4.0-0.nightly-2020-03-02-201804: the openshift-sdn/monitor-sdn component is using HTTP, not HTTPS, to expose metrics, e.g. http://10.0.129.14:9101/metrics
Hi zhanqi, do we need to write a test case for this kind of UI testing? Thanks, Weibin
Jake, can you take this, please? Ricky has a lot on his plate.
Ben, I'm moving this to 4.5. This is *not* a regression and, IMHO, not anywhere close to being the worst of our problems; after all, if you have a MITM here, false kube-proxy metrics are the smallest of your worries.

There's a fair amount of work here: CNO is creating all the ServiceMonitors over plain HTTP, but the main issue is that the servers themselves are hardcoded for HTTP.

Regarding openshift-sdn, which is probably our biggest concern at the moment, the metrics server comes from vendored Kubernetes code that is hardcoded to use plain HTTP. The only way to get this merged into 4.4.0 on time would be to carry a temporary patch in our local branch while we get the fix merged upstream: https://github.com/kubernetes/kubernetes/blob/release-1.18/cmd/kube-proxy/app/server.go#L61

Regarding multus, it also has HTTP hardcoded: https://github.com/openshift/multus-admission-controller/blob/1dfbb0a/cmd/webhook/main.go#L147

Regarding OVN, it also has HTTP hardcoded: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/util/metrics.go#L29

Don't get me wrong, I think this is important, but I wonder whether this shouldn't be an RFE and prioritized as such instead of a bug. I would like to hear your thoughts on the prioritization.
You can always expose metrics on localhost and put a sidecar proxy container with kube-rbac-proxy in front of them to do TLS termination. This is a common pattern for services exposing metrics on plain-HTTP endpoints. Example: https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/node-exporter/daemonset.yaml#L55-L81
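For reference, a minimal sketch of what such a sidecar could look like in the pod spec, assuming the metrics server is re-bound to 127.0.0.1; the image, ports, and secret name here are illustrative, not the actual CNO manifests:

- name: kube-rbac-proxy
  image: quay.io/openshift/origin-kube-rbac-proxy:latest  # illustrative image
  args:
  - --secure-listen-address=:9101        # TLS port exposed to Prometheus
  - --upstream=http://127.0.0.1:29101/   # plain-HTTP metrics, localhost only
  - --tls-cert-file=/etc/tls/private/tls.crt
  - --tls-private-key-file=/etc/tls/private/tls.key
  ports:
  - containerPort: 9101
    name: https
  volumeMounts:
  - name: metrics-tls                    # serving-cert secret, e.g. from service-ca
    mountPath: /etc/tls/private
    readOnly: true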
After fixing this, please remove your component from the exclusion list in the e2e tests at https://github.com/openshift/origin/blob/master/test/extended/prometheus/prometheus.go#L253-L268
This can be merged already, but we're deliberately delaying it to 4.6.
Verified this bug on 4.6.0-0.nightly-2020-07-25-091217
Created attachment 1702610 [details] 4.6 version
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196