Bug 1809205 - Metrics exposed over insecure channel
Summary: Metrics exposed over insecure channel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 4.6.0
Assignee: Juan Luis de Sousa-Valadas
QA Contact: zhaozhanqi
URL:
Whiteboard: SDN-CI-IMPACT
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-02 15:09 UTC by Pawel Krupa
Modified: 2020-10-27 15:56 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Expose SDN metrics over TLS Reason: We don't any traffic over plain HTTP at all. Result: OVN metrics exposed on an SDN endpoint.
Clone Of:
Environment:
Last Closed: 2020-10-27 15:56:17 UTC
Target Upstream Version:
Embargoed:
jdesousa: needinfo-


Attachments (Terms of Use)
Screenshot for openshift-sdn/monitor-sdn component (142.96 KB, image/png)
2020-03-04 19:28 UTC, Weibin Liang
no flags Details
4.6 version (44.74 KB, image/png)
2020-07-28 06:59 UTC, zhaozhanqi
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 574 0 None closed Bug 1809205: Configure TLS for OpenShift SDN metrics endpoint 2020-09-21 08:40:31 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 15:56:20 UTC

Description Pawel Krupa 2020-03-02 15:09:44 UTC
Description of problem:
Metrics endpoint for monitor-sdn is not using TLS to encrypt traffic.

Version-Release number of selected component (if applicable):
4.4 (possibly also earlier versions)

How reproducible:
Always

Steps to Reproduce:
1. Start a cluster
2. Go to prometheus UI
3. Check connection schema for this component

Actual results:
Metrics are exposed over HTTP connection

Expected results:
Metrics are exposed over HTTPS connection

Additional info:
API server operator ServiceMonitor definition can be used as a template on how to fix this issue: https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/manifests/0000_90_openshift-apiserver-operator_03_servicemonitor.yaml

Comment 1 Ben Bennett 2020-03-04 13:59:45 UTC
Can you check this please?  Based on what we saw on the same bug filed against the router, this may be a spurious report.

Comment 2 Pawel Krupa 2020-03-04 16:55:32 UTC
To be precise, this bug is about openshift-sdn/monitor-sdn component.

Comment 3 Weibin Liang 2020-03-04 19:28:34 UTC
Created attachment 1667571 [details]
Screenshot for openshift-sdn/monitor-sdn component

Comment 4 Weibin Liang 2020-03-04 19:33:53 UTC
Reproduce the issue in 4.4.0-0.nightly-2020-03-02-201804, openshift-sdn/monitor-sdn component is using http not https to expose metrics such as http://10.0.129.14:9101/metrics

Comment 5 Weibin Liang 2020-03-05 13:40:18 UTC
Hi zhanqi, 

Do we need write a test case for this kind of UI testing?

Thanks,
Weibin

Comment 6 Ben Bennett 2020-03-05 15:38:55 UTC
Jake can you take this please.  Ricky has a lot on his plate.

Comment 7 Juan Luis de Sousa-Valadas 2020-03-05 19:17:07 UTC
Ben, I'm moving this to 4.5.

This is *not* a regression, and IMHO not anywhere close to being the worst of our problems, after all if you are having a MiTM here, having false kube-proxy metrics is the smallest of your problems.
There's a fair amount of work here, CNO is creating all the ServiceMonitors over plain HTTP, but the main issue here is that the servers themselves are hardcoded for http

Regarding openshift-sdn, which probably our biggest concern at the moment, the metrics server comes from vendored kubernetes code which is hardcoded hardcoded to use plain HTTP. The only way to get this merged on 4.4.0 on time is to make a temporary a patch in our local branch while we get this merged upstream:
https://github.com/kubernetes/kubernetes/blob/release-1.18/cmd/kube-proxy/app/server.go#L61

Regarding multus, it also has the HTTP hardcoded:
https://github.com/openshift/multus-admission-controller/blob/1dfbb0a/cmd/webhook/main.go#L147

Regarding OVN, also has HTTP hardcoded:
https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/util/metrics.go#L29

Don't get me wrong, I think this is important, but I wonder if this shouldn't be an RFE and priotized as such instead of a bug. I would like to hear your thoughts regarding the priorization of this.

Comment 8 Pawel Krupa 2020-03-31 12:09:04 UTC
You can always expose metrics on localhost and put a sidecar proxy container with kube-rbac-proxy which would do TLS termination. This is a common pattern for services exposing metrics on plain http endpoints. Example: https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/node-exporter/daemonset.yaml#L55-L81

Comment 9 Pawel Krupa 2020-04-06 11:42:04 UTC
After fixing please remove your component from an exclusion list in e2e tests at https://github.com/openshift/origin/blob/master/test/extended/prometheus/prometheus.go#L253-L268

Comment 11 Juan Luis de Sousa-Valadas 2020-06-18 14:27:40 UTC
Can be merged already but we're deliberately delaying this to 4.6

Comment 14 zhaozhanqi 2020-07-28 06:58:02 UTC
Verified this bug on 4.6.0-0.nightly-2020-07-25-091217

Comment 15 zhaozhanqi 2020-07-28 06:59:01 UTC
Created attachment 1702610 [details]
4.6 version

Comment 17 errata-xmlrpc 2020-10-27 15:56:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196


Note You need to log in before you can comment on or make changes to this bug.