1809205 – Metrics exposed over insecure channel

Bug 1809205 - Metrics exposed over insecure channel

Summary: Metrics exposed over insecure channel

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Juan Luis de Sousa-Valadas
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:	SDN-CI-IMPACT
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2020-03-02 15:09 UTC by Pawel Krupa
Modified:	2020-10-27 15:56 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	Enhancement
Doc Text:	Feature: Expose SDN metrics over TLS Reason: We don't any traffic over plain HTTP at all. Result: OVN metrics exposed on an SDN endpoint.
Clone Of:
Environment:
Last Closed:	2020-10-27 15:56:17 UTC
Target Upstream Version:
Embargoed:
Flags:	jdesousa: needinfo-

Attachments	(Terms of Use)
Screenshot for openshift-sdn/monitor-sdn component (142.96 KB, image/png) 2020-03-04 19:28 UTC, Weibin Liang	no flags	Details
4.6 version (44.74 KB, image/png) 2020-07-28 06:59 UTC, zhaozhanqi	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-network-operator pull 574	0	None	closed	Bug 1809205: Configure TLS for OpenShift SDN metrics endpoint	2020-09-21 08:40:31 UTC
Red Hat Product Errata	RHBA-2020:4196	0	None	None	None	2020-10-27 15:56:20 UTC

Description Pawel Krupa 2020-03-02 15:09:44 UTC

Description of problem:
Metrics endpoint for monitor-sdn is not using TLS to encrypt traffic.

Version-Release number of selected component (if applicable):
4.4 (possibly also earlier versions)

How reproducible:
Always

Steps to Reproduce:
1. Start a cluster
2. Go to prometheus UI
3. Check connection schema for this component

Actual results:
Metrics are exposed over HTTP connection

Expected results:
Metrics are exposed over HTTPS connection

Additional info:
API server operator ServiceMonitor definition can be used as a template on how to fix this issue: https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/manifests/0000_90_openshift-apiserver-operator_03_servicemonitor.yaml

Comment 1 Ben Bennett 2020-03-04 13:59:45 UTC

Can you check this please?  Based on what we saw on the same bug filed against the router, this may be a spurious report.

Comment 2 Pawel Krupa 2020-03-04 16:55:32 UTC

To be precise, this bug is about openshift-sdn/monitor-sdn component.

Comment 3 Weibin Liang 2020-03-04 19:28:34 UTC

Created attachment 1667571 [details]
Screenshot for openshift-sdn/monitor-sdn component

Comment 4 Weibin Liang 2020-03-04 19:33:53 UTC

Reproduce the issue in 4.4.0-0.nightly-2020-03-02-201804, openshift-sdn/monitor-sdn component is using http not https to expose metrics such as http://10.0.129.14:9101/metrics

Comment 5 Weibin Liang 2020-03-05 13:40:18 UTC

Hi zhanqi, 

Do we need write a test case for this kind of UI testing?

Thanks,
Weibin

Comment 6 Ben Bennett 2020-03-05 15:38:55 UTC

Jake can you take this please.  Ricky has a lot on his plate.

Comment 7 Juan Luis de Sousa-Valadas 2020-03-05 19:17:07 UTC

Ben, I'm moving this to 4.5.

This is *not* a regression, and IMHO not anywhere close to being the worst of our problems, after all if you are having a MiTM here, having false kube-proxy metrics is the smallest of your problems.
There's a fair amount of work here, CNO is creating all the ServiceMonitors over plain HTTP, but the main issue here is that the servers themselves are hardcoded for http

Regarding openshift-sdn, which probably our biggest concern at the moment, the metrics server comes from vendored kubernetes code which is hardcoded hardcoded to use plain HTTP. The only way to get this merged on 4.4.0 on time is to make a temporary a patch in our local branch while we get this merged upstream:
https://github.com/kubernetes/kubernetes/blob/release-1.18/cmd/kube-proxy/app/server.go#L61

Regarding multus, it also has the HTTP hardcoded:
https://github.com/openshift/multus-admission-controller/blob/1dfbb0a/cmd/webhook/main.go#L147

Regarding OVN, also has HTTP hardcoded:
https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/util/metrics.go#L29

Don't get me wrong, I think this is important, but I wonder if this shouldn't be an RFE and priotized as such instead of a bug. I would like to hear your thoughts regarding the priorization of this.

Comment 8 Pawel Krupa 2020-03-31 12:09:04 UTC

You can always expose metrics on localhost and put a sidecar proxy container with kube-rbac-proxy which would do TLS termination. This is a common pattern for services exposing metrics on plain http endpoints. Example: https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/node-exporter/daemonset.yaml#L55-L81

Comment 9 Pawel Krupa 2020-04-06 11:42:04 UTC

After fixing please remove your component from an exclusion list in e2e tests at https://github.com/openshift/origin/blob/master/test/extended/prometheus/prometheus.go#L253-L268

Comment 11 Juan Luis de Sousa-Valadas 2020-06-18 14:27:40 UTC

Can be merged already but we're deliberately delaying this to 4.6

Comment 14 zhaozhanqi 2020-07-28 06:58:02 UTC

Verified this bug on 4.6.0-0.nightly-2020-07-25-091217

Comment 15 zhaozhanqi 2020-07-28 06:59:01 UTC

Created attachment 1702610 [details]
4.6 version

Comment 17 errata-xmlrpc 2020-10-27 15:56:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Note You need to log in before you can comment on or make changes to this bug.