Bug 1809197 - CoreDNS Metrics exposed over insecure channel
Summary: CoreDNS Metrics exposed over insecure channel
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: DNS
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.5.0
Assignee: Stephen Greene
QA Contact: Hongan Li
Depends On:
Reported: 2020-03-02 15:00 UTC by Pawel Krupa
Modified: 2020-07-13 17:17 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The CoreDNS Prometheus metrics integration was not set up properly.
Consequence: CoreDNS metrics were being exposed over an insecure channel within a cluster.
Fix: Add the proper TLS components and a kube-rbac-proxy sidecar to secure the CoreDNS metrics endpoint.
Result: CoreDNS metrics are exposed over a secure channel.
Clone Of:
Last Closed: 2020-07-13 17:17:25 UTC
Target Upstream Version:

Attachments (Terms of Use)
prometheus scrape targets - dns section (35.33 KB, image/png)
2020-03-03 16:48 UTC, Pawel Krupa

System ID Priority Status Summary Last Updated
Github openshift cluster-dns-operator pull 163 None closed Bug 1809197: Secure CoreDNS metrics 2020-07-13 14:38:43 UTC
Red Hat Product Errata RHBA-2020:2409 None None None 2020-07-13 17:17:44 UTC

Description Pawel Krupa 2020-03-02 15:00:53 UTC
Description of problem:
The metrics endpoint does not use TLS to encrypt traffic.

Version-Release number of selected component (if applicable):
4.4 (possibly also earlier versions)

How reproducible:

Steps to Reproduce:
1. Start a cluster
2. Go to the Prometheus UI
3. Check the connection scheme for this component

Actual results:
Metrics are exposed over an HTTP connection

Expected results:
Metrics are exposed over an HTTPS connection

Additional info:
The API server operator's ServiceMonitor definition can be used as a template for fixing this issue: https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/manifests/0000_90_openshift-apiserver-operator_03_servicemonitor.yaml
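Following that template, a secured ServiceMonitor for the DNS pods would look roughly like the sketch below. All names here (the `dns-default` object name, the `openshift-dns` namespace, the port name, and the selector label) are assumptions for illustration, not the manifest the eventual fix merged.

```yaml
# Sketch of a ServiceMonitor that scrapes metrics over TLS, modeled on the
# apiserver operator template linked above. Names and labels are assumed.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dns-default
  namespace: openshift-dns
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: metrics
    scheme: https            # scrape over HTTPS instead of plain HTTP
    tlsConfig:
      # CA bundle that Prometheus uses to verify service serving certs
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      serverName: dns-default.openshift-dns.svc
  selector:
    matchLabels:
      dns.operator.openshift.io/daemonset-dns: default   # assumed label
```

The key parts are `scheme: https` plus a `tlsConfig` pointing at the service-CA bundle; without them Prometheus falls back to plain HTTP, which is what the attached screenshot shows.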

Comment 1 Dan Mace 2020-03-03 13:57:09 UTC

Was this bug generated by some boilerplate process? It refers to "the component" and the reproducer steps seem totally non-specific to the DNS operator.

Comment 2 Pawel Krupa 2020-03-03 15:58:23 UTC
Sorry for not clarifying. This is about the openshift-dns/dns-default component.

Comment 3 Dan Mace 2020-03-03 16:03:20 UTC
Yes, so am I — what exactly leads you to believe that metrics are served insecurely? The CoreDNS pods are exposing a TCP port 9153 serving a TLS endpoint secured by a serving signer service certificate which Prometheus is configured to use.

Comment 4 Pawel Krupa 2020-03-03 16:48:28 UTC
Created attachment 1667246 [details]
prometheus scrape targets - dns section

Based on the Prometheus scrape-targets page, all DNS endpoints are scraped over HTTP rather than HTTPS, which is an insecure channel. A screenshot from a 4.3 cluster is attached to this BZ.

Comment 5 Dan Mace 2020-03-03 18:46:23 UTC
Thanks, I see my confusion now: I was looking at the DNS operator and not CoreDNS itself, which indeed looks misconfigured somehow despite TLS config being present throughout the relevant resources. Going to move this to 4.5 for now unless someone can justify blocker status (given this has probably been an issue since 4.1).

Comment 6 Pawel Krupa 2020-04-06 11:41:54 UTC
After fixing, please remove this component from the exclusion list in the e2e tests at https://github.com/openshift/origin/blob/master/test/extended/prometheus/prometheus.go#L253-L268

Comment 9 Hongan Li 2020-04-26 04:02:10 UTC
Verified with 4.5.0-0.nightly-2020-04-25-170442; the issue has been fixed.

$ oc -n openshift-dns get pod -owide
NAME                READY   STATUS    RESTARTS   AGE    IP           NODE                                     NOMINATED NODE   READINESS GATES
dns-default-wp7p6   3/3     Running   0          111m   hongli-pl442-mld8x-master-1              <none>           <none>

Went to the Prometheus UI and confirmed that the DNS targets are now scraped over HTTPS.
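The 3/3 READY count above reflects the kube-rbac-proxy sidecar the fix added next to the CoreDNS container. A minimal sketch of such a sidecar spec follows; the image reference, listen port, and secret/volume names are assumptions, not the exact container spec from the fix.

```yaml
# Illustrative kube-rbac-proxy sidecar for the dns-default DaemonSet.
# It terminates TLS and enforces RBAC in front of the plaintext
# CoreDNS metrics endpoint, which stays bound to localhost.
- name: kube-rbac-proxy
  image: quay.io/openshift/origin-kube-rbac-proxy:latest  # assumed image
  args:
  - --secure-listen-address=:9154
  - --upstream=http://127.0.0.1:9153/      # CoreDNS metrics, localhost only
  - --tls-cert-file=/etc/tls/private/tls.crt
  - --tls-private-key-file=/etc/tls/private/tls.key
  ports:
  - name: metrics
    containerPort: 9154
  volumeMounts:
  - name: metrics-tls          # assumed: secret from the service-CA signer
    mountPath: /etc/tls/private
```

With this in place, Prometheus scrapes the proxy's HTTPS port rather than the unauthenticated HTTP port.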

Comment 10 Miciah Dashiel Butler Masters 2020-04-29 16:00:57 UTC
> After fixing please remove your component from an exclusion list in e2e tests

For the record, that was done with this PR: https://github.com/openshift/origin/pull/24904

Comment 12 errata-xmlrpc 2020-07-13 17:17:25 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

