Description of problem:

My test environment is 4.6.13, but all 4.x releases have this issue. The 'coredns_dns_*' metrics, such as 'coredns_dns_request_count_total', do not include data for the custom zones, because the CoreDNS prometheus plugin is not defined in the custom zone blocks. The plugin documentation (https://coredns.io/plugins/metrics/) describes it as 'prometheus enables Prometheus metrics' and notes that 'This plugin can only be used once per Server Block'. Currently the CoreDNS configuration defines the prometheus plugin only in the '.' zone; the other zones do not define it.

OpenShift release version: 4.x

Cluster Platform: all

How reproducible: Please see the steps below.

Steps to Reproduce (in detail):

1. Configure DNS forwarding for the custom zones.
https://docs.openshift.com/container-platform/4.9/networking/dns-operator.html#nw-dns-forward_dns-operator

# oc edit dns.operator/default
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  servers:
  - name: foo-server
    zones:
    - foo.com
    forwardPlugin:
      upstreams:
      - 1.1.1.1
      - 2.2.2.2:5353
  - name: bar-server
    zones:
    - bar.com
    - example.com
    forwardPlugin:
      upstreams:
      - 3.3.3.3
      - 4.4.4.4:5454

2. Go to the OAuth pods and do DNS queries for the custom forwarding zones (a throwaway client pod works as well; see the sketch after the Impact section).

# oc -n openshift-authentication rsh oauth-openshift-xxxxxxxxxx
$ curl example.com
$ curl foo.com

3. Open the Prometheus web console and run the PromQL query:

sum by (zone) (coredns_dns_request_count_total)

Element       Value
{zone="."}    804685

Actual results:

Only the '.' zone metrics are reported.

PromQL -> sum by (zone) (coredns_dns_request_count_total)

Element       Value
{zone="."}    804685

# oc -n openshift-dns edit cm dns-default
Corefile: |
  # foo-server
  foo.com:5353 {
      forward . 1.1.1.1 2.2.2.2:5353
  }
  # bar-server
  bar.com:5353 example.com:5353 {
      forward . 3.3.3.3 4.4.4.4:5454
  }
  .:5353 {
      errors
      health
      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . /etc/resolv.conf {
          policy sequential
      }
      cache 30
      reload
  }

Expected results:

Metrics for all zones should be included.

PromQL -> sum by (zone) (coredns_dns_request_count_total)

Element                 Value
{zone="."}              804685
{zone="example.com."}   4644
{zone="foo.com."}       10

# oc -n openshift-dns edit cm dns-default
Corefile: |
  # foo-server
  foo.com:5353 {
      forward . 1.1.1.1 2.2.2.2:5353
      prometheus :9153    <- prometheus plugin should be defined for this zone
  }
  # bar-server
  bar.com:5353 example.com:5353 {
      forward . 3.3.3.3 4.4.4.4:5454
      prometheus :9153    <- prometheus plugin should be defined for this zone
  }
  .:5353 {
      errors
      health
      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . /etc/resolv.conf {
          policy sequential
      }
      cache 30
      reload
  }

Impact of the problem:

The customer cannot get the custom zone metrics for data analysis and troubleshooting.
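As an aside on step 2, any pod that resolves through the cluster DNS can generate the test traffic; it does not have to be an OAuth pod. A minimal sketch, where the image name is an assumption (any image that ships glibc's getent will do):

# Create a short-lived client pod that uses the cluster DNS
# (image is an assumption; any image providing getent works).
oc run dns-client --image=registry.access.redhat.com/ubi8/ubi --restart=Never --command -- sleep 3600
oc wait --for=condition=Ready pod/dns-client --timeout=60s
# getent resolves through CoreDNS, so these lookups hit the forwarded zones.
oc exec dns-client -- getent hosts foo.com
oc exec dns-client -- getent hosts example.com
oc delete pod dns-client

The lookups count toward the per-zone request metrics even if the configured upstreams never answer.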
Additional info:

This issue can be verified by setting the DNS operator to unmanaged and modifying the CoreDNS configuration. To set the DNS operator to unmanaged, change the spec of the clusterversion as below.

# oc edit clusterversion

Append an overrides entry for the DNS operator:

spec:
  overrides:
  - group: apps/v1
    kind: Deployment
    name: dns-operator
    namespace: openshift-dns-operator
    unmanaged: true

Scale the dns-operator deployment down to 0 replicas:

# oc -n openshift-dns-operator scale --replicas=0 deployment/dns-operator

Check that the dns-operator deployment is at 0:

# oc -n openshift-dns-operator get deployment
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
dns-operator   0/0     0            0           15d

Modify the default CoreDNS configuration:

# oc -n openshift-dns edit cm dns-default
Corefile: |
  # foo-server
  foo.com:5353 {
      forward . 1.1.1.1 2.2.2.2:5353
      prometheus :9153    <- Add this line; the prometheus plugin will then collect metrics for this zone
  }
  # bar-server
  bar.com:5353 example.com:5353 {
      forward . 3.3.3.3 4.4.4.4:5454
      prometheus :9153    <- Add this line; the prometheus plugin will then collect metrics for this zone
  }
  .:5353 {
      errors
      health
      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . /etc/resolv.conf {
          policy sequential
      }
      cache 30
      reload
  }
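To confirm the workaround took effect without waiting for a Prometheus scrape, the CoreDNS metrics endpoint can be read directly. A minimal sketch, assuming the dns-default pods carry the dns.operator.openshift.io/daemonset-dns=default label and that the prometheus plugin listens on 9153 as in the Corefile above (the metric name is coredns_dns_request_count_total or coredns_dns_requests_total, depending on the CoreDNS version):

# Pick one dns-default pod (the label selector is an assumption about
# the daemonset's pod labels) and forward its metrics port locally.
POD=$(oc -n openshift-dns get pods -l dns.operator.openshift.io/daemonset-dns=default -o name | head -n 1)
oc -n openshift-dns port-forward "$POD" 9153:9153 &
PF_PID=$!
sleep 2
# With the workaround in place, the zone label should now include the
# custom zones, e.g. zone="foo.com." and zone="example.com.".
curl -s http://127.0.0.1:9153/metrics | grep coredns_dns_request
kill $PF_PID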
Setting blocker- because this does not represent a regression or security issue. However, we will work on this promptly to ensure we report full metrics for CoreDNS.
Verified with 4.10.0-0.nightly-2021-11-14-184249 and passed.

1. Check the cluster version:

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-14-184249   True        False         81m     Cluster version is 4.10.0-0.nightly-2021-11-14-184249

2. Configure DNS forwarding for the two custom zones:

% oc edit dns.operator/default
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  servers:
  - name: foo-server
    zones:
    - foo1.com
    forwardPlugin:
      upstreams:
      - 1.1.1.1
      - 2.2.2.2:5353
  - name: bar-server
    zones:
    - bar2.com
    - example.com
    forwardPlugin:
      upstreams:
      - 3.3.3.3
      - 4.4.4.4:5454

3. Check cm dns-default; the prometheus plugin is now added by default:

% oc -n openshift-dns get cm dns-default -o yaml
apiVersion: v1
data:
  Corefile: |
    # foo-server
    foo1.com:5353 {
        prometheus 127.0.0.1:9153    <---
        forward . 1.1.1.1 2.2.2.2:5353 {
            policy random
        }
        errors
        bufsize 512
        cache 900 {
            denial 9984 30
        }
    }
    # bar-server
    bar2.com:5353 example.com:5353 {
        prometheus 127.0.0.1:9153    <---
        forward . 3.3.3.3 4.4.4.4:5454 {
            policy random
        }
        errors
        bufsize 512
        cache 900 {
            denial 9984 30
        }
    }
    .:5353 {
        bufsize 512
        errors
        health {
            lameduck 20s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus 127.0.0.1:9153
        ...

4. Send traffic to foo1.com and bar2.com from an oauth-openshift-xxx pod as the bug describes.

5. Check the coredns_dns metrics in the Prometheus web console with the PromQL query:

sum by (zone) (coredns_dns_requests_total)

zone         value
bar2.com.    4
foo1.com.    4
.            38273
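After verification, the test forwarding configuration can be removed again. A minimal sketch, assuming no other custom servers need to be preserved (the merge patch simply clears spec.servers):

# Clear spec.servers so the operator renders the stock Corefile again.
oc patch dns.operator/default --type=merge -p '{"spec":{"servers":null}}'
# Confirm the custom zone blocks are gone from the rendered config.
oc -n openshift-dns get cm dns-default -o yaml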
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056
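To check whether a given cluster already carries the fix, its running version can be compared against the advisory's release:

# Print the cluster's current version for comparison with the 4.10.3 advisory.
oc get clusterversion version -o jsonpath='{.status.desired.version}'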