Bug 2066605

Summary: coredns template block matches cluster API too loosely
Product: OpenShift Container Platform
Reporter: Bram Verschueren <bverschu>
Component: Machine Config Operator
Assignee: Bram Verschueren <bverschu>
Machine Config Operator sub component: platform-baremetal
QA Contact: Rio Liu <rioliu>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: aos-bugs, fsoppels, kgarriso, mkrejci, mmorgill, rioliu, tsedovic
Version: 4.11
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 10:55:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 2076493

Description Bram Verschueren 2022-03-22 07:25:37 UTC
Description of problem:
In on-prem installations the regex used to match the cluster API in a node's Corefile is too wide.
Any FQDN matching ".*api.<basedomain>" is resolved by coredns' template plugin [1].

[1] https://coredns.io/plugins/template/
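
To illustrate why the pattern is too wide, here is a minimal Python sketch of the difference between an unanchored and an anchored pattern. It is not CoreDNS code. It assumes the template plugin applies the `match` value as an unanchored regex search against the query name, and uses Python's `re.search` to mimic that. The helper names `matches` and `matching_names` are made up for this example; the host names are the ones from this report.

```python
import re

# Pattern as it appears in the generated Corefile (no anchor).
LOOSE = "api.mycluster.tld"
# Anchored pattern, as suggested under "Expected results".
ANCHORED = "^api.mycluster.tld"

# Example names taken from the reproduction steps in this report.
NAMES = [
    "api.mycluster.tld",
    "myapi.mycluster.tld",
    "my.sub.api.mycluster.tld",
]


def matches(pattern: str, name: str) -> bool:
    """Return True if the pattern is found anywhere in name (unanchored search)."""
    return re.search(pattern, name) is not None


def matching_names(pattern: str, names: list) -> list:
    """Return the names, in order, that the pattern matches."""
    return [n for n in names if matches(pattern, n)]


if __name__ == "__main__":
    for pattern in (LOOSE, ANCHORED):
        print(pattern, "->", matching_names(pattern, NAMES))
```

With the loose pattern all three names match. With the leading `^` only `api.mycluster.tld` does. Note that the unescaped dots still match any character, so a stricter pattern might also escape them.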

Version-Release number of MCO (Machine Config Operator) (if applicable):

$ oc get co machine-config 
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.11.0-0.nightly-2022-03-18-065017   True        False         False      34m    

Platform (AWS, VSphere, Metal, etc.):

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
(Y/N/Not sure): Y

How reproducible:

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job: N/A

2. Profile: N/A

Steps to Reproduce:
1. Get the cluster's API address:
$ oc whoami --show-server

2. Resolve any host matching '.*api.<basedomain>':

(using non-existing 'myapi.mycluster.tld')

$ oc run -ti --image=registry.redhat.io/openshift4/network-tools-rhel8 test -- /bin/bash
If you don't see a command prompt, try pressing enter.
[root@test /]# nslookup myapi.mycluster.tld

Name:   myapi.mycluster.tld

[root@test /]# nslookup api.mycluster.tld

Name:   api.mycluster.tld

[root@test /]# nslookup my.sub.api.mycluster.tld

Name:   my.sub.api.mycluster.tld
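
The manual nslookup checks above could be scripted roughly as follows. This is only a sketch: the function names `resolves` and `unexpected_resolutions` are hypothetical, and it assumes the script runs somewhere that uses the same resolver as the test pod in this report. The resolver is injectable so the logic can be exercised without a cluster.

```python
import socket
from typing import Callable, Iterable, List, Set


def resolves(name: str) -> bool:
    """Return True if the system resolver returns any address for name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def unexpected_resolutions(
    names: Iterable[str],
    allowed: Set[str],
    resolver: Callable[[str], bool] = resolves,
) -> List[str]:
    """Return names that resolve although they are not in the allowed set."""
    return [n for n in names if n not in allowed and resolver(n)]


if __name__ == "__main__":
    # Host names from this report; only the first one should resolve.
    candidates = [
        "api.mycluster.tld",
        "myapi.mycluster.tld",
        "my.sub.api.mycluster.tld",
    ]
    bad = unexpected_resolutions(candidates, {"api.mycluster.tld"})
    print("unexpectedly resolved:", bad)
```

On an affected cluster the list would be expected to contain the two non-existing names. An empty list would suggest the match is anchored.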

Actual results:
Any '.*api.<basedomain>' FQDN is resolved by coredns' template plugin.

Expected results:
Only FQDNs matching the anchored pattern '^api.<basedomain>' should be resolved from a template block.

Additional info:

The regex used in the Corefile template block's match field is too wide:

$ oc debug node/mycluster-wxt6k-worker-0-g5965 -- grep api -B1 -A2 /host/etc/coredns/Corefile 
Starting pod/mycluster-wxt6k-worker-0-g5965-debug ...
To use host binaries, run `chroot /host`
    template IN A mycluster.tld {
        match api.mycluster.tld
        answer "{{ .Name }} 60 in {{ .Type }}"
    template IN AAAA mycluster.tld {
        match api.mycluster.tld
    template IN A mycluster.tld {
        match api-int.mycluster.tld
        answer "{{ .Name }} 60 in {{ .Type }}"
    template IN AAAA mycluster.tld {
        match api-int.mycluster.tld
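
As a rough way to audit a Corefile for this problem, the sketch below lists `match` patterns that do not start with `^`. It is a heuristic under stated assumptions, not a Corefile parser: it only looks at lines of the form `match <pattern>`, and the function name `unanchored_matches` is made up for this example. A pattern anchored in another way (for example with a leading group) would be reported as a false positive.

```python
import re
from typing import List

# A "match <pattern>" line inside a template block, with optional indentation.
MATCH_LINE = re.compile(r"^\s*match\s+(\S+)")


def unanchored_matches(corefile_text: str) -> List[str]:
    """Return match patterns that do not begin with the '^' anchor."""
    found = []
    for line in corefile_text.splitlines():
        m = MATCH_LINE.match(line)
        if not m:
            continue
        pattern = m.group(1).strip('"')  # tolerate a quoted pattern
        if not pattern.startswith("^"):
            found.append(pattern)
    return found


if __name__ == "__main__":
    # Excerpt copied from the grep output in this report.
    sample = """
    template IN A mycluster.tld {
        match api.mycluster.tld
        answer "{{ .Name }} 60 in {{ .Type }}"
    template IN A mycluster.tld {
        match api-int.mycluster.tld
        answer "{{ .Name }} 60 in {{ .Type }}"
    """
    print(unanchored_matches(sample))
```

Run against the excerpt above, the sketch reports both `api.mycluster.tld` and `api-int.mycluster.tld` as unanchored.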

Comment 8 errata-xmlrpc 2022-08-10 10:55:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.