Description of problem:
Create a custom ingresscontroller with an invalid domain; the dnsrecords for the invalid domain can still be published to Azure DNS.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-21-172657

How reproducible:
100%

Steps to Reproduce:
1. create a custom ingresscontroller with an invalid domain, e.g.

kind: IngressController
apiVersion: operator.openshift.io/v1
metadata:
  name: lb-ext
  namespace: openshift-ingress-operator
spec:
  defaultCertificate:
    name: router-certs-default
  domain: lb-ext.example.com
  replicas: 1

Actual results:
$ oc -n openshift-ingress-operator get dnsrecords/lb-ext-wildcard -oyaml
spec:
  dnsName: '*.lb-ext.example.com.'
  recordTTL: 30
  recordType: A
  targets:
  - 52.162.218.201
status:
  observedGeneration: 1
  zones:
  - conditions:
    - lastTransitionTime: "2021-01-22T08:08:52Z"
      message: The DNS provider succeeded in ensuring the record
      reason: ProviderSuccess
      status: "False"
      type: Failed
    dnsZone:
      id: /subscriptions/xxxx/resourceGroups/hongli-pl657-4284r-rg/providers/Microsoft.Network/privateDnsZones/hongli-pl657.qe.azure.devcluster.openshift.com
  - conditions:
    - lastTransitionTime: "2021-01-22T08:08:53Z"
      message: The DNS provider succeeded in ensuring the record
      reason: ProviderSuccess
      status: "False"
      type: Failed
    dnsZone:
      id: /subscriptions/xxxx/resourceGroups/os4-common/providers/Microsoft.Network/dnszones/qe.azure.devcluster.openshift.com

Expected results:
the dnsName '*.lb-ext.example.com.' should not be provisioned to the dnsZone.

Additional info:
no issue on AWS and GCP
I was able to reproduce this on an Azure cluster launched from cluster-bot (CI). I believe I have identified the code defect that allows this. I will push a change and open a PR soon.
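For illustration only, a minimal sketch (in Go) of the kind of subdomain check the fix turns on; the helper name isSubdomainOf and the standalone main are assumptions for this example, not the operator's actual code:

package main

import (
	"fmt"
	"strings"
)

// isSubdomainOf reports whether name equals zone or is a subdomain of it.
// Both inputs are normalized by dropping the trailing dot and lowercasing.
func isSubdomainOf(name, zone string) bool {
	n := strings.ToLower(strings.TrimSuffix(name, "."))
	z := strings.ToLower(strings.TrimSuffix(zone, "."))
	return n == z || strings.HasSuffix(n, "."+z)
}

func main() {
	// The record from this bug: a wildcard under an unrelated domain.
	fmt.Println(isSubdomainOf("*.lb-ext.example.com.", "qe.azure.devcluster.openshift.com")) // false
	// A record that actually belongs to the cluster's zone.
	fmt.Println(isSubdomainOf("*.apps.hongli-pl657.qe.azure.devcluster.openshift.com.", "qe.azure.devcluster.openshift.com")) // true
}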
Hello Stephen, I got the logs below, which are as expected per https://github.com/openshift/cluster-ingress-operator/pull/537:

2021-04-19T07:51:40.173Z INFO operator.dns azure/dns.go:78 domain is not a subdomain of zone. The DNS provider may still succeed in updating the record, which might be unexpected {"domain": "*.lb-ext.example.com.", "zone": "hongli-iaz.qe.azure.devcluster.openshift.com"}
2021-04-19T07:51:41.041Z INFO operator.dns azure/dns.go:78 domain is not a subdomain of zone. The DNS provider may still succeed in updating the record, which might be unexpected {"domain": "*.lb-ext.example.com.", "zone": "qe.azure.devcluster.openshift.com"}

However, my expectation is that provisioning should fail, just like the behaviour on AWS, e.g.:

$ oc -n openshift-ingress-operator get dnsrecords lb-ext-wildcard -oyaml
<---snip--->
status:
  observedGeneration: 1
  zones:
  - conditions:
    - lastTransitionTime: "2021-04-19T09:46:40Z"
      message: "The DNS provider failed to ensure the record: failed to update alias in zone Z0143xxxxx: couldn't update DNS record in zone Z0143xxxxx: InvalidChangeBatch: [RRSet with DNS name \\052.lb-ext.example.com. is not permitted in zone hongli-aw.qe.devcluster.openshift.com.]\n\tstatus code: 400, request id: af32c104-53a3-46f6-9d6f-3fa7dbb0735a"
      reason: ProviderError
      status: "True"
      type: Failed
    dnsZone:
      tags:
        Name: hongli-aw-pgzgb-int
        kubernetes.io/cluster/hongli-aw-pgzgb: owned
  - conditions:
    - lastTransitionTime: "2021-04-19T09:46:40Z"
      message: "The DNS provider failed to ensure the record: failed to update alias in zone Z3B3Kxxxxx: couldn't update DNS record in zone Z3B3Kxxxxx: InvalidChangeBatch: [RRSet with DNS name \\052.lb-ext.example.com. is not permitted in zone qe.devcluster.openshift.com.]\n\tstatus code: 400, request id: 252ecdcc-7891-47d9-bf0e-00f37bb8be42"
      reason: ProviderError
      status: "True"
      type: Failed
    dnsZone:
      id: Z3B3Kxxxxx

$ oc -n openshift-ingress-operator get ingresscontroller/lb-ext -oyaml
<---snip--->
  - lastTransitionTime: "2021-04-19T09:45:09Z"
    message: 'The record failed to provision in some zones: [{ map[Name:hongli-aw-pgzgb-int kubernetes.io/cluster/hongli-aw-pgzgb:owned]} {Z3B3Kxxxxx map[]}]'
    reason: FailedZones
    status: "False"
    type: DNSReady

I'm not sure if this can be implemented on Azure, could you please help confirm?
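To make the expectation concrete, here is a hypothetical sketch of the fail-fast variant, reusing the isSubdomainOf helper sketched in an earlier comment; ensureRecord and its signature are made up for illustration and are not the operator's real Azure provider API:

// ensureRecord is a hypothetical stand-in for the Azure provider's record
// upsert path. Instead of only logging, it returns an error when the record
// name is outside the zone, so the DNSRecord status would report Failed=True
// with reason ProviderError, as on AWS.
func ensureRecord(recordName, zoneName string) error {
	if !isSubdomainOf(recordName, zoneName) {
		return fmt.Errorf("record %q is not permitted in zone %q", recordName, zoneName)
	}
	// ...call the Azure DNS API to create or update the record set...
	return nil
}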
(In reply to Hongan Li from comment #6)
> I'm not sure if this can be implemented on Azure, could you please help
> confirm?

We decided that we do not want to break any customers who might be relying on the current Azure DNS publishing behavior. I spoke with my team, and we decided that logging in this case is sufficient. Unfortunately, it may be too late to align the behavior on Azure with the behavior on AWS, etc., since we don't want to break any customers on upgrade by restricting Azure DNS capabilities for an existing ingress controller. Does that sound reasonable to you, @Hongan?
Thank you for your detailed explanation, Stephen. It makes sense.
Moving to VERIFIED per comments #6 and #7.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438