Bug 1919151 - [Azure] dnsrecords with invalid domain should not be published to Azure dnsZone
Summary: [Azure] dnsrecords with invalid domain should not be published to Azure dnsZone
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.8.0
Assignee: Stephen Greene
QA Contact: Hongan Li
Depends On:
Reported: 2021-01-22 09:26 UTC by Hongan Li
Modified: 2022-08-04 22:32 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2021-07-27 22:36:39 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-ingress-operator pull 537 | 0 | None | open | Bug 1919151: Azure: Don't publish records using an invalid domain | 2021-02-04 16:03:56 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:36:56 UTC

Description Hongan Li 2021-01-22 09:26:37 UTC
Description of problem:
After creating a custom ingresscontroller with an invalid domain (one that is not a subdomain of the cluster's DNS zone), the dnsrecords for the invalid domain are still published to Azure DNS.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a custom ingresscontroller with an invalid domain, e.g.:
kind: IngressController
apiVersion: operator.openshift.io/v1
metadata:
  name: lb-ext
  namespace: openshift-ingress-operator
spec:
  defaultCertificate:
    name: router-certs-default
  domain: lb-ext.example.com
  replicas: 1


Actual results:
$ oc -n openshift-ingress-operator get dnsrecords/lb-ext-wildcard -oyaml
  dnsName: '*.lb-ext.example.com.'
  recordTTL: 30
  recordType: A
  observedGeneration: 1
  - conditions:
    - lastTransitionTime: "2021-01-22T08:08:52Z"
      message: The DNS provider succeeded in ensuring the record
      reason: ProviderSuccess
      status: "False"
      type: Failed
      id: /subscriptions/xxxx/resourceGroups/hongli-pl657-4284r-rg/providers/Microsoft.Network/privateDnsZones/hongli-pl657.qe.azure.devcluster.openshift.com
  - conditions:
    - lastTransitionTime: "2021-01-22T08:08:53Z"
      message: The DNS provider succeeded in ensuring the record
      reason: ProviderSuccess
      status: "False"
      type: Failed
      id: /subscriptions/xxxx/resourceGroups/os4-common/providers/Microsoft.Network/dnszones/qe.azure.devcluster.openshift.com

Expected results:
The dnsName '*.lb-ext.example.com.' should not be provisioned to the dnsZone.

Additional info:
no issue on AWS and GCP

Comment 2 Stephen Greene 2021-01-22 20:52:33 UTC
I was able to reproduce this on an azure cluster launched from cluster-bot (ci). I believe I have identified the code defect that allows for this. Will push a change and open a PR soon.

Comment 6 Hongan Li 2021-04-19 10:07:01 UTC
Hello Stephen,

I got the logs below, which are as expected per https://github.com/openshift/cluster-ingress-operator/pull/537

2021-04-19T07:51:40.173Z	INFO	operator.dns	azure/dns.go:78	domain is not a subdomain of zone. The DNS provider may still succeed in updating the record, which might be unexpected	{"domain": "*.lb-ext.example.com.", "zone": "hongli-iaz.qe.azure.devcluster.openshift.com"}
2021-04-19T07:51:41.041Z	INFO	operator.dns	azure/dns.go:78	domain is not a subdomain of zone. The DNS provider may still succeed in updating the record, which might be unexpected	{"domain": "*.lb-ext.example.com.", "zone": "qe.azure.devcluster.openshift.com"}
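
For readers following along, the kind of check that could produce the log message above can be sketched as below. This is a minimal illustration, not the operator's actual code: the helper name isSubdomainOf and its normalization rules (ignoring case and a trailing dot) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// isSubdomainOf is a hypothetical helper (not taken from the operator).
// It reports whether domain equals zone or is nested under it,
// ignoring case and any trailing dot on either name.
func isSubdomainOf(domain, zone string) bool {
	d := strings.ToLower(strings.TrimSuffix(domain, "."))
	z := strings.ToLower(strings.TrimSuffix(zone, "."))
	if d == "" || z == "" {
		return false
	}
	// Require a label boundary so "badexample.com" is not
	// treated as a subdomain of "example.com".
	return d == z || strings.HasSuffix(d, "."+z)
}

func main() {
	zone := "qe.azure.devcluster.openshift.com"
	domains := []string{
		"*.lb-ext.example.com.",
		"*.apps.qe.azure.devcluster.openshift.com.",
	}
	for _, domain := range domains {
		if !isSubdomainOf(domain, zone) {
			// Warn only; the record is still handed to the provider.
			fmt.Printf("domain %q is not a subdomain of zone %q\n", domain, zone)
		}
	}
}
```

With the zone from the second log line, only '*.lb-ext.example.com.' triggers the warning; the record is not rejected, which matches the log-only behavior shown above.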

However, my actual expectation is that the record should fail to provision, just like the behaviour on AWS, e.g.:

$ oc -n openshift-ingress-operator get dnsrecords lb-ext-wildcard -oyaml
  observedGeneration: 1
  - conditions:
    - lastTransitionTime: "2021-04-19T09:46:40Z"
      message: "The DNS provider failed to ensure the record: failed to update alias
        in zone Z0143xxxxx: couldn't update DNS record in zone Z0143xxxxx:
        InvalidChangeBatch: [RRSet with DNS name \\052.lb-ext.example.com. is not
        permitted in zone hongli-aw.qe.devcluster.openshift.com.]\n\tstatus code:
        400, request id: af32c104-53a3-46f6-9d6f-3fa7dbb0735a"
      reason: ProviderError
      status: "True"
      type: Failed
        Name: hongli-aw-pgzgb-int
        kubernetes.io/cluster/hongli-aw-pgzgb: owned
  - conditions:
    - lastTransitionTime: "2021-04-19T09:46:40Z"
      message: "The DNS provider failed to ensure the record: failed to update alias
        in zone Z3B3Kxxxxx: couldn't update DNS record in zone Z3B3Kxxxxx:
        InvalidChangeBatch: [RRSet with DNS name \\052.lb-ext.example.com. is not
        permitted in zone qe.devcluster.openshift.com.]\n\tstatus code: 400, request
        id: 252ecdcc-7891-47d9-bf0e-00f37bb8be42"
      reason: ProviderError
      status: "True"
      type: Failed
      id: Z3B3Kxxxxx

$ oc -n openshift-ingress-operator get ingresscontroller/lb-ext -oyaml
  - lastTransitionTime: "2021-04-19T09:45:09Z"
    message: 'The record failed to provision in some zones: [{ map[Name:hongli-aw-pgzgb-int
      kubernetes.io/cluster/hongli-aw-pgzgb:owned]} {Z3B3Kxxxxx map[]}]'
    reason: FailedZones
    status: "False"
    type: DNSReady

I'm not sure if this can be implemented on Azure, could you please help confirm?

Comment 7 Stephen Greene 2021-04-19 13:35:27 UTC
(In reply to Hongan Li from comment #6)
> I'm not sure if this can be implemented on Azure, could you please help
> confirm?

We decided that we do not want to break any customers who might be relying on the current Azure DNS publishing behavior. I spoke with my team and we decided that logging in this case is sufficient. Unfortunately, it may be too late to align the behavior on Azure with the behavior on AWS and other platforms, since we don't want to break any customers on upgrade by restricting Azure DNS capabilities for an existing ingress controller.

Does that sound reasonable to you @Hongan?

Comment 8 Hongan Li 2021-04-19 14:19:04 UTC
Thank you for your detailed explanation, Stephen. It makes sense.

Comment 9 Hongan Li 2021-04-19 14:21:49 UTC
Moving to VERIFIED per comments #6 and #7.

Comment 12 errata-xmlrpc 2021-07-27 22:36:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

