Description of problem:
Create a custom ingresscontroller with an invalid domain; the dnsrecords for the invalid domain can still be published to Azure DNS.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-21-172657

How reproducible:
100%

Steps to Reproduce:
1. create a custom ingresscontroller with an invalid domain, e.g.

kind: IngressController
apiVersion: operator.openshift.io/v1
metadata:
  name: lb-ext
  namespace: openshift-ingress-operator
spec:
  defaultCertificate:
    name: router-certs-default
  domain: lb-ext.example.com
  replicas: 1

Actual results:
$ oc -n openshift-ingress-operator get dnsrecords/lb-ext-wildcard -oyaml
spec:
  dnsName: '*.lb-ext.example.com.'
  recordTTL: 30
  recordType: A
  targets:
  - 52.162.218.201
status:
  observedGeneration: 1
  zones:
  - conditions:
    - lastTransitionTime: "2021-01-22T08:08:52Z"
      message: The DNS provider succeeded in ensuring the record
      reason: ProviderSuccess
      status: "False"
      type: Failed
    dnsZone:
      id: /subscriptions/xxxx/resourceGroups/hongli-pl657-4284r-rg/providers/Microsoft.Network/privateDnsZones/hongli-pl657.qe.azure.devcluster.openshift.com
  - conditions:
    - lastTransitionTime: "2021-01-22T08:08:53Z"
      message: The DNS provider succeeded in ensuring the record
      reason: ProviderSuccess
      status: "False"
      type: Failed
    dnsZone:
      id: /subscriptions/xxxx/resourceGroups/os4-common/providers/Microsoft.Network/dnszones/qe.azure.devcluster.openshift.com

Expected results:
the dnsName '*.lb-ext.example.com.' should not be provisioned to the dnsZone.

Additional info:
no issue on AWS and GCP
I was able to reproduce this on an Azure cluster launched from cluster-bot (CI). I believe I have identified the code defect that allows this. I will push a change and open a PR soon.
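For illustration only, a minimal sketch (in Go) of the kind of subdomain check the fix turns on; the helper name isSubdomainOf and the standalone main are assumptions for this example, not the operator's actual code:

package main

import (
	"fmt"
	"strings"
)

// isSubdomainOf reports whether name equals zone or is a subdomain of it.
// Both inputs are normalized by dropping the trailing dot and lowercasing.
func isSubdomainOf(name, zone string) bool {
	n := strings.ToLower(strings.TrimSuffix(name, "."))
	z := strings.ToLower(strings.TrimSuffix(zone, "."))
	return n == z || strings.HasSuffix(n, "."+z)
}

func main() {
	// The record from this bug: a wildcard under an unrelated domain.
	fmt.Println(isSubdomainOf("*.lb-ext.example.com.", "qe.azure.devcluster.openshift.com")) // false
	// A record that actually belongs to the cluster's zone.
	fmt.Println(isSubdomainOf("*.apps.hongli-pl657.qe.azure.devcluster.openshift.com.", "qe.azure.devcluster.openshift.com")) // true
}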
Hello Stephen, I got the logs below, which are as expected per https://github.com/openshift/cluster-ingress-operator/pull/537:

2021-04-19T07:51:40.173Z INFO operator.dns azure/dns.go:78 domain is not a subdomain of zone. The DNS provider may still succeed in updating the record, which might be unexpected {"domain": "*.lb-ext.example.com.", "zone": "hongli-iaz.qe.azure.devcluster.openshift.com"}
2021-04-19T07:51:41.041Z INFO operator.dns azure/dns.go:78 domain is not a subdomain of zone. The DNS provider may still succeed in updating the record, which might be unexpected {"domain": "*.lb-ext.example.com.", "zone": "qe.azure.devcluster.openshift.com"}

However, my expectation is that provisioning should fail, just like the behaviour on AWS, e.g.:

$ oc -n openshift-ingress-operator get dnsrecords lb-ext-wildcard -oyaml
<---snip--->
status:
  observedGeneration: 1
  zones:
  - conditions:
    - lastTransitionTime: "2021-04-19T09:46:40Z"
      message: "The DNS provider failed to ensure the record: failed to update alias in zone Z0143xxxxx: couldn't update DNS record in zone Z0143xxxxx: InvalidChangeBatch: [RRSet with DNS name \\052.lb-ext.example.com. is not permitted in zone hongli-aw.qe.devcluster.openshift.com.]\n\tstatus code: 400, request id: af32c104-53a3-46f6-9d6f-3fa7dbb0735a"
      reason: ProviderError
      status: "True"
      type: Failed
    dnsZone:
      tags:
        Name: hongli-aw-pgzgb-int
        kubernetes.io/cluster/hongli-aw-pgzgb: owned
  - conditions:
    - lastTransitionTime: "2021-04-19T09:46:40Z"
      message: "The DNS provider failed to ensure the record: failed to update alias in zone Z3B3Kxxxxx: couldn't update DNS record in zone Z3B3Kxxxxx: InvalidChangeBatch: [RRSet with DNS name \\052.lb-ext.example.com. is not permitted in zone qe.devcluster.openshift.com.]\n\tstatus code: 400, request id: 252ecdcc-7891-47d9-bf0e-00f37bb8be42"
      reason: ProviderError
      status: "True"
      type: Failed
    dnsZone:
      id: Z3B3Kxxxxx

$ oc -n openshift-ingress-operator get ingresscontroller/lb-ext -oyaml
<---snip--->
  - lastTransitionTime: "2021-04-19T09:45:09Z"
    message: 'The record failed to provision in some zones: [{ map[Name:hongli-aw-pgzgb-int kubernetes.io/cluster/hongli-aw-pgzgb:owned]} {Z3B3Kxxxxx map[]}]'
    reason: FailedZones
    status: "False"
    type: DNSReady

I'm not sure if this can be implemented on Azure, could you please help confirm?
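To make the expectation concrete, here is a hypothetical sketch of the fail-fast variant, reusing the isSubdomainOf helper sketched in an earlier comment; ensureRecord and its signature are made up for illustration and are not the operator's real Azure provider API:

// ensureRecord is a hypothetical stand-in for the Azure provider's record
// upsert path. Instead of only logging, it returns an error when the record
// name is outside the zone, so the DNSRecord status would report Failed=True
// with reason ProviderError, as on AWS.
func ensureRecord(recordName, zoneName string) error {
	if !isSubdomainOf(recordName, zoneName) {
		return fmt.Errorf("record %q is not permitted in zone %q", recordName, zoneName)
	}
	// ...call the Azure DNS API to create or update the record set...
	return nil
}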
(In reply to Hongan Li from comment #6)
> I'm not sure if this can be implemented on Azure, could you please help
> confirm?

We decided that we do not want to break any customers who might be relying on the current Azure DNS publishing behavior. I spoke with my team, and we decided that logging in this case is sufficient. Unfortunately, it may be too late to align the behavior on Azure with the behavior on AWS, etc., since we don't want to break any customers on upgrade by restricting Azure DNS capabilities for an existing ingress controller. Does that sound reasonable to you, @Hongan?
Thank you for your detailed explanation, Stephen. It makes sense.
Moving to VERIFIED per comments #6 and #7.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438