Description of problem:
After the LoadBalancer service is recreated, it gets a new EXTERNAL-IP and the targets in the dnsrecords are updated as well, but Google Cloud DNS still keeps the old one.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-11-12-200927

How reproducible:
always

Steps to Reproduce:
1.
# oc -n openshift-ingress get svc
NAME             TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)                      AGE
router-default   LoadBalancer   172.30.42.89   23.251.151.137   80:30006/TCP,443:30384/TCP   48m

2.
# oc -n openshift-ingress delete svc/router-default
service "router-default" deleted

3.
# oc -n openshift-ingress get svc
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
router-default   LoadBalancer   172.30.20.193   35.192.182.31   80:31335/TCP,443:31801/TCP   65s

Actual results:
The Google Cloud DNS console still shows the old IP, and dig/nslookup confirm it:

# dig +short a.apps.hongli-gcp47.qe.gcp.devcluster.openshift.com
23.251.151.137

# oc -n openshift-ingress-operator get dnsrecords -oyaml
<---snip--->
  spec:
    dnsName: '*.apps.hongli-gcp47.qe.gcp.devcluster.openshift.com.'
    recordTTL: 30
    recordType: A
    targets:
    - 35.192.182.31
  status:
    observedGeneration: 1
    zones:
    - conditions:
      - lastTransitionTime: "2020-11-17T01:20:25Z"
        message: The DNS provider succeeded in ensuring the record
        reason: ProviderSuccess
        status: "False"
        type: Failed
      dnsZone:
        id: hongli-gcp47-ct5pr-private-zone
    - conditions:
      - lastTransitionTime: "2020-11-17T01:20:25Z"
        message: The DNS provider succeeded in ensuring the record
        reason: ProviderSuccess
        status: "False"
        type: Failed
      dnsZone:
        id: qe
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Expected results:
The targets should be updated in Google Cloud DNS.

Additional info:
Tested on AWS/Azure and it works well.
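The stale record can also be confirmed from the command line rather than the Cloud DNS console; a hedged example, using the private zone ID from the DNSRecord status above (assumes the gcloud CLI is authenticated against the cluster's GCP project):

# gcloud dns record-sets list --zone=hongli-gcp47-ct5pr-private-zone --name='*.apps.hongli-gcp47.qe.gcp.devcluster.openshift.com.'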
This is a known shortcoming in the GCP DNS provider: https://github.com/openshift/cluster-ingress-operator/blob/c00aa8159a782e94f2169bec29f0d1495bd965b5/pkg/dns/gcp/provider.go#L53

As a workaround, if you delete the DNSRecord object, the operator should delete the old record, create a new DNSRecord object, and publish the new record.

In general, this issue should not arise unless the administrator explicitly deletes the LoadBalancer service (as documented in the steps to reproduce). Given that the steps to reproduce include deliberate sabotage and that there is a workaround, this is a low-severity issue.
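To make the workaround concrete, a sketch of the steps (assuming the default IngressController's wildcard record, which the operator names default-wildcard, as shown in the transcript later in this bug; verify the name with the get command first):

# oc -n openshift-ingress-operator get dnsrecords
# oc -n openshift-ingress-operator delete dnsrecord default-wildcard

The operator's finalizer handling should then deregister the stale record from the provider, recreate the DNSRecord, and publish the current LoadBalancer IP.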
@misalunk maybe this is a duplicate of bug #1899435. If it is, let's keep the high-priority bug open.
WIP https://github.com/openshift/cluster-ingress-operator/pull/500
*** Bug 1899435 has been marked as a duplicate of this bug. ***
https://github.com/openshift/cluster-ingress-operator/pull/500 seems to be working fine.

[miheer@miheer cluster-ingress-operator]$ export KUBECONFIG=/home/miheer/Downloads/cluster-bot-2020-12-15-101217.kubeconfig
[miheer@miheer cluster-ingress-operator]$ oc whoami
system:admin
[miheer@miheer cluster-ingress-operator]$ oc get dnsrecord -n openshift-ingress-operator -o yaml
apiVersion: v1
items:
- apiVersion: ingress.operator.openshift.io/v1
  kind: DNSRecord
  metadata:
    creationTimestamp: "2020-12-15T10:34:12Z"
    finalizers:
    - operator.openshift.io/ingress-dns
    generation: 1
    labels:
      ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
    managedFields:
    - apiVersion: ingress.operator.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"operator.openshift.io/ingress-dns": {}
          f:labels:
            .: {}
            f:ingresscontroller.operator.openshift.io/owning-ingresscontroller: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"108ecb74-d348-474f-a76a-8f3fc0869cc5"}:
              .: {}
              f:apiVersion: {}
              f:blockOwnerDeletion: {}
              f:controller: {}
              f:kind: {}
              f:name: {}
              f:uid: {}
        f:spec:
          .: {}
          f:dnsName: {}
          f:recordTTL: {}
          f:recordType: {}
          f:targets: {}
        f:status:
          .: {}
          f:observedGeneration: {}
          f:zones: {}
      manager: ingress-operator
      operation: Update
      time: "2020-12-15T10:34:13Z"
    name: default-wildcard
    namespace: openshift-ingress-operator
    ownerReferences:
    - apiVersion: operator.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: IngressController
      name: default
      uid: 108ecb74-d348-474f-a76a-8f3fc0869cc5
    resourceVersion: "17570"
    uid: 6e8e9139-452b-42bb-a07b-63039d9e6f46
  spec:
    dnsName: '*.apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com.'
    recordTTL: 30
    recordType: A
    targets:
    - 34.75.69.247
  status:
    observedGeneration: 1
    zones:
    - conditions:
      - lastTransitionTime: "2020-12-15T10:34:12Z"
        message: The DNS provider succeeded in ensuring the record
        reason: ProviderSuccess
        status: "False"
        type: Failed
      dnsZone:
        id: ci-ln-y8spvzb-f76d1-2nq4q-private-zone
    - conditions:
      - lastTransitionTime: "2020-12-15T10:34:12Z"
        message: The DNS provider succeeded in ensuring the record
        reason: ProviderSuccess
        status: "False"
        type: Failed
      dnsZone:
        id: origin-ci-int-gce-new
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[miheer@miheer cluster-ingress-operator]$ dig +short .apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com.
dig: '.apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com.' is not a legal name (empty label)
[miheer@miheer cluster-ingress-operator]$ dig +short .apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com
dig: '.apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com' is not a legal name (empty label)
[miheer@miheer cluster-ingress-operator]$ dig +short *.apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com
34.75.69.247
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                      AGE
router-default            LoadBalancer   172.30.109.148   34.75.69.247   80:32350/TCP,443:30397/TCP   17m
router-internal-default   ClusterIP      172.30.197.30    <none>         80/TCP,443/TCP,1936/TCP      17m
[miheer@miheer cluster-ingress-operator]$ oc delete svc router-default -n openshift-ingress
service "router-default" deleted
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   5s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      18m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   8s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      18m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   11s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      18m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   16s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   18s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   20s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   27s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   29s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   31s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   34.74.201.134   80:30082/TCP,443:31301/TCP   45s
router-internal-default   ClusterIP      172.30.197.30   <none>          80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   34.74.201.134   80:30082/TCP,443:31301/TCP   52s
router-internal-default   ClusterIP      172.30.197.30   <none>          80/TCP,443/TCP,1936/TCP      19m
Yes, tested with https://github.com/openshift/cluster-ingress-operator/pull/500 and it passed.

# oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.ci.test-2020-12-16-080558-ci-ln-nd3i04k   True        False         40m     Cluster version is 4.7.0-0.ci.test-2020-12-16-080558-ci-ln-nd3i04k

# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
router-default            LoadBalancer   172.30.226.221   35.227.105.206   80:31583/TCP,443:32306/TCP   19m
router-internal-default   ClusterIP      172.30.14.242    <none>           80/TCP,443/TCP,1936/TCP      48m

# nslookup downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Server:     10.11.5.19
Address:    10.11.5.19#53

Non-authoritative answer:
Name:   downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Address: 35.227.105.206

### change endpointPublishingStrategy.loadBalancer.scope to "Internal"
# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.226.221   10.0.32.6     80:31583/TCP,443:32306/TCP   25m
router-internal-default   ClusterIP      172.30.14.242    <none>        80/TCP,443/TCP,1936/TCP      54m

# nslookup downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Server:     10.11.5.19
Address:    10.11.5.19#53

Non-authoritative answer:
Name:   downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Address: 10.0.32.6

### change back endpointPublishingStrategy.loadBalancer.scope to "External"
# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
router-default            LoadBalancer   172.30.226.221   35.196.250.222   80:31583/TCP,443:32306/TCP   28m
router-internal-default   ClusterIP      172.30.14.242    <none>           80/TCP,443/TCP,1936/TCP      57m

# nslookup downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Server:     10.11.5.19
Address:    10.11.5.19#53

Non-authoritative answer:
Name:   downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Address: 35.196.250.222

Deleting the LB service also works well.
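For reference, a hedged sketch of how the scope flips above can be applied, patching the same field named in the transcript (note the later comments: after the revert of mutable scope, this path can no longer be used to verify the fix):

# oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge \
    -p '{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"Internal"}}}}'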
Bug https://bugzilla.redhat.com/show_bug.cgi?id=1914127 can be fixed in 4.8, as it is not important enough to fix in 4.7.

Hongan Li, can you please test this PR again? You will need to delete the finalizers on the service before deleting the service.
Please note: the test steps in Comment 7 are no longer valid, since https://bugzilla.redhat.com/show_bug.cgi?id=1906560 reverted the mutable ingress load-balancer scope feature, so this cannot be verified by changing endpointPublishingStrategy.loadBalancer.scope. Deleting the service is still a valid test, but please make sure to remove the finalizers from the service when deleting it.
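A sketch of that deletion procedure, using standard oc syntax (service name and namespace taken from the earlier transcripts; check the finalizer list first):

# oc -n openshift-ingress get svc router-default -o jsonpath='{.metadata.finalizers}'
# oc -n openshift-ingress patch svc router-default --type=merge -p '{"metadata":{"finalizers":null}}'
# oc -n openshift-ingress delete svc router-default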
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633