Bug 1898417 - GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service
Summary: GCP: the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Miheer Salunke
QA Contact: Hongan Li
URL:
Whiteboard:
Duplicates: 1899435
Depends On: 1914127
Blocks:
 
Reported: 2020-11-17 02:39 UTC by Hongan Li
Modified: 2022-08-04 22:30 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:33:36 UTC
Target Upstream Version:
Embargoed:




Links
System ID / Status / Summary / Last Updated:
GitHub openshift/cluster-ingress-operator pull 500 (open): "Bug 1898417: GCP the dns targets in Google Cloud DNS is not updated after recreating loadbalancer service" (last updated 2021-01-12 01:27:56 UTC)
Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:34:06 UTC)

Description Hongan Li 2020-11-17 02:39:52 UTC
Description of problem:
After recreating the LoadBalancer service, it gets a new External-IP and the targets in the dnsrecords resource are updated as well, but Google Cloud DNS still keeps the old IP.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-11-12-200927

How reproducible:
always

Steps to Reproduce:
1. # oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
router-default            LoadBalancer   172.30.42.89     23.251.151.137   80:30006/TCP,443:30384/TCP   48m

2. # oc -n openshift-ingress delete svc/router-default
service "router-default" deleted

3. # oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.20.193    35.192.182.31   80:31335/TCP,443:31801/TCP   65s


Actual results:
Checked the Google Cloud DNS console and it still shows the old IP; dig/nslookup also confirm:

# dig +short a.apps.hongli-gcp47.qe.gcp.devcluster.openshift.com
23.251.151.137
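
The stale record can also be checked directly in Cloud DNS with the gcloud CLI; a minimal sketch, assuming gcloud is authenticated against the cluster's GCP project and using the private zone id from the dnsrecord status below (the public zone, id "qe", can be queried the same way):

# gcloud dns record-sets list --zone hongli-gcp47-ct5pr-private-zone \
    --name '*.apps.hongli-gcp47.qe.gcp.devcluster.openshift.com.' --type A

The NAME/TYPE/TTL/DATA output still lists the old target, 23.251.151.137.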


# oc -n openshift-ingress-operator get dnsrecords -oyaml
<---snip--->
  spec:
    dnsName: '*.apps.hongli-gcp47.qe.gcp.devcluster.openshift.com.'
    recordTTL: 30
    recordType: A
    targets:
    - 35.192.182.31
  status:
    observedGeneration: 1
    zones:
    - conditions:
      - lastTransitionTime: "2020-11-17T01:20:25Z"
        message: The DNS provider succeeded in ensuring the record
        reason: ProviderSuccess
        status: "False"
        type: Failed
      dnsZone:
        id: hongli-gcp47-ct5pr-private-zone
    - conditions:
      - lastTransitionTime: "2020-11-17T01:20:25Z"
        message: The DNS provider succeeded in ensuring the record
        reason: ProviderSuccess
        status: "False"
        type: Failed
      dnsZone:
        id: qe
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


Expected results:
The targets should be updated in Google Cloud DNS.

Additional info:
Tested on AWS/Azure and it works well.

Comment 2 Miciah Dashiel Butler Masters 2020-11-17 17:20:54 UTC
This is a known shortcoming in the GCP DNS provider:  https://github.com/openshift/cluster-ingress-operator/blob/c00aa8159a782e94f2169bec29f0d1495bd965b5/pkg/dns/gcp/provider.go#L53

As a workaround, if you delete the DNSRecord object, the operator should delete the old record, create a new DNSRecord object, and publish the new record.
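
A minimal sketch of that workaround, assuming the default IngressController (its DNSRecord is named default-wildcard, as the DNSRecord output later in this bug shows):

# oc -n openshift-ingress-operator delete dnsrecord default-wildcard
# oc -n openshift-ingress-operator get dnsrecords -o yaml

After the delete, the operator should recreate the DNSRecord with the current LoadBalancer IP as its target; the second command is only there to confirm the new targets and zone conditions.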

In general, this issue should not arise unless the administrator explicitly deletes the LoadBalancer service (as documented in the steps to reproduce).  Given that the steps to reproduce include deliberate sabotage and that there is a workaround, this is a low-severity issue.

Comment 3 Andrew McDermott 2020-11-24 17:40:46 UTC
@misalunk, maybe this is a duplicate of bug #1899435. If it is, then let's keep the high-priority bug open.

Comment 5 Miheer Salunke 2020-12-08 10:34:12 UTC
*** Bug 1899435 has been marked as a duplicate of this bug. ***

Comment 6 Miheer Salunke 2020-12-15 10:55:03 UTC
https://github.com/openshift/cluster-ingress-operator/pull/500  seems to be working fine.


[miheer@miheer cluster-ingress-operator]$ export KUBECONFIG=/home/miheer/Downloads/cluster-bot-2020-12-15-101217.kubeconfig
[miheer@miheer cluster-ingress-operator]$ oc whoami
system:admin
[miheer@miheer cluster-ingress-operator]$ oc get dnsrecord -n openshift-ingress-operator -o yaml
apiVersion: v1
items:
- apiVersion: ingress.operator.openshift.io/v1
  kind: DNSRecord
  metadata:
    creationTimestamp: "2020-12-15T10:34:12Z"
    finalizers:
    - operator.openshift.io/ingress-dns
    generation: 1
    labels:
      ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
    managedFields:
    - apiVersion: ingress.operator.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"operator.openshift.io/ingress-dns": {}
          f:labels:
            .: {}
            f:ingresscontroller.operator.openshift.io/owning-ingresscontroller: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"108ecb74-d348-474f-a76a-8f3fc0869cc5"}:
              .: {}
              f:apiVersion: {}
              f:blockOwnerDeletion: {}
              f:controller: {}
              f:kind: {}
              f:name: {}
              f:uid: {}
        f:spec:
          .: {}
          f:dnsName: {}
          f:recordTTL: {}
          f:recordType: {}
          f:targets: {}
        f:status:
          .: {}
          f:observedGeneration: {}
          f:zones: {}
      manager: ingress-operator
      operation: Update
      time: "2020-12-15T10:34:13Z"
    name: default-wildcard
    namespace: openshift-ingress-operator
    ownerReferences:
    - apiVersion: operator.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: IngressController
      name: default
      uid: 108ecb74-d348-474f-a76a-8f3fc0869cc5
    resourceVersion: "17570"
    uid: 6e8e9139-452b-42bb-a07b-63039d9e6f46
  spec:
    dnsName: '*.apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com.'
    recordTTL: 30
    recordType: A
    targets:
    - 34.75.69.247
  status:
    observedGeneration: 1
    zones:
    - conditions:
      - lastTransitionTime: "2020-12-15T10:34:12Z"
        message: The DNS provider succeeded in ensuring the record
        reason: ProviderSuccess
        status: "False"
        type: Failed
      dnsZone:
        id: ci-ln-y8spvzb-f76d1-2nq4q-private-zone
    - conditions:
      - lastTransitionTime: "2020-12-15T10:34:12Z"
        message: The DNS provider succeeded in ensuring the record
        reason: ProviderSuccess
        status: "False"
        type: Failed
      dnsZone:
        id: origin-ci-int-gce-new
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[miheer@miheer cluster-ingress-operator]$ dig +short .apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com.
dig: '.apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com.' is not a legal name (empty label)
[miheer@miheer cluster-ingress-operator]$ dig +short .apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com
dig: '.apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com' is not a legal name (empty label)
[miheer@miheer cluster-ingress-operator]$ dig +short *.apps.ci-ln-y8spvzb-f76d1.origin-ci-int-gce.dev.openshift.com
34.75.69.247
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                      AGE
router-default            LoadBalancer   172.30.109.148   34.75.69.247   80:32350/TCP,443:30397/TCP   17m
router-internal-default   ClusterIP      172.30.197.30    <none>         80/TCP,443/TCP,1936/TCP      17m
[miheer@miheer cluster-ingress-operator]$ oc delete svc router-default -n openshift-ingress
service "router-default" deleted
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   5s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      18m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   8s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      18m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   11s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      18m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   16s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   18s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   20s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   27s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   29s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   <pending>     80:30082/TCP,443:31301/TCP   31s
router-internal-default   ClusterIP      172.30.197.30   <none>        80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   34.74.201.134   80:30082/TCP,443:31301/TCP   45s
router-internal-default   ClusterIP      172.30.197.30   <none>          80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$ oc get svc -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
router-default            LoadBalancer   172.30.65.104   34.74.201.134   80:30082/TCP,443:31301/TCP   52s
router-internal-default   ClusterIP      172.30.197.30   <none>          80/TCP,443/TCP,1936/TCP      19m
[miheer@miheer cluster-ingress-operator]$

Comment 7 Hongan Li 2020-12-16 09:28:02 UTC
Yes, tested with https://github.com/openshift/cluster-ingress-operator/pull/500 and it passed.

# oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.ci.test-2020-12-16-080558-ci-ln-nd3i04k   True        False         40m     Cluster version is 4.7.0-0.ci.test-2020-12-16-080558-ci-ln-nd3i04k

# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
router-default            LoadBalancer   172.30.226.221   35.227.105.206   80:31583/TCP,443:32306/TCP   19m
router-internal-default   ClusterIP      172.30.14.242    <none>           80/TCP,443/TCP,1936/TCP      48m

# nslookup downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Server:		10.11.5.19
Address:	10.11.5.19#53

Non-authoritative answer:
Name:	downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Address: 35.227.105.206

### change endpointPublishingStrategy.loadBalancer.scope to "Internal"
# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
router-default            LoadBalancer   172.30.226.221   10.0.32.6     80:31583/TCP,443:32306/TCP   25m
router-internal-default   ClusterIP      172.30.14.242    <none>        80/TCP,443/TCP,1936/TCP      54m
# nslookup downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Server:		10.11.5.19
Address:	10.11.5.19#53

Non-authoritative answer:
Name:	downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Address: 10.0.32.6


### change back endpointPublishingStrategy.loadBalancer.scope to "External"
# oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
router-default            LoadBalancer   172.30.226.221   35.196.250.222   80:31583/TCP,443:32306/TCP   28m
router-internal-default   ClusterIP      172.30.14.242    <none>           80/TCP,443/TCP,1936/TCP      57m

# nslookup downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Server:		10.11.5.19
Address:	10.11.5.19#53

Non-authoritative answer:
Name:	downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com
Address: 35.196.250.222
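
For reference, the scope changes above were made on the default IngressController; an equivalent patch, as a sketch (set scope to "Internal" or "External" as needed; note comment 12 below, the mutable scope feature was later reverted, so this method may not work on later builds):

# oc -n openshift-ingress-operator patch ingresscontroller default --type=merge \
    -p '{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"Internal"}}}}'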

Deleting the LB service also works well.
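
A sketch of that delete-and-recheck flow (hostnames and IPs are from this cluster; substitute your own apps domain elsewhere):

# oc -n openshift-ingress delete svc/router-default
# oc -n openshift-ingress get svc router-default
(repeat until EXTERNAL-IP leaves <pending>)
# nslookup downloads-openshift-console.apps.ci-ln-nd3i04k-f76d1.origin-ci-int-gce.dev.openshift.com

The nslookup answer should move to the new EXTERNAL-IP once the operator re-publishes the record (recordTTL is 30, so allow a short delay for caches).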

Comment 8 Miheer Salunke 2021-01-12 15:09:25 UTC
Bug https://bugzilla.redhat.com/show_bug.cgi?id=1914127 can be fixed in 4.8, as it is not important enough to require a fix in 4.7.

Hongan Li, can you please test this PR again?

You will need to remove the finalizers from the service before deleting the service.

Comment 12 Hongan Li 2021-01-14 08:45:03 UTC
Please note:
The test steps in comment 7 are no longer valid, since https://bugzilla.redhat.com/show_bug.cgi?id=1906560 reverted the mutable ingress load-balancer scope feature, so this cannot be verified by changing endpointPublishingStrategy.loadBalancer.scope.

Deleting the service is still a valid test, but please make sure to remove the finalizers from the service when deleting it; a sketch of one way to do that follows.
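
A sketch of that finalizer removal, assuming the default router service in openshift-ingress (the merge patch simply empties metadata.finalizers before the delete):

# oc -n openshift-ingress patch svc router-default --type=merge -p '{"metadata":{"finalizers":[]}}'
# oc -n openshift-ingress delete svc/router-default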

Comment 15 errata-xmlrpc 2021-02-24 15:33:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

