Description of problem:

Deleting a ClusterIngress before its DNS ALIAS record is created causes the delete operation to hang, leaving dependent resources orphaned. The source of the issue appears to be a reconciliation error when deleting the router service load balancer DNS record: because the record does not exist, reconciliation fails and stops trying to reconcile the resource.

Version-Release number of selected component (if applicable):

$ git log --oneline
1b4fa5a5 Merge pull request #132 from pravisankar/fix-retry-controller

How reproducible:

Always

Steps to Reproduce:
1. Create a clusteringress.
2. Before the clusteringress DNS record is created and associated with the router's service (type: LoadBalancer), delete the clusteringress.

Actual results:

The deletion hangs. The DeletionTimestamp is applied to the clusteringress, but the resource and its dependent resources are not deleted.

Expected results:

The deletion completes successfully.

Additional info:

Relevant operator log messages:

2019-02-26T21:12:39.934-0800 INFO operator.controller controller/controller.go:82 reconciling {"request": "openshift-ingress-operator/test1"}
2019-02-26T21:12:41.721-0800 ERROR operator.init.kubebuilder.controller controller/controller.go:217 Reconciler error {"controller": "operator-controller", "request": "openshift-ingress-operator/test1", "error": "failed to ensure ingress deletion: failed to finalize load balancer service for test1: [failed to delete DNS record &{{ map[Name:danehans-9nggd-int kubernetes.io/cluster/danehans-9nggd:owned]} ALIAS *.tests1.danehans.devcluster.openshift.com -> a53a055213a4c11e9adeb0a6b3bd6b3e-1017019273.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test1: failed to update alias in zone ZYPD4B0DM135S: couldn't update DNS record in zone ZYPD4B0DM135S: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests1.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: 4e6f07e1-3a4e-11e9-95f1-3f28621c9566, failed to delete DNS record &{{Z3URY6TWQ91KVV map[]} ALIAS *.tests1.danehans.devcluster.openshift.com -> a53a055213a4c11e9adeb0a6b3bd6b3e-1017019273.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test1: failed to update alias in zone Z3URY6TWQ91KVV: couldn't update DNS record in zone Z3URY6TWQ91KVV: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests1.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: 4ebcd95b-3a4e-11e9-9626-4f65dfa62605]", "errorCauses": [{"error": "failed to ensure ingress deletion: failed to finalize load balancer service for test1: [failed to delete DNS record &{{ map[Name:danehans-9nggd-int kubernetes.io/cluster/danehans-9nggd:owned]} ALIAS *.tests1.danehans.devcluster.openshift.com -> a53a055213a4c11e9adeb0a6b3bd6b3e-1017019273.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test1: failed to update alias in zone ZYPD4B0DM135S: couldn't update DNS record in zone ZYPD4B0DM135S: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests1.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: 4e6f07e1-3a4e-11e9-95f1-3f28621c9566, failed to delete DNS record &{{Z3URY6TWQ91KVV map[]} ALIAS *.tests1.danehans.devcluster.openshift.com -> a53a055213a4c11e9adeb0a6b3bd6b3e-1017019273.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test1: failed to update alias in zone Z3URY6TWQ91KVV: couldn't update DNS record in zone Z3URY6TWQ91KVV: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests1.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: 4ebcd95b-3a4e-11e9-9626-4f65dfa62605]"}]}
github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/zapr/zapr.go:128
github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217
github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
2019-02-26T21:12:42.723-0800 INFO operator.controller controller/controller.go:82 reconciling {"request": "openshift-ingress-operator/test0"}
2019-02-26T21:12:43.356-0800 INFO operator.dns aws/dns.go:271 skipping DNS record update {"record": {"Zone":{"tags":{"Name":"danehans-9nggd-int","kubernetes.io/cluster/danehans-9nggd":"owned"}},"Type":"ALIAS","Alias":{"Domain":"*.tests.danehans.devcluster.openshift.com","Target":"a56efc2813a4c11e9adeb0a6b3bd6b3e-1598625165.us-west-2.elb.amazonaws.com"}}}
2019-02-26T21:12:43.356-0800 INFO operator.controller controller/controller_dns.go:26 ensured DNS record for clusteringress {"namespace": "openshift-ingress-operator", "name": "test0", "record": {"Zone":{"tags":{"Name":"danehans-9nggd-int","kubernetes.io/cluster/danehans-9nggd":"owned"}},"Type":"ALIAS","Alias":{"Domain":"*.tests.danehans.devcluster.openshift.com","Target":"a56efc2813a4c11e9adeb0a6b3bd6b3e-1598625165.us-west-2.elb.amazonaws.com"}}}
2019-02-26T21:12:43.356-0800 INFO operator.dns aws/dns.go:271 skipping DNS record update {"record": {"Zone":{"id":"Z3URY6TWQ91KVV"},"Type":"ALIAS","Alias":{"Domain":"*.tests.danehans.devcluster.openshift.com","Target":"a56efc2813a4c11e9adeb0a6b3bd6b3e-1598625165.us-west-2.elb.amazonaws.com"}}}
2019-02-26T21:12:43.356-0800 INFO operator.controller controller/controller_dns.go:26 ensured DNS record for clusteringress {"namespace": "openshift-ingress-operator", "name": "test0", "record": {"Zone":{"id":"Z3URY6TWQ91KVV"},"Type":"ALIAS","Alias":{"Domain":"*.tests.danehans.devcluster.openshift.com","Target":"a56efc2813a4c11e9adeb0a6b3bd6b3e-1598625165.us-west-2.elb.amazonaws.com"}}}
2019-02-26T21:12:44.164-0800 DEBUG operator.init.kubebuilder.controller
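The log above suggests the DNS delete path is not idempotent: when the ALIAS record was never created, Route 53 rejects the DELETE change with InvalidChangeBatch, and the finalizer surfaces that as a hard error instead of treating an already-absent record as success. A minimal sketch of a tolerant delete, assuming the aws-sdk-go Route 53 client; the deleteAliasIdempotent wrapper and its signature are hypothetical, not the operator's actual code:

package dns

import (
	"strings"

	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/service/route53"
)

// deleteAliasIdempotent (hypothetical) submits a DELETE change batch and
// treats "record not found" as success so that ingress finalization can
// proceed even when the ALIAS record was never created.
func deleteAliasIdempotent(client *route53.Route53, input *route53.ChangeResourceRecordSetsInput) error {
	_, err := client.ChangeResourceRecordSets(input)
	if err == nil {
		return nil
	}
	if aerr, ok := err.(awserr.Error); ok &&
		aerr.Code() == route53.ErrCodeInvalidChangeBatch &&
		strings.Contains(aerr.Message(), "but it was not found") {
		// Nothing to delete; the record is already gone (or never existed).
		return nil
	}
	return err
}

Matching on the error message text is brittle; an alternative would be to check for the record with ListResourceRecordSets before issuing the delete.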
The issue also exists if you create a clusteringress and the service's LoadBalancer EXTERNAL-IP gets stuck in pending:

$ oc get svc -n openshift-ingress | grep test1
NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
router-internal-test1   ClusterIP      172.30.161.222   <none>        80/TCP,443/TCP,1936/TCP      6m31s
router-test1            LoadBalancer   172.30.87.116    <pending>     80:32051/TCP,443:30474/TCP   6m33s

You are unable to delete the associated clusteringress.

2019-02-27T08:58:08.204-0800 INFO operator.controller controller/controller.go:82 reconciling {"request": "openshift-ingress-operator/test1"}
2019-02-27T08:58:09.137-0800 ERROR operator.init.kubebuilder.controller controller/controller.go:217 Reconciler error {"controller": "operator-controller", "request": "openshift-ingress-operator/test1", "error": "failed to ensure ingress deletion: failed to finalize load balancer service for test1: no load balancer is assigned to service openshift-ingress/router-test1", "errorCauses": [{"error": "failed to ensure ingress deletion: failed to finalize load balancer service for test1: no load balancer is assigned to service openshift-ingress/router-test1"}]}
github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/zapr/zapr.go:128
github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217
github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
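Here the finalizer fails hard on "no load balancer is assigned to service", even though a load balancer that never materialized has no DNS records to clean up. A minimal sketch of the kind of guard that would let deletion proceed; the function name and the deleteDNS callback are illustrative, not the operator's real API:

package controller

import (
	corev1 "k8s.io/api/core/v1"
)

// finalizeLoadBalancerService (hypothetical) removes the DNS records that
// point at the service's load balancer. If the cloud provider never assigned
// a load balancer, there is nothing to clean up, so finalization succeeds
// immediately instead of blocking deletion of the clusteringress.
func finalizeLoadBalancerService(service *corev1.Service, deleteDNS func(hostname string) error) error {
	ingress := service.Status.LoadBalancer.Ingress
	if len(ingress) == 0 || ingress[0].Hostname == "" {
		// EXTERNAL-IP is still <pending>; no DNS records were ever created.
		return nil
	}
	return deleteDNS(ingress[0].Hostname)
}

With a guard like this, the finalizer could be removed and dependent resources garbage-collected even when the EXTERNAL-IP never left <pending>.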
If ingresscontroller 'delete' occurs during a DNS update:

2019-03-12T16:31:11.638-0700 ERROR operator.init.kubebuilder.controller controller/controller.go:217 Reconciler error {"controller": "operator-controller", "request": "openshift-ingress-operator/test0", "error": "failed to ensure ingress deletion: failed to finalize load balancer service for test0: [failed to delete DNS record &{{ map[Name:danehans-wpwp4-int kubernetes.io/cluster/danehans-wpwp4:owned]} ALIAS *.tests0.danehans.devcluster.openshift.com -> adfb4c320451e11e99bb706fb156f538-250013580.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test0: failed to delete alias in zone Z1O11RGK05PNBT: couldn't update DNS record in zone Z1O11RGK05PNBT: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests0.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: eb4454ee-451e-11e9-b04e-9b8331d6fb61, failed to delete DNS record &{{Z3URY6TWQ91KVV map[]} ALIAS *.tests0.danehans.devcluster.openshift.com -> adfb4c320451e11e99bb706fb156f538-250013580.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test0: failed to delete alias in zone Z3URY6TWQ91KVV: couldn't update DNS record in zone Z3URY6TWQ91KVV: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests0.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: eb7bdfdf-451e-11e9-83a1-a146a63c1bc2]", "errorCauses": [{"error": "failed to ensure ingress deletion: failed to finalize load balancer service for test0: [failed to delete DNS record &{{ map[Name:danehans-wpwp4-int kubernetes.io/cluster/danehans-wpwp4:owned]} ALIAS *.tests0.danehans.devcluster.openshift.com -> adfb4c320451e11e99bb706fb156f538-250013580.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test0: failed to delete alias in zone Z1O11RGK05PNBT: couldn't update DNS record in zone Z1O11RGK05PNBT: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests0.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: eb4454ee-451e-11e9-b04e-9b8331d6fb61, failed to delete DNS record &{{Z3URY6TWQ91KVV map[]} ALIAS *.tests0.danehans.devcluster.openshift.com -> adfb4c320451e11e99bb706fb156f538-250013580.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test0: failed to delete alias in zone Z3URY6TWQ91KVV: couldn't update DNS record in zone Z3URY6TWQ91KVV: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests0.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: eb7bdfdf-451e-11e9-83a1-a146a63c1bc2]"}]}

If ingresscontroller 'delete' occurs during a finalize:

2019-03-12T17:24:46.296-0700 ERROR operator.init.kubebuilder.controller controller/controller.go:217 Reconciler error {"controller": "operator-controller", "request": "openshift-ingress-operator/test0", "error": "failed to ensure ingress deletion: failed to finalize load balancer service for test0: no load balancer is assigned to service openshift-ingress/router-test0", "errorCauses": [{"error": "failed to ensure ingress deletion: failed to finalize load balancer service for test0: no load balancer is assigned to service openshift-ingress/router-test0"}]}
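Both traces show the reconciler returning a hard error for a condition that is expected during deletion (the record or load balancer simply does not exist), so kubebuilder retries with backoff and the finalizer never completes. If an operator instead wanted to wait out a transient condition rather than skip it, controller-runtime supports requeueing without logging a reconciler error; this is a generic sketch, independent of the actual fix:

package controller

import (
	"time"

	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// requeueAfter asks controller-runtime to retry the reconcile after a delay
// without recording a reconciler error, which is appropriate when waiting on
// an external dependency (for example, a load balancer hostname) rather than
// reacting to a failure.
func requeueAfter(delay time.Duration) (reconcile.Result, error) {
	return reconcile.Result{RequeueAfter: delay}, nil
}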
PR to fix bug: https://github.com/openshift/cluster-ingress-operator/pull/164
Verified with 4.0.0-0.nightly-2019-03-19-004004; the issue has been fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758