Bug 1683515 - Deleting a ClusterIngress before a DNS Alias record is created causes the operation to hang.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ---
: 4.1.0
Assignee: Daneyon Hansen
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-27 05:23 UTC by Daneyon Hansen
Modified: 2019-06-04 10:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:44:39 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:44:45 UTC
Github openshift cluster-ingress-operator pull 164 None None None 2019-03-13 23:26:47 UTC

Description Daneyon Hansen 2019-02-27 05:23:11 UTC
Description of problem:
Deleting a ClusterIngress before its DNS ALIAS record is created causes the delete operation to hang, which leaves dependent resources orphaned. The source of the issue appears to be a reconciliation error when the operator tries to delete the DNS record for the router service's load balancer. Because the record does not exist, the delete call fails, reconciliation returns an error, and the operator stops trying to reconcile the resource.
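
For illustration only, a minimal Go sketch of one way to make the DNS delete idempotent, so that a record that was never created does not block finalization. This is not the operator's code and may differ from the eventual fix. The names providerError, errRecordNotFound, classifyDeleteError, deleteFunc and ensureRecordDeleted are hypothetical, and matching on the message text is an assumption based on the InvalidChangeBatch error shown in the log below.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// providerError is a hypothetical stand-in for an error returned by the DNS
// provider client; only its message text is used here.
type providerError string

func (e providerError) Error() string { return string(e) }

// errRecordNotFound is a hypothetical sentinel meaning "the record is already gone".
var errRecordNotFound = errors.New("dns record not found")

// classifyDeleteError wraps err in errRecordNotFound when its message matches
// the "InvalidChangeBatch ... but it was not found" text seen in the operator
// log. Any other error is returned unchanged.
func classifyDeleteError(err error) error {
	if err == nil {
		return nil
	}
	msg := err.Error()
	if strings.Contains(msg, "InvalidChangeBatch") && strings.Contains(msg, "but it was not found") {
		return fmt.Errorf("%w: %v", errRecordNotFound, err)
	}
	return err
}

// deleteFunc is a hypothetical callback that asks the provider to delete the
// record for the given domain.
type deleteFunc func(domain string) error

// ensureRecordDeleted treats a missing record as a successful delete, so a
// finalizer that calls it can complete even if the record never existed.
func ensureRecordDeleted(del deleteFunc, domain string) error {
	err := classifyDeleteError(del(domain))
	if errors.Is(err, errRecordNotFound) {
		return nil
	}
	return err
}
```

With this shape, only unexpected provider errors (for example throttling) are returned to the reconciler and retried; "already absent" counts as done.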

Version-Release number of selected component (if applicable):
$ git log --oneline
1b4fa5a5 Merge pull request #132 from pravisankar/fix-retry-controller

How reproducible:
always

Steps to Reproduce:
1. Create a clusteringress.

2. Before the clusteringress DNS record is created and associated with the router's service (type: LoadBalancer), delete the clusteringress.

Actual results:
The deletion event hangs. The DeletionTimestamp is applied to the clusteringress, but the resource and its dependent resources are not deleted.

Expected results:
The deletion event completes successfully.

Additional info:

Relevant Operator Log Messages:
2019-02-26T21:12:39.934-0800	INFO	operator.controller	controller/controller.go:82	reconciling	{"request": "openshift-ingress-operator/test1"}
2019-02-26T21:12:41.721-0800	ERROR	operator.init.kubebuilder.controller	controller/controller.go:217	Reconciler error	{"controller": "operator-controller", "request": "openshift-ingress-operator/test1", "error": "failed to ensure ingress deletion: failed to finalize load balancer service for test1: [failed to delete DNS record &{{ map[Name:danehans-9nggd-int kubernetes.io/cluster/danehans-9nggd:owned]} ALIAS *.tests1.danehans.devcluster.openshift.com -> a53a055213a4c11e9adeb0a6b3bd6b3e-1017019273.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test1: failed to update alias in zone ZYPD4B0DM135S: couldn't update DNS record in zone ZYPD4B0DM135S: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests1.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: 4e6f07e1-3a4e-11e9-95f1-3f28621c9566, failed to delete DNS record &{{Z3URY6TWQ91KVV map[]} ALIAS *.tests1.danehans.devcluster.openshift.com -> a53a055213a4c11e9adeb0a6b3bd6b3e-1017019273.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test1: failed to update alias in zone Z3URY6TWQ91KVV: couldn't update DNS record in zone Z3URY6TWQ91KVV: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests1.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: 4ebcd95b-3a4e-11e9-9626-4f65dfa62605]", "errorCauses": [{"error": "failed to ensure ingress deletion: failed to finalize load balancer service for test1: [failed to delete DNS record &{{ map[Name:danehans-9nggd-int kubernetes.io/cluster/danehans-9nggd:owned]} ALIAS *.tests1.danehans.devcluster.openshift.com -> a53a055213a4c11e9adeb0a6b3bd6b3e-1017019273.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test1: failed to update alias in zone ZYPD4B0DM135S: couldn't update DNS record in zone ZYPD4B0DM135S: InvalidChangeBatch: [Tried to delete resource record set 
[name='\\052.tests1.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: 4e6f07e1-3a4e-11e9-95f1-3f28621c9566, failed to delete DNS record &{{Z3URY6TWQ91KVV map[]} ALIAS *.tests1.danehans.devcluster.openshift.com -> a53a055213a4c11e9adeb0a6b3bd6b3e-1017019273.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test1: failed to update alias in zone Z3URY6TWQ91KVV: couldn't update DNS record in zone Z3URY6TWQ91KVV: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests1.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: 4ebcd95b-3a4e-11e9-9626-4f65dfa62605]"}]}
github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/zapr/zapr.go:128
github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217
github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
2019-02-26T21:12:42.723-0800	INFO	operator.controller	controller/controller.go:82	reconciling	{"request": "openshift-ingress-operator/test0"}
2019-02-26T21:12:43.356-0800	INFO	operator.dns	aws/dns.go:271	skipping DNS record update	{"record": {"Zone":{"tags":{"Name":"danehans-9nggd-int","kubernetes.io/cluster/danehans-9nggd":"owned"}},"Type":"ALIAS","Alias":{"Domain":"*.tests.danehans.devcluster.openshift.com","Target":"a56efc2813a4c11e9adeb0a6b3bd6b3e-1598625165.us-west-2.elb.amazonaws.com"}}}
2019-02-26T21:12:43.356-0800	INFO	operator.controller	controller/controller_dns.go:26	ensured DNS record for clusteringress	{"namespace": "openshift-ingress-operator", "name": "test0", "record": {"Zone":{"tags":{"Name":"danehans-9nggd-int","kubernetes.io/cluster/danehans-9nggd":"owned"}},"Type":"ALIAS","Alias":{"Domain":"*.tests.danehans.devcluster.openshift.com","Target":"a56efc2813a4c11e9adeb0a6b3bd6b3e-1598625165.us-west-2.elb.amazonaws.com"}}}
2019-02-26T21:12:43.356-0800	INFO	operator.dns	aws/dns.go:271	skipping DNS record update	{"record": {"Zone":{"id":"Z3URY6TWQ91KVV"},"Type":"ALIAS","Alias":{"Domain":"*.tests.danehans.devcluster.openshift.com","Target":"a56efc2813a4c11e9adeb0a6b3bd6b3e-1598625165.us-west-2.elb.amazonaws.com"}}}
2019-02-26T21:12:43.356-0800	INFO	operator.controller	controller/controller_dns.go:26	ensured DNS record for clusteringress	{"namespace": "openshift-ingress-operator", "name": "test0", "record": {"Zone":{"id":"Z3URY6TWQ91KVV"},"Type":"ALIAS","Alias":{"Domain":"*.tests.danehans.devcluster.openshift.com","Target":"a56efc2813a4c11e9adeb0a6b3bd6b3e-1598625165.us-west-2.elb.amazonaws.com"}}}
2019-02-26T21:12:44.164-0800	DEBUG	operator.init.kubebuilder.controller

Comment 1 Daneyon Hansen 2019-02-27 17:02:00 UTC
The issue also exists if you create a clusteringress and the service LoadBalancer EXTERNAL-IP gets stuck in pending:

$ oc get svc -n openshift-ingress | grep test1
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                      AGE
router-internal-test1     ClusterIP      172.30.161.222   <none>                                                                   80/TCP,443/TCP,1936/TCP      6m31s
router-test1              LoadBalancer   172.30.87.116    <pending>                                                                80:32051/TCP,443:30474/TCP   6m33s

You are then unable to delete the associated clusteringress.

2019-02-27T08:58:08.204-0800	INFO	operator.controller	controller/controller.go:82	reconciling	{"request": "openshift-ingress-operator/test1"}
2019-02-27T08:58:09.137-0800	ERROR	operator.init.kubebuilder.controller	controller/controller.go:217	Reconciler error	{"controller": "operator-controller", "request": "openshift-ingress-operator/test1", "error": "failed to ensure ingress deletion: failed to finalize load balancer service for test1: no load balancer is assigned to service openshift-ingress/router-test1", "errorCauses": [{"error": "failed to ensure ingress deletion: failed to finalize load balancer service for test1: no load balancer is assigned to service openshift-ingress/router-test1"}]}
github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/zapr/zapr.go:128
github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217
github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until
	/Users/daneyonhansen/code/go/src/github.com/openshift/cluster-ingress-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
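
For illustration only, a minimal Go sketch of how a finalizer could tolerate the pending case above: if no load balancer was ever assigned to the service, there is no DNS record to remove, so finalization can succeed instead of returning "no load balancer is assigned". The service type and the finalizeLoadBalancerService signature are simplified, hypothetical stand-ins and do not reflect the operator's actual types.

```go
package main

import "fmt"

// service is a simplified, hypothetical stand-in for a Kubernetes Service of
// type LoadBalancer. LoadBalancerHostnames is empty while EXTERNAL-IP is <pending>.
type service struct {
	Name                  string
	LoadBalancerHostnames []string
}

// finalizeLoadBalancerService removes the DNS records that point at the
// service's load balancer. When no load balancer has been assigned yet, no
// record can have been created for it, so there is nothing to clean up and
// the function returns nil rather than an error.
func finalizeLoadBalancerService(svc service, deleteDNS func(target string) error) error {
	if len(svc.LoadBalancerHostnames) == 0 {
		return nil
	}
	for _, target := range svc.LoadBalancerHostnames {
		if err := deleteDNS(target); err != nil {
			return fmt.Errorf("failed to delete DNS record for %s -> %s: %v", svc.Name, target, err)
		}
	}
	return nil
}
```

The design choice here is that finalization only fails on cleanup work that was attempted and went wrong, never on the absence of something to clean up.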

Comment 3 Daneyon Hansen 2019-03-13 00:31:40 UTC
If ingresscontroller 'delete' occurs during a dns update:
2019-03-12T16:31:11.638-0700	ERROR	operator.init.kubebuilder.controller	controller/controller.go:217	Reconciler error	{"controller": "operator-controller", "request": "openshift-ingress-operator/test0", "error": "failed to ensure ingress deletion: failed to finalize load balancer service for test0: [failed to delete DNS record &{{ map[Name:danehans-wpwp4-int kubernetes.io/cluster/danehans-wpwp4:owned]} ALIAS *.tests0.danehans.devcluster.openshift.com -> adfb4c320451e11e99bb706fb156f538-250013580.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test0: failed to delete alias in zone Z1O11RGK05PNBT: couldn't update DNS record in zone Z1O11RGK05PNBT: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests0.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: eb4454ee-451e-11e9-b04e-9b8331d6fb61, failed to delete DNS record &{{Z3URY6TWQ91KVV map[]} ALIAS *.tests0.danehans.devcluster.openshift.com -> adfb4c320451e11e99bb706fb156f538-250013580.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test0: failed to delete alias in zone Z3URY6TWQ91KVV: couldn't update DNS record in zone Z3URY6TWQ91KVV: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests0.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: eb7bdfdf-451e-11e9-83a1-a146a63c1bc2]", "errorCauses": [{"error": "failed to ensure ingress deletion: failed to finalize load balancer service for test0: [failed to delete DNS record &{{ map[Name:danehans-wpwp4-int kubernetes.io/cluster/danehans-wpwp4:owned]} ALIAS *.tests0.danehans.devcluster.openshift.com -> adfb4c320451e11e99bb706fb156f538-250013580.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test0: failed to delete alias in zone Z1O11RGK05PNBT: couldn't update DNS record in zone Z1O11RGK05PNBT: InvalidChangeBatch: [Tried to delete resource record set 
[name='\\052.tests0.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: eb4454ee-451e-11e9-b04e-9b8331d6fb61, failed to delete DNS record &{{Z3URY6TWQ91KVV map[]} ALIAS *.tests0.danehans.devcluster.openshift.com -> adfb4c320451e11e99bb706fb156f538-250013580.us-west-2.elb.amazonaws.com} for ingress openshift-ingress-operator/test0: failed to delete alias in zone Z3URY6TWQ91KVV: couldn't update DNS record in zone Z3URY6TWQ91KVV: InvalidChangeBatch: [Tried to delete resource record set [name='\\052.tests0.danehans.devcluster.openshift.com.', type='A'] but it was not found]\n\tstatus code: 400, request id: eb7bdfdf-451e-11e9-83a1-a146a63c1bc2]"}]}

If ingresscontroller 'delete' occurs during a finalize:
2019-03-12T17:24:46.296-0700	ERROR	operator.init.kubebuilder.controller	controller/controller.go:217	Reconciler error	{"controller": "operator-controller", "request": "openshift-ingress-operator/test0", "error": "failed to ensure ingress deletion: failed to finalize load balancer service for test0: no load balancer is assigned to service openshift-ingress/router-test0", "errorCauses": [{"error": "failed to ensure ingress deletion: failed to finalize load balancer service for test0: no load balancer is assigned to service openshift-ingress/router-test0"}]}

Comment 4 Daneyon Hansen 2019-03-13 15:28:08 UTC
PR to fix bug: https://github.com/openshift/cluster-ingress-operator/pull/164

Comment 6 Hongan Li 2019-03-20 06:07:58 UTC
Verified with 4.0.0-0.nightly-2019-03-19-004004; the issue has been fixed.

Comment 8 errata-xmlrpc 2019-06-04 10:44:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

