Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1912413

Summary:

[AWS UPI] It takes about 10 minutes to delete a LoadBalancer service, and security group resources are left behind

Product:

OpenShift Container Platform

Reporter:

Hongan Li <hongli>

Component:

Networking

Assignee:

Candace Holman <cholman>

Networking sub component:

router

QA Contact:

Hongan Li <hongli>

Status:

CLOSED WONTFIX

Severity:

high

Priority:

medium

CC:

amcdermo, mfisher, mrbraga

Version:

4.7

Hardware:

Unspecified

OS:

Unspecified

Last Closed:

2022-11-04 15:05:55 UTC

Type:

Bug

Attachments:

Description	Flags
kube-controller-manager pod logs for IPI	none
kube-controller-manager pod logs for UPI	none

Description Hongan Li 2021-01-04 12:01:12 UTC

Description of problem:
It takes about 10 minutes to delete a custom ingresscontroller on the UPI AWS platform, but the same operation takes only about 20-30 seconds on the IPI AWS platform.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-21-131655

How reproducible:
100%

Steps to Reproduce:
1. Launch a UPI cluster on AWS.
2. Create a custom ingresscontroller, for example:

kind: IngressController
apiVersion: operator.openshift.io/v1
metadata:
  name: test
  namespace: openshift-ingress-operator
spec:
  defaultCertificate:
    name: router-certs-default
  domain: test.example.com
  replicas: 1
  endpointPublishingStrategy:
    loadBalancer:
      scope: Internal
    type: LoadBalancerService

3. Delete the custom ingresscontroller.

Actual results:
# time oc -n openshift-ingress-operator delete ingresscontroller/test
ingresscontroller.operator.openshift.io "test" deleted

real	10m9.935s
user	0m0.224s
sys	0m0.057s

During this time, the following errors appear in the ingress operator pod logs:
2021-01-04T11:16:28.511282308Z 2021-01-04T11:16:28.511Z	ERROR	operator.init.controller	controller/controller.go:218	Reconciler error	{"controller": "ingress_controller", "name": "test", "namespace": "openshift-ingress-operator", "error": "failed to ensure ingress deletion: load balancer service exists for ingress openshift-ingress-operator/test"}
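The operator presumably retries the reconcile and logs this error until the LoadBalancer service is gone, so the first and last occurrences in a saved operator log bracket the wait. A minimal sketch for extracting that window, assuming the fields are tab-separated as the excerpt above appears to be, and that the leading nanosecond timestamp is one added by something like `oc logs --timestamps`. `parse_entry` and `blocked_window` are hypothetical helper names, not part of any OpenShift tool.

```python
import json
from datetime import datetime

# The log excerpt from the description; tab separators between fields are an assumption.
LOG_LINE = (
    "2021-01-04T11:16:28.511282308Z 2021-01-04T11:16:28.511Z\tERROR\t"
    "operator.init.controller\tcontroller/controller.go:218\tReconciler error\t"
    '{"controller": "ingress_controller", "name": "test", '
    '"namespace": "openshift-ingress-operator", '
    '"error": "failed to ensure ingress deletion: load balancer service exists '
    'for ingress openshift-ingress-operator/test"}'
)

BLOCKED = "load balancer service exists"


def parse_entry(line):
    """Split one operator log line into (timestamp, level, message, fields)."""
    prefix, level, _logger, _caller, message, fields = line.rstrip("\n").split("\t", 5)
    # Keep the first (nanosecond) timestamp, truncated to microseconds for strptime.
    stamp = prefix.split(" ")[0][:26]
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")
    return when, level, message, json.loads(fields)


def blocked_window(lines):
    """Return (first, last) timestamps of errors caused by the lingering LB service, or None."""
    hits = []
    for line in lines:
        try:
            when, level, _message, fields = parse_entry(line)
        except ValueError:
            continue  # skip lines that do not match the assumed layout (JSON errors are ValueErrors too)
        if level == "ERROR" and BLOCKED in fields.get("error", ""):
            hits.append(when)
    return (min(hits), max(hits)) if hits else None


first, last = blocked_window([LOG_LINE])
print(f"blocked from {first} to {last} ({(last - first).total_seconds():.0f}s)")
```

Fed every line of a full operator log (for example from a hypothetical saved file `ingress-operator.log`), the returned window should roughly match the 10-minute wait; with only the single excerpt above, both ends are the same instant.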


Expected results:
Deletion on UPI AWS should take about the same time as on IPI AWS.
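To put the gap in numbers: the UPI `real` time of 10m9.935s is about 610 seconds, roughly 20 to 30 times the 20-30 seconds reported for IPI. A small sketch of that arithmetic, converting the bash `time` duration format shown above into seconds; `parse_time_real` is a hypothetical helper written for illustration.

```python
import re


def parse_time_real(value: str) -> float:
    """Convert a bash `time` duration such as '10m9.935s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes = int(match.group(1) or 0)  # the minutes part is optional
    return minutes * 60 + float(match.group(2))


upi_seconds = parse_time_real("10m9.935s")  # UPI measurement from the description
for ipi_seconds in (20, 30):  # IPI range quoted in the description
    ratio = upi_seconds / ipi_seconds
    print(f"UPI deletion took {ratio:.1f}x as long as an IPI deletion of {ipi_seconds}s")
```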

Additional info:

Comment 2 Candace Holman 2021-01-05 18:41:58 UTC

Hi Hongan,

I would like to rule out any issues with UPI -- can you please share the details of your UPI cluster installation on AWS?

Thanks,
Candace

Comment 6 Candace Holman 2021-01-12 00:00:07 UTC

Hi Hongan,

So far I have not found why this happens on UPI but not on IPI.

Can you check the AWS console for errors during the 10-minute wait? Specifically, what is blocking the security group from being deleted? If that is not possible, can you please provide an environment for me to test?

Thanks,
Candace

Comment 8 Candace Holman 2021-01-13 23:12:30 UTC

After examining the logs, it is clear that on a regular (IPI) AWS installation two security groups are deleted before the ingress controller is deleted,
and this does not appear in the logs when the ingress controller of a UPI installation is deleted.

I have found some differences in the security groups defined by the regular installer at
https://github.com/openshift/installer/blob/master/data/data/aws/vpc/sg-master.tf and
https://github.com/openshift/installer/blob/master/data/data/aws/vpc/sg-worker.tf that are not reflected
in the UPI YAML version here: https://github.com/openshift/installer/blob/master/upi/aws/cloudformation/03_cluster_security.yaml.

Checking with the installer team about it.
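Since the summary notes that security groups are left behind, one way to check for leftovers after the ingresscontroller is deleted is to scan the JSON from `aws ec2 describe-security-groups --output json` for the groups the Kubernetes AWS cloud provider creates for classic ELBs. A minimal sketch, assuming those groups follow the in-tree provider's `k8s-elb-` name prefix and carry a `kubernetes.io/cluster/<infra ID>` tag; `leftover_elb_security_groups` is a hypothetical helper, not part of any OpenShift or AWS tool.

```python
import json

# Assumed name prefix used by the in-tree Kubernetes AWS cloud provider for ELB security groups.
ELB_SG_PREFIX = "k8s-elb-"


def leftover_elb_security_groups(describe_output, infra_id=None):
    """List (GroupId, GroupName) of ELB security groups in describe-security-groups output.

    describe_output may be raw JSON text or an already-parsed dict.
    If infra_id is given, only groups tagged kubernetes.io/cluster/<infra_id> are kept.
    """
    data = json.loads(describe_output) if isinstance(describe_output, str) else describe_output
    found = []
    for group in data.get("SecurityGroups", []):
        if not group.get("GroupName", "").startswith(ELB_SG_PREFIX):
            continue
        tags = {tag["Key"]: tag["Value"] for tag in group.get("Tags", [])}
        if infra_id is not None and f"kubernetes.io/cluster/{infra_id}" not in tags:
            continue
        found.append((group["GroupId"], group["GroupName"]))
    return found
```

For example, save the CLI output to a file and pass its contents together with the cluster's infrastructure ID; any group still returned after the ingresscontroller and its LoadBalancer service are gone is a candidate leftover.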

Comment 10 Candace Holman 2021-01-15 15:03:01 UTC

Created attachment 1747845
kube-controller-manager pod logs for IPI

Comment 11 Candace Holman 2021-01-15 15:40:04 UTC

Created attachment 1747869
kube-controller-manager pod logs for UPI

Comment 12 Candace Holman 2021-01-19 18:59:49 UTC

Installer team PR: https://github.com/openshift/installer/pull/4552/

Comment 14 mfisher 2021-01-20 20:04:56 UTC

Updating target-release as it appears this will not make the 4.7 release schedule.

Comment 15 Candace Holman 2021-01-26 20:38:17 UTC

installer#4552 does not fix the issue. I tested a UPI AWS installation of 4.7.0-0.ci-2021-01-23-055147 today, and it shows the same issue.

Comment 23 Andrew McDermott 2021-04-19 15:30:35 UTC

Since we enabled the http2, grpc-interop, and h2spec tests in origin,
this bug causes CI test failures for AWS/UPI.

https://bugzilla.redhat.com/show_bug.cgi?id=1949978#c6

Comment 47 mfisher 2022-11-04 15:05:55 UTC

This issue is stale and has been closed because it has had no activity for a significant amount of time and is reported on a version no longer in maintenance. If this issue should not be closed, please verify that the condition still exists on a supported release and submit an updated bug.