Bug 1912413
| Summary: | [AWS UPI] it takes about 10 minutes to delete LoadBalancer service and security groups resources are left behind | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hongan Li <hongli> |
| Component: | Networking | Assignee: | Candace Holman <cholman> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED WONTFIX | Type: | Bug |
| Severity: | high | Priority: | medium |
| Version: | 4.7 | CC: | amcdermo, mfisher, mrbraga |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2022-11-04 15:05:55 UTC | | |
Hi Hongan,

I would like to rule out any issues with UPI. Can you please share the details of your UPI cluster installation on AWS?

Thanks,
Candace

Hi Hongan,

So far I have not discovered why this would be an issue on UPI but not on IPI. Can you check the AWS console for errors during the 10-minute wait? Specifically, what is blocking the security group from being deleted? If not, can you please provide an environment for me to test?

Thanks,
Candace

After examining the logs, it is clear that on a regular (IPI) AWS installation two security groups are deleted before the ingress controller is deleted, and this does not show up in the logs for the ingress controller deletion on a UPI installation. I have found some differences in the installer's security groups at https://github.com/openshift/installer/blob/master/data/data/aws/vpc/sg-master.tf and https://github.com/openshift/installer/blob/master/data/data/aws/vpc/sg-worker.tf that are not reflected in the UPI CloudFormation template at https://github.com/openshift/installer/blob/master/upi/aws/cloudformation/03_cluster_security.yaml. Checking with the installer team about it.

Created attachment 1747845
kube-controller-manager pod logs for IPI

Created attachment 1747869
kube-controller-manager pod logs for UPI
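On the open question of what blocks the security group deletion during the 10-minute wait: AWS rejects a security group deletion with a `DependencyViolation` error while a network interface still uses the group or another security group's rule still references it. The following is a minimal sketch, not a verified diagnosis, for listing both kinds of dependency with the AWS CLI. `sg_blockers` and `leftover_elb_sgs` are hypothetical helper names; the sketch assumes the AWS CLI is configured for the cluster's account and region, and `leftover_elb_sgs` additionally assumes the in-tree AWS cloud provider's `k8s-elb-<load-balancer-name>` naming for ELB security groups.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show what may prevent a security group from being deleted.
# Usage: sg_blockers <security-group-id>
sg_blockers() {
  local sg="$1"
  echo "Network interfaces still using ${sg}:"
  aws ec2 describe-network-interfaces \
    --filters "Name=group-id,Values=${sg}" \
    --query 'NetworkInterfaces[].[NetworkInterfaceId,Description]' \
    --output text
  echo "Security groups whose inbound rules reference ${sg}:"
  aws ec2 describe-security-groups \
    --filters "Name=ip-permission.group-id,Values=${sg}" \
    --query 'SecurityGroups[].[GroupId,GroupName]' \
    --output text
}

# Hypothetical helper: list security groups that look like leftover ELB groups.
# Assumption: the in-tree AWS cloud provider names them "k8s-elb-<lb-name>".
leftover_elb_sgs() {
  aws ec2 describe-security-groups \
    --filters "Name=group-name,Values=k8s-elb-*" \
    --query 'SecurityGroups[].[GroupId,GroupName,VpcId]' \
    --output text
}
```

For example, `sg_blockers sg-0123456789abcdef0` (placeholder ID). Only inbound references are checked; references from outbound rules would need the `egress.ip-permission.group-id` filter as well.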
Installer team PR: https://github.com/openshift/installer/pull/4552/

Updating target-release as it appears this will not make the 4.7 release schedule.

installer#4552 does not fix the issue. I tested a UPI AWS installation of 4.7.0-0.ci-2021-01-23-055147 today and it has the same issue.

Since we enabled the http2, grpc-interop and h2spec tests in origin, this bug causes CI tests for AWS/UPI to fail. https://bugzilla.redhat.com/show_bug.cgi?id=1949978#c6

This issue is stale and closed because it has had no activity for a significant amount of time and is reported on a version no longer in maintenance. If this issue should not be closed, please verify the condition still exists on a supported release and submit an updated bug.
Description of problem:
It takes about 10 minutes to delete a custom ingresscontroller on the UPI AWS platform, but the same operation takes only about 20-30 seconds on the IPI AWS platform.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-21-131655

How reproducible:
100%

Steps to Reproduce:
1. Launch a UPI cluster on AWS.
2. Create a custom ingresscontroller, e.g.:

    kind: IngressController
    apiVersion: operator.openshift.io/v1
    metadata:
      name: test
      namespace: openshift-ingress-operator
    spec:
      defaultCertificate:
        name: router-certs-default
      domain: test.example.com
      replicas: 1
      endpointPublishingStrategy:
        loadBalancer:
          scope: Internal
        type: LoadBalancerService

3. Delete the custom ingresscontroller.

Actual results:

    # time oc -n openshift-ingress-operator delete ingresscontroller/test
    ingresscontroller.operator.openshift.io "test" deleted

    real    10m9.935s
    user    0m0.224s
    sys     0m0.057s

During this time, the following error appears in the ingress operator pod logs:

    2021-01-04T11:16:28.511282308Z 2021-01-04T11:16:28.511Z ERROR operator.init.controller controller/controller.go:218 Reconciler error {"controller": "ingress_controller", "name": "test", "namespace": "openshift-ingress-operator", "error": "failed to ensure ingress deletion: load balancer service exists for ingress openshift-ingress-operator/test"}

Expected results:
Deletion time on UPI AWS should be the same as on IPI AWS.

Additional info:
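The reconcile error above appears to be logged repeatedly while the operator waits for the LoadBalancer Service to go away, so its first and last timestamps roughly bracket the blocked period. A minimal sketch for extracting that window from the ingress operator logs, assuming each line starts with the timestamp prefix that `oc logs --timestamps` adds (as in the excerpt above). `lb_block_window` is a hypothetical helper name, not part of any OpenShift tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first and last timestamps of the
# "load balancer service exists" reconcile error for one IngressController,
# plus how many times it was logged. Reads ingress-operator log lines on stdin.
# Assumption: the first whitespace-separated field of each line is a timestamp.
lb_block_window() {
  # Namespace/name of the IngressController, as it appears in the error text.
  local ingress="${1:-openshift-ingress-operator/test}"
  # The trailing double quote anchors the match, so ".../test" does not
  # also match ".../test2".
  awk -v needle="load balancer service exists for ingress ${ingress}\"" '
    index($0, needle) > 0 {
      if (count == 0) first = $1
      last = $1
      count++
    }
    END {
      if (count > 0) printf "first=%s last=%s count=%d\n", first, last, count
      else print "no matching errors"
    }'
}
```

For example: `oc -n openshift-ingress-operator logs deployment/ingress-operator -c ingress-operator --timestamps | lb_block_window openshift-ingress-operator/test` (the container name is an assumption). A plain substring match with awk is used instead of a JSON parser because each line carries a timestamp and level prefix before the JSON payload.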