Bug 1772879
| Summary: | Finalizer of LoadBalancer referenced by CR is not removed upon deletion (sessionAffinity: ClientIP) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Petr Kremensky <pkremens> |
| Component: | Networking | Assignee: | Dan Mace <dmace> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | low | | |
| Priority: | low | CC: | aos-bugs, eparis, jokerman, mchoma, mfojtik |
| Version: | 4.3.0 | Keywords: | Reopened |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-06 14:23:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Petr Kremensky
2019-11-15 12:35:24 UTC
How was this cluster created and on what platform? If it wasn't using the OpenShift installer with a supported IPI/UPI configuration, it's highly unlikely we're going to take any action here.

Hi, at the time I reported the issue, the cluster was created by the FlexyWrapper installer on AWS, version 4.3.0-0.nightly-2019-11-11-115927. I have now retested on a cluster created by the OpenShift installer on OpenStack, version 4.3.0-0.nightly-2019-12-10-034925, and I can no longer reproduce the issue on this setup, so we can close this.

I am reopening this. This is likely an AWS-specific bug, because a Service of type LoadBalancer uses the cloud provider's external load balancer. We use the Flexy wrapper tool [1], which is a wrapper around the Flexy tool that is also used by the OpenShift QE team.

[1] https://docs.engineering.redhat.com/pages/viewpage.action?pageId=63298965
[2] https://mojo.redhat.com/docs/DOC-1074220

This is a minimal reproducer for the problem:
1.
cat << EOF > service.yaml
apiVersion: v1
kind: Service
metadata:
  name: example7
spec:
  clusterIP: 172.30.211.53
  ports:
  - name: http
    nodePort: 31744
    port: 8080
    protocol: TCP
    targetPort: 8080
  sessionAffinity: ClientIP
  type: LoadBalancer
EOF
oc apply -f service.yaml
2.
oc delete services example7
service "example7" deleted
<Prompt>
3.
But the service is not actually deleted.
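The checks that come up later in this thread (the `SyncLoadBalancerFailed` event and the finalizer that never gets removed), plus the manual workaround of patching the finalizer away, can be scripted. This is a minimal sketch, assuming `oc` is logged in and pointed at the Service's project. The function names are hypothetical helpers. The finalizer name `service.kubernetes.io/load-balancer-cleanup` is the one upstream Kubernetes adds to LoadBalancer Services and is not quoted in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a LoadBalancer Service whose deletion never completes.
# Assumes `oc` is on PATH, logged in, and pointed at the Service's project.
# The Service name defaults to the reproducer's "example7".

# Print deletionTimestamp and the remaining finalizers. A set deletionTimestamp
# together with a leftover finalizer (upstream uses
# service.kubernetes.io/load-balancer-cleanup) means the delete was accepted
# but is still waiting for the service controller to finish its cleanup.
show_svc_finalizers() {
  local svc="${1:-example7}"
  oc get service "$svc" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}

# List the events recorded for the Service, such as SyncLoadBalancerFailed.
show_svc_events() {
  local svc="${1:-example7}"
  oc get events \
    --field-selector "involvedObject.kind=Service,involvedObject.name=${svc}"
}

# Workaround discussed in this bug: clear the finalizers so the delete can
# complete. In a JSON merge patch, setting a field to null removes it.
# This bypasses the controller's cleanup, so any cloud load balancer that was
# actually created has to be removed by hand.
remove_svc_finalizers() {
  local svc="${1:-example7}"
  oc patch service "$svc" --type=merge -p '{"metadata":{"finalizers":null}}'
}
```

A cautious order is `show_svc_finalizers`, then `show_svc_events`, and only if the event confirms `unsupported load balancer affinity: ClientIP`, `remove_svc_finalizers`.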
(In reply to mchoma from comment #4)

What platform are you on? How did you create the cluster, and with what version of OCP?

ClientIP session affinity for a LoadBalancer Service isn't supported on AWS [1]. Sadly, upstream doesn't actually document which platforms support it. When trying this on AWS, you can see the service controller reporting its failure to provision the LB by looking at the events in the Service's namespace:

    33s Warning SyncLoadBalancerFailed service/loadbalancer Error syncing load balancer: failed to ensure load balancer: unsupported load balancer affinity: ClientIP

In this case, LB provisioning will fail perpetually, and the finalizer won't be removed from the Service. The only way I see to delete the Service is to patch it to remove the finalizer manually.

I'm not sure whether this would be considered a bug upstream in the service controller, but I think we could make a case that it is. However, given that you shouldn't have created the Service in the first place (it was destined to fail), and given the workaround (patching away the finalizer), it seems low priority. If you all want to keep the bug open for the deletion problem, I won't object, but the likelihood of us spending upstream attention on it is very low. If the actual concern is getting ClientIP session affinity working on a platform where it's not currently supported upstream (e.g. AWS), that is an issue to pursue upstream and isn't a bug in OpenShift.

Is there a reproducer where:
1. the LoadBalancer Service is actually provisioned successfully, and
2. after successful provisioning, the Service can't be deleted?

That would be a higher-impact problem, IMO.

[1] https://github.com/kubernetes/kubernetes/issues/13892

Our cluster is version 4.3.0-0.nightly-2020-01-06-101556 and is running on AWS.

I can confirm I see the same error event on OCP 4.2. On OCP 4.2 the finalizer was not present, so the Service was deleted and we didn't notice the problem. To be honest, I had seen that event before, but it didn't occur to me that it was the source of the problem.

We do not have a strong use case for this combination of parameters on AWS that would justify an RFE for Kubernetes/OpenShift; we just tried it because it was a possible combination.

So the problem now is: how should I, as a user, know that this combination of parameters is not supported on AWS? I think that should be properly documented somewhere.

Couldn't OpenShift be smart enough to either:
- validate the Service YAML input and prevent the user from creating a broken Service, i.e. prohibit the combination of LoadBalancer and ClientIP when running on AWS, or
- stop retrying creation of the AWS load balancer, since it will never succeed because of "unsupported load balancer affinity: ClientIP"?

[1] https://github.com/kubernetes/kubernetes/issues/13892

Interestingly, when I create the Service with "sessionAffinity: None" and then update it to "sessionAffinity: ClientIP", the Service object can be deleted.

(In reply to mchoma from comment #6)

I agree the experience needs to be improved. Keep in mind the problem is with Kubernetes itself, in the service controller and cloud provider implementations. You could reproduce this on a vanilla Kube installation outside of OpenShift. Any improvement here will need to start upstream. We can keep this open for now, but I doubt we're willing to block a release on it.

Moving to 4.5.

Given the existing events Kube reports (https://bugzilla.redhat.com/show_bug.cgi?id=1772879#c5) and the low overall impact, I think it's unlikely we're going to dedicate any resources to improving the upstream status reporting for this issue in the foreseeable future. I'm going to close the bug to avoid setting false expectations of action on our part. If there's some strong business justification, we can discuss and re-open later.
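The first option raised in the thread, rejecting the LoadBalancer plus ClientIP combination on AWS before it ever reaches the cloud provider, can be approximated client-side in a script. This is a minimal sketch and not a feature of `oc`, OpenShift, or Kubernetes. It assumes the platform is read from the OpenShift `infrastructure` config resource named `cluster` and that AWS is the only platform to flag, since that is the only case established in this bug. `check_lb_affinity` and `cluster_platform` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical client-side pre-flight check; not a feature of oc, OpenShift,
# or Kubernetes. Only the combination reported in this bug is encoded.

# Returns 1 (and explains why on stderr) when a Service type / sessionAffinity
# pair is known to fail on the given platform; returns 0 otherwise.
check_lb_affinity() {
  local platform="$1" svc_type="$2" affinity="$3"
  if [ "$platform" = "AWS" ] && [ "$svc_type" = "LoadBalancer" ] &&
     [ "$affinity" = "ClientIP" ]; then
    echo "refusing: sessionAffinity ClientIP is not supported for type LoadBalancer on AWS" >&2
    return 1
  fi
  return 0
}

# Reads the cluster platform (e.g. "AWS") from the OpenShift Infrastructure
# config resource. Assumes `oc` is logged in with permission to read it.
cluster_platform() {
  oc get infrastructure cluster -o jsonpath='{.status.platform}'
}
```

For the reproducer above, `check_lb_affinity "$(cluster_platform)" LoadBalancer ClientIP && oc apply -f service.yaml` would stop before the Service is created on AWS. A check like this only helps scripts that opt into it; as the thread notes, a real improvement would have to start upstream.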