Bug 1914127

Summary: Deletion of oc get svc router-default -n openshift-ingress hangs
Product: OpenShift Container Platform Reporter: Miheer Salunke <misalunk>
Component: Networking Assignee: Miheer Salunke <misalunk>
Networking sub component: router QA Contact: jechen <jechen>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: amcdermo, aos-bugs, bmcelvee, mmasters, ngirard, shudili, yhe
Version: 4.6   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:36:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1898417    

Description Miheer Salunke 2021-01-08 07:59:48 UTC
Description of problem:

Deleting the router-default service (oc delete svc router-default -n openshift-ingress) hangs.

Version-Release number of selected component (if applicable):
OCP 4.6

How reproducible:
Always

Steps to Reproduce:
1. [miheer@miheer gcp-ocp]$ oc delete svc router-default -n openshift-ingress
service "router-default" deleted
..........hangs

Actual results:
Deletion of the service hangs.

Expected results:
Deletion of the service should not hang.


Additional info:

The workaround is to remove the finalizers from the service, after which the oc delete command completes.
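
The workaround can be sketched as follows. This is a minimal Python sketch, not a command taken from the report: it builds a JSON merge patch body that drops one finalizer from a Service's metadata.finalizers list. The function name `finalizer_removal_patch` is hypothetical; the resulting body is the kind of payload one might pass to `oc patch svc router-default -n openshift-ingress --type=merge -p '<body>'`.

```python
import json


def finalizer_removal_patch(current_finalizers, to_remove):
    """Build a JSON merge patch that drops `to_remove` from metadata.finalizers.

    A merge patch replaces the whole list, so the remaining finalizers
    must be sent back explicitly.
    """
    remaining = [f for f in current_finalizers if f != to_remove]
    # When nothing remains, send null so the field is removed entirely.
    return {"metadata": {"finalizers": remaining or None}}


if __name__ == "__main__":
    patch = finalizer_removal_patch(
        ["ingress.openshift.io/operator"],
        "ingress.openshift.io/operator",
    )
    print(json.dumps(patch))
```

Because a merge patch overwrites the list as a whole, the sketch recomputes the remaining finalizers instead of assuming the list only ever holds one entry.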

Comment 1 Miheer Salunke 2021-01-08 08:00:14 UTC
2021-01-08T07:53:41.731Z	INFO	operator.ingress_controller	handler/enqueue_mapped.go:104	queueing ingress	{"name": "default", "related": "/api/v1/namespaces/openshift-ingress/services/router-default"}
2021-01-08T07:53:41.731Z	INFO	operator.ingress_controller	handler/enqueue_mapped.go:104	queueing ingress	{"name": "default", "related": "/api/v1/namespaces/openshift-ingress/services/router-default"}
2021-01-08T07:53:41.731Z	INFO	operator.ingress_controller	controller/controller.go:235	reconciling	{"request": "openshift-ingress-operator/default"}
2021-01-08T07:53:41.843Z	DEBUG	operator.init.controller	controller/controller.go:209	Successfully Reconciled	{"controller": "ingress_controller", "name": "default", "namespace": "openshift-ingress-operator"}
2021-01-08T07:54:09.806Z	INFO	operator.ingress_controller	handler/enqueue_mapped.go:104	queueing ingress	{"name": "default", "related": "/api/v1/namespaces/openshift-ingress/services/router-default"}
2021-01-08T07:54:09.806Z	INFO	operator.ingress_controller	handler/enqueue_mapped.go:104	queueing ingress	{"name": "default", "related": "/api/v1/namespaces/openshift-ingress/services/router-default"}
2021-01-08T07:54:09.807Z	INFO	operator.ingress_controller	controller/controller.go:235	reconciling	{"request": "openshift-ingress-operator/default"}
2021-01-08T07:54:09.822Z	INFO	operator.ingress_controller	handler/enqueue_mapped.go:104	queueing ingress	{"name": "default", "related": "/api/v1/namespaces/openshift-ingress/services/router-default"}
2021-01-08T07:54:09.822Z	INFO	operator.ingress_controller	handler/enqueue_mapped.go:104	queueing ingress	{"name": "default", "related": "/api/v1/namespaces/openshift-ingress/services/router-default"}
2021-01-08T07:54:09.917Z	ERROR	operator.ingress_controller	controller/controller.go:235	got retryable error; requeueing	{"after": "1m29.999991841s", "error": "IngressController may become degraded soon: LoadBalancerReady=False"}
2021-01-08T07:54:09.917Z	INFO	operator.ingress_controller	controller/controller.go:235	reconciling	{"request": "openshift-ingress-operator/default"}
2021-01-08T07:54:09.918Z	INFO	operator.status_controller	controller/controller.go:235	Reconciling	{"request": "openshift-ingress-operator/default"}
2021-01-08T07:54:09.928Z	DEBUG	operator.init.controller	controller/controller.go:209	Successfully Reconciled	{"controller": "status_controller", "name": "default", "namespace": "openshift-ingress-operator"}
2021-01-08T07:54:10.016Z	ERROR	operator.ingress_controller	controller/controller.go:235	got retryable error; requeueing	{"after": "1m28.984842132s", "error": "IngressController may become degraded soon: LoadBalancerReady=False"}
2021-01-08T07:54:10.016Z	INFO	operator.ingress_controller	controller/controller.go:235	reconciling	{"request": "openshift-ingress-operator/default"}
2021-01-08T07:54:10.131Z	ERROR	operator.ingress_controller	controller/controller.go:235	got retryable error; requeueing	{"after": "1m28.87058498s", "error": "IngressController may become degraded soon: LoadBalancerReady=False"}
2021-01-08T07:54:10.277Z	DEBUG	operator.init.controller	controller/controller.go:209	Successfully Reconciled	{"controller": "certificate_controller", "name": "default", "namespace": "openshift-ingress-operator"}

Comment 2 Miciah Dashiel Butler Masters 2021-01-08 08:06:30 UTC
Can you check the service controller logs?  The service controller runs in the kube-controller-manager pod; use `oc -n openshift-kube-controller-manager get pods -l app=kube-controller-manager` to list the pods, and then use something like `oc logs -n openshift-kube-controller-manager -c kube-controller-manager kube-controller-manager-foo` to check each pod's logs.  The relevant error messages will probably have "gce" or "load balancer" in them.
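
A minimal Python sketch of the filtering step suggested above, assuming the pod logs have already been saved as text (for example with the `oc logs` command quoted in this comment). The helper name `relevant_lines` is hypothetical, and the default keywords are simply the two strings mentioned in the comment.

```python
def relevant_lines(log_text, keywords=("gce", "load balancer")):
    """Return the log lines that contain any keyword, case-insensitively."""
    lowered = tuple(k.lower() for k in keywords)
    return [
        line
        for line in log_text.splitlines()
        if any(k in line.lower() for k in lowered)
    ]
```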

Comment 3 Miheer Salunke 2021-01-08 11:44:42 UTC
Related kube-controller-manager logs after the deletion:

I0108 10:47:48.663169       1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-monitoring/grafana" err="Operation cannot be fulfilled on deployments.apps \"grafana\": the object has been modified; please apply your changes to the latest version and try again"
I0108 10:49:58.197632       1 controller.go:353] Deleting existing load balancer for service openshift-ingress/router-default
I0108 10:49:58.198494       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="DeletingLoadBalancer" message="Deleting load balancer"
I0108 10:49:58.643140       1 gce_loadbalancer_external.go:337] ensureExternalLoadBalancerDeleted(a2056239fcfed49cba62027a81fa49ff(openshift-ingress/router-default)): Deleting forwarding rule.
I0108 10:49:58.643148       1 gce_loadbalancer_external.go:319] ensureExternalLoadBalancerDeleted(a2056239fcfed49cba62027a81fa49ff(openshift-ingress/router-default)): Deleting firewall rule.
I0108 10:49:58.643187       1 gce_loadbalancer_external.go:333] ensureExternalLoadBalancerDeleted(a2056239fcfed49cba62027a81fa49ff(openshift-ingress/router-default)): Deleting IP address.
I0108 10:50:16.191945       1 gce_loadbalancer_external.go:343] ensureExternalLoadBalancerDeleted(a2056239fcfed49cba62027a81fa49ff(openshift-ingress/router-default)): Deleting target pool.
I0108 10:50:19.794303       1 gce_loadbalancer_external.go:379] DeleteExternalTargetPoolAndChecks(a2056239fcfed49cba62027a81fa49ff(openshift-ingress/router-default)): Deleting health check a2056239fcfed49cba62027a81fa49ff.
I0108 10:50:22.054471       1 gce_loadbalancer_external.go:401] DeleteExternalTargetPoolAndChecks(a2056239fcfed49cba62027a81fa49ff(openshift-ingress/router-default)): Deleting health check firewall k8s-a2056239fcfed49cba62027a81fa49ff-http-hc.
I0108 10:50:27.099495       1 controller.go:868] Removing finalizer from service openshift-ingress/router-default
I0108 10:50:27.117322       1 controller.go:894] Patching status for service openshift-ingress/router-default
I0108 10:50:27.117629       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="DeletedLoadBalancer" message="Deleted load balancer"
I0108 10:50:27.132544       1 controller.go:368] Ensuring load balancer for service openshift-ingress/router-default
I0108 10:50:27.132663       1 controller.go:853] Adding finalizer to service openshift-ingress/router-default
I0108 10:50:27.133766       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0108 10:50:27.143730       1 controller.go:275] error processing service openshift-ingress/router-default (will retry): failed to add load balancer cleanup finalizer: Service "router-default" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"service.kubernetes.io/load-balancer-cleanup"}
I0108 10:50:27.143835       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to add load balancer cleanup finalizer: Service \"router-default\" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{\"service.kubernetes.io/load-balancer-cleanup\"}"
I0108 10:50:32.144102       1 controller.go:368] Ensuring load balancer for service openshift-ingress/router-default
I0108 10:50:32.144173       1 controller.go:853] Adding finalizer to service openshift-ingress/router-default
I0108 10:50:32.144318       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0108 10:50:32.152049       1 controller.go:275] error processing service openshift-ingress/router-default (will retry): failed to add load balancer cleanup finalizer: Service "router-default" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"service.kubernetes.io/load-balancer-cleanup"}
I0108 10:50:32.152143       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to add load balancer cleanup finalizer: Service \"router-default\" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{\"service.kubernetes.io/load-balancer-cleanup\"}"
I0108 10:50:42.152320       1 controller.go:368] Ensuring load balancer for service openshift-ingress/router-default
I0108 10:50:42.152436       1 controller.go:853] Adding finalizer to service openshift-ingress/router-default
I0108 10:50:42.152573       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0108 10:50:42.166954       1 controller.go:275] error processing service openshift-ingress/router-default (will retry): failed to add load balancer cleanup finalizer: Service "router-default" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"service.kubernetes.io/load-balancer-cleanup"}
I0108 10:50:42.167042       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to add load balancer cleanup finalizer: Service \"router-default\" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{\"service.kubernetes.io/load-balancer-cleanup\"}"
I0108 10:51:02.167300       1 controller.go:368] Ensuring load balancer for service openshift-ingress/router-default
I0108 10:51:02.167411       1 controller.go:853] Adding finalizer to service openshift-ingress/router-default
I0108 10:51:02.167561       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0108 10:51:02.181051       1 controller.go:275] error processing service openshift-ingress/router-default (will retry): failed to add load balancer cleanup finalizer: Service "router-default" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"service.kubernetes.io/load-balancer-cleanup"}
I0108 10:51:02.181134       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to add load balancer cleanup finalizer: Service \"router-default\" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{\"service.kubernetes.io/load-balancer-cleanup\"}"
I0108 10:51:42.181329       1 controller.go:368] Ensuring load balancer for service openshift-ingress/router-default
I0108 10:51:42.181409       1 controller.go:853] Adding finalizer to service openshift-ingress/router-default
I0108 10:51:42.181622       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E0108 10:51:42.191909       1 controller.go:275] error processing service openshift-ingress/router-default (will retry): failed to add load balancer cleanup finalizer: Service "router-default" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"service.kubernetes.io/load-balancer-cleanup"}
I0108 10:51:42.192040       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to add load balancer cleanup finalizer: Service \"router-default\" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{\"service.kubernetes.io/load-balancer-cleanup\"}"
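
The retry cadence in the log above can be checked with a short Python sketch. It parses the time-of-day field of the "Ensuring load balancer" lines and returns the gaps between attempts; the timestamps are copied from the log, while the helper name `retry_gaps` is hypothetical. On this data the gaps roughly double (about 5 s, 10 s, 20 s, 40 s), which is consistent with an exponential back-off, although the report itself does not state the controller's retry policy.

```python
from datetime import datetime

# Times of the "Ensuring load balancer" attempts, copied from the log above.
ATTEMPTS = [
    "10:50:27.132544",
    "10:50:32.144102",
    "10:50:42.152320",
    "10:51:02.167300",
    "10:51:42.181329",
]


def retry_gaps(times):
    """Return the gaps, in seconds, between consecutive log timestamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in times]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    for gap in retry_gaps(ATTEMPTS):
        print(f"{gap:.1f}s")
```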


[miheer@miheer gcp-ocp]$ oc get  svc router-default -n openshift-ingress
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-default   LoadBalancer   172.30.97.221   <pending>     80:31633/TCP,443:31368/TCP   11m


apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2021-01-08T10:43:14Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2021-01-08T10:49:57Z"
  finalizers:
  - ingress.openshift.io/operator
  labels:
    app: router
    ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
    router: router-default
  name: router-default
  namespace: openshift-ingress
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    kind: Deployment
    name: router-default
    uid: afb5630c-32e8-436a-b843-2265d60ab02e
  resourceVersion: "111945"
  selfLink: /api/v1/namespaces/openshift-ingress/services/router-default
  uid: 2056239f-cfed-49cb-a620-27a81fa49ff7
spec:
  clusterIP: 172.30.97.221
  externalTrafficPolicy: Local
  healthCheckNodePort: 30623
  ports:
  - name: http
    nodePort: 31633
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    nodePort: 31368
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}
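
The `a2056239fcfed49cba62027a81fa49ff` identifier in the service controller log lines appears to be derived from the Service UID shown in the YAML above. A minimal Python sketch of that mapping, assuming the cloud provider's default naming (the letter `a` followed by the UID with dashes removed, truncated to 32 characters). The function name `load_balancer_name` is hypothetical, and this is a re-implementation for illustration, not the upstream Go code.

```python
def load_balancer_name(service_uid):
    """Derive the load balancer name seen in the logs from a Service UID."""
    name = "a" + service_uid.replace("-", "")
    # The name is kept to at most 32 characters.
    return name[:32]
```

Applied to the two UIDs that appear in this comment, the sketch reproduces the names seen in the corresponding log excerpts, which makes it easier to match cloud resources to a particular incarnation of the service.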


After deleting the finalizer
[miheer@miheer gcp-ocp]$ oc get svc router-default -n openshift-ingress -w
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
router-default   LoadBalancer   172.30.63.192   34.87.251.58   80:31171/TCP,443:31356/TCP   82s

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2021-01-08T10:56:06Z"
  finalizers:
  - ingress.openshift.io/operator
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app: router
    ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
    router: router-default
  name: router-default
  namespace: openshift-ingress
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    kind: Deployment
    name: router-default
    uid: afb5630c-32e8-436a-b843-2265d60ab02e
  resourceVersion: "113798"
  selfLink: /api/v1/namespaces/openshift-ingress/services/router-default
  uid: f1911857-3775-4efd-8cfb-b8797ca1f8c6
spec:
  clusterIP: 172.30.63.192
  externalTrafficPolicy: Local
  healthCheckNodePort: 31157
  ports:
  - name: http
    nodePort: 31171
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    nodePort: 31356
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 34.87.251.58



One interesting thing happened: I removed ingress.openshift.io/operator from the finalizers, ran oc delete svc router-default, and it worked.

The related logs look good ->

I0108 10:58:39.950289       1 controller.go:353] Deleting existing load balancer for service openshift-ingress/router-default
I0108 10:58:39.950568       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="DeletingLoadBalancer" message="Deleting load balancer"
I0108 10:58:40.404331       1 gce_loadbalancer_external.go:337] ensureExternalLoadBalancerDeleted(af191185737754efd8cfbb8797ca1f8c(openshift-ingress/router-default)): Deleting forwarding rule.
I0108 10:58:40.404344       1 gce_loadbalancer_external.go:319] ensureExternalLoadBalancerDeleted(af191185737754efd8cfbb8797ca1f8c(openshift-ingress/router-default)): Deleting firewall rule.
I0108 10:58:40.404382       1 gce_loadbalancer_external.go:333] ensureExternalLoadBalancerDeleted(af191185737754efd8cfbb8797ca1f8c(openshift-ingress/router-default)): Deleting IP address.
I0108 10:58:58.095853       1 gce_loadbalancer_external.go:343] ensureExternalLoadBalancerDeleted(af191185737754efd8cfbb8797ca1f8c(openshift-ingress/router-default)): Deleting target pool.
I0108 10:59:01.594496       1 gce_loadbalancer_external.go:379] DeleteExternalTargetPoolAndChecks(af191185737754efd8cfbb8797ca1f8c(openshift-ingress/router-default)): Deleting health check af191185737754efd8cfbb8797ca1f8c.
I0108 10:59:03.984714       1 gce_loadbalancer_external.go:401] DeleteExternalTargetPoolAndChecks(af191185737754efd8cfbb8797ca1f8c(openshift-ingress/router-default)): Deleting health check firewall k8s-af191185737754efd8cfbb8797ca1f8c-http-hc.
I0108 10:59:09.539832       1 controller.go:868] Removing finalizer from service openshift-ingress/router-default
I0108 10:59:09.557441       1 controller.go:894] Patching status for service openshift-ingress/router-default
I0108 10:59:09.558677       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="DeletedLoadBalancer" message="Deleted load balancer"
I0108 10:59:09.558803       1 garbagecollector.go:404] "Processing object" object="openshift-ingress/router-default-wwqfr" objectUID=bc641926-5410-4c63-8c4a-dfa2d6442281 kind="EndpointSlice"
I0108 10:59:09.606598       1 garbagecollector.go:519] "Deleting object" object="openshift-ingress/router-default-wwqfr" objectUID=bc641926-5410-4c63-8c4a-dfa2d6442281 kind="EndpointSlice" propagationPolicy=Background
I0108 10:59:09.665930       1 controller.go:368] Ensuring load balancer for service openshift-ingress/router-default
I0108 10:59:09.666110       1 controller.go:853] Adding finalizer to service openshift-ingress/router-default
I0108 10:59:09.668159       1 event.go:291] "Event occurred" object="openshift-ingress/router-default" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0108 10:59:11.355734       1 gce_loadbalancer_external.go:74] ensureExternalLoadBalancer(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default), australia-southeast1, , [TCP/80 TCP/443], [misalunk-w2658-master-1.c.openshift-gce-devel.internal misalunk-w2658-master-2.c.openshift-gce-devel.internal misalunk-w2658-worker-b-8rp79.c.openshift-gce-devel.internal misalunk-w2658-worker-c-k4nh5.c.openshift-gce-devel.internal misalunk-w2658-worker-a-g9tcj.c.openshift-gce-devel.internal misalunk-w2658-master-0.c.openshift-gce-devel.internal], map[])
I0108 10:59:12.565721       1 gce_loadbalancer_external.go:92] ensureExternalLoadBalancer(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default)): Forwarding rule a737b6a846e8340beb69980b33c5f2c6 doesn't exist.
I0108 10:59:15.230402       1 gce_loadbalancer_external.go:155] ensureExternalLoadBalancer(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default)): Ensured IP address 34.116.73.192 (tier: Premium).
I0108 10:59:15.656542       1 gce_loadbalancer_external.go:189] ensureExternalLoadBalancer(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default)): Creating firewall.
I0108 10:59:19.339921       1 gce_loadbalancer_external.go:193] ensureExternalLoadBalancer(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default)): Created firewall.
I0108 10:59:19.727391       1 gce_loadbalancer_external.go:202] ensureExternalLoadBalancer(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default)): Target pool for service doesn't exist.
I0108 10:59:20.165109       1 gce_loadbalancer_external.go:218] ensureExternalLoadBalancer(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default)): Updating from nodes health checks to local traffic health checks.
I0108 10:59:20.597179       1 gce_loadbalancer_external.go:901] Creating firewall k8s-a737b6a846e8340beb69980b33c5f2c6-http-hc for health checks.
I0108 10:59:24.203646       1 gce_loadbalancer_external.go:905] Created firewall k8s-a737b6a846e8340beb69980b33c5f2c6-http-hc for health checks.
I0108 10:59:24.630610       1 gce_loadbalancer_external.go:694] Did not find health check a737b6a846e8340beb69980b33c5f2c6, creating port 30902 path /healthz
I0108 10:59:27.357522       1 gce_loadbalancer_external.go:703] Created HTTP health check a737b6a846e8340beb69980b33c5f2c6 healthCheckNodePort: 30902
I0108 10:59:27.357563       1 gce_loadbalancer_external.go:553] Creating targetpool a737b6a846e8340beb69980b33c5f2c6 with 1 healthchecks
I0108 10:59:31.181479       1 gce_loadbalancer_external.go:495] ensureTargetPoolAndHealthCheck(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default)): Created health checks a737b6a846e8340beb69980b33c5f2c6.
I0108 10:59:31.181514       1 gce_loadbalancer_external.go:498] ensureTargetPoolAndHealthCheck(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default)): Created target pool.
I0108 10:59:31.181523       1 gce_loadbalancer_external.go:262] ensureExternalLoadBalancer(a737b6a846e8340beb69980b33c5f2c6(openshift-ingress/router-default)): Creating forwarding rule, IP 34.116.73.192 (tier: Premium).
^C
[miheer@miheer gcp-ocp]$ 



My observations ->

Initially both finalizers are present: one added by the ingress operator and the other by the kube-controller-manager.

[miheer@miheer gcp-ocp]$ oc get svc router-default -n openshift-ingress -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2021-01-08T11:12:57Z"
  finalizers:
  - ingress.openshift.io/operator
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app: router
    ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
    router: router-default
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .: {}
          v:"ingress.openshift.io/operator": {}
        f:labels:
          .: {}
          f:app: {}
          f:ingresscontroller.operator.openshift.io/owning-ingresscontroller: {}
          f:router: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"afb5630c-32e8-436a-b843-2265d60ab02e"}:
            .: {}
            f:apiVersion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:externalTrafficPolicy: {}
        f:ports:
          .: {}
          k:{"port":80,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
          k:{"port":443,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:selector:
          .: {}
          f:ingresscontroller.operator.openshift.io/deployment-ingresscontroller: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: ingress-operator
    operation: Update
    time: "2021-01-08T11:12:57Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          v:"service.kubernetes.io/load-balancer-cleanup": {}
      f:status:
        f:loadBalancer:
          f:ingress: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-01-08T11:13:44Z"
  name: router-default
  namespace: openshift-ingress
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    kind: Deployment
    name: router-default
    uid: afb5630c-32e8-436a-b843-2265d60ab02e
  resourceVersion: "118756"
  selfLink: /api/v1/namespaces/openshift-ingress/services/router-default
  uid: b44a9a1c-82a0-43ab-953b-536ef54354d9
spec:
  clusterIP: 172.30.252.204
  externalTrafficPolicy: Local
  healthCheckNodePort: 30375
  ports:
  - name: http
    nodePort: 31229
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    nodePort: 30183
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 34.87.251.58
[miheer@miheer gcp-ocp]$ 



When I run oc delete svc router-default -n openshift-ingress, the service.kubernetes.io/load-balancer-cleanup finalizer is removed as expected, but ingress.openshift.io/operator remains, which as far as I understand is not correct and causes the oc delete command to hang. Once ingress.openshift.io/operator is removed manually, the oc delete command completes and a new service is created with a new LB external IP.
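
This behaviour matches how finalizers gate deletion in Kubernetes: a delete request only sets deletionTimestamp, and the object is removed once its finalizer list is empty. The following Python sketch is a toy in-memory model of that rule, not operator or API server code; the class `ToyService` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyService:
    """Toy stand-in for a Service; tracks only deletion-related metadata."""
    finalizers: list = field(default_factory=list)
    deletion_requested: bool = False

    def request_delete(self):
        # Models `oc delete`: the object is only marked for deletion.
        self.deletion_requested = True

    def remove_finalizer(self, name):
        self.finalizers = [f for f in self.finalizers if f != name]

    @property
    def gone(self):
        # Removed only when marked for deletion AND no finalizers remain.
        return self.deletion_requested and not self.finalizers


if __name__ == "__main__":
    svc = ToyService(finalizers=[
        "ingress.openshift.io/operator",
        "service.kubernetes.io/load-balancer-cleanup",
    ])
    svc.request_delete()
    svc.remove_finalizer("service.kubernetes.io/load-balancer-cleanup")
    print(svc.gone)  # the operator finalizer is still present
    svc.remove_finalizer("ingress.openshift.io/operator")
    print(svc.gone)
```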
  

So it looks like the ingress operator code needs to remove the ingress.openshift.io/operator finalizer from the router-default service once a delete is requested for that service in openshift-ingress.

It looks like we remove that finalizer once the ingress controller is deleted:

 https://github.com/openshift/cluster-ingress-operator/blob/87e9d6cf3fa320f85ad4e1ffd4552f579a65e857/pkg/operator/controller/ingress/controller.go#L187

 https://github.com/openshift/cluster-ingress-operator/blob/87e9d6cf3fa320f85ad4e1ffd4552f579a65e857/pkg/operator/controller/ingress/controller.go#L544

https://github.com/openshift/cluster-ingress-operator/blob/87e9d6cf3fa320f85ad4e1ffd4552f579a65e857/pkg/operator/controller/ingress/load_balancer_service.go#L259


But the question is how to handle this at the Kubernetes level, because handling it when a service is deleted would require changes in the Kubernetes service controller:

https://github.com/kubernetes/kubernetes/blob/43ce28b9954c0d0b8b43b02724f12dce795befec/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L324

Comment 4 Miheer Salunke 2021-01-12 13:40:23 UTC
I think we don't have any control over the Kubernetes service controller code, so before deleting the service we need to remove the finalizers and then perform the delete.

Shall we close this BZ?

Comment 5 Miciah Dashiel Butler Masters 2021-01-12 14:34:01 UTC
The issue is that the service controller tries to re-add its finalizer after the service has been marked for deletion: 

E0108 10:50:27.143730       1 controller.go:275] error processing service openshift-ingress/router-default (will retry): failed to add load balancer cleanup finalizer: Service "router-default" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"service.kubernetes.io/load-balancer-cleanup"}

That seems like a logic error in the service controller, right?  

As for the ingress.openshift.io/operator finalizer, I think I understand what happened.  Earlier, we merged <https://github.com/openshift/cluster-ingress-operator/pull/472>, which deleted logic to add the ingress.openshift.io/operator finalizer and added logic to delete the same, so that finalizer did not block deletion of the service.  Subsequently, bug 1898417 was reported, which included deleting the service as part of the steps to reproduce the issue, and these steps worked because #472 had removed the ingress.openshift.io/operator finalizer.  Then we merged <https://github.com/openshift/cluster-ingress-operator/pull/514>, which reverted #472, meaning the ingress.openshift.io/operator finalizer was again added, so bug 1898417's steps to reproduce the issue no longer work.  

So as far as the deletion hanging, I think we are really just back to the pre-#472 behavior, which is undesirable, and we should go ahead and get rid of the ingress.openshift.io/operator finalizer (i.e., restore that part of #472 without the other parts of #472), but this BZ seems less urgent than it initially seemed to be, so it can be deferred until post-4.7.  

As far as the logic error in the service controller, that appears to be a harmless error (as the API's validation prevents the service controller's erroneous re-adding of the finalizer from succeeding), so that too can be deferred until post-4.7.
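
The validation error quoted above can be illustrated with a toy Python model. It sketches the rule named in the error message (no new finalizers once an object is being deleted) plus a hypothetical guard a controller could apply before trying to add its finalizer. The function names are assumptions, and this is neither the API server's validation code nor the actual fix.

```python
def validate_finalizer_update(old_finalizers, new_finalizers, being_deleted):
    """Return an error string if the update adds finalizers during deletion."""
    added = [f for f in new_finalizers if f not in old_finalizers]
    if being_deleted and added:
        return (
            "Forbidden: no new finalizers can be added if the object "
            f"is being deleted, found new finalizers {added}"
        )
    return None


def should_add_finalizer(finalizers, name, being_deleted):
    """Hypothetical guard: only add a missing finalizer to a live object."""
    return not being_deleted and name not in finalizers
```

In this model, removing a finalizer from an object that is being deleted is still allowed, which is why the cleanup finalizer could be removed in the logs while the later attempt to re-add it was rejected.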

Comment 9 jechen 2021-02-25 01:52:05 UTC
"oc delete svc router-default -n openshift-ingress" finished within reasonable time, did not hang

[jechen@jechen ~]$ oc version
Client Version: 4.8.0-0.nightly-2021-02-23-200827
Server Version: 4.8.0-0.nightly-2021-02-24-063313
Kubernetes Version: v1.20.0+6f8878d
[jechen@jechen ~]$ oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
router-default            LoadBalancer   172.30.139.55   35.227.59.35   80:30563/TCP,443:30122/TCP   24m
router-internal-default   ClusterIP      172.30.47.162   <none>         80/TCP,443/TCP,1936/TCP      24m


[jechen@jechen ~]$ oc -n openshift-ingress delete svc router-default
service "router-default" deleted
[jechen@jechen ~]$ oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
router-default            LoadBalancer   172.30.66.86    104.196.114.95   80:31815/TCP,443:30425/TCP   42s
router-internal-default   ClusterIP      172.30.196.65   <none>           80/TCP,443/TCP,1936/TCP      90s

Comment 11 Brandi Munilla 2021-06-24 16:48:37 UTC
Hi, does this bug require doc text? If so, please update the doc text field.

Comment 13 errata-xmlrpc 2021-07-27 22:36:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438