Description of problem:

If we delete our VirtualMachineInstance CRs with the "Foreground" PropagationPolicy once, everything works fine. If we send the delete request, which looks like this:

```
{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}
```

multiple times, the DELETE request can add the Foreground finalizer to the CR again after the Garbage Collector has already removed it. The Garbage Collector does not remove it a second time, and the CR is stuck.

Here is the VMI directly after it got deleted:

```
apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachineInstance
metadata:
  creationTimestamp: 2018-10-24T14:34:28Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-10-24T14:48:18Z
  finalizers:
  - foregroundDeleteVirtualMachine
  uid: e99fc418-d799-11e8-a4cd-fa163e0953ea
```

Here are the GC logs:

```
I1024 14:48:18.907722 1 graph_builder.go:553] add [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: e99fc418-d799-11e8-a4cd-fa163e0953ea] to the attemptToDelete, because it's waiting for its dependents to be deleted
I1024 14:48:18.908062 1 garbagecollector.go:408] processing item [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: e99fc418-d799-11e8-a4cd-fa163e0953ea]
I1024 14:48:18.913571 1 garbagecollector.go:530] remove DeleteDependents finalizer for item [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: e99fc418-d799-11e8-a4cd-fa163e0953ea]
```

As you can see, the "Foreground" finalizer got removed immediately and only our "foregroundDeleteVirtualMachine" finalizer is still left. After our controller is done and removes our finalizer, the object disappears as expected.

Now with the delete request sent multiple times:

```
apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachineInstance
metadata:
  creationTimestamp: 2018-10-24T14:54:41Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-10-24T14:54:58Z
  finalizers:
  - foregroundDeleteVirtualMachine
  - foregroundDeletion
  uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea
```

Here are the logs again:

```
I1024 14:54:57.368392 1 graph_builder.go:553] add [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea] to the attemptToDelete, because it's waiting for its dependents to be deleted
I1024 14:54:57.368910 1 garbagecollector.go:408] processing item [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea]
I1024 14:54:57.388385 1 garbagecollector.go:530] remove DeleteDependents finalizer for item [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea]
```

The GC reports that it removed the finalizer, but a follow-up DELETE request added it again and it stays there. After our controller is done and removes its own finalizer, we still see the Foreground finalizer on the object:

```
apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachineInstance
metadata:
  creationTimestamp: 2018-10-24T14:54:41Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-10-24T14:54:58Z
  finalizers:
  - foregroundDeletion
  uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea
```

The conclusion is that the GC rightfully removes the finalizer immediately, because we don't set an owner reference on the pod. If we send the delete request again, the finalizer gets re-added, but the GC does not process the item again.

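The stuck state shown in the last YAML dump can be recognized from two metadata fields. The following is a minimal sketch and not part of the original report: the function name `is_stuck_on_foreground` is hypothetical, and it assumes the finalizers were read with a `[*]` JSONPath expression, which prints the list items separated by spaces.

```shell
# Hypothetical helper (not from the original report): succeed when deletion
# has started and the only finalizer left on the object is foregroundDeletion.
#
# $1: value of .metadata.deletionTimestamp (empty if deletion has not started)
# $2: space-separated finalizer names, e.g. from '{.metadata.finalizers[*]}'
is_stuck_on_foreground() {
  local deletion_ts="$1"
  local finalizers="$2"
  [ -n "$deletion_ts" ] && [ "$finalizers" = "foregroundDeletion" ]
}

# Example usage against the VMI from this report (requires cluster access):
#   ts=$(oc get virtualmachineinstance fedora -n default \
#          -o jsonpath='{.metadata.deletionTimestamp}')
#   fin=$(oc get virtualmachineinstance fedora -n default \
#          -o jsonpath='{.metadata.finalizers[*]}')
#   is_stuck_on_foreground "$ts" "$fin" && echo "only foregroundDeletion is left"
```

A single positive result is not conclusive, since the same state can appear briefly before the GC removes the finalizer. It only indicates this bug when the condition persists.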
Version-Release number of selected component (if applicable):

oc version
oc v3.10.0+dd10d17
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://cnv-executor-gouyang-master1.example.com:8443
openshift v3.11.16
kubernetes v1.11.0+d4cacc0

How reproducible:

Steps to Reproduce:
1. Create an arbitrary CRD
2. Create an arbitrary CR of that CRD
3. Set a custom finalizer
4. Do a foreground delete
5. Wait until the foregroundDeletion finalizer disappears
6. Do another foreground delete
7. The finalizer will be back
8. Remove the custom finalizer

Actual results:
The GC will not remove the finalizer again and the CR is stuck.

Expected results:
The garbage collector should remove the foreground finalizer as often as necessary.

Additional info:
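
Steps 4 to 6 above can be sketched as a pair of shell helpers. This is only an illustration under stated assumptions: the function names and the `API_SERVER` variable are hypothetical, an `oc proxy --port=8080` session is assumed to be running, and the resource path is a placeholder for whatever CR was created in steps 1 to 3.

```shell
# Address of a local `oc proxy` (assumption: started with --port=8080).
API_SERVER="${API_SERVER:-localhost:8080}"

# Print the body of a foreground delete request, as quoted in this report.
foreground_delete_options() {
  printf '%s' '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}'
}

# Send one foreground DELETE for the resource at the given API path ($1).
foreground_delete() {
  curl -sS -X DELETE -H "Content-Type: application/json" \
    "${API_SERVER}$1" -d "$(foreground_delete_options)"
}

# Example (requires a cluster; replace the placeholders with your own CR):
#   foreground_delete /apis/<group>/<version>/namespaces/default/<plural>/<name>
#   sleep 5   # crude wait until the GC has removed foregroundDeletion
#   foreground_delete /apis/<group>/<version>/namespaces/default/<plural>/<name>
```

The fixed `sleep` is a simplification. Polling the object's finalizers until `foregroundDeletion` is gone would match step 5 more closely.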
Ping?
I reproduced the issue on 4.2 as well. I created a PR (not merged yet) with a potential fix: https://github.com/kubernetes/kubernetes/pull/80895.
Since the issue has been fixed upstream (https://github.com/kubernetes/kubernetes/pull/81081), we can start backporting it. @sttts how far back do we want to backport the fix?
Verified in 4.2.0-0.nightly-2019-09-01-224700

oc create -f - << EOF
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: applications.shipper.booking.com
spec:
  conversion:
    strategy: None
  group: shipper.booking.com
  names:
    kind: Application
    listKind: ApplicationList
    plural: applications
    shortNames:
    - app
    singular: application
  scope: Namespaced
  validation:
    openAPIV3Schema:
      properties:
        spec:
          properties:
            template:
              properties:
                values:
                  type: object
              required:
              - values
              type: object
          required:
          - template
          type: object
  version: v1alpha1
  versions:
  - name: v1alpha1
    served: true
    storage: true
  - name: v1
    served: true
    storage: false
EOF

oc create -n default -f - << EOF
apiVersion: shipper.booking.com/v1
kind: Application
metadata:
  name: super-server
spec:
  template:
    values:
      replicaCount: 3
EOF

oc edit application.shipper.booking.com/super-server -n default
metadata:
  finalizers:
  - my-any-finalizer
...

oc proxy --port=8080 &

curl -X DELETE -H "Content-Type: application/json" localhost:8080/apis/shipper.booking.com/v1/namespaces/default/applications/super-server -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}'

Get:
{"apiVersion":"shipper.booking.com/v1","kind":"Application","metadata":{"creationTimestamp":"2019-09-03T13:07:58Z","deletionGracePeriodSeconds":0,"deletionTimestamp":"2019-09-03T13:36:03Z","finalizers":["my-any-finalizer","foregroundDeletion"],"generation":3,"name":"super-server","namespace":"default","resourceVersion":"711481","selfLink":"/apis/shipper.booking.com/v1/namespaces/default/applications/super-server","uid":"da0d4e4a-ce4b-11e9-89c8-02b4bde53ce6"},"spec":{"template":{"values":{"replicaCount":3}}}}

curl -X DELETE -H "Content-Type: application/json" localhost:8080/apis/shipper.booking.com/v1/namespaces/default/applications/super-server -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}'
# same

oc edit application.shipper.booking.com/super-server # remove above my-any-finalizer

oc get application.shipper.booking.com/super-server

Get:
Error from server (NotFound): applications.shipper.booking.com "super-server" not found
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922