Bug 1642530

Summary: Custom Resource is stuck if deleted repeatedly with PropagationPolicy Foreground
Product: OpenShift Container Platform Reporter: Roman Mohr <rmohr>
Component: kube-apiserver Assignee: Lukasz Szaszkiewicz <lszaszki>
Status: CLOSED ERRATA QA Contact: Xingxing Xia <xxia>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.10.0 CC: aos-bugs, fdeutsch, jokerman, mfojtik, mmccomas, nagrawal, sttts, xtian
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-16 06:27:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1640872    

Description Roman Mohr 2018-10-24 15:31:48 UTC
Description of problem:

If we delete our VirtualMachineInstance CRs with the "Foreground" PropagationPolicy once, everything works fine. If we send the delete request, which looks like this:


```
{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}
```

multiple times, the DELETE request sometimes adds the Foreground finalizer to the CR again after the Garbage Collector has already removed it. The Garbage Collector does not remove it again afterwards, and the CR is stuck.
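A minimal sketch of sending that request repeatedly from a script, using only the Python standard library. The function names `build_foreground_delete` and `delete_repeatedly` are hypothetical, and the URL is an assumption: it must point at the CR through something like `oc proxy`, as done in Comment 5 below.

```python
import json
import urllib.request

# Body copied from the report: DeleteOptions with foreground propagation.
DELETE_OPTIONS = {
    "kind": "DeleteOptions",
    "apiVersion": "v1",
    "propagationPolicy": "Foreground",
}


def build_foreground_delete(url):
    """Build (but do not send) a DELETE request carrying the DeleteOptions body."""
    return urllib.request.Request(
        url,
        data=json.dumps(DELETE_OPTIONS).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


def delete_repeatedly(url, times=2):
    """Send the same foreground DELETE several times and return the HTTP status codes.

    Assumes an unauthenticated local proxy (for example `oc proxy`) is listening.
    """
    statuses = []
    for _ in range(times):
        with urllib.request.urlopen(build_foreground_delete(url)) as resp:
            statuses.append(resp.status)
    return statuses
```

Calling `delete_repeatedly(url, times=2)` against a CR that still carries a custom finalizer corresponds to the "multiple times" case described above.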


Here is the VMI directly after it was deleted:

```
apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachineInstance
metadata:
  creationTimestamp: 2018-10-24T14:34:28Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-10-24T14:48:18Z
  finalizers:
  - foregroundDeleteVirtualMachine
  uid: e99fc418-d799-11e8-a4cd-fa163e0953ea
```

Here are the GC logs:

```
I1024 14:48:18.907722       1 graph_builder.go:553] add [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: e99fc418-d799-11e8-a4cd-fa163e0953ea] to the attemptToDelete, because it's waiting for its dependents to be deleted
I1024 14:48:18.908062       1 garbagecollector.go:408] processing item [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: e99fc418-d799-11e8-a4cd-fa163e0953ea]
I1024 14:48:18.913571       1 garbagecollector.go:530] remove DeleteDependents finalizer for item [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: e99fc418-d799-11e8-a4cd-fa163e0953ea]
```

As you can see, the "Foreground" finalizer was removed immediately, and only our "foregroundDeleteVirtualMachine" finalizer is left. After our controller is done and removes our finalizer, the object disappears as expected.

Now the same scenario, but sending the delete request multiple times:

```
apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachineInstance
metadata:
  creationTimestamp: 2018-10-24T14:54:41Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-10-24T14:54:58Z
  finalizers:
  - foregroundDeleteVirtualMachine
  - foregroundDeletion
  uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea
```

Here are the logs again:

```
I1024 14:54:57.368392       1 graph_builder.go:553] add [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea] to the attemptToDelete, because it's waiting for its dependents to be deleted
I1024 14:54:57.368910       1 garbagecollector.go:408] processing item [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea]
I1024 14:54:57.388385       1 garbagecollector.go:530] remove DeleteDependents finalizer for item [kubevirt.io/v1alpha2/VirtualMachineInstance, namespace: default, name: fedora, uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea]
```

The GC reports that it removed the finalizer, but we added it again with a follow-up DELETE request, and it stays there. After our controller is done and removes its own finalizer, we still see the Foreground finalizer on the object:

```
apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachineInstance
metadata:
  creationTimestamp: 2018-10-24T14:54:41Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-10-24T14:54:58Z
  finalizers:
  - foregroundDeletion
  uid: bca97fd7-d79c-11e8-a4cd-fa163e0953ea

```

The conclusion is that the GC rightfully removes the finalizer immediately, because we don't set an owner reference on the pod. If we send the delete request again, the finalizer gets re-added, but the GC does not process the item again.
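The sequence can be illustrated with a toy model. This is only a sketch of the behavior observed in this report, not the actual API server or garbage collector code: the class and function names are hypothetical, and the model simply assumes that the GC processes the item once, after the first delete.

```python
FOREGROUND = "foregroundDeletion"
CUSTOM = "foregroundDeleteVirtualMachine"


class ToyObject:
    """Stand-in for a CR: just a finalizer list and a deletion marker."""

    def __init__(self, finalizers):
        self.finalizers = list(finalizers)
        self.deletion_timestamp_set = False


def foreground_delete(obj):
    """Model of the observed DELETE behavior: mark deletion, (re-)add the finalizer."""
    obj.deletion_timestamp_set = True
    if FOREGROUND not in obj.finalizers:
        obj.finalizers.append(FOREGROUND)


def gc_process(obj):
    """Model of one GC pass: no dependents exist, so drop the foreground finalizer."""
    obj.finalizers = [f for f in obj.finalizers if f != FOREGROUND]


def is_gone(obj):
    """The object disappears once deletion is requested and no finalizers remain."""
    return obj.deletion_timestamp_set and not obj.finalizers


def run(delete_count):
    obj = ToyObject([CUSTOM])
    foreground_delete(obj)
    gc_process(obj)  # assumption: the GC handles the item only this once
    for _ in range(delete_count - 1):
        foreground_delete(obj)  # repeated deletes re-add the finalizer
    obj.finalizers.remove(CUSTOM)  # our controller finishes its cleanup
    return obj
```

In this model `run(1)` ends with no finalizers and the object is gone, while `run(2)` ends with only `foregroundDeletion` left, matching the stuck object shown above.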

Version-Release number of selected component (if applicable):

oc version
oc v3.10.0+dd10d17
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://cnv-executor-gouyang-master1.example.com:8443
openshift v3.11.16
kubernetes v1.11.0+d4cacc0


How reproducible:


Steps to Reproduce:
1. Create an arbitrary CRD
2. Create an arbitrary CR of that CRD
3. Set a custom finalizer
4. Do a foreground delete
5. Wait until the foregroundDeletion finalizer disappears
6. Do another foreground delete
7. The finalizer will be back
8. Remove the custom finalizer


Actual results:

The GC does not remove the finalizer again, and the CR is stuck.
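A small sketch for spotting this state in a script, given an object's `metadata` as a dictionary (for example, parsed from the output of `oc get -o json`). The helper name `looks_stuck` is hypothetical, and the check is only a heuristic: the same state also occurs briefly while the GC is legitimately waiting for dependents to be deleted.

```python
def looks_stuck(metadata):
    """Return True if the object is marked for deletion and only the
    foregroundDeletion finalizer is holding it back."""
    finalizers = metadata.get("finalizers") or []
    marked_for_deletion = metadata.get("deletionTimestamp") is not None
    return marked_for_deletion and finalizers == ["foregroundDeletion"]
```

Applied to the last VMI metadata shown in the description, this returns True; applied to the VMI from the single-delete case, where only the custom finalizer remains, it returns False.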

Expected results:

The garbage collector should remove the foreground finalizer as often as necessary.

Additional info:

Comment 1 Fabian Deutsch 2018-11-05 10:58:28 UTC
Ping?

Comment 2 Lukasz Szaszkiewicz 2019-08-05 11:12:43 UTC
I reproduced the issue on 4.2 as well. I created a PR (not yet merged) with a potential fix: https://github.com/kubernetes/kubernetes/pull/80895.

Comment 3 Lukasz Szaszkiewicz 2019-08-14 11:31:02 UTC
Since the issue has been fixed upstream (https://github.com/kubernetes/kubernetes/pull/81081), we can start backporting it.
@sttts how far back do we want to backport the fix?

Comment 5 Xingxing Xia 2019-09-03 14:01:52 UTC
Verified in 4.2.0-0.nightly-2019-09-01-224700
oc create -f - << EOF
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: applications.shipper.booking.com
spec:
  conversion:
    strategy: None
  group: shipper.booking.com
  names:
    kind: Application
    listKind: ApplicationList
    plural: applications
    shortNames:
    - app
    singular: application
  scope: Namespaced
  validation:
    openAPIV3Schema:
      properties:
        spec:
          properties:
            template:
              properties:
                values:
                  type: object
              required:
              - values
              type: object
          required:
          - template
          type: object
  version: v1alpha1
  versions:
  - name: v1alpha1
    served: true
    storage: true
  - name: v1
    served: true
    storage: false
EOF

oc create -n default -f - << EOF
apiVersion: shipper.booking.com/v1
kind: Application
metadata:
  name: super-server
spec:
  template:
    values:
      replicaCount: 3
EOF

oc edit application.shipper.booking.com/super-server -n default
metadata:
  finalizers:
  - my-any-finalizer
...

oc proxy --port=8080 &

curl -X DELETE -H "Content-Type: application/json" localhost:8080/apis/shipper.booking.com/v1/namespaces/default/applications/super-server -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}'
Get:
{"apiVersion":"shipper.booking.com/v1","kind":"Application","metadata":{"creationTimestamp":"2019-09-03T13:07:58Z","deletionGracePeriodSeconds":0,"deletionTimestamp":"2019-09-03T13:36:03Z","finalizers":["my-any-finalizer","foregroundDeletion"],"generation":3,"name":"super-server","namespace":"default","resourceVersion":"711481","selfLink":"/apis/shipper.booking.com/v1/namespaces/default/applications/super-server","uid":"da0d4e4a-ce4b-11e9-89c8-02b4bde53ce6"},"spec":{"template":{"values":{"replicaCount":3}}}}

curl -X DELETE -H "Content-Type: application/json" localhost:8080/apis/shipper.booking.com/v1/namespaces/default/applications/super-server -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' # same

oc edit application.shipper.booking.com/super-server # remove above my-any-finalizer

oc get application.shipper.booking.com/super-server
Get: Error from server (NotFound): applications.shipper.booking.com "super-server" not found

Comment 7 errata-xmlrpc 2019-10-16 06:27:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922