Bug 1794050

Summary: hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)
Product: Container Native Virtualization (CNV)
Component: Storage
Version: 2.2.0
Target Release: 2.3.0
Reporter: Natalie Gavrielov <ngavrilo>
Assignee: Adam Litke <alitke>
QA Contact: Alex Kalenyuk <akalenyu>
CC: akalenyu, alitke, cnv-qe-bugs, ngavrilo, ycui
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Fixed In Version: virt-cdi-operator-container-v2.3.0-32 hco-bundle-registry-container-v2.2.0-353
Clone Of: 1793132
Bug Depends On: 1793132, 1796342
Last Closed: 2020-05-04 19:10:37 UTC

Description Natalie Gavrielov 2020-01-22 14:56:58 UTC
+++ This bug was initially created as a clone of Bug #1793132 +++

Description of problem:
hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)

Version-Release number of selected component (if applicable):
CNV 2.2

How reproducible:
100%

Steps to Reproduce:
1. Attempt to run out of space by creating DataVolumes on a specific node.
(import an image over HTTP; image type: qcow2.xz or qcow2.gz)

2. When space runs out, the importer pod gets stuck in "Pending" status.
oc get pods -w

3. Delete the DataVolume that caused space to run out:
oc delete dv <DV_NAME>

4. Check that associated PV is removed:
oc get pv
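The check in step 4 can be scripted. The helper below is a sketch, not part of the original report: it filters the default `oc get pv` table for volumes left in "Released" or "Failed" state after the DV is deleted (the column position of STATUS assumes the default table layout; the DV name is a placeholder).

```shell
#!/bin/sh
# Sketch of the step-4 check, assuming the default `oc get pv` table layout,
# where STATUS is the 5th whitespace-separated column of each data row.
filter_stuck_pvs() {
    awk 'NR > 1 && ($5 == "Released" || $5 == "Failed") { print $1 }'
}

# Against a live cluster (DV name is a placeholder):
#   oc delete dv <DV_NAME>
#   oc get pv | filter_stuck_pvs   # a healthy cleanup prints nothing
```

With this bug present, the PV backing the deleted DV would keep showing up in the filtered output instead of being removed.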

Actual results:
The associated PV isn't removed (the PVC is).

Expected results:
The DataVolume is deleted successfully along with its PVC and PV.

Additional info:
Storage type used: hostpath-provisioner

1) Also tested this with plain qcow2 images and got inconsistent behavior:
Run #1:
The importer pod was stuck in "Pending", but no PV was created.
Deleted the DV successfully.
After that, tried to create a 4Gi DV (assuming there is enough space now) - importer pod gets stuck in "Pending" status.
Run #2:
The importer pod was stuck in "Pending", and PV was created.
Deleted the DV successfully - PVC, PV deleted too.
Proceeded to create a new small DV - the importer pod gets stuck in "Pending".
Run #3:
Same behavior as qcow2.xz/gz

2) After deleting the DV, I attempted to create a new small DV on the same node.
The importer pod for this attempt is stuck in "Pending".
Eventually (after ~25 minutes) the node status changes to "NotReady".
oc get nodes output:

[cnv-qe-jenkins@cnv-executor-alex22 cnv2329]$ oc get nodes
NAME               STATUS     ROLES    AGE    VERSION
host-172-16-0-23   Ready      master   6d1h   v1.16.2
host-172-16-0-25   Ready      master   6d1h   v1.16.2
host-172-16-0-26   Ready      master   6d1h   v1.16.2
host-172-16-0-27   Ready      worker   6d1h   v1.16.2
host-172-16-0-40   NotReady   worker   6d1h   v1.16.2
host-172-16-0-57   Ready      worker   6d1h   v1.16.2

yamls:

dv.yaml:
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  annotations:
    kubevirt.io/provisionOnNode: host-172-16-0-40
  name: dv-test-20g-qcow2-xz-1
spec:
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 45Gi
    storageClassName: hostpath-provisioner
  source:
    http:
      url: "http://<YOUR_SERVER>/<YOUR_IMAGE>.qcow2.xz"
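Reproducing the runs above means stamping out several manifests like dv.yaml with different nodes and sizes. The helper below is a hypothetical convenience, not part of the report: it emits a manifest following the same `cdi.kubevirt.io/v1alpha1` schema as dv.yaml, and every value in the usage line is a placeholder.

```shell
#!/bin/sh
# gen_dv NODE NAME SIZE URL - emit a DataVolume manifest shaped like dv.yaml.
gen_dv() {
    node=$1 name=$2 size=$3 url=$4
    cat <<EOF
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  annotations:
    kubevirt.io/provisionOnNode: ${node}
  name: ${name}
spec:
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: ${size}
    storageClassName: hostpath-provisioner
  source:
    http:
      url: "${url}"
EOF
}

# Usage against a live cluster (all values are placeholders):
#   gen_dv host-172-16-0-40 dv-test-1 45Gi "http://<YOUR_SERVER>/<YOUR_IMAGE>.qcow2.xz" | oc create -f -
```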

--- Additional comment from Natalie Gavrielov on 2020-01-22 13:23:56 UTC ---

We should have a release note once we have a workaround for this.

Comment 1 Adam Litke 2020-02-06 20:29:57 UTC
This should be fixed when 1796342 is resolved.

Comment 2 Adam Litke 2020-02-06 20:31:49 UTC
I suspect what is happening is that I/O is still being written back to the underlying storage, and therefore the file cannot be removed. See https://bugzilla.redhat.com/show_bug.cgi?id=1796342 for details.

Comment 3 Alex Kalenyuk 2020-03-03 16:24:15 UTC
Verified on CNV 2.3, OCP 4.4:
Following the instructions in docs - https://github.com/openshift/openshift-docs/pull/19846/files
The node doesn't go down, and the pod no longer gets stuck in "Pending"; instead it reaches "Running" and its log shows an error about not having enough space.
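A verification run like this one can check the importer pod log for the expected disk-space error. The sketch below is an assumption on my part: the report does not quote the exact message, so the grep patterns are typical Linux/importer wordings and the pod name is a placeholder.

```shell
#!/bin/sh
# has_space_error - succeed if stdin contains a typical out-of-space message.
# The patterns are assumptions; the actual CDI importer wording may differ.
has_space_error() {
    grep -qiE 'no space left on device|not enough space|disk quota exceeded'
}

# Against a live cluster (pod name is a placeholder):
#   oc logs importer-<DV_NAME> | has_space_error && echo "out-of-space error found"
```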

Comment 6 errata-xmlrpc 2020-05-04 19:10:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011