Bug 1794050 - hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)
Summary: hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2.3.0
Assignee: Adam Litke
QA Contact: Alex Kalenyuk
URL:
Whiteboard:
Depends On: 1793132 1796342
Blocks:
 
Reported: 2020-01-22 14:56 UTC by Natalie Gavrielov
Modified: 2020-05-04 19:10 UTC
CC List: 5 users

Fixed In Version: virt-cdi-operator-container-v2.3.0-32 hco-bundle-registry-container-v2.2.0-353
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1793132
Environment:
Last Closed: 2020-05-04 19:10:37 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHEA-2020:2011 (Last Updated: 2020-05-04 19:10:48 UTC)

Description Natalie Gavrielov 2020-01-22 14:56:58 UTC
+++ This bug was initially created as a clone of Bug #1793132 +++

Description of problem:
hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)

Version-Release number of selected component (if applicable):
CNV 2.2

How reproducible:
100%

Steps to Reproduce:
1. Attempt to run out of space by creating DataVolumes on a specific node
(importing an image from HTTP; image type: qcow2.xz/gz).

2. When space runs out, the importer pod gets stuck in "Pending" status.
oc get pods -w

3. Delete the DataVolume whose import caused the space to run out:
oc delete dv <DV_NAME>

4. Check that the associated PV is removed (see the command sketch after these steps):
oc get pv
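
For reference, a minimal sketch of the checks in steps 3-4, assuming the DV name from the yaml below; CDI normally names the PVC (and the importer pod) after the DataVolume, so the object names here are illustrative:

oc delete dv dv-test-20g-qcow2-xz-1
# the PVC created for the DV is removed together with it
oc get pvc dv-test-20g-qcow2-xz-1
# the PV that was bound to that PVC is still listed (this is the bug)
oc get pv -o wide | grep dv-test-20g-qcow2-xz-1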

Actual results:
The associated PV isn't removed (the PVC is).

Expected results:
The DataVolume is deleted successfully along with its PVC and PV.

Additional info:
Storage type used: hostpath-provisioner

1) Also tested this with qcow2 images, getting inconsistent behavior:
Run #1:
The importer pod was stuck in "Pending", but no PV was created.
Deleted the DV successfully.
After that, tried to create a 4Gi DV (assuming there was enough space by then); the importer pod got stuck in "Pending" status.
Run #2:
The importer pod was stuck in "Pending", and a PV was created.
Deleted the DV successfully; the PVC and PV were deleted too.
Proceeded to create a new small DV; the importer pod got stuck in "Pending".
Run #3:
Same behavior as with qcow2.xz/gz.

2) After deleting the DV, I attempted to create a new small DV on the same node.
The importer pod for this DV got stuck in "Pending".
Eventually (after ~25 minutes) the node status changed to "NotReady".
oc get nodes output:

[cnv-qe-jenkins@cnv-executor-alex22 cnv2329]$ oc get nodes
NAME               STATUS     ROLES    AGE    VERSION
host-172-16-0-23   Ready      master   6d1h   v1.16.2
host-172-16-0-25   Ready      master   6d1h   v1.16.2
host-172-16-0-26   Ready      master   6d1h   v1.16.2
host-172-16-0-27   Ready      worker   6d1h   v1.16.2
host-172-16-0-40   NotReady   worker   6d1h   v1.16.2
host-172-16-0-57   Ready      worker   6d1h   v1.16.2

yamls:

dv.yaml:
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata: 
  annotations: 
    kubevirt.io/provisionOnNode: host-172-16-0-40
  name: dv-test-20g-qcow2-xz-1
spec: 
  pvc: 
    accessModes: 
      - ReadWriteOnce
    resources: 
      requests: 
        storage: 45Gi
    storageClassName: hostpath-provisioner
  source: 
    http: 
      url: "http://<YOUR_SERVER>/<YOUR_IMAGE>.qcow2.xz"

--- Additional comment from Natalie Gavrielov on 2020-01-22 13:23:56 UTC ---

We should have a release note once we have a workaround for this.

Comment 1 Adam Litke 2020-02-06 20:29:57 UTC
This should be fixed when bug 1796342 is resolved.

Comment 2 Adam Litke 2020-02-06 20:31:49 UTC
I suspect what is happening is that I/O is still being written back to the underlying storage and therefore the file cannot be removed.  See https://bugzilla.redhat.com/show_bug.cgi?id=1796342 for details.

Comment 3 Alex Kalenyuk 2020-03-03 16:24:15 UTC
Verified on CNV 2.3, OCP 4.4, following the instructions in the docs: https://github.com/openshift/openshift-docs/pull/19846/files
The node doesn't go down, and the pod no longer gets stuck in "Pending"; instead it is "Running" and its log shows an error about not having enough space.
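
A minimal sketch of that check, with an illustrative importer pod name:

oc get pods
# the importer pod is now "Running" rather than "Pending"
oc logs importer-dv-test-20g-qcow2-xz-1
# the log reports an out-of-space error for the import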

Comment 6 errata-xmlrpc 2020-05-04 19:10:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011

