Bug 1794050 - hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)
Summary: hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2.3.0
Assignee: Adam Litke
QA Contact: Alex Kalenyuk
URL:
Whiteboard:
Depends On: 1793132 1796342
Blocks:
 
Reported: 2020-01-22 14:56 UTC by Natalie Gavrielov
Modified: 2020-05-04 19:10 UTC
CC List: 5 users

Fixed In Version: virt-cdi-operator-container-v2.3.0-32 hco-bundle-registry-container-v2.2.0-353
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1793132
Environment:
Last Closed: 2020-05-04 19:10:37 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHEA-2020:2011 (Last Updated: 2020-05-04 19:10:48 UTC)

Description Natalie Gavrielov 2020-01-22 14:56:58 UTC
+++ This bug was initially created as a clone of Bug #1793132 +++

Description of problem:
hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)

Version-Release number of selected component (if applicable):
CNV 2.2

How reproducible:
100%

Steps to Reproduce:
1. Attempt to run out of space by creating DataVolumes on a specific node
(importing an image from HTTP; image type: qcow2.xz/gz).

2. When space runs out, the importer pod gets stuck in "Pending" status.
oc get pods -w

3. Delete the DataVolume whose import caused the space to run out:
oc delete dv <DV_NAME>

4. Check that the associated PV is removed (see the command sketch after these steps):
oc get pv
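
For reference, a minimal sketch of the checks in steps 3-4, assuming the DV name from the yaml below; CDI normally names the PVC (and the importer pod) after the DataVolume, so the object names here are illustrative:

oc delete dv dv-test-20g-qcow2-xz-1
# the PVC created for the DV is removed together with it
oc get pvc dv-test-20g-qcow2-xz-1
# the PV that was bound to that PVC is still listed (this is the bug)
oc get pv -o wide | grep dv-test-20g-qcow2-xz-1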

Actual results:
The associated PV isn't removed (the PVC is).

Expected results:
The DataVolume is deleted successfully along with its PVC and PV.

Additional info:
Storage type used: hostpath-provisioner

1) Also tested this with qcow2 images, getting inconsistent behavior:
Run #1:
The importer pod was stuck in "Pending", but no PV was created.
Deleted the DV successfully.
After that, tried to create a 4Gi DV (assuming there was enough space by then); the importer pod got stuck in "Pending" status.
Run #2:
The importer pod was stuck in "Pending", and a PV was created.
Deleted the DV successfully; the PVC and PV were deleted too.
Proceeded to create a new small DV; the importer pod got stuck in "Pending".
Run #3:
Same behavior as with qcow2.xz/gz.

2) After deleting the DV, I attempted to create a new small DV on the same node.
The importer pod for this DV got stuck in "Pending".
Eventually (after ~25 minutes) the node status changed to "NotReady".
oc get nodes output:

[cnv-qe-jenkins@cnv-executor-alex22 cnv2329]$ oc get nodes
NAME               STATUS     ROLES    AGE    VERSION
host-172-16-0-23   Ready      master   6d1h   v1.16.2
host-172-16-0-25   Ready      master   6d1h   v1.16.2
host-172-16-0-26   Ready      master   6d1h   v1.16.2
host-172-16-0-27   Ready      worker   6d1h   v1.16.2
host-172-16-0-40   NotReady   worker   6d1h   v1.16.2
host-172-16-0-57   Ready      worker   6d1h   v1.16.2

yamls:

dv.yaml:
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata: 
  annotations: 
    kubevirt.io/provisionOnNode: host-172-16-0-40
  name: dv-test-20g-qcow2-xz-1
spec: 
  pvc: 
    accessModes: 
      - ReadWriteOnce
    resources: 
      requests: 
        storage: 45Gi
    storageClassName: hostpath-provisioner
  source: 
    http: 
      url: "http://<YOUR_SERVER>/<YOUR_IMAGE>.qcow2.xz"

--- Additional comment from Natalie Gavrielov on 2020-01-22 13:23:56 UTC ---

We should have a release note once we have a workaround for this.

Comment 1 Adam Litke 2020-02-06 20:29:57 UTC
This should be fixed when bug 1796342 is resolved.

Comment 2 Adam Litke 2020-02-06 20:31:49 UTC
I suspect what is happening is that I/O is still being written back to the underlying storage and therefore the file cannot be removed.  See https://bugzilla.redhat.com/show_bug.cgi?id=1796342 for details.

Comment 3 Alex Kalenyuk 2020-03-03 16:24:15 UTC
Verified on CNV 2.3, OCP 4.4, following the instructions in the docs: https://github.com/openshift/openshift-docs/pull/19846/files
The node doesn't go down, and the pod no longer gets stuck in "Pending"; instead it is "Running" and its log shows an error about not having enough space.
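
A minimal sketch of that check, with an illustrative importer pod name:

oc get pods
# the importer pod is now "Running" rather than "Pending"
oc logs importer-dv-test-20g-qcow2-xz-1
# the log reports an out-of-space error for the import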

Comment 6 errata-xmlrpc 2020-05-04 19:10:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011

