+++ This bug was initially created as a clone of Bug #1793132 +++
Description of problem:
hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Attempt to run out of space by creating DataVolumes on a specific node.
(import an image from HTTP, image type: qcow2.xz/gz)
2. When space runs out, the importer pod gets stuck in "Pending" status.
oc get pods -w
3. Delete the DataVolume whose import was expected to exhaust the space:
oc delete dv <DV_NAME>
4. Check that associated PV is removed:
oc get pv
Actual results:
The associated PV isn't removed (the PVC is).
Expected results:
The DataVolume is deleted successfully, together with its PVC and PV.
Storage type used: hostpath-provisioner
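The check in step 4 can be narrowed to volumes that have lost their claim. The following is a minimal sketch, not part of the original report: leftover_pvs is a hypothetical helper name, and it assumes the leftover PV is reported in the "Released" or "Failed" phase, which the report does not state.

```shell
# Hypothetical helper: print the names of PVs that are no longer bound.
# Reads rows of "NAME STATUS CLAIM" on stdin, one PV per line.
# Assumption: a PV left behind after its PVC is deleted shows up
# with phase Released or Failed.
leftover_pvs() {
    awk '$2 == "Released" || $2 == "Failed" { print $1 }'
}

# Usage against a live cluster (needs a logged-in oc session):
#   oc get pv --no-headers \
#     -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,CLAIM:.spec.claimRef.name \
#     | leftover_pvs
```

An empty result after deleting the DV would match the expected behavior; any name printed is a candidate for the leak described here.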
1) Also tested this with qcow2 images, getting inconsistent behavior:
The importer pod was stuck in "Pending", but no PV was created.
Deleted the DV successfully.
After that, tried to create a 4Gi DV (assuming there is enough space now) - importer pod gets stuck in "Pending" status.
The importer pod was stuck in "Pending", and PV was created.
Deleted the DV successfully - PVC, PV deleted too.
Proceeded to create a new small DV - importer pod gets stuck in "Pending".
Same behavior as qcow2.xz/gz
2) After deleting the DV, I attempted to create a new small DV on the same node.
The importer pod for this process is stuck in "Pending".
Eventually (~25 minutes) the node status changes to "NotReady".
oc get nodes output:
[cnv-qe-jenkins@cnv-executor-alex22 cnv2329]$ oc get nodes
NAME               STATUS     ROLES    AGE    VERSION
host-172-16-0-23   Ready      master   6d1h   v1.16.2
host-172-16-0-25   Ready      master   6d1h   v1.16.2
host-172-16-0-26   Ready      master   6d1h   v1.16.2
host-172-16-0-27   Ready      worker   6d1h   v1.16.2
host-172-16-0-40   NotReady   worker   6d1h   v1.16.2
host-172-16-0-57   Ready      worker   6d1h   v1.16.2
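Since the node takes roughly 25 minutes to change state, it can help to filter the node listing rather than read it by eye. This is a minimal sketch, not something from the original report: not_ready_nodes is a hypothetical helper name, and it assumes the default "oc get nodes" column order shown above (NAME first, STATUS second).

```shell
# Hypothetical helper: print the names of nodes whose STATUS does not
# start with "Ready". Reads "oc get nodes" output on stdin and skips
# the header row if one is present.
not_ready_nodes() {
    awk '$1 != "NAME" && $2 !~ /^Ready/ { print $1 }'
}

# Usage against a live cluster (needs a logged-in oc session):
#   oc get nodes | not_ready_nodes
```

Fed the listing above, the filter would report only host-172-16-0-40. A node shown as "Ready,SchedulingDisabled" is deliberately not reported, because its kubelet is still healthy.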
--- Additional comment from Natalie Gavrielov on 2020-01-22 13:23:56 UTC ---
We should have a release note once we have a workaround for this.
This should be fixed when bug 1796342 is resolved.
I suspect what is happening is that I/O is still being written back to the underlying storage and therefore the file cannot be removed. See https://bugzilla.redhat.com/show_bug.cgi?id=1796342 for details.
Verified on CNV 2.3, OCP 4.4:
Following the instructions in docs - https://github.com/openshift/openshift-docs/pull/19846/files
The node no longer goes down, and the importer pod no longer gets stuck in "Pending"; instead it is "Running" and its log shows
an error about not having enough space.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.