1793132 – hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Documentation
Sub Component:
Version:	2.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	2.2.0
Assignee:	Andrew Burden
QA Contact:	Irina Gulina
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1794050

Reported:	2020-01-20 18:13 UTC by Alex Kalenyuk
Modified:	2020-03-04 10:53 UTC
CC List:	8 users
Fixed In Version:	2.2.0
Doc Type:	Enhancement
Doc Text:	Feature: HPP Reason: see comment 4 Result: see comment 5
Clone Of:
Clones:	1794050
Environment:
Last Closed:	2020-03-04 10:53:59 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Alex Kalenyuk 2020-01-20 18:13:23 UTC

Description of problem:
hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)

Version-Release number of selected component (if applicable):
CNV 2.2

How reproducible:
100%

Steps to Reproduce:
1. Attempt to run out of space by creating DataVolumes on a specific node.
(import an image from HTTP, image type: qcow2.xz/gz)

2. When space runs out, the importer pod gets stuck in "Pending" status.
oc get pods -w

3. Delete the DataVolume that is expected to exhaust the node's space:
oc delete dv <DV_NAME>

4. Check that the associated PV is removed:
oc get pv
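
A possible way to script step 4 (a sketch, not part of the original reproduction): parse the JSON from `oc get pv -o json` and `oc get pvc --all-namespaces -o json` and report hostpath-provisioner PVs whose `spec.claimRef` names a PVC that no longer exists. The helper names `orphaned_pvs` and `oc_json` are hypothetical, and matching on namespace/name only (ignoring the claim UID) is a simplifying assumption.

```python
import json
import subprocess


def orphaned_pvs(pv_list, pvc_list, storage_class="hostpath-provisioner"):
    """Return sorted names of PVs in `storage_class` whose claimRef points at
    a PVC that no longer exists (e.g. deleted together with its DataVolume)."""
    # Index the PVCs that still exist by (namespace, name).
    existing = {
        (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        for pvc in pvc_list.get("items", [])
    }
    leftovers = []
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        if spec.get("storageClassName") != storage_class:
            continue  # only look at hostpath-provisioner volumes
        ref = spec.get("claimRef")
        if ref is None:
            continue  # never bound, so not left behind by a deleted PVC
        if (ref.get("namespace"), ref.get("name")) not in existing:
            leftovers.append(pv["metadata"]["name"])
    return sorted(leftovers)


def oc_json(*args):
    """Run `oc <args> -o json` and return the parsed output."""
    out = subprocess.run(
        ["oc", *args, "-o", "json"], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(out)
```

Assuming a logged-in `oc` session, `orphaned_pvs(oc_json("get", "pv"), oc_json("get", "pvc", "--all-namespaces"))` should return an empty list after a clean DV deletion; in the failing case described here it would list the leftover PV.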

Actual results:
The associated PV isn't removed (the PVC is).

Expected results:
DataVolume is deleted successfully with its PVC and PV.

Additional info:
Storage type used: hostpath-provisioner

1) Also tested this with uncompressed qcow2 images and got inconsistent behavior:
Run #1:
The importer pod was stuck in "Pending", but no PV was created.
Deleted the DV successfully.
After that, tried to create a 4Gi DV (assuming there is enough space now) - importer pod gets stuck in "Pending" status.
Run #2:
The importer pod was stuck in "Pending", and PV was created.
Deleted the DV successfully - PVC, PV deleted too.
Proceeded to create a new small DV - importer pod gets stuck in "Pending".
Run #3:
Same behavior as qcow2.xz/gz

2) After deleting the DV, I attempted to create a new small DV on the same node.
The importer pod for this process is stuck in "Pending".
Eventually (~25 minutes) the node status changes to "NotReady".
oc get nodes output:

[cnv-qe-jenkins@cnv-executor-alex22 cnv2329]$ oc get nodes
NAME               STATUS     ROLES    AGE    VERSION
host-172-16-0-23   Ready      master   6d1h   v1.16.2
host-172-16-0-25   Ready      master   6d1h   v1.16.2
host-172-16-0-26   Ready      master   6d1h   v1.16.2
host-172-16-0-27   Ready      worker   6d1h   v1.16.2
host-172-16-0-40   NotReady   worker   6d1h   v1.16.2
host-172-16-0-57   Ready      worker   6d1h   v1.16.2

yamls:

dv.yaml:
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata: 
  annotations: 
    kubevirt.io/provisionOnNode: host-172-16-0-40
  name: dv-test-20g-qcow2-xz-1
spec: 
  pvc: 
    accessModes: 
      - ReadWriteOnce
    resources: 
      requests: 
        storage: 45Gi
    storageClassName: hostpath-provisioner
  source: 
    http: 
      url: "http://<YOUR_SERVER>/<YOUR_IMAGE>.qcow2.xz"

Comment 1 Natalie Gavrielov 2020-01-22 13:23:56 UTC

We should have a release note once we have a workaround for this.

Comment 2 Ying Cui 2020-01-22 15:05:28 UTC

Adam, could you provide the proper workaround when this bug happens?

Comment 3 Adam Litke 2020-01-22 21:26:35 UTC

Alexander, can you attempt to reproduce this and determine the least invasive workaround for when this issue occurs on a CNV 2.2 installation?

Comment 4 Alexander Wels 2020-01-23 20:18:27 UTC

So the problem is that when we use /var/hpvolumes as the path, we are using the same partition the OS runs on. We leave the OS no room for the temporary files it needs, and the entire node goes down. Once that happens the kubelet is dead, and nothing you do from a cluster perspective will cure it. You will have to manually go into the node and free up some space so the OS, and thus the kubelet, can recover, after which everything should start working again.

Solution: Make a separate partition for your storage needs that does not share storage with the OS.
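
One way to check for the condition described above before choosing a storage path (a sketch to run on the node itself; the helper names and the use of `/` as the OS root are assumptions): compare the device ID of the provisioner directory with that of the root filesystem. Equal IDs mean the volumes and the OS share a partition. This heuristic will not catch layouts such as btrfs subvolumes, which report distinct device IDs while sharing one storage pool.

```python
import os
import shutil


def shares_os_filesystem(path, os_root="/"):
    """True if `path` lives on the same filesystem (device) as `os_root`."""
    # st_dev identifies the device holding the file; equal values mean the
    # provisioner directory and the OS compete for the same free space.
    return os.stat(path).st_dev == os.stat(os_root).st_dev


def free_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

For example, `shares_os_filesystem("/var/hpvolumes")` returning `True` indicates the setup from this bug; `free_fraction` can then show how close that shared partition is to full.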

Comment 5 Alexander Wels 2020-01-23 20:30:37 UTC

Opened a PR against the hpp to add a warning to the README about not sharing storage with the OS.


https://github.com/kubevirt/hostpath-provisioner/pull/39

Comment 6 Adam Litke 2020-01-28 18:43:33 UTC

Alexander confirmed the workaround is to access the node and free up storage.  I believe the next step is to engage the Documentation team to update the product documentation with a recommendation similar to the one given in https://github.com/kubevirt/hostpath-provisioner/pull/39.
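
For the manual cleanup step, a sketch like the following (run on the affected node, for example over SSH since the kubelet is down; the helper names are hypothetical and `/var/hpvolumes` is taken from comment 4) ranks the subdirectories of the provisioner path by the disk space they actually allocate. It only reports sizes and deletes nothing; cross-check each directory against the PVs still listed by `oc get pv` before removing anything.

```python
import os


def allocated_size(path):
    """Bytes allocated on disk for files under `path` (symlinks not followed).

    Uses st_blocks (512-byte units) rather than the apparent size, so sparse
    VM disk images are counted by the space they really consume.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += st.st_blocks * 512
    return total


def largest_subdirs(base, limit=10):
    """Up to `limit` (bytes, path) pairs for subdirectories of `base`, largest first."""
    sizes = [
        (allocated_size(entry.path), entry.path)
        for entry in os.scandir(base)
        if entry.is_dir(follow_symlinks=False)
    ]
    sizes.sort(reverse=True)
    return sizes[:limit]
```

For example, `largest_subdirs("/var/hpvolumes", limit=5)` would show the five biggest candidates to investigate first.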

Comment 7 Ying Cui 2020-02-03 09:05:09 UTC

According to comment #6, moving this bug to the docs team. The cloned bug 1794050 (Storage component) tracks fixing the underlying issue in a future release.

Comment 8 Andrew Burden 2020-02-20 12:22:40 UTC

Warning admonition added to two modules, after the prerequisite steps for 'Create a backing partition...'

PR: 
https://github.com/openshift/openshift-docs/pull/19846/files

Sorry, there's no preview build, but all affected content is in the following (not-updated) section:
https://docs.openshift.com/container-platform/4.3/cnv/cnv_virtual_machines/cnv_virtual_disks/cnv-configuring-local-storage-for-vms.html

Note: this change targets only CNV 2.2, as bz#1794050 fixes this for 2.3.

Comment 9 Irina Gulina 2020-02-25 14:02:26 UTC

Andrew, docs PR looks good to me, merge please. Thanks.

Comment 10 Andrew Burden 2020-03-04 10:53:59 UTC

This update was included in the most recent 4.3 build. It can be viewed here:
https://docs.openshift.com/container-platform/4.3/cnv/cnv_virtual_machines/cnv_virtual_disks/cnv-configuring-local-storage-for-vms.html
