Bug 1793132

Summary:	hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)
Product:	Container Native Virtualization (CNV)	Reporter:	Alex Kalenyuk <akalenyu>
Component:	Documentation	Assignee:	Andrew Burden <aburden>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Irina Gulina <igulina>
Severity:	high	Docs Contact:
Priority:	high
Version:	2.2.0	CC:	aburden, alitke, cnv-qe-bugs, ncredi, ngavrilo, rgarcia, sgordon, ycui
Target Milestone:	---
Target Release:	2.2.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	2.2.0	Doc Type:	Enhancement
Doc Text:	Feature: HPP Reason: see comment 4 Result: see comment 5	Story Points:	---
Clone Of:
Clones:	1794050 (view as bug list)		Environment:
Last Closed:	2020-03-04 10:53:59 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1794050

Description Alex Kalenyuk 2020-01-20 18:13:23 UTC

Description of problem:
hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)

Version-Release number of selected component (if applicable):
CNV 2.2

How reproducible:
100%

Steps to Reproduce:
1. Attempt to run out of space by creating DataVolumes on a specific node.
(import an image from HTTP, image type: qcow2.xz/gz)

2. When space runs out, the importer pod gets stuck in "Pending" status.
oc get pods -w

3. Attempt to delete the DataVolume that was supposed to cause space to run out:
oc delete dv <DV_NAME>

4. Check that associated PV is removed:
oc get pv
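
A minimal sketch of the check in step 4, assuming the PVC that CDI creates for a DataVolume carries the same name as the DataVolume. The helper name pvs_for_claim is hypothetical, and the match is on claim name only, so it ignores the namespace. It reads "NAME CLAIM PHASE" rows on stdin and prints the PVs still tied to the given claim; no output means no matching PV is left.

```shell
#!/bin/sh
# Hypothetical helper: keep only the rows whose CLAIM column equals $1,
# then print the PV name and its phase.
pvs_for_claim() {
    awk -v claim="$1" '$2 == claim { print $1, $3 }'
}

# Intended use against a live cluster (needs oc and a logged-in session):
#   oc get pv --no-headers \
#     -o custom-columns=NAME:.metadata.name,CLAIM:.spec.claimRef.name,PHASE:.status.phase \
#     | pvs_for_claim <DV_NAME>
```

A PV that is still listed in the "Released" phase after the DV and PVC are gone would match the actual result reported here.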

Actual results:
Associated PV isn't removed (PVC is).

Expected results:
DataVolume is deleted successfully with its PVC and PV.

Additional info:
Storage type used: hostpath-provisioner

1) Also tested this with qcow2 images, getting inconsistent behavior:
Run #1:
The importer pod was stuck in "Pending", but no PV was created.
Deleted the DV successfully.
After that, tried to create a 4Gi DV (assuming there is enough space now) - importer pod gets stuck in "Pending" status.
Run #2:
The importer pod was stuck in "Pending", and PV was created.
Deleted the DV successfully - PVC, PV deleted too.
Proceeded to create a new small DV - importer pod gets stuck in "Pending".
Run #3:
Same behavior as qcow2.xz/gz

2) After deleting the DV, I attempted to create a new small DV on the same node.
The importer pod for this process is stuck in "Pending".
Eventually (~25 minutes) the node status changes to "NotReady".
oc get nodes output:

[cnv-qe-jenkins@cnv-executor-alex22 cnv2329]$ oc get nodes
NAME               STATUS     ROLES    AGE    VERSION
host-172-16-0-23   Ready      master   6d1h   v1.16.2
host-172-16-0-25   Ready      master   6d1h   v1.16.2
host-172-16-0-26   Ready      master   6d1h   v1.16.2
host-172-16-0-27   Ready      worker   6d1h   v1.16.2
host-172-16-0-40   NotReady   worker   6d1h   v1.16.2
host-172-16-0-57   Ready      worker   6d1h   v1.16.2

yamls:

dv.yaml:
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata: 
  annotations: 
    kubevirt.io/provisionOnNode: host-172-16-0-40
  name: dv-test-20g-qcow2-xz-1
spec: 
  pvc: 
    accessModes: 
      - ReadWriteOnce
    resources: 
      requests: 
        storage: 45Gi
    storageClassName: hostpath-provisioner
  source: 
    http: 
      url: "http://<YOUR_SERVER>/<YOUR_IMAGE>.qcow2.xz"

Comment 1 Natalie Gavrielov 2020-01-22 13:23:56 UTC

We should have a release note once we have a workaround for this.

Comment 2 Ying Cui 2020-01-22 15:05:28 UTC

Adam, could you provide the proper workaround when this bug happens?

Comment 3 Adam Litke 2020-01-22 21:26:35 UTC

Alexander, can you attempt to reproduce this and determine the least invasive workaround for when this issue occurs on a CNV 2.2 installation?

Comment 4 Alexander Wels 2020-01-23 20:18:27 UTC

So the problem is that when we use /var/hpvolumes as the path, we are using the same partition that the OS is running on. So basically we are leaving the OS none of the temporary space that it uses for various things, and thus the entire node goes down. Once that happens the kubelet is dead, and nothing you do from a cluster perspective will cure it. You will have to manually go into the node and free up some space so the OS, and thus the kubelet, can recover, after which everything should start working again.

Solution: Make a separate partition for your storage needs that does not share storage with the OS.
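
A rough way to check a node for this condition, run on the node itself. This is a sketch that assumes GNU coreutils stat and the /var/hpvolumes path mentioned above; the function name same_filesystem and the HPP_PATH variable are hypothetical. It compares the device numbers of the two paths, so it only tells you whether they share a filesystem, not how much space is left.

```shell
#!/bin/sh
# Hypothetical helper: succeed (0) if both paths are on the same filesystem,
# fail (1) if they are not, and return 2 if either path cannot be stat'ed.
same_filesystem() {
    # GNU stat: %d prints the device number of the filesystem holding the path.
    dev_a=$(stat -c %d "$1") || return 2
    dev_b=$(stat -c %d "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}

# HPP_PATH is an assumed override; the default is the path from this comment.
HPP_PATH="${HPP_PATH:-/var/hpvolumes}"
if [ -d "$HPP_PATH" ]; then
    if same_filesystem "$HPP_PATH" /; then
        echo "WARNING: $HPP_PATH shares a filesystem with /"
    else
        echo "OK: $HPP_PATH is on a separate filesystem from /"
    fi
fi
```

If the warning is printed, filling the hostpath-provisioner volumes can also fill the OS partition, which is the failure described in this bug.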

Comment 5 Alexander Wels 2020-01-23 20:30:37 UTC

Opened a PR against the HPP to add a warning to the README about not sharing the storage with the OS.


https://github.com/kubevirt/hostpath-provisioner/pull/39

Comment 6 Adam Litke 2020-01-28 18:43:33 UTC

Alexander confirmed the workaround is to access the node and free up storage.  I believe the next step is to engage the Documentation team to update the product documentation with a recommendation similar to the one given in https://github.com/kubevirt/hostpath-provisioner/pull/39.

Comment 7 Ying Cui 2020-02-03 09:05:09 UTC

According to comment #6, moving this bug to the doc team. The cloned bug 1794050 in the Storage component is to fix the real issue in a future release.

Comment 8 Andrew Burden 2020-02-20 12:22:40 UTC

Warning admonition added to two modules, after the prerequisite steps for 'Create a backing partition...'

PR: 
https://github.com/openshift/openshift-docs/pull/19846/files

Sorry, there's no preview build, but all affected content is in the following (not-updated) section:
https://docs.openshift.com/container-platform/4.3/cnv/cnv_virtual_machines/cnv_virtual_disks/cnv-configuring-local-storage-for-vms.html

Note: this change targets only CNV 2.2, as bz#1794050 is fixing this for 2.3.

Comment 9 Irina Gulina 2020-02-25 14:02:26 UTC

Andrew, docs PR looks good to me, merge please. Thanks.

Comment 10 Andrew Burden 2020-03-04 10:53:59 UTC

This update was included in the most recent 4.3 build. It can be viewed here:
https://docs.openshift.com/container-platform/4.3/cnv/cnv_virtual_machines/cnv_virtual_disks/cnv-configuring-local-storage-for-vms.html