Description of problem:
hostpath-provisioner - PV doesn't get removed after deleting DV (when attempting to run out of space)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Attempt to run out of space by creating DataVolumes on a specific node.
(import an image from HTTP, image type: qcow2.xz/gz)
2. When space runs out, the importer pod gets stuck in "Pending" status.
oc get pods -w
3. Attempt to delete the DataVolume that caused space to run out:
oc delete dv <DV_NAME>
4. Check that associated PV is removed:
oc get pv
Actual results:
Associated PV isn't removed (the PVC is).
Expected results:
DataVolume is deleted successfully with its PVC and PV.
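The check in step 4 can be scripted. The following is a minimal sketch, not part of the original report: the function names are made up for illustration, and it assumes the PVC created for a DataVolume carries the same name as the DV, so that a leftover PV can be found through its claim reference.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the names are not from the report.

# List PVs as "NAME PHASE CLAIM" rows. Requires a logged-in `oc` session.
list_pvs() {
    oc get pv --no-headers \
        -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,CLAIM:.spec.claimRef.name
}

# Read "NAME PHASE CLAIM" rows on stdin and print the names of PVs whose
# claim reference matches the claim name given as $1.
pvs_for_claim() {
    awk -v claim="$1" '$3 == claim { print $1 }'
}

# Usage after `oc delete dv <DV_NAME>` (assumes PVC name == DV name):
#   list_pvs | pvs_for_claim <DV_NAME>
# Any name printed is a PV that outlived its DataVolume.
```

Keeping the filter separate from the `oc` call means it can be tried against saved output without cluster access.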
Storage type used: hostpath-provisioner
1) Also tested this with qcow2 images, getting inconsistent behavior:
The importer pod was stuck in "Pending", but no PV was created.
Deleted the DV successfully.
After that, tried to create a 4Gi DV (assuming there is enough space now) - importer pod gets stuck in "Pending" status.
The importer pod was stuck in "Pending", and PV was created.
Deleted the DV successfully - PVC, PV deleted too.
Proceeded to create a new small DV - importer pod gets stuck in “Pending”.
Same behavior as qcow2.xz/gz
2) After deleting the DV, I attempted to create a new small DV on the same node.
The importer pod for this process is stuck in "Pending".
Eventually (~25 minutes) the node status changes to "NotReady".
oc get nodes output:
[cnv-qe-jenkins@cnv-executor-alex22 cnv2329]$ oc get nodes
NAME               STATUS     ROLES    AGE    VERSION
host-172-16-0-23   Ready      master   6d1h   v1.16.2
host-172-16-0-25   Ready      master   6d1h   v1.16.2
host-172-16-0-26   Ready      master   6d1h   v1.16.2
host-172-16-0-27   Ready      worker   6d1h   v1.16.2
host-172-16-0-40   NotReady   worker   6d1h   v1.16.2
host-172-16-0-57   Ready      worker   6d1h   v1.16.2
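To spot the affected node without reading the table by eye, the output above can be filtered. This is a small sketch under the assumption that the default `oc get nodes` column layout is used (name in column 1, status in column 2); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get nodes` output on stdin and print the
# names of nodes whose STATUS does not start with "Ready".
# Assumes the default layout: header row first, NAME in column 1, STATUS in column 2.
not_ready_nodes() {
    awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage:
#   oc get nodes | not_ready_nodes
```

Matching on the `Ready` prefix rather than the exact string keeps nodes reported as `Ready,SchedulingDisabled` out of the result.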
We should have a release note once we have a workaround for this.
Adam, could you provide the proper workaround when this bug happens?
Alexander, can you attempt to reproduce this and determine the least invasive workaround for when this issue occurs on a CNV 2.2 installation?
So the problem is that when we use /var/hpvolumes as the path, we are using the same partition the OS runs on. That leaves the OS none of the temporary space it needs for various things, and thus the entire node goes down. Once that happens the kubelet is dead, and nothing you do from a cluster perspective will cure it. You will have to manually go into the node and free up some space so the OS, and thus the kubelet, can recover, after which everything should start working again.
Solution: Make a separate partition for your storage needs that does not share storage with the OS.
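Whether a given hostpath directory shares a filesystem with the OS can be checked on the node before provisioning. The sketch below is one possible check, not something taken from the hostpath-provisioner itself: it compares device IDs using `stat -c '%d'`, which assumes GNU coreutils (or a compatible `stat`), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when both paths live on the same
# filesystem, fail with 1 when they do not, and with 2 when a path cannot
# be inspected. Assumes `stat -c` as provided by GNU coreutils.
shares_filesystem() {
    dev_a=$(stat -c '%d' "$1") || return 2
    dev_b=$(stat -c '%d' "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}

# Usage on the node:
#   if shares_filesystem /var/hpvolumes /; then
#       echo "WARNING: /var/hpvolumes shares a filesystem with the OS"
#   fi
```

Comparing device IDs rather than device names also catches bind mounts that point back at the root filesystem.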
Opened a PR against the hpp, to have a warning in the README about not sharing the storage with the OS.
Alexander confirmed the workaround is to access the node and free up storage. I believe the next step is to engage the Documentation team to update the product documentation with a recommendation similar to the one given in https://github.com/kubevirt/hostpath-provisioner/pull/39.
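When freeing up storage on the node, it helps to confirm how full the filesystem is before and after cleanup. The following is a rough sketch that parses POSIX `df -P` output; the function names and the 90% threshold in the usage line are assumptions made for illustration. It has to run on the node itself, since the cluster cannot reach a node whose kubelet is down.

```shell
#!/bin/sh
# Hypothetical helpers for illustration.

# Read `df -P <path>` output on stdin and print the Capacity column of the
# first data row without the trailing "%".
# Assumes the filesystem name contains no spaces.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Read `df -P <path>` output on stdin and succeed when usage is at or above
# the percentage threshold given as $1.
is_nearly_full() {
    pct=$(usage_percent)
    [ -n "$pct" ] && [ "$pct" -ge "$1" ]
}

# Usage on the node (threshold of 90 is an arbitrary example):
#   df -P /var/hpvolumes | is_nearly_full 90 && echo "free up space on this filesystem"
```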
According to comment #6, moving this bug to the doc team. Bug 1794050 was cloned in the Storage component to fix the real issue in a future release.
Warning admonition added to two modules, after the prerequisite steps for 'Create a backing partition...'
Sorry, there's no preview build, but all affected content is in the following (not-updated) section:
Note, this change is only targeted to CNV 2.2, as bz#1794050 is fixing this for 2.3
Andrew, docs PR looks good to me, merge please. Thanks.
This update was included in the most recent 4.3 build. It can be viewed here: