Description of problem:
We installed the HPP CR with a pvcTemplate based on a Ceph storage class. Ceph became CriticallyFull and some hpp-pool pods could not be created.

Version-Release number of selected component (if applicable):
4.12, 4.11, 4.10

How reproducible:
Only when there is a problem with the underlying storage class

Steps to Reproduce:
1. Create an HPP CR with a pvcTemplate based on Ceph
2. Use up all Ceph storage
3. Delete the HPP CR

Actual results:
$ oc get pods -A | grep hpp
openshift-cnv   hpp-pool-29ab9406-85bc665cdb-wqz7j   1/1   Running             0   46h
openshift-cnv   hpp-pool-4356e54b-7ccf5c44d-95tkr    1/1   Running             0   46h
openshift-cnv   hpp-pool-7dfd761c-6ffd959c85-tfqds   0/1   ContainerCreating   0   19m

$ oc delete hostpathprovisioner hostpath-provisioner
hostpathprovisioner.hostpathprovisioner.kubevirt.io "hostpath-provisioner" deleted
(STUCK)

$ oc get jobs -n openshift-cnv
NAME                    COMPLETIONS   DURATION   AGE
cleanup-pool-4dd1b8bf   0/1           13m        13m
cleanup-pool-d1954b6a   0/1           13m        13m
cleanup-pool-edb68ab8   1/1           6s         13m

Expected results:
The HPP CR is deleted

Additional info:
As a workaround (W/A): delete the cleanup pods manually; they will be recreated and will complete successfully.

HPP CR:
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  name: hostpath-provisioner
spec:
  imagePullPolicy: IfNotPresent
  storagePools:
    - name: hpp-csi-local-basic
      path: "/var/hpp-csi-local-basic"
    - name: hpp-csi-pvc-block
      pvcTemplate:
        volumeMode: Block
        storageClassName: ocs-storagecluster-ceph-rbd
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
      path: "/var/hpp-csi-pvc-block"
  workload:
    nodeSelector:
      kubernetes.io/os: linux
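The manual workaround above can be sketched as the following commands. This is an illustrative sketch, not a verified procedure: the job names are the ones from this report (adjust to your cluster), and it relies on the standard `job-name` label that Kubernetes sets on pods owned by a Job.

```shell
# Find the cleanup jobs that have not completed yet
oc get jobs -n openshift-cnv | grep cleanup-pool

# Delete the pods of a stuck cleanup job; the Job controller recreates them,
# and per this report the recreated pods complete successfully,
# letting the HPP CR deletion finish
oc delete pod -n openshift-cnv -l job-name=cleanup-pool-4dd1b8bf
oc delete pod -n openshift-cnv -l job-name=cleanup-pool-d1954b6a
```

Deleting the pods (rather than the jobs) is the safer option here, since the jobs themselves are owned by the HPP cleanup flow and are expected to reach completion on their own once their pods restart.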
Alexander, could you please take a look?
Please ignore; I made a wrong change.
We will fix the bug only in 4.11.z and 4.12.0.
Alexander, please triage.
This has been reproduced when the storage used in the pvcTemplate had an issue and we tried to clean it up. I would say this is pretty low priority: even if it does happen, you can clean up manually. It is just annoying that the automatic cleanup failed.
Retargeting to 4.13. This is a relatively low-severity issue for an unlikely use case (running HPP on top of Ceph). It would be nice to fix stuck cleanup jobs, but it is not necessary to backport such a fix to older versions.
Based on the relatively low severity and the available manual workaround, closing the bug.