Bug 2118273 - HPP CR cleanup jobs can't complete when hpp-pool mount wasn't successful
Summary: HPP CR cleanup jobs can't complete when hpp-pool mount wasn't successful
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.10.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.13.0
Assignee: Alexander Wels
QA Contact: Jenia Peimer
URL:
Whiteboard:
Depends On:
Blocks: 2121091
 
Reported: 2022-08-15 10:21 UTC by Jenia Peimer
Modified: 2023-02-13 21:27 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2121091 (view as bug list)
Environment:
Last Closed: 2023-02-13 21:27:55 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CNV-20456 0 None None None 2022-11-16 14:23:08 UTC

Description Jenia Peimer 2022-08-15 10:21:41 UTC
Description of problem:
We installed an HPP CR with a pvcTemplate based on a Ceph storage class. Ceph became CriticallyFull and some hpp-pool pods could not be created. When the HPP CR was then deleted, the cleanup jobs for the pools whose hpp-pool mount never succeeded could not complete.

Version-Release number of selected component (if applicable):
4.12, 4.11, 4.10

How reproducible:
Only when there's a problem with the underlying storage class

Steps to Reproduce:
1. Create an HPP CR with a pvcTemplate based on a Ceph storage class
2. Use all Ceph storage (e.g., as in the sketch below)
3. Delete the HPP CR
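
One hypothetical way to exhaust the Ceph pool for step 2 is a throwaway PVC plus a pod that fills it. This is only a sketch: the names, namespace, image, and 500Gi size are illustrative and should be adjusted to the cluster's actual free capacity.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-filler            # hypothetical name
  namespace: default
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi           # illustrative; size it close to the remaining Ceph capacity
---
apiVersion: v1
kind: Pod
metadata:
  name: ceph-filler            # hypothetical name
  namespace: default
spec:
  containers:
    - name: filler
      image: registry.access.redhat.com/ubi8/ubi
      # Write zeros until the volume (and the Ceph pool behind it) runs out of space,
      # then keep the pod alive so the PVC stays bound
      command: ["sh", "-c", "dd if=/dev/zero of=/data/fill bs=1M; sleep infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ceph-filler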

Actual results:
$ oc get pods -A | grep hpp
openshift-cnv    hpp-pool-29ab9406-85bc665cdb-wqz7j   1/1  Running  0  46h
openshift-cnv    hpp-pool-4356e54b-7ccf5c44d-95tkr    1/1  Running  0  46h
openshift-cnv    hpp-pool-7dfd761c-6ffd959c85-tfqds   0/1  ContainerCreating  0 19m

$ oc delete hostpathprovisioner hostpath-provisioner
hostpathprovisioner.hostpathprovisioner.kubevirt.io "hostpath-provisioner" deleted
(STUCK)

$ oc get jobs -n openshift-cnv 
NAME                    COMPLETIONS   DURATION   AGE
cleanup-pool-4dd1b8bf   0/1           13m        13m
cleanup-pool-d1954b6a   0/1           13m        13m
cleanup-pool-edb68ab8   1/1           6s         13m
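
To see why the two stuck jobs never complete, the job and its pod can be inspected; a sketch (the pod name suffix is illustrative), where the Events section is expected to show the failed hpp-pool mount:

$ oc get pods -n openshift-cnv | grep cleanup-pool
$ oc describe job -n openshift-cnv cleanup-pool-4dd1b8bf
$ oc describe pod -n openshift-cnv cleanup-pool-4dd1b8bf-xxxxx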


Expected results:
HPP CR deleted


Additional info:
As a workaround: delete the stuck cleanup pods manually; they will be recreated and complete successfully (see the sketch below).
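
A sketch of that workaround, assuming the stuck cleanup pods keep the cleanup-pool- prefix of their jobs:

$ oc get pods -n openshift-cnv -o name | grep cleanup-pool
$ oc get pods -n openshift-cnv -o name | grep cleanup-pool | xargs oc delete -n openshift-cnv

Once the jobs recreate the pods, they should complete and the HPP CR deletion should be able to finish.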


HPP CR:
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  name: hostpath-provisioner
spec:
  imagePullPolicy: IfNotPresent
  storagePools: 
    - name: hpp-csi-local-basic
      path: "/var/hpp-csi-local-basic"
    - name: hpp-csi-pvc-block
      pvcTemplate: 
        volumeMode: Block
        storageClassName: ocs-storagecluster-ceph-rbd
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
      path: "/var/hpp-csi-pvc-block"
  workload:
    nodeSelector:
      kubernetes.io/os: linux

Comment 1 Yan Du 2022-09-28 12:15:38 UTC
Alexander, could you please take a look?

Comment 2 Yan Du 2022-09-28 13:57:11 UTC
Please ignore, I made a wrong change.

Comment 3 Yan Du 2022-11-16 13:35:29 UTC
We will only fix the bug in 4.11.z and 4.12.0.

Comment 4 Adam Litke 2022-11-23 18:33:52 UTC
Alexander, please triage.

Comment 5 Alexander Wels 2022-11-23 19:57:36 UTC
This has been reproduced when the storage used in the pvcTemplate had an issue and we tried to clean it up. I would say this is pretty low priority; even if it does happen, you can clean up manually. It is just annoying that the automatic cleanup failed.

Comment 7 Adam Litke 2023-01-18 14:22:29 UTC
Retargeting to 4.13. This is a relatively low severity issue for an unlikely use case (running HPP on top of Ceph). It would be nice to fix stuck cleanup jobs, but it's not necessary to backport such a fix to older versions.

Comment 8 Jenia Peimer 2023-02-13 21:27:55 UTC
Based on the relatively low severity and the available manual workaround, closing the bug.

