Bug 2118273 - HPP CR cleanup jobs can't complete when hpp-pool mount wasn't successful
Summary: HPP CR cleanup jobs can't complete when hpp-pool mount wasn't successful
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.10.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.13.0
Assignee: Alexander Wels
QA Contact: Jenia Peimer
URL:
Whiteboard:
Depends On:
Blocks: 2121091
 
Reported: 2022-08-15 10:21 UTC by Jenia Peimer
Modified: 2023-02-13 21:27 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2121091 (view as bug list)
Environment:
Last Closed: 2023-02-13 21:27:55 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CNV-20456 0 None None None 2022-11-16 14:23:08 UTC

Description Jenia Peimer 2022-08-15 10:21:41 UTC
Description of problem:
We installed an HPP CR with a pvcTemplate based on a Ceph storage class. Ceph became CriticallyFull and some hpp-pool pods could not be created. When the HPP CR was then deleted, the cleanup jobs for the pools whose hpp-pool mount never succeeded could not complete.

Version-Release number of selected component (if applicable):
4.12, 4.11, 4.10

How reproducible:
Only when there's a problem with the underlying storage class

Steps to Reproduce:
1. Create an HPP CR with a pvcTemplate based on a Ceph storage class
2. Use all Ceph storage (e.g., as in the sketch below)
3. Delete the HPP CR
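
One hypothetical way to exhaust the Ceph pool for step 2 is a throwaway PVC plus a pod that fills it. This is only a sketch: the names, namespace, image, and 500Gi size are illustrative and should be adjusted to the cluster's actual free capacity.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-filler            # hypothetical name
  namespace: default
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi           # illustrative; size it close to the remaining Ceph capacity
---
apiVersion: v1
kind: Pod
metadata:
  name: ceph-filler            # hypothetical name
  namespace: default
spec:
  containers:
    - name: filler
      image: registry.access.redhat.com/ubi8/ubi
      # Write zeros until the volume (and the Ceph pool behind it) runs out of space,
      # then keep the pod alive so the PVC stays bound
      command: ["sh", "-c", "dd if=/dev/zero of=/data/fill bs=1M; sleep infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ceph-filler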

Actual results:
$ oc get pods -A | grep hpp
openshift-cnv    hpp-pool-29ab9406-85bc665cdb-wqz7j   1/1  Running  0  46h
openshift-cnv    hpp-pool-4356e54b-7ccf5c44d-95tkr    1/1  Running  0  46h
openshift-cnv    hpp-pool-7dfd761c-6ffd959c85-tfqds   0/1  ContainerCreating  0 19m

$ oc delete hostpathprovisioner hostpath-provisioner
hostpathprovisioner.hostpathprovisioner.kubevirt.io "hostpath-provisioner" deleted
(STUCK)

$ oc get jobs -n openshift-cnv 
NAME                    COMPLETIONS   DURATION   AGE
cleanup-pool-4dd1b8bf   0/1           13m        13m
cleanup-pool-d1954b6a   0/1           13m        13m
cleanup-pool-edb68ab8   1/1           6s         13m
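
To see why the two stuck jobs never complete, the job and its pod can be inspected; a sketch (the pod name suffix is illustrative), where the Events section is expected to show the failed hpp-pool mount:

$ oc get pods -n openshift-cnv | grep cleanup-pool
$ oc describe job -n openshift-cnv cleanup-pool-4dd1b8bf
$ oc describe pod -n openshift-cnv cleanup-pool-4dd1b8bf-xxxxx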


Expected results:
HPP CR deleted


Additional info:
As a workaround: delete the stuck cleanup pods manually; they will be recreated and complete successfully (see the sketch below).
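
A sketch of that workaround, assuming the stuck cleanup pods keep the cleanup-pool- prefix of their jobs:

$ oc get pods -n openshift-cnv -o name | grep cleanup-pool
$ oc get pods -n openshift-cnv -o name | grep cleanup-pool | xargs oc delete -n openshift-cnv

Once the jobs recreate the pods, they should complete and the HPP CR deletion should be able to finish.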


HPP CR:
apiVersion: hostpathprovisioner.kubevirt.io/v1beta1
kind: HostPathProvisioner
metadata:
  name: hostpath-provisioner
spec:
  imagePullPolicy: IfNotPresent
  storagePools: 
    - name: hpp-csi-local-basic
      path: "/var/hpp-csi-local-basic"
    - name: hpp-csi-pvc-block
      pvcTemplate: 
        volumeMode: Block
        storageClassName: ocs-storagecluster-ceph-rbd
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
      path: "/var/hpp-csi-pvc-block"
  workload:
    nodeSelector:
      kubernetes.io/os: linux

Comment 1 Yan Du 2022-09-28 12:15:38 UTC
Alexander, could you please take a look?

Comment 2 Yan Du 2022-09-28 13:57:11 UTC
Please ignore, I made a wrong change.

Comment 3 Yan Du 2022-11-16 13:35:29 UTC
We will only fix the bug in 4.11.z and 4.12.0.

Comment 4 Adam Litke 2022-11-23 18:33:52 UTC
Alexander, please triage.

Comment 5 Alexander Wels 2022-11-23 19:57:36 UTC
This has been reproduced when the storage used in the pvcTemplate had an issue and we tried to clean it up. I would say this is pretty low priority; even if it does happen, you can clean up manually. It is just annoying that the automatic cleanup failed.

Comment 7 Adam Litke 2023-01-18 14:22:29 UTC
Retargeting to 4.13. This is a relatively low severity issue for an unlikely use case (running HPP on top of Ceph). It would be nice to fix stuck cleanup jobs, but it's not necessary to backport such a fix to older versions.

Comment 8 Jenia Peimer 2023-02-13 21:27:55 UTC
Based on the relatively low severity and the available manual workaround, closing the bug.

