+++ This bug was initially created as a clone of Bug #2212333 +++

+++ This bug was initially created as a clone of Bug #2211866 +++

Description of problem:
=================================
I have a 4TiB cluster in Fusion aaS (agent-based install) to which I have attached a single consumer cluster. When I run tests creating a 1.2 TiB dataset from that consumer, my test fails. Here are my observations:

>> 1. ceph health shows the blockpool quota is full

ceph health detail
HEALTH_WARN 1 pool(s) full
[WRN] POOL_FULL: 1 pool(s) full
    pool 'cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9' is full (running out of quota)

>> 2. On checking the storageconsumer, I see a granted capacity of 1TiB. Wasn't it supposed to be unlimited (approx. 1PB)?

$ oc get storageconsumer -n fusion-storage -o yaml
  spec:
>>  capacity: 1T
    enable: true
  status:
    cephResources:
    - kind: CephClient
      name: 74b7f702286c4ecf6c62197982adedfd
      status: Ready
>>  grantedCapacity: 1T
    lastHeartbeat: "2023-06-02T10:39:04Z"
    state: Ready

The quota per consumer was removed even from deployer-based installs of the current RH ODF MS, so the ocs-client-operator setting it to a default of 1TB is incorrect: the rest of the provider's capacity stays unutilized. By the same logic, on a 20TB cluster each consumer being able to use only 1TB is not the expected behavior.

Version-Release number of selected component (if applicable):
===============================================================
Consumer
========
oc get csv -n fusion-storage
NAME                                      DISPLAY                            VERSION        REPLACES                                PHASE
managed-fusion-agent.v2.0.11              Managed Fusion Agent               2.0.11                                                 Succeeded
observability-operator.v0.0.21            Observability Operator             0.0.21         observability-operator.v0.0.20          Succeeded
ocs-client-operator.v4.12.3-rhodf         OpenShift Data Foundation Client   4.12.3-rhodf                                           Succeeded
odf-csi-addons-operator.v4.12.3-rhodf     CSI Addons                         4.12.3-rhodf   odf-csi-addons-operator.v4.12.2-rhodf   Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator                4.10.0                                                 Succeeded

Provider
========
oc get csv -n fusion-storage
NAME                                      DISPLAY                            VERSION           REPLACES                                  PHASE
managed-fusion-agent.v2.0.11              Managed Fusion Agent               2.0.11                                                      Succeeded
observability-operator.v0.0.21            Observability Operator             0.0.21            observability-operator.v0.0.20            Succeeded
ocs-operator.v4.12.3-rhodf                OpenShift Container Storage        4.12.3-rhodf      ocs-operator.v4.12.2-rhodf                Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator                4.10.0                                                      Succeeded
route-monitor-operator.v0.1.500-6152b76   Route Monitor Operator             0.1.500-6152b76   route-monitor-operator.v0.1.498-e33e391   Succeeded

How reproducible:
=====================
Always

Steps to Reproduce:
======================
1. Create a provider and consumer cluster in Fusion aaS following the document [1]
   [1] https://docs.google.com/document/d/1Jdx8czlMjbumvilw8nZ6LtvWOMAx3H4TfwoVwiBs0nE/edit#
2. Check that requestedCapacity is incorrectly set to 1TB for the storageconsumer via the ocs-client-operator (see the verification sketch after these steps)
3.
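Verification sketch for step 2: the capacity granted to the consumer and the byte quota applied to its blockpool can be cross-checked with the commands below. The namespace and pool name are the ones from this report (adjust for other clusters), and the ceph command assumes access to the provider's Ceph toolbox/CLI.

# Capacity granted to the storage consumer (shows the incorrect 1T default described above)
$ oc get storageconsumer -n fusion-storage -o jsonpath='{.items[*].status.grantedCapacity}{"\n"}'

# Byte quota applied to the consumer's dedicated blockpool (run on the provider side)
$ ceph osd pool get-quota cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9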
Actual results:
==================
Each consumer (ocs-storageclient) is able to use only 1TB of the provider's usable space.

Expected results:
===================
No quota should be set per consumer/ocs-storage-client.

Additional info:
=======================
apiVersion: misf.ibm.com/v1alpha1
kind: ManagedFusionOffering
metadata:
  name: managedfusionoffering-sample
  namespace: fusion-storage
spec:
  kind: DFC
  release: "4.12"
  config: |
    onboardingTicket: <ticket>
    providerEndpoint: XXXXX:31659

provider:

  cluster:
    id:     3aad2a98-c8ff-433a-86e8-f78f1bdb98be
    health: HEALTH_WARN
            1 pool(s) full

  services:
    mon: 3 daemons, quorum a,b,c (age 30h)
    mgr: a(active, since 30h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 30h), 3 in (since 30h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 577 pgs
    objects: 239.46k objects, 934 GiB
    usage:   2.7 TiB used, 9.3 TiB / 12 TiB avail
    pgs:     577 active+clean

  io:
    client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr

ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 2  ssd    4.00000  1.00000   4 TiB   934 GiB  932 GiB  39 KiB   1.9 GiB  3.1 TiB  22.81  1.00  577  up
 0  ssd    4.00000  1.00000   4 TiB   934 GiB  932 GiB  39 KiB   1.9 GiB  3.1 TiB  22.81  1.00  577  up
 1  ssd    4.00000  1.00000   4 TiB   934 GiB  932 GiB  39 KiB   1.9 GiB  3.1 TiB  22.81  1.00  577  up
                    TOTAL     12 TiB  2.7 TiB  2.7 TiB  118 KiB  5.7 GiB  9.3 TiB  22.81
MIN/MAX VAR: 1.00/1.00  STDDEV: 0

$ ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL    USED     RAW USED  %RAW USED
ssd    12 TiB  9.3 TiB  2.7 TiB  2.7 TiB   22.81
TOTAL  12 TiB  9.3 TiB  2.7 TiB  2.7 TiB   22.81

--- POOLS ---
POOL                                                                          ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics                                                          1    1  15 KiB         6  46 KiB       0  2.5 TiB
ocs-storagecluster-cephfilesystem-metadata                                     2   32  16 KiB        22  131 KiB      0  2.5 TiB
ocs-storagecluster-cephfilesystem-ssd                                          3  512  0 B            0  0 B          0  2.5 TiB
cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9    4   32  932 GiB  239.43k  2.7 TiB  26.79  2.5 TiB

--- Additional comment from Shekhar Berry on 2023-06-02 11:50:27 UTC ---

Must-gather not added as this is a known bug to the dev engineering team:
https://github.com/red-hat-storage/ocs-client-operator/blob/main/controllers/storageclient_controller.go#L272-L279

Let me know otherwise.

--- Additional comment from RHEL Program Management on 2023-06-02 13:09:06 UTC ---

This bug, having no release flag set previously, now has release flag 'odf-4.13.0' set to '?', and so is being proposed to be fixed in the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-06-02 13:09:06 UTC ---

Since this bug has severity set to 'urgent', it is being proposed as a blocker for the currently set release flag. Please resolve ASAP.

--- Additional comment from RHEL Program Management on 2023-06-05 10:32:33 UTC ---

This bug, having no release flag set previously, now has release flag 'odf-4.13.0' set to '?', and so is being proposed to be fixed in the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-06-05 10:32:33 UTC ---

Since this bug has severity set to 'urgent', it is being proposed as a blocker for the currently set release flag. Please resolve ASAP.
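Workaround note (not part of the fix; only for unblocking tests on an affected provider): the byte quota on the consumer's blockpool can be lifted manually from the Ceph side, since a max_bytes value of 0 disables the quota. The pool name below is the one from this report, and the operator may re-apply a quota on a later reconcile, so treat this strictly as a temporary measure.

# Run from the provider's Ceph toolbox
$ ceph osd pool set-quota cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9 max_bytes 0
# Confirm the quota is gone
$ ceph osd pool get-quota cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9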
--- Additional comment from Jilju Joy on 2023-06-07 07:09:03 UTC ---

Logs (from a different cluster):

Provider: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-jn6-pr/jijoy-jn6-pr_20230607T030415/logs/failed_testcase_ocs_logs_1686119226/test_deployment_ocs_logs/jijoy-jn6-pr/

Consumer: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-jn6-c1/jijoy-jn6-c1_20230607T030410/logs/testcases_1686119185/jijoy-jn6-c1/

--- Additional comment from RHEL Program Management on 2023-06-28 13:25:49 UTC ---

This BZ is being approved for an ODF 4.12.z z-stream update, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'odf-4.12.z', having been marked for an approved z-stream update.

--- Additional comment from RHEL Program Management on 2023-06-28 13:25:49 UTC ---

Since this bug has been approved for the ODF 4.12.5 release, through release flag 'odf-4.12.z+' and the appropriate update number entry in the 'Internal Whiteboard', the Target Release is being set to 'ODF 4.12.5'.

--- Additional comment from Sunil Kumar Acharya on 2023-07-05 03:51:55 UTC ---

Please backport the fix to ODF-4.12 and update the RDT flag/text appropriately.

--- Additional comment from Ritesh Chikatwar on 2023-07-06 09:29:43 UTC ---

Development of the fix is still in progress, hence moving the bug back to ASSIGNED.

--- Additional comment from Red Hat Bugzilla on 2023-08-03 08:29:58 UTC ---

Account disabled by LDAP Audit