Bug 2212330 - ocs-client operator sets hardcoded quota per consumer to 1TB
Summary: ocs-client operator sets hardcoded quota per consumer to 1TB
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Ritesh Chikatwar
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On: 2211866 2212333 2230924 2230928
Blocks:
 
Reported: 2023-06-05 10:22 UTC by Jilju Joy
Modified: 2024-01-02 10:54 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2211866
Environment:
Last Closed: 2024-01-02 10:54:41 UTC
Embargoed:



Description Jilju Joy 2023-06-05 10:22:24 UTC
+++ This bug was initially created as a clone of Bug #2211866 +++

Description of problem:
=================================
I have a 4 TiB cluster in Fusion aaS (agent-based install) to which I have attached a single consumer cluster. When I run tests that create a 1.2 TiB dataset from that consumer, the test fails. Here are my observations:

>> 1. ceph health shows the blockpool quota is full

ceph health detail
HEALTH_WARN 1 pool(s) full
[WRN] POOL_FULL: 1 pool(s) full
    pool 'cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9' is full (running out of quota)

>> 2. On checking the storageconsumer, I see a granted capacity of 1 TiB. Wasn't it supposed to be unlimited (approx 1 PB)?

$oc get storageconsumer -n fusion-storage -o yaml
 spec:
>>    capacity: 1T
    enable: true
  status:
    cephResources:
    - kind: CephClient
      name: 74b7f702286c4ecf6c62197982adedfd
      status: Ready
>>  grantedCapacity: 1T
    lastHeartbeat: "2023-06-02T10:39:04Z"
    state: Ready
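
For reference, the StorageConsumer fields involved here can be modeled roughly as the Go types below. This is a sketch inferred from the YAML output above, not the exact ocs-operator API definitions, so field types and json tags are assumptions.

// Sketch of the StorageConsumer fields relevant to this bug, inferred from the
// `oc get storageconsumer -o yaml` output above. The real ocs-operator types
// may differ in detail.
package v1alpha1

import (
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type StorageConsumerSpec struct {
	// Capacity is the quota requested for this consumer ("1T" in the output above).
	Capacity resource.Quantity `json:"capacity"`
	Enable   bool              `json:"enable"`
}

type CephResource struct {
	Kind   string `json:"kind"`
	Name   string `json:"name"`
	Status string `json:"status"`
}

type StorageConsumerStatus struct {
	CephResources []CephResource `json:"cephResources,omitempty"`
	// GrantedCapacity is what the provider actually enforces; here it ends up as
	// the 1T blockpool quota that triggered POOL_FULL.
	GrantedCapacity resource.Quantity `json:"grantedCapacity,omitempty"`
	LastHeartbeat   metav1.Time       `json:"lastHeartbeat,omitempty"`
	State           string            `json:"state,omitempty"`
}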


The quota per consumer was removed even from deployer-based installs of the current RH ODF Managed Service, so the ocs-client operator setting it to a default of 1TB is incorrect: the rest of the provider's capacity goes unused. The same applies to a 20TB cluster, where each consumer being able to use only 1TB is not the expected behavior.


Version-Release number of selected component (if applicable):
===============================================================
Consumer
========

 oc get csv -n fusion-storage
NAME                                      DISPLAY                            VERSION           REPLACES                                  PHASE
managed-fusion-agent.v2.0.11              Managed Fusion Agent               2.0.11                                                      Succeeded
observability-operator.v0.0.21            Observability Operator             0.0.21            observability-operator.v0.0.20            Succeeded
ocs-client-operator.v4.12.3-rhodf         OpenShift Data Foundation Client   4.12.3-rhodf                                                Succeeded
odf-csi-addons-operator.v4.12.3-rhodf     CSI Addons                         4.12.3-rhodf      odf-csi-addons-operator.v4.12.2-rhodf     Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator                4.10.0                                                      Succeeded

provider
============
oc get csv -n fusion-storage
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
managed-fusion-agent.v2.0.11              Managed Fusion Agent          2.0.11                                                      Succeeded
observability-operator.v0.0.21            Observability Operator        0.0.21            observability-operator.v0.0.20            Succeeded
ocs-operator.v4.12.3-rhodf                OpenShift Container Storage   4.12.3-rhodf      ocs-operator.v4.12.2-rhodf                Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0                                                      Succeeded
route-monitor-operator.v0.1.500-6152b76   Route Monitor Operator        0.1.500-6152b76   route-monitor-operator.v0.1.498-e33e391   Succeeded



How reproducible:
=====================
Always


Steps to Reproduce:
======================
1. Create a provider-consumer cluster in Fusion aaS following the document [1]

[1] https://docs.google.com/document/d/1Jdx8czlMjbumvilw8nZ6LtvWOMAx3H4TfwoVwiBs0nE/edit#

2. Check that the requestedCapacity of the storageconsumer is incorrectly set to 1TB by the ocs-client-operator (see the illustrative check after these steps).
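
A minimal Go sketch of that check, listing StorageConsumers with the dynamic client and printing their granted capacity. The GroupVersionResource (ocs.openshift.io/v1alpha1, storageconsumers) and the fusion-storage namespace are assumptions to verify against the cluster; `oc get storageconsumer -n fusion-storage -o yaml` as shown above gives the same information.

// Illustrative check for step 2: list StorageConsumers on the provider and print
// the granted capacity, which this bug reports as a hardcoded 1T. The GVR below
// (ocs.openshift.io/v1alpha1, storageconsumers) is an assumption; confirm it
// against the installed CRDs.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	gvr := schema.GroupVersionResource{Group: "ocs.openshift.io", Version: "v1alpha1", Resource: "storageconsumers"}
	list, err := dc.Resource(gvr).Namespace("fusion-storage").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, sc := range list.Items {
		granted, _, _ := unstructured.NestedString(sc.Object, "status", "grantedCapacity")
		fmt.Printf("%s grantedCapacity=%s\n", sc.GetName(), granted)
	}
}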

Actual results:
==================
Each consumer (ocs-storageclient) is able to use only 1TB of the provider's usable space.

Expected results:
===================
No quota should be set per consumer/ocs-storage-client

Additional info:
=======================

apiVersion: misf.ibm.com/v1alpha1
kind: ManagedFusionOffering
metadata:
 name: managedfusionoffering-sample
 namespace: fusion-storage
spec:
 kind: DFC
 release: "4.12"
 config: |
   onboardingTicket: <ticket>
   providerEndpoint: XXXXX:31659



provider

 cluster:
    id:     3aad2a98-c8ff-433a-86e8-f78f1bdb98be
    health: HEALTH_WARN
            1 pool(s) full
 
  services:
    mon: 3 daemons, quorum a,b,c (age 30h)
    mgr: a(active, since 30h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 30h), 3 in (since 30h)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 577 pgs
    objects: 239.46k objects, 934 GiB
    usage:   2.7 TiB used, 9.3 TiB / 12 TiB avail
    pgs:     577 active+clean
 
  io:
    client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr


ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 2    ssd  4.00000   1.00000   4 TiB  934 GiB  932 GiB   39 KiB  1.9 GiB  3.1 TiB  22.81  1.00  577      up
 0    ssd  4.00000   1.00000   4 TiB  934 GiB  932 GiB   39 KiB  1.9 GiB  3.1 TiB  22.81  1.00  577      up
 1    ssd  4.00000   1.00000   4 TiB  934 GiB  932 GiB   39 KiB  1.9 GiB  3.1 TiB  22.81  1.00  577      up
                       TOTAL  12 TiB  2.7 TiB  2.7 TiB  118 KiB  5.7 GiB  9.3 TiB  22.81                   
MIN/MAX VAR: 1.00/1.00  STDDEV: 0


$
ceph df
--- RAW STORAGE ---
CLASS    SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    12 TiB  9.3 TiB  2.7 TiB   2.7 TiB      22.81
TOTAL  12 TiB  9.3 TiB  2.7 TiB   2.7 TiB      22.81
 
--- POOLS ---
POOL                                                                         ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
device_health_metrics                                                         1    1   15 KiB        6   46 KiB      0    2.5 TiB
ocs-storagecluster-cephfilesystem-metadata                                    2   32   16 KiB       22  131 KiB      0    2.5 TiB
ocs-storagecluster-cephfilesystem-ssd                                         3  512      0 B        0      0 B      0    2.5 TiB
cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9   4   32  932 GiB  239.43k  2.7 TiB  26.79    2.5 TiB

--- Additional comment from Shekhar Berry on 2023-06-02 11:50:27 UTC ---

Must-gather is not added as this is a bug already known to the dev engineering team.

https://github.com/red-hat-storage/ocs-client-operator/blob/main/controllers/storageclient_controller.go#L272-L279

 Let me know otherwise
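
For context, the pattern at the linked lines amounts to something like the following. This is a hypothetical sketch, not the actual ocs-client-operator source; defaultStorageQuota, onboardRequest, and buildOnboardRequest are illustrative names.

// Hypothetical sketch of the behavior described in this bug: the client operator
// falling back to a fixed "1T" quota when onboarding a consumer. Names are
// illustrative; see the linked storageclient_controller.go lines for the real code.
package main

const defaultStorageQuota = "1T"

type onboardRequest struct {
	OnboardingTicket string
	Capacity         string
}

func buildOnboardRequest(ticket, requestedCapacity string) onboardRequest {
	if requestedCapacity == "" {
		// This hardcoded fallback is what the bug objects to: the provider ends
		// up granting only 1T per consumer regardless of its usable capacity.
		requestedCapacity = defaultStorageQuota
	}
	return onboardRequest{OnboardingTicket: ticket, Capacity: requestedCapacity}
}

With the per-consumer quota already removed from deployer-based installs (as noted in the description), the expected behavior is to not request any quota at all rather than default to 1T.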

--- Additional comment from RHEL Program Management on 2023-06-02 13:09:06 UTC ---

This bug, having no release flag set previously, is now set with release flag 'odf-4.13.0' to '?', and so is proposed to be fixed in the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if previously set while the release flag was missing, have now been reset, since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-06-02 13:09:06 UTC ---

Since this bug has severity set to 'urgent', it is being proposed as a blocker for the currently set release flag. Please resolve ASAP.

Comment 2 Mudit Agarwal 2024-01-02 10:11:27 UTC
Ritesh, is this issue fixed in 4.15 also?

Comment 3 Ritesh Chikatwar 2024-01-02 10:54:41 UTC
This has been fixed in 4.14 and a backport is not required for this one. Looks like I missed closing this.
Closure Comment: As we are discontinuing FaaS development and, in the current situation, customers would not see provider/consumer support for 4.12/4.13, this backport is not needed. Hence closing the bug.

