--- Description from Ritesh Chikatwar on 2023-08-10 09:30:14 UTC ---
+++ This bug was initially created as a clone of Bug #2212333 +++
+++ This bug was initially created as a clone of Bug #2211866 +++
Description of problem:
=================================
I have a 4 TiB cluster in Fusion aaS (agent-based install) to which I have attached a single consumer cluster. When I run tests creating a 1.2 TiB dataset from that consumer, the test fails. Here are my observations:
>> 1. ceph health shows the blockpool quota is full
$ ceph health detail
HEALTH_WARN 1 pool(s) full
[WRN] POOL_FULL: 1 pool(s) full
pool 'cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9' is full (running out of quota)
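That the pool is quota-limited rather than capacity-limited can be double-checked from the provider toolbox (a quick sketch; the pool name is taken from the health output above):
# Show the quota configured on the consumer's block pool; a max_bytes
# value of about 1 TiB here (rather than N/A) confirms the quota is the
# limiting factor, since the raw cluster still has free space.
$ ceph osd pool get-quota cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9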
>> 2. On checking the storageconsumer, I see a granted capacity of 1 TiB. Wasn't it supposed to be unlimited (approximately 1 PiB)?
$ oc get storageconsumer -n fusion-storage -o yaml
spec:
>>  capacity: 1T
  enable: true
status:
  cephResources:
  - kind: CephClient
    name: 74b7f702286c4ecf6c62197982adedfd
    status: Ready
>>  grantedCapacity: 1T
  lastHeartbeat: "2023-06-02T10:39:04Z"
  state: Ready
The quota per consumer was removed even from deployer-based installs of the current RH ODF MS, so the ocs-client-operator setting it to a default of 1TB is incorrect: the rest of the provider's capacity goes unused. By the same logic, on a 20TB cluster each consumer being able to use only 1TB is not the expected behavior.
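As a stopgap until the operator stops requesting a fixed capacity, the per-consumer capacity could in principle be raised by patching the StorageConsumer spec on the provider (a sketch only; the consumer resource name is inferred from the block pool name above, and whether spec.capacity can simply be edited this way is an assumption):
# Hypothetical workaround: bump the per-consumer capacity on the provider.
# Resource name inferred from the pool name; verify with 'oc get storageconsumer'.
$ oc patch storageconsumer storageconsumer-54294405-cfae-4867-810d-1ff7290acf83 \
    -n fusion-storage --type merge -p '{"spec":{"capacity":"4T"}}'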
Version-Release number of selected component (if applicable):
===============================================================
Consumer
========
$ oc get csv -n fusion-storage
NAME                                    DISPLAY                            VERSION        REPLACES                                PHASE
managed-fusion-agent.v2.0.11            Managed Fusion Agent               2.0.11                                                 Succeeded
observability-operator.v0.0.21          Observability Operator             0.0.21         observability-operator.v0.0.20          Succeeded
ocs-client-operator.v4.12.3-rhodf       OpenShift Data Foundation Client   4.12.3-rhodf                                           Succeeded
odf-csi-addons-operator.v4.12.3-rhodf   CSI Addons                         4.12.3-rhodf   odf-csi-addons-operator.v4.12.2-rhodf   Succeeded
ose-prometheus-operator.4.10.0          Prometheus Operator                4.10.0                                                 Succeeded
Provider
========
$ oc get csv -n fusion-storage
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
managed-fusion-agent.v2.0.11              Managed Fusion Agent          2.0.11                                                      Succeeded
observability-operator.v0.0.21            Observability Operator        0.0.21            observability-operator.v0.0.20            Succeeded
ocs-operator.v4.12.3-rhodf                OpenShift Container Storage   4.12.3-rhodf      ocs-operator.v4.12.2-rhodf                Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0                                                      Succeeded
route-monitor-operator.v0.1.500-6152b76   Route Monitor Operator        0.1.500-6152b76   route-monitor-operator.v0.1.498-e33e391   Succeeded
How reproducible:
=====================
Always
Steps to Reproduce:
======================
1. Create a provider and consumer cluster in Fusion aaS following the document [1].
[1] https://docs.google.com/document/d/1Jdx8czlMjbumvilw8nZ6LtvWOMAx3H4TfwoVwiBs0nE/edit#
2. Check that the requestedCapacity is incorrectly set to 1TB for the storageconsumer via the ocs-client-operator, as shown below.
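For example, the requested and granted capacity can be read straight from the StorageConsumer resources (the fields are the ones shown in the YAML above):
# Print name, spec.capacity, and status.grantedCapacity per consumer.
$ oc get storageconsumer -n fusion-storage \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.capacity}{"\t"}{.status.grantedCapacity}{"\n"}{end}'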
Actual results:
==================
Each consumer (ocs-storageclient) is able to use only 1TB of the provider's usable space.
Expected results:
===================
No quota should be set per consumer (ocs-storageclient).
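At the Ceph level, "no quota" corresponds to max_bytes 0 on the pool. As a manual workaround sketch (not the fix, and the provider reconciler may re-apply the quota), the quota can be cleared from the provider toolbox:
# Setting max_bytes to 0 removes the byte quota on the pool.
$ ceph osd pool set-quota cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9 max_bytes 0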
Additional info:
=======================
apiVersion: misf.ibm.com/v1alpha1
kind: ManagedFusionOffering
metadata:
  name: managedfusionoffering-sample
  namespace: fusion-storage
spec:
  kind: DFC
  release: "4.12"
  config: |
    onboardingTicket: <ticket>
    providerEndpoint: XXXXX:31659
Provider
========
  cluster:
    id:     3aad2a98-c8ff-433a-86e8-f78f1bdb98be
    health: HEALTH_WARN
            1 pool(s) full

  services:
    mon: 3 daemons, quorum a,b,c (age 30h)
    mgr: a(active, since 30h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 30h), 3 in (since 30h)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 577 pgs
    objects: 239.46k objects, 934 GiB
    usage:   2.7 TiB used, 9.3 TiB / 12 TiB avail
    pgs:     577 active+clean

  io:
    client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
$ ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 2  ssd    4.00000  1.00000   4 TiB   934 GiB  932 GiB   39 KiB  1.9 GiB  3.1 TiB  22.81  1.00  577  up
 0  ssd    4.00000  1.00000   4 TiB   934 GiB  932 GiB   39 KiB  1.9 GiB  3.1 TiB  22.81  1.00  577  up
 1  ssd    4.00000  1.00000   4 TiB   934 GiB  932 GiB   39 KiB  1.9 GiB  3.1 TiB  22.81  1.00  577  up
                    TOTAL     12 TiB  2.7 TiB  2.7 TiB  118 KiB  5.7 GiB  9.3 TiB  22.81
MIN/MAX VAR: 1.00/1.00  STDDEV: 0
$ ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL    USED     RAW USED  %RAW USED
ssd    12 TiB  9.3 TiB  2.7 TiB  2.7 TiB   22.81
TOTAL  12 TiB  9.3 TiB  2.7 TiB  2.7 TiB   22.81

--- POOLS ---
POOL                                                                          ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics                                                          1    1   15 KiB        6   46 KiB      0    2.5 TiB
ocs-storagecluster-cephfilesystem-metadata                                     2   32   16 KiB       22  131 KiB      0    2.5 TiB
ocs-storagecluster-cephfilesystem-ssd                                          3  512      0 B        0      0 B      0    2.5 TiB
cephblockpool-storageconsumer-54294405-cfae-4867-810d-1ff7290acf83-b5b8eee9    4   32  932 GiB  239.43k  2.7 TiB  26.79    2.5 TiB
--- Additional comment from Shekhar Berry on 2023-06-02 11:50:27 UTC ---
Must-gather is not attached since this is already a known bug to the dev engineering team:
https://github.com/red-hat-storage/ocs-client-operator/blob/main/controllers/storageclient_controller.go#L272-L279
Let me know if it is still needed.
--- Additional comment from RHEL Program Management on 2023-06-02 13:09:06 UTC ---
This bug having no release flag set previously, is now set with release flag 'odf-4.13.0' to '?', and so is being proposed to be fixed at the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.
--- Additional comment from RHEL Program Management on 2023-06-02 13:09:06 UTC ---
Since this bug has severity set to 'urgent', it is being proposed as a blocker for the currently set release flag. Please resolve ASAP.
--- Additional comment from Jilju Joy on 2023-06-07 07:09:03 UTC ---
Logs (from a different cluster):
Provider: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-jn6-pr/jijoy-jn6-pr_20230607T030415/logs/failed_testcase_ocs_logs_1686119226/test_deployment_ocs_logs/jijoy-jn6-pr/
Consumer: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-jn6-c1/jijoy-jn6-c1_20230607T030410/logs/testcases_1686119185/jijoy-jn6-c1/
--- Additional comment from RHEL Program Management on 2023-06-28 13:25:49 UTC ---
This BZ is being approved for an ODF 4.12.z z-stream update, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'odf-4.12.z', and having been marked for an approved z-stream update
--- Additional comment from RHEL Program Management on 2023-06-28 13:25:49 UTC ---
Since this bug has been approved for ODF 4.12.5 release, through release flag 'odf-4.12.z+', and appropriate update number entry at the 'Internal Whiteboard', the Target Release is being set to 'ODF 4.12.5'
--- Additional comment from Sunil Kumar Acharya on 2023-07-05 03:51:55 UTC ---
Please backport the fix to ODF-4.12 and update the RDT flag/text appropriately.
--- Additional comment from Ritesh Chikatwar on 2023-07-06 09:29:43 UTC ---
Development for this bug is still in progress, hence moving the bug back to ASSIGNED.
--- Additional comment from Red Hat Bugzilla on 2023-08-03 08:29:58 UTC ---
Account disabled by LDAP Audit