Bug 2129456
| Summary: | Unable to Expand Storage: Cannot Add New OSDs to Existing Hosts/Racks (Local Storage) | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Craig Wayman <crwayman> |
| Component: | ceph | Assignee: | Neha Ojha <nojha> |
| ceph sub component: | RADOS | QA Contact: | Elad <ebenahar> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | bhubbard, bmcmurra, bniver, hnallurv, ksirivad, mmuench, muagarwa, nojha, ocs-bugs, odf-bz-bot, pdhiran, sapillai, tnielsen, vumrao |
| Version: | 4.10 | Keywords: | Reopened |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-04-05 19:07:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I looked in the must-gather logs. There is no mention of OSD 27 or OSD 28 anywhere in the logs; they look like stale/leftover OSDs. If no data is associated with these OSDs, then I would suggest purging them. @tnielsen Any different suggestion?
Some log entries of interest appear in the rook operator log. These must be related to OSD 27 and 28, showing that the OSD prepare jobs had been run previously and are now running again and completing. What this doesn't show is why the OSD daemon pods aren't being created.
2022-09-22T21:07:34.464893032Z 2022-09-22 21:07:34.464803 I | op-osd: OSD will have its main bluestore block on "ocs-deviceset-0-data-9z89m4"
2022-09-22T21:07:34.487802419Z 2022-09-22 21:07:34.485598 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-ocs-deviceset-0-data-9z89m4 to start a new one
2022-09-22T21:07:34.499272039Z 2022-09-22 21:07:34.499071 I | op-k8sutil: batch job rook-ceph-osd-prepare-ocs-deviceset-0-data-9z89m4 still exists
2022-09-22T21:07:37.697943877Z 2022-09-22 21:07:37.695202 I | op-osd: started OSD provisioning job for PVC "ocs-deviceset-0-data-9z89m4"
...
2022-09-22T21:07:37.697943877Z 2022-09-22 21:07:37.695307 I | op-osd: OSD will have its main bluestore block on "ocs-deviceset-1-data-9zqpmn"
2022-09-22T21:07:37.903936821Z 2022-09-22 21:07:37.889714 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-ocs-deviceset-1-data-9zqpmn to start a new one
2022-09-22T21:07:37.916934008Z 2022-09-22 21:07:37.912401 I | op-k8sutil: batch job rook-ceph-osd-prepare-ocs-deviceset-1-data-9zqpmn still exists
2022-09-22T21:07:40.975107008Z 2022-09-22 21:07:40.946586 I | op-k8sutil: batch job rook-ceph-osd-prepare-ocs-deviceset-1-data-9zqpmn deleted
2022-09-22T21:07:41.041076830Z 2022-09-22 21:07:41.039806 I | op-osd: started OSD provisioning job for PVC "ocs-deviceset-1-data-9zqpmn"
...
2022-09-22T21:07:46.815205610Z 2022-09-22 21:07:46.815058 I | op-osd: OSD orchestration status for node ocs-deviceset-0-data-9z89m4 is "orchestrating"
2022-09-22T21:07:46.815619258Z 2022-09-22 21:07:46.815578 I | op-osd: OSD orchestration status for PVC ocs-deviceset-0-data-9z89m4 is "orchestrating"
2022-09-22T21:07:46.815963586Z 2022-09-22 21:07:46.815941 I | op-osd: OSD orchestration status for PVC ocs-deviceset-0-data-9z89m4 is "completed"
2022-09-22T21:07:46.824453895Z 2022-09-22 21:07:46.824370 I | op-osd: OSD orchestration status for node ocs-deviceset-1-data-9zqpmn is "orchestrating"
2022-09-22T21:07:46.824781145Z 2022-09-22 21:07:46.824760 I | op-osd: OSD orchestration status for PVC ocs-deviceset-1-data-9zqpmn is "orchestrating"
...
2022-09-22T21:07:53.087939312Z 2022-09-22 21:07:53.086125 I | op-osd: OSD orchestration status for PVC ocs-deviceset-1-data-9zqpmn is "completed"
The OSD prepare log for pod rook-ceph-osd-prepare-ocs-deviceset-1-data-9zqpmn-dlpj7 shows that it can't determine the status of the LUKS device:
2022-09-22T21:07:44.972667064Z 2022-09-22 21:07:44.972595 I | cephosd: creating and starting the osds
2022-09-22T21:07:44.972760954Z 2022-09-22 21:07:44.972750 D | cephosd: desiredDevices are [{Name:/mnt/ocs-deviceset-1-data-9zqpmn OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: InitialWeight: IsFilter:false IsDevicePathFilter:false}]
2022-09-22T21:07:44.972779505Z 2022-09-22 21:07:44.972772 D | cephosd: context.Devices are:
2022-09-22T21:07:44.972821487Z 2022-09-22 21:07:44.972811 D | cephosd: &{Name:/mnt/ocs-deviceset-1-data-9zqpmn Parent: HasChildren:false DevLinks:/dev/disk/by-id/scsi-36000c299c61ea642991160eeb1090604 /dev/disk/by-id/scsi-SVMware_Virtual_disk_6000c299c61ea642991160eeb1090604 /dev/disk/by-id/wwn-0x6000c299c61ea642991160eeb1090604 /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:11:0 Size:2199023255552 UUID:8a84ff25-21ae-4017-81b5-8773b4580ca3 Serial:36000c299c61ea642991160eeb1090604 Type:data Rotational:false Readonly:false Partitions:[] Filesystem: Vendor:VMware Model:Virtual_disk WWN:0x6000c299c61ea642 WWNVendorExtension:0x6000c299c61ea642991160eeb1090604 Empty:false CephVolumeData: RealPath:/dev/sdk KernelName:sdk Encrypted:false}
2022-09-22T21:07:44.972871730Z 2022-09-22 21:07:44.972840 D | exec: Running command: cryptsetup luksDump /mnt/ocs-deviceset-1-data-9zqpmn
2022-09-22T21:07:44.985602529Z 2022-09-22 21:07:44.985543 E | cephosd: failed to determine if the encrypted block "/mnt/ocs-deviceset-1-data-9zqpmn" is from our cluster. failed to dump LUKS header for disk "/mnt/ocs-deviceset-1-data-9zqpmn". Device /mnt/ocs-deviceset-1-data-9zqpmn is not a valid LUKS device.: exit status 1
2022-09-22T21:07:44.985866773Z 2022-09-22 21:07:44.985851 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-1-data-9zqpmn --format json
2022-09-22T21:07:45.372591165Z 2022-09-22 21:07:45.372408 D | cephosd: {
2022-09-22T21:07:45.372591165Z "7739597f-b777-4620-a86c-2ba1d95c708a": {
2022-09-22T21:07:45.372591165Z "ceph_fsid": "3bc51fe9-c37e-4fd3-8599-bf49a5012407",
2022-09-22T21:07:45.372591165Z "device": "/mnt/ocs-deviceset-1-data-9zqpmn",
2022-09-22T21:07:45.372591165Z "osd_id": 27,
2022-09-22T21:07:45.372591165Z "osd_uuid": "7739597f-b777-4620-a86c-2ba1d95c708a",
2022-09-22T21:07:45.372591165Z "type": "bluestore"
2022-09-22T21:07:45.372591165Z }
2022-09-22T21:07:45.372591165Z }
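To reproduce manually what the prepare pod is checking here, one could run the following from a pod that has the device mounted (a sketch of the checks only, not the operator's exact flow; the device path is the one from the log above):
# Is the block really a LUKS device? (the error above is the prepare job discovering that it is not)
cryptsetup isLuks /mnt/ocs-deviceset-1-data-9zqpmn && echo "LUKS header present" || echo "not a LUKS device"
# Does the block already carry bluestore data from a previously created OSD?
ceph-volume raw list /mnt/ocs-deviceset-1-data-9zqpmn --format json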
The OSD prepare job for OSD 28 shows a similar error in the log of pod rook-ceph-osd-prepare-ocs-deviceset-0-data-9z89m4-qtnnl.
The logs from the original provisioning of OSDs 27 and 28 are no longer available, so the original OSD creation may have failed with a different error. Without those original OSD prepare logs, we cannot determine the root cause.
To get these OSDs created, I would also suggest purging these two OSDs (as mentioned by Santosh) and wiping the disks, then trying again to create them. If this happens again after the purge and re-creation, please share the OSD prepare logs from the failure before restarting the operator and losing them.
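A hedged sketch of what that purge-and-wipe could look like, assuming the OSDs truly hold no data and the official device-replacement procedure cannot be used; the device path is only an example taken from the prepare log above and must be verified on the node before wiping anything:
# From the rook-ceph-tools pod: drop the stale OSD entries
ceph osd purge 27 --yes-i-really-mean-it
ceph osd purge 28 --yes-i-really-mean-it
# On the node that owns the disk: clear the old bluestore signature so the prepare job starts clean
sgdisk --zap-all /dev/sdk
dd if=/dev/zero of=/dev/sdk bs=1M count=100 oflag=direct,dsync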
Good Evening,
The customer followed the latest BZ recommendation to remove OSDs 27 and 28, zap/wipe the disks, and re-add them, but the removal was unsuccessful and hit the following error:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[g4yy@edcloscd101 ~ ]$ echo $osd_id_to_remove
27
[g4yy@edcloscd101 ~ ]$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} FORCE_OSD_REMOVAL=false |oc create -n openshift-storage -f -
error: unknown parameter name "FORCE_OSD_REMOVAL"
error: no objects passed to create
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
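One quick way to confirm which parameters the installed template actually exposes before processing it (shown here only as an illustrative sketch):
$ oc process -n openshift-storage ocs-osd-removal --parameters
$ oc get template ocs-osd-removal -n openshift-storage -o yaml | grep -i force_osd_removal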
An interesting find came from the # oc get templates ocs-osd-removal -o yaml output the customer provided. I vimdiff'd it against the output from my test cluster (same version), and the customer's template was missing the following information:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
spec:
containers:
- args:
- ceph
- osd
- remove
- --osd-ids=${FAILED_OSD_IDS}
- --force-osd-removal <------------MISSING
- ${FORCE_OSD_REMOVAL} <------------MISSING
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Additionally, there was quite a bit of difference in the description at the end of the # oc get templates ocs-osd-removal -o yaml output. I will attach the latest must-gather, along with the outputs mentioned above from my test cluster and from the customer, in a private comment following this one.
We are looking for further direction from Engineering, because osd.27 and osd.28 were originally OSDs deployed to scale/add storage capacity, and that was unsuccessful. Now, following section 5.1, "Replacing operational or failed storage devices on clusters backed by local storage devices", in the product documentation is not working either. Should we have the customer delete the PVs/PVCs associated with osd.27 and osd.28 and then, from the rook-ceph-tools pod, remove the OSDs manually from the rack/host and CRUSH map? Thank you for your time.
Regards,
Craig Wayman
TSE Red Hat OpenShift Data Foundations (ODF)
Customer Experience and Engagement, NA
(In reply to Craig Wayman from comment #8)
> error: unknown parameter name "FORCE_OSD_REMOVAL"
> error: no objects passed to create
> [...] the customer's template was missing the following information:
> - --force-osd-removal <------------MISSING
> - ${FORCE_OSD_REMOVAL} <------------MISSING

That's strange. OCS 4.10 has this `FORCE_OSD_REMOVAL` parameter, but somehow the customer is not seeing it in the template. I'm assuming that the OCS operator is not updating this template when the cluster is upgraded. I'll take a look into this.

(In reply to Craig Wayman from comment #8)
> Looking for further direction from Engineering [...]

Is the operation unsuccessful because the `ocs-osd-removal` job fails to start due to `error: unknown parameter name "FORCE_OSD_REMOVAL"`? If that's the case, can you try without passing this argument? For example:

oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} |oc create -n openshift-storage -f -

Good Morning,

I am unsure why the removal operation is failing. After seeing the customer's error messages about those parameters, I compared their OSD removal template against the one from my test cluster and observed that those parameters were missing. The OSDs were in an awkward state to begin with: they were picked up by LSO, PVs/PVCs were created, and the count in the storagecluster.yaml is spot on, yet Rook did not assign them to a rack.

Looking at it on the Ceph end, they show up in # ceph osd df as down, and they appear in the CRUSH map, but with nothing between the curly braces. The other odd observation is that osd.29 was in the exact same predicament during the initial deployment of the three new disks; however, when the customer was given the process to scale down the operators, delete ocsinit, and scale the ocs-operator back up, only osd.29 was deployed successfully, and osd.27 and osd.28 were not picked up by Rook. I will have the customer attempt the removal again using your command to set/override the parameter and will update the case with the result. Thank you for your help.

Regards,
Craig Wayman
TSE Red Hat OpenShift Data Foundations (ODF)
Customer Experience and Engagement, NA

Good Afternoon,
I would like to post an update. The customer successfully removed and re-added the devices (osd.27 and osd.28) by leaving out the FORCE_OSD_REMOVAL parameter that appears in the v4.10 product documentation.
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} FORCE_OSD_REMOVAL=false |oc create -n openshift-storage -f - <------------- Failed to remove devices
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} |oc create -n openshift-storage -f - <------------- Successfully removed devices
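For completeness, a hedged sketch of how one might confirm the removal job actually completed (and the old PVC was released) before re-adding a device; job naming differs between template versions, so the grep and the <removal-job-name> placeholder are only illustrative:
$ oc get jobs -n openshift-storage | grep ocs-osd-removal
$ oc logs -n openshift-storage job/<removal-job-name> --tail=20
$ oc get pvc -n openshift-storage | grep ocs-deviceset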
The customer is now in HEALTH_OK and has satisfied the count of "10" in their storagecluster.yaml, yielding a total of 30 OSDs, 10 per ODF node/rack.
I also figured out why the first command (the one that failed to remove the devices) didn't work. That command is from the v4.10 product documentation. The customer is on ODF v4.10; however, the cluster has been upgraded over time all the way from v4.4, and their storagecluster version is v4.6.0. The oc process command renders the ocs-osd-removal template to create the OSD removal job, and the customer's template is a by-product of their v4.6.0 storagecluster version. I've noticed that during MOST upgrades, the majority of the OCS/ODF components are upgraded except the storagecluster version.
If you review the RHOCS v4.6 Replacing Devices product documentation, specifically section 4.3, "Replacing operational or failed storage devices on clusters backed by local storage devices", step 4-iii, there is no FORCE_OSD_REMOVAL parameter. The command below is the same command the customer ran to finally/officially remove the devices; because my v4.10 cluster was a freshly deployed v4.10 test cluster, that parameter was in fact in my template and not in the customer's.
The command below is from the v4.6 product documentation:
oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} |oc create -n openshift-storage -f -
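As a hedged illustration of how this kind of version skew could be spotted on an upgraded cluster (exact field names can vary between releases, so treat these as examples):
$ oc get csv -n openshift-storage                                        # operator (CSV) versions
$ oc get storagecluster -n openshift-storage -o yaml | grep -i version   # version reported by the StorageCluster CR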
As of right now, we're still finishing up a few things with the customer, and once completed I will update the BZ for closure.
Regards,
Craig Wayman
TSE Red Hat OpenShift Data Foundations (ODF)
Customer Experience and Engagement, NA
Good to hear it's in a better state. Any update on whether we're ready to close this?

Good Morning,

As of now, the devices have been successfully removed and added back. Their Ceph backend looks very good in relation to the storagecluster.yaml count and how the OSDs are distributed across the racks/hosts. However, the PGs on their two biggest pools and on the OSDs are very low, even with the autoscaler set to "on." Currently, I am working with them to increase the PGs manually. As of right now, one OSD has exceeded the 75% threshold and put them in HEALTH_WARN, even though other OSDs are sitting at low-50% use. We could re-weight a couple of OSDs; however, given how low the PGs are, the way forward is to increase them. Increasing the PGs and then following up with a Ceph balancer run should spread the data out a little better.

This BZ was opened because we were having issues removing/adding storage devices. As of now, that is not the issue, so you can close the BZ; however, if the same issue arises, I will update the BZ.

Regards,
Craig Wayman
TSE Red Hat OpenShift Data Foundations (ODF)
Customer Experience and Engagement, NA

Good to hear the issue is resolved for removing OSDs, thanks for the details. We will close this BZ for now.

Good Afternoon,

I decided to re-open the BZ since this is still related to the process of adding more OSDs. When the customer opened the case, it was because one OSD (osd.22) triggered Ceph to go into HEALTH_WARN by crossing the ODF Ceph nearfull threshold of 75%. The customer added three 2 TiB disks, bringing Ceph up to HEALTH_OK, but only for a short while. I know they could continue to add disks; however, this is an issue I've seen in ODF before, where rook-ceph doesn't autoscale PGs even with the autoscaler set to "on" and the PGs on the OSDs are noticeably low. As of now, most of the OSDs are hovering around 50-60% use, with just a few OSDs around 70% use, and osd.22 is one of them that has exceeded 75%. Looking at the PGs on the OSDs, they're pretty low as well (50-60 PGs per OSD). The customer's concern is: why isn't ODF autoscaling the PGs with the autoscaler set to "on"?

As a proactive step, given that the OSD PGs were low, I wanted to have the customer increase not only the pg_num but also the pgp_num on the two biggest pools, as a method to both increase the PGs on the OSDs and essentially perform a kind of re-weight, with the pgp_num change splitting the PGs and sending them to other OSDs. The odd part about this process is that, although the customer increased the pg_num and pgp_num on the cephblockpool to 512, the PGs on the OSDs did not increase. Since then, the autoscaler must have scaled back down to 256 PGs. This could be an issue with this cluster, as it has been upgraded all the way from OCS v4.4; I've performed this process on test clusters previously and was able to scale the PGs on both the pools and the OSDs.

I know we can re-weight osd.22 or any of the OSDs that are above 70%; however, there are a couple of issues. First, with ODF v4.10, I assume that once those three disks were added and Ceph re-mapped and then ran the balancer, it should have spread out/increased the PGs. It's just odd that with OSDs at around 50-60 PGs and high pool/OSD utilization, it didn't autoscale the PGs. I know this may have a lot to do with setting the TARGET_RATIO on the pools, but it's set to 0.49 on the two biggest pools. Should this be changed?

I would like to address the customer's concerns:
1. The PGs on the OSDs look very low; how do we increase them? Shouldn't rook-ceph do it automatically?
2. Should we adjust the TARGET_RATIO to accomplish this? If so, what values should they be set to and on which pools?

I have pulled some data from the must-gather and pasted it below. For more information and logs, the recent must-gather can be found here: https://drive.google.com/drive/folders/1uFAzCAMIt7fzJ5geqFnNfv24vOP8gtbg?usp=sharing

Thank you for your time and help!

--------------ceph osd df tree-----------
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 59.99799 - 60 TiB 37 TiB 37 TiB 7.4 GiB 112 GiB 23 TiB 61.38 1.00 - root default
-8 20.00000 - 20 TiB 12 TiB 12 TiB 2.6 GiB 38 GiB 7.7 TiB 61.37 1.00 - rack rack0
-7 20.00000 - 20 TiB 12 TiB 12 TiB 2.6 GiB 38 GiB 7.7 TiB 61.37 1.00 - host edcxosid001g-bcbsfl-com
1 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 978 MiB 4.3 GiB 696 GiB 66.02 1.08 67 up osd.1
4 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 14 MiB 3.5 GiB 798 GiB 61.02 0.99 61 up osd.4
7 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 596 MiB 3.9 GiB 896 GiB 56.24 0.92 58 up osd.7
11 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 69 MiB 3.7 GiB 842 GiB 58.87 0.96 59 up osd.11
13 ssd 2.00000 1.00000 2 TiB 1.4 TiB 1.4 TiB 32 MiB 4.0 GiB 606 GiB 70.43 1.15 60 up osd.13
16 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 32 MiB 3.5 GiB 791 GiB 61.38 1.00 61 up osd.16
20 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 350 MiB 3.8 GiB 736 GiB 64.07 1.04 62 up osd.20
23 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 0 B 3.8 GiB 734 GiB 64.18 1.05 63 up osd.23
24 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 288 MiB 3.6 GiB 942 GiB 54.01 0.88 55 up osd.24
29 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 281 MiB 3.7 GiB 871 GiB 57.49 0.94 55 up osd.29
-12 19.99899 - 20 TiB 12 TiB 12 TiB 2.2 GiB 38 GiB 7.7 TiB 61.38 1.00 - rack rack1
-11 19.99899 - 20 TiB 12 TiB 12 TiB 2.2 GiB 38 GiB 7.7 TiB 61.38 1.00 - host edcxosid001f-bcbsfl-com
2 ssd 1.99899 1.00000 2.0 TiB 1.2 TiB 1.2 TiB 494 MiB 3.7 GiB 814 GiB 60.24 0.98 65 up osd.2
3 ssd 2.00000 1.00000 2.0 TiB 1.4 TiB 1.4 TiB 0 B 4.0 GiB 657 GiB 67.94 1.11 58 up osd.3
6 ssd 2.00000 1.00000 2.0 TiB 1.2 TiB 1.2 TiB 730 MiB 4.3 GiB 769 GiB 62.43 1.02 67 up osd.6
9 ssd 2.00000 1.00000 2 TiB 1.0 TiB 1.0 TiB 303 MiB 3.4 GiB 1013 GiB 50.54 0.82 54 up osd.9
12 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 0 B 2.9 GiB 933 GiB 54.43 0.89 57 up osd.12
15 ssd 2.00000 1.00000 2 TiB 1.0 TiB 1.0 TiB 240 MiB 3.3 GiB 1012 GiB 50.59 0.82 55 up osd.15
18 ssd 2.00000 1.00000 2 TiB 1.4 TiB 1.4 TiB 48 MiB 4.1 GiB 651 GiB 68.24 1.11 63 up osd.18
21 ssd 2.00000 1.00000 2 TiB 1.4 TiB 1.4 TiB 311 MiB 4.4 GiB 615 GiB 69.98 1.14 60 up osd.21
26 ssd 2.00000 1.00000 2 TiB 1.4 TiB 1.4 TiB 63 MiB 3.9 GiB 586 GiB 71.39 1.16 63 up osd.26
27 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 98 MiB 3.8 GiB 859 GiB 58.06 0.95 59 up osd.27
-4 19.99899 - 20 TiB 12 TiB 12 TiB 2.6 GiB 36 GiB 7.7 TiB 61.38 1.00 - rack rack2
-3 19.99899 - 20 TiB 12 TiB 12 TiB 2.6 GiB 36 GiB 7.7 TiB 61.38 1.00 - host edcxosid001e-bcbsfl-com
0 ssd 1.99899 1.00000 2.0 TiB 1.0 TiB 1.0 TiB 80 MiB 3.2 GiB 990 GiB 51.64 0.84 56 up osd.0
5 ssd 2.00000 1.00000 2.0 TiB 1.4 TiB 1.4 TiB 0 B 3.9 GiB 650 GiB 68.27 1.11 61 up osd.5
8 ssd 2.00000 1.00000 2.0 TiB 1.2 TiB 1.2 TiB 319 MiB 3.7 GiB 827 GiB 59.61 0.97 57 up osd.8
10 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.1 TiB 271 MiB 3.6 GiB 869 GiB 57.59 0.94 57 up osd.10
14 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.2 TiB 275 MiB 3.9 GiB 765 GiB 62.66 1.02 65 up osd.14
17 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 0 B 3.8 GiB 696 GiB 66.00 1.08 61 up osd.17
19 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 28 MiB 3.6 GiB 741 GiB 63.83 1.04 59 up osd.19
22 ssd 2.00000 1.00000 2 TiB 1.5 TiB 1.5 TiB 620 MiB 4.7 GiB 487 GiB 76.22 1.24 72 up osd.22
25 ssd 2.00000 1.00000 2 TiB 1.0 TiB 1.0 TiB 63 MiB 2.2 GiB 1008 GiB 50.80 0.83 53 up osd.25
28 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 961 MiB 3.7 GiB 878 GiB 57.15 0.93 60 up osd.28
TOTAL 60 TiB 37 TiB 37 TiB 7.4 GiB 112 GiB 23 TiB 61.38
MIN/MAX VAR: 0.82/1.24 STDDEV: 6.63
-------------------------------------------------------------
-------------------ceph df detail-----------------------------
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 60 TiB 23 TiB 37 TiB 37 TiB 61.38
TOTAL 60 TiB 23 TiB 37 TiB 37 TiB 61.38
--- POOLS ---
POOL ID PGS STORED (DATA) (OMAP) OBJECTS USED (DATA) (OMAP) %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
.rgw.root 1 8 4.7 KiB 4.7 KiB 0 B 16 224 KiB 224 KiB 0 B 0 1.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephblockpool 2 256 5.1 TiB 5.1 TiB 4.9 KiB 1.34M 15 TiB 15 TiB 4.9 KiB 74.35 1.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.control 3 8 0 B 0 B 0 B 8 0 B 0 B 0 B 0 1.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephfilesystem-metadata 4 8 2.0 GiB 509 MiB 1.5 GiB 1.09M 3.0 GiB 1.5 GiB 1.5 GiB 0.06 1.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephfilesystem-data0 5 256 4.9 TiB 4.9 TiB 0 B 3.57M 15 TiB 15 TiB 0 B 73.66 1.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.meta 6 8 4.4 KiB 3.9 KiB 441 B 17 208 KiB 208 KiB 441 B 0 1.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.log 7 8 22 MiB 9.3 KiB 22 MiB 214 23 MiB 720 KiB 22 MiB 0 1.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.buckets.index 8 8 305 MiB 0 B 305 MiB 12 305 MiB 0 B 305 MiB 0 1.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec 9 8 0 B 0 B 0 B 0 0 B 0 B 0 B 0 1.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.buckets.data 10 32 2.2 TiB 2.2 TiB 0 B 1.07M 6.7 TiB 6.7 TiB 0 B 55.98 1.8 TiB N/A N/A N/A 0 B 0 B
device_health_metrics 11 1 3.5 MiB 0 B 3.5 MiB 30 3.5 MiB 0 B 3.5 MiB 0 1.8 TiB N/A N/A N/A 0 B 0 B
------------------------------------------------------------
---------------------ceph osd pool autoscale status-------------------
POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE BULK
.rgw.root 4792 3.0 61437G 0.0000 1.0 8 on False
ocs-storagecluster-cephblockpool 5211G 3.0 61437G 0.5000 0.4900 0.5000 1.0 256 on False
ocs-storagecluster-cephobjectstore.rgw.control 0 3.0 61437G 0.0000 1.0 8 on False
ocs-storagecluster-cephfilesystem-metadata 2088M 3.0 61437G 0.0001 4.0 8 on False
ocs-storagecluster-cephfilesystem-data0 4996G 3.0 61437G 0.5000 0.4900 0.5000 1.0 256 on False
ocs-storagecluster-cephobjectstore.rgw.meta 4472 3.0 61437G 0.0000 1.0 8 on False
ocs-storagecluster-cephobjectstore.rgw.log 22426k 3.0 61437G 0.0000 1.0 8 on False
ocs-storagecluster-cephobjectstore.rgw.buckets.index 304.8M 3.0 61437G 0.0000 1.0 8 on False
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec 0 3.0 61437G 0.0000 1.0 8 on False
ocs-storagecluster-cephobjectstore.rgw.buckets.data 2285G 3.0 61437G 0.1116 1.0 32 on False
device_health_metrics 3572k 3.0 61437G 0.0000 1.0 1 on False
-------------------------------------------------------------

Regards,
Craig Wayman
TSE Red Hat OpenShift Data Foundations (ODF)
Customer Experience and Engagement, NA

(In reply to Craig Wayman from comment #18)
> 1. The PGs on the OSDs look very low, how do we increase them? Shouldn't rook-ceph do it automatically?

The number of PGs, based on the number of OSDs available, looks good to me. But I'll check again and get back to you.

> 2. Should we adjust the TARGET_RATIO to accomplish this? If so, what values should they be set to and on which pools?

Not sure about this right now. I'll check and get back to you.

Checking with someone from Ceph about this. I'll update the BZ once I get the answers.

Acknowledged. The consensus on our team is that for customers with a decent amount of data, their OSDs should have at least 100 PGs each, and these PGs look pretty low. In addition, there are just a couple of OSDs flirting with the 75% %USE (the ODF nearfull limit), while the vast majority of the OSDs are sitting around high-50s to low-60s %USE. They actually added three more new disks as well, which decreased the %USE even more, but there are still those couple of OSDs causing trouble (I will post ceph df and ceph osd df tree below).
Now, we could re-weight those two OSDs; however, because the PGs were so low, I was going to use the PG-increase process to accomplish both goals by increasing the pg_num and the pgp_num, with the pgp_num change being the action that breaks up the PGs and sends them to other OSDs. However, even that process didn't work as expected: the pool PGs went up, but the OSD PGs stayed the same. That does not happen in my test environment. I wonder if it has to do with this cluster having been upgraded over time from v4.4, so maybe components like rook-ceph aren't functioning the way they would on a new install.
All that said, the big questions are: why isn't rook-ceph scaling these PGs when the autoscaler is set to "on"? Why isn't rook-ceph balancing these OSDs better when the Ceph balancer runs? I've seen posts about the autoscaler not being particularly great at scaling PGs in ODF, which is why I was wondering what the next steps should be. Should we adjust the TARGET_RATIO on the pools? With Ceph there is a lot more hands-on/manual tweaking, but ODF is generally supposed to be more of a hands-off approach, which is why the customer is curious as to why things aren't working properly. The reason this case was opened is that they were in HEALTH_WARN because of just one OSD, and after adding three new disks they were still in HEALTH_WARN because of that same OSD (osd.22).
I just wanted to explain things a little better; I'm looking forward to Engineering's input.
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 66 TiB 29 TiB 37 TiB 37 TiB 56.09
TOTAL 66 TiB 29 TiB 37 TiB 37 TiB 56.09
--- POOLS ---
POOL ID PGS STORED (DATA) (OMAP) OBJECTS USED (DATA) (OMAP) %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
.rgw.root 1 8 4.7 KiB 4.7 KiB 0 B 16 232 KiB 232 KiB 0 B 0 2.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephblockpool 2 256 5.1 TiB 5.1 TiB 4.9 KiB 1.34M 15 TiB 15 TiB 4.9 KiB 64.14 2.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.control 3 8 0 B 0 B 0 B 8 0 B 0 B 0 B 0 2.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephfilesystem-metadata 4 8 2.1 GiB 512 MiB 1.6 GiB 1.09M 3.1 GiB 1.5 GiB 1.6 GiB 0.03 2.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephfilesystem-data0 5 256 4.9 TiB 4.9 TiB 0 B 3.57M 15 TiB 15 TiB 0 B 63.28 2.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.meta 6 8 4.4 KiB 3.9 KiB 441 B 17 208 KiB 208 KiB 441 B 0 2.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.log 7 8 22 MiB 3.6 KiB 22 MiB 214 23 MiB 528 KiB 22 MiB 0 2.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.buckets.index 8 8 310 MiB 0 B 310 MiB 12 310 MiB 0 B 310 MiB 0 2.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec 9 8 0 B 0 B 0 B 0 0 B 0 B 0 B 0 2.8 TiB N/A N/A N/A 0 B 0 B
ocs-storagecluster-cephobjectstore.rgw.buckets.data 10 32 2.3 TiB 2.3 TiB 0 B 1.11M 6.9 TiB 6.9 TiB 0 B 44.72 2.8 TiB N/A N/A N/A 0 B 0 B
device_health_metrics 11 1 3.9 MiB 0 B 3.9 MiB 33 3.9 MiB 0 B 3.9 MiB 0 2.8 TiB N/A N/A N/A 0 B 0 B
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 65.99799 - 66 TiB 37 TiB 37 TiB 8.1 GiB 121 GiB 29 TiB 56.09 1.00 - root default
-8 22.00000 - 22 TiB 12 TiB 12 TiB 2.8 GiB 40 GiB 9.7 TiB 56.08 1.00 - rack rack0
-7 22.00000 - 22 TiB 12 TiB 12 TiB 2.8 GiB 40 GiB 9.7 TiB 56.08 1.00 - host edcxosid001g-bcbsfl-com
1 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 1013 MiB 4.3 GiB 788 GiB 61.53 1.10 62 up osd.1
4 ssd 2.00000 1.00000 2 TiB 1.0 TiB 1.0 TiB 14 MiB 3.7 GiB 986 GiB 51.84 0.92 54 up osd.4
7 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 576 MiB 3.7 GiB 934 GiB 54.41 0.97 56 up osd.7
11 ssd 2.00000 1.00000 2 TiB 963 GiB 960 GiB 71 MiB 3.4 GiB 1.1 TiB 47.04 0.84 52 up osd.11
13 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 33 MiB 4.3 GiB 724 GiB 64.65 1.15 56 up osd.13
16 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 33 MiB 3.2 GiB 888 GiB 56.65 1.01 56 up osd.16
20 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 324 MiB 3.7 GiB 849 GiB 58.57 1.04 55 up osd.20
23 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 0 B 3.7 GiB 922 GiB 55.00 0.98 56 up osd.23
24 ssd 2.00000 1.00000 2 TiB 990 GiB 987 GiB 274 MiB 3.5 GiB 1.0 TiB 48.36 0.86 49 up osd.24
29 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 213 MiB 3.5 GiB 925 GiB 54.84 0.98 51 up osd.29
30 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 281 MiB 3.3 GiB 737 GiB 64.02 1.14 54 up osd.30
-12 21.99899 - 22 TiB 12 TiB 12 TiB 2.6 GiB 41 GiB 9.7 TiB 56.10 1.00 - rack rack1
-11 21.99899 - 22 TiB 12 TiB 12 TiB 2.6 GiB 41 GiB 9.7 TiB 56.10 1.00 - host edcxosid001f-bcbsfl-com
2 ssd 1.99899 1.00000 2.0 TiB 1.1 TiB 1.1 TiB 506 MiB 3.7 GiB 892 GiB 56.44 1.01 60 up osd.2
3 ssd 2.00000 1.00000 2.0 TiB 1.2 TiB 1.2 TiB 0 B 4.2 GiB 852 GiB 58.39 1.04 53 up osd.3
6 ssd 2.00000 1.00000 2.0 TiB 1.0 TiB 1.0 TiB 728 MiB 4.2 GiB 1015 GiB 50.44 0.90 56 up osd.6
9 ssd 2.00000 1.00000 2 TiB 936 GiB 932 GiB 296 MiB 3.4 GiB 1.1 TiB 45.70 0.81 48 up osd.9
12 ssd 2.00000 1.00000 2 TiB 1.0 TiB 1.0 TiB 0 B 3.0 GiB 1011 GiB 50.64 0.90 53 up osd.12
15 ssd 2.00000 1.00000 2 TiB 1018 GiB 1014 GiB 242 MiB 3.7 GiB 1.0 TiB 49.72 0.89 54 up osd.15
18 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 33 MiB 3.6 GiB 771 GiB 62.35 1.11 59 up osd.18
21 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 316 MiB 4.4 GiB 736 GiB 64.07 1.14 56 up osd.21
26 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 66 MiB 4.2 GiB 757 GiB 63.03 1.12 54 up osd.26
27 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 74 MiB 3.4 GiB 956 GiB 53.30 0.95 52 up osd.27
32 ssd 2.00000 1.00000 2 TiB 1.3 TiB 1.3 TiB 365 MiB 3.5 GiB 758 GiB 62.99 1.12 56 up osd.32
-4 21.99899 - 22 TiB 12 TiB 12 TiB 2.8 GiB 39 GiB 9.7 TiB 56.09 1.00 - rack rack2
-3 21.99899 - 22 TiB 12 TiB 12 TiB 2.8 GiB 39 GiB 9.7 TiB 56.09 1.00 - host edcxosid001e-bcbsfl-com
0 ssd 1.99899 1.00000 2.0 TiB 844 GiB 841 GiB 83 MiB 3.1 GiB 1.2 TiB 41.24 0.74 48 up osd.0
5 ssd 2.00000 1.00000 2.0 TiB 1.4 TiB 1.4 TiB 0 B 4.1 GiB 658 GiB 67.86 1.21 60 up osd.5
8 ssd 2.00000 1.00000 2.0 TiB 1.2 TiB 1.2 TiB 341 MiB 3.7 GiB 862 GiB 57.92 1.03 55 up osd.8
10 ssd 2.00000 1.00000 2 TiB 1.1 TiB 1.1 TiB 289 MiB 3.7 GiB 924 GiB 54.89 0.98 54 up osd.10
14 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 0 B 3.9 GiB 861 GiB 57.97 1.03 59 up osd.14
17 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 0 B 3.9 GiB 768 GiB 62.50 1.11 57 up osd.17
19 ssd 2.00000 1.00000 2 TiB 1.2 TiB 1.2 TiB 31 MiB 3.7 GiB 792 GiB 61.33 1.09 55 up osd.19
22 ssd 2.00000 1.00000 2 TiB 1.4 TiB 1.4 TiB 642 MiB 4.1 GiB 572 GiB 72.06 1.28 67 up osd.22
25 ssd 2.00000 1.00000 2 TiB 980 GiB 978 GiB 67 MiB 2.3 GiB 1.0 TiB 47.87 0.85 50 up osd.25
28 ssd 2.00000 1.00000 2 TiB 1.0 TiB 1.0 TiB 746 MiB 3.7 GiB 994 GiB 51.45 0.92 51 up osd.28
31 ssd 2.00000 1.00000 2 TiB 858 GiB 854 GiB 640 MiB 3.0 GiB 1.2 TiB 41.88 0.75 45 up osd.31
TOTAL 66 TiB 37 TiB 37 TiB 8.1 GiB 121 GiB 29 TiB 56.09
MIN/MAX VAR: 0.74/1.28 STDDEV: 7.26
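For reference, a minimal sketch of the kind of toolbox commands involved in the manual PG bump discussed above (the pool name and the pg_num/target values are only examples, not a recommendation for this cluster):
# Inspect what the autoscaler currently wants
ceph osd pool autoscale-status
ceph osd pool get ocs-storagecluster-cephblockpool pg_num
# Manual bump (the autoscaler can later override these while it remains "on")
ceph osd pool set ocs-storagecluster-cephblockpool pg_num 512
ceph osd pool set ocs-storagecluster-cephblockpool pgp_num 512
ceph osd pool set ocs-storagecluster-cephblockpool target_size_ratio 0.49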
Regards,
Craig Wayman
TSE Red Hat OpenShift Data Foundations (ODF)
Customer Experience and Engagement, NA
Everything looks correct from Rook's perspective since the OSDs are all online as desired, including the new OSDs created to scale up the cluster. The PG autoscaler is handled by core Ceph. Vikhyat, could someone from your team take a look to see whether the PGs are expected or whether there is an issue with the autoscaler?

(In reply to Travis Nielsen from comment #23)
> The PG autoscaler is handled by core Ceph. Vikhyat could someone from your team take a look [...]

Sure, Travis. Let me ask Junior if he can help!

Junior - see comment #21. It looks like the autoscaler is not scaling PGs; can you please take a look and help the support team? If you need more debug data, feel free to request it.

Thank you,
Vikhyat

(In reply to Kamoltat (Junior) Sirivadhna from comment #29)
> Hi Craig,
>
> Apologies for the delay, and thank you for your patience.
>
> The output from `ceph osd pool autoscale-status` suggests we did not utilize the `bulk` flag. Basically, what the `bulk` flag does is tell the autoscaler that the pool is expected to be large, so the autoscaler will start that pool out with a large number of PGs for performance purposes.
>
> From the comments in this BZ, I can see that you want to increase `ocs-storagecluster-cephblockpool` and maybe `ocs-storagecluster-cephfilesystem-data0` and `ocs-storagecluster-cephobjectstore.rgw.buckets.data`.
>
> So what you can do is use this command:
>
> `ceph osd pool set <pool-name> bulk true`
>
> Let me know if this helps in the short term.
>
> To be honest, I feel like a lot of people are still unaware of the `bulk` flag feature, especially during pool creation. I'll also look into improving the pool creation process; when the autoscaler is enabled, the `bulk` flag should also be set for data pools.

Junior,

First, thank you for pointing out that bulk flag; you were correct that members of my team, including myself, weren't very familiar with that particular Ceph flag. The interesting thing that happened is that when the customer used the bulk flag, the PGs increased and then decreased again, since the PG autoscaler was still set to "on" for the pools the bulk flag was applied to. The good news is that this process ended up balancing their %USE across the OSDs; everything is much more balanced now. With that said, I hopefully have one more question that, once answered, should put this BZ to bed: if the customer uses the bulk flag on a pool, should that pool have the autoscaler set to "off"?

In my view, it should. I know this is use-case specific, but for this cluster, which has a good amount of data, a lot of OSDs, and a high workload, the OSDs are sitting on average at around 75 PGs each. My goal was to get them over at least 100 PGs per OSD, and when the bulk flag was applied the PGs increased to around 200 per OSD, which is a good desired state. In my opinion, there is no need to keep the autoscaler set to "on" since we're already at the desired state.

Again, thank you for your time and insight on the issue. This was helpful.

Regards,
Craig Wayman
TSE Red Hat OpenShift Data Foundations (ODF)
Customer Experience and Engagement, NA

(In reply to Craig Wayman from comment #32)
> If the customer uses the bulk flag on a pool, should that pool have the autoscaler set to "off"?

The bulk setting is only meaningful if the autoscaler is enabled, so if you're going to set it, you need the autoscaler enabled. If the autoscaler is disabled, then all PG management becomes manual, which is something ODF (and also Ceph) aims to avoid for users as much as possible.
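For reference, a minimal sketch of applying Junior's suggestion from the rook-ceph-tools pod and checking the result (pool names are taken from the autoscale-status output above; verify them on the cluster first):
ceph osd pool set ocs-storagecluster-cephblockpool bulk true
ceph osd pool set ocs-storagecluster-cephfilesystem-data0 bulk true
ceph osd pool set ocs-storagecluster-cephobjectstore.rgw.buckets.data bulk true
# Confirm the BULK column and any NEW PG_NUM the autoscaler now proposes
ceph osd pool autoscale-status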
Description of problem (please be detailed as possible and provide log snippets):

The customer opened the case due to latency associated with Ceph. This cluster is using local storage/LSO. Upon review of the must-gather, Ceph reflected that osd.22 was nearfull, along with the two biggest pools. The customer decided to expand storage and add three new NVMe disks to three pre-existing hosts. From what the customer provided us, it seems they performed the process correctly. The three rook-ceph-osd-prepare-ocs-deviceset jobs/pods completed successfully and the PVs/PVCs were created; however, rook-ceph-operator did not pick up the three OSDs (osd.27, osd.28, and osd.29). When the customer scaled down the ocs-operator and rook-ceph-operator, deleted ocsinit, and scaled up just the ocs-operator, osd.29 was added to rack0. Unfortunately, osd.27 and osd.28 were not picked up and added to existing hosts/racks.

The storagecluster.yaml reflects the correct count of 10 OSDs per host (previously the count was 9). osd.29 has a deployment and a pod; however, osd.27 and osd.28 have neither a deployment nor a pod. Looking at the CRUSH map, osd.29 is populated with all the information it needs, but osd.27 and osd.28 DO APPEAR in the CRUSH map with empty curly braces:

}, {
    "id": 27
}, {
    "id": 28
}, {
    "id": 29,
    "arch": "x86_64",
    "back_addr": "[v2:10.224.18.27:6802/406,v1:10.224.18.27:6803/406]",
    "back_iface": "",
    "bluefs": "1",
    "bluefs_dedicated_db": "0",.......and so on...........omitted.........
}

Version of all relevant components (if applicable):

ODF (CSV):
NAME DISPLAY VERSION REPLACES PHASE
container-security-operator.v3.7.6 Red Hat Quay Container Security Operator 3.7.6 container-security-operator.v3.7.5 Succeeded
elasticsearch-operator.5.5.1 OpenShift Elasticsearch Operator 5.5.1 elasticsearch-operator.5.5.0 Succeeded
jaeger-operator.v1.36.0-2 Red Hat OpenShift distributed tracing platform 1.36.0-2 jaeger-operator.v1.34.1-5 Succeeded
kiali-operator.v1.48.2 Kiali Operator 1.48.2 kiali-operator.v1.48.1 Succeeded
kubernetes-imagepuller-operator.v1.0.1 Kubernetes Image Puller Operator 1.0.1 kubernetes-imagepuller-operator.v1.0.0 Installing
mcg-operator.v4.10.5 NooBaa Operator 4.10.5 mcg-operator.v4.9.10 Succeeded
ocs-operator.v4.10.5 OpenShift Container Storage 4.10.5 ocs-operator.v4.9.10 Succeeded
odf-csi-addons-operator.v4.10.5 CSI Addons 4.10.5 odf-csi-addons-operator.v4.10.4 Succeeded
odf-operator.v4.10.5 OpenShift Data Foundation 4.10.5 odf-operator.v4.9.10 Succeeded
serverless-operator.v1.7.2 OpenShift Serverless Operator 1.7.2 serverless-operator.v1.7.1 Installing
servicemeshoperator.v2.2.1 Red Hat OpenShift Service Mesh 2.2.1-0 servicemeshoperator.v2.1.3 Succeeded

Cluster Version (OCP):
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.28 True False 8d Cluster version is 4.10.28

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, the production cluster is now nearfull.

Is there any workaround available to the best of your knowledge?
The workaround we were able to implement was to (see the command sketch at the end of this report):
1. Scale the ocs-operator and rook-ceph-operator to --replicas=0.
2. Delete ocsinitialization/ocsinit.
3. Scale up just the ocs-operator.
With that process, rook-ceph was able to add osd.29; however, even after repeating that process, osd.27 and osd.28 could not be added.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
N/A

Can this issue be reproduced?
Not in a testing environment.

Can this issue be reproduced from the UI?
No

Additional info:
It may be worth mentioning that this is a very busy/demanding cluster. We've seen the odf-operator get OOMKilled even after increasing its memory limits/requests to 700Mi; since then they've been increased to 900Mi. We're constantly seeing two noobaa-endpoint pods deployed, signifying high workload/traffic.
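For reference, a hedged sketch of the workaround steps listed above (the deployment and resource names are the usual ones in the openshift-storage namespace and should be verified on the cluster before use):
$ oc scale deployment ocs-operator -n openshift-storage --replicas=0
$ oc scale deployment rook-ceph-operator -n openshift-storage --replicas=0
$ oc delete ocsinitialization ocsinit -n openshift-storage
$ oc scale deployment ocs-operator -n openshift-storage --replicas=1   # ocs-operator recreates ocsinit and reconciles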