Description of problem (please be as detailed as possible and provide log snippets): In some situations, such as vSphere vSAN, the drive media type is detected incorrectly, for instance as HDD instead of SSD. For that case, it would be good if we could set `bluestore_debug_enforce_settings = "ssd"` in rook-config-override.
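For illustration only (not from the original report), a minimal sketch of what such an override could look like, assuming the standard rook-config-override ConfigMap in the openshift-storage namespace; on a running cluster you would edit or patch the existing ConfigMap rather than re-create it, and per the recommendation quoted later in this bug this particular option only takes effect after an OSD restart:
```
# Hedged sketch: add the override under [global] in the rook-config-override
# ConfigMap (name/namespace assumed to match a default ODF install).
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: openshift-storage
data:
  config: |
    [global]
    bluestore_debug_enforce_settings = ssd
EOF
# OSDs only pick this setting up after a restart.
```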
I guess this should go to ocs-operator as we need to set it there. Malay, can you pick it up?
More context: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c47
I think we should be able to set `bluestore_debug_enforce_settings = "ssd"` now as well, but I'll check and update here.
Hi @sapillai, can you take a look? Is this related to the feature epic you are working on for 4.14? https://github.com/red-hat-storage/ocs-operator/pull/2053
I was looking at this comment https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c41 from the linked BZ.
```
RECOMMENDATION
Force BlueStore to use settings designed for SSDs.
1) set bluestore_debug_enforce_settings = "ssd"
   but it requires restart
OR
2) set bluestore_prefer_deferred_size_hdd = 0
   should work right away and no new deferred writes will be enqueued
```
I see the 2nd option will not require OSD restarts, so shouldn't we set that one instead of the 1st one? I see in the linked BZ that Aman went ahead with testing the 2nd option. In the case of customers who upgrade from earlier versions of ODF, will the 1st setting work as intended directly, or will it require an OSD restart?
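For context, the 2nd option can also be applied at runtime with the standard `ceph config` CLI; a minimal sketch, assuming the rook-ceph-tools toolbox deployment is available in openshift-storage (the option name and value come from the recommendation above, the invocation itself is an assumption):
```
# Hedged sketch: set option 2 in the central config store from the toolbox.
oc -n openshift-storage exec deploy/rook-ceph-tools -- \
  ceph config set osd bluestore_prefer_deferred_size_hdd 0

# Verify it took effect (no OSD restart should be needed for this option).
oc -n openshift-storage exec deploy/rook-ceph-tools -- \
  ceph config get osd bluestore_prefer_deferred_size_hdd
```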
Also, does the setting need to go under the global section or under some other specific section? Ref: https://github.com/red-hat-storage/ocs-operator/blob/abe61d1773697c03af99cae0095d3136f252fa7c/controllers/storagecluster/cephconfig.go#L31
(In reply to Malay Kumar parida from comment #6)
> I see the 2nd option will not require OSD restarts. So shouldn't we set that
> one instead of the 1st one? I see in the linked BZ Aman went ahead with
> testing with the 2nd option.

If the second option is the one we validated and it doesn't require an OSD restart, then we should go with it.
(In reply to Elad from comment #8)
> (In reply to Malay Kumar parida from comment #6)
>
> > I see the 2nd option will not require OSD restarts. So shouldn't we set that
> > one instead of the 1st one? I see in the linked BZ Aman went ahead with
> > testing with the 2nd option.
>
> If the second option is the one we validated and it doesn't require an OSD
> restart, then we should go with it.

This config was tested only once. Elad, do you think thorough testing by the perf team, or even by us, is required here to validate the results?
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c49
Tested on an RDR setup with:
ODF 4.14.0-136.stable
OCP 4.14.0-0.nightly-2023-09-02-132842
ACM 2.9.0-DOWNSTREAM-2023-08-24-09-30-12
subctl version: v0.16.0
ceph version 17.2.6-138.el9cp (b488c8dad42b2ecffcd96f3d76eeeecce48b8590) quincy (stable)

```
bash-5.1$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                           STATUS  REWEIGHT  PRI-AFF
 -1         4.50000  root default
 -8         1.50000      rack rack0
 -7         0.50000          host ocs-deviceset-thin-csi-odf-1-data-022gpw
  0    ssd  0.50000              osd.0                                       up   1.00000  1.00000
-19         0.50000          host ocs-deviceset-thin-csi-odf-2-data-15rvp9
  5    ssd  0.50000              osd.5                                       up   1.00000  1.00000
-21         0.50000          host ocs-deviceset-thin-csi-odf-2-data-2vlwvm
  6    ssd  0.50000              osd.6                                       up   1.00000  1.00000
-12         1.50000      rack rack1
-11         0.50000          host ocs-deviceset-thin-csi-odf-0-data-0527jt
  2    ssd  0.50000              osd.2                                       up   1.00000  1.00000
-17         0.50000          host ocs-deviceset-thin-csi-odf-1-data-16gz27
  4    ssd  0.50000              osd.4                                       up   1.00000  1.00000
-25         0.50000          host ocs-deviceset-thin-csi-odf-1-data-2b8zjk
  8    ssd  0.50000              osd.8                                       up   1.00000  1.00000
 -4         1.50000      rack rack2
-15         0.50000          host ocs-deviceset-thin-csi-odf-0-data-1d9dll
  3    ssd  0.50000              osd.3                                       up   1.00000  1.00000
-23         0.50000          host ocs-deviceset-thin-csi-odf-0-data-2tlj2p
  7    ssd  0.50000              osd.7                                       up   1.00000  1.00000
 -3         0.50000          host ocs-deviceset-thin-csi-odf-2-data-08bq7j
  1    ssd  0.50000              osd.1                                       up   1.00000  1.00000
```

While the OSD type is shown as ssd, ceph config still reports the hdd default:

```
bash-5.1$ ceph config get osd bluestore_prefer_deferred_size_hdd
65536
```

The expected value here was 0. Shared the cluster with Malay and got confirmation that the values aren't being set properly. Hence failing QA.

For logs, refer to the C1 or C2 logs under http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/20sept23-1/, which are managed clusters where ODF is installed.
As per the discussion here https://chat.google.com/room/AAAAREGEba8/6fVthUX9WA4, according to Travis:
```
The configuration in that configmap will not show up in the central config store,
only on the individual daemons.
To verify,
1) connect to an osd daemon pod,
2) run unset CEPH_ARGS, and then
3) run ceph daemon osd.0 config show, where the osd daemon ID needs to be
   replaced with the ID of the OSD that was connected to.
It will output a lot of settings, so grep for the one you need.
```
Moving to ON_QA.
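Put together as commands, a minimal sketch of that verification flow (the deployment name rook-ceph-osd-0 and the id osd.0 are placeholders for whichever OSD you check):
```
# Hedged sketch: check the setting on an individual OSD daemon, not in the
# central config store. Replace osd-0/osd.0 with the OSD you connected to.
oc -n openshift-storage exec deploy/rook-ceph-osd-0 -- sh -c \
  'unset CEPH_ARGS && ceph daemon osd.0 config show' \
  | grep -E 'bluestore_prefer_deferred_size_hdd|bluestore_debug_enforce_settings'
```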
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6832