Bug 2218116
| Summary: | Avoid wrong detection of disk media type, such as HDD instead of SSD on vSAN | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Elad <ebenahar> |
| Component: | ocs-operator | Assignee: | Malay Kumar parida <mparida> |
| Status: | CLOSED ERRATA | QA Contact: | Aman Agrawal <amagrawa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.13 | CC: | amagrawa, mparida, muagarwa, odf-bz-bot, sapillai, srai, vavuthu |
| Target Milestone: | --- | | |
| Target Release: | ODF 4.14.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.14.0-123 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-11-08 18:52:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Elad, 2023-06-28 08:34:10 UTC)
I guess this should go to ocs-operator, as we need to set it there. Malay, can you pick it up? More context: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c47

I think we should be able to set `bluestore_debug_enforce_settings = "ssd"` now as well, but I'll check and update here.

Hi @sapillai, can you take a look? Is this related to the feature epic you are working on for 4.14? https://github.com/red-hat-storage/ocs-operator/pull/2053

I was looking at this comment from the linked BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c41

```
RECOMMENDATION
Force BlueStore to use settings designed for SSDs.
1) set bluestore_debug_enforce_settings = "ssd", but it requires a restart
OR
2) set bluestore_prefer_deferred_size_hdd = 0, which should work right away; no new deferred writes will be enqueued
```

The 2nd option will not require OSD restarts, so shouldn't we set that one instead of the 1st? In the linked BZ, Aman went ahead and tested with the 2nd option. For customers who upgrade from earlier versions of ODF, will the 1st setting work as intended directly, or will it require an OSD restart? Also, does the setting need to go under the global section or some other specific section? Ref: https://github.com/red-hat-storage/ocs-operator/blob/abe61d1773697c03af99cae0095d3136f252fa7c/controllers/storagecluster/cephconfig.go#L31

(In reply to Malay Kumar parida from comment #6)
> I see the 2nd option will not require OSD restarts. So shouldn't we set that
> one instead of the 1st one? I see in the linked BZ Aman went ahead with
> testing with the 2nd option.

If the second option is the one we validated and it doesn't require an OSD restart, then we should go with it.

(In reply to Elad from comment #8)
> (In reply to Malay Kumar parida from comment #6)
>
> > I see the 2nd option will not require OSD restarts. So shouldn't we set that
> > one instead of the 1st one? I see in the linked BZ Aman went ahead with
> > testing with the 2nd option.
>
> If the second option is the one we validated and it doesn't require OSD
> restart then we should go with it.

This config was tested only once. Elad, do you think thorough testing is required here by the perf team, or even by us, to validate the results?
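For illustration only, here is a minimal, hedged sketch of where a setting like option 2 ends up: Rook consumes a ceph.conf-style snippet from the rook-config-override ConfigMap, which in ODF is generated by ocs-operator (see the cephconfig.go reference above). The `[osd]` section choice and the manual apply below are assumptions for experimenting on a throwaway cluster, not the shipped fix.

```bash
# Hedged sketch: hand-applying option 2 through Rook's rook-config-override ConfigMap.
# In ODF this ConfigMap is managed by ocs-operator, so a manual edit may be reconciled away;
# the [osd] section is an assumption (the thread above asks whether [global] is needed instead).
cat <<'EOF' | oc -n openshift-storage apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: openshift-storage
data:
  config: |
    [osd]
    bluestore_prefer_deferred_size_hdd = 0
EOF
```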
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c49

Tested on an RDR setup:

ODF 4.14.0-136.stable
OCP 4.14.0-0.nightly-2023-09-02-132842
ACM 2.9.0-DOWNSTREAM-2023-08-24-09-30-12
subctl version: v0.16.0
ceph version 17.2.6-138.el9cp (b488c8dad42b2ecffcd96f3d76eeeecce48b8590) quincy (stable)

```
bash-5.1$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                            STATUS  REWEIGHT  PRI-AFF
 -1         4.50000  root default
 -8         1.50000      rack rack0
 -7         0.50000          host ocs-deviceset-thin-csi-odf-1-data-022gpw
  0    ssd  0.50000              osd.0                                        up   1.00000  1.00000
-19         0.50000          host ocs-deviceset-thin-csi-odf-2-data-15rvp9
  5    ssd  0.50000              osd.5                                        up   1.00000  1.00000
-21         0.50000          host ocs-deviceset-thin-csi-odf-2-data-2vlwvm
  6    ssd  0.50000              osd.6                                        up   1.00000  1.00000
-12         1.50000      rack rack1
-11         0.50000          host ocs-deviceset-thin-csi-odf-0-data-0527jt
  2    ssd  0.50000              osd.2                                        up   1.00000  1.00000
-17         0.50000          host ocs-deviceset-thin-csi-odf-1-data-16gz27
  4    ssd  0.50000              osd.4                                        up   1.00000  1.00000
-25         0.50000          host ocs-deviceset-thin-csi-odf-1-data-2b8zjk
  8    ssd  0.50000              osd.8                                        up   1.00000  1.00000
 -4         1.50000      rack rack2
-15         0.50000          host ocs-deviceset-thin-csi-odf-0-data-1d9dll
  3    ssd  0.50000              osd.3                                        up   1.00000  1.00000
-23         0.50000          host ocs-deviceset-thin-csi-odf-0-data-2tlj2p
  7    ssd  0.50000              osd.7                                        up   1.00000  1.00000
 -3         0.50000          host ocs-deviceset-thin-csi-odf-2-data-08bq7j
  1    ssd  0.50000              osd.1                                        up   1.00000  1.00000
```

While the OSD device class is shown as ssd, ceph config still reports the HDD-tuned value:

```
bash-5.1$ ceph config get osd bluestore_prefer_deferred_size_hdd
65536
```

The expected value here was 0. Shared the cluster with Malay and got confirmation that the values aren't being set properly. Hence failing_qa.

For logs, refer to the C1 or C2 logs under http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/20sept23-1/, which are managed clusters where ODF is installed.

As per the discussion here https://chat.google.com/room/AAAAREGEba8/6fVthUX9WA4, according to Travis:

```
The configuration in that configmap will not show up in the central config store,
only on the individual daemons.
To verify:
1) connect to an osd daemon pod,
2) run unset CEPH_ARGS, and then
3) run ceph daemon osd.0 config show, where the osd daemon ID needs to be replaced
   with the ID of the daemon that was connected to.
It will output a lot of settings, so grep for the one you need.
```

Moving to ON_QA. (A runnable sketch of these verification steps appears after the final comment below.)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832
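A hedged sketch of the per-daemon verification steps quoted from Travis above. The namespace, label selector, and OSD id are assumptions for a default ODF install and must match the daemon actually being checked.

```bash
# Hedged sketch of the verification described above (per-daemon admin socket,
# not the central config store, which is why `ceph config get` showed 65536).
# Assumptions: namespace openshift-storage, Rook's app=rook-ceph-osd/ceph-osd-id labels, osd.0.
OSD_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-osd,ceph-osd-id=0 -o name | head -n1)

oc -n openshift-storage exec "$OSD_POD" -- bash -c '
  unset CEPH_ARGS   # per the guidance above, unset this before querying the daemon admin socket
  ceph daemon osd.0 config show | grep bluestore_prefer_deferred_size_hdd
'
```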