Bug 2218116
| Summary: | Avoid wrong detection of disk media type, such as HDD instead of SSD on vSAN | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Elad <ebenahar> |
| Component: | ocs-operator | Assignee: | Malay Kumar parida <mparida> |
| Status: | CLOSED ERRATA | QA Contact: | Aman Agrawal <amagrawa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.13 | CC: | amagrawa, mparida, muagarwa, odf-bz-bot, sapillai, srai, vavuthu |
| Target Milestone: | --- | | |
| Target Release: | ODF 4.14.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.14.0-123 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-11-08 18:52:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Elad, 2023-06-28 08:34:10 UTC)
I guess this should go to ocs-operator, as we need to set it there. Malay, can you pick it up? More context: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c47

I think we should be able to set `bluestore_debug_enforce_settings = "ssd"` now as well, but I'll check and update here.

Hi @sapillai, can you take a look? Is this related to the feature epic you are working on for 4.14? https://github.com/red-hat-storage/ocs-operator/pull/2053

I was looking at this comment from the linked BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c41

```
RECOMMENDATION
Force BlueStore to use settings designed for SSDs.
1) set bluestore_debug_enforce_settings = "ssd", but it requires a restart
OR
2) set bluestore_prefer_deferred_size_hdd = 0, which should work right away; no new deferred writes will be enqueued
```

The 2nd option will not require OSD restarts, so shouldn't we set that one instead of the 1st? In the linked BZ, Aman went ahead and tested with the 2nd option. For customers who upgrade from earlier versions of ODF, will the 1st setting work as intended directly, or will it require an OSD restart? Also, does the setting need to go under the global section or some other specific section? Ref: https://github.com/red-hat-storage/ocs-operator/blob/abe61d1773697c03af99cae0095d3136f252fa7c/controllers/storagecluster/cephconfig.go#L31

(In reply to Malay Kumar parida from comment #6)
> I see the 2nd option will not require OSD restarts. So shouldn't we set that
> one instead of the 1st one? I see in the linked BZ Aman went ahead with
> testing with the 2nd option.

If the second option is the one we validated and it doesn't require an OSD restart, then we should go with it.

(In reply to Elad from comment #8)
> (In reply to Malay Kumar parida from comment #6)
>
> > I see the 2nd option will not require OSD restarts. So shouldn't we set that
> > one instead of the 1st one? I see in the linked BZ Aman went ahead with
> > testing with the 2nd option.
>
> If the second option is the one we validated and it doesn't require OSD
> restart then we should go with it.

This config was tested only once. Elad, do you think thorough testing is required here by the perf team, or even by us, to validate the results?
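For illustration only, here is a minimal, hedged sketch of where a setting like option 2 ends up: Rook consumes a ceph.conf-style snippet from the rook-config-override ConfigMap, which in ODF is generated by ocs-operator (see the cephconfig.go reference above). The `[osd]` section choice and the manual apply below are assumptions for experimenting on a throwaway cluster, not the shipped fix.

```bash
# Hedged sketch: hand-applying option 2 through Rook's rook-config-override ConfigMap.
# In ODF this ConfigMap is managed by ocs-operator, so a manual edit may be reconciled away;
# the [osd] section is an assumption (the thread above asks whether [global] is needed instead).
cat <<'EOF' | oc -n openshift-storage apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: openshift-storage
data:
  config: |
    [osd]
    bluestore_prefer_deferred_size_hdd = 0
EOF
```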
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c49

Tested on an RDR setup:

ODF 4.14.0-136.stable
OCP 4.14.0-0.nightly-2023-09-02-132842
ACM 2.9.0-DOWNSTREAM-2023-08-24-09-30-12
subctl version: v0.16.0
ceph version 17.2.6-138.el9cp (b488c8dad42b2ecffcd96f3d76eeeecce48b8590) quincy (stable)

```
bash-5.1$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                            STATUS  REWEIGHT  PRI-AFF
 -1         4.50000  root default
 -8         1.50000      rack rack0
 -7         0.50000          host ocs-deviceset-thin-csi-odf-1-data-022gpw
  0    ssd  0.50000              osd.0                                        up   1.00000  1.00000
-19         0.50000          host ocs-deviceset-thin-csi-odf-2-data-15rvp9
  5    ssd  0.50000              osd.5                                        up   1.00000  1.00000
-21         0.50000          host ocs-deviceset-thin-csi-odf-2-data-2vlwvm
  6    ssd  0.50000              osd.6                                        up   1.00000  1.00000
-12         1.50000      rack rack1
-11         0.50000          host ocs-deviceset-thin-csi-odf-0-data-0527jt
  2    ssd  0.50000              osd.2                                        up   1.00000  1.00000
-17         0.50000          host ocs-deviceset-thin-csi-odf-1-data-16gz27
  4    ssd  0.50000              osd.4                                        up   1.00000  1.00000
-25         0.50000          host ocs-deviceset-thin-csi-odf-1-data-2b8zjk
  8    ssd  0.50000              osd.8                                        up   1.00000  1.00000
 -4         1.50000      rack rack2
-15         0.50000          host ocs-deviceset-thin-csi-odf-0-data-1d9dll
  3    ssd  0.50000              osd.3                                        up   1.00000  1.00000
-23         0.50000          host ocs-deviceset-thin-csi-odf-0-data-2tlj2p
  7    ssd  0.50000              osd.7                                        up   1.00000  1.00000
 -3         0.50000          host ocs-deviceset-thin-csi-odf-2-data-08bq7j
  1    ssd  0.50000              osd.1                                        up   1.00000  1.00000
```

While the OSD device class is shown as ssd, ceph config still reports the HDD-tuned value:

```
bash-5.1$ ceph config get osd bluestore_prefer_deferred_size_hdd
65536
```

The expected value here was 0. Shared the cluster with Malay and got confirmation that the values aren't being set properly. Hence failing_qa.

For logs, refer to the C1 or C2 logs under http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/20sept23-1/, which are managed clusters where ODF is installed.

As per the discussion here https://chat.google.com/room/AAAAREGEba8/6fVthUX9WA4, according to Travis:

```
The configuration in that configmap will not show up in the central config store,
only on the individual daemons.
To verify:
1) connect to an osd daemon pod,
2) run unset CEPH_ARGS, and then
3) run ceph daemon osd.0 config show, where the osd daemon ID needs to be replaced
   with the ID of the daemon that was connected to.
It will output a lot of settings, so grep for the one you need.
```

Moving to ON_QA. (A runnable sketch of these verification steps appears after the final comment below.)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6832
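A hedged sketch of the per-daemon verification steps quoted from Travis above. The namespace, label selector, and OSD id are assumptions for a default ODF install and must match the daemon actually being checked.

```bash
# Hedged sketch of the verification described above (per-daemon admin socket,
# not the central config store, which is why `ceph config get` showed 65536).
# Assumptions: namespace openshift-storage, Rook's app=rook-ceph-osd/ceph-osd-id labels, osd.0.
OSD_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-osd,ceph-osd-id=0 -o name | head -n1)

oc -n openshift-storage exec "$OSD_POD" -- bash -c '
  unset CEPH_ARGS   # per the guidance above, unset this before querying the daemon admin socket
  ceph daemon osd.0 config show | grep bluestore_prefer_deferred_size_hdd
'
```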