Description of problem (please be as detailed as possible and provide log snippets): In some situations, such as vSphere vSAN, the drive media type is detected incorrectly, for instance as HDD instead of SSD. For that case, it would be good if we could set `bluestore_debug_enforce_settings = "ssd"` in rook-config-override.
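For illustration only (not from the original report), a minimal sketch of what such an override could look like, assuming the standard rook-config-override ConfigMap in the openshift-storage namespace; on a running cluster you would edit or patch the existing ConfigMap rather than re-create it, and per the recommendation quoted later in this bug this particular option only takes effect after an OSD restart:
```
# Hedged sketch: add the override under [global] in the rook-config-override
# ConfigMap (name/namespace assumed to match a default ODF install).
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: openshift-storage
data:
  config: |
    [global]
    bluestore_debug_enforce_settings = ssd
EOF
# OSDs only pick this setting up after a restart.
```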
I guess this should go to ocs-operator as we need to set it there. Malay, can you pick it up?
More context: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c47
I think we should be able to set `bluestore_debug_enforce_settings = "ssd"` now as well, but I'll check and update here.
Hi @sapillai, can you take a look? Is this related to the feature epic you are working on for 4.14? https://github.com/red-hat-storage/ocs-operator/pull/2053
I was looking at this comment https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c41 from the linked BZ.
```
RECOMMENDATION
Force BlueStore to use settings designed for SSDs.
1) set bluestore_debug_enforce_settings = "ssd"
   but it requires restart
OR
2) set bluestore_prefer_deferred_size_hdd = 0
   should work right away and no new deferred writes will be enqueued
```
I see the 2nd option will not require OSD restarts, so shouldn't we set that one instead of the 1st one? I see in the linked BZ that Aman went ahead with testing the 2nd option. In the case of customers who upgrade from earlier versions of ODF, will the 1st setting work as intended directly, or will it require an OSD restart?
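For context, the 2nd option can also be applied at runtime with the standard `ceph config` CLI; a minimal sketch, assuming the rook-ceph-tools toolbox deployment is available in openshift-storage (the option name and value come from the recommendation above, the invocation itself is an assumption):
```
# Hedged sketch: set option 2 in the central config store from the toolbox.
oc -n openshift-storage exec deploy/rook-ceph-tools -- \
  ceph config set osd bluestore_prefer_deferred_size_hdd 0

# Verify it took effect (no OSD restart should be needed for this option).
oc -n openshift-storage exec deploy/rook-ceph-tools -- \
  ceph config get osd bluestore_prefer_deferred_size_hdd
```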
Also, does the setting need to go under the global section or under some other specific section? Ref: https://github.com/red-hat-storage/ocs-operator/blob/abe61d1773697c03af99cae0095d3136f252fa7c/controllers/storagecluster/cephconfig.go#L31
(In reply to Malay Kumar parida from comment #6)
> I see the 2nd option will not require OSD restarts. So shouldn't we set that
> one instead of the 1st one? I see in the linked BZ Aman went ahead with
> testing with the 2nd option.

If the second option is the one we validated and it doesn't require an OSD restart, then we should go with it.
(In reply to Elad from comment #8)
> (In reply to Malay Kumar parida from comment #6)
>
> > I see the 2nd option will not require OSD restarts. So shouldn't we set that
> > one instead of the 1st one? I see in the linked BZ Aman went ahead with
> > testing with the 2nd option.
>
> If the second option is the one we validated and it doesn't require an OSD
> restart, then we should go with it.

This config was tested only once. Elad, do you think thorough testing by the perf team, or even by us, is required here to validate the results?
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2154351#c49
Tested on an RDR setup with:
ODF 4.14.0-136.stable
OCP 4.14.0-0.nightly-2023-09-02-132842
ACM 2.9.0-DOWNSTREAM-2023-08-24-09-30-12
subctl version: v0.16.0
ceph version 17.2.6-138.el9cp (b488c8dad42b2ecffcd96f3d76eeeecce48b8590) quincy (stable)

```
bash-5.1$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                           STATUS  REWEIGHT  PRI-AFF
 -1         4.50000  root default
 -8         1.50000      rack rack0
 -7         0.50000          host ocs-deviceset-thin-csi-odf-1-data-022gpw
  0    ssd  0.50000              osd.0                                       up   1.00000  1.00000
-19         0.50000          host ocs-deviceset-thin-csi-odf-2-data-15rvp9
  5    ssd  0.50000              osd.5                                       up   1.00000  1.00000
-21         0.50000          host ocs-deviceset-thin-csi-odf-2-data-2vlwvm
  6    ssd  0.50000              osd.6                                       up   1.00000  1.00000
-12         1.50000      rack rack1
-11         0.50000          host ocs-deviceset-thin-csi-odf-0-data-0527jt
  2    ssd  0.50000              osd.2                                       up   1.00000  1.00000
-17         0.50000          host ocs-deviceset-thin-csi-odf-1-data-16gz27
  4    ssd  0.50000              osd.4                                       up   1.00000  1.00000
-25         0.50000          host ocs-deviceset-thin-csi-odf-1-data-2b8zjk
  8    ssd  0.50000              osd.8                                       up   1.00000  1.00000
 -4         1.50000      rack rack2
-15         0.50000          host ocs-deviceset-thin-csi-odf-0-data-1d9dll
  3    ssd  0.50000              osd.3                                       up   1.00000  1.00000
-23         0.50000          host ocs-deviceset-thin-csi-odf-0-data-2tlj2p
  7    ssd  0.50000              osd.7                                       up   1.00000  1.00000
 -3         0.50000          host ocs-deviceset-thin-csi-odf-2-data-08bq7j
  1    ssd  0.50000              osd.1                                       up   1.00000  1.00000
```

While the OSD type is shown as ssd, ceph config still reports the hdd default:

```
bash-5.1$ ceph config get osd bluestore_prefer_deferred_size_hdd
65536
```

The expected value here was 0. Shared the cluster with Malay and got confirmation that the values aren't being set properly. Hence failing QA.

For logs, refer to the C1 or C2 logs under http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/20sept23-1/, which are managed clusters where ODF is installed.
As per the discussion here https://chat.google.com/room/AAAAREGEba8/6fVthUX9WA4, according to Travis:
```
The configuration in that configmap will not show up in the central config store,
only on the individual daemons.
To verify,
1) connect to an osd daemon pod,
2) run unset CEPH_ARGS, and then
3) run ceph daemon osd.0 config show, where the osd daemon ID needs to be
   replaced with the ID of the OSD that was connected to.
It will output a lot of settings, so grep for the one you need.
```
Moving to ON_QA.
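Put together as commands, a minimal sketch of that verification flow (the deployment name rook-ceph-osd-0 and the id osd.0 are placeholders for whichever OSD you check):
```
# Hedged sketch: check the setting on an individual OSD daemon, not in the
# central config store. Replace osd-0/osd.0 with the OSD you connected to.
oc -n openshift-storage exec deploy/rook-ceph-osd-0 -- sh -c \
  'unset CEPH_ARGS && ceph daemon osd.0 config show' \
  | grep -E 'bluestore_prefer_deferred_size_hdd|bluestore_debug_enforce_settings'
```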
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6832