Bug 1903973 - [Azure][ROKS] Set SSD tuning (tuneFastDeviceClass) as default for OSD devices in Azure/ROKS platform
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Pulkit Kundra
QA Contact: Yuli Persky
URL:
Whiteboard:
Depends On:
Blocks: 1909793 1925004
Reported: 2020-12-03 09:23 UTC by Sahina Bose
Modified: 2021-06-01 08:45 UTC (History)

Fixed In Version: 4.7.0-701.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1909793
Environment:
Last Closed: 2021-05-19 09:16:33 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ocs-operator pull 955 | 0 | None | closed | cephcluster: tune disks according to plaftorm | 2021-02-16 13:42:43 UTC
Github | openshift ocs-operator pull 994 | 0 | None | closed | Bug 1903973: [release-4.7] cephcluster: tune disks according to plaftorm | 2021-02-16 13:42:44 UTC
Red Hat Product Errata | RHSA-2021:2041 | 0 | None | None | None | 2021-05-19 09:17:26 UTC

Description Sahina Bose 2020-12-03 09:23:28 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

For OSDs created in virtual or cloud environments, the OSD device is detected as a rotational disk even though the underlying device is usually an SSD.
We want to default to the SSD tuning options for these devices to get better performance. See bug 1848907#c13 for an example.

We could default to tuneFastDeviceClass either for specific environments or for all environments. Since the majority of workloads run on SSD-backed devices, I think it makes sense to make this the default.
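
For context, the Linux kernel exposes a per-device rotational flag under sysfs (`/sys/block/<device>/queue/rotational`, "1" for rotational, "0" otherwise), and Ceph's device-class detection is understood to follow that flag. The sketch below is only an illustration of how the flag can be inspected on a node; the function name `rotational_flags` is hypothetical and not part of any OCS or Rook tooling.

```python
from pathlib import Path


def rotational_flags(sysfs_block="/sys/block"):
    """Map each block device name to True if the kernel reports it as rotational."""
    root = Path(sysfs_block)
    if not root.is_dir():
        # Not a Linux host (or sysfs is not mounted): nothing to report.
        return {}
    flags = {}
    for dev in sorted(root.iterdir()):
        flag_file = dev / "queue" / "rotational"
        if flag_file.is_file():
            # The file holds "1" for rotational (HDD-like) devices and "0" otherwise.
            flags[dev.name] = flag_file.read_text().strip() == "1"
    return flags


if __name__ == "__main__":
    for name, is_rotational in rotational_flags().items():
        print(f"{name}: {'rotational' if is_rotational else 'non-rotational'}")
```

On a cloud VM whose virtual disk is SSD-backed, a value of True here would match the mismatch described above.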

Version of all relevant components (if applicable):
4.6

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Reduced performance

Is there any workaround available to the best of your knowledge?
Manually setting SSD tune options

Comment 1 Sahina Bose 2020-12-21 16:28:48 UTC
Pulkit, is this addressed by https://github.com/openshift/ocs-operator/pull/955 ?

Comment 8 Yuli Persky 2021-02-03 06:40:49 UTC
On the OCS 4.6.2 Azure cluster:

(yulidir) [ypersky@qpas ocs-ci]$ oc rsh rook-ceph-tools-6fdd868f75-259zs
sh-4.4# ceph config dump
WHO                                         MASK LEVEL    OPTION                              VALUE                            RO 
global                                           basic    log_file                                                             *  
global                                           advanced mon_allow_pool_delete               true                                
global                                           advanced mon_cluster_log_file                                                    
global                                           advanced mon_pg_warn_min_per_osd             0                                   
global                                           advanced osd_pool_default_pg_autoscale_mode  on                                  
global                                           advanced rbd_default_features                3                                   
  mgr                                            advanced mgr/balancer/active                 true                                
  mgr                                            advanced mgr/balancer/mode                   upmap                               
    mgr.                                         advanced mgr/prometheus/rbd_stats_pools      ocs-storagecluster-cephblockpool *  
    mgr.a                                        advanced mgr/dashboard/a/server_addr         10.128.2.11                      *  
    mgr.a                                        advanced mgr/prometheus/a/server_addr        10.128.2.11                      *  
  osd                                            dev      bluestore_cache_size                3221225472                          
  osd                                            advanced bluestore_compression_max_blob_size 65536                               
  osd                                            advanced bluestore_compression_min_blob_size 8192                                
  osd                                            advanced bluestore_deferred_batch_ops        16                                  
  osd                                            dev      bluestore_max_blob_size             65536                               
  osd                                            advanced bluestore_min_alloc_size            4000                             *  
  osd                                            advanced bluestore_prefer_deferred_size      0                                   
  osd                                            advanced bluestore_throttle_cost_per_io      4000                                
  osd                                            advanced osd_delete_sleep                    0.000000                            
  osd                                            advanced osd_op_num_shards                   8                                *  
  osd                                            advanced osd_op_num_threads_per_shard        2                                *  
  osd                                            advanced osd_recovery_sleep                  0.000000                            
  osd                                            advanced osd_snap_trim_sleep                 0.000000                            
    mds.ocs-storagecluster-cephfilesystem-a      basic    mds_cache_memory_limit              4294967296                          
    mds.ocs-storagecluster-cephfilesystem-b      basic    mds_cache_memory_limit              4294967296                          
sh-4.4#

Comment 10 Yuli Persky 2021-04-04 21:20:50 UTC
It looks like the fix is not applied in 4.7 (please correct me if I am wrong).
I did exactly the same as in comment 8 and did not get the same output.
In more detail:

1) I used the Azure cluster musoni-30 (https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/1646/)


2) The OCS version:

(myenv) [ypersky@ypersky auth]$ oc -n openshift-storage get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.7.0-324.ci   OpenShift Container Storage   4.7.0-324.ci              Succeeded
(myenv) [ypersky@ypersky auth]$ 

3) (myenv) [ypersky@ypersky auth]$ oc rsh rook-ceph-tools-76bc89666b-xtdj5
sh-4.4# 
sh-4.4# ceph config dump
WHO                                         MASK LEVEL    OPTION                             VALUE                            RO 
global                                           basic    log_file                           /var/log/ceph/$cluster-$name.log *  
global                                           basic    log_to_file                        true                                
global                                           advanced mon_allow_pool_delete              true                                
global                                           advanced mon_cluster_log_file                                                   
global                                           advanced mon_pg_warn_min_per_osd            0                                   
global                                           advanced osd_pool_default_pg_autoscale_mode on                                  
global                                           advanced osd_scrub_auto_repair              true                                
global                                           advanced rbd_default_features               3                                   
  mgr                                            advanced mgr/balancer/active                true                                
  mgr                                            advanced mgr/balancer/mode                  upmap                               
    mgr.                                         advanced mgr/prometheus/rbd_stats_pools     ocs-storagecluster-cephblockpool *  
    mgr.a                                        advanced mgr/prometheus/a/server_addr       10.131.0.38                      *  
    mds.ocs-storagecluster-cephfilesystem-a      basic    mds_cache_memory_limit             4294967296                          
    mds.ocs-storagecluster-cephfilesystem-b      basic    mds_cache_memory_limit             4294967296                          
sh-4.4#


My conclusion: Since the following tunings 

  osd                                            dev      bluestore_cache_size                3221225472
  osd                                            advanced bluestore_compression_max_blob_size 65536                               
  osd                                            advanced bluestore_compression_min_blob_size 8192                                
  osd                                            advanced bluestore_deferred_batch_ops        16                                  
  osd                                            dev      bluestore_max_blob_size             65536                               
  osd                                            advanced bluestore_min_alloc_size            4000                             *  
  osd                                            advanced bluestore_prefer_deferred_size      0                                   
  osd                                            advanced bluestore_throttle_cost_per_io      4000                                 

do not appear in the "ceph config dump" output, this version (4.7.0-324.ci) does not include the fix.
=> Reopening the bug and changing the status to Assigned.
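
The comparison made in this comment can be sketched as a small script. This is only an illustration: the helper names `osd_options_in_dump` and `missing_tunings` are hypothetical, the option list is copied from the 4.6.2 listing quoted above, and the parser assumes the column layout shown in the dumps in this report. Note that an option missing from `ceph config dump` only shows that it is not in the central configuration store; as the later `ceph config show` output in this report illustrates, it may still be applied to the daemon another way.

```python
# Option names copied from the 4.6.2 listing quoted in this comment.
EXPECTED_OSD_TUNINGS = (
    "bluestore_cache_size",
    "bluestore_compression_max_blob_size",
    "bluestore_compression_min_blob_size",
    "bluestore_deferred_batch_ops",
    "bluestore_max_blob_size",
    "bluestore_min_alloc_size",
    "bluestore_prefer_deferred_size",
    "bluestore_throttle_cost_per_io",
)

# LEVEL values that appear in the dumps above; used to locate the OPTION column.
LEVELS = {"basic", "advanced", "dev"}


def osd_options_in_dump(dump_text):
    """Return the option names set for the generic 'osd' section of a config dump."""
    found = set()
    for line in dump_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "osd":
            continue
        # MASK is usually blank, so find OPTION as the token right after LEVEL.
        for i in range(1, len(tokens) - 1):
            if tokens[i] in LEVELS:
                found.add(tokens[i + 1])
                break
    return found


def missing_tunings(dump_text):
    """List the expected tuning options that the dump does not contain."""
    present = osd_options_in_dump(dump_text)
    return [opt for opt in EXPECTED_OSD_TUNINGS if opt not in present]
```

Feeding it the saved text of `ceph config dump` from the 4.7.0-324.ci cluster would return all eight option names, which is the observation that led to reopening the bug.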

Comment 12 Yuli Persky 2021-04-05 15:14:18 UTC
@Pulkit Kundra,

I've verified with 3).

1) Run oc rsh rook-ceph-tools-76bc89666b-xtdj5

2) Run each of the following commands:

   ceph config show osd.0
   ceph config show osd.1
   ceph config show osd.2

In each of the outputs, the following parameters appeared:

sh-4.4# ceph config show osd.0
NAME                                 VALUE                                                                                                                                     SOURCE   OVERRIDES             IGNORES 
bluestore_cache_size                 3221225472                                                                                                                                cmdline                                
bluestore_compression_max_blob_size  65536                                                                                                                                     cmdline                                
bluestore_compression_min_blob_size  8912                                                                                                                                      cmdline                                
bluestore_deferred_batch_ops         16                                                                                                                                        cmdline                                
bluestore_max_blob_size              65536                                                                                                                                     cmdline                                
bluestore_min_alloc_size             4096                                                                                                                                      cmdline                                
bluestore_prefer_deferred_size       0                                                                                                                                         cmdline                                
bluestore_throttle_cost_per_io       4000                                                                                                                                      cmdline                                


sh-4.4# ceph config show osd.1
NAME                                 VALUE                                                                                                                                     SOURCE   OVERRIDES             IGNORES 
bluestore_cache_size                 3221225472                                                                                                                                cmdline                                
bluestore_compression_max_blob_size  65536                                                                                                                                     cmdline                                
bluestore_compression_min_blob_size  8912                                                                                                                                      cmdline                                
bluestore_deferred_batch_ops         16                                                                                                                                        cmdline                                
bluestore_max_blob_size              65536                                                                                                                                     cmdline                                
bluestore_min_alloc_size             4096                                                                                                                                      cmdline                                
bluestore_prefer_deferred_size       0                                                                                                                                         cmdline                                
bluestore_throttle_cost_per_io       4000                                                                                                                                      cmdline                                
crush_location                       root=default host=ocs-deviceset-1-data-05flj5 region=eastus zone=eastus-2                                                                 cmdline                                



sh-4.4# ceph config show osd.2
NAME                                 VALUE                                                                                                                                     SOURCE   OVERRIDES             IGNORES 
bluestore_cache_size                 3221225472                                                                                                                                cmdline                                
bluestore_compression_max_blob_size  65536                                                                                                                                     cmdline                                
bluestore_compression_min_blob_size  8912                                                                                                                                      cmdline                                
bluestore_deferred_batch_ops         16                                                                                                                                        cmdline                                
bluestore_max_blob_size              65536                                                                                                                                     cmdline                                
bluestore_min_alloc_size             4096                                                                                                                                      cmdline                                
bluestore_prefer_deferred_size       0                                                                                                                                         cmdline                                
bluestore_throttle_cost_per_io       4000                                                                                                                                      cmdline                            


=> Changing the bug status to "verified".
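
A note on why this check succeeds where the one in comment 10 did not: `ceph config dump` lists options stored in the cluster's central configuration database, while `ceph config show osd.N` reports what a running daemon is actually using, together with the source of each value. In the outputs above the SOURCE column reads "cmdline", which suggests that in this build the tunings reach the OSDs as command-line arguments rather than through the central store. The following sketch, with hypothetical helper names and the option list taken from the outputs above, shows one way to automate the same check on saved `ceph config show` output.

```python
# Tuning option names as they appear in the `ceph config show` outputs above.
FAST_TUNING_OPTIONS = (
    "bluestore_cache_size",
    "bluestore_compression_max_blob_size",
    "bluestore_compression_min_blob_size",
    "bluestore_deferred_batch_ops",
    "bluestore_max_blob_size",
    "bluestore_min_alloc_size",
    "bluestore_prefer_deferred_size",
    "bluestore_throttle_cost_per_io",
)


def tuning_from_config_show(show_text):
    """Return {option: (value, source)} for the tuning options found in the output."""
    result = {}
    for line in show_text.splitlines():
        tokens = line.split()
        # Columns are NAME, VALUE, SOURCE; these options have single-token values.
        if len(tokens) >= 3 and tokens[0] in FAST_TUNING_OPTIONS:
            result[tokens[0]] = (tokens[1], tokens[2])
    return result


def not_from_cmdline(show_text):
    """List tuning options that are absent or whose SOURCE is not 'cmdline'."""
    seen = tuning_from_config_show(show_text)
    return [
        opt
        for opt in FAST_TUNING_OPTIONS
        if opt not in seen or seen[opt][1] != "cmdline"
    ]
```

An empty list from `not_from_cmdline` for each of osd.0, osd.1 and osd.2 corresponds to the manual verification described in this comment.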

Comment 14 errata-xmlrpc 2021-05-19 09:16:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

