We need two Rook commits for this issue, which are already present in Rook upstream 1.4.9. I have created this BZ to track them. Because the changes are already in the other releases, this fix is only required for 4.6.z.
Agreed. On Azure we had to set udev rules via a MachineConfig to expose the disks as SSDs to OCS. The rule sets SSD as the device class in the CRUSH map; without the udev rule the device shows up as HDD.
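For reference, a minimal sketch of that approach (the file path, rule matching, and MachineConfig name here are illustrative assumptions, not the exact rule we used). The udev rule marks the Azure managed disks as non-rotational, so Ceph's device-class detection reports them as ssd:

    # /etc/udev/rules.d/66-azure-ssd.rules (hypothetical path)
    # Mark all sd* block devices as non-rotational (SSD)
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", ATTR{queue/rotational}="0"

It is delivered to the worker nodes with a MachineConfig that writes the file, roughly:

    oc apply -f - <<'EOF'
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      name: 99-worker-ssd-udev            # hypothetical name
      labels:
        machineconfiguration.openshift.io/role: worker
    spec:
      config:
        ignition:
          version: 3.1.0
        storage:
          files:
            - path: /etc/udev/rules.d/66-azure-ssd.rules
              mode: 420
              contents:
                # Ignition needs an encoded data URL; substitute the
                # base64 of the udev rule shown above.
                source: data:text/plain;charset=utf-8;base64,<base64-encoded rule>
    EOF

Note that the MCO reboots the worker nodes one by one when this MachineConfig is applied.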
In order to verify this fix I've deployed 4.6.3 on the Azure platform. The build is:

(yulidir) [ypersky@qpas ocs-ci]$ oc -n openshift-storage get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.3-261.ci   OpenShift Container Storage   4.6.3-261.ci              Succeeded

When checking whether the SSD tuning is automatically applied on all OSDs, I ran the following commands and saw that the tunings are NOT applied.

1) (yulidir) [ypersky@qpas ocs-ci]$ oc rsh -n openshift-storage rook-ceph-tools-57c7996cd8-j7mjc
sh-4.4# ceph config dump
WHO                                       MASK  LEVEL     OPTION                              VALUE                             RO
global                                          basic     log_file                                                              *
global                                          advanced  mon_allow_pool_delete               true
global                                          advanced  mon_cluster_log_file
global                                          advanced  mon_pg_warn_min_per_osd             0
global                                          advanced  osd_pool_default_pg_autoscale_mode  on
global                                          advanced  rbd_default_features                3
mgr                                             advanced  mgr/balancer/active                 true
mgr                                             advanced  mgr/balancer/mode                   upmap
mgr                                             advanced  mgr/prometheus/rbd_stats_pools      ocs-storagecluster-cephblockpool  *
mgr.a                                           advanced  mgr/dashboard/a/server_addr         10.128.2.13                       *
mgr.a                                           advanced  mgr/prometheus/a/server_addr        10.128.2.13                       *
mds.ocs-storagecluster-cephfilesystem-a         basic     mds_cache_memory_limit              4294967296
mds.ocs-storagecluster-cephfilesystem-b         basic     mds_cache_memory_limit              4294967296
sh-4.4#

2) The outputs of oc get sc -oyaml and oc get cephcluster -oyaml are saved in a file here:
http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-1909793/

Reopening this bug.
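A quicker spot-check than reading the whole dump is to grep for the expected options (a sketch; the option names are taken from the flag list that appears later in this bug):

    sh-4.4# ceph config dump | grep -e osd_op_num -e bluestore_min_alloc
    sh-4.4#

The empty output matches the dump above: none of the SSD tunings are in the mon config store. (As the comments below explain, these tunings are injected as OSD command-line flags, so they will never appear in ceph config dump.)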
Hi,

Just to add to comment 8 above, I also tested this on my Azure setup with the OCS 4.6.3 RC build and the issue still persists.

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.3-724.ci   OpenShift Container Storage   4.6.3-724.ci              Succeeded

$ oc version
Client Version: 4.6.16
Server Version: 4.6.16
Kubernetes Version: v1.19.0+e49167a

$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                                                  STATUS  REWEIGHT  PRI-AFF
 -1         6.00000  root default
 -5         6.00000      region eastus
-14         2.00000          zone eastus-1
-13         2.00000              host ocs-deviceset-managed-premium-0-data-0-s86x4
  2  hdd    2.00000                  osd.2                                         up   1.00000  1.00000
-10         2.00000          zone eastus-2
 -9         2.00000              host ocs-deviceset-managed-premium-1-data-0-78g7j
  1  hdd    2.00000                  osd.1                                         up   1.00000  1.00000
 -4         2.00000          zone eastus-3
 -3         2.00000              host ocs-deviceset-managed-premium-2-data-0-ghlvw
  0  hdd    2.00000                  osd.0                                         up   1.00000  1.00000

--Shekhar
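The device class can also be inspected and, if necessary, corrected by hand from the toolbox. This is only a sketch of a manual workaround, not the fix tracked in this bug, and osd.0 is used as an example id:

    sh-4.4# ceph osd crush class ls
    sh-4.4# ceph osd crush rm-device-class osd.0
    sh-4.4# ceph osd crush set-device-class ssd osd.0

This matters because Ceph assigns the device class at OSD creation and does not reassign it later, so OSDs created before the udev rule was in place keep their hdd class.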
Shekhar, the way to validate this fix is by checking the OSD CLI arguments. So exec into any OSD pod and look for the runtime flags by running "ps fauxwwww | grep ceph-os[d]". You should see a few flags like "--osd_op_num_threads_per_shard=2". Moving back to ON_QA.
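Concretely, something like the following should work (the label selectors are Rook's defaults; adjust if your deployment differs):

    $ oc -n openshift-storage rsh $(oc -n openshift-storage get pod -l app=rook-ceph-osd,ceph-osd-id=0 -o name)
    sh-4.4# ps fauxwwww | grep 'ceph-os[d]'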
I've performed the following steps:
1) rsh to one of the OSD pods
2) ps fauxwwww | grep osd

I've got the following output:

sh-4.4# ps fauxwwww | grep osd
root 1010699 0.0 0.0 9188 996 pts/0 S+ 17:17 0:00 \_ grep osd
root 22483 0.0 0.0 143476 2812 ? Ssl Feb25 0:00 /usr/libexec/crio/conmon -b /var/run/containers/storage/overlay-containers/2ac7cf6d1c6411c77d48548a06a4e11e587a10ff508b693e7bcca204d322209e/userdata -c 2ac7cf6d1c6411c77d48548a06a4e11e587a10ff508b693e7bcca204d322209e --exit-dir /var/run/crio/exits -l /var/log/pods/openshift-storage_rook-ceph-osd-0-6dd687c6cf-p7k9w_af8ec3f7-4d46-48d1-bfc7-6311021576c1/2ac7cf6d1c6411c77d48548a06a4e11e587a10ff508b693e7bcca204d322209e.log --log-level info -n k8s_POD_rook-ceph-osd-0-6dd687c6cf-p7k9w_openshift-storage_af8ec3f7-4d46-48d1-bfc7-6311021576c1_0 -P /var/run/containers/storage/overlay-containers/2ac7cf6d1c6411c77d48548a06a4e11e587a10ff508b693e7bcca204d322209e/userdata/conmon-pidfile -p /var/run/containers/storage/overlay-containers/2ac7cf6d1c6411c77d48548a06a4e11e587a10ff508b693e7bcca204d322209e/userdata/pidfile --persist-dir /var/lib/containers/storage/overlay-containers/2ac7cf6d1c6411c77d48548a06a4e11e587a10ff508b693e7bcca204d322209e/userdata -r /usr/bin/runc --runtime-arg --root=/run/runc --socket-dir-path /var/run/crio -u 2ac7cf6d1c6411c77d48548a06a4e11e587a10ff508b693e7bcca204d322209e -s
root 22948 0.0 0.0 143476 2744 ? Ssl Feb25 0:01 /usr/libexec/crio/conmon -b /var/run/containers/storage/overlay-containers/5cdb1106c1797431bd9ed1c600e5bc893e145ef2be1b00f2c7b0b1ac7ec858b8/userdata -c 5cdb1106c1797431bd9ed1c600e5bc893e145ef2be1b00f2c7b0b1ac7ec858b8 --exit-dir /var/run/crio/exits -l /var/log/pods/openshift-storage_rook-ceph-osd-0-6dd687c6cf-p7k9w_af8ec3f7-4d46-48d1-bfc7-6311021576c1/osd/0.log --log-level info -n k8s_osd_rook-ceph-osd-0-6dd687c6cf-p7k9w_openshift-storage_af8ec3f7-4d46-48d1-bfc7-6311021576c1_0 -P /var/run/containers/storage/overlay-containers/5cdb1106c1797431bd9ed1c600e5bc893e145ef2be1b00f2c7b0b1ac7ec858b8/userdata/conmon-pidfile -p /var/run/containers/storage/overlay-containers/5cdb1106c1797431bd9ed1c600e5bc893e145ef2be1b00f2c7b0b1ac7ec858b8/userdata/pidfile --persist-dir /var/lib/containers/storage/overlay-containers/5cdb1106c1797431bd9ed1c600e5bc893e145ef2be1b00f2c7b0b1ac7ec858b8/userdata -r /usr/bin/runc --runtime-arg --root=/run/runc --socket-dir-path /var/run/crio -u 5cdb1106c1797431bd9ed1c600e5bc893e145ef2be1b00f2c7b0b1ac7ec858b8 -s
ceph 22960 14.2 3.8 4861740 2519528 ? Ssl Feb25 1051:46 \_ ceph-osd --foreground --id 0 --fsid cc8613e6-2114-420b-9409-53640664e54f --setuser ceph --setgroup ceph --crush-location=root=default host=ocs-deviceset-0-data-0-hqgj9 rack=rack0 region=eastus zone=eastus-1 --osd-op-num-threads-per-shard=2 --osd-op-num-shards=8 --osd-recovery-sleep=0 --osd-snap-trim-sleep=0 --osd-delete-sleep=0 --bluestore-min-alloc-size=4096 --bluestore-prefer-deferred-size=0 --bluestore-compression-min-blob-size=8912 --bluestore-compression-max-blob-size=65536 --bluestore-max-blob-size=65536 --bluestore-cache-size=3221225472 --bluestore-throttle-cost-per-io=4000 --bluestore-deferred-batch-ops=16 --log-to-stderr=true --err-to-stderr=true --mon-cluster-log-to-stderr=true --log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false --ms-learn-addr-from-peer=false
sh-4.4#

We see that process 22960 carries the requested flag "--osd-op-num-threads-per-shard=2", along with the rest of the SSD tunings. I've verified this for each one of the OSD 0/1/2 pods. => Closing the bug.
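To repeat this check across all OSDs without rsh-ing into each pod, a loop like the following works (a sketch; it assumes Rook's default app=rook-ceph-osd label and a container named "osd"):

    $ for p in $(oc -n openshift-storage get pod -l app=rook-ceph-osd -o name); do
        echo "== $p"
        oc -n openshift-storage exec "$p" -c osd -- ps fauxwww | grep -o -- '--osd-op-num-threads-per-shard=[0-9]*'
      done

Each OSD should print --osd-op-num-threads-per-shard=2.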
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.6.3 container bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0718
The needinfo request(s) on this closed bug have been removed, as they have been unresolved for 500 days.