Bug 1817228
| Summary: | [baremetal][RFE] OCS does not distinguish between SSD and HDD |
|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation |
| Component: | unclassified |
| Status: | CLOSED WONTFIX |
| Severity: | high |
| Priority: | high |
| Version: | 4.3 |
| Hardware: | All |
| OS: | Unspecified |
| Target Milestone: | --- |
| Target Release: | --- |
| Reporter: | Ben England <bengland> |
| Assignee: | N Balachandran <nibalach> |
| QA Contact: | Petr Balogh <pbalogh> |
| CC: | assingh, bniver, ebenahar, ekuric, etamir, gmeno, madam, muagarwa, ocs-bugs, odf-bz-bot, owasserm, rcyriac, sabose, shan, shberry, sostapov, tmuthami |
| Keywords: | AutomationBackLog, FutureFeature, Performance |
| Doc Type: | No Doc Update |
| Type: | Bug |
| Last Closed: | 2022-05-31 13:47:19 UTC |
| Attachments: | attachment 1674072: screenshot graph showing NVM device approaching max throughput |
Description
Ben England
2020-03-25 21:07:11 UTC
Setting to 4.6 as a tentative, but this requires work in LSO.

@Seb, if you set one release flag, you should remove the other. Bugzilla is rather dumb. ;-)

Created attachment 1674072 [details]
screenshot graph showing NVM device approaching max throughput
This graph was the result of an fio test against an all-NVMe storage pool in an OCS cluster. The pool and its StorageClass were easy to create from the shell like this:
oc create -f toolbox.yaml
sleep 5
# run ceph commands inside the toolbox pod in the openshift-storage namespace
alias cephpod="oc -n openshift-storage rsh \$(oc -n openshift-storage get pod | awk '/tools/{print \$1}') ceph "
cephpod osd crush rule create-replicated fast default host ssd
cephpod osd pool create fast 256 256 replicated fast
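# Optional sanity check (not in the original steps): confirm the rule exists and
# that the new pool uses it before wiring up the StorageClass. Manually created
# RBD pools also usually need the rbd application tag.
cephpod osd crush rule dump fast
cephpod osd pool get fast crush_rule
cephpod osd pool application enable fast rbd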
cat > fast-sc.yaml <<EOF
allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: example-storagecluster-ceph-rbd-fast
parameters:
clusterID: openshift-storage
csi.storage.k8s.io/fstype: ext4
csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
imageFeatures: layering
imageFormat: "2"
pool: fast
provisioner: openshift-storage.rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF
oc create -f fast-sc.yaml
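As a quick end-to-end check that the new class is usable, a PVC can be bound to it. This is only a sketch; the PVC name and size below are arbitrary, not from the original test:
cat > fast-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: example-storagecluster-ceph-rbd-fast
EOF
oc create -f fast-pvc.yaml
oc get pvc fast-test-pvc
Because the class uses volumeBindingMode: Immediate, the PVC should bind as soon as the RBD image is provisioned in the fast pool.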
The fio test used 14 ripsaw fio pods spread across 7 hosts (2 per host), with this CR:
apiVersion: ripsaw.cloudbulldozer.io/v1alpha1
kind: Benchmark
metadata:
name: fio-benchmark
namespace: my-ripsaw
spec:
elasticsearch:
server: "marquez.perf.lab.eng.rdu2.redhat.com"
port: 9200
clustername: "bene-alias-cloud02-2020-03-24"
test_user: bene
workload:
name: "fio_distributed"
args:
samples: 1
servers: 14
pin_server: ''
jobs:
- "write"
- "randread"
bs:
- 4MiB
- 4KiB
numjobs:
- 1
iodepth: 4
read_runtime: 60
read_ramp_time: 5
filesize: 2GiB
log_sample_rate: 1000
storageclass: example-storagecluster-ceph-rbd-fast
accessmode: ReadWriteOnce
storagesize: 30Gi
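For reference, the CR above is applied like any other resource. A rough sketch, assuming the ripsaw (benchmark-operator) CRDs are already installed and the my-ripsaw namespace exists; the file name is hypothetical:
oc create -f fio-benchmark-cr.yaml -n my-ripsaw
# watch the fio server and client pods come up, then follow the client pod log for results
oc get pods -n my-ripsaw -w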
I got up to 1.4 GB/s throughput from each NVMe SSD during a write. This is a significant percentage of the device's raw capability (I would have to take down the Ceph cluster and dd to the NVMe device directly to find out exactly what percentage). I am measuring random IOPS next. One NVMe is the equivalent of roughly 7 HDDs for a sequential workload, but for a random workload it should be 10-50 times faster than an HDD if the Ceph pods can keep up. I will update with random I/O results.
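A raw-device baseline along those lines could look like the following. This is only a sketch with a hypothetical device path; the write test is destructive and must never be run against a device an OSD is still using:
# sequential write baseline straight to the (idle) NVMe device, bypassing the page cache
dd if=/dev/zero of=/dev/nvme0n1 bs=4M count=2048 oflag=direct status=progress
# random read IOPS baseline with fio (reads only)
fio --name=randread --filename=/dev/nvme0n1 --rw=randread --bs=4k --iodepth=32 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based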
Hi Sahina, Seb,

If this requires work in LSO, is there an OCP BZ to track this work? One more thing: the performance difference seems significant enough to strive for having this done in OCS 4.6. Can we consider retargeting?

Not possible for 4.6. For 4.7, we have 2 epics that are related to this:
1. Using SSD for metadata and HDD for data PVs
2. Creating multiple pools based on the device type of the OSDs
The LSO work involved is to ensure we can identify the device type from the LSO PV and StorageClass. The epic is not yet created in OCP storage.

We have support for segregating metadata and data via https://issues.redhat.com/browse/KNIP-1546, and support for specifying deviceClass (and overriding the auto-detected one) and pools based on deviceClass via https://issues.redhat.com/browse/KNIP-1545. We don't have a way to correct the auto-detection, as we rely on the rotational property of the device reported by the lsblk command. Does this cover the asks of the bug?

Sorry, I didn't see the needinfo; too much e-mail. Unfortunately, the previous comment mentions a workaround for the problem but does not address the feature need for automatically distinguishing between SSDs and HDDs. A typical storage customer would expect the storage system to understand which devices are SSDs and which are HDDs. I understand the technical reasons why this is hard to do - Ceph is just defaulting to whatever /sys/block/sdX/queue/rotational says, but this is a lie in many cases (example: RAID controllers). I thought Sebastien Han had suggested some solutions to this bug for several storage classes, based on querying storage-class-specific attributes. For example, in AWS the EBS volume type tells you (gp2 is SSD, st1/sc1 is HDD): https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html I'm sure Azure is the same. Is there any public cloud vendor that doesn't allow the user to query attributes of the device that indicate what type of performance to expect? Not sure about VMware. For bare metal, there may be a way to use libstoragemgmt or something like that to dig out the metadata that distinguishes SAS/SATA SSDs from HDDs. Secondly, until recently the only kind of device supported by OCS was SSD, so it should have been overriding the auto-detection from day 1. But now that we are expanding support to HDDs, this problem has to be solved by better automatic detection - otherwise it could become a support nightmare.

Looks like there are more requirements which need to be addressed; moving it to 4.8.

Hi Seb, Ben mentions in Comment 8 possible solutions suggested by you. Do we have any reliable way of detecting the disk type in cloud/virtualized environments?

Hi Sahina, I guess I was thinking we could build some kind of matrix based on the information the cloud/virt providers give us (through their respective documentation). Just like Ben mentioned, we don't have any reliable way to determine the underlying disk family, so that's what we were thinking about.

Based on a request from engineering, the 'installation' component has been deprecated.

Eran, please create a Jira epic for this.
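To illustrate the manual workaround discussed above, as opposed to the automatic detection this RFE asks for, the rotational flag can be checked on the node and a misdetected OSD's device class corrected by hand from the toolbox. This is a sketch, with osd.3 as a hypothetical example:
# what the kernel (and therefore Ceph) believes: ROTA 1 = rotational/HDD, 0 = non-rotational/SSD
lsblk -d -o NAME,ROTA
cat /sys/block/sdX/queue/rotational
# reassign the device class of an OSD whose media was misdetected, then verify
ceph osd crush rm-device-class osd.3
ceph osd crush set-device-class ssd osd.3
ceph osd crush class ls
ceph osd tree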