Bug 2045072
| Summary: | AWS - degradation on OCP 4.10 + ODF 4.10 vs OCP 4.9 + ODF 4.9 in files per second in CephFS 4 KB create and append actions, and 16 KB create actions | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Yuli Persky <ypersky> |
| Component: | ceph | Assignee: | Travis Nielsen <tnielsen> |
| ceph sub component: | RBD | QA Contact: | Elad <ebenahar> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | alayani, bniver, ekuric, jopinto, kramdoss, madam, mmuench, muagarwa, ocs-bugs, odf-bz-bot, pnataraj, shberry, tnielsen, ypadia |
| Version: | 4.10 | Keywords: | Automation, Performance |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-10-19 06:04:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yuli Persky
2022-01-25 14:53:43 UTC
ypersky, is the must-gather attached here for 4.10 or 4.9? Also, in order to compare the performance, we will need must-gathers for both 4.9 and 4.10, to determine whether the time taken to create and append the files is spent at the ceph-csi level or not.

The must-gather provided above was for the 4.10 run. I've run the small files tests again on 4.9 and this is the link to the 4.9 must-gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/lr5-ypersky-a9/lr5-ypersky-a9_20220301T225837/logs/testcases_1646217501/

Not a 4.10 blocker, moving out.

@tnielsen have we confirmed this is a degradation in CephFS performance and not a CSI issue?

The CSI driver is only involved in the provisioning and mounting. Not sure how that would affect the CephFS write performance, since the CSI driver is not in the data path.

@vshankar Was there a change in performance in CephFS in this time frame?
@tnielsen Did the AWS provisioning of the PV change in this time frame? Was there a CI change?

As we discussed offline, back to you Travis.

Scott and I discussed that there are no known changes to Rook or CephFS that would affect performance from 4.9 to 4.10.

Yuli, a common issue with AWS clusters is that performance is not guaranteed for the devices that are provisioned. What is the storage class specified for the mons and OSDs when creating the cluster (the storageClassDeviceSets volume template in the CephCluster CR)? Is it gp2? If you need consistent testing, you would need to use a storage class that guarantees consistent IOPS.

@Travis Nielsen, the test is using the default storage class, not gp2. When we test performance, we should use the same storage class as customers do, and that is the default storage class. Which storage class guarantees consistent IOPS, in your opinion, and which does not?

io2 looks like the most predictable according to the AWS volume types page [1]:

- gp2: 100 IOPS to a maximum of 16,000 IOPS, and up to 250 MB/s of throughput per volume
- io1: up to 50 IOPS/GB to a maximum of 64,000 IOPS, and up to 1,000 MB/s of throughput per volume
- io2: 500 IOPS for every provisioned GB

[1] https://aws.amazon.com/ebs/volume-types

I've run the same test (test_small_file_workload.py) again on 4.0.10.221, on AWS. The purpose of the run was to see whether the degradation is consistent.

The relevant Jenkins job (where all the must-gather logs are stored) is: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/11859/

The Performance Dashboard comparison of the new run on 4.0.10.221 vs 4.9 is available here: http://ocsperf.ceph.redhat.com:8080/index.php?version1=13&build1=26&platform1=1&az_topology1=1&test_name%5B%5D=2&version2=14&build2=63&platform2=1&az_topology2=1&version3=&build3=&version4=&build4=&submit=Choose+options

We actually do see a similar degradation AGAIN in the CephFS Small Files results. The degradation for the 4 KB file size is 8% for the create action and 25% for the append action. The degradation for the 16 KB file size is 32% for the create action. This confirms the first results that we got on build 4.0.10.73, when I opened this BZ. It looks like the degradation IS consistent. And the test uses the default storage class, similar to what customers are likely to do.
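For reference, the percentages above come from comparing files-per-second throughput between the two builds on the Performance Dashboard. A minimal sketch of that comparison follows; the files/sec values are hypothetical (the raw numbers are only in the dashboard linked above, not in this BZ), chosen only so the computed output matches the 8%/25%/32% figures reported here.

```python
# Illustrative only: compute percent degradation between two runs from
# their files-per-second results. The sample values are hypothetical;
# the real numbers live in the Performance Dashboard linked above.
def degradation_pct(baseline_fps: float, new_fps: float) -> float:
    return (baseline_fps - new_fps) / baseline_fps * 100


# (file size, action) -> (4.9 files/sec, 4.10 files/sec) -- made-up values
samples = {
    ("4KB", "create"): (1000.0, 920.0),
    ("4KB", "append"): (800.0, 600.0),
    ("16KB", "create"): (900.0, 612.0),
}

for (size, action), (old, new) in samples.items():
    print(f"{size} {action}: {degradation_pct(old, new):.0f}% degradation")
```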
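On the storageClassDeviceSets question above, a minimal sketch for checking which storage class backs the OSD volume templates in the CephCluster CR. It assumes the kubernetes Python client, a working kubeconfig, and ODF's usual namespace and CR name (openshift-storage / ocs-storagecluster-cephcluster); none of these details are confirmed in this thread.

```python
# Sketch: list the storage classes used by the OSD volume templates
# (storageClassDeviceSets) of a Rook CephCluster CR.
# Assumes: `pip install kubernetes`, a working kubeconfig, and the
# default ODF namespace/CR name (not confirmed in this BZ).
from kubernetes import client, config


def osd_storage_classes(namespace="openshift-storage",
                        name="ocs-storagecluster-cephcluster"):
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    api = client.CustomObjectsApi()
    cluster = api.get_namespaced_custom_object(
        group="ceph.rook.io", version="v1",
        namespace=namespace, plural="cephclusters", name=name)
    classes = set()
    for device_set in cluster["spec"]["storage"].get("storageClassDeviceSets", []):
        for template in device_set.get("volumeClaimTemplates", []):
            classes.add(template["spec"].get("storageClassName", "<cluster default>"))
    return classes


if __name__ == "__main__":
    print(osd_storage_classes())  # e.g. {'gp2'} on a default AWS install
```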
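And, to put the EBS numbers quoted above side by side, a rough sketch that estimates per-volume IOPS for the three types at a given size. The gp2 baseline of 3 IOPS/GiB and the 512 GiB sample size are assumptions layered on top of the figures quoted from the AWS page, and the thread does not quote an upper cap for io2, so none is applied; treat the output as a ballpark only.

```python
# Rough per-volume IOPS estimates for the EBS types discussed above.
# gp2/io1/io2 ranges come from the AWS volume-types page quoted in this
# BZ; the gp2 3 IOPS/GiB baseline is an added assumption.
def estimated_iops(volume_type: str, size_gib: int) -> int:
    if volume_type == "gp2":
        return min(max(3 * size_gib, 100), 16_000)
    if volume_type == "io1":
        return min(50 * size_gib, 64_000)
    if volume_type == "io2":
        return 500 * size_gib  # no cap quoted in the thread
    raise ValueError(f"unknown volume type: {volume_type}")


if __name__ == "__main__":
    for vtype in ("gp2", "io1", "io2"):
        # 512 GiB is an assumed OSD size for illustration, not a value
        # taken from this BZ.
        print(vtype, estimated_iops(vtype, 512))
```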