Bug 2024132
| Summary: | VMware LSO - degradation of performance in CephFS clone creation times in OCP4.9+ODF4.9 vs OCP4.8+OCS4.8 | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Yuli Persky <ypersky> |
| Component: | csi-driver | Assignee: | Humble Chirammal <hchiramm> |
| Status: | CLOSED NOTABUG | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.9 | CC: | alayani, assingh, jopinto, kramdoss, madam, mmuench, muagarwa, ocs-bugs, odf-bz-bot, rar, ygupta, ypadia, ypersky |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-07 02:40:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yuli Persky
2021-11-17 11:58:46 UTC
@ypersky.com I don't see a detailed description in the bug description. Can you please update the bug with the report and other important details?

I apologize for not providing a proper description earlier. The first comment has been updated with all the information; please let me know in case you need any further input.

@Yug Gupta, it is not possible to deploy OCP 4.8 + ODF 4.9 on a VMware LSO cluster; that combination is not supported. As for other platforms, here is the AWS 4.9 report: https://docs.google.com/document/d/1vyufd55iDyvKeYOwoXwKSsNoRK2VR41QNTuH-iERR8s/edit#heading=h.2m8gdjc4jhzo From the comparison between 4.8 and 4.9, no degradation in clone creation times is seen on AWS. However, we do see degradation on VMware LSO.

@Yug Gupta, so which component should I change this BZ to? Regarding must-gather logs: unfortunately we did not collect those logs, and the cluster is not available now. If needed, I can reproduce the problem on a newly deployed cluster and collect the must-gather, or start the test from Jenkins, in which case the must-gather will be collected automatically.

ypersky, we will need a must-gather here to calculate and check the time spent by ceph-csi.

I've run the test_pvc_clone_performance.py test again on a 4.9 (OCP + ODF) VMware LSO cluster.

This is the link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10491/

This is the link to the must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ypersky-lso9/ypersky-lso9_20220228T124019/logs/testcases_1646291211/

Please note that there is a chance that the relevant must-gather is located in one of the `testcase*` directories here: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ypersky-lso9/ypersky-lso9_20220228T124019/logs/ I'm keeping this link here to be on the safe side; however, the first link should contain the relevant logs.
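[Editorial note] As context for the must-gather request above: the ceph-csi provisioner logs the gRPC calls it serves, so the time ceph-csi spends on each clone can be approximated by pairing CreateVolume call/response log lines. Below is a minimal sketch of that pairing. The klog line layout assumed here (a severity character, an MMDD timestamp, an `ID: <n>` request tag, and `GRPC call:`/`GRPC response:` markers) is an assumption about ceph-csi's logging and may differ between releases; the log file name is illustrative.

```python
# Sketch: pair CreateVolume call/response lines from a ceph-csi
# provisioner log (as collected by must-gather) by request ID and
# report the elapsed time per request. Log format is an assumption.
import re
from datetime import datetime

CALL = re.compile(
    r"^.(\d{4} \d{2}:\d{2}:\d{2}\.\d+).*ID: (\d+).*"
    r"GRPC call: /csi\.v1\.Controller/CreateVolume"
)
RESP = re.compile(r"^.(\d{4} \d{2}:\d{2}:\d{2}\.\d+).*ID: (\d+).*GRPC response")

def parse_ts(raw: str) -> datetime:
    # klog timestamps carry no year: "0228 12:40:19.123456" is month/day + time.
    return datetime.strptime(raw, "%m%d %H:%M:%S.%f")

starts = {}  # request ID -> timestamp of the CreateVolume call
with open("csi-cephfsplugin-provisioner.log") as log:  # illustrative path
    for line in log:
        if m := CALL.search(line):
            starts[m.group(2)] = parse_ts(m.group(1))
        elif (m := RESP.search(line)) and m.group(2) in starts:
            # Only IDs recorded above are CreateVolume requests, so this
            # response closes out a clone-creation call.
            delta = parse_ts(m.group(1)) - starts.pop(m.group(2))
            print(f"request {m.group(2)}: CreateVolume took "
                  f"{delta.total_seconds():.3f}s")
```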
I've also run the test_pvc_clone_performance.py test on a 4.10 VMware LSO cluster. The Jenkins job and the must-gather logs are available here:

https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10162/

http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ypersky-local10/ypersky-local10_20220215T103839/logs/testcases_1645436035/

Please also note the following:

1) The VMware LSO comparison report is available here: https://docs.google.com/document/d/19ZRfwhfbpYF2f6hUxCM5lCt0uLoNo3ibOMWTbZTUTqw/edit#

2) When I run the test_pvc_clone_creation_performance.py test on a newly deployed 4.9 OCP + 4.9 ODF cluster, the measurements for CephFS clone creation are:

4.9 CephFS clone creation times:
- 1 GB clone: 1.758 sec
- 25 GB clone: 64.390 sec
- 50 GB clone: 128.320 sec
- 100 GB clone: 256.135 sec

These measurements are similar to the 4.8 results (taken from this bug's description):
- Clone size: 1 Gi, creation time: 2.63 sec
- Clone size: 25 Gi, creation time: 64.12 sec
- Clone size: 50 Gi, creation time: 95.025 sec

and much better than the 4.9 results also mentioned in the description of this bug (copied here):
- Clone size: 1 Gi, creation time: 8.12 sec
- Clone size: 25 Gi, creation time: 64.74 sec (the same as in 4.8 + 4.8)
- Clone size: 50 Gi, creation time: 193.26 sec

Also please note the current 4.10 results:

4.10 CephFS clone creation times:
- 1 GB clone: 2.021 sec
- 25 GB clone: 49.336 sec
- 50 GB clone: 131.583 sec
- 100 GB clone: 258.400 sec

I have an explanation for the different 4.9 measurements: it looks like we need to add more samples to the clone creation/deletion test (this is already in our work plan).

Taking all of the above into consideration, I think we can close this bug. The only degradation in 4.10 vs. 4.8 is in the 50 GB CephFS clone (~30%). However, the 100 GB clone creation time is similar in 4.8 and 4.10. QE should indeed add more samples to this test so that the measurements are more accurate.
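[Editorial note] For reference, the quantity being compared above is essentially "time from PVC clone request to the clone becoming Bound". A minimal sketch of such a measurement with the kubernetes Python client follows; the namespace and PVC names are illustrative assumptions (ocs-storagecluster-cephfs is the usual default ODF CephFS storage class), and this is not the actual ocs-ci test, which may compute its timings differently.

```python
# Sketch: request a CephFS PVC clone and time how long it takes to bind.
# Assumes a reachable cluster via kubeconfig and an existing, Bound source PVC.
import time

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

NAMESPACE = "clone-perf-test"                 # assumption: test namespace
SOURCE_PVC = "source-pvc"                     # assumption: pre-created source
CLONE_PVC = "clone-pvc-1"
STORAGE_CLASS = "ocs-storagecluster-cephfs"   # default ODF CephFS storage class
SIZE = "25Gi"

clone = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name=CLONE_PVC),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],
        storage_class_name=STORAGE_CLASS,
        resources=client.V1ResourceRequirements(requests={"storage": SIZE}),
        # A dataSource of kind PersistentVolumeClaim asks the CSI driver
        # (ceph-csi here) to provision the new volume as a clone of the source.
        data_source=client.V1TypedLocalObjectReference(
            kind="PersistentVolumeClaim", name=SOURCE_PVC
        ),
    ),
)

start = time.monotonic()
v1.create_namespaced_persistent_volume_claim(NAMESPACE, clone)

# Poll until the clone is Bound; the elapsed time approximates the
# clone-creation times reported in this bug.
while True:
    pvc = v1.read_namespaced_persistent_volume_claim(CLONE_PVC, NAMESPACE)
    if pvc.status.phase == "Bound":
        break
    time.sleep(1)

print(f"Clone {CLONE_PVC} ({SIZE}) bound in {time.monotonic() - start:.3f} sec")
```

Note that this measures end-to-end provisioning (including kube-controller and external-provisioner overhead), so it bounds, rather than isolates, the time spent inside ceph-csi; the must-gather log analysis above is what narrows the measurement down to the CSI driver itself.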