Bug 2039924
| Summary: | AWS - degradation in pod reattach time for both CephFS Pods with ~850K files in ODF 4.10 vs ODF 4.9 | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Yuli Persky <ypersky> |
| Component: | csi-driver | Assignee: | Rakshith <rar> |
| Status: | CLOSED NOTABUG | QA Contact: | Elad <ebenahar> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.10 | CC: | alayani, jopinto, kramdoss, madam, mmuench, muagarwa, ocs-bugs, odf-bz-bot |
| Target Milestone: | --- | Keywords: | Automation, Performance, Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-11 02:33:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yuli Persky
2022-01-12 18:01:27 UTC
Yuli, can we also run ODF 4.9 + OCP 4.10? Also, we will need must-gather for all the runs.

@Mudit Agarwal
1) I did run the test on OCP 4.9 and ODF 4.10 (see the results in the bug description). Is OCP 4.10 + ODF 4.9 a supported combination? Please write here if it is, and I will try to deploy it and run the test.
2) Must-gather for the OCP 4.10 + ODF 4.10 run is available here: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-056ai3c33-p/j-056ai3c33-p_20211230T130122/logs/testcases_1640872857/

>> Is OCP 4.10 + ODF 4.9 a supported combination?
Yes, until ODF 4.10 is released, people will have only ODF 4.9 on their OCP 4.10 clusters.

Also, I want to narrow down the problem area. This will help us determine whether the regression is in ODF or in OCP.
@Mudit, I will deploy OCP 4.10 with ODF 4.9, run the test, and report the results.

@Mudit Agarwal, per your request I've deployed an OCP 4.10 + ODF 4.9 cluster. The CephFS pod reattach times on 4.10 + 4.9 also show a degradation compared to OCP 4.9 + ODF 4.9:

| Combination | Reattach time, pod with ~200K files | Reattach time, pod with ~850K files |
|---|---|---|
| OCP 4.9 + ODF 4.9 | 41 sec | 178 sec |
| OCP 4.9 + ODF 4.10 | 41.1 sec | 228.9 sec |
| OCP 4.10 + ODF 4.9 | 52.78 sec | 266.14 sec |
| OCP 4.10 + ODF 4.10 | 47.43 sec | 282 sec |

The full comparison report, which includes these results, is available here: https://docs.google.com/document/d/1OJfARHBAJs6bkYqri_HpSNM_N5gchUQ6P-lKe6ujQ6o/edit#
Please note: must-gather is available here: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-056ai3c33-p/j-056ai3c33-p_20211230T130122/logs/testcases_1640872857/

I've run the pod reattach test again on OCP 4.9 + ODF 4.9. The relevant must-gather logs are here: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/lr5-ypersky-a9/lr5-ypersky-a9_20220301T225837/logs/testcases_1646222463/
As for the combinational cluster, unfortunately I do not have must-gather for the combinational run.

Hi Rakshith, the performance suite was run as a bulk (one test after another) via Jenkins, and must-gather was collected AFTER all the tests had run, so unfortunately I cannot narrow it down. Also, we have not yet added CSI times to this test; this is pending in our team's work plan, and I hope to have it added in the near future. In addition, the test is being enhanced so that the container image is not pulled each time a pod is created (a rough sketch of how such a measurement could be scripted, with the pull policy pinned, is shown below). The fixed test (the default pod pull policy will not pull the image each time) is running:
- on 4.9.4 build 7: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10676/
- on 4.10.0 build 184: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10677/
When the tests finish I'll update with a comparison of the results.

The results of the fixed pod reattach time tests are available on the Performance Dashboard: http://ocsperf.ceph.redhat.com:8080/index.php?version1=17&build1=51&platform1=1&az_topology1=1&test_name%5B%5D=6&version2=14&build2=53&platform2=1&az_topology2=1&version3=&build3=&version4=&build4=&submit=Choose+options
- 4.9 Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10676/parameters/
- 4.9 must-gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/lr5-ypersky-9aws/lr5-ypersky-9aws_20220309T120256/logs/testcases_1646865587/
- 4.10 Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10677/
- 4.10 must-gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/lr5-ypersky-10aws/lr5-ypersky-10aws_20220309T120401/logs/testcases_1646865624/

The measurements are the following:
- 4.9.4 build 7, CephFS pod reattach time for a pod with ~850K files: 308.219 sec
- 4.10.0 build 184, CephFS pod reattach time for a pod with ~850K files: 315.914 sec

Both measurements are higher than those taken during the previous run, but they do NOT show a degradation between 4.9 and 4.10. Therefore I think we should close this BZ.
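For reference, here is a minimal, hypothetical sketch of how a reattach-time measurement of this kind could be scripted against the Kubernetes API: create a pod that mounts an existing, pre-populated CephFS PVC, delete it, recreate an identical pod against the same PVC, and time how long the new pod takes to reach Running. The namespace, PVC, and image names are placeholders (not taken from the actual test), and imagePullPolicy is pinned to IfNotPresent so image pulls are excluded from the timing, as discussed above. This is not the actual test implementation, only an illustration under those assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a CephFS pod reattach-time measurement.

Assumptions (not taken from the actual test): the namespace, PVC, and image
names below are placeholders; the PVC already exists and is populated with
the dataset (e.g. ~850K files); imagePullPolicy is pinned to IfNotPresent so
image pulls do not inflate the reattach time.
"""

import time

from kubernetes import client, config
from kubernetes.client.rest import ApiException

NAMESPACE = "reattach-test"            # placeholder
PVC_NAME = "cephfs-pvc-850k-files"     # placeholder, pre-populated CephFS PVC
POD_NAME = "cephfs-reattach-pod"       # placeholder


def pod_manifest() -> client.V1Pod:
    """Pod that mounts the existing CephFS PVC and just sleeps."""
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name=POD_NAME),
        spec=client.V1PodSpec(
            containers=[client.V1Container(
                name="app",
                image="registry.access.redhat.com/ubi8/ubi-minimal",  # placeholder
                command=["sleep", "infinity"],
                image_pull_policy="IfNotPresent",  # exclude image pulls from the timing
                volume_mounts=[client.V1VolumeMount(name="data", mount_path="/mnt/data")],
            )],
            volumes=[client.V1Volume(
                name="data",
                persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                    claim_name=PVC_NAME),
            )],
        ),
    )


def wait_until_running(core: client.CoreV1Api, timeout: int = 1200) -> None:
    """Poll until the pod reports phase Running (volume staged and mounted)."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if core.read_namespaced_pod(POD_NAME, NAMESPACE).status.phase == "Running":
                return
        except ApiException as err:
            if err.status != 404:   # pod may not be visible yet right after creation
                raise
        time.sleep(1)
    raise TimeoutError(f"pod did not reach Running within {timeout}s")


def wait_until_gone(core: client.CoreV1Api, timeout: int = 600) -> None:
    """Poll until the pod object is fully deleted."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            core.read_namespaced_pod(POD_NAME, NAMESPACE)
        except ApiException as err:
            if err.status == 404:
                return
            raise
        time.sleep(1)
    raise TimeoutError(f"pod was not deleted within {timeout}s")


if __name__ == "__main__":
    config.load_kube_config()
    core = client.CoreV1Api()

    # Initial attach: mount the PVC once before measuring reattach.
    core.create_namespaced_pod(NAMESPACE, pod_manifest())
    wait_until_running(core)

    # Reattach: delete the pod, recreate it against the same PVC, time to Running.
    core.delete_namespaced_pod(POD_NAME, NAMESPACE)
    wait_until_gone(core)

    start = time.time()
    core.create_namespaced_pod(NAMESPACE, pod_manifest())
    wait_until_running(core)
    print(f"CephFS pod reattach time: {time.time() - start:.2f} sec")
```

In the actual performance suite, populating the PVC with the file dataset and any repetition/averaging logic would sit on top of a measurement like this.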
Please also note that, in general, the pod reattach time measurements on 4.10 are high (~315 seconds) when compared with gp2 performance. But that's a different issue, not related to a degradation in ODF, and a separate BZ will be filed on that (providing all the details).
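Regarding the CSI timings mentioned above (not yet collected by the test), one possible way to approximate them after the fact is to parse the csi-cephfsplugin node-plugin logs from must-gather for matching gRPC call/response pairs. The sketch below assumes the default ceph-csi/klog log format, where a request is logged as "GRPC call: /csi.v1.Node/NodeStageVolume" and the matching reply as "GRPC response: ..." carrying the same "ID:" value; the exact format can differ between ceph-csi versions, so the regex is an assumption to validate against real logs.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: extract per-request CSI call durations from a
csi-cephfsplugin (ceph-csi) node-plugin log, e.g. one collected by must-gather.

Assumed log format (default ceph-csi/klog; may vary by version):
  I0309 12:04:01.123456 ... utils.go:...] ID: 42 Req-ID: ... GRPC call: /csi.v1.Node/NodeStageVolume
  I0309 12:04:07.654321 ... utils.go:...] ID: 42 Req-ID: ... GRPC response: ...
"""

import re
import sys
from datetime import datetime

# klog prefix + ceph-csi request ID + either "GRPC call" or "GRPC response"
LINE_RE = re.compile(
    r"^[IWEF](?P<md>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ \S+\] "
    r"ID: (?P<id>\d+) .*GRPC (?P<kind>call|response)(?::\s*(?P<method>\S+))?"
)


def parse_ts(md: str, hms: str) -> datetime:
    # klog omits the year; that is fine for deltas within one log file.
    return datetime.strptime(f"{md} {hms}", "%m%d %H:%M:%S.%f")


def csi_durations(path: str):
    """Yield (method, duration_in_seconds) for every matched call/response pair."""
    pending = {}  # request ID -> (method, start timestamp)
    with open(path) as log:
        for line in log:
            m = LINE_RE.match(line)
            if not m:
                continue
            ts = parse_ts(m["md"], m["time"])
            if m["kind"] == "call":
                pending[m["id"]] = (m["method"], ts)
            elif m["id"] in pending:
                method, start = pending.pop(m["id"])
                yield method, (ts - start).total_seconds()


if __name__ == "__main__":
    # Usage (hypothetical file name): python csi_call_times.py csi-cephfsplugin.log
    for method, secs in csi_durations(sys.argv[1]):
        if method and method.endswith(("NodeStageVolume", "NodePublishVolume")):
            print(f"{method}: {secs:.3f} sec")
```

Per-stage durations like these (NodeStageVolume vs. NodePublishVolume) would help narrow down whether a reattach-time difference comes from the CSI/ODF layer or from elsewhere in OCP, which is the question raised earlier in this bug.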