Bug 2039881 - AWS - degradation in pvc attach time for both RBD and CephFS PVCs in ODF 4.10 vs ODF 4.9
Summary: AWS - degradation in pvc attach time for both RBD and CephFS PVCs in ODF 4.10...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Rakshith
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-01-12 16:25 UTC by Yuli Persky
Modified: 2023-08-09 16:37 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 04:15:03 UTC
Embargoed:



Description Yuli Persky 2022-01-12 16:25:13 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

PVC attach time has degraded for both RBD and CephFS PVCs in ODF 4.10 compared with ODF 4.9.

Version of all relevant components (if applicable):

ODF 4.10.0.50

Note: additional details are available in the following Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/view/Performance/job/qe-trigger-aws-ipi-3az-rhcos-3m-3w-performance/56/


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

3


Is this issue reproducible?

Yes.
I also reproduced this degradation on a cluster deployed with OCP 4.9 and ODF 4.10.


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:

With OCP 4.9 + ODF 4.9, the average attach times over 10 PVCs were:

RBD: 7.4 sec
CephFS: 6.6 sec


With OCP 4.10 + ODF 4.10, the average attach times over 10 PVCs were:

RBD: 10.2 sec
CephFS: 8.8 sec


With OCP 4.9 + ODF 4.10, the average attach times over 10 PVCs were:

RBD: 12.8 sec
CephFS: 11 sec

The detailed comparison report is available here: 

https://docs.google.com/document/d/1OJfARHBAJs6bkYqri_HpSNM_N5gchUQ6P-lKe6ujQ6o/edit#


Steps to Reproduce:
1. Run the test_pvc_attachtime.py test.
2. Compare its results (average attach time of 10 samples) to the 4.9 results (available in this report: https://docs.google.com/document/d/1vyufd55iDyvKeYOwoXwKSsNoRK2VR41QNTuH-iERR8s/edit).


Actual results:

Average attach time in ODF 4.10 (with both OCP 4.9 and OCP 4.10) is at least 30% worse than in OCP 4.9 + ODF 4.9, for both RBD and CephFS.
Please note that each figure is an average of 10 samples.
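
The percentages behind this statement can be checked from the averages quoted in this report. The following is a minimal sketch for that arithmetic only; it is not part of the test suite, and the function and variable names are illustrative assumptions.

```python
# Average attach times in seconds, as quoted in this report
# (each value is the mean of 10 PVC attach samples).
BASELINE = {"RBD": 7.4, "CephFS": 6.6}  # OCP 4.9 + ODF 4.9
CANDIDATES = {
    "OCP 4.10 + ODF 4.10": {"RBD": 10.2, "CephFS": 8.8},
    "OCP 4.9 + ODF 4.10": {"RBD": 12.8, "CephFS": 11.0},
}


def percent_slower(baseline: float, measured: float) -> float:
    """Return how much slower `measured` is than `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


def degradation_table() -> dict:
    """Map (configuration, interface) to the slowdown versus the baseline."""
    return {
        (config, interface): percent_slower(BASELINE[interface], seconds)
        for config, times in CANDIDATES.items()
        for interface, seconds in times.items()
    }


if __name__ == "__main__":
    for (config, interface), pct in degradation_table().items():
        print(f"{config} {interface}: {pct:.1f}% slower")
    # OCP 4.10 + ODF 4.10 RBD: 37.8% slower
    # OCP 4.10 + ODF 4.10 CephFS: 33.3% slower
    # OCP 4.9 + ODF 4.10 RBD: 73.0% slower
    # OCP 4.9 + ODF 4.10 CephFS: 66.7% slower
```

The smallest slowdown is CephFS on OCP 4.10 + ODF 4.10 (about 33%), which is consistent with the "at least 30%" figure.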

Expected results:

Average attach time should be same or shorter than in OCP 4.9 + ODF 4.9. 


Additional info:

Relevant Jenkins job:

https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/view/Performance/job/qe-trigger-aws-ipi-3az-rhcos-3m-3w-performance/56/

Comparison report: 

https://docs.google.com/document/d/1OJfARHBAJs6bkYqri_HpSNM_N5gchUQ6P-lKe6ujQ6o/edit#

Comment 5 Yuli Persky 2022-02-23 15:39:54 UTC
@Rakshith

Thank you for pointing out that the imagePullPolicy default is Always.
We checked our test logs and this is indeed what is happening, not only in the test_pvc_attachtime test but also in others.
This means that the currently reported attach/reattach times include the time spent pulling the image.

I've added fixing all 3 tests to the QPAS team work plan at P0 priority.

Once the tests are fixed, we will be able to provide more accurate attach/reattach times.
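
For reference, the kind of change involved can be sketched as follows. Kubernetes accepts Always, IfNotPresent, and Never as imagePullPolicy values; with IfNotPresent the kubelet pulls the image only when it is not already present on the node. This is a minimal sketch, assuming the test pods are created from a manifest held as a Python dict. The helper name, pod name, and image reference are hypothetical and do not reflect the actual test code.

```python
import copy

# The three pull policies defined by Kubernetes.
VALID_PULL_POLICIES = {"Always", "IfNotPresent", "Never"}


def with_pull_policy(pod_manifest: dict, policy: str = "IfNotPresent") -> dict:
    """Return a copy of the pod manifest with imagePullPolicy set on every container."""
    if policy not in VALID_PULL_POLICIES:
        raise ValueError(f"unsupported imagePullPolicy: {policy!r}")
    patched = copy.deepcopy(pod_manifest)  # leave the caller's manifest untouched
    spec = patched["spec"]
    for container in spec.get("containers", []) + spec.get("initContainers", []):
        container["imagePullPolicy"] = policy
    return patched


# Hypothetical pod manifest; the image reference is a placeholder.
example_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "attach-time-pod"},
    "spec": {
        "containers": [
            {"name": "app", "image": "registry.example.com/perf-image:latest"}
        ]
    },
}

patched_pod = with_pull_policy(example_pod)
```

With the policy set this way, a pod scheduled to a node that already has the image starts without a pull, so the measured attach time no longer includes image download.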

Comment 6 Yuli Persky 2022-03-07 22:05:45 UTC
All the performance tests that were using the default pull policy (Always) have been fixed so that they no longer pull the image each time.

I will run them on 4.10 and 4.9 and post the results of the comparison here.

Comment 7 Yuli Persky 2022-03-09 18:10:15 UTC
Update:

I ran the fixed pvc_attachtime.py test (the fix was to stop pulling the image each time) on the latest 4.10.0 build (184) and on 4.9.4 build 7.
The comparison is available here: 

http://ocsperf.ceph.redhat.com:8080/index.php?version1=17&build1=51&platform1=1&az_topology1=1&test_name%5B%5D=9&version2=14&build2=53&platform2=1&az_topology2=1&version3=&build3=&version4=&build4=&submit=Choose+options

4.9 must gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/lr5-ypersky-9aws/lr5-ypersky-9aws_20220309T120256/logs/testcases_1646831255/
4.10 must gather:  http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/lr5-ypersky-10aws/lr5-ypersky-10aws_20220309T120401/logs/testcases_1646831358/

The comparison shows an IMPROVEMENT in PVC attach time in 4.10.0.184 for both RBD (47%) and CephFS (42%).

With OCP 4.9 + ODF 4.9, the average attach times over 10 PVCs were:

RBD: 7.4 sec
CephFS: 6.6 sec


With OCP 4.10 + ODF 4.10, the average attach times over 10 PVCs were:

RBD: 10.2 sec
CephFS: 8.8 sec


With OCP 4.9 + ODF 4.10, the average attach times over 10 PVCs were:

RBD: 12.8 sec
CephFS: 11 sec

In the newly executed fixed test on OCP 4.10 + ODF 4.10, the average attach times over 10 PVCs are:

RBD: 5.4 sec
CephFS: 5.6 sec

These are the best times measured so far.
Therefore the BZ should be closed.

