Bug 1984804 - [Tracker for OCP BZ #1988013] AWS - degradation in RBD pod reattach time in OCP 4.8 vs 4.7
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Humble Chirammal
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks: 1988013
Reported: 2021-07-22 09:35 UTC by Yuli Persky
Modified: 2023-08-09 16:37 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
: 1988013 (view as bug list)
Environment:
Last Closed: 2021-09-14 09:09:19 UTC
Embargoed:
kramdoss: needinfo+



Description Yuli Persky 2021-07-22 09:35:16 UTC
Description of problem:

On the AWS platform, pod reattach time for an RBD-backed pod has degraded in 4.8 compared with 4.7.

In 4.7 on AWS, it took around 29 sec for an RBD pod to reattach.
In 4.8 we took 5 measurements:

39.93 sec
39.62 sec
48.83 sec
39.8 sec
44.55 sec



Version-Release number of selected component (if applicable):

HW Platform	AWS
Number of OCS nodes	3
Number of total OSDs	3
OSD Size (TiB)	2.00
Total available storage (GiB)	6,140
OCP Version	4.8.0-0.nightly-2021-07-04-112043
OCS Version	4.8.0-444.ci
Ceph Version	14.2.11-183.el8cp

How reproducible:

Always reproducible on AWS (attach a pod to an RBD PVC).

Steps to Reproduce:
1. Deploy an AWS cluster with 2 TB OSDs
2. Run tests/e2e/performance/test_pod_reattachtime.py

Actual results:

Pod reattach time is longer than in 4.7 (a degradation of around 40%-50%).
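
As a rough check of that percentage, the five 4.8 samples from the description can be compared against the 4.7 figure. This is only a sketch: the baseline is the approximate "around 29 sec" value quoted in this report rather than an exact measurement, and the function names are made up for illustration.

```python
from statistics import mean

# Values copied from this report. The 4.7 baseline is the approximate
# "around 29 sec" figure, not an exact measurement.
BASELINE_47_SEC = 29.0
SAMPLES_48_SEC = [39.93, 39.62, 48.83, 39.8, 44.55]


def slowdown_pct(sample_sec, baseline_sec):
    """Return how much slower a sample is than the baseline, in percent."""
    return (sample_sec - baseline_sec) / baseline_sec * 100.0


def summarize(samples, baseline):
    """Return (min, mean, max) slowdown in percent for a list of samples."""
    pcts = [slowdown_pct(s, baseline) for s in samples]
    return min(pcts), slowdown_pct(mean(samples), baseline), max(pcts)


if __name__ == "__main__":
    lo, avg, hi = summarize(SAMPLES_48_SEC, BASELINE_47_SEC)
    print(f"mean 4.8 reattach time: {mean(SAMPLES_48_SEC):.2f} sec")
    print(f"slowdown vs 4.7: min {lo:.1f}%, mean {avg:.1f}%, max {hi:.1f}%")
```

With these inputs the mean slowdown comes out at roughly 47%, while individual runs range from about 37% to about 68%.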

Expected results:

Pod reattach time should be the same as or better than in 4.7.

Additional info:

The complete AWS comparison report for 4.7 vs 4.8 is available here: 
https://docs.google.com/document/d/1-lOb4szqLM4LoWnMr_JCp9zurBqpjeva5BUEH-yer4s/edit?ts=60f62010#

The console logs are available here (a separate log for each sample execution):

10.70.39.233:/ypersky_report_logs/48/aws/

Must-gather logs are being collected and a link will be posted shortly.

Comment 2 Yuli Persky 2021-07-22 10:00:24 UTC
Must-gather logs are available here: 

10.70.39.233:/home/ypersky/bz_1984804/logs-20210722-145415

Comment 4 Humble Chirammal 2021-07-23 04:49:00 UTC
Yuli, how can we access the must-gather logs at 10.70.39.233:/home/ypersky/bz_1984804/logs-20210722-145415?

Comment 5 krishnaram Karthick 2021-07-24 15:52:20 UTC
After discussing with engineering, we re-ran the tests with the following combinations to rule out any issues in OCP:

1) OCP 4.8 + OCS 4.8 - https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/4756/consoleFull
2) OCP 4.7 + OCS 4.8 - https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/4757/consoleFull

Reattach time for RBD in OCP 4.8 + OCS 4.8 is 43.99 seconds
Reattach time for RBD in OCP 4.7 + OCS 4.8 is 29.17 seconds

Must-gather logs for both cases will be attached shortly.

Comment 7 Mudit Agarwal 2021-07-26 10:10:05 UTC
Thanks Karthick.

So the above data suggests that there is a regression in OCP 4.8.
Just for the record, we are using new sidecar images in OCS 4.8.

Comment 11 Humble Chirammal 2021-07-29 05:15:28 UTC
Hi Avi, thanks for collecting the above metrics too. With those, the available data looks like this:

Reattach time for RBD in OCP 4.8 + OCS 4.7 is 43.99 seconds
Reattach time for RBD in OCP 4.8 + OCS 4.8 is 43.99 seconds
Reattach time for RBD in OCP 4.7 + OCS 4.8 is 29.17 seconds
Reattach time for RBD in OCP 4.7 + OCS 4.7 is 29.1 seconds
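
One way to read this matrix is to change one component at a time and look at the difference. The sketch below is illustrative only: the times are copied from the list above, and the function names are hypothetical.

```python
# Reattach times in seconds, keyed by (OCP version, OCS version),
# copied from the list above.
REATTACH_SEC = {
    ("4.8", "4.7"): 43.99,
    ("4.8", "4.8"): 43.99,
    ("4.7", "4.8"): 29.17,
    ("4.7", "4.7"): 29.10,
}


def ocp_upgrade_delta(times, ocs):
    """Change in reattach time when only OCP moves from 4.7 to 4.8."""
    return times[("4.8", ocs)] - times[("4.7", ocs)]


def ocs_upgrade_delta(times, ocp):
    """Change in reattach time when only OCS moves from 4.7 to 4.8."""
    return times[(ocp, "4.8")] - times[(ocp, "4.7")]


if __name__ == "__main__":
    for ocs in ("4.7", "4.8"):
        delta = ocp_upgrade_delta(REATTACH_SEC, ocs)
        print(f"OCS {ocs}: OCP 4.7 -> 4.8 changes reattach by {delta:+.2f} sec")
    for ocp in ("4.7", "4.8"):
        delta = ocs_upgrade_delta(REATTACH_SEC, ocp)
        print(f"OCP {ocp}: OCS 4.7 -> 4.8 changes reattach by {delta:+.2f} sec")
```

With these numbers, moving OCP from 4.7 to 4.8 adds about 15 sec regardless of the OCS version, while moving OCS from 4.7 to 4.8 changes the time by less than 0.1 sec.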

As mentioned earlier, OCS 4.7 and OCS 4.8 behave almost identically against the same OCP version. However, the VMware test results [1] for reattach report a performance improvement with the 4.8 versions:

For POD attach time we can observe improvement of ~70% on CephFS
For POD reattach time we can observe improvement of ~50% on RBD 

Do the builds and hardware remain the same across these tests on the different platforms (AWS and VMware)?

[1] https://docs.google.com/document/d/1KDPPfVywM5-Y4MzYOSUndAnAbPfhgth9UazppOOfMck/edit#

Comment 12 Avi Liani 2021-07-29 06:01:14 UTC
(In reply to Humble Chirammal from comment #11)
> Hi Avi,  Thanks for the collecting above metrics too. With that, from the
> available data it looks like below.
> 
> Reattach time for RBD in OCP 4.8 + OCS 4.7 is 43.99 seconds
> Reattach time for RBD in OCP 4.8 + OCS 4.8 is 43.99 seconds
> Reattach time for RBD in OCP 4.7 + OCS 4.8 is 29.17 seconds
> Reattach time for RBD in OCP 4.7 + OCS 4.7 is 29.1 seconds
> 
> As mentioned earlier, it seems that OCS 4.7 and 4.8  against same OCP
> versions respond pretty much the same way.  However while looking at the
> vmware test result [1] for reattach, it has reported an improvement in
> performance with 4.8 versions:
> 
> For POD attach time we can observe improvement of ~70% on CephFS
> For POD reattach time we can observe improvement of ~50% on RBD 
> 
> Are these build and hardware remains same across these tests in different (
> aws and vmware) platforms?  

Yes, the hardware and build remained the same during the tests.

> 
> [1] https://docs.google.com/document/d/1KDPPfVywM5-Y4MzYOSUndAnAbPfhgth9UazppOOfMck/edit#

Comment 13 Humble Chirammal 2021-07-30 05:34:32 UTC
(In reply to Avi Liani from comment #12)
> Yes, during the test hardware and build remains the same.

This is a bit confusing: if the OCP builds and hardware are the same, yet the reattach-time regression shows up on AWS but not on VMware, it is difficult to conclude that the OCP code itself has a regression.



Comment 15 Mudit Agarwal 2021-09-06 09:41:59 UTC
Hi Yuli/Avi/Karthick

We have a request from Jan on the OCP BZ, PTAL

https://bugzilla.redhat.com/show_bug.cgi?id=1988013#c19

Comment 16 Humble Chirammal 2021-09-14 09:09:19 UTC
I am closing this bug as per the comment (https://bugzilla.redhat.com/show_bug.cgi?id=1988013#c23) in the tracking issue. Please feel free to open a new bug if we come across the same issue again.

