Bug 2023476

Summary: VMware LSO - performance degradation (~40%) in RBD pod reattach time with 850K files (the larger the pod, the worse the degradation)
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Yuli Persky <ypersky>
Component: csi-driver
Assignee: Humble Chirammal <hchiramm>
Status: CLOSED NOTABUG
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.9
CC: alayani, hchiramm, jopinto, kramdoss, madam, mmuench, mrajanna, muagarwa, ocs-bugs, odf-bz-bot, rar, ygupta, ypadia
Target Milestone: ---
Keywords: Automation, Performance, Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-07 11:06:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yuli Persky 2021-11-15 20:05:02 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

There is a performance degradation (~40%) in RBD pod reattach time with 850 files (the larger the pod, the worse the degradation).

The detailed comparison report is available here: 

https://docs.google.com/document/d/1Ft7gzWCcID2RTXILW3GrN8a6O5v5VidDICuG_tX__v8/edit#

The problem is that the average RBD pod reattach time (for a pod with ~850 files) in OCP 4.8 + OCS 4.8 was 43 secs, while in OCP 4.9 + ODF 4.9 it is 59 secs.

For an RBD pod with ~200K files, the average pod reattach time was 21 secs in 4.8+4.8 and is 24 secs in 4.9+4.9. The degradation exists but is less evident.


Version of all relevant components (if applicable):

OCS versions
	==============

		NAME                     DISPLAY                       VERSION   REPLACES   PHASE
		noobaa-operator.v4.9.0   NooBaa Operator               4.9.0                Succeeded
		ocs-operator.v4.9.0      OpenShift Container Storage   4.9.0                Succeeded
		odf-operator.v4.9.0      OpenShift Data Foundation     4.9.0                Succeeded
		
		ODF (OCS) build :		      full_version: 4.9.0-210.ci
		
	Rook versions
	===============

		2021-11-04 09:27:36.633082 I | op-flags: failed to set flag "logtostderr". no such flag -logtostderr
		rook: 4.9-210.f6e2005.release_4.9
		go: go1.16.6
		
	Ceph versions
	===============

		ceph version 16.2.0-143.el8cp (0e2c6f9639c37a03e55885fb922dc0cb1b5173cb) pacific (stable)


The full version list is available here:

http://ocsperf.ceph.redhat.com/logs/Performance_tests/4.9/RC0/Vmware-LSO/versions.txt


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

No


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?

Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:

The problem is that the average RBD pod reattach time (for a pod with ~850 files) in OCP 4.8 + OCS 4.8 was 43 secs, while in OCP 4.9 + ODF 4.9 it is 59 secs.


Steps to Reproduce:
1. Run the test_pod_reattachtime.py test.
2. Measure the average reattach time for an RBD pod with 850K files (a rough manual sketch of the measurement is shown below).
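
For reference, a rough manual sketch of what the measurement does (this is not the ocs-ci test itself; the namespace, manifest, and pod name below are illustrative assumptions): delete a pod that is bound to an existing RBD-backed PVC, recreate it from the same manifest, and time how long it takes to become Ready.

NS=reattach-test                 # assumed test namespace
POD_YAML=rbd-data-pod.yaml       # assumed manifest reusing the same RBD-backed PVC
POD=rbd-data-pod                 # assumed pod name inside the manifest

for i in $(seq 1 10); do
  oc -n "$NS" delete -f "$POD_YAML" --wait=true
  start=$(date +%s)
  oc -n "$NS" create -f "$POD_YAML"
  oc -n "$NS" wait --for=condition=Ready "pod/$POD" --timeout=600s
  echo "sample $i: $(( $(date +%s) - start )) sec"
done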


Actual results:

On a VMware LSO cluster with OCP 4.9 + ODF 4.9, the average reattach time for an RBD pod with ~850K files is around 40% worse than on OCP 4.8 + OCS 4.8.


Expected results:

The measurements should be the same or better in 4.9 + 4.9.

Additional info:

Full comparison report is available here: https://docs.google.com/document/d/1Ft7gzWCcID2RTXILW3GrN8a6O5v5VidDICuG_tX__v8/edit#

Comment 3 Humble Chirammal 2021-11-16 06:02:36 UTC
>
The problem is that the average RBD pod reattach time (for a pod with ~850 files) in OCP 4.8 + OCS 4.8 was 43 secs, while in OCP 4.9 + ODF 4.9 it is 59 secs.
>

Yuli, I believe you meant the 850K-files scenario and not just 850 files in this test?

As we always request (see c#2 from Madhu), have we tested this with OCP 4.8 + ODF 4.9?
 

>
2) Patch the rook config to set the RBD fsGroup policy to None with `oc patch cm rook-ceph-operator-config -n openshift-storage --type json --patch '[{ "op": "add", "path": "/data/CSI_RBD_FSGROUPPOLICY", "value": "None" }]'`
3) Run the same tests
@humble, anything else that needs to be done?

[1]https://github.com/rook/rook/pull/9144
>

Indeed, this is a valid test here, especially in a scenario with many files like 850K+ (considering this was the milestone value at which we noticed the pod delay), and the pod startup delay gets worse as the file count grows.

The delay on large volumes could be caused by SELinux relabeling plus the fsGroup ownership change, but we can start with the fsGroup setting above.

As a side note, AFAICT the fsGroup setting is unchanged between the ODF 4.8 and 4.9 tests, so changing it does not answer whether we really have a regression between these releases, but it is worth checking how the test behaves with the `None` fsGroupPolicy.
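
A minimal sketch of the suggested change plus a verification step, assuming cluster-admin access with oc (the patch command is the one quoted above; the jsonpath check is just one way to confirm the resulting CSIDriver object):

oc patch cm rook-ceph-operator-config -n openshift-storage --type json \
  --patch '[{ "op": "add", "path": "/data/CSI_RBD_FSGROUPPOLICY", "value": "None" }]'

# the rook operator recreates the RBD CSIDriver object; confirm the policy took effect
oc get csidriver openshift-storage.rbd.csi.ceph.com -o jsonpath='{.spec.fsGroupPolicy}{"\n"}'
# expected output: None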

Comment 4 Yuli Persky 2021-11-17 11:11:26 UTC
@Humble and @Madhu,

1) Yes, the problem is with a pod with 850K files.

2) I plan to deploy a VMware LSO cluster (OCP 4.8 and ODF 4.9) this week, will run this test on it, and will update with the results.


3) I have a question about the suggested flow:

1) Deploy ODF 4.9 on OCP 4.9
2) Patch the rook config to set the RBD fsGroup policy to None with `oc patch cm rook-ceph-operator-config -n openshift-storage --type json --patch '[{ "op": "add", "path": "/data/CSI_RBD_FSGROUPPOLICY", "value": "None" }]'`
3) Run the same tests

Should I apply this configuration prior to executing the test on 4.8 + 4.9?

4) Should I execute the test once again on 4.9 + 4.9 with the above configuration?

Comment 5 Yuli Persky 2021-11-17 11:15:57 UTC
The current measurements (from the report: https://docs.google.com/document/d/1Ft7gzWCcID2RTXILW3GrN8a6O5v5VidDICuG_tX__v8/edit#):


RBD
==============================

Number of files : ~200K

Number of samples : 10

Pod reattach time for OCP 4.8 + OCS 4.8  : 21 secs  

Pod reattach time for OCP 4.9 + ODF 4.9  : 24 secs

==============================

Number of files : ~850K

Number of samples : 10

Pod reattach time for OCP 4.8 + OCS 4.8  : 43 secs  

Pod reattach time for OCP 4.9 + ODF 4.9 : 59 secs


==============================

Comment 8 yati padia 2021-11-23 05:44:29 UTC
(In reply to Yuli Persky from comment #5)
> The current measures ( from the report :
> https://docs.google.com/document/d/
> 1Ft7gzWCcID2RTXILW3GrN8a6O5v5VidDICuG_tX__v8/edit#  )
> 
> 
> RBD
> ==============================
> 
> Number of files : ~200K
> 
> Number of samples : 10
> 
> Pod reattach time for OCP 4.8 + OCS 4.8  : 21 secs  
> 
> Pod reattach time for OCP 4.9 + ODF 4.9  : 24 secs
> 
> ==============================
> 
> Number of files : ~850K
> 
> Number of samples : 10
> 
> Pod reattach time for OCP 4.8 + OCS 4.8  : 43 secs  
> 
> Pod reattach time for OCP 4.9 + ODF 4.9 : 59 secs
> 
> 
> ==============================

@ypersky, are the above measurements with the fsGroup policy set to None? If yes, can you please share the measurements with OCP 4.8 + ODF 4.9 as well?

Comment 9 Yuli Persky 2021-11-23 21:01:47 UTC
@all ,

After trying to deploy a VMware LSO cluster with OCP 4.8 and ODF 4.9, it was found that this combination is not supported.
Since it is not possible to deploy an LSO cluster with OCP 4.8 and ODF 4.9, I cannot provide any statistics or measurements from such a cluster.

@yati padia

Can you please elaborate on how I can check the fsGroup policy?
Since I do not know how to configure it, I assume the tests were run with the default value.

Comment 12 Yuli Persky 2021-11-24 07:25:50 UTC
@all , 

I will deploy OCP 4.9 + ODF 4.9, set `CSI_RBD_FSGROUPPOLICY: None`, run the test again, and update you on the results.

I will contact Madhu if I have any further questions on the policy configuration.

Comment 14 Yuli Persky 2021-11-25 10:18:36 UTC
Per dev request, @Joy Pinto will run the new version of test_pod_reattachtime.py on AWS 4.8 and AWS 4.9, compare the results, and update on whether we see a similar degradation on AWS or not.

Comment 15 Yuli Persky 2021-11-30 09:24:02 UTC
The following configuration change was applied to the OCP 4.9 + ODF 4.9 VMware LSO cluster:

ran `oc patch cm rook-ceph-operator-config -n openshift-storage --type json --patch '[{ "op": "add", "path": "/data/CSI_RBD_FSGROUPPOLICY", "value": "None" }]'`

and verified with the `oc get csidrivers openshift-storage.rbd.csi.ceph.com -oyaml` command that fsGroupPolicy is set to None.

Output : 

(.venv) [ypersky@qpas ocs-ci]$ oc get csidrivers openshift-storage.rbd.csi.ceph.com -oyaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  creationTimestamp: "2021-11-30T09:13:58Z"
  managedFields:
  - apiVersion: storage.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:attachRequired: {}
        f:fsGroupPolicy: {}
        f:podInfoOnMount: {}
        f:requiresRepublish: {}
        f:storageCapacity: {}
        f:volumeLifecycleModes:
          .: {}
          v:"Persistent": {}
    manager: rook
    operation: Update
    time: "2021-11-30T09:13:58Z"
  name: openshift-storage.rbd.csi.ceph.com
  resourceVersion: "943315"
  uid: a9368ef7-1460-4f36-8e21-0e75602163b4
spec:
  attachRequired: true
  fsGroupPolicy: None
  podInfoOnMount: false
  requiresRepublish: false
  storageCapacity: false
  volumeLifecycleModes:
  - Persistent


Will update on the test run result.

Comment 16 Yuli Persky 2021-12-01 10:31:01 UTC
I've run the pod reattach time test on VMware LSO with fsGroupPolicy: None, and the following measurements were taken:

10 samples pod reattach times (sec): 

52.195
53
52.9
48.3
53
53.6
55.2
52.5
54.7
55.7


The average time of CephBlockPool pod creation on 10 PVCs is 53.106306719779965 seconds

21:54:57 - MainThread - root - INFO - The standard deviation of CephBlockPool pod creation time on 10 PVCs is 2.063152619635935
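
For reference, the reported average and standard deviation can be recomputed from the rounded samples above; small differences from the log lines are expected because the listed values are rounded:

printf '%s\n' 52.195 53 52.9 48.3 53 53.6 55.2 52.5 54.7 55.7 \
  | awk '{ s += $1; ss += $1*$1; n++ }
         END { m = s/n; print "mean:", m; print "stdev:", sqrt((ss - n*m*m)/(n-1)) }'
# prints mean ~53.11 and stdev ~2.06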

To compare VMware LSO results for RBD pod reattach time with ~850K files:

On the OCP 4.9 + ODF 4.9 cluster with fsGroupPolicy: None, the average is 53.1 sec.

On the OCP 4.9 + ODF 4.9 cluster without fsGroupPolicy: None, the average was 59 sec.

On the OCP 4.8 + OCS 4.8 cluster without fsGroupPolicy: None, the average was 43 sec.

=> we still see a degradation in 4.9 OCP + 4.9 ODF. 

However, the results are better on a cluster with fsGroupPolicy: None.


Another comparison that might be relevant : 

LSO results for pod reattach time with ~200K files:

On the OCP 4.9 + ODF 4.9 cluster with fsGroupPolicy: None, the average is 25.25 sec.

On the OCP 4.9 + ODF 4.9 cluster without fsGroupPolicy: None, the average was 24 sec.

On the OCP 4.8 + OCS 4.8 cluster without fsGroupPolicy: None, the average was 21 sec.


=> we still see a degradation on the 4.9 clusters compared to the 4.8 results (with AND without fsGroupPolicy: None).

Comment 18 Joy John Pinto 2021-12-02 17:07:00 UTC
Please refer to page 8 in the doc below for the comparison report of pod reattach time with OCP+ODF 4.8 and OCP+ODF 4.9 on the AWS platform: https://docs.google.com/document/d/1vyufd55iDyvKeYOwoXwKSsNoRK2VR41QNTuH-iERR8s/edit#

Thanks,
Joy

Comment 20 Yuli Persky 2022-01-02 22:25:23 UTC
@Yug Gupta,

I've run the pod reattach time test on AWS (OCP 4.8 + ODF 4.9) and compared it to OCP 4.8 + OCS 4.8.
The results are also documented here: https://docs.google.com/document/d/1vyufd55iDyvKeYOwoXwKSsNoRK2VR41QNTuH-iERR8s/edit# (page 9).

What appears from this AWS comparison is that:

1) There is no degradation in pod reattach time for CephFS for the tested numbers of files written to the pods.
We actually see a slight improvement.

2) As for RBD:

Number of files: ~200K

We see a 5.46% improvement in the pod reattach time (average of 10 pods).

Number of files: ~850K

We see a degradation of 14.65% in the pod reattach time (average of 10 pods).

The average pod reattach time for ~850K files was 90.26 sec in OCP 4.8 + OCS 4.8 and is 103.5 sec in OCP 4.8 + ODF 4.9.

However, the degradation on AWS is not as significant as on VMware LSO (~40%).

Comment 21 Yaniv Kaul 2022-01-06 09:11:51 UTC
What's the next step here?

Comment 23 yati padia 2022-02-03 12:53:37 UTC
Here are a few updates from the previous meeting of the ceph-csi team with Yuli regarding the performance bugs:
1. From next time, it would be great to calculate the time spent in pod reattach and create/delete operations at the ceph-csi level. Related documents have been shared.
2. @ypersky, we will need a must-gather here to calculate and check the time spent in ceph-csi (an example invocation is sketched below).
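
For reference, ODF data is usually collected with oc adm must-gather pointed at the ODF must-gather image; the image path below is the one commonly documented for ODF 4.9 and should be confirmed for the release under test:

oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.9 \
  --dest-dir=./odf-must-gather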

Comment 24 Yuli Persky 2022-03-07 10:45:35 UTC
@Yati ,

Let me summarize what I think is right to do for this bug.

1) It was found that the pod reattach time test was pulling the image for each sampled pod, so the measured reattach time included the image pull time, which may have affected our measurements (a possible way to keep image pulls out of the measurement is sketched at the end of this comment).

2) The QPAS team is also working on adding measurements of pod reattach at the csi level to the current test (still in progress).

3) Moreover, the pod reattach test was run again on OCP 4.9 + ODF 4.9 and on OCP 4.10 + ODF 4.10 (mixed versions are not deployable on VMware LSO).

The results are: 

4.8 RBD pod reattach time (from this report: https://docs.google.com/document/d/1Ft7gzWCcID2RTXILW3GrN8a6O5v5VidDICuG_tX__v8/edit#heading=h.701cgwqui8bn):

pod with ~200K files: 21 sec
pod with ~850K files: 43 sec


4.9 RBD pod reattach time (from this report: https://docs.google.com/document/d/19ZRfwhfbpYF2f6hUxCM5lCt0uLoNo3ibOMWTbZTUTqw/edit#):

pod with ~200K files: 21.595 sec
pod with ~850K files: 45.752 sec


4.10 RBD pod reattach time (from this report: https://docs.google.com/document/d/19ZRfwhfbpYF2f6hUxCM5lCt0uLoNo3ibOMWTbZTUTqw/edit#):

pod with ~200K files: 19.626 sec
pod with ~850K files: 43.097 sec

As you can see, the measurements of 10 samples in the NEW run on 4.9 are similar to the 4.8 measurements,
and the 4.10 measurements are almost the same as on 4.8.

I think that, due to all of the above, this BZ can be closed.
The different 4.9 measurements earlier were probably caused by the image pull problem in our test (this has already been fixed).

Therefore let's close this bug, and if we suspect a performance degradation with the NEW test, we'll open a new BZ.
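
As a side note on item 1 above, one possible way to keep image pulls out of future measurements is to warm the node's image cache with a throwaway pod before sampling; the namespace, image, and pod name below are illustrative assumptions, not the values used by test_pod_reattachtime.py:

NS=reattach-test
IMAGE=quay.io/example/perf-workload:latest   # hypothetical test image

oc -n "$NS" run image-warmup --image="$IMAGE" --restart=Never -- sleep 60
oc -n "$NS" wait --for=condition=Ready pod/image-warmup --timeout=600s
oc -n "$NS" delete pod image-warmup --wait=true
# note: on a multi-node cluster the warm-up pod must land on the same node as the
# measured pod (e.g. via a nodeSelector) for the cached image to help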

Comment 25 yati padia 2022-03-07 11:06:43 UTC
Thank you @ypersky. Closing this bug.