Description of problem:
AWS platform - There is degradation in pod reattach time for the RBD interface pod in 4.8 versus 4.7.
In 4.7 on AWS it took around 29 sec for an RBD pod to reattach.
In 4.8 we took 5 measurements, which were:
39.93 sec
39.62 sec
48.83 sec
39.8 sec
44.55 sec

Version-Release number of selected component (if applicable):
HW Platform: AWS
Number of OCS nodes: 3
Number of total OSDs: 3
OSD Size (TiB): 2.00
Total available storage (GiB): 6,140
OCP Version: 4.8.0-0.nightly-2021-07-04-112043
OCS Version: 4.8.0-444.ci
Ceph Version: 14.2.11-183.el8cp

How reproducible:
Reproducible all the time on AWS (attach pod to RBD PVC)

Steps to Reproduce:
1. Deploy AWS cluster with 2TB OSD
2. Run tests/e2e/performance/test_pod_reattachtime.py

Actual results:
Pod creation time is more than in 4.7 (degradation of around 40% - 50%).

Expected results:
Pod creation time should be the same or better than in 4.7.

Additional info:
The complete AWS comparison report for 4.7 vs 4.8 is available here:
https://docs.google.com/document/d/1-lOb4szqLM4LoWnMr_JCp9zurBqpjeva5BUEH-yer4s/edit?ts=60f62010#

The console logs are available here (separate log for each sample execution):
10.70.39.233:/ypersky_report_logs/48/aws/

Must-gather logs are being collected and a link will be posted shortly.
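The degradation figure in the description can be cross-checked from the numbers reported there. The following is a minimal sketch, not part of the test suite: it assumes the 4.7 baseline is exactly 29.0 sec (the report only says "around 29 sec"), and the names `baseline_47`, `samples_48`, and `pct_increase` are illustrative.

```python
from statistics import mean

# Assumption: the report says "around 29 sec" for 4.7; 29.0 is used as the baseline.
baseline_47 = 29.0

# The five 4.8 reattach measurements listed in the description, in seconds.
samples_48 = [39.93, 39.62, 48.83, 39.8, 44.55]


def pct_increase(value, baseline):
    """Return how much larger value is than baseline, in percent."""
    return (value - baseline) / baseline * 100.0


mean_48 = mean(samples_48)
summary = {
    "mean_48_sec": mean_48,
    "mean_pct": pct_increase(mean_48, baseline_47),
    "min_pct": pct_increase(min(samples_48), baseline_47),
    "max_pct": pct_increase(max(samples_48), baseline_47),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption the 4.8 mean is about 42.5 sec, roughly 47% above the 4.7 baseline, with individual samples ranging from about 37% to about 68% above it.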
Must-gather logs are available here: 10.70.39.233:/home/ypersky/bz_1984804/logs-20210722-145415
Yuli, how can we access the MG logs at 10.70.39.233:/home/ypersky/bz_1984804/logs-20210722-145415?
After discussing with engineering, we re-ran the tests with the following combinations to rule out any issues in OCP:
1) OCP 4.8 + OCS 4.8 - https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/4756/consoleFull
2) OCP 4.7 + OCS 4.8 - https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/4757/consoleFull

Reattach time for RBD in OCP 4.8 + OCS 4.8 is 43.99 seconds
Reattach time for RBD in OCP 4.7 + OCS 4.8 is 29.17 seconds

Must-gather logs for both cases will be attached shortly.
Thanks Karthick. So, the above data suggests that there is a regression in OCP 4.8. Just for the record, we are using new sidecar images in OCS 4.8.
Hi Avi, thanks for collecting the above metrics too. With that, the available data looks like this:

Reattach time for RBD in OCP 4.8 + OCS 4.7 is 43.99 seconds
Reattach time for RBD in OCP 4.8 + OCS 4.8 is 43.99 seconds
Reattach time for RBD in OCP 4.7 + OCS 4.8 is 29.17 seconds
Reattach time for RBD in OCP 4.7 + OCS 4.7 is 29.1 seconds

As mentioned earlier, OCS 4.7 and 4.8 seem to respond pretty much the same way against the same OCP version. However, the VMware test result [1] for reattach reports an improvement in performance with the 4.8 versions:

For POD attach time we can observe improvement of ~70% on CephFS
For POD reattach time we can observe improvement of ~50% on RBD

Do the builds and hardware remain the same across these tests on the different (AWS and VMware) platforms?

[1] https://docs.google.com/document/d/1KDPPfVywM5-Y4MzYOSUndAnAbPfhgth9UazppOOfMck/edit#
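As a rough cross-check of the four AWS figures in this comment, the sketch below averages the change in reattach time when one component moves from 4.7 to 4.8 while the other is held fixed. It uses only the numbers quoted here. The names `reattach`, `mean_effect`, `ocp_effect`, and `ocs_effect` are illustrative, and each cell is a single reported value, so this is an indication rather than a statistical result. It says nothing about the AWS versus VMware difference.

```python
# (OCP version, OCS version) -> RBD reattach time in seconds, as reported in this comment.
reattach = {
    ("4.8", "4.7"): 43.99,
    ("4.8", "4.8"): 43.99,
    ("4.7", "4.8"): 29.17,
    ("4.7", "4.7"): 29.10,
}


def mean_effect(times, axis):
    """Average change (sec) when one component goes 4.7 -> 4.8.

    axis=0 varies OCP with OCS held fixed; axis=1 varies OCS with OCP held fixed.
    """
    deltas = []
    for fixed in ("4.7", "4.8"):
        old = ("4.7", fixed) if axis == 0 else (fixed, "4.7")
        new = ("4.8", fixed) if axis == 0 else (fixed, "4.8")
        deltas.append(times[new] - times[old])
    return sum(deltas) / len(deltas)


ocp_effect = mean_effect(reattach, axis=0)
ocs_effect = mean_effect(reattach, axis=1)

print(f"OCP 4.7 -> 4.8: {ocp_effect:+.3f} sec")
print(f"OCS 4.7 -> 4.8: {ocs_effect:+.3f} sec")
```

With these inputs, upgrading OCP accounts for roughly 14.9 sec of extra reattach time, while upgrading OCS accounts for well under 0.1 sec.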
(In reply to Humble Chirammal from comment #11)
> Hi Avi, Thanks for the collecting above metrics too. With that, from the
> available data it looks like below.
>
> Reattach time for RBD in OCP 4.8 + OCS 4.7 is 43.99 seconds
> Reattach time for RBD in OCP 4.8 + OCS 4.8 is 43.99 seconds
> Reattach time for RBD in OCP 4.7 + OCS 4.8 is 29.17 seconds
> Reattach time for RBD in OCP 4.7 + OCS 4.7 is 29.1 seconds
>
> As mentioned earlier, it seems that OCS 4.7 and 4.8 against same OCP
> versions respond pretty much the same way. However while looking at the
> vmware test result [1] for reattach, it has reported an improvement in
> performance with 4.8 versions:
>
> For POD attach time we can observe improvement of ~70% on CephFS
> For POD reattach time we can observe improvement of ~50% on RBD
>
> Are these build and hardware remains same across these tests in different (
> aws and vmware) platforms?

Yes, the hardware and build remained the same during the tests.

>
> [1]
> https://docs.google.com/document/d/1KDPPfVywM5-Y4MzYOSUndAnAbPfhgth9UazppOOfMck/edit#
(In reply to Avi Liani from comment #12)
> (In reply to Humble Chirammal from comment #11)
> > Hi Avi, Thanks for the collecting above metrics too. With that, from the
> > available data it looks like below.
> >
> > Reattach time for RBD in OCP 4.8 + OCS 4.7 is 43.99 seconds
> > Reattach time for RBD in OCP 4.8 + OCS 4.8 is 43.99 seconds
> > Reattach time for RBD in OCP 4.7 + OCS 4.8 is 29.17 seconds
> > Reattach time for RBD in OCP 4.7 + OCS 4.7 is 29.1 seconds
> >
> > As mentioned earlier, it seems that OCS 4.7 and 4.8 against same OCP
> > versions respond pretty much the same way. However while looking at the
> > vmware test result [1] for reattach, it has reported an improvement in
> > performance with 4.8 versions:
> >
> > For POD attach time we can observe improvement of ~70% on CephFS
> > For POD reattach time we can observe improvement of ~50% on RBD
> >
> > Are these build and hardware remains same across these tests in different (
> > aws and vmware) platforms?
>
> Yes, during the test hardware and build remains the same.

This is a bit confusing. If the OCP builds and hardware remain the same, yet the reattach time regression shows up on AWS but not on the VMware platform, it is difficult to conclude that even the OCP code has a regression.

> >
> > [1]
> > https://docs.google.com/document/d/1KDPPfVywM5-Y4MzYOSUndAnAbPfhgth9UazppOOfMck/edit#
Hi Yuli/Avi/Karthick, we have a request from Jan on the OCP BZ, PTAL: https://bugzilla.redhat.com/show_bug.cgi?id=1988013#c19
I am closing this bug as per the comment (https://bugzilla.redhat.com/show_bug.cgi?id=1988013#c23) in the tracking issue. Please feel free to open a new issue if we come across the same issue.