Description of problem (please be as detailed as possible and provide log snippets):

RBD snapshot creation time has degraded on VMware LSO 4.10 compared to 4.9, both for a snapshot of a PVC with one big file and for a snapshot of a PVC with 1 million small files (32GB file size). Times measured on 4.10 are about 3 times longer than on 4.9.

Version of all relevant components (if applicable):

4.9 cluster:
OCP Version: 4.9.21
ODF Version: 4.9.0-251.ci
Ceph Version: 16.2.0-146.el8cp

4.10 cluster:
OCP Version: 4.10.0-0.nightly-2022-02-15-041303
ODF Version: 4.10.0-156
Ceph Version: 16.2.7-61.el8cp

More details are available here:
http://ocsperf.ceph.redhat.com:8080/index.php?version1=6&build1=46&platform1=2&az_topology1=3&test_name%5B%5D=16&version2=14&build2=37&platform2=2&az_topology2=3&version3=&build3=&version4=&build4=&submit=Choose+options
and here:
http://ocsperf.ceph.redhat.com:8080/index.php?version1=6&build1=46&platform1=2&az_topology1=3&test_name%5B%5D=15&version2=14&build2=37&platform2=2&az_topology2=3&version3=&build3=&version4=&build4=&submit=Choose+options

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
User impact: a basic operation takes 3 times longer.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:
Please see the description. The measured times are about 3 times longer in 4.10.

Steps to Reproduce:
1. Deploy a VMware LSO cluster
2. Run the test_pvc_snapshot_performance.py test
3.
Actual results:

RBD snapshot creation times (one big file with a size of 60% of the initial PVC) in 4.9 are:
1GB snapshot: 0.501 sec
10GB snapshot: 0.466 sec
100GB snapshot: 0.798 sec

The same times in 4.10 are:
1GB snapshot: 1.875 sec
10GB snapshot: 1.893 sec
100GB snapshot: 1.806 sec

RBD snapshot creation time (1 million small files, 32GB each file size) is:
4.9: 0.210 sec
4.10: 1.855 sec

Both snapshot tests clearly show a degradation in the RBD snapshot creation time.
Please note that we do not see a degradation in the CephFS snapshot creation times.
Please note that these are AVERAGE times of 3 snapshots.

The full VMware LSO 4.10 vs 4.9 comparison report is available here:
https://docs.google.com/document/d/19ZRfwhfbpYF2f6hUxCM5lCt0uLoNo3ibOMWTbZTUTqw/edit#

Expected results:
RBD snapshot creation times in 4.10 should be the same as or shorter than in 4.9.

Additional info:

Must gather for the 4.10 test run is:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ypersky-local10/ypersky-local10_20220215T103839/logs/testcases_1645035276/
4.10 relevant Jenkins job:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10162/

Must gather for the 4.9 test run is available here:
4.9 relevant Jenkins job:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/10483/
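To make the "about 3 times longer" claim easier to check, here is a minimal sketch that computes the 4.10 / 4.9 ratio for each case from the average times reported above. The helper name `slowdown` and the dictionary keys are illustrative only and are not part of the test suite.

```python
# Average RBD snapshot creation times in seconds, copied from this report.
times_4_9 = {"1GB": 0.501, "10GB": 0.466, "100GB": 0.798, "1M small files": 0.210}
times_4_10 = {"1GB": 1.875, "10GB": 1.893, "100GB": 1.806, "1M small files": 1.855}


def slowdown(old, new):
    """Return the new/old time ratio per case (hypothetical helper)."""
    return {case: new[case] / old[case] for case in old}


ratios = slowdown(times_4_9, times_4_10)
for case, ratio in ratios.items():
    print(f"{case}: {ratio:.2f}x")
```

With the numbers above, the big-file cases come out between roughly 2.3x and 4.1x, and the small-files case is close to 8.8x.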
Not a 4.10 blocker based on the analysis done so far. Keeping it open as it still awaits some answers.
Relevant Jenkins job for a reproduction of the current BZ: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/11817/
The previous job failed because of a system issue (not a bug in the product or in the test), so I've retriggered the job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/11919/ Will update on the results.
The last test execution failed due to a leftovers problem on the cluster. Another test run was triggered on the newly deployed cluster: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/12114/ Will update on the results.
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/12115/console - the updated run
A bug was discovered in the test. The test was fixed and run on 4.10.0-221, and the measurements were taken.

The summary of the results is:

RBD snapshot creation times (one big file with a size of 60% of the initial PVC) in 4.9 are:
1GB snapshot: 0.501 sec
10GB snapshot: 0.466 sec
100GB snapshot: 0.798 sec

RBD snapshot creation times in 4.10 (build 4.10.0-156) are:
1GB snapshot: 1.875 sec
10GB snapshot: 1.893 sec
100GB snapshot: 1.806 sec

RBD snapshot creation times in 4.10 (build 4.10.0-221) are:
1GB snapshot: 0.5 sec
10GB snapshot: 0.74 sec
100GB snapshot: 0.5 sec

The must gather logs are available here:
rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/OCS/ocs-qe-bugs/bz-2064055/logs-20220427-224105/

As we can see, the results of 4.10.0 build 221 are much better than the results of 4.10.0 build 156. They are about the same as in 4.9, except for the 10GB snapshot creation time (0.466 sec in 4.9 versus 0.74 sec in 4.10.0-221). However, the 100GB snapshot creation time is quite good.

At this point I do not see a need to fix anything in this area. It looks like the changes in 4.10 between build 156 and build 221 improved the performance.
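The comparison between 4.9 and 4.10.0-221 can be sketched as a simple threshold check. The 20% tolerance below is an assumption chosen for illustration only; it is not a threshold used by the test or defined anywhere in this BZ.

```python
# Average RBD snapshot creation times in seconds, copied from the summary above.
times_4_9 = {"1GB": 0.501, "10GB": 0.466, "100GB": 0.798}
times_4_10_221 = {"1GB": 0.5, "10GB": 0.74, "100GB": 0.5}

# Assumed tolerance: flag a size if 4.10.0-221 is more than 20% slower than 4.9.
THRESHOLD = 1.20

regressed = [
    size
    for size in times_4_9
    if times_4_10_221[size] / times_4_9[size] > THRESHOLD
]
print("Sizes slower than 4.9 beyond tolerance:", regressed)
```

Under this assumed tolerance only the 10GB case is flagged, which matches the exception noted above.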
Great. In this case, we can close the bug. Thanks, Yuli.