Description of problem (please be as detailed as possible and provide log snippets):

We see a clear performance degradation on various 4.12 builds vs. various 4.11 builds (4.11.4 tested). The degradation is as follows:

IO Performance:
FIO - degradation in IOPS and throughput (persistent in 4.12.0-145, 4.12.0-167 and 4.12.0-173) on sequential and random IO, on both CephFS and RBD.
FIO Compressed - degradation in IOPS and throughput (persistent in 4.12.0-152, 4.12.0-167 and 4.12.0-173) for both sequential and random IO.

Snapshots and Clones:
Single Snapshot Restore Time (CephFS) - degradation in restore time and speed. We see a degradation in RBD as well; while the times are still short, it is persistent in 4.12.0-145 and 4.12.0-167.
Snapshot Creation - Multiple Files (CephFS) - degradation in snapshot creation time, persistent in 4.12.0-145 and 4.12.0-167.
Single Clone Creation Time (CephFS) - degradation in creation time (persistent in 4.12.0-145, 4.12.0-167 and 4.12.0-173).
Multiple Clone Creation - degradation in average creation time for both RBD and CephFS.
Bulk Clone Creation Time (CephFS) - degradation in CephFS bulk clone creation time (persistent in 4.12.0-145 and 4.12.0-167, though the 4.12.0-167 results are better).

Pod Attach/Reattach Time:
Pod Reattach Time - degradation for both RBD and CephFS PVCs, especially CephFS; for pods with more files (checked up to ~820K files) the degradation is much more significant.
Bulk Pod Attach Time - degradation in reattach time (persistent in 4.12.0-145, 4.12.0-167 and 4.12.0-173) for both RBD and CephFS pods.

Version of all relevant components (if applicable):

                 4.11 cluster                          4.12 cluster
OCP Version      4.11.0-0.nightly-2023-03-07-114656    4.12.0-0.nightly-2023-02-04-034821
ODF Version      4.11.4-4                              4.12.0-173
Ceph Version     16.2.8-84.el8cp                       16.2.10-94.el8cp
Cluster name     ypersky-lso411                        ypersky-173a

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, since IO performance and creation/deletion/attach/reattach times have a direct impact on the customer's experience.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproduced?
Yes! I've run the performance suite on 2 different 4.11 builds and 3 different 4.12 builds, and in a comparison of any pair of results (4.12 vs. 4.11) we see a clear degradation in 4.12.

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:
Yes, this is a regression, since in 4.11 the IO performance and all the creation/deletion/attach/reattach measurements are much better.

Steps to Reproduce:
1. Deploy a VMware LSO cluster with the 4.12 GA build.
2. Run the performance suite tests (performance marker, tests/e2e/performance in the ocs_ci project); a simplified sketch of the kind of measurement these tests perform follows these steps.
3. Compare those results to any 4.11.X results (X up to 5, not 4.11.5).
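For reference, the pod attach/reattach numbers above come down to timing how long a pod that mounts a pre-populated CephFS or RBD PVC takes to reach Running after (re)creation. The sketch below is only a minimal illustration of that idea using the kubernetes Python client; it is not the ocs-ci implementation, and the namespace, PVC name, pod name and image are hypothetical placeholders.

# Minimal sketch (not the ocs-ci code): time how long a pod that mounts an
# existing CephFS/RBD PVC takes to reach Running, i.e. the "pod attach" time.
# Namespace, PVC name, pod name and image are hypothetical placeholders.
import time
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

NAMESPACE = "perf-test"        # assumption: test namespace already exists
PVC_NAME = "cephfs-pvc-1"      # assumption: pre-populated PVC
POD_NAME = "attach-timer-pod"

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name=POD_NAME),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="app",
                image="registry.access.redhat.com/ubi8/ubi-minimal",
                command=["sleep", "3600"],
                volume_mounts=[client.V1VolumeMount(name="data", mount_path="/mnt/data")],
            )
        ],
        volumes=[
            client.V1Volume(
                name="data",
                persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                    claim_name=PVC_NAME
                ),
            )
        ],
    ),
)

start = time.time()
core.create_namespaced_pod(namespace=NAMESPACE, body=pod)

# Poll until the pod (and therefore the volume attach/mount) is ready.
while core.read_namespaced_pod(POD_NAME, NAMESPACE).status.phase != "Running":
    time.sleep(1)

print(f"pod attach time: {time.time() - start:.1f}s")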
Actual results:
We see a clear degradation in all of the measurements listed in the Description of problem above (FIO IO performance, snapshot/clone creation and restore times, and pod attach/reattach times), persistent across the 4.12 builds tested.

Expected results:
No degradation should be seen in the 4.12 results.

Additional info:
Please refer to this document (Performance Comparison Report): https://docs.google.com/document/d/15ATM0gDw0Df25uYkLy7A_TKK9oHbNXH-Zt4DW9-t3r0/edit#
It contains a link to the Performance Dashboard and links to the Jenkins jobs (with the names of the clusters and full run logs).
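As an illustration of the 4.12-vs-4.11 comparison described above: the evaluation is essentially a per-metric delta between a 4.11 run and a 4.12 run (percent drop for IOPS/throughput, percent increase for elapsed times). The sketch below only shows that calculation; the metric names and numbers are hypothetical placeholders, not values from the actual dashboard.

# Minimal sketch of the 4.11-vs-4.12 comparison described above.
# Metric names and numbers are hypothetical placeholders, not values
# from the Performance Dashboard.
results_4_11 = {
    "fio_rbd_seq_write_iops": 12000.0,
    "cephfs_single_clone_creation_s": 9.5,
    "cephfs_pod_reattach_820k_files_s": 110.0,
}
results_4_12 = {
    "fio_rbd_seq_write_iops": 9800.0,
    "cephfs_single_clone_creation_s": 14.2,
    "cephfs_pod_reattach_820k_files_s": 160.0,
}

# For IOPS/throughput, higher is better; for times, lower is better.
higher_is_better = {"fio_rbd_seq_write_iops"}

for metric, old in results_4_11.items():
    new = results_4_12[metric]
    if metric in higher_is_better:
        change = (old - new) / old * 100   # % drop in IOPS/throughput
        print(f"{metric}: {change:.1f}% lower in 4.12")
    else:
        change = (new - old) / old * 100   # % increase in elapsed time
        print(f"{metric}: {change:.1f}% slower in 4.12")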
Closing due to inactivity