Bug 2149665

Summary: [Platform:-Baremetal]: Multi snapshot performance test for RBD fails due to delayed snapshot creation time
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Ishwarya Munesh <ishwaryax.munesh>
Component: csi-driver Assignee: Rakshith <rar>
Status: CLOSED NOTABUG QA Contact: krishnaram Karthick <kramdoss>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.11 CC: bniver, mrajanna, muagarwa, ocs-bugs, odf-bz-bot, pnataraj, rar, ypersky
Target Milestone: --- Flags: rar: needinfo? (ishwaryax.munesh)
rar: needinfo? (ishwaryax.munesh)
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-03-02 09:34:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
RBD test logs none
default config yaml used none
ocp_odf_version none
snapshots creation time none

Description Ishwarya Munesh 2022-11-30 14:36:58 UTC
Created attachment 1928759 [details]
RBD test logs

Description of problem (please be detailed as possible and provide log snippets):
On executing the performance test test_pvc_multiple_snapshot_performance[CephBlockPool-512], the test runs for a few hours and then fails with the timeout error 'TimeoutError: Snapshot was not created on time' while creating snapshot #452.
Cluster Configuration:
Platform: Baremetal
OCP version: 4.11 (installed via UPI)
Node details: 3 masters and 3 workers
ODF version: 4.11
PV count: 12
OSD count: 3
Drive details: Each worker has 2 slower drives (NVMe) and 1 faster drive (Optane)

Version of all relevant components (if applicable):
OCP: 4.11, ODF: 4.11

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)? 3



Steps to Reproduce:
1. Create an OCP 4.11 cluster on baremetal servers via the UPI method
2. Install ODF 4.11 and ensure Ceph is healthy with 3 OSDs
3. Install the OCS-CI repo and run the performance test test_pvc_multiple_snapshot_performance[CephBlockPool-512]
4. Command line used: run-ci --cluster-name ocs-storagecluster --cluster-path /root/ocpcluster/ tests2/ tests2/e2e/performance/csi_tests/test_pvc_multi_snapshot_performance.py::TestPvcMultiSnapshotPerformance::test_pvc_multiple_snapshot_performance[CephBlockPool-512] 2>&1 | tee /tmp/perf_multi_snap_rbd_logs.txt
The log file is attached.

Actual results:
After running for a few hours, the test fails with the timeout error 'TimeoutError: Snapshot was not created on time' while creating snapshot #452. The creation time starts increasing from snapshot #260, and creating snapshot #452 took more than 600 secs, which caused the test to fail.
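
The failure mode described above can be pictured as a poll-until-ready loop with a deadline. The following is only a minimal sketch under assumptions, not the actual OCS-CI code: the function name wait_for_snapshot, the is_ready callback, the injectable clock/sleep parameters and the 5-second polling interval are all hypothetical, and the 600-second default is inferred from the figure quoted above.

```python
import time
from typing import Callable


def wait_for_snapshot(
    is_ready: Callable[[], bool],
    timeout: float = 600.0,   # assumed limit, taken from the ">600 secs" observation
    interval: float = 5.0,    # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_ready() until it returns True and return the elapsed seconds.

    Raises TimeoutError if the snapshot is still not ready once `timeout`
    seconds have passed.
    """
    start = clock()
    while True:
        if is_ready():
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError("Snapshot was not created on time")
        sleep(interval)
```

With a loop of this shape, a steady growth in per-snapshot creation time stays invisible to the pass/fail result until one snapshot finally crosses the limit, which would be consistent with the test failing only at snapshot #452.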

Expected results:
The test should pass without issues


Additional info: 
Ensured that no other load or activity was running while the performance test was in progress. The bastion node on which the test was run and the other nodes in the cluster were left undisturbed for the duration of the test.

Comment 2 Ishwarya Munesh 2022-11-30 14:42:08 UTC
Created attachment 1928761 [details]
default config yaml used

Comment 4 Ishwarya Munesh 2022-12-01 10:40:44 UTC
Created attachment 1929045 [details]
ocp_odf_version

Comment 5 Ishwarya Munesh 2022-12-01 10:44:04 UTC
The OCP must-gather log has been collected and placed in Dropbox - https://www.dropbox.com/s/szd0gy1e6490q9p/ocp_must-gather.tar.gz?dl=0
ODF must-gather log link - https://www.dropbox.com/s/4tkn1ms2jiuezxz/odf_must-gather.tar.gz?dl=0
Attached the OCP and ODF version screenshot, and also the snapshot creation times for all the snapshots in a separate file.

Comment 6 Ishwarya Munesh 2022-12-01 10:44:53 UTC
Created attachment 1929046 [details]
snapshots creation time

Comment 7 Yuli Persky 2022-12-01 20:37:04 UTC
As I see the results, the problem is not the creation failure of snapshot #452 (due to the timeout) but the fact that, starting from snapshot number ~260, the snapshot creation times grow continuously, up to more than 18 secs.
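
One way to pin down where the growth starts is to compare each creation time against a baseline taken from the early snapshots. The sketch below is an illustration only: the function name degradation_onset and its parameters (baseline_count, factor, run) are hypothetical, and the input is assumed to be a plain list of per-snapshot creation times in seconds, such as the values in the attached file.

```python
from statistics import median
from typing import Optional, Sequence


def degradation_onset(
    times: Sequence[float],
    baseline_count: int = 50,  # hypothetical: snapshots used as the baseline
    factor: float = 2.0,       # hypothetical: "slow" means > factor * baseline median
    run: int = 5,              # hypothetical: consecutive slow snapshots required
) -> Optional[int]:
    """Return the 1-based number of the first snapshot in the first streak of
    `run` consecutive creation times above factor * median(baseline), or None
    if no such streak exists."""
    if len(times) <= baseline_count:
        return None
    threshold = factor * median(times[:baseline_count])
    streak = 0
    for i in range(baseline_count, len(times)):
        if times[i] > threshold:
            streak += 1
            if streak == run:
                # i is the 0-based index of the last snapshot in the streak
                return i - run + 2
        else:
            streak = 0
    return None
```

Requiring several consecutive slow snapshots keeps a single outlier from being reported as the start of the trend; the thresholds would need tuning against the real data.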