Bug 2177289

Summary: [Metro-DR] Inconsistent performance of random writes for small block sizes
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Abhishek Bose <abose>
Component: ceph Assignee: Ilya Dryomov <idryomov>
ceph sub component: RBD QA Contact: Elad <ebenahar>
Status: NEW --- Docs Contact:
Severity: low    
Priority: unspecified CC: abose, bniver, muagarwa, odf-bz-bot, shberry, sostapov
Version: 4.12   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Abhishek Bose 2023-03-10 18:11:52 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
While running fio performance tests on Metro-DR, we observed that random write IO performance was inconsistent at 4KB and 8KB block sizes.

The issue was not observed at larger block sizes, for either random or sequential workloads.

Usually the IO pattern stabilizes after a few minutes of running. The 4KB and 8KB random write tests were therefore run for a longer duration (6 hours) to let the IO stabilize, but the same inconsistent pattern was observed.
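
One way to make "stabilizes" measurable is to drop a warm-up prefix from the per-interval IOPS samples and compute the coefficient of variation (standard deviation divided by mean) of what remains. The sketch below is only an illustration of that idea: the function name, the warm-up parameter, and the sample values are hypothetical and are not taken from the tests in this report.

```python
from statistics import mean, pstdev


def coefficient_of_variation(iops_samples, warmup_samples=0):
    """Return stdev/mean of the samples that remain after the warm-up prefix.

    A value near 0 means steady IO; larger values mean more fluctuation.
    """
    steady = iops_samples[warmup_samples:]
    if not steady:
        raise ValueError("no samples left after warm-up")
    avg = mean(steady)
    if avg == 0:
        raise ValueError("mean IOPS is zero")
    return pstdev(steady) / avg


# Hypothetical per-interval IOPS values, for illustration only.
steady_run = [5000, 5010, 4990, 5005]
noisy_run = [5000, 1200, 4800, 900]
print(coefficient_of_variation(steady_run) < coefficient_of_variation(noisy_run))
```

Comparing this value for the 4KB and 8KB runs against the larger block sizes would give a single number to track instead of a visual reading of the charts.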

The tests were executed by creating PVCs and running fio on them, so I believe the issue is not with Metro-DR itself but with the underlying stretched RHCS cluster.

Version of all relevant components (if applicable):
OCP Version: 4.12.0-rc.2 (quay.io/openshift-release-dev/ocp-release:4.12.0-rc.2-x86_64)
ODF Version: 4.12.0-158 (quay.io/rhceph-dev/ocs-registry:4.12.0-158)
Ceph version: Ceph-16.2.8-85.el8cp
ACM version: 2.7.0

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, users may observe inconsistent IO performance when executing random writes with block sizes <= 8KB.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Create a Metro-DR solution using external RHCS as the underlying storage.
2. At the primary site, create PVCs and run random write tests using fio with block sizes of 4KB and 8KB.
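
The fio invocation in step 2 could look roughly like the sketch below, which builds one random write command line per block size. The exact job parameters used in the original tests are not stated in this report, so the mount path, IO engine, queue depth, file size, and runtime here are all assumptions chosen for illustration; only `randwrite` and the 4KB/8KB block sizes come from the steps above.

```python
import shlex


def fio_randwrite_cmd(block_size, target, runtime_s=600):
    """Build an fio argv for a time-based random write test (assumed settings)."""
    return [
        "fio",
        f"--name=randwrite-{block_size}",
        f"--filename={target}",      # file on the mounted PVC (hypothetical path)
        "--rw=randwrite",
        f"--bs={block_size}",
        "--ioengine=libaio",         # assumption: Linux async IO engine
        "--direct=1",                # bypass the page cache
        "--iodepth=16",              # assumption: queue depth not given in the report
        "--size=10g",                # assumption: test file size
        "--time_based",
        f"--runtime={runtime_s}",
        "--output-format=json",
    ]


# Print one command per block size named in the steps to reproduce.
for bs in ("4k", "8k"):
    print(shlex.join(fio_randwrite_cmd(bs, "/mnt/pvc/fio-testfile")))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; the output can be pasted into a pod that has the PVC mounted.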


Actual results:
Inconsistent IO performance of random writes for 4KB & 8KB block sizes.

Expected results:
Consistent IO performance of random writes for all block sizes.

Additional info:
IO performance charts and other metrics are provided in the supporting document:
https://docs.google.com/document/d/1oCpXte59Ctg1GSKOLUt1Rn8Huf8FERWfMaLvvtTIfCk/edit