Bug 1779272

Summary: Workload throughput rates slow as cluster ages with MS active
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: John Harrigan <jharriga>
Component: RGW-Multisite
Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Tejas <tchandra>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.3
CC: ceph-eng-bugs, ceph-qe-bugs, jdurgin, mbenjamin, mkogan, nojha, twilkins, vumrao
Target Milestone: rc
Keywords: Performance
Target Release: 5.*
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-11 18:55:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1727980    
Attachments:
Description                 Flags
fill cluster workload       none
mixed operation workload    none

Description John Harrigan 2019-12-03 16:02:30 UTC
Created attachment 1641715 [details]
fill cluster workload

Description of problem:
Workload throughput rates (operations per second) degrade as the cluster ages through repeated deletion and re-creation of the RGW pools.

Version-Release number of selected component (if applicable):
12.2.12-79 = RHCS 3.3z1

How reproducible:
yes

Steps to Reproduce:
1. Deploy two Ceph clusters: 3x MON nodes, 8x OSD nodes (RGW collocated)
2. Configure and activate site1 and site2 multisite
3. Install COSBench
4. Execute the cluster-fill (bb-fillWorkload) and mixed-operation (bb-hybridSS) workloads (both attached). Record results.
5. Delete and recreate RGW pools
6. Rerun workloads, record results. Note throughput reduction in COSBench.
7. Repeat steps 5 and 6. After five test cycles of pool deletion and creation, throughput degrades by ~40%

Actual results:
Cluster fill workload: w194 (initial) = 10659, w201 (aged) = 6309
Hybrid workload: w195 (initial) = 2631, w202 (aged) = 1474

Expected results:
Workload throughput stays roughly constant (within ~10%) as the cluster ages
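
For reference, the drop implied by the figures above can be checked with a short script. This is only a sketch: it assumes the reported values are operations per second (as in the problem description) and reads "~10% variance" as "aged throughput within 10% of the initial run". The names RESULTS, TOLERANCE_PCT, degradation_pct and within_tolerance are made up for illustration and are not part of COSBench or Ceph.

```python
# Throughput figures copied from "Actual results" in this report.
# Assumption: values are operations per second, as in the problem description.
RESULTS = {
    "cluster fill": (10659, 6309),  # w194 (initial), w201 (aged)
    "hybrid": (2631, 1474),         # w195 (initial), w202 (aged)
}

# Assumption: "~10% variance" means aged throughput stays within 10% of initial.
TOLERANCE_PCT = 10.0


def degradation_pct(initial, aged):
    """Return the throughput drop from initial to aged, as a percentage of initial."""
    if initial <= 0:
        raise ValueError("initial throughput must be positive")
    return (initial - aged) / initial * 100.0


def within_tolerance(initial, aged, tolerance_pct=TOLERANCE_PCT):
    """True if aged throughput deviates from initial by at most tolerance_pct."""
    return abs(degradation_pct(initial, aged)) <= tolerance_pct


if __name__ == "__main__":
    for name, (initial, aged) in RESULTS.items():
        drop = degradation_pct(initial, aged)
        status = "ok" if within_tolerance(initial, aged) else "regressed"
        print(f"{name}: {drop:.1f}% drop ({status})")
```

With the numbers in this report, the cluster fill workload drops by about 41% and the hybrid workload by about 44%, both well outside the expected ~10% band.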

Additional info:

Comment 1 RHEL Program Management 2019-12-03 16:02:36 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 John Harrigan 2019-12-03 16:03:30 UTC
Created attachment 1641716 [details]
mixed operation workload