Bug 1982116 - [BM] [Arbiter] poor performance on CephFS [NEEDINFO]
Summary: [BM] [Arbiter] poor performance on CephFS
Keywords:
Status: MODIFIED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: documentation
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Anjana Suparna Sriram
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 2011326
 
Reported: 2021-07-14 08:49 UTC by Avi Liani
Modified: 2023-08-31 11:34 UTC
CC: 16 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Poor performance of stretch clusters on CephFS
Workloads with many small metadata operations, such as databases, might exhibit poor performance because of the arbitrary placement of the metadata server (MDS) on multi-site OpenShift Data Foundation clusters.
Clone Of:
Environment:
Last Closed:
Embargoed:
olakra: needinfo? (aclewett)
olakra: needinfo? (rtalur)



Comment 2 Mudit Agarwal 2021-07-21 06:24:57 UTC
Not a 4.8 blocker.

Scott, can someone from the Ceph team take a look, please?

Comment 3 Scott Ostapovicz 2021-07-21 14:06:20 UTC
This is a two-part process. First, we need to establish the baseline of EXPECTED performance. Then, and only then, can we start an effective performance analysis.

Comment 15 Scott Ostapovicz 2021-09-22 14:09:21 UTC
Given that this is expected behavior, what is the expected result of this ticket?  This is essentially a feature request (RFE) to tune the performance in specific conditions. It is likely too late to add this request to 5.0 z1, thus it won't be available until RHCS 5.1 (which will then be available for ODF 4.10 or later).

Comment 17 Mudit Agarwal 2021-09-28 13:16:43 UTC
(In reply to Scott Ostapovicz from comment #15)
> Given that this is expected behavior, what is the expected result of this
> ticket?  This is essentially a feature request (RFE) to tune the performance
> in specific conditions. It is likely too late to add this request to 5.0 z1,
> thus it won't be available until RHCS 5.1 (which will then be available for
> ODF 4.10 or later).

This can be documented, and an RFE for Ceph can be opened and tracked for Ceph 5.1.
WDYT?

Comment 18 Scott Ostapovicz 2021-09-28 13:55:32 UTC
This would be a research RFE to see if we can fine-tune the behavior and improve performance? If so, then I concur.

Comment 19 Mudit Agarwal 2021-10-06 09:55:31 UTC
Marking it as a known issue and moving it out of 4.9

Can we please fill the doc text?

Comment 20 Patrick Donnelly 2021-10-08 02:04:04 UTC
(In reply to Mudit Agarwal from comment #19)
> Marking it as a known issue and moving it out of 4.9
> 
> Can we please fill the doc text?

Comment 26 Yaniv Kaul 2022-05-17 09:52:16 UTC
I'm unsure what the status of this BZ is.

Comment 27 Mudit Agarwal 2022-05-17 12:28:24 UTC
The Arbiter feature is not prioritized at the moment, and this has to be looked at by the Ceph folks, which I guess has not happened because of the priority.

Comment 28 Greg Farnum 2022-05-17 13:23:00 UTC
There's no meaningful Ceph work to do here. If you inject 20ms+ of networking delay into every disk IO, things are going to be slow.

Any improvements we can make to performance here are going to be about how the Ceph daemons are placed in the cluster topology. That's on Rook and ODF, so I'm handing this ticket back to be evaluated. :)

Comment 29 Travis Nielsen 2022-05-17 14:42:16 UTC
Is the suggestion that a client workload should be run in the same zone as the MDS? Rook/ODF doesn't treat one of the zones as primary. It sounds like all we can do here is document that you'll get better performance if you run the workload in the same zone as the MDS. Reading through this BZ, I don't see anything else that can be done for the topology. Ultimately, if they choose a high-latency topology, performance of small files is going to suffer.
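
A minimal sketch of what that documentation suggestion could look like in Rook terms, assuming a zone label value of "zone-a" for the zone that should host the MDS (the filesystem name is also a placeholder, and in ODF the CephFilesystem CR is owned by the operator, so this only shows the shape of the idea, not the exact ODF procedure):

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs                       # hypothetical filesystem name
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - replicated:
        size: 3
  metadataServer:
    activeCount: 1
    activeStandby: true
    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                    - zone-a       # assumed label of the "active" zone

The latency-sensitive client pods would then carry the same nodeAffinity expression, so that their metadata round-trips to the MDS stay inside that zone.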

Comment 30 Greg Farnum 2022-05-17 16:09:59 UTC
(In reply to Travis Nielsen from comment #29)
> Is the suggestion that a client workload should be run in the same zone as
> the MDS?

And the CRUSH rule should be formulated so that that zone is used as the source for primary OSDs, yeah.

> Rook/ODF doesn't treat one of the zones as primary. It sounds like
> all we can do here is document that you'll get better performance if you run
> the workload in the same zone as the MDS. Reading through this BZ, I don't
> see anything else that can be done for the topology. Ultimately, if
> they choose a high-latency topology, performance of small files is going to
> suffer.

Yeah, agreed. This only matters if assigning a primary site and assuming failover to the other is sensible, though I think that generally fits in with the constraints on stretch mode anyway? (If you are running both sites active, and one dies, you need to have enough spare capacity in the remainder to double its workload or you haven't bought yourself much for all that expense!)
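
For what it's worth, a layout like that can be sanity-checked with standard Ceph commands once a rule is in place; this is only a sketch, and the pool name and OSD id below are placeholders:

ceph osd tree                            # confirm hosts sit under the expected zone buckets
ceph pg ls-by-pool <cephfs-data-pool>    # the ACTING_PRIMARY column shows each PG's primary OSD
ceph osd find <osd-id>                   # map a primary OSD id back to its host/zone in the CRUSH tree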

Comment 31 Mudit Agarwal 2022-05-24 05:39:25 UTC
Moving this to doc, based on the above comments.

Comment 36 daniel parkes 2022-12-19 08:02:03 UTC
After the DR ENG meeting, we agreed to leave the docs as they are, with one example in the docs: the Active/Active use case, which is what we are seeing the most requests for from the field. A KCS article will be created with examples for an Active/Passive configuration, using a CRUSH map that places all primary OSDs and MDS daemons in the active zone.
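
A minimal sketch of what such a rule could look like (the zone names are placeholders, not taken from this cluster): because the active zone is listed in the first take step, the first, and therefore primary, OSD of every PG is chosen from that zone, while the second take step keeps two replicas in the passive zone for failover.

rule stretch_active_passive {
    id 2                                 # assumed free rule id
    type replicated
    step take zone-active                # primaries come from the active zone
    step chooseleaf firstn 2 type host
    step emit
    step take zone-passive               # replicas in the passive/failover zone
    step chooseleaf firstn 2 type host
    step emit
}

The edited map would be recompiled with crushtool, injected with "ceph osd setcrushmap -i", and assigned to the CephFS pools with "ceph osd pool set <pool> crush_rule stretch_active_passive", with MDS placement in the same zone handled on the Rook/ODF side as sketched earlier.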

