Bug 2022467
| Summary: | [RFE] enable distributed ephemeral pins on "csi" subvolume group | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Patrick Donnelly <pdonnell> |
| Component: | rook | Assignee: | Parth Arora <paarora> |
| Status: | MODIFIED | QA Contact: | Neha Berry <nberry> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | ||
| Version: | unspecified | CC: | mmuench, muagarwa, ndevos, odf-bz-bot, owasserm, paarora, rar, tnielsen, vshankar |
| Target Milestone: | --- | Keywords: | FutureFeature, Performance, Reopened |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-05-26 09:39:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1759702 | ||
| Bug Blocks: | |||
Description
Patrick Donnelly
2021-11-11 17:38:40 UTC
Also a little explanation for why we should want this: currently we are limiting our Ceph file systems to a single active MDS. For some customers, this presents a bottleneck for metadata I/O. Using a single rank also increases the amount of metadata that must be cached when there are many clients, which can increase failover times. We have already seen one instance where the sole active MDS had so much metadata to load into cache that it would OOM [1]. (Note: this is fixed in RHCS 5 with a new configuration option, but it could have been avoided by using more MDS ranks.)

Increasing max_mds to 2 and using two ranks is not the only change required, however. The default automatic balancer is known to be prone to poor behavior, which is why we have pinning policies [2] to control how metadata/subtrees are distributed. The "distributed" policy makes the most sense for CSI because it automatically stripes the subvolumes (PVs) across multiple MDS ranks. It will always result in a net improvement over a single-rank file system, with minimal additional technical risk.

[1] bz2020767
[2] https://docs.ceph.com/en/pacific/cephfs/multimds/#setting-subtree-partitioning-policies

Comment 4, Niels de Vos:

Travis, `max_mds` seems like an option that Rook needs to set when creating the CephFilesystem. Is that something that is done already, or has that been requested as a feature before? In case Rook does not have the `max_mds` option yet, this RFE should be split into at least two upstream issues:

1. rook: add the `max_mds` CephFS option
2. ceph-csi: call `setfattr -n ceph.dir.pin -v 2` (I think) when creating a new volume

Comment:

ODF-4.10 planning is concluded, so this will be considered for ODF-4.11.

Comment (reply to comment 4):

Yes, Rook currently sets max_mds to the desired number of active MDS daemons, which by default (and with OCS) should be 1. See https://github.com/rook/rook/blob/990d92790b58d455cd28bf9773685b3540ff5bf0/pkg/daemon/ceph/client/filesystem.go#L179-L182

Comment 6, Patrick Donnelly:

(In reply to Niels de Vos from comment #4)
> 2. ceph-csi: call `setfattr -n ceph.dir.pin -v 2` (I think) when creating a new volume

Correction: `ceph fs subvolumegroup pin cephfilesystem-a csi distributed 1`

Comment:

(In reply to Patrick Donnelly from comment #6)
> Correction: `ceph fs subvolumegroup pin cephfilesystem-a csi distributed 1`

Thanks. This should be doable from the Ceph CSI side. https://github.com/ceph/ceph-csi/issues/2637 covers this in Ceph CSI.

Comment:

Created an epic for this: https://issues.redhat.com/browse/RHSTOR-3270

hchiramm, could you please prioritize this?
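Taken together, the thread converges on two cluster-side changes: raising `max_mds` so the file system has more than one active rank, and applying the distributed ephemeral pin to the "csi" subvolume group. Below is a minimal sketch of both steps, assuming the `ceph` CLI is reachable (e.g. from the toolbox pod) and using the file system name `cephfilesystem-a` from comment 6; the `runCeph` helper is purely illustrative, since Rook drives these commands through its own client package.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runCeph shells out to the `ceph` CLI. Illustrative only; Rook's real
// implementation wraps command execution in its cephclient package.
func runCeph(args ...string) error {
	out, err := exec.Command("ceph", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("ceph %v failed: %w (%s)", args, err, out)
	}
	return nil
}

func main() {
	// File system name taken from comment 6; adjust for your cluster.
	fsName := "cephfilesystem-a"

	// Step 1: allow two active MDS ranks. In Rook this is normally driven by
	// the CephFilesystem spec's activeCount, which ends up as `fs set max_mds`.
	if err := runCeph("fs", "set", fsName, "max_mds", "2"); err != nil {
		panic(err)
	}

	// Step 2: distributed ephemeral pin on the "csi" subvolume group, the
	// exact command Patrick gives in comment 6.
	if err := runCeph("fs", "subvolumegroup", "pin", fsName, "csi", "distributed", "1"); err != nil {
		panic(err)
	}
}
```

With the distributed policy on the group, the MDS spreads the subvolume (PV) subtrees across the active ranks, per the partitioning-policy documentation linked in [2].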
Comment:

Support has to be added in go-ceph and then consumed in Ceph CSI; we have trackers in the upstream repos for this.

Comment:

For the downstream side of things, which this BZ covers, this is not part of ODF 4.13.
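For the upstream plumbing mentioned above (go-ceph support consumed by Ceph CSI), the change would presumably surface as a subvolume-group pinning call on go-ceph's `FSAdmin`. The sketch below is only a guess at that shape: `PinSubVolumeGroup` and its signature are hypothetical and should be checked against the actual go-ceph tracker before use; the connection bootstrap uses the standard `rados` calls, and the arguments mirror `ceph fs subvolumegroup pin cephfilesystem-a csi distributed 1`.

```go
package main

import (
	"log"

	"github.com/ceph/go-ceph/cephfs/admin"
	"github.com/ceph/go-ceph/rados"
)

func main() {
	// Standard rados bootstrap; credential/config discovery is environment
	// specific and kept to the defaults here.
	conn, err := rados.NewConn()
	if err != nil {
		log.Fatal(err)
	}
	if err := conn.ReadDefaultConfigFile(); err != nil {
		log.Fatal(err)
	}
	if err := conn.Connect(); err != nil {
		log.Fatal(err)
	}
	defer conn.Shutdown()

	fsa := admin.NewFromConn(conn)

	// Hypothetical call: the exact name and signature depend on what lands in
	// go-ceph for the upstream tracker. The values mirror the CLI command
	// from comment 6: pin the "csi" group with the distributed policy.
	if err := fsa.PinSubVolumeGroup("cephfilesystem-a", "csi", "distributed", "1"); err != nil {
		log.Fatal(err)
	}
}
```

Once such an API exists, Ceph CSI (or Rook) can apply the pin when the subvolume group is created instead of shelling out to the `ceph` CLI.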