1662830 – [RFE] Enable parallel-readdir by default for all gluster volumes

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	core
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	bugs@gluster.org
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1510724
Blocks:

Reported:	2019-01-02 06:15 UTC by Raghavendra G
Modified:	2020-02-10 10:34 UTC
CC List:	3 users
Fixed In Version:
Clone Of:	1510724
Environment:
Last Closed:	2020-02-10 10:34:09 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gluster.org Gerrit	21973	0	None	Abandoned	performance/parallel-readdir: enable by default	2019-05-05 19:30:33 UTC

Comment 1 Raghavendra G 2019-01-02 06:22:16 UTC

For some performance data, see:
1. https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf
2. https://www.spinics.net/lists/gluster-users/msg34956.html
3. https://bugzilla.redhat.com/show_bug.cgi?id=1628807#c35

Comment 2 Raghavendra G 2019-01-02 06:42:18 UTC

Also see:
1. https://lists.gluster.org/pipermail/gluster-devel/2018-September/055419.html
2. https://lists.gnu.org/archive/html/gluster-devel/2013-09/msg00034.html

From a mail to gluster-devel titled "serialized readdir(p) across subvols and effect on performance"

<snip>
All,

As many of us are aware, readdir(p)s are serialized across DHT subvols. An intuitive first reaction to this algorithm is that readdir(p) is going to be slow.

However, this is only partly true. Reading the contents of a directory is normally split into multiple readdir(p) calls. Most of the time (that is, when a directory is large enough that the dentries and inode data on each subvol exceed a typical readdir(p) buffer size: 128K when readdir-ahead is enabled and 4KB on fuse when readdir-ahead is disabled), a single readdir(p) request is served from a single subvolume (or two subvolumes in the worst case). Hence a single readdir(p) is not serialized across all subvolumes.

Having said that, there are definitely cases where a single readdir(p) request is serialized across many subvolumes. The best example is a readdir(p) request on an empty directory. Other relevant examples are directories that don't have enough dentries on each DHT subvolume to fill a single readdir(p) buffer. This is where performance.parallel-readdir helps. Note that this is also why making the cache-size of each readdir-ahead instance (loaded as a parent of each DHT subvolume) much bigger than a single readdir(p) buffer size won't really improve performance in proportion to cache-size when performance.parallel-readdir is enabled.

This is not a new observation [1] (I stumbled upon [1] after independently realizing the above while working on performance.parallel-readdir). Still, it seems to be a common misconception (I recently ran into a similar argument while explaining DHT architecture to someone new to GlusterFS), so I thought I would write a mail to clarify it.


[1] https://lists.gnu.org/archive/html/gluster-devel/2013-09/msg00034.html

regards,
Raghavendra

</snip>
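
The argument in the mail above can be illustrated with a toy model. The sketch below is not the DHT implementation. It assumes that a single readdir(p) request keeps reading from successive subvolumes, one at a time, until its buffer is full or every subvolume has reached EOF. The function name, the fixed 256-byte entry size, and the per-subvolume entry counts are all hypothetical and chosen only for illustration. The 128K buffer size is the readdir-ahead figure quoted in the mail.

```python
from typing import List


def subvols_visited_per_request(entries_per_subvol: List[int],
                                entry_size: int,
                                buffer_size: int) -> List[int]:
    """Toy model of serialized readdir(p) across DHT subvolumes.

    Returns, for each simulated readdir(p) request, how many subvolumes
    were visited one after another to serve it. Assumes fixed-size
    entries, which real dentries are not.
    """
    capacity = buffer_size // entry_size  # entries that fit in one buffer
    if capacity < 1:
        raise ValueError("buffer must hold at least one entry")

    remaining = list(entries_per_subvol)
    visits = []
    subvol = 0
    while subvol < len(remaining):
        space = capacity
        visited = 0
        # Keep filling the buffer, moving to the next subvolume on EOF.
        while subvol < len(remaining) and space > 0:
            visited += 1
            taken = min(space, remaining[subvol])
            remaining[subvol] -= taken
            space -= taken
            if remaining[subvol] == 0:
                subvol += 1  # this subvolume is exhausted
        visits.append(visited)
    return visits


BUF = 128 * 1024   # readdir-ahead buffer size mentioned in the mail
ENTRY = 256        # hypothetical per-entry size in bytes

# Empty directory on 4 subvolumes: one request walks all of them.
print(subvols_visited_per_request([0, 0, 0, 0], ENTRY, BUF))        # [4]

# Large directory: each request touches at most two subvolumes.
print(max(subvols_visited_per_request([1000] * 4, ENTRY, BUF)))     # 2

# Small directory: too few dentries per subvolume to fill one buffer.
print(subvols_visited_per_request([10] * 4, ENTRY, BUF))            # [4]
```

Under these assumptions, only the empty and small directories cause a single request to be serialized across every subvolume, which is the case the mail says performance.parallel-readdir is meant to help.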

Comment 3 Worker Ant 2019-01-02 06:53:34 UTC

REVIEW: https://review.gluster.org/21973 (performance/parallel-readdir: enable by default) posted (#1) for review on master by Raghavendra G

Comment 5 Xavi Hernandez 2020-02-10 10:34:09 UTC

The patch is abandoned. I'm closing the bug for now.
