Bug 1492625 - Directory listings on fuse mount are very slow due to small number of getdents() entries
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: fuse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1478411 1644389
Blocks: 1499605 1522710 1529075
 
Reported: 2017-09-18 10:48 UTC by Raghavendra G
Modified: 2018-10-30 17:19 UTC (History)
7 users

Fixed In Version: glusterfs-4.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1478411
: 1499605 1522710 1529075 (view as bug list)
Environment:
Last Closed: 2018-03-15 11:17:56 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Raghavendra G 2017-09-18 10:48:54 UTC
+++ This bug was initially created as a clone of Bug #1478411 +++

I have a GlusterFS 3.10 volume and mounted it with the fuse mount (`mount -t glusterfs`), both on Linux.

On it I have a directory with 1 million files in it.

It takes very long to `find /that/directory`.

Using `strace`, I believe I discovered (at least part of) the reason:

1501854600.235524 getdents(4, /* 20 entries */, 131072) = 1048
1501854600.235727 getdents(4, /* 20 entries */, 131072) = 1032
1501854600.235922 getdents(4, /* 20 entries */, 131072) = 1032

Despite `find` issuing `getdents()` with a large buffer size of 128K, glusterfs always only fills in 20 directory entries.

Each of those takes a network roundtrip (seemingly).

I also strace'd the brick on the server, where everything seems fine: there, getdents() typically returns 631 entries, filling the 32 KB buffer that the brick implementation uses for getdents().

If `find` on the mount could also get ~631 entries per call, my directory listing would probably be 30x faster!

So it seems like _something_ in gluster or fuse caps the number of getdents results per call to roughly 20.

What could that be?

--- Additional comment from nh2 on 2017-08-04 10:02:07 EDT ---

Here's some more info that might hint at what the problem is:

Using the example program for getdents() from http://man7.org/linux/man-pages/man2/getdents.2.html and running it on my directory, I got this output (file names blacked out with "a"):

getdents(3, /* 16 entries */, 10240)    = 1552
--------------- nread=1552 ---------------
inode#    file type  d_reclen  d_off   d_name
-6047993476399939220  regular      88      17156  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-8239326137575567467  regular      88      17176  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-9058837543795989278  regular     112      17202  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-7474310353771725673  regular     112      17228  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-8618906312059539401  regular     112      17254  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-7247259159244687795  regular     112      17280  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-5665523655409565673  regular      88      17300  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-9046493272173795318  regular      88      17320  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-7953905749837767518  regular      88      17340  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-5289646910623071030  regular      88      17360  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-6314634794173123334  regular      88      17380  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-7954285670050863136  regular      88      17412  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-7849401699957688376  regular      88      17432  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-5006798607229018939  regular      88      17452  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-8323485281078848697  regular     112      17478  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-8119158990388255908  regular     112      17504  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
getdents(3, /* 20 entries */, 10240)    = 1056
--------------- nread=1056 ---------------
inode#    file type  d_reclen  d_off   d_name
-8208236586179861811  regular      48      17874  aaaaaaaaaaaaaaaaaaaaaaaaaa
-5119236985543845211  regular      56      17884  aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-5836971644108853015  regular      48      17894  aaaaaaaaaaaaaaaaaaaaaaaaaa
-9155148501485991780  regular      56      17904  aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-8305938675436910138  regular      56      17916  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-5221102094207784962  regular      56      17928  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-8599523819072935976  regular      48      17938  aaaaaaaaaaaaaaaaaaaaaaaaaa
-5829978250186564000  regular      56      17948  aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-5911118020253503871  regular      48      17958  aaaaaaaaaaaaaaaaaaaaaaaaaa
-6764000214102234557  regular      56      17968  aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-7204082494066266145  regular      56      17980  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-4637505561517422757  regular      48      17990  aaaaaaaaaaaaaaaaaaaaaaaaaa
-9108705813292338787  regular      56      18000  aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-6331907578899300543  regular      48      18010  aaaaaaaaaaaaaaaaaaaaaaaaaa
-6095357471175923268  regular      56      18020  aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-6954382210669410793  regular      56      18032  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-4974133016612012201  regular      48      18042  aaaaaaaaaaaaaaaaaaaaaaaaaa
-5903271582479185642  regular      56      18052  aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-6924142753799783732  regular      56      18064  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-6781216297939192739  regular      48      18074  aaaaaaaaaaaaaaaaaaaaaaaaaa


It seems that when the file names are longer (first block), getdents() returns fewer entries -- 16 in the case above instead of the usual 20.

So I wonder whether some fuse-related buffer is getting filled up, causing getdents() to return so few entries, and whether I can adjust it somehow.

--- Additional comment from nh2 on 2017-08-04 10:39:53 EDT ---

And some more patterns I observed stracing the `glusterfs` fuse process, using `-e 'read,writev'`:

[pid 18266] 1501856999.667820 read(10, "\27\3\3\0\34", 5) = 5
[pid 18266] 1501856999.668085 read(10, "\357U>\245\325n[t\3\277hq!Z\303\32\247\334\336\327N\311\317s\252\267\2\2", 28) = 28
[pid 18266] 1501856999.668317 read(10, "\27\3\3\0000", 5) = 5
[pid 18266] 1501856999.668411 read(10, "\357U>\245\325n[u8\340\250\214\305\7&/\331=\320\214\326\340\227\16\225@c\252\307\213\211V"..., 48) = 48
[pid 18266] 1501856999.668549 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.668597 read(10, "\357U>\245\325n[v\232\225\22/Jk\237\212\363b\215\212S\255\262K\227\347\6\275V-&E"..., 16408) = 16408
[pid 18266] 1501856999.668669 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.668719 read(10, "\357U>\245\325n[wz\237\v\377\252\236'\356\265\37Z\341\241_m\341\2612\346+Dm\224\233"..., 16408) = 16408
[pid 18266] 1501856999.668810 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.668862 read(10, "\357U>\245\325n[x\226\222\222\274\275\332D\304\3271\335M\340\300wq\210\200suU\372\326\17"..., 16408) = 16408
[pid 18266] 1501856999.668941 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.668988 read(10, "\357U>\245\325n[y\216i$\276\237\vA1\33\31:\312\257g\323\221\227\r^\21R/\3713"..., 16408) = 16408
[pid 18266] 1501856999.669093 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.669140 read(10, "\357U>\245\325n[z\205\224\361D\225V%\t\0tk\274K\3\2530U\202\311\222A\335G\266"..., 16408) = 16408
[pid 18266] 1501856999.669216 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.669265 read(10, "\357U>\245\325n[{\204\276\253\272g\354\376\207hPe\22\300\3771\30\313\336,\2729pgn"..., 16408) = 16408
[pid 18266] 1501856999.669345 read(10, "\27\3\3:L", 5) = 5
[pid 18266] 1501856999.669392 read(10, "\357U>\245\325n[|4mQ\334\227\4\206\274 \273E<?mb\334\255\210Q/\350Z\351w"..., 14924) = 14924
[pid 18266] 1501856999.673053 writev(8, [{iov_base="p\16\0\0\0\0\0\0\333B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.673469 writev(8, [{iov_base="p\16\0\0\0\0\0\0\334B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.673802 writev(8, [{iov_base="p\16\0\0\0\0\0\0\335B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.674173 writev(8, [{iov_base="p\16\0\0\0\0\0\0\336B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.674528 writev(8, [{iov_base="p\16\0\0\0\0\0\0\337B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.674873 writev(8, [{iov_base="p\16\0\0\0\0\0\0\340B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.675237 writev(8, [{iov_base="p\16\0\0\0\0\0\0\341B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.675539 writev(8, [{iov_base="p\16\0\0\0\0\0\0\342B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.675887 writev(8, [{iov_base="p\16\0\0\0\0\0\0\343B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.676248 writev(8, [{iov_base="p\16\0\0\0\0\0\0\344B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.676576 writev(8, [{iov_base="p\16\0\0\0\0\0\0\345B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.676893 writev(8, [{iov_base="p\16\0\0\0\0\0\0\346B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.677159 writev(8, [{iov_base="p\16\0\0\0\0\0\0\347B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.677254 writev(8, [{iov_base="p\16\0\0\0\0\0\0\350B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.677344 writev(8, [{iov_base="p\16\0\0\0\0\0\0\351B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.677436 writev(8, [{iov_base="p\16\0\0\0\0\0\0\352B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.677533 writev(8, [{iov_base="p\16\0\0\0\0\0\0\353B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.677633 writev(8, [{iov_base="p\16\0\0\0\0\0\0\354B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.677730 writev(8, [{iov_base="p\16\0\0\0\0\0\0\355B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.677826 writev(8, [{iov_base="p\16\0\0\0\0\0\0\356B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.677940 writev(8, [{iov_base="p\16\0\0\0\0\0\0\357B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678049 writev(8, [{iov_base="p\16\0\0\0\0\0\0\360B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678152 writev(8, [{iov_base="p\16\0\0\0\0\0\0\361B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678244 writev(8, [{iov_base="p\16\0\0\0\0\0\0\362B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678338 writev(8, [{iov_base="p\16\0\0\0\0\0\0\363B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678432 writev(8, [{iov_base="p\16\0\0\0\0\0\0\364B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678532 writev(8, [{iov_base="p\16\0\0\0\0\0\0\365B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678630 writev(8, [{iov_base="p\16\0\0\0\0\0\0\366B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678726 writev(8, [{iov_base="p\16\0\0\0\0\0\0\367B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678817 writev(8, [{iov_base="p\16\0\0\0\0\0\0\370B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678912 writev(8, [{iov_base="p\16\0\0\0\0\0\0\371B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.678999 writev(8, [{iov_base="p\16\0\0\0\0\0\0\372B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.679099 writev(8, [{iov_base="p\16\0\0\0\0\0\0\373B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18266] 1501856999.936511 read(10, "\27\3\3\0\34", 5) = 5
[pid 18266] 1501856999.936797 read(10, "\357U>\245\325n[}\343\35\272\266C\323\r\226_\362\275\372\355\1\275\367\177\221]\341", 28) = 28
[pid 18266] 1501856999.937045 read(10, "\27\3\3\0000", 5) = 5
[pid 18266] 1501856999.937188 read(10, "\357U>\245\325n[~]l\37\270\336 L\311~~p\t\260\200\242\275~\331%\310\26UX\210"..., 48) = 48
[pid 18266] 1501856999.937291 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.937342 read(10, "\357U>\245\325n[\177\0U\331\214\236\344cy\10\276\266\322\3447h\2\2668\347\266\20\6JM"..., 16408) = 16408
[pid 18266] 1501856999.937401 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.937436 read(10, "\357U>\245\325n[\200\250\372\274)\2\307\227\33_\221\3639\222\2059aI\340<r~\306rb"..., 16408) = 16408
[pid 18266] 1501856999.937498 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.937533 read(10, "\357U>\245\325n[\2018F\0365\336D(O\211\3370!\v\235\271\0275v\231-\2339^\253"..., 16408) = 16408
[pid 18266] 1501856999.937594 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.937628 read(10, "\357U>\245\325n[\202\353`\344\4\17-/\372\204[\277A\251\310n\2250\32S\276!|\361\333"..., 16408) = 16408
[pid 18266] 1501856999.937680 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.937715 read(10, "\357U>\245\325n[\203qbev\336\305\3750O\307\221U\367 @\262\202[p1\347\231\305\2"..., 16408) = 16408
[pid 18266] 1501856999.937770 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501856999.937819 read(10, "\357U>\245\325n[\204\242\245Y\177\302\251\316u\301\354zR>3{D\6gc\365\302\277un"..., 16408) = 16408
[pid 18266] 1501856999.937871 read(10, "\27\3\3:T", 5) = 5
[pid 18266] 1501856999.937905 read(10, "\357U>\245\325n[\205y}X\3221\325\21\275\321\330\371\353\310\362\21\36}Q\352\203\321\350\1\373"..., 14932) = 14932
[pid 18266] 1501856999.940995 writev(8, [{iov_base="p\16\0\0\0\0\0\0\374B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.941429 writev(8, [{iov_base="p\16\0\0\0\0\0\0\375B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.941725 writev(8, [{iov_base="p\16\0\0\0\0\0\0\376B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.941968 writev(8, [{iov_base="p\16\0\0\0\0\0\0\377B\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.942312 writev(8, [{iov_base="p\16\0\0\0\0\0\0\0C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.942604 writev(8, [{iov_base="p\16\0\0\0\0\0\0\1C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.942850 writev(8, [{iov_base="p\16\0\0\0\0\0\0\2C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.943087 writev(8, [{iov_base="p\16\0\0\0\0\0\0\3C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.943352 writev(8, [{iov_base="p\16\0\0\0\0\0\0\4C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.943618 writev(8, [{iov_base="p\16\0\0\0\0\0\0\5C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.943896 writev(8, [{iov_base="p\16\0\0\0\0\0\0\6C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.944181 writev(8, [{iov_base="p\16\0\0\0\0\0\0\7C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.944434 writev(8, [{iov_base="p\16\0\0\0\0\0\0\10C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.944665 writev(8, [{iov_base="p\16\0\0\0\0\0\0\tC\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.944917 writev(8, [{iov_base="p\16\0\0\0\0\0\0\nC\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.945182 writev(8, [{iov_base="p\16\0\0\0\0\0\0\vC\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.945445 writev(8, [{iov_base="p\16\0\0\0\0\0\0\fC\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.945695 writev(8, [{iov_base="p\16\0\0\0\0\0\0\rC\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.945962 writev(8, [{iov_base="p\16\0\0\0\0\0\0\16C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.946222 writev(8, [{iov_base="p\16\0\0\0\0\0\0\17C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.946467 writev(8, [{iov_base="p\16\0\0\0\0\0\0\20C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.946713 writev(8, [{iov_base="p\16\0\0\0\0\0\0\21C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.946955 writev(8, [{iov_base="p\16\0\0\0\0\0\0\22C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.947218 writev(8, [{iov_base="p\16\0\0\0\0\0\0\23C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.947446 writev(8, [{iov_base="p\16\0\0\0\0\0\0\24C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.947674 writev(8, [{iov_base="p\16\0\0\0\0\0\0\25C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.947904 writev(8, [{iov_base="p\16\0\0\0\0\0\0\26C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.948156 writev(8, [{iov_base="p\16\0\0\0\0\0\0\27C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.948446 writev(8, [{iov_base="p\16\0\0\0\0\0\0\30C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.948694 writev(8, [{iov_base="p\16\0\0\0\0\0\0\31C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.948943 writev(8, [{iov_base="p\16\0\0\0\0\0\0\32C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.949196 writev(8, [{iov_base="p\16\0\0\0\0\0\0\33C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501856999.949442 writev(8, [{iov_base="p\16\0\0\0\0\0\0\34C\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696


The read() syscalls that fetch the file names from the gluster server happen in bursts, roughly 300 ms apart. (I'm using SSL, so the file names aren't visible in the read calls.) They arrive in 16 KB buffers (with some strange 5-byte read()s in between -- not sure what those are for), and judging by the timings they don't seem to be blocked on network roundtrips, so that's good.

But the roughly 4 KB writev() syscalls each seem to have a network roundtrip in between. That seems strange to me, given that we've just read() all the data in a batch. What's going on there?

--- Additional comment from nh2 on 2017-08-04 10:46:38 EDT ---

And for completeness, here's the unfiltered strace that shows that in between the writev()s there's only futex() and readv(). The readv() uses the same FD as the writev(), so that's probably localhost fuse communication.

[pid 18266] 1501857629.747928 read(10, "\357U>\245\325n\200#\360\22/9\370\205lL\322\226gk\233\255\2633[\10R\34j\334,'"..., 16408) = 16408
[pid 18266] 1501857629.747975 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501857629.748012 read(10, "\357U>\245\325n\200$^\213\37\351\tcZ\273\352\5k_\311'\345s\261\21\37:%\364\315\227"..., 16408) = 16408
[pid 18266] 1501857629.748060 read(10, "\27\3\3@\30", 5) = 5
[pid 18266] 1501857629.748091 read(10, "\357U>\245\325n\200%\211I\353\304\252\260\256\250t\257\247Z\5\215\7q3\232\236\217\277\373Y-"..., 16408) = 16408
[pid 18266] 1501857629.748137 read(10, "\27\3\3:P", 5) = 5
[pid 18266] 1501857629.748167 read(10, "\357U>\245\325n\200&\3363W\353\313\273b\344/\20\230\305\265#7\30782\371e\221\365\221\17"..., 14928) = 14928
[pid 18266] 1501857629.750987 writev(8, [{iov_base="p\16\0\0\0\0\0\0\331\311\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18269] 1501857629.751199 <... readv resumed> [{iov_base="P\0\0\0,\0\0\0\332\311\0\0\0\0\0\0\0.\0d\f\177\0\0\0\0\0\0\0\0\0\0"..., iov_len=80}, {iov_base="", iov_len=131072}], 2) = 80
[pid 18269] 1501857629.751321 futex(0x7f0d0c036ab4, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 18266] 1501857629.751346 poll([{fd=13, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}, {fd=10, events=POLLIN|POLLPRI|POLLERR|POLLHUP|POLLNVAL}], 2, -1 <unfinished ...>
[pid 18269] 1501857629.751373 <... futex resumed> ) = 1
[pid 18224] 1501857629.751387 <... futex resumed> ) = 0
[pid 18269] 1501857629.751398 readv(8,  <unfinished ...>
[pid 18224] 1501857629.751410 futex(0x7f0d0c036a60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 18224] 1501857629.751459 writev(8, [{iov_base="p\16\0\0\0\0\0\0\332\311\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696
[pid 18224] 1501857629.751536 futex(0x7f0d0c036ab0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1501857749, tv_nsec=0}, 0xffffffff <unfinished ...>
[pid 18269] 1501857629.751619 <... readv resumed> [{iov_base="P\0\0\0,\0\0\0\333\311\0\0\0\0\0\0\0.\0d\f\177\0\0\0\0\0\0\0\0\0\0"..., iov_len=80}, {iov_base="", iov_len=131072}], 2) = 80
[pid 18269] 1501857629.751672 futex(0x7f0d0c036ab0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 18224] 1501857629.751706 <... futex resumed> ) = 0
[pid 18269] 1501857629.751715 readv(8,  <unfinished ...>
[pid 18224] 1501857629.751726 futex(0x7f0d0c036a60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 18224] 1501857629.751772 writev(8, [{iov_base="p\16\0\0\0\0\0\0\333\311\0\0\0\0\0\0", iov_len=16}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=3680}], 2) = 3696

--- Additional comment from Raghavendra G on 2017-09-06 00:00:28 EDT ---

getdents() returning a small number of entries looks like a duplicate of bz 1356453.

--- Additional comment from Csaba Henk on 2017-09-06 02:07:11 EDT ---



--- Additional comment from nh2 on 2017-09-06 19:01:18 EDT ---

This bug is NOT fixed with Gluster 3.12, I just tried it:

$ gluster --version
glusterfs 3.12.0

$ for x in `seq 1 1000`; do echo $x; touch /myglustermountdir/$x; done

$ strace find /myglustermountdir
...
getdents(4, /* 24 entries */, 131072)   = 576
getdents(4, /* 23 entries */, 131072)   = 552
getdents(4, /* 23 entries */, 131072)   = 552
getdents(4, /* 23 entries */, 131072)   = 552
...

--- Additional comment from Poornima G on 2017-09-15 03:38:42 EDT ---

(In reply to nh2 from comment #6)
> This bug is NOT fixed with Gluster 3.12, I just tried it:
> 
> $ gluster --version
> glusterfs 3.12.0
> 
> $ for x in `seq 1 1000`; do echo $x; touch /myglustermountdir/$x; done
> 
> $ strace find /myglustermountdir
> ...
> getdents(4, /* 24 entries */, 131072)   = 576
> getdents(4, /* 23 entries */, 131072)   = 552
> getdents(4, /* 23 entries */, 131072)   = 552
> getdents(4, /* 23 entries */, 131072)   = 552
> ...


Thanks for the data. So here is my analysis and possible solution:

1. You are right: even after bug 1356453 is fixed, getdents() only fetches 20-26 entries.

2. The reason is the behaviour of the FUSE kernel module:
   Regardless of the buffer size the application passes to getdents(), FUSE caps the readdir buffer at PAGE_SIZE (4096 bytes) before forwarding the request to gluster; gluster's getdents buffer is thus effectively limited to 4 KB. But if you set the getdents buffer size to 4 KB and run the same test on plain XFS, ~128 entries are returned -- far more than the 20-26 returned by FUSE/Gluster. The reason is that FUSE fills its buffer with (struct fuse_direntplus) records, while the buffer size the application specifies counts (struct linux_dirent) records. sizeof(struct fuse_direntplus) is ~158 and sizeof(struct linux_dirent) is ~24, so roughly 5 times fewer dentries fit into FUSE's 4 KB getdents buffer than into the application's.

   Thus we are limited by both the FUSE buffer size and the structure type.

3. The saving grace is that each getdents call does not incur a network roundtrip, as you assumed in your initial comment. This is because of readdir-ahead caching in gluster: gluster pre-fetches dentries and keeps them in a cache. The pre-fetch buffer size is 128 KB and the cache size is 10 MB by default. Thus 99% of getdents calls are served from this cache and don't result in a network roundtrip. So even if FUSE lifts the getdents buffer limit, you may not see the 30x improvement you expected -- only a modest one, from saving many getdents calls and kernel-to-user context switches.


I would suggest we close this bug on gluster and instead raise it against the FUSE kernel module?

As an alternative, we could implement glfs_getdents in libgfapi (the syscall-equivalent API of GlusterFS, which bypasses the FUSE layer entirely); it could then be integrated into other applications, or we could write a wrapper around it and provide it as a command.

--- Additional comment from nh2 on 2017-09-15 09:53:27 EDT ---

Thanks for your very insightful reply!

I think we should definitely bring this up as a FUSE issue; I can imagine gluster and other FUSE-based software would have much better performance if they weren't limited to 4 KB per syscall.

Do you know where this PAGE_SIZE limit is implemented, or would you even be able to file this issue? I know very little about fuse internals and don't feel prepared to make a high quality issue report on this topic with them yet.

> each getdent call is not a network round trip like you mentioned in initial comment

You are right: as a test, I just increased latency tenfold with `tc qdisc add dev eth0 root netem delay 2ms`, and the duration of getdents() calls stayed the same (with an occasional, much slower getdents() call when it had to fetch new data).

I assumed it was a network roundtrip because the time spent per syscall roughly matched my LAN roundtrip time (0.2 ms), but that just happened to be how slow the syscalls were, independent of the network.

> I would suggest we close this bug on gluster, and raise it in FUSE kernel?

Personally I would prefer to keep it open until getdents() performance is fixed; from a Gluster user's perspective, slow directory listings are a Gluster problem, and the fact that FUSE plays a role in it is an implementation detail.
Also, as you say, FUSE allowing larger buffer sizes may not be the only thing needed to improve performance.

I did a couple more measurements that suggest that there are still large integer factors unexplained:

Using `strace -f -T -e getdents` on the example program from `man getdents` (http://man7.org/linux/man-pages/man2/getdents.2.html) with BUF_SIZE changed from 1024 to 10240 and 131072 (128K), running it against XFS and the fuse mount like:

  $ gcc getdents-listdir.c -O2 -o listdir
  $ strace -f -T -e getdents ./listdir /data/brick/.../mydir > /dev/null
  $ strace -f -T -e getdents ./listdir /fuse-mount/.../mydir > /dev/null

With `strace -T`, the values in <brackets> are the time spent in the syscall.

Results for BUF_SIZE = 10240:

gluster fuse: getdents(3, /*  20 entries */, 10240) =  1040 <0.000199>
XFS:          getdents(3, /* 195 entries */, 10240) = 10240 <0.000054>

Results for BUF_SIZE = 131072:

gluster fuse: getdents(3, /*   20 entries */, 131072) =   1040 <0.000199>
XFS:          getdents(3, /* 2498 entries */, 131072) = 131072 <0.000620>

This shows that, almost independent of BUF_SIZE, computing bytes per time,

* getdents() performance on XFS is around 190 MB/s
* getdents() performance on gluster fuse is around 5 MB/s

That's almost a 40x performance difference (and as you say, no networking is involved).

Even when taking into account the mentioned 5x space overhead of `fuse_direntplus` vs `linux_dirent`, and assuming that 5x space overhead means 5x increased wall time, there's a factor 8x being lost.

Why might an individual getdents() call be that much slower on fuse?

--- Additional comment from nh2 on 2017-09-16 20:01:08 EDT ---

I have now written a kernel patch for fuse that makes the readdir use 32 pages (128 KiB) instead of 1 page (4 KiB):

  https://github.com/nh2/linux/commit/bedbf74a1a6d2958e719ed77a6a4f0baac79deab

(on my branch https://github.com/nh2/linux/compare/v4.9-fuse-large-readdir).

For sshfs (another FUSE program) this brings an immediate improvement:

With 1M files prepared like this:

  
  mkdir sshfsmount-1M sshfsdir-1M
  cd sshfsdir-1M
  seq 1 1000000 | xargs touch
  cd ..
  sshfs localhost:sshfsdir-1M sshfsmount-1M


and my example program adapted from the getdents() man page (code here: https://gist.github.com/nh2/6ebd9d5befe130fd6faacd1024ead3d7) I get an immediate improvement for `./getdents-silent sshfsmount-1M/`:

Without kernel patch (1 page): 0.267021
With 32-page kernel patch:     0.179651

(You have to run twice in quick succession to get these results, because sshfs discards its cache very quickly and we want to measure FUSE syscall overhead, not it fetching the data over sshfs. If you wait too long it may take a minute for a full fetch.)

That's a 1.5x speedup for the entire program run; but because sshfs does some initialisation work, we should look at the actual strace outputs instead:

Without kernel patch (1 page):

# strace -tttT -f -e getdents ./listdir-silent sshfsmount-1M/
1505605898.572720 getdents(3, /* 128 entries */, 131072) = 4064 <47.414827>
1505605945.992171 getdents(3, /* 128 entries */, 131072) = 4096 <0.000036>
1505605945.992311 getdents(3, /* 128 entries */, 131072) = 4096 <0.000031>

With 32-page kernel patch:

# strace -tttT -f -e getdents ./listdir-silent sshfsmount-1M/
1505605890.435250 getdents(3, /* 4096 entries */, 131072) = 130776 <60.054614>
1505605950.494406 getdents(3, /* 4096 entries */, 131072) = 130736 <0.000153>
1505605950.494693 getdents(3, /* 4096 entries */, 131072) = 130728 <0.000157>

Here you can see first the initial fetching work (which depends on what is in SSHFS's cache at that point), and then the real syscalls.

Using 32 pages has increased the bytes per syscall by 32x, and the time by ~5x, so it's approximately:

  6x faster

So landing such a patch seems beneficial to remove syscall overhead.

It would certainly help SSHFS cached directory listings.

Next, back to gluster.

--- Additional comment from Csaba Henk on 2017-09-16 20:38:24 EDT ---

Nice work! I suggest submitting it to the fuse-devel ML.

* * *

As for glusterfs, another possible issue came up. The buffer that a given xlator fills with dirents is of fixed size (during the handling of a given readdir[p] fop). However, various xlators operate with various dirent flavors (e.g. posix with system dirents, fuse with FUSE dirents), so when the dirent-holding buffer is passed around between xlators, the converted dirents won't optimally fill the next xlator's buffer. In practice, the FUSE dirent is bigger, so not all of the entries received from the underlying xlator will fit in fuse's dirent buffer after conversion. The rest will be discarded, and will have to be read again on the next getdents call.

> 3. But the saving grace is, each getdent call is not a network round trip
> like you mentioned in initial comment. This is because of readdir-ahead
> caching in gluster.

Alas, the phenomenon described above defeats this too: because of the re-read constraint, the directory offsets of subsequent getdents() calls won't be monotonic, and when readdir-ahead detects that, it deactivates itself.

Whether, and to what extent, this occurs may depend on the configuration.

@nh2: Therefore we'd like to kindly ask you to share your volume info and TRACE-level logs, from mount time until you observe the small getdents() calls.

--- Additional comment from nh2 on 2017-09-16 21:57:39 EDT ---

So for SSHFS we got an improvement from ~113 MB/s to ~854 MB/s getdents() performance.

Now running a similar test on gluster, for time's sake with only 10K files.


My gluster config:

  Volume Name: myvol
  Type: Replicate
  Volume ID: ...
  Status: Started
  Snapshot Count: 0
  Number of Bricks: 1 x 3 = 3
  Transport-type: tcp
  Bricks:
  Brick1: 10.0.0.1:/data/brick
  Brick2: 10.0.0.2:/data/brick
  Brick3: 10.0.0.3:/data/brick
  Options Reconfigured:
  nfs.disable: on
  transport.address-family: inet
  client.ssl: on
  server.ssl: on
  storage.linux-aio: on
  performance.io-thread-count: 64
  performance.readdir-ahead: on
  server.event-threads: 32
  client.event-threads: 32
  server.outstanding-rpc-limit: 64
  cluster.lookup-unhashed: auto
  performance.flush-behind: on
  performance.strict-write-ordering: off
  performance.high-prio-threads: 64
  performance.normal-prio-threads: 64
  performance.low-prio-threads: 64
  performance.write-behind-window-size: 10MB
  cluster.ensure-durability: on
  performance.lazy-open: yes
  cluster.use-compound-fops: off
  performance.open-behind: on
  features.cache-invalidation: off
  performance.quick-read: off
  performance.read-ahead: off
  performance.stat-prefetch: off
  changelog.rollover-time: 1
  geo-replication.indexing: on
  geo-replication.ignore-pid-check: on
  changelog.changelog: on

Files created like

  # touch /glustermount/largenames-10k/1234567890123456789012345678901234567890-file{1..10000}

Also doing quickly repeated runs to obtain these numbers.
All of these benchmarks were taken on an AWS t2.small medium machine (these machines are generally bad for benchmarking, but I've checked that the results are consistent; and importantly the numbers are relative to each other).

Without kernel patch (1 page):

  # strace -w -c -f -e getdents ./listdir-silent /glustermount/largenames-10k/
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.213868         384       557           getdents

With 32-page kernel patch:

  # strace -w -c -f -e getdents ./listdir-silent /glustermount/largenames-10k
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.211732       11763        18           getdents

Almost no improvement!

Let's look at the individual getdents() invocations:

Without kernel patch (1 page):

  # strace -Tttt -f -e getdents ./listdir-silent /glustermount/largenames-10k/
  1505608150.612771 getdents(3, /* 19 entries */, 131072) = 1272 <0.007789>
  -- notice large initial time above --
  1505608150.620606 getdents(3, /* 18 entries */, 131072) = 1296 <0.000117>
  1505608150.620738 getdents(3, /* 18 entries */, 131072) = 1296 <0.000090>
  1505608150.620842 getdents(3, /* 18 entries */, 131072) = 1296 <0.000084>
  -- 30 similar lines omitted --
  1505608150.623666 getdents(3, /* 18 entries */, 131072) = 1296 <0.010920>
  -- notice large time above --
  1505608150.634608 getdents(3, /* 18 entries */, 131072) = 1296 <0.000079>
  1505608150.634701 getdents(3, /* 18 entries */, 131072) = 1296 <0.000083>

With 32-page kernel patch:

  # strace -Tttt -f -e getdents ./listdir-silent /glustermount/largenames-10k/
  1505608076.391872 getdents(3, /* 604 entries */, 131072) = 43392 <0.022552>
  1505608076.414921 getdents(3, /* 602 entries */, 131072) = 43344 <0.001411>
  1505608076.416477 getdents(3, /* 601 entries */, 131072) = 43272 <0.025793>
  1505608076.442688 getdents(3, /* 599 entries */, 131072) = 43128 <0.000730>
  1505608076.443521 getdents(3, /* 601 entries */, 131072) = 43272 <0.024431>
  1505608076.468360 getdents(3, /* 599 entries */, 131072) = 43128 <0.001108>
  -- 12 similar lines omitted --

Observations:

1) Every second getdents() in the 32-page case is super fast. I suspect that this is the effect of `readdir-ahead: on` (wasn't that also 128 KB, I read somewhere?). Same thing for the 1-page case, where only every ~30th getdents() hits the network to glusterfsd.
2) Even in the 32-page case, the buffer provided to getdents() (131072 bytes) is not filled entirely; only approximately 1/3 is used. Same thing for the 1-page case: about 1/3 of the 4K FUSE buffer is filled. When I first saw this I attributed it to the "fuse_direntplus vs linux_dirent" topic that Poornima mentioned, but SSHFS doesn't seem to have this problem and always fills its buffers, so maybe gluster is doing something strange here?
3) The performance is at a miserable ~1.7 MB/s in both cases, much worse than sshfs.

So I started investigating what it (the glusterfs FUSE mount process) is doing.

First thing I noticed: I was wrong when I said

> and as you say, no networking is involved

after doing my test with `tc`: I had only tested that no inter-brick, or FUSE to other-brick networking was happening.

But what is certainly happening is localhost networking!

If I use

  # tc qdisc add dev lo root netem delay 200ms

thus making my `ping localhost` be 400 ms, then all getdents() calls get significantly slower:

  # strace -Tttt -f -e getdents ./listdir-silent /mount/glustermount-10k
  1505580070.060286 getdents(3, /* 604 entries */, 131072) = 43392 <0.824180>
  1505580070.885190 getdents(3, /* 602 entries */, 131072) = 43344 <0.000909>
  1505580070.886592 getdents(3, /* 601 entries */, 131072) = 43272 <0.824842>
  1505580071.711986 getdents(3, /* 599 entries */, 131072) = 43128 <0.000693>
  1505580071.713164 getdents(3, /* 601 entries */, 131072) = 43272 <0.826009>
  1505580072.539722 getdents(3, /* 599 entries */, 131072) = 43128 <0.000702>

So there is networking happening, in my case it just happened over the `lo` interface instead of over `eth0` because the machine on which I have the mount is also one of the 3-replica bricks!

A quick `ping localhost` shows that localhost latency is ~50 us; I find that is a lot (but apparently that's just how it is), and it's probably not very relevant here (at least the following strace doesn't indicate so on the userspace level).

So I ran strace against the `glusterfs` FUSE process:

  # strace -tttT -f -p THEPID

What I could see was a FUSE request/response loop:
There is a readv() against /dev/fuse started, with which the process asks fuse for the next operation (e.g. a READDIR opcode); this readv() blocks as long as the FUSE kernel side is still waiting on us, the userspace process, to send it the result of the currently running operation. Once we send it (e.g. the contents of a readdir response) with a writev(), the readv() returns and gives us the next opcode to work on.

As a result, the fulfillment of each getdents() request is bracketed between a `<... readv resumed>` and a final `writev(8, ...)` syscall.

I have posted one of those brackets at

  https://gist.github.com/nh2/163ffea5bdc16b3a509c4b262b1d382a

Each such getdents() fulfillment takes ~29 ms. I find that quite a lot just to fill in ~40000 bytes as above, so I analysed the major time sinks within these 29 ms in the strace output of the `glusterfs` FUSE mount process. You can find that analysis here:

  https://gist.github.com/nh2/163ffea5bdc16b3a509c4b262b1d382a#gistcomment-2205164

Quite some CPU time is spent (again, seemingly way too much for providing 40000 bytes), but also significant time is spent waiting on communication with the glusterfsd socket (which is FD 10), with poll() and an apparently blocking write().

(There are also some precisely sized reads, such as those 5-byte reads, for which I wonder if they could be combined into larger reads to reduce the number of syscalls, but they are so fast compared to the rest that's going on that they don't matter for the current investigation.)

As there's lots of waiting for glusterfsd in there, I started to strace glusterfsd instead, while running `./listdir-silent` against the fuse mount.

That immediately revealed why getdents() on glusterfs is so much slower than on SSHFS (sorry long lines):

  [pid   972] 1505589830.074176 lstat("/data/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/largenames-10k/1234567890123456789012345678901234567890-file7552", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000007>
  [pid   972] 1505589830.074219 lgetxattr("/data/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/largenames-10k/1234567890123456789012345678901234567890-file7552", "tr"..., 0x7f5bf697e370, 255) = -1 ENODATA (No data available) <0.000008>
  [pid   972] 1505589830.074262 lgetxattr("/data/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/largenames-10k/1234567890123456789012345678901234567890-file7552", "tr"..., 0x7f5bf697e370, 255) = -1 ENODATA (No data available) <0.000007>
  [pid   972] 1505589830.074302 lgetxattr("/data/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/largenames-10k/1234567890123456789012345678901234567890-file7552", "tr"..., 0x7f5bf697e370, 255) = -1 ENODATA (No data available) <0.000006>
  [pid   972] 1505589830.074343 lgetxattr("/data/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/largenames-10k/1234567890123456789012345678901234567890-file7552", "sy"..., 0x7f5bf697e370, 255) = -1 ENODATA (No data available) <0.000007>
  [pid   972] 1505589830.074383 lgetxattr("/data/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/largenames-10k/1234567890123456789012345678901234567890-file7552", "sy"..., 0x7f5bf697e370, 255) = -1 ENODATA (No data available) <0.000007>
  [pid   972] 1505589830.074423 lgetxattr("/data/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/largenames-10k/1234567890123456789012345678901234567890-file7552", "tr"..., 0x7f5bf697e370, 255) = -1 ENODATA (No data available) <0.000007>
  [pid   972] 1505589830.074464 lstat("/data/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001/largenames-10k/1234567890123456789012345678901234567890-file7553", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000006>
  -- same lgetxattr() calls as above repeated --

Apparently when gluster tries to fulfil a getdents() FUSE request, each individual file returned by my XFS brick file system is stat()ed afterwards and also gets 6 lgetxattr() calls (all returning ENODATA).

I guess that's just part of how gluster works (e.g. to determine, based on xattrs, whether a file shall actually be shown in the output or not), but it wasn't obvious to me when I started debugging this, and it certainly makes things much slower than just forwarding the underlying XFS getdents() results.

As mentioned in my Github gist from before, in contrast to stracing `glusterfs`, when stracing `glusterfsd` the strace overhead becomes very noticeable (the ./listdir-silent run becomes 5x slower when strace is attached), and the above -ttt times are not accurate, as they incorporate this strace overhead (you can see that the <...> times emitted by -T, roughly 6 us each, don't add up to the ~40 us steps between the -ttt timestamps).

So I switched to measure with `perf` instead, using e.g.

  # perf record -e 'syscalls:sys_*' -p 918

which had 6x overhead for an entire ./listdir-silent run over the 10k files, and then, for lower overhead, recorded only specific syscalls (by then I already knew which syscalls were going on, so I couldn't miss any in between),

  # perf record -e 'syscalls:sys_enter_newlstat' -e 'syscalls:sys_exit_newlstat' -e 'syscalls:sys_enter_lgetxattr' -e 'syscalls:sys_exit_lgetxattr' -p 918
  # perf script

which shows much less overhead in the profile:

  26239.494406:  syscalls:sys_enter_newlstat
  26239.494408:   syscalls:sys_exit_newlstat
  26239.494409: syscalls:sys_enter_lgetxattr
  26239.494411:  syscalls:sys_exit_lgetxattr
  26239.494412: syscalls:sys_enter_lgetxattr
  26239.494413:  syscalls:sys_exit_lgetxattr
  26239.494414: syscalls:sys_enter_lgetxattr
  26239.494416:  syscalls:sys_exit_lgetxattr
  26239.494417: syscalls:sys_enter_lgetxattr
  26239.494418:  syscalls:sys_exit_lgetxattr
  26239.494419: syscalls:sys_enter_lgetxattr
  26239.494420:  syscalls:sys_exit_lgetxattr
  26239.494421: syscalls:sys_enter_lgetxattr
  26239.494423:  syscalls:sys_exit_lgetxattr
  26239.494424:  syscalls:sys_enter_newlstat
  26239.494426:   syscalls:sys_exit_newlstat

and also increased total run time only by ~2x to ~0.5 seconds.

I suspect that the profile shows that the amount of syscalls gluster makes on each file (and the fact that it has to make per-file syscalls for getdents()) is somewhat problematic: Each file eats ~18 us, that's roughly 50K files per second, or 100K per second assuming 2x profiling overhead.
Even if there was no networking involved at all, just stat()ing & getfattr()ing the 10K files would take 0.1 seconds, which is half of the 0.2 seconds that the entire ./listdir-silent takes.

I wonder if some future version of Gluster should not store all its metadata info in extended attributes, as putting it there requires lots of syscalls to retrieve it? I guess this is one reason for Gluster's low small-file and directory performance?

I suspect that if it had all this info in memory, or could obtain it via a few-system-calls-large-reads method (e.g. a single index file or even mmap), it could do better at this.

--- Additional comment from nh2 on 2017-09-16 21:59:41 EDT ---

@Csaba: FYI the above post wasn't written in reply to yours, I was still typing more stuff down (and have one more post coming), will take a look at your reply immediately afterwards! :)

--- Additional comment from nh2 on 2017-09-16 22:02:18 EDT ---

Another weird thing that I haven't demystified yet:


  # strace -wc -f -e getdents,lstat ls -1U /glustermount/largenames-10k/ > /dev/null
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.208856       11603        18           getdents
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.208856                    18           total

  # strace -wc -f -e getdents,lstat ls -lU /glustermount/largenames-10k/ > /dev/null
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   74.99    0.168202          17     10001           lstat
   25.01    0.056089        3116        18           getdents
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.224291                 10019           total


Notice the difference between ls `-1` and `-l`: the latter stat()s all files and the former doesn't.

For some weird reason, when I also stat() as above with `-l`, getdents() magically gets faster!

It would make sense to me if the presence of getdents() made stat()s faster (e.g. as part of some `stat-prefetch`, though I have that off according to my posted volume config), but not the other way around.

What might be going on here?

--- Additional comment from nh2 on 2017-09-16 22:10:44 EDT ---

(In reply to Csaba Henk from comment #10)
> @nh2: Therefore we'd like you to kindly ask to share your volume info and
> TRACE level logs, from mount on to observing the small getdents() calls.

@Csaba: OK, I have read your comment now.

My volume info is in the post further above; would you mind briefly describing, or linking to, how I can set the log level to TRACE, and which exact file(s) I should provide?

--- Additional comment from Poornima G on 2017-09-18 02:20:28 EDT ---

I haven't yet gone through the comments in detail; that's a lot of analysis. Thank you.

Meanwhile, you mentioned stat and getxattrs are being performed; we have stat-prefetch, which caches the stat and xattrs of a file/dir. Please execute the following on your test system and see if you get improvements:

$ gluster vol set <VOLNAME> group metadata-cache

Also please increase the value of network.inode-lru-limit to 200000 and check if it improves further:

$ gluster vol set <VOLNAME> network.inode-lru-limit 200000

Also, there are a few patches (WIP) that were the result of debugging performance issues in readdir:
https://review.gluster.org/#/c/17985/
https://review.gluster.org/#/c/18242/

I suggest trying these patches if possible, along with metadata-cache, to see whether the performance reaches a satisfactory level.

--- Additional comment from Raghavendra G on 2017-09-18 04:17:43 EDT ---

(In reply to nh2 from comment #14)
> (In reply to Csaba Henk from comment #10)
> > @nh2: Therefore we'd like you to kindly ask to share your volume info and
> > TRACE level logs, from mount on to observing the small getdents() calls.
> 
> @Csaba: OK, I have read your comment now.
> 
> My volume info is in the post further above, would you mind to shortly
> describe or link how I can set the log level to TRACE and which exact
> file(s) I should provide?

Please set following options to get logs at TRACE log-level:
# gluster volume set <volname> diagnostics.client-log-level TRACE
# gluster volume set <volname> diagnostics.brick-log-level TRACE

After this, run your tests and attach the logfiles of the gluster mount and brick processes (usually found in /var/log/glusterfs/ and /var/log/glusterfs/bricks).

--- Additional comment from Worker Ant on 2017-09-18 06:38:47 EDT ---

REVIEW: https://review.gluster.org/18312 (cluster/dht: don't overfill the buffer in readdir(p)) posted (#1) for review on master by Raghavendra G (rgowdapp@redhat.com)

--- Additional comment from Worker Ant on 2017-09-18 06:42:18 EDT ---

REVIEW: https://review.gluster.org/18312 (cluster/dht: don't overfill the buffer in readdir(p)) posted (#2) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 1 Worker Ant 2017-09-18 10:49:49 UTC
REVIEW: https://review.gluster.org/18312 (cluster/dht: don't overfill the buffer in readdir(p)) posted (#3) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 2 Worker Ant 2017-09-19 04:18:39 UTC
REVIEW: https://review.gluster.org/18312 (cluster/dht: don't overfill the buffer in readdir(p)) posted (#4) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 3 Worker Ant 2017-09-19 04:18:44 UTC
REVIEW: https://review.gluster.org/18323 (cluster/dht: populate inode in dentry for single subvolume dht) posted (#1) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 4 Worker Ant 2017-09-28 04:40:01 UTC
REVIEW: https://review.gluster.org/18323 (cluster/dht: populate inode in dentry for single subvolume dht) posted (#2) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 5 Worker Ant 2017-09-28 04:40:08 UTC
REVIEW: https://review.gluster.org/18312 (cluster/dht: don't overfill the buffer in readdir(p)) posted (#5) for review on master by Raghavendra G (rgowdapp@redhat.com)

Comment 6 Worker Ant 2017-12-02 11:47:57 UTC
COMMIT: https://review.gluster.org/18312 committed in master by "Raghavendra G" <rgowdapp@redhat.com> with a commit message- cluster/dht: don't overfill the buffer in readdir(p)

Superfluous dentries that cannot fit in the buffer size provided by
the kernel are thrown away by fuse-bridge. This means,

* the next readdir(p) seen by readdir-ahead would have an offset of a
dentry returned in a previous readdir(p) response. When readdir-ahead
detects non-monotonic offset it turns itself off which can result in
poor readdir performance.

* readdirp can be cpu-intensive on the brick and there is no point in
reading all those dentries just to have them thrown away by fuse-bridge.

So, the best strategy would be to fill the buffer optimally - neither
overfill nor underfill.

Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
BUG: 1492625
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>

Comment 7 Worker Ant 2017-12-02 11:48:22 UTC
COMMIT: https://review.gluster.org/18323 committed in master by "Raghavendra G" <rgowdapp@redhat.com> with a commit message- cluster/dht: populate inode in dentry for single subvolume dht

... in readdirp response if dentry points to a directory inode. This
is a special case where the entire layout is stored in one single
subvolume and hence no need for lookup to construct the layout

Change-Id: I44fd951e2393ec9dac2af120469be47081a32185
BUG: 1492625
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>

Comment 8 Shyamsundar 2018-03-15 11:17:56 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/
