Bug 1070539
| Summary: | Very slow Samba Directory Listing when many files or sub-directories | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Jeff Byers <jbyers> |
| Component: | gluster-smb | Assignee: | Ira Cooper <ira> |
| Status: | CLOSED EOL | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.4.2 | CC: | bengland, bugs, gluster-bugs, ira, jarrpa, mpillai, pb, vagarwal, zab |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| : | 1397179 (view as bug list) | | |
| Last Closed: | 2015-10-07 13:49:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jeff Byers
2014-02-27 03:46:36 UTC
I didn't yet have the time to investigate it as thoroughly as Jeff did, but we're experiencing the same behavior with the setup at our institution.

I have seen this as well and have narrowed it down to performing a stat call. You can see this by stracing ls calls: unalias your ls call (\ls) and omit all options to ls. This effectively tells ls to simply do a readdir, which is very fast, even from gluster. Using ls with options like --color (a common default alias, FYI) or -l tells ls to stat everything it finds, to determine what type of thing it is or to get additional data about each thing. That stat call is apparently incredibly expensive within gluster. (A sketch of this comparison appears after this thread.) Because of this, we have re-architected the systems that tie into the datastore to avoid performing stat calls whenever possible. CAVEAT: bypassing stat calls means that you will not perform dynamic healing, as gluster, I believe, ties into stat calls in order to check replica consistency.

Thanks! Happy to hear that you found something. If I understood you correctly, the change you've proposed, to avoid performing stat calls whenever possible, does not affect gluster in distributed mode, right?

No, my environment is a distributed, triple-replicated volume spanning 24 raided bricks across 4 nodes. All told, 56 TB usable. We have a custom map-reduce implementation that makes heavy use of gluster while avoiding stat calls. I haven't seen any issue with it.

I see. But for a simple gluster setup in distributed mode, without any replication, there would be no dynamic healing, if I understood it correctly.

I would imagine so, yes. A distribute-only volume has no ability to heal. A dev should answer this, though, as I do not know what the stat call hooks gluster uses actually do in a distribute-only volume. I am only assuming the hooks not only exist but also do something, because avoiding them helps directory listing performance.

Added Manoj and Ira to cc list. So does this customer need to use ACLs? This may be part of the reason that it's so slow. Gluster implemented a READDIRPLUS FOP that was intended to speed up precisely this case. However, I don't think READDIRPLUS includes ACL info and extended attr info (can a developer please confirm?). So if CIFS requires that ACL info or extended attributes be read before the listing can be completed, then you still have the same problem we had before READDIRPLUS, namely >= 1 round trip per file. By the way, READDIRPLUS does not return many files in one round trip, certainly nowhere near as many as it needs to in this case. But I don't think that's the cause of this problem.

To confirm the analysis, could someone get a tcpdump file from the SMB server with

```
# tcpdump -i any -w /tmp/a.tcpdump -s 9000 -c 100000
# gzip /tmp/a.tcpdump
```

and post it in this bz as an attachment or in Red Hat's FTP dropbox site? Did any of the above tests turn off ACLs?

Another way to confirm it is to use profiling commands in Gluster while you are running a browser test:

```
gluster volume profile your-volume start
gluster volume profile your-volume info > /tmp/junk.tmp
for pass in `seq 1 20` ; do \
  sleep 5 ; \
  gluster volume profile your-volume info ; \
done > gvp.log
```

And attach gvp.log to this bz. I have a python script that can reduce gvp.log to a spreadsheet which can show rates for Gluster RPC call types over time (a rough stand-in appears below); we can then see more about how efficiently Gluster handled this workload and where the bottlenecks might be.
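A minimal sketch of the strace comparison described in the thread, assuming the volume is FUSE-mounted at the hypothetical path /mnt/glustervol and bigdir holds the large directory:

```
# Readdir-only listing: bypass any ls alias and pass no options,
# so ls does little more than a readdir on the directory.
strace -c -f \ls /mnt/glustervol/bigdir > /dev/null

# Stat-heavy listing: -l and --color make ls stat each entry,
# which on a Gluster mount costs network round trips.
strace -c -f ls -l --color=always /mnt/glustervol/bigdir > /dev/null

# Compare the "calls" column for lstat/stat/getxattr between the two
# summaries; the second run should show one or more per directory entry.
```

If the second run's wall time grows with the entry count while the first stays nearly flat, the per-entry stat path is the bottleneck, matching the analysis above.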
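The reduction script mentioned above is not attached to this bug. As a rough, hypothetical stand-in, the per-interval call counts for the listing-related FOPs can be pulled out of gvp.log with grep, assuming the usual `gluster volume profile <vol> info` output in which each sample prints an "Interval N Stats" section containing a "No. of calls"/"Fop" table:

```
# Hypothetical stand-in for the reduction script: print each sampling
# interval header plus the rows for FOPs that dominate directory listing.
grep -E 'Interval [0-9]+ Stats|READDIRP|LOOKUP|GETXATTR|STAT' gvp.log
```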
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases: the last two releases before 3.7 are still maintained, at the moment 3.6 and 3.5. This bug has been filed against the 3.4 release and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs". If there is no response by the end of the month, this bug will get closed automatically.

AFAIK this problem has not been fixed, but a fix is feasible if the SMB plugin requests the xattrs it needs in the READDIRPLUS FOP. Apparently the READDIRPLUS FOP does support fetching additional xattrs: http://www.gluster.org/community/documentation/index.php/Features/composite-operations#READDIRPLUS_used_to_prefetch_xattrs

GlusterFS 3.4.x has reached end-of-life. If this bug still exists in a later release, please reopen this and change the version, or open a new bug.