Bug 927295
| Summary: | GlusterFileSystem.listStatus() fetching optimization | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Steve Watt <swatt> |
| Component: | rhs-hadoop | Assignee: | Bradley Childs <bchilds> |
| Status: | CLOSED NOTABUG | QA Contact: | hcfs-gluster-bugs |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 2.0 | CC: | aavati, matt, rhs-bugs, vbellur |
| Target Milestone: | --- | Flags: | bchilds: needinfo- |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-08-22 22:18:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 854157 | | |
| Bug Blocks: | 947153 | | |
Description

Steve Watt, 2013-03-25 15:18:03 UTC
Could you provide a link to the DistributedFileSystem you're referencing? Also, what's the bug or alternative behavior you're proposing?

I am referencing org.apache.hadoop.hdfs.DistributedFileSystem.listStatus() in Apache Hadoop 1.0.4. I am recommending that we understand why the implementation of this method differs within DistributedFileSystem, so that we can decide whether the GlusterFileSystem implementation needs the same approach. I am not explicitly suggesting an alternative, but we should at least understand why there is a difference. HDFS has been used at extreme scale, and there may be a non-obvious but important reason for this difference that we should carefully consider.

I did some research and asked on the Gluster mailing list whether a partial directory listing through the FUSE layer is possible. Anand Avati responded that Gluster does support partial directory listing, but only through the C layer. Java's File mechanism doesn't tie into the right Gluster calls for a partial listing from C. If this were essential, we could write a separate C command to do the partial listing for now; if we move to a pure native client, we could list this as a feature.

I also queried the o.a.h HDFS mailing list about this, and I'm posting my question and the list's responses below. TL;DR: the partial directory listing exists so that large listings neither overwhelm HDFS's NameNode nor impose long NameNode locks and delays on other clients.

My post:

Could someone explain why the DistributedFileSystem's listStatus() method does a piecemeal assembly of a directory listing within the method? Is there a locking issue? What if an element is added to the directory during the operation? What if elements are removed? It would make sense to me for the FileSystem class's listStatus() method to return an Iterator, allowing only partial fetching/chatter as needed. But I don't understand why you'd want to assemble a giant array of the listing chunk by chunk.
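The chunk-by-chunk assembly asked about above can be sketched with plain JDK classes. This is an illustration only, not the actual HDFS RPC: `getListing`, `serverEntries`, and the cursor protocol are stand-ins for the NameNode-side call that returns a bounded batch of entries starting after a cursor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the piecemeal listing pattern: each "round trip"
// returns at most chunkSize entries after a cursor, so the server never
// has to produce (or lock for) the whole listing at once.
public class ChunkedListing {
    // Stand-in for the remote directory contents (sorted, as a server
    // would keep them for cursor-based paging).
    static final List<String> serverEntries = Arrays.asList(
            "a.txt", "b.txt", "c.txt", "d.txt", "e.txt");

    // Simulates one bounded call: up to chunkSize entries strictly
    // after startAfter (null means start from the beginning).
    static List<String> getListing(String startAfter, int chunkSize) {
        List<String> out = new ArrayList<>();
        for (String name : serverEntries) {
            if (startAfter != null && name.compareTo(startAfter) <= 0) continue;
            out.add(name);
            if (out.size() == chunkSize) break;
        }
        return out;
    }

    public static void main(String[] args) {
        // Client side: assemble the full array piecemeal, advancing the
        // cursor to the last name seen in each chunk.
        List<String> all = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> chunk = getListing(cursor, 2);
            if (chunk.isEmpty()) break;
            all.addAll(chunk);
            cursor = chunk.get(chunk.size() - 1);
        }
        System.out.println(all); // [a.txt, b.txt, c.txt, d.txt, e.txt]
    }
}
```

Note that if an entry is created or deleted between chunks, the assembled array reflects a non-atomic view of the directory, which is exactly the consistency question raised in the post.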
Response (Todd Lipcon):

The reasoning is that the NameNode locking is somewhat coarse-grained. In older versions of Hadoop, before it worked this way, we found that listing large directories (e.g. with 100k+ files) could end up holding the NameNode's lock for quite a long period of time and starve other clients. Additionally, I believe there is a second API that does the "on-demand" fetching of the next set of files from the listing as well, no? As for the consistency argument, you're correct that you may have a non-atomic view of the directory contents, but I can't think of any applications where this would be problematic.

(Suresh Srinivas):

An additional reason: HDFS does not have a limit on the number of files in a directory. Some clusters had millions of files in a single directory. Listing such a directory resulted in very large responses, requiring large contiguous memory allocation in the JVM (for the array) and unpredictable GC failures.

Consider this 'fixed', since this bug is investigative and not a fixable issue. If we can demonstrate performance problems around large directories, then we should re-evaluate the performance problem itself; the solution at that point may be to use partial directory listings or something similar.
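The "on-demand" style the responses point at (later Hadoop releases expose it through RemoteIterator-returning methods such as FileSystem.listStatusIterator) can be illustrated with plain JDK classes. In this sketch, java.nio.file.DirectoryStream is only a stand-in for a remote iterator: entries are consumed one at a time, so memory stays bounded no matter how many files the directory holds, which is the concern Suresh raises about giant arrays.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of iterator-style listing: the client streams
// entries instead of materializing one large array.
public class LazyListing {
    // Count entries by streaming them; the full listing is never held
    // in memory at once.
    public static long countEntries(Path dir) {
        long n = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                n++; // process each entry as it arrives
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    // Demo helper (not part of the pattern): a temp directory holding
    // `count` empty files.
    public static Path makeTempDirWithFiles(int count) {
        try {
            Path dir = Files.createTempDirectory("lazylist");
            for (int i = 0; i < count; i++) {
                Files.createFile(dir.resolve("file" + i));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countEntries(makeTempDirWithFiles(3))); // prints 3
    }
}
```

As the bug notes, this style was not reachable from GlusterFileSystem at the time: Gluster's partial listing was only available through the C layer, not through Java's File API.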