Bug 927295
| Summary: | GlusterFileSystem.listStatus() fetching optimization | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Steve Watt <swatt> |
| Component: | rhs-hadoop | Assignee: | Bradley Childs <bchilds> |
| Status: | CLOSED NOTABUG | QA Contact: | hcfs-gluster-bugs |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 2.0 | CC: | aavati, matt, rhs-bugs, vbellur |
| Target Milestone: | --- | Flags: | bchilds: needinfo- |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-08-22 22:18:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 854157 | | |
| Bug Blocks: | 947153 | | |
Description

Steve Watt, 2013-03-25 15:18:03 UTC
Could you provide a link to the DistributedFileSystem you're referencing? Also, what's the bug or alternative behavior you're proposing?

I am referencing org.apache.hadoop.hdfs.DistributedFileSystem.listStatus() in Apache Hadoop 1.0.4. I am recommending that we understand why the implementation of this method differs within DistributedFileSystem, so that we can decide whether the GlusterFileSystem implementation needs the same approach. I am not explicitly suggesting an alternative, but we should at least understand why there is a difference. HDFS has been used at extreme scale, and there may be a non-obvious but important reason for this difference that we should carefully consider.

I did some research and asked on the Gluster mailing list whether a partial directory listing through the FUSE layer is possible. Anand Avati responded that Gluster does support partial directory listing, but only through the C layer. Java's File mechanism doesn't tie into the right Gluster calls for a partial listing from C. If this were essential, we could write a separate C command to do the partial listing for now; if we move to a pure native client, we could list this as a feature.

I also queried the o.a.h HDFS mailing list about this, and I'm posting my question and the list's responses below. TL;DR: the partial directory listing exists so that large listings neither overwhelm HDFS's NameNode nor impose long NameNode locks and delays on other clients.

My post:

Could someone explain why the DistributedFileSystem's listStatus() method does a piecemeal assembly of a directory listing within the method? Is there a locking issue? What if an element is added to the directory during the operation? What if elements are removed? It would make sense to me for the FileSystem class's listStatus() method to return an Iterator, allowing only partial fetching/chatter as needed. But I don't understand why you'd want to assemble a giant array of the listing chunk by chunk.
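The chunk-by-chunk assembly asked about above can be sketched with plain JDK classes. This is an illustration only, not the actual HDFS RPC: `getListing`, `serverEntries`, and the cursor protocol are stand-ins for the NameNode-side call that returns a bounded batch of entries starting after a cursor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the piecemeal listing pattern: each "round trip"
// returns at most chunkSize entries after a cursor, so the server never
// has to produce (or lock for) the whole listing at once.
public class ChunkedListing {
    // Stand-in for the remote directory contents (sorted, as a server
    // would keep them for cursor-based paging).
    static final List<String> serverEntries = Arrays.asList(
            "a.txt", "b.txt", "c.txt", "d.txt", "e.txt");

    // Simulates one bounded call: up to chunkSize entries strictly
    // after startAfter (null means start from the beginning).
    static List<String> getListing(String startAfter, int chunkSize) {
        List<String> out = new ArrayList<>();
        for (String name : serverEntries) {
            if (startAfter != null && name.compareTo(startAfter) <= 0) continue;
            out.add(name);
            if (out.size() == chunkSize) break;
        }
        return out;
    }

    public static void main(String[] args) {
        // Client side: assemble the full array piecemeal, advancing the
        // cursor to the last name seen in each chunk.
        List<String> all = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> chunk = getListing(cursor, 2);
            if (chunk.isEmpty()) break;
            all.addAll(chunk);
            cursor = chunk.get(chunk.size() - 1);
        }
        System.out.println(all); // [a.txt, b.txt, c.txt, d.txt, e.txt]
    }
}
```

Note that if an entry is created or deleted between chunks, the assembled array reflects a non-atomic view of the directory, which is exactly the consistency question raised in the post.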
Response (Todd Lipcon):

The reasoning is that the NameNode locking is somewhat coarse-grained. In older versions of Hadoop, before it worked this way, we found that listing large directories (e.g. with 100k+ files) could end up holding the NameNode's lock for quite a long period of time and starve other clients. Additionally, I believe there is a second API that does the "on-demand" fetching of the next set of files from the listing as well, no? As for the consistency argument, you're correct that you may have a non-atomic view of the directory contents, but I can't think of any applications where this would be problematic.

(Suresh Srinivas):

An additional reason: HDFS does not have a limit on the number of files in a directory. Some clusters had millions of files in a single directory. Listing such a directory resulted in very large responses, requiring large contiguous memory allocation in the JVM (for the array) and unpredictable GC failures.

Consider this 'fixed', since this bug is investigative and not a fixable issue. If we can demonstrate performance problems around large directories, then we should re-evaluate the performance problem itself; the solution at that point may be to use partial directory listings or something similar.
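The "on-demand" style the responses point at (later Hadoop releases expose it through RemoteIterator-returning methods such as FileSystem.listStatusIterator) can be illustrated with plain JDK classes. In this sketch, java.nio.file.DirectoryStream is only a stand-in for a remote iterator: entries are consumed one at a time, so memory stays bounded no matter how many files the directory holds, which is the concern Suresh raises about giant arrays.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of iterator-style listing: the client streams
// entries instead of materializing one large array.
public class LazyListing {
    // Count entries by streaming them; the full listing is never held
    // in memory at once.
    public static long countEntries(Path dir) {
        long n = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                n++; // process each entry as it arrives
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    // Demo helper (not part of the pattern): a temp directory holding
    // `count` empty files.
    public static Path makeTempDirWithFiles(int count) {
        try {
            Path dir = Files.createTempDirectory("lazylist");
            for (int i = 0; i < count; i++) {
                Files.createFile(dir.resolve("file" + i));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countEntries(makeTempDirWithFiles(3))); // prints 3
    }
}
```

As the bug notes, this style was not reachable from GlusterFileSystem at the time: Gluster's partial listing was only available through the C layer, not through Java's File API.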