Description of problem: The DistributedFileSystem.listStatus() method first determines the size of the directory and then retrieves the directory entries incrementally, in batches sized according to the number of entries in the directory. Conversely, the same method in GlusterFileSystem retrieves all of the directory entries in a single operation. This might not be a bug, but we should understand why the approach differs in HDFS and whether we might want to implement the same behavior in GlusterFileSystem as well.
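To make the contrast concrete, here is a minimal sketch of the two strategies. This is illustrative only: Page and listChunk are hypothetical stand-ins for the NameNode's paged listing RPC, not real Hadoop or Gluster APIs.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListingStrategies {

    // Hypothetical shape of one page of a paged listing; not a real Hadoop type.
    interface Page {
        FileStatus[] entries();
        String lastName();   // cursor: name of the last entry in this page
        boolean hasMore();
    }

    // Hypothetical paged call standing in for the NameNode RPC.
    static Page listChunk(FileSystem fs, Path dir, String resumeAfter) {
        throw new UnsupportedOperationException("illustrative only");
    }

    // GlusterFileSystem style: one call returns every entry at once.
    static FileStatus[] listAllAtOnce(FileSystem fs, Path dir) throws java.io.IOException {
        return fs.listStatus(dir);
    }

    // DistributedFileSystem style (Hadoop 1.x): fetch the listing in
    // chunks, resuming after the last entry of each chunk, and
    // accumulate the pieces client-side.
    static List<FileStatus> listInChunks(FileSystem fs, Path dir) {
        List<FileStatus> listing = new ArrayList<FileStatus>();
        String resumeAfter = "";               // empty = start of directory
        Page page;
        do {
            page = listChunk(fs, dir, resumeAfter);
            listing.addAll(Arrays.asList(page.entries()));
            resumeAfter = page.lastName();     // cursor for the next round trip
        } while (page.hasMore());
        return listing;
    }
}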
Could you provide a link to the DistributedFileSystem you're referencing? Also, what's the bug or alternative behavior you're proposing?
I am referring to org.apache.hadoop.hdfs.DistributedFileSystem.listStatus() in Apache Hadoop 1.0.4. I am recommending that we understand why this method is implemented differently in DistributedFileSystem so that we can decide whether GlusterFileSystem needs the same approach. I am not explicitly suggesting an alternative, but we should at least understand why there is a difference. HDFS has been used at extreme scale, and there may be a non-obvious but important reason for this difference that we should carefully consider.
I did some research and asked on the gluster mailing list whether a partial directory listing through the FUSE layer is possible. Anand Avati responded that gluster does support partial directory listing, but it is only available through the C layer. Java's File mechanism doesn't tie into the right gluster calls for a partial C listing. If this were essential, we could write a separate C command to do the partial listing for now, and then, if we move to a pure native client, track this as a feature.
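For reference, here is a minimal illustration of why the Java layer can't express a partial listing: java.io.File materializes the entire directory in one call, with no resume cursor or batch size to pass down through FUSE to gluster. (The mount path below is just an example.)

import java.io.File;

public class FullListing {
    public static void main(String[] args) {
        // Example FUSE mount point; adjust to the actual gluster mount.
        File dir = new File("/mnt/glusterfs/some/dir");

        // File.list() returns the complete set of entry names in one shot.
        // There is no argument for "start after entry X" or "return at most
        // N entries", so a partial/paged listing cannot be expressed here;
        // that capability only exists down in gluster's C interface.
        String[] names = dir.list();
        System.out.println(names == null ? "not a directory"
                                         : names.length + " entries");
    }
}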
I queried the o.a.h hdfs mailing list about this, and I'm posting my question and the list's responses below. TL;DR: the partial directory listing exists so as not to overwhelm HDFS' NameNode or introduce long NameNode locks and delays for other clients.

My post: Could someone explain why the DistributedFileSystem's listStatus() method does a piecemeal assembly of a directory listing within the method? Is there a locking issue? What if an element is added to the directory during the operation? What if elements are removed? It would make sense to me for the FileSystem class's listStatus() method to return an Iterator, allowing only partial fetching/chatter as needed. But I don't understand why you'd want to assemble a giant array of the listing chunk by chunk.

Response (Todd Lipcon): The reasoning is that the NameNode locking is somewhat coarse grained. In older versions of Hadoop, before it worked this way, we found that listing large directories (e.g. with 100k+ files) could end up holding the NameNode's lock for quite a long period of time and starve other clients. Additionally, I believe there is a second API that does the "on-demand" fetching of the next set of files from the listing as well, no? As for the consistency argument, you're correct that you may have a non-atomic view of the directory contents, but I can't think of any applications where this would be problematic.

(Suresh Srinivas): An additional reason: HDFS does not have a limit on the number of files in a directory. Some clusters had millions of files in a single directory. Listing such a directory resulted in very large responses, requiring large contiguous memory allocation in the JVM (for the array) and unpredictable GC failures.
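The "on-demand" API Todd mentions is, I believe, the iterator-style listing that later Hadoop lines (0.23/2.x) expose on FileSystem; it is not in 1.0.4. A sketch of how a client would consume it, assuming one of those versions:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class OnDemandListing {
    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path(args[0]);

        // Entries are fetched from the NameNode in batches behind the
        // iterator, so the client never builds one giant array and the
        // NameNode lock is only held per batch, not per directory.
        RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
        while (it.hasNext()) {
            System.out.println(it.next().getPath());
        }
    }
}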
Marking as 'fixed' since this bug is investigative rather than a fixable issue. If we can demonstrate performance problems around large directories, we should re-evaluate the performance problem itself at that point; the solution may then be to use partial directory listings or something similar.