Bug 927418 - GlusterFileSystem.open(path, buffer size) ignores buffer size
Summary: GlusterFileSystem.open(path, buffer size) ignores buffer size
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-hadoop
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Bradley Childs
QA Contact: hcfs-gluster-bugs
URL:
Whiteboard:
Duplicates: 961540
Depends On: 927415 959778 959779 961540
Blocks: 1057253
 
Reported: 2013-03-25 21:04 UTC by Steve Watt
Modified: 2015-08-21 09:07 UTC
CC: 9 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-08-21 09:07:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Steve Watt 2013-03-25 21:04:18 UTC
Description of problem:

The DistributedFileSystem implementation does not ignore the buffer size parameter passed to open(path, bufferSize). We need to ascertain whether we should implement similar behavior in GlusterFileSystem.

Comment 2 Bradley Childs 2013-03-26 21:02:02 UTC
Could you provide a link to the code you're referencing? 

And are you suggesting we do need the behavior, or that we don't?

Comment 4 Steve Watt 2013-03-27 14:25:32 UTC
I am referencing org.apache.hadoop.hdfs.DistributedFileSystem.open(path, bufferSize) in Apache Hadoop 1.0.4. I am recommending that we understand why the buffer size is used in the DistributedFileSystem implementation of this method, so that we can determine whether the GlusterFileSystem implementation needs it as well. Currently, the bufferSize parameter is not used at all in the GlusterFileSystem implementation.
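
For reference, the signature being discussed (a compilable sketch only; in Hadoop 1.x the real declaration lives on org.apache.hadoop.fs.FileSystem, and the interface name below is made up):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;

// Sketch of the open() contract that both DistributedFileSystem and
// GlusterFileSystem override. bufferSize is the caller's buffering hint,
// which the GlusterFileSystem implementation currently ignores.
interface OpenContract {
    FSDataInputStream open(Path f, int bufferSize) throws IOException;
}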

Comment 5 Bradley Childs 2013-04-26 19:46:38 UTC
The architecture of HDFS/DistributedFileSystem is different from ours, so their buffering implementation doesn't translate directly for us; i.e., inspecting their code doesn't really tell us the right thing to do here.

Our open(..) path returns an input stream (read only). This input stream sits on top of the raw Unix file system, reading the file either through FUSE or via the now-deprecated direct brick access (quick slave I/O).

The write path through FUSE has more locking and performance issues under many writes (the file system must make itself consistent after every write). The read path is less impactful, so many small reads aren't necessarily harmful. It's therefore debatable whether Java or FUSE should handle the buffering.

That said, SOME Java buffering is always desirable. My suggestion on this bug is to plan and schedule this change on our performance cluster to see its impact. This is blocked by the stale quick.slave.io code that needs to be removed (added to the blocks list).

Comment 6 Jay Vyas 2013-05-09 21:35:17 UTC
*** Bug 961540 has been marked as a duplicate of this bug. ***

Comment 7 Jay Vyas 2013-05-09 21:36:00 UTC
(from duplicate bug)

We're currently not buffering reads, it appears. TestDFSIO results indicate slowdowns for large, read-intensive I/O.

https://github.com/gluster/hadoop-glusterfs/blob/master/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFUSEInputStream.java

We're opening a raw FileInputStream. We can buffer the reads and improve performance at the Java layer:

new BufferedInputStream(...FileInputStream...)
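
A minimal, self-contained sketch of that idea in plain java.io (the path and buffer size below are illustrative, not taken from GlusterFUSEInputStream):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Wrap the raw FileInputStream so small reads are served from a Java-side
// buffer instead of going to the FUSE mount on every call.
public class BufferedReadSketch {
    public static void main(String[] args) throws IOException {
        int bufferSize = 128 * 1024; // e.g. the bufferSize passed to open()
        try (InputStream in = new BufferedInputStream(
                new FileInputStream("/mnt/glusterfs/some-file"), bufferSize)) {
            byte[] chunk = new byte[512];
            while (in.read(chunk) != -1) {
                // each small read() is mostly satisfied from the in-memory
                // buffer rather than the underlying mount
            }
        }
    }
}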

Comment 8 Bradley Childs 2013-08-22 22:02:28 UTC
Buffering is done in the (Raw)LocalFileSystem.  Fixed with redesign in this checkin:

https://github.com/gluster/hadoop-glusterfs/commit/ce6325313dcb0df6cc73379248c1e07a9aa0b025

Comment 9 Jay Vyas 2013-11-20 13:24:27 UTC
So, we should have a test for this. Is it possible to unit test read buffering?
I guess one way might be to open a file, force-delete it, and see whether,
after the forced deletion, reads of "x" bytes still succeed on the opened stream.
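
A rough sketch of that probe with plain java.io (not the actual hadoop-glusterfs test harness; the file path and sizes are made up):

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Write a file, open it through a buffered stream, read once so the buffer
// is populated, delete the file, then check whether further reads still
// succeed on the already-open stream.
public class ReadBufferProbe {
    public static void main(String[] args) throws IOException {
        File f = new File("/tmp/read-buffer-probe.dat");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[1024 * 1024]); // 1 MiB of zeros
        }
        try (InputStream in = new BufferedInputStream(
                new FileInputStream(f), 256 * 1024)) {
            byte[] x = new byte[4096];
            int first = in.read(x);       // populates the Java-side buffer
            boolean deleted = f.delete(); // force delete while the stream is open
            int after = in.read(x);       // do reads of "x" bytes still succeed?
            System.out.println("first=" + first + " deleted=" + deleted
                    + " after=" + after);
        }
    }
}

One caveat: on a POSIX mount the open file descriptor keeps the data readable after unlink regardless of buffering, so whether this really distinguishes buffered from unbuffered reads depends on the semantics of the mount under test.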

