Bug 927418

Summary: GlusterFileSystem.open(path, buffer size) ignores buffer size
Product: [Community] GlusterFS Reporter: Steve Watt <swatt>
Component: gluster-hadoop Assignee: Bradley Childs <bchilds>
Status: CLOSED CURRENTRELEASE QA Contact: hcfs-gluster-bugs
Severity: medium Docs Contact:
Priority: medium    
Version: mainline CC: aavati, bchilds, bugs, eboyd, jvyas, matt, mkudlej, rhs-bugs, vbellur
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-08-21 09:07:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 927415, 959778, 959779, 961540    
Bug Blocks: 1057253    

Description Steve Watt 2013-03-25 21:04:18 UTC
Description of problem:

The DistributedFileSystem implementation in Apache Hadoop does not ignore the buffer size passed to open(path, buffer size), but the GlusterFileSystem implementation currently does. We need to ascertain whether we should honor the parameter in a similar way.

Comment 2 Bradley Childs 2013-03-26 21:02:02 UTC
Could you provide a link to the code you're referencing? 

And are you suggesting we do need the behavior, or don't?

Comment 4 Steve Watt 2013-03-27 14:25:32 UTC
I am referencing org.apache.hadoop.hdfs.DistributedFileSystem.open(path, buffer size) in Apache Hadoop 1.0.4. I am recommending that we understand why the buffer size is used in the DistributedFileSystem implementation of this method so that we can decide whether the GlusterFileSystem implementation needs it too. Currently, the bufferSize parameter is not used at all in the GlusterFileSystem implementation.

Comment 5 Bradley Childs 2013-04-26 19:46:38 UTC
The architecture of HDFS/DistributedFileSystem is different from ours, so their buffering implementation doesn't translate directly for us; i.e., inspecting their code doesn't really tell us the right thing to do here.

Our open(..) path returns an input stream (read-only). This input stream sits on top of the raw Unix file system, reading the file either through FUSE or from the now-deprecated direct brick access (quick slave I/O).

The write path through FUSE has more locking and performance issues under many writes (the file system must make itself consistent after every write). The read path is less impactful, so many small reads aren't necessarily harmful. It is therefore debatable whether Java or FUSE should handle the buffering.

That said, SOME Java buffering is always desirable. My suggestion on this bug is to plan and schedule this change on our performance cluster to see its impact. This is blocked by the stale quick.slave.io code that needs to be removed (added to the block list).

Comment 6 Jay Vyas 2013-05-09 21:35:17 UTC
*** Bug 961540 has been marked as a duplicate of this bug. ***

Comment 7 Jay Vyas 2013-05-09 21:36:00 UTC
(from duplicate bug)

We're currently not buffering reads, it appears. TestDFSIO results indicate slowdowns for large read-intensive I/O.

https://github.com/gluster/hadoop-glusterfs/blob/master/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFUSEInputStream.java

We're opening a raw FileInputStream.  We can buffer the reads and improve performance at the java layer.

new BufferedInputStream(...FileInputStream...)
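A minimal sketch of the change suggested above, assuming we wrap the raw stream in a java.io.BufferedInputStream sized by the bufferSize argument. The class and method names here are illustrative only, not the actual plugin code:

```java
import java.io.*;

// Hypothetical sketch: honor the bufferSize argument of open(path, bufferSize)
// by wrapping the raw FileInputStream in a BufferedInputStream, so that many
// small read() calls hit the Java-side buffer instead of going through FUSE.
public class BufferedOpenSketch {

    // Mirrors the shape of FileSystem.open(Path, int bufferSize); the real
    // implementation would return an FSDataInputStream.
    static InputStream open(File f, int bufferSize) throws IOException {
        InputStream raw = new FileInputStream(f); // raw, unbuffered read path
        return new BufferedInputStream(raw, bufferSize);
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("buf", ".dat");
        try (OutputStream out = new FileOutputStream(tmp)) {
            out.write("hello gluster".getBytes("UTF-8"));
        }
        try (InputStream in = open(tmp, 4096)) {
            byte[] buf = new byte[5];
            int n = in.read(buf); // served from the 4 KB Java buffer
            System.out.println(n + " " + new String(buf, 0, n, "UTF-8"));
        }
        tmp.delete();
    }
}
```
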

Comment 8 Bradley Childs 2013-08-22 22:02:28 UTC
Buffering is done in the (Raw)LocalFileSystem.  Fixed with redesign in this checkin:

https://github.com/gluster/hadoop-glusterfs/commit/ce6325313dcb0df6cc73379248c1e07a9aa0b025

Comment 9 Jay Vyas 2013-11-20 13:24:27 UTC
So, we should have a test for this. Is it possible to unit test read buffering?
I guess one way might be to open a file, force-delete it, and see whether,
after the forced deletion, reads of "x" bytes succeed on the opened stream.
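The test idea above can be sketched as follows. On POSIX systems the inode stays alive while a descriptor is open, so reads on an already-open stream should still return data after the file is deleted. This is a local-file illustration of the mechanics only; an actual test for the plugin would run against a GlusterFS mount:

```java
import java.io.*;

// Sketch of Comment 9's proposed test: open a file, delete it, then verify
// that reads on the previously opened stream still succeed. Assumes a POSIX
// file system (on Windows, deleting an open file fails).
public class ReadAfterDeleteSketch {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("radel", ".dat");
        try (OutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[]{1, 2, 3, 4});
        }
        // Open before deleting, so the stream holds a live descriptor.
        InputStream in = new BufferedInputStream(new FileInputStream(tmp));
        if (!tmp.delete()) {
            throw new IOException("delete failed; not a POSIX file system?");
        }
        byte[] buf = new byte[4];
        int n = in.read(buf); // should still read the 4 bytes written above
        in.close();
        System.out.println("read " + n + " bytes after delete");
    }
}
```
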