Bug 927418
| Summary: | GlusterFileSystem.open(path, buffer size) ignores buffer size | | |
| --- | --- | --- | --- |
| Product: | [Community] GlusterFS | Reporter: | Steve Watt <swatt> |
| Component: | gluster-hadoop | Assignee: | Bradley Childs <bchilds> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | hcfs-gluster-bugs |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | mainline | CC: | aavati, bchilds, bugs, eboyd, jvyas, matt, mkudlej, rhs-bugs, vbellur |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-08-21 09:07:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 927415, 959778, 959779, 961540 | | |
| Bug Blocks: | 1057253 | | |
Description
Steve Watt
2013-03-25 21:04:18 UTC
Could you provide a link to the code you're referencing? And are you suggesting we do need the behavior, or don't?

> Could you provide a link to the code you're referencing? And are you suggesting we do need the behavior, or don't?

I am referencing org.apache.hadoop.hdfs.DistributedFileSystem.open(path, buffer size) in Apache Hadoop 1.0.4. I am recommending that we understand why buffer size is used in the DistributedFileSystem implementation of this method, so that we can decide whether we need it in the GlusterFileSystem implementation. Currently, the bufferSize parameter is not used at all in the GlusterFileSystem implementation.

The architecture of HDFS/DistributedFileSystem is different from ours, so their buffering implementation doesn't translate directly for us; i.e., inspecting their code doesn't really tell us the right thing to do here. Our open(..) path returns an input stream (read only). This input stream sits on top of the raw Unix file system, reading the file either through FUSE or through the now-deprecated direct brick access (quick slave I/O). The write path through FUSE has more locking and performance issues under many writes (the file system must make itself consistent after every write). The read path is less impactful, so many small reads aren't necessarily harmful. It is therefore debatable whether Java or FUSE should handle the buffering. That said, SOME Java buffering is always desirable. My suggestion on this bug is to plan and schedule this change on our performance cluster to see its impact. This will be blocked by the stale quick.slave.io code that needs to be removed (added to the block list).

*** Bug 961540 has been marked as a duplicate of this bug. ***

(from duplicate bug) We're currently not buffering reads, it appears. TestDFSIO results indicate slowdowns for large, read-intensive I/O:

https://github.com/gluster/hadoop-glusterfs/blob/master/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFUSEInputStream.java

We're opening a raw FileInputStream. We can buffer the reads and improve performance at the Java layer with new BufferedInputStream(...FileInputStream...). Buffering is done this way in the (Raw)LocalFileSystem. (A sketch of this wrapping appears at the end of this report.)

Fixed with the redesign in this checkin:

https://github.com/gluster/hadoop-glusterfs/commit/ce6325313dcb0df6cc73379248c1e07a9aa0b025

So, we should have a test for this. Is it possible to unit test READ buffering? One way might be to open a file, force delete it, and see whether, after the forced deletion, reads of "x" bytes still succeed on the opened stream. (A sketch of such a probe also appears below.)
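
For illustration, here is a minimal sketch of the wrapping discussed above: a hypothetical open() helper (the class and method names are illustrative, not the actual hadoop-glusterfs code) that honors the bufferSize argument by layering a java.io.BufferedInputStream over the raw FileInputStream. The real GlusterFileSystem.open() must return a Hadoop FSDataInputStream; that wrapping is omitted here for brevity.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedOpenSketch {

    /**
     * Opens the file behind the FUSE mount for reading. The bufferSize
     * argument (previously ignored) is handed to BufferedInputStream, so
     * that many small read() calls are served from a Java-side buffer
     * instead of going through FUSE one by one.
     */
    public static InputStream open(String fusePath, int bufferSize) throws IOException {
        InputStream raw = new FileInputStream(fusePath);
        if (bufferSize <= 0) {
            // Fall back to the JDK default buffer size (8 KiB) on bogus input.
            return new BufferedInputStream(raw);
        }
        return new BufferedInputStream(raw, bufferSize);
    }
}
```

This mirrors the approach the duplicate-bug comment attributes to (Raw)LocalFileSystem: keep the raw stream, and let a Java-side buffer absorb small reads.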
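And here is a sketch of the force-delete probe suggested in the last comment, assuming JUnit 4 and a plain temp file; the class, method, and file names are illustrative. One caveat: on most POSIX file systems an open descriptor survives unlink, so reads can succeed after deletion even without Java-side buffering; the probe is only conclusive on mounts where deleted files become unreadable through existing handles.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;

import org.junit.Test;

public class ReadBufferingProbeTest {

    @Test
    public void readsAreServedFromBufferAfterForcedDelete() throws Exception {
        // Write a file smaller than the read buffer, so one fill caches it all.
        File file = File.createTempFile("buffer-probe", ".dat");
        byte[] payload = new byte[4096];
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(payload);
        }

        try (InputStream in = new BufferedInputStream(new FileInputStream(file), 8192)) {
            // Prime the buffer with a first read, then force-delete the file.
            assertTrue(in.read() >= 0);
            assertTrue(file.delete());

            // If reads are buffered, the remaining bytes come from memory,
            // not from the (now deleted) file.
            byte[] rest = new byte[payload.length - 1];
            assertEquals(rest.length, in.read(rest, 0, rest.length));
        }
    }
}
```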