Bug 927006

Summary: GlusterFS Hadoop Plugin does not track read and write operations for org.apache.hadoop.fs.Statistics
Product: [Community] GlusterFS
Reporter: Steve Watt <swatt>
Component: gluster-hadoop
Assignee: Bradley Childs <bchilds>
Status: CLOSED EOL
QA Contact: hcfs-gluster-bugs
Severity: low
Priority: low
Version: mainline
CC: bchilds, bugs, chrisw, eboyd, esammons, matt, mkudlej, rhs-bugs, vbellur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-22 15:46:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Steve Watt 2013-03-24 18:31:38 UTC
Description of problem: 

The Apache Hadoop Distributed File System (org.apache.hadoop.hdfs.DistributedFileSystem) tracks all read and write operations against the file system using org.apache.hadoop.fs.Statistics.incrementReadOps and incrementWriteOps. The Gluster plugin should offer the same feature, as this is expected behavior for a Hadoop Compatible File System. The plugin currently does not track these operations; it needs to be extended to track them in the same way.
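The counting pattern is simple: every FileSystem method bumps a per-scheme counter before doing its work. The following is a minimal standalone sketch of that pattern; the real org.apache.hadoop.fs.FileSystem.Statistics class is richer (per-thread data, bytes read/written), and the SketchFileSystem class and its method bodies here are illustrative assumptions, not the plugin's actual code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal stand-in for org.apache.hadoop.fs.Statistics: two thread-safe
// counters with the same increment-style API.
class Statistics {
    private final AtomicLong readOps = new AtomicLong();
    private final AtomicLong writeOps = new AtomicLong();

    void incrementReadOps(int n)  { readOps.addAndGet(n); }
    void incrementWriteOps(int n) { writeOps.addAndGet(n); }

    long getReadOps()  { return readOps.get(); }
    long getWriteOps() { return writeOps.get(); }
}

// Hypothetical file system showing where the increments belong.
class SketchFileSystem {
    final Statistics statistics = new Statistics();

    // Metadata lookups count as read operations...
    boolean exists(String path) {
        statistics.incrementReadOps(1);
        return true; // placeholder for the real lookup
    }

    // ...and mutations count as write operations.
    boolean mkdirs(String path) {
        statistics.incrementWriteOps(1);
        return true; // placeholder for the real directory creation
    }
}
```

These are the counters that surface in job output as "Number of read operations" and "Number of write operations" per file-system scheme.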

Comment 2 Jay Vyas 2013-03-25 21:25:18 UTC
For detail, here is an initial list of the methods in org.apache.hadoop's DistributedFileSystem class that call incrementReadOps/incrementWriteOps; each one below that we also implement in the GlusterFS plugin will require the same modification:

  public BlockLocation[] getFileBlockLocations(Path p, 
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
  public FSDataOutputStream append(Path f, int bufferSize,
  public FSDataOutputStream create(Path f, FsPermission permission,
  public FSDataOutputStream createNonRecursive(Path f, FsPermission permission,
  public boolean setReplication(Path src, 
  public boolean rename(Path src, Path dst) throws IOException {
  public void rename(Path src, Path dst, Options.Rename... options) throws IOException {
  public boolean delete(Path f, boolean recursive) throws IOException {
  public ContentSummary getContentSummary(Path f) throws IOException {
  public boolean mkdir(Path f, FsPermission permission) throws IOException {
  public boolean mkdirs(Path f, FsPermission permission) throws IOException {
  public FsStatus getStatus(Path p) throws IOException {
  public FileStatus getFileStatus(Path f) throws IOException {
  public MD5MD5CRC32FileChecksum getFileChecksum(Path f) throws IOException {
  public void setPermission(Path p, FsPermission permission
  public void setTimes(Path p, long mtime, long atime

For example:
 
  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    statistics.incrementReadOps(1);
    HdfsFileStatus fi = dfs.getFileInfo(getPathName(f));
    ... 
  }
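Following the HDFS source, the methods in the list above split into read operations (lookups) and write operations (mutations). This sketch records that classification as data; the classification follows DistributedFileSystem, but mapping it onto the Gluster plugin's own method names is an assumption.

```java
import java.util.Map;

// Which counter each DistributedFileSystem method increments, per the
// HDFS source; a Gluster plugin mirroring HDFS would do the same.
class OpClassification {
    enum Op { READ, WRITE }

    static final Map<String, Op> OPS = Map.ofEntries(
        // Metadata/content lookups -> incrementReadOps(1)
        Map.entry("getFileBlockLocations", Op.READ),
        Map.entry("open",                  Op.READ),
        Map.entry("getContentSummary",     Op.READ),
        Map.entry("getStatus",             Op.READ),
        Map.entry("getFileStatus",         Op.READ),
        Map.entry("getFileChecksum",       Op.READ),
        // Mutations -> incrementWriteOps(1)
        Map.entry("append",                Op.WRITE),
        Map.entry("create",                Op.WRITE),
        Map.entry("createNonRecursive",    Op.WRITE),
        Map.entry("setReplication",        Op.WRITE),
        Map.entry("rename",                Op.WRITE),
        Map.entry("delete",                Op.WRITE),
        Map.entry("mkdirs",                Op.WRITE),
        Map.entry("setPermission",         Op.WRITE),
        Map.entry("setTimes",              Op.WRITE)
    );
}
```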

Comment 3 Bradley Childs 2013-08-22 22:09:36 UTC
Fixed with redesign in this checkin:

https://github.com/gluster/hadoop-glusterfs/commit/ce6325313dcb0df6cc73379248c1e07a9aa0b025

Comment 4 Martin Kudlej 2014-01-23 15:00:38 UTC
I've tried to test this and found that the read/write operation counters are still 0:
	FILE: Number of bytes read=226
	FILE: Number of bytes written=921439
	FILE: Number of read operations=0
	FILE: Number of large read operations=0
	FILE: Number of write operations=0
	GLUSTERFS: Number of bytes read=8605
	GLUSTERFS: Number of bytes written=215
	GLUSTERFS: Number of read operations=0
	GLUSTERFS: Number of large read operations=0
	GLUSTERFS: Number of write operations=0
The same example with HDFS:
		FILE: Number of bytes read=226
		FILE: Number of bytes written=913772
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2870
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=43
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3

I've used:
rhs-hadoop-2.1.5-1.noarch
hadoop-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-client-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-yarn-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-mapreduce-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-libhdfs-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-lzo-0.5.0-1.x86_64
hadoop-lzo-native-0.5.0-1.x86_64
hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64
glusterfs-3.4.0.44rhs-1.el6rhs.x86_64

--->ASSIGNED

Log from example run:
 hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-*.jar pi 10 10
Number of Maps  = 10
Samples per Map = 10
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS,  CRC disabled.
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=51e5108, git.commit.user.email=bchilds, git.commit.message.full=2.1.5 branch/build
, git.commit.id=51e5108fbec0b50d921aeb00ba2489bbdbe3d6ff, git.commit.message.short=2.1.5 branch/build, git.commit.user.name=childsb, git.build.user.name=Unknown, git.commit.id.describe=2.1.4-21-g51e5108, git.build.user.email=Unknown, git.branch=master, git.commit.time=17.01.2014 @ 16:05:54 EST, git.build.time=21.01.2014 @ 02:19:28 EST}
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.4
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/glusterfs
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : null
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/root
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Write buffer size : 131072
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/01/23 13:25:50 INFO client.RMProxy: Connecting to ResourceManager
at _machine_:8050
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/glusterfs
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : null
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/root
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/01/23 13:25:56 INFO input.FileInputFormat: Total input paths to process : 10
14/01/23 13:25:56 INFO mapreduce.JobSubmitter: number of splits:10
14/01/23 13:25:56 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/01/23 13:25:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1390474574687_0001
14/01/23 13:25:58 INFO impl.YarnClientImpl: Submitted application
application_1390474574687_0001 to ResourceManager at _machine_:8050
14/01/23 13:25:58 INFO mapreduce.Job: The url to track the job:
_machine_:8088/proxy/application_1390474574687_0001/
14/01/23 13:25:58 INFO mapreduce.Job: Running job: job_1390474574687_0001
14/01/23 13:26:17 INFO mapreduce.Job: Job job_1390474574687_0001 running in uber mode : false
14/01/23 13:26:17 INFO mapreduce.Job:  map 0% reduce 0%
14/01/23 13:28:06 INFO mapreduce.Job:  map 50% reduce 0%
14/01/23 13:29:07 INFO mapreduce.Job:  map 100% reduce 0%
14/01/23 13:29:21 INFO mapreduce.Job:  map 100% reduce 100%
14/01/23 13:29:25 INFO mapreduce.Job: Job job_1390474574687_0001 completed successfully
14/01/23 13:29:27 INFO mapreduce.Job: Counters: 43
	File System Counters
		FILE: Number of bytes read=226
		FILE: Number of bytes written=921439
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		GLUSTERFS: Number of bytes read=8605
		GLUSTERFS: Number of bytes written=215
		GLUSTERFS: Number of read operations=0
		GLUSTERFS: Number of large read operations=0
		GLUSTERFS: Number of write operations=0
	Job Counters 
		Launched map tasks=10
		Launched reduce tasks=1
		Data-local map tasks=10
		Total time spent by all maps in occupied slots (ms)=2480649
		Total time spent by all reduces in occupied slots (ms)=51608
	Map-Reduce Framework
		Map input records=10
		Map output records=20
		Map output bytes=180
		Map output materialized bytes=280
		Input split bytes=1350
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=280
		Reduce input records=20
		Reduce output records=0
		Spilled Records=40
		Shuffled Maps =10
		Failed Shuffles=0
		Merged Map outputs=10
		GC time elapsed (ms)=25269
		CPU time spent (ms)=35510
		Physical memory (bytes) snapshot=2833489920
		Virtual memory (bytes) snapshot=12698566656
		Total committed heap usage (bytes)=2426511360
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=1180
	File Output Format Counters 
		Bytes Written=97
Job Finished in 217.403 seconds
Estimated value of Pi is 3.20000000000000000000

Comment 5 Tomas Meszaros 2014-06-02 12:30:53 UTC
According to the Hadoop documentation, the HDFS "large read operations" counter should be incremented every time files are listed under a large directory. Does this counter work the same way in glusterfs-hadoop as it does in HDFS? I've tried several MapReduce jobs on directories containing thousands of files but still got 0 large read operations on the counter. We need to know how this counter works in glusterfs-hadoop so we can properly test it.
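In HDFS, listStatus on a large directory fetches entries in batches, and each batch after the first counts as a large read operation rather than a plain read. This standalone sketch models that semantics; the class, the batch size constant, and the simulated listing are illustrative assumptions, not the glusterfs-hadoop implementation (whose behavior is exactly what this comment asks about).

```java
import java.util.ArrayList;
import java.util.List;

// Models HDFS listing semantics: the first batch counts as one read op,
// and every additional batch needed for a large directory counts as one
// large read op.
class ListingCounter {
    long readOps;
    long largeReadOps;
    static final int BATCH = 1000; // HDFS dfs.ls.limit defaults to 1000

    // Simulate listing a directory containing `entries` children.
    List<String> listStatus(int entries) {
        List<String> out = new ArrayList<>();
        int fetched = 0;
        readOps++; // first batch is a plain read operation
        while (fetched < entries) {
            int batch = Math.min(BATCH, entries - fetched);
            for (int i = 0; i < batch; i++) {
                out.add("entry-" + (fetched + i));
            }
            fetched += batch;
            if (fetched < entries) {
                largeReadOps++; // each extra batch is a large read op
            }
        }
        return out;
    }
}
```

Under this model a directory with fewer than 1000 entries never touches the large-read counter, which would explain a 0 reading on modest test directories even when the counter works.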

Comment 6 Kaleb KEITHLEY 2015-10-22 15:46:38 UTC
Because of the large number of bugs filed against it, the "mainline" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

Comment 7 Red Hat Bugzilla 2023-09-14 01:42:41 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days