Description of problem: The Apache Hadoop Distributed File System (org.apache.hadoop.hdfs.DistributedFileSystem) tracks all read and write operations against the file system using org.apache.hadoop.fs.Statistics.incrementReadOps and incrementWriteOps. The Gluster plugin should offer the same feature, since this is expected behavior for a Hadoop-compatible file system. Currently the Gluster plugin does not track these operations; it needs to be extended to track them in the same way.
For detail: here is an initial list of the methods that call incrementReadOps/incrementWriteOps in org.apache.hadoop.hdfs.DistributedFileSystem. Each one below that we also implement in GlusterFS will require the same modification:

  public BlockLocation[] getFileBlockLocations(Path p, ...)
  public FSDataInputStream open(Path f, int bufferSize) throws IOException
  public FSDataOutputStream append(Path f, int bufferSize, ...)
  public FSDataOutputStream create(Path f, FsPermission permission, ...)
  public FSDataOutputStream createNonRecursive(Path f, FsPermission permission, ...)
  public boolean setReplication(Path src, ...)
  public boolean rename(Path src, Path dst) throws IOException
  public void rename(Path src, Path dst, Options.Rename... options) throws IOException
  public boolean delete(Path f, boolean recursive) throws IOException
  public ContentSummary getContentSummary(Path f) throws IOException
  public boolean mkdir(Path f, FsPermission permission) throws IOException
  public boolean mkdirs(Path f, FsPermission permission) throws IOException
  public FsStatus getStatus(Path p) throws IOException
  public FileStatus getFileStatus(Path f) throws IOException
  public MD5MD5CRC32FileChecksum getFileChecksum(Path f) throws IOException
  public void setPermission(Path p, FsPermission permission, ...)
  public void setTimes(Path p, long mtime, long atime, ...)

For example:

  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    statistics.incrementReadOps(1);
    HdfsFileStatus fi = dfs.getFileInfo(getPathName(f));
    ...
  }
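To illustrate the pattern without pulling in Hadoop dependencies, here is a minimal, self-contained sketch. OpStatistics is a simplified stand-in for org.apache.hadoop.fs.FileSystem.Statistics, and SketchFileSystem is a hypothetical class (not the real plugin API); the point is only that each FileSystem entry point bumps the appropriate counter before doing its work:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified stand-in for org.apache.hadoop.fs.FileSystem.Statistics. */
class OpStatistics {
    private final AtomicLong readOps = new AtomicLong();
    private final AtomicLong writeOps = new AtomicLong();
    void incrementReadOps(int n)  { readOps.addAndGet(n); }
    void incrementWriteOps(int n) { writeOps.addAndGet(n); }
    long getReadOps()  { return readOps.get(); }
    long getWriteOps() { return writeOps.get(); }
}

/** Hypothetical sketch of the pattern the Gluster plugin would follow:
 *  increment the counter at the top of each FileSystem entry point,
 *  before delegating to the underlying gluster mount. */
class SketchFileSystem {
    final OpStatistics statistics = new OpStatistics();

    Object getFileStatus(String path) {
        statistics.incrementReadOps(1);   // metadata lookup is a read op
        return null;                      // real plugin would stat the file here
    }

    boolean mkdirs(String path) {
        statistics.incrementWriteOps(1);  // directory creation is a write op
        return true;                      // real plugin would create the dirs here
    }
}
```

With this in place, any MapReduce job that goes through these entry points would see nonzero "Number of read operations" and "Number of write operations" in its File System Counters.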
Fixed with redesign in this checkin: https://github.com/gluster/hadoop-glusterfs/commit/ce6325313dcb0df6cc73379248c1e07a9aa0b025
I've tried to test this and found that read/write operations are 0:

  FILE: Number of bytes read=226
  FILE: Number of bytes written=921439
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  GLUSTERFS: Number of bytes read=8605
  GLUSTERFS: Number of bytes written=215
  GLUSTERFS: Number of read operations=0
  GLUSTERFS: Number of large read operations=0
  GLUSTERFS: Number of write operations=0

Same example with HDFS:

  FILE: Number of bytes read=226
  FILE: Number of bytes written=913772
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=2870
  HDFS: Number of bytes written=215
  HDFS: Number of read operations=43
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=3

I've used:

  rhs-hadoop-2.1.5-1.noarch
  hadoop-2.2.0.2.0.6.0-76.el6.x86_64
  hadoop-client-2.2.0.2.0.6.0-76.el6.x86_64
  hadoop-yarn-2.2.0.2.0.6.0-76.el6.x86_64
  hadoop-mapreduce-2.2.0.2.0.6.0-76.el6.x86_64
  hadoop-libhdfs-2.2.0.2.0.6.0-76.el6.x86_64
  hadoop-lzo-0.5.0-1.x86_64
  hadoop-lzo-native-0.5.0-1.x86_64
  hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64
  glusterfs-3.4.0.44rhs-1.el6rhs.x86_64

---> ASSIGNED

Log from example run:

  hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-*.jar pi 10 10
  Number of Maps = 10
  Samples per Map = 10
  14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Initializing gluster volume..
  14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
  14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
  14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=51e5108, git.commit.user.email=bchilds, git.commit.message.full=2.1.5 branch/build , git.commit.id=51e5108fbec0b50d921aeb00ba2489bbdbe3d6ff, git.commit.message.short=2.1.5 branch/build, git.commit.user.name=childsb, git.build.user.name=Unknown, git.commit.id.describe=2.1.4-21-g51e5108, git.build.user.email=Unknown, git.branch=master, git.commit.time=17.01.2014 @ 16:05:54 EST, git.build.time=21.01.2014 @ 02:19:28 EST}
  14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.4
  14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
  14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Initializing gluster volume..
  14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/glusterfs
  14/01/23 13:25:46 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : null
  14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/root
  14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Write buffer size : 131072
  Wrote input for Map #0
  Wrote input for Map #1
  Wrote input for Map #2
  Wrote input for Map #3
  Wrote input for Map #4
  Wrote input for Map #5
  Wrote input for Map #6
  Wrote input for Map #7
  Wrote input for Map #8
  Wrote input for Map #9
  Starting Job
  14/01/23 13:25:50 INFO client.RMProxy: Connecting to ResourceManager at _machine_:8050
  14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Initializing gluster volume..
  14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Initializing gluster volume..
  14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/glusterfs
  14/01/23 13:25:50 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : null
  14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/root
  14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Write buffer size : 131072
  14/01/23 13:25:56 INFO input.FileInputFormat: Total input paths to process : 10
  14/01/23 13:25:56 INFO mapreduce.JobSubmitter: number of splits:10
  14/01/23 13:25:56 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
  14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
  14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
  14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
  14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
  14/01/23 13:25:56 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
  14/01/23 13:25:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1390474574687_0001
  14/01/23 13:25:58 INFO impl.YarnClientImpl: Submitted application application_1390474574687_0001 to ResourceManager at _machine_:8050
  14/01/23 13:25:58 INFO mapreduce.Job: The url to track the job: _machine_:8088/proxy/application_1390474574687_0001/
  14/01/23 13:25:58 INFO mapreduce.Job: Running job: job_1390474574687_0001
  14/01/23 13:26:17 INFO mapreduce.Job: Job job_1390474574687_0001 running in uber mode : false
  14/01/23 13:26:17 INFO mapreduce.Job: map 0% reduce 0%
  14/01/23 13:28:06 INFO mapreduce.Job: map 50% reduce 0%
  14/01/23 13:29:07 INFO mapreduce.Job: map 100% reduce 0%
  14/01/23 13:29:21 INFO mapreduce.Job: map 100% reduce 100%
  14/01/23 13:29:25 INFO mapreduce.Job: Job job_1390474574687_0001 completed successfully
  14/01/23 13:29:27 INFO mapreduce.Job: Counters: 43
    File System Counters
      FILE: Number of bytes read=226
      FILE: Number of bytes written=921439
      FILE: Number of read operations=0
      FILE: Number of large read operations=0
      FILE: Number of write operations=0
      GLUSTERFS: Number of bytes read=8605
      GLUSTERFS: Number of bytes written=215
      GLUSTERFS: Number of read operations=0
      GLUSTERFS: Number of large read operations=0
      GLUSTERFS: Number of write operations=0
    Job Counters
      Launched map tasks=10
      Launched reduce tasks=1
      Data-local map tasks=10
      Total time spent by all maps in occupied slots (ms)=2480649
      Total time spent by all reduces in occupied slots (ms)=51608
    Map-Reduce Framework
      Map input records=10
      Map output records=20
      Map output bytes=180
      Map output materialized bytes=280
      Input split bytes=1350
      Combine input records=0
      Combine output records=0
      Reduce input groups=2
      Reduce shuffle bytes=280
      Reduce input records=20
      Reduce output records=0
      Spilled Records=40
      Shuffled Maps =10
      Failed Shuffles=0
      Merged Map outputs=10
      GC time elapsed (ms)=25269
      CPU time spent (ms)=35510
      Physical memory (bytes) snapshot=2833489920
      Virtual memory (bytes) snapshot=12698566656
      Total committed heap usage (bytes)=2426511360
    Shuffle Errors
      BAD_ID=0
      CONNECTION=0
      IO_ERROR=0
      WRONG_LENGTH=0
      WRONG_MAP=0
      WRONG_REDUCE=0
    File Input Format Counters
      Bytes Read=1180
    File Output Format Counters
      Bytes Written=97
  Job Finished in 217.403 seconds
  Estimated value of Pi is 3.20000000000000000000
According to the Hadoop documentation, the "large read operations" counter (in HDFS) should be incremented when listing the files under a large directory. Does this counter work the same way in glusterfs-hadoop as it does in HDFS? I've tried several MapReduce jobs on directories containing thousands of files but still got 0 large read operations on the counter. We need to know how this counter works in glusterfs-hadoop so we can properly test it.
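For context on what a matching implementation might look like: in HDFS, a directory listing that is too large for one response is fetched in batches, and the extra batches are what get counted as large read operations. The sketch below is a hypothetical, self-contained illustration of that paging pattern (ListingSketch and the BATCH constant are stand-ins, not the real HDFS or glusterfs-hadoop code). If the Gluster plugin lists a directory through its local mount in a single call, it would plausibly never increment this counter, which would explain the 0 seen above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of how a paged directory listing could count
 *  "large read operations": the first fetch is an ordinary read op,
 *  and each *additional* batch needed for a large directory is a
 *  large read op. BATCH is a stand-in for a listing-batch-size limit. */
class ListingSketch {
    static final int BATCH = 1000;                 // assumed batch size
    final AtomicLong readOps = new AtomicLong();
    final AtomicLong largeReadOps = new AtomicLong();

    List<String> listStatus(List<String> dirEntries) {
        readOps.incrementAndGet();                 // first fetch: read op
        int fetched = Math.min(BATCH, dirEntries.size());
        List<String> out = new ArrayList<>(dirEntries.subList(0, fetched));
        while (fetched < dirEntries.size()) {      // remaining batches
            largeReadOps.incrementAndGet();        // each one: large read op
            int next = Math.min(fetched + BATCH, dirEntries.size());
            out.addAll(dirEntries.subList(fetched, next));
            fetched = next;
        }
        return out;
    }
}
```

Under this model, listing a directory of 2500 entries would record 1 read operation and 2 large read operations, while any directory that fits in one batch records 0 large read operations no matter how many jobs you run.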
Because of the large number of bugs filed against it, the "mainline" version is ambiguous and about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 1000 days.