Bug 927006
| Summary: | GlusterFS Hadoop Plugin does not track read and write operations for org.apache.hadoop.fs.Statistics | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Steve Watt <swatt> |
| Component: | gluster-hadoop | Assignee: | Bradley Childs <bchilds> |
| Status: | CLOSED EOL | QA Contact: | hcfs-gluster-bugs |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | mainline | CC: | bchilds, bugs, chrisw, eboyd, esammons, matt, mkudlej, rhs-bugs, vbellur |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-10-22 15:46:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Steve Watt
2013-03-24 18:31:38 UTC
For detail, here is an initial list of the methods that call incrementReadOps/incrementWriteOps in org.apache.hadoop's DistributedFileSystem class; each method below that we also implement in GlusterFS will require the same modification:
public BlockLocation[] getFileBlockLocations(Path p,
public FSDataInputStream open(Path f, int bufferSize) throws IOException {
public FSDataOutputStream append(Path f, int bufferSize,
public FSDataOutputStream create(Path f, FsPermission permission,
public FSDataOutputStream createNonRecursive(Path f, FsPermission permission,
public boolean setReplication(Path src,
public boolean rename(Path src, Path dst) throws IOException {
public void rename(Path src, Path dst, Options.Rename... options) throws IOException {
public boolean delete(Path f, boolean recursive) throws IOException {
public ContentSummary getContentSummary(Path f) throws IOException {
public boolean mkdir(Path f, FsPermission permission) throws IOException {
public boolean mkdirs(Path f, FsPermission permission) throws IOException {
public FsStatus getStatus(Path p) throws IOException {
public FileStatus getFileStatus(Path f) throws IOException {
public MD5MD5CRC32FileChecksum getFileChecksum(Path f) throws IOException {
public void setPermission(Path p, FsPermission permission
public void setTimes(Path p, long mtime, long atime
For example:
@Override
public FileStatus getFileStatus(Path f) throws IOException {
statistics.incrementReadOps(1);
HdfsFileStatus fi = dfs.getFileInfo(getPathName(f));
...
}
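The same pattern should apply to the GlusterFS plugin: each overridden FileSystem method bumps the inherited statistics counter before delegating to the real file-system call. A minimal self-contained sketch of that accounting follows; the Stats and GlusterCounterFs classes are hypothetical stand-ins for org.apache.hadoop.fs.FileSystem.Statistics and the plugin's FileSystem subclass, which are not reproduced here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for org.apache.hadoop.fs.FileSystem.Statistics:
// one per-scheme counter object shared by all methods of the file system.
class Stats {
    private final AtomicLong readOps = new AtomicLong();
    private final AtomicLong writeOps = new AtomicLong();

    void incrementReadOps(int n)  { readOps.addAndGet(n); }
    void incrementWriteOps(int n) { writeOps.addAndGet(n); }
    long getReadOps()  { return readOps.get(); }
    long getWriteOps() { return writeOps.get(); }
}

// Hypothetical sketch of the plugin's FileSystem subclass: metadata reads
// (getFileStatus, getContentSummary, ...) count as read ops, mutations
// (mkdirs, delete, rename, ...) count as write ops.
class GlusterCounterFs {
    final Stats statistics = new Stats();

    boolean getFileStatus(String path) {
        statistics.incrementReadOps(1);   // count the op before delegating
        return true;                      // real code would stat the path
    }

    boolean mkdirs(String path) {
        statistics.incrementWriteOps(1);  // mutations count as write ops
        return true;                      // real code would create the dirs
    }
}
```

With this in place, two metadata lookups and one directory creation would report 2 read ops and 1 write op, which is what the GLUSTERFS counter block in the job output should then reflect.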
Fixed with a redesign in this checkin: https://github.com/gluster/hadoop-glusterfs/commit/ce6325313dcb0df6cc73379248c1e07a9aa0b025

I've tried to test this and found that the read/write operation counters are still 0:
FILE: Number of bytes read=226
FILE: Number of bytes written=921439
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
GLUSTERFS: Number of bytes read=8605
GLUSTERFS: Number of bytes written=215
GLUSTERFS: Number of read operations=0
GLUSTERFS: Number of large read operations=0
GLUSTERFS: Number of write operations=0
same example with hdfs:
FILE: Number of bytes read=226
FILE: Number of bytes written=913772
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2870
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
I've used
rhs-hadoop-2.1.5-1.noarch
hadoop-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-client-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-yarn-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-mapreduce-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-libhdfs-2.2.0.2.0.6.0-76.el6.x86_64
hadoop-lzo-0.5.0-1.x86_64
hadoop-lzo-native-0.5.0-1.x86_64
hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64
glusterfs-3.4.0.44rhs-1.el6rhs.x86_64
--->ASSIGNED
Log from example run:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-*.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=51e5108, git.commit.user.email=bchilds, git.commit.message.full=2.1.5 branch/build
, git.commit.id=51e5108fbec0b50d921aeb00ba2489bbdbe3d6ff, git.commit.message.short=2.1.5 branch/build, git.commit.user.name=childsb, git.build.user.name=Unknown, git.commit.id.describe=2.1.4-21-g51e5108, git.build.user.email=Unknown, git.branch=master, git.commit.time=17.01.2014 @ 16:05:54 EST, git.build.time=21.01.2014 @ 02:19:28 EST}
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.4
14/01/23 13:25:46 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/glusterfs
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : null
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/root
14/01/23 13:25:46 INFO glusterfs.GlusterVolume: Write buffer size : 131072
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/01/23 13:25:50 INFO client.RMProxy: Connecting to ResourceManager
at _machine_:8050
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/glusterfs
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : null
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/root
14/01/23 13:25:50 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/01/23 13:25:56 INFO input.FileInputFormat: Total input paths to process : 10
14/01/23 13:25:56 INFO mapreduce.JobSubmitter: number of splits:10
14/01/23 13:25:56 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/01/23 13:25:56 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/01/23 13:25:56 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/01/23 13:25:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1390474574687_0001
14/01/23 13:25:58 INFO impl.YarnClientImpl: Submitted application
application_1390474574687_0001 to ResourceManager at _machine_:8050
14/01/23 13:25:58 INFO mapreduce.Job: The url to track the job:
_machine_:8088/proxy/application_1390474574687_0001/
14/01/23 13:25:58 INFO mapreduce.Job: Running job: job_1390474574687_0001
14/01/23 13:26:17 INFO mapreduce.Job: Job job_1390474574687_0001 running in uber mode : false
14/01/23 13:26:17 INFO mapreduce.Job: map 0% reduce 0%
14/01/23 13:28:06 INFO mapreduce.Job: map 50% reduce 0%
14/01/23 13:29:07 INFO mapreduce.Job: map 100% reduce 0%
14/01/23 13:29:21 INFO mapreduce.Job: map 100% reduce 100%
14/01/23 13:29:25 INFO mapreduce.Job: Job job_1390474574687_0001 completed successfully
14/01/23 13:29:27 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=226
FILE: Number of bytes written=921439
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
GLUSTERFS: Number of bytes read=8605
GLUSTERFS: Number of bytes written=215
GLUSTERFS: Number of read operations=0
GLUSTERFS: Number of large read operations=0
GLUSTERFS: Number of write operations=0
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=2480649
Total time spent by all reduces in occupied slots (ms)=51608
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1350
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=25269
CPU time spent (ms)=35510
Physical memory (bytes) snapshot=2833489920
Virtual memory (bytes) snapshot=12698566656
Total committed heap usage (bytes)=2426511360
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 217.403 seconds
Estimated value of Pi is 3.20000000000000000000
According to the Hadoop documentation, the "large read operations" counter (in HDFS) should be incremented every time files are listed under a large directory. Does this counter work the same way in glusterfs-hadoop as it does in HDFS? I've tried several MapReduce jobs on directories containing thousands of files but still got 0 large read operations on the counter. We need to know how this counter works in glusterfs-hadoop so we can properly test it.

Because of the large number of bugs filed against it, the "mainline" version is ambiguous and about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
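For reference on the HDFS behavior being asked about: DistributedFileSystem counts one plain read op for the first batch of a listStatus call, and one large read op for each additional RPC needed when the listing exceeds the server-side batch size (dfs.ls.limit, 1000 entries by default). The self-contained sketch below models that accounting; the ListingCounter class and its BATCH constant are hypothetical stand-ins, not the actual Hadoop code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of how DistributedFileSystem.listStatus accounts
// for paginated directory listings against the NameNode.
class ListingCounter {
    static final int BATCH = 1000;          // stands in for dfs.ls.limit
    final AtomicLong readOps = new AtomicLong();
    final AtomicLong largeReadOps = new AtomicLong();

    // Returns the number of RPCs a listing of `entries` files would take.
    int list(int entries) {
        readOps.incrementAndGet();          // first batch is a plain read op
        int rpcs = 1;
        for (int fetched = BATCH; fetched < entries; fetched += BATCH) {
            largeReadOps.incrementAndGet(); // each extra batch is a large read
            rpcs++;
        }
        return rpcs;
    }
}
```

Under this model, a directory of 2,500 entries takes 3 RPCs and records 2 large read ops, while any directory under 1,000 entries records none, which would explain a 0 counter even on directories with "thousands" of files if the plugin lists them in a single call.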