Bug 916371 - Hadoop benchmark TestDFSIO task fails with "java.io.IOException: cannot create parent directory: /mnt/gluster-hdfs0/benchmarks/TestDFSIO/io_data"
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-hadoop
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Bradley Childs
QA Contact: Martin Kudlej
Blocks: 1057253
 
Reported: 2013-02-27 22:58 UTC by Diane Feddema
Modified: 2014-03-03 16:32 UTC

Doc Type: Bug Fix
Last Closed: 2014-03-03 16:32:04 UTC
Regression: ---
Mount Type: ---
Documentation: ---



Description Diane Feddema 2013-02-27 22:58:09 UTC
Description of problem:
The Hadoop benchmark TestDFSIO intermittently fails a map task with "java.io.IOException: cannot create parent directory: /mnt/gluster-hdfs0/benchmarks/TestDFSIO/io_data".
Subsequently, another task creates the directory and the Hadoop job finishes successfully.

Version-Release number of selected component (if applicable):
RHS 2.1 + RHEL 6.3 + Apache Hadoop 1.0.4 + Openjdk 1.6.0

How reproducible: Intermittent; observed on two nodes with one JobTracker and one TaskTracker.


Steps to Reproduce:
1. Install Apache Hadoop distribution on RHS 2.1/RHEL 6.3
2. cd $HADOOP_INSTALL  
3. bin/hadoop jar hadoop-test-1.0.4.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
  
Actual results:
java.io.IOException: cannot create parent directory: /mnt/gluster-hdfs0/benchmarks/TestDFSIO/io_data
        at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.create(GlusterFileSystem.java:319)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:536)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:498)
        at org.apache.hadoop.fs.TestDFSIO$WriteMapper.doIO(TestDFSIO.java:212)
        at org.apache.hadoop.fs.TestDFSIO$WriteMapper.doIO(TestDFSIO.java:198)
        at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:118)
        at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:36)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)


Expected results:
# bin/hadoop jar hadoop-test-1.0.4.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
TestDFSIO.0.0.4
13/02/27 17:43:12 INFO fs.TestDFSIO: nrFiles = 10
13/02/27 17:43:12 INFO fs.TestDFSIO: fileSize (MB) = 1000
13/02/27 17:43:12 INFO fs.TestDFSIO: bufferSize = 1000000
Initializing GlusterFS
13/02/27 17:43:13 INFO fs.TestDFSIO: creating control file: 1000 mega bytes, 10 files
13/02/27 17:43:13 INFO fs.TestDFSIO: created control files for: 10 files
13/02/27 17:43:13 INFO mapred.FileInputFormat: Total input paths to process : 10
13/02/27 17:43:13 INFO mapred.JobClient: Running job: job_201302271357_0008
13/02/27 17:43:14 INFO mapred.JobClient:  map 0% reduce 0%
13/02/27 17:43:33 INFO mapred.JobClient:  map 20% reduce 0%
13/02/27 17:43:42 INFO mapred.JobClient:  map 20% reduce 6%
13/02/27 17:43:51 INFO mapred.JobClient:  map 40% reduce 6%
13/02/27 17:43:57 INFO mapred.JobClient:  map 40% reduce 13%
13/02/27 17:45:34 INFO mapred.JobClient:  map 60% reduce 13%
13/02/27 17:45:43 INFO mapred.JobClient:  map 70% reduce 13%
13/02/27 17:45:46 INFO mapred.JobClient:  map 70% reduce 20%
13/02/27 17:45:52 INFO mapred.JobClient:  map 70% reduce 23%
13/02/27 17:46:07 INFO mapred.JobClient:  map 80% reduce 23%
13/02/27 17:46:16 INFO mapred.JobClient:  map 80% reduce 26%
13/02/27 17:47:36 INFO mapred.JobClient:  map 90% reduce 26%
13/02/27 17:47:48 INFO mapred.JobClient:  map 90% reduce 30%
13/02/27 17:48:08 INFO mapred.JobClient:  map 100% reduce 30%
13/02/27 17:48:14 INFO mapred.JobClient:  map 100% reduce 100%
13/02/27 17:48:19 INFO mapred.JobClient: Job complete: job_201302271357_0008
13/02/27 17:48:19 INFO mapred.JobClient: Counters: 29
13/02/27 17:48:19 INFO mapred.JobClient:   Job Counters 
13/02/27 17:48:19 INFO mapred.JobClient:     Launched reduce tasks=1
13/02/27 17:48:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=517213
13/02/27 17:48:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/02/27 17:48:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/02/27 17:48:19 INFO mapred.JobClient:     Rack-local map tasks=7
13/02/27 17:48:19 INFO mapred.JobClient:     Launched map tasks=10
13/02/27 17:48:19 INFO mapred.JobClient:     Data-local map tasks=3
13/02/27 17:48:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=278225
13/02/27 17:48:19 INFO mapred.JobClient:   File Input Format Counters 
13/02/27 17:48:19 INFO mapred.JobClient:     Bytes Read=0
13/02/27 17:48:19 INFO mapred.JobClient:   File Output Format Counters 
13/02/27 17:48:19 INFO mapred.JobClient:     Bytes Written=0
13/02/27 17:48:19 INFO mapred.JobClient:   FileSystemCounters
13/02/27 17:48:19 INFO mapred.JobClient:     FILE_BYTES_READ=870
13/02/27 17:48:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=245743
13/02/27 17:48:19 INFO mapred.JobClient:   Map-Reduce Framework
13/02/27 17:48:19 INFO mapred.JobClient:     Map output materialized bytes=924
13/02/27 17:48:19 INFO mapred.JobClient:     Map input records=10
13/02/27 17:48:19 INFO mapred.JobClient:     Reduce shuffle bytes=832
13/02/27 17:48:19 INFO mapred.JobClient:     Spilled Records=100
13/02/27 17:48:19 INFO mapred.JobClient:     Map output bytes=764
13/02/27 17:48:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=2049703936
13/02/27 17:48:19 INFO mapred.JobClient:     CPU time spent (ms)=29410
13/02/27 17:48:19 INFO mapred.JobClient:     Map input bytes=260
13/02/27 17:48:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=-5110
13/02/27 17:48:19 INFO mapred.JobClient:     Combine input records=0
13/02/27 17:48:19 INFO mapred.JobClient:     Reduce input records=50
13/02/27 17:48:19 INFO mapred.JobClient:     Reduce input groups=5
13/02/27 17:48:19 INFO mapred.JobClient:     Combine output records=0
13/02/27 17:48:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2332360704
13/02/27 17:48:19 INFO mapred.JobClient:     Reduce output records=5
13/02/27 17:48:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=9295192064
13/02/27 17:48:19 INFO mapred.JobClient:     Map output records=50
13/02/27 17:48:20 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
13/02/27 17:48:20 INFO fs.TestDFSIO:            Date & time: Wed Feb 27 17:48:20 EST 2013
13/02/27 17:48:20 INFO fs.TestDFSIO:        Number of files: 10
13/02/27 17:48:20 INFO fs.TestDFSIO: Total MBytes processed: 10000
13/02/27 17:48:20 INFO fs.TestDFSIO:      Throughput mb/sec: 24.545659836427724
13/02/27 17:48:20 INFO fs.TestDFSIO: Average IO rate mb/sec: 83.52056121826172
13/02/27 17:48:20 INFO fs.TestDFSIO:  IO rate std deviation: 81.47757452613324
13/02/27 17:48:20 INFO fs.TestDFSIO:     Test exec time sec: 306.704
13/02/27 17:48:20 INFO fs.TestDFSIO: 


Additional info:

Comment 2 Matthew Farrellee 2013-03-01 00:53:46 UTC
Patch - https://github.com/gluster/hadoop-glusterfs/pull/9

Comment 3 Bradley Childs 2013-03-04 19:31:28 UTC
merged, closing.

Comment 5 Steve Watt 2013-03-14 14:53:15 UTC
Diane Feddema has confirmed she can still reproduce this issue with the current patch. The initial error is actually a symptom of a broader root cause: the mkdir call in the glusterfs-hadoop component needs idempotent semantics equivalent to the "mkdir -p" bash command. Moving this bug back to Assigned so we can close out the root cause.
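To illustrate the root cause described above: when several map tasks race to create the same output directory, a plain mkdir that reports failure because the directory already exists (having just been created by another task) produces exactly this intermittent IOException. The sketch below shows "mkdir -p"-style idempotent directory creation; the class and method names are hypothetical, not the actual glusterfs-hadoop code.

```java
import java.io.File;
import java.io.IOException;

public class MkdirsDemo {
    // Idempotent "mkdir -p" semantics: succeed whether we create the
    // directory tree ourselves or another task created it first.
    static void ensureDir(File dir) throws IOException {
        if (dir.mkdirs()) {
            return;             // we created the full tree
        }
        if (dir.isDirectory()) {
            return;             // another task won the race; not an error
        }
        // mkdirs() failed and no directory exists: a genuine failure
        throw new IOException("cannot create parent directory: " + dir);
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"),
                            "testdfsio-demo/io_data");
        ensureDir(dir);  // first caller creates the tree
        ensureDir(dir);  // a second caller must succeed, not throw
        System.out.println(dir.isDirectory()); // prints "true"
    }
}
```

Without the isDirectory() check, the second call would be treated as a failure, which matches the intermittent behavior reported here: whichever task loses the race throws, and a later attempt succeeds because the directory already exists.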

Comment 6 Bradley Childs 2013-03-14 14:57:16 UTC
https://github.com/gluster/hadoop-glusterfs/pull/14

