Description of problem: Hadoop benchmark, TestDFSIO, task fails with "java.io.IOException: cannot create parent directory: /mnt/gluster-hdfs0/benchmarks/TestDFSIO/io_data". Subsequently, another task creates the directory and the hadoop job finishes successfully. Version-Release number of selected component (if applicable): RHS 2.1 + RHEL 6.3 + Apache Hadoop 1.0.4 + Openjdk 1.6.0 How reproducible: intermittent, on two nodes with one jobtracker and one tasktracker Steps to Reproduce: 1. Install Apache Hadoop distribution on RHS 2.1/RHEL 6.3 2. cd $HADOOP_INSTALL 3. bin/hadoop jar hadoop-test-1.0.4.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 Actual results: java.io.IOException: cannot create parent directory: /mnt/gluster-hdfs0/benchmarks/TestDFSIO/io_data at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.create(GlusterFileSystem.java:319) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:536) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:498) at org.apache.hadoop.fs.TestDFSIO$WriteMapper.doIO(TestDFSIO.java:212) at org.apache.hadoop.fs.TestDFSIO$WriteMapper.doIO(TestDFSIO.java:198) at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:118) at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:36) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) Expected results: # bin/hadoop jar hadoop-test-1.0.4.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 TestDFSIO.0.0.4 13/02/27 17:43:12 INFO fs.TestDFSIO: nrFiles = 10 13/02/27 17:43:12 INFO fs.TestDFSIO: fileSize (MB) = 1000 13/02/27 17:43:12 INFO fs.TestDFSIO: bufferSize = 1000000 Initializing GlusterFS 13/02/27 17:43:13 INFO fs.TestDFSIO: creating control file: 1000 mega bytes, 10 files 13/02/27 17:43:13 INFO fs.TestDFSIO: created control files for: 10 files 13/02/27 17:43:13 INFO mapred.FileInputFormat: Total input paths to process : 10 13/02/27 17:43:13 INFO mapred.JobClient: Running job: job_201302271357_0008 13/02/27 17:43:14 INFO mapred.JobClient: map 0% reduce 0% 13/02/27 17:43:33 INFO mapred.JobClient: map 20% reduce 0% 13/02/27 17:43:42 INFO mapred.JobClient: map 20% reduce 6% 13/02/27 17:43:51 INFO mapred.JobClient: map 40% reduce 6% 13/02/27 17:43:57 INFO mapred.JobClient: map 40% reduce 13% 13/02/27 17:45:34 INFO mapred.JobClient: map 60% reduce 13% 13/02/27 17:45:43 INFO mapred.JobClient: map 70% reduce 13% 13/02/27 17:45:46 INFO mapred.JobClient: map 70% reduce 20% 13/02/27 17:45:52 INFO mapred.JobClient: map 70% reduce 23% 13/02/27 17:46:07 INFO mapred.JobClient: map 80% reduce 23% 13/02/27 17:46:16 INFO mapred.JobClient: map 80% reduce 26% 13/02/27 17:47:36 INFO mapred.JobClient: map 90% reduce 26% 13/02/27 17:47:48 INFO mapred.JobClient: map 90% reduce 30% 13/02/27 17:48:08 INFO mapred.JobClient: map 100% reduce 30% 13/02/27 17:48:14 INFO mapred.JobClient: map 100% reduce 100% 13/02/27 17:48:19 INFO mapred.JobClient: Job complete: job_201302271357_0008 13/02/27 17:48:19 INFO mapred.JobClient: Counters: 29 13/02/27 17:48:19 INFO mapred.JobClient: Job Counters 13/02/27 17:48:19 INFO mapred.JobClient: Launched reduce tasks=1 13/02/27 17:48:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=517213 13/02/27 17:48:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/02/27 17:48:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/02/27 17:48:19 INFO mapred.JobClient: Rack-local map tasks=7 13/02/27 17:48:19 INFO mapred.JobClient: Launched map tasks=10 13/02/27 17:48:19 INFO mapred.JobClient: Data-local map tasks=3 13/02/27 17:48:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=278225 13/02/27 17:48:19 INFO mapred.JobClient: File Input Format Counters 13/02/27 17:48:19 INFO mapred.JobClient: Bytes Read=0 13/02/27 17:48:19 INFO mapred.JobClient: File Output Format Counters 13/02/27 17:48:19 INFO mapred.JobClient: Bytes Written=0 13/02/27 17:48:19 INFO mapred.JobClient: FileSystemCounters 13/02/27 17:48:19 INFO mapred.JobClient: FILE_BYTES_READ=870 13/02/27 17:48:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=245743 13/02/27 17:48:19 INFO mapred.JobClient: Map-Reduce Framework 13/02/27 17:48:19 INFO mapred.JobClient: Map output materialized bytes=924 13/02/27 17:48:19 INFO mapred.JobClient: Map input records=10 13/02/27 17:48:19 INFO mapred.JobClient: Reduce shuffle bytes=832 13/02/27 17:48:19 INFO mapred.JobClient: Spilled Records=100 13/02/27 17:48:19 INFO mapred.JobClient: Map output bytes=764 13/02/27 17:48:19 INFO mapred.JobClient: Total committed heap usage (bytes)=2049703936 13/02/27 17:48:19 INFO mapred.JobClient: CPU time spent (ms)=29410 13/02/27 17:48:19 INFO mapred.JobClient: Map input bytes=260 13/02/27 17:48:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=-5110 13/02/27 17:48:19 INFO mapred.JobClient: Combine input records=0 13/02/27 17:48:19 INFO mapred.JobClient: Reduce input records=50 13/02/27 17:48:19 INFO mapred.JobClient: Reduce input groups=5 13/02/27 17:48:19 INFO mapred.JobClient: Combine output records=0 13/02/27 17:48:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=2332360704 13/02/27 17:48:19 INFO mapred.JobClient: Reduce output records=5 13/02/27 17:48:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=9295192064 13/02/27 17:48:19 INFO mapred.JobClient: Map output records=50 13/02/27 17:48:20 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write 13/02/27 17:48:20 INFO fs.TestDFSIO: Date & time: Wed Feb 27 17:48:20 EST 2013 13/02/27 17:48:20 INFO fs.TestDFSIO: Number of files: 10 13/02/27 17:48:20 INFO fs.TestDFSIO: Total MBytes processed: 10000 13/02/27 17:48:20 INFO fs.TestDFSIO: Throughput mb/sec: 24.545659836427724 13/02/27 17:48:20 INFO fs.TestDFSIO: Average IO rate mb/sec: 83.52056121826172 13/02/27 17:48:20 INFO fs.TestDFSIO: IO rate std deviation: 81.47757452613324 13/02/27 17:48:20 INFO fs.TestDFSIO: Test exec time sec: 306.704 13/02/27 17:48:20 INFO fs.TestDFSIO: Additional info:
Patch - https://github.com/gluster/hadoop-glusterfs/pull/9
merged, closing.
Diane Feddema has confirmed she can still reproduce this issue with the current path. The initial error is actually a symptom of a broader root cause that the mkdir call on the glusterfs-hadoop component needs to have idempotent semantics equivalent to a mkdir -p bash command. Moving this bug back to Assigned so we can close out the root cause.
https://github.com/gluster/hadoop-glusterfs/pull/14