Description of problem:
Hadoop benchmark TestDFSIO fails because the staging directory ./tmp/hadoop-root/mapred/staging/root/.staging created by Hadoop has the wrong permissions.

Version-Release number of selected component (if applicable):
RHEL 6.2 / RHS 2.0.4 / Apache hadoop 1.0.4

How reproducible:
100%

Steps to Reproduce:
1. bin/hadoop jar hadoop-test-1.1.2.23.jar TestDFSIO -write -nrFiles 10 -fileSize 1

Actual results:
[root@gprfs001 hadoop-1.1.2.23]# bin/hadoop jar hadoop-test-1.1.2.23.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/04/12 02:08:51 INFO fs.TestDFSIO: nrFiles = 10
13/04/12 02:08:51 INFO fs.TestDFSIO: fileSize (MB) = 1
13/04/12 02:08:51 INFO fs.TestDFSIO: bufferSize = 1000000
13/04/12 02:08:51 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS
13/04/12 02:08:51 INFO glusterfs.GlusterFileSystem: mount -t glusterfs gprfs001:/HadoopVol /mnt/glusterfs
13/04/12 02:08:51 INFO glusterfs.GlusterFileSystem: mount -t glusterfs gprfs001:/HadoopVol /mnt/glusterfs
13/04/12 02:08:51 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/04/12 02:08:51 INFO fs.TestDFSIO: created control files for: 10 files
13/04/12 02:08:52 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: The ownership/permissions on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxr-xr-x. The directory must be owned by the submitter root or by root and permissions must be rwx------
java.io.IOException: The ownership/permissions on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxr-xr-x. The directory must be owned by the submitter root or by root and permissions must be rwx------
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:108)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
    at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:257)
    at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:237)
    at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:457)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:317)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Expected results:
[root@gprfs001 hadoop-1.1.2.23]# bin/hadoop jar hadoop-test-1.1.2.23.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/04/12 02:21:57 INFO fs.TestDFSIO: nrFiles = 10
13/04/12 02:21:57 INFO fs.TestDFSIO: fileSize (MB) = 1
13/04/12 02:21:57 INFO fs.TestDFSIO: bufferSize = 1000000
13/04/12 02:21:57 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS
13/04/12 02:21:57 INFO glusterfs.GlusterFileSystem: mount -t glusterfs gprfs001:/HadoopVol /mnt/glusterfs
13/04/12 02:21:57 INFO glusterfs.GlusterFileSystem: mount -t glusterfs gprfs001:/HadoopVol /mnt/glusterfs
13/04/12 02:21:57 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/04/12 02:21:58 INFO fs.TestDFSIO: created control files for: 10 files
13/04/12 02:21:59 INFO mapred.FileInputFormat: Total input paths to process : 10
13/04/12 02:21:59 INFO mapred.JobClient: Running job: job_201304112246_0002
13/04/12 02:22:00 INFO mapred.JobClient: map 0% reduce 0%
13/04/12 02:22:06 INFO mapred.JobClient: map 30% reduce 0%
13/04/12 02:22:07 INFO mapred.JobClient: map 100% reduce 0%
13/04/12 02:22:13 INFO mapred.JobClient: map 100% reduce 33%
13/04/12 02:22:16 INFO mapred.JobClient: map 100% reduce 100%
13/04/12 02:22:17 INFO mapred.JobClient: Job complete: job_201304112246_0002
13/04/12 02:22:17 INFO mapred.JobClient: Counters: 29
13/04/12 02:22:17 INFO mapred.JobClient: Job Counters
13/04/12 02:22:17 INFO mapred.JobClient: Launched reduce tasks=1
13/04/12 02:22:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=23455
13/04/12 02:22:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/12 02:22:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/04/12 02:22:17 INFO mapred.JobClient: Rack-local map tasks=4
13/04/12 02:22:17 INFO mapred.JobClient: Launched map tasks=10
13/04/12 02:22:17 INFO mapred.JobClient: Data-local map tasks=6
13/04/12 02:22:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9231
13/04/12 02:22:17 INFO mapred.JobClient: File Input Format Counters
13/04/12 02:22:17 INFO mapred.JobClient: Bytes Read=0
13/04/12 02:22:17 INFO mapred.JobClient: File Output Format Counters
13/04/12 02:22:17 INFO mapred.JobClient: Bytes Written=0
13/04/12 02:22:17 INFO mapred.JobClient: FileSystemCounters
13/04/12 02:22:17 INFO mapred.JobClient: FILE_BYTES_READ=812
13/04/12 02:22:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=556986
13/04/12 02:22:17 INFO mapred.JobClient: Map-Reduce Framework
13/04/12 02:22:17 INFO mapred.JobClient: Map output materialized bytes=866
13/04/12 02:22:17 INFO mapred.JobClient: Map input records=10
13/04/12 02:22:17 INFO mapred.JobClient: Reduce shuffle bytes=866
13/04/12 02:22:17 INFO mapred.JobClient: Spilled Records=100
13/04/12 02:22:17 INFO mapred.JobClient: Map output bytes=706
13/04/12 02:22:17 INFO mapred.JobClient: Total committed heap usage (bytes)=4421976064
13/04/12 02:22:17 INFO mapred.JobClient: CPU time spent (ms)=6210
13/04/12 02:22:17 INFO mapred.JobClient: Map input bytes=260
13/04/12 02:22:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=-4550
13/04/12 02:22:17 INFO mapred.JobClient: Combine input records=0
13/04/12 02:22:17 INFO mapred.JobClient: Reduce input records=50
13/04/12 02:22:17 INFO mapred.JobClient: Reduce input groups=5
13/04/12 02:22:17 INFO mapred.JobClient: Combine output records=0
13/04/12 02:22:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=2585493504
13/04/12 02:22:17 INFO mapred.JobClient: Reduce output records=5
13/04/12 02:22:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=10631364608
13/04/12 02:22:17 INFO mapred.JobClient: Map output records=50
13/04/12 02:22:17 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
13/04/12 02:22:17 INFO fs.TestDFSIO: Date & time: Fri Apr 12 02:22:17 UTC 2013
13/04/12 02:22:17 INFO fs.TestDFSIO: Number of files: 10
13/04/12 02:22:17 INFO fs.TestDFSIO: Total MBytes processed: 10
13/04/12 02:22:17 INFO fs.TestDFSIO: Throughput mb/sec: 22.988505747126435
13/04/12 02:22:17 INFO fs.TestDFSIO: Average IO rate mb/sec: 24.021678924560547
13/04/12 02:22:17 INFO fs.TestDFSIO: IO rate std deviation: 5.010013775633995
13/04/12 02:22:17 INFO fs.TestDFSIO: Test exec time sec: 18.752
13/04/12 02:22:17 INFO fs.TestDFSIO:

Additional info:
To work around this problem, run TestDFSIO once. After it fails with the error

13/04/12 02:08:52 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: The ownership/permissions on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxr-xr-x. The directory must be owned by the submitter root or by root and permissions must be rwx------

change the permissions on the directory:

chmod 700 $GLUSTER_MOUNT_POINT/tmp/hadoop-root/mapred/staging/root/.staging/

Then repeat the TestDFSIO run; it now completes successfully.
__________________________________________________________________________________________________________________________________________________________________
The problem is that the gluster plugin was not applying the permissions assigned through the Hadoop API on ** writes ** (creation) of directories and files. It turns out that newer Hadoop releases (branch-1) fix this for you. Contrasting the following two files shows that branch-1 defensively sets the staging directory permissions correctly:

https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/mapred/org/apache/hadoop/mapreduce/JobSubmissionFiles.java

whereas older Hadoop versions do not:

http://javasourcecode.org/html/open-source/hadoop/hadoop-0.20.203.0/org/apache/hadoop/mapreduce/JobSubmissionFiles.java.html

As mentioned above, for the solution you can:

1) chmod the .staging directory yourself, or use umask to change the permissions that new directories are created with.
2) Compile the branch-1 JobSubmissionFiles.java logic above into your existing Hadoop distro (if you have the source; not sure how Cloudera works here).

This branch in development is expected to remedy the issue (not fully tested):
https://github.com/gluster/hadoop-glusterfs/branches
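For illustration, the check-and-fix behaviour described above can be sketched with plain java.nio.file on a POSIX file system. This is only a sketch under assumptions, not the Hadoop or plugin source: the class name StagingDirPermissions and both method names are hypothetical, and a temporary directory stands in for the real .staging directory under the gluster mount point.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper, not part of Hadoop or the gluster plugin.
final class StagingDirPermissions {
    // rwx------ is the mode the job client expects on the staging directory.
    static final Set<PosixFilePermission> OWNER_ONLY =
            PosixFilePermissions.fromString("rwx------");

    /** Returns true if the directory mode is exactly rwx------. */
    static boolean isOwnerOnly(Path dir) throws IOException {
        return Files.getPosixFilePermissions(dir).equals(OWNER_ONLY);
    }

    /** Same effect as "chmod 700 dir" when the mode is anything else. */
    static void ensureOwnerOnly(Path dir) throws IOException {
        if (!isOwnerOnly(dir)) {
            Files.setPosixFilePermissions(dir, OWNER_ONLY);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for $GLUSTER_MOUNT_POINT/tmp/hadoop-root/mapred/staging/root/.staging
        Path staging = Files.createTempDirectory("staging-demo");
        // Reproduce the mode from the error message.
        Files.setPosixFilePermissions(staging, PosixFilePermissions.fromString("rwxr-xr-x"));
        System.out.println("before: "
                + PosixFilePermissions.toString(Files.getPosixFilePermissions(staging)));
        ensureOwnerOnly(staging);
        System.out.println("after:  "
                + PosixFilePermissions.toString(Files.getPosixFilePermissions(staging)));
        Files.delete(staging);
    }
}
```

Fixing the mode instead of throwing is the defensive choice: a directory created with a permissive umask no longer aborts job submission.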
Adding setOwner will be necessary for maintaining group privileges. It is also needed for the CLI unit tests, which I'm working on separately in this ticket: https://bugzilla.redhat.com/show_bug.cgi?id=949200

    /**
     * Adapted from RawLocalFileSystem to make the path absolute.
     */
    @Override
    public void setOwner(Path p, String username, String groupname) throws IOException {
        if (username == null && groupname == null) {
            throw new IOException("username == null && groupname == null");
        }

        if (username == null) {
            // Only the group changes.
            execCommand(new File(makeAbsolute(p).toUri()), Shell.SET_GROUP_COMMAND, groupname);
        } else {
            // OWNER[:[GROUP]]
            String s = username + (groupname == null ? "" : ":" + groupname);
            execCommand(new File(makeAbsolute(p).toUri()), Shell.SET_OWNER_COMMAND, s);
        }
    }
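To make the branching in setOwner easier to follow, here is a self-contained sketch of just the argument-building step, with no Hadoop dependency. It assumes that Shell.SET_GROUP_COMMAND and Shell.SET_OWNER_COMMAND resolve to chgrp and chown. The class OwnerCommand and its build method are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the OWNER[:[GROUP]] logic in setOwner.
final class OwnerCommand {
    /** Builds the command line that would be executed for the given arguments. */
    static List<String> build(String path, String username, String groupname) {
        if (username == null && groupname == null) {
            throw new IllegalArgumentException("username == null && groupname == null");
        }
        if (username == null) {
            // Group only. Assumption: SET_GROUP_COMMAND is "chgrp".
            return Arrays.asList("chgrp", groupname, path);
        }
        // Owner, optionally with group. Assumption: SET_OWNER_COMMAND is "chown".
        String spec = username + (groupname == null ? "" : ":" + groupname);
        return Arrays.asList("chown", spec, path);
    }

    public static void main(String[] args) {
        String path = "/mnt/glusterfs/tmp/example"; // hypothetical path
        System.out.println(build(path, "root", "hadoop")); // [chown, root:hadoop, /mnt/glusterfs/tmp/example]
        System.out.println(build(path, "root", null));     // [chown, root, /mnt/glusterfs/tmp/example]
        System.out.println(build(path, null, "hadoop"));   // [chgrp, hadoop, /mnt/glusterfs/tmp/example]
    }
}
```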
This bug is now fixed in the pending patch https://github.com/gluster/hadoop-glusterfs/pull/27/commits