Description of problem:
Between runs of Hadoop jobs, the /mnt/glusterfs/tmp directory must be removed, or Hadoop fails with an "ERROR security.UserGroupInformation" error (where fs.glusterfs.mount = /mnt/glusterfs).

Version-Release number of selected component (if applicable):

How reproducible:
Very

Steps to Reproduce:
1. Install Hadoop with RHS 2.0/RHEL 6.2.
2. Run any Hadoop job, for example: bin/hadoop jar hadoop-examples.jar teragen 10000 /in-dir
3. Run another Hadoop job: bin/hadoop jar hadoop-examples.jar terasort /in-dir /out-dir

The terasort job (step 3) fails because the directory /mnt/glusterfs/tmp was not removed after teragen ran.
Note: /mnt/glusterfs is the value of the property fs.glusterfs.mount in the configuration file core-site.xml.

Actual results:
13/02/07 14:40:40 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: The ownership on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by . The directory must be owned by the submitter root or by root
java.io.IOException: The ownership on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by . The directory must be owned by the submitter root or by root
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:900)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:894)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1113)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:894)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:868)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
        at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:248)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:257)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

Expected results:
After a user runs a Hadoop job, they should not be required to remove the directory /mnt/glusterfs/tmp (/mnt/glusterfs is an example directory name; the directory name is set in the Hadoop configuration file core-site.xml via the property fs.glusterfs.mount).

Desired behavior would be to have the Hadoop plugin copy the directory /mnt/glusterfs/tmp to a new directory /mnt/glusterfs/tmp.2, or some other versioned /mnt/glusterfs/tmp directory, after each Hadoop run. A system limit on the maximum space allowed for versioned /mnt/glusterfs/tmp directories would need to be established. When the maximum space limit for the /mnt/glusterfs/tmp.version directories is reached, start removing the oldest tmp directories first until usage is under the limit.

Additional info:
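The oldest-first cleanup proposed under "Expected results" could be sketched roughly as follows. This is only an illustration of the eviction rule, not plugin code: the TmpVersionPruner and TmpVersion names, the in-memory records, and the byte limit are all hypothetical, and a real version would have to read sizes and modification times from the mount.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: choose which versioned tmp directories to delete,
// oldest first, until the total size is at or under a limit.
class TmpVersionPruner {

    /** Hypothetical record describing one versioned tmp directory. */
    static final class TmpVersion {
        final String name;
        final long lastModifiedMillis;
        final long sizeBytes;

        TmpVersion(String name, long lastModifiedMillis, long sizeBytes) {
            this.name = name;
            this.lastModifiedMillis = lastModifiedMillis;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Returns directory names to remove, oldest first, so the rest fits in maxBytes. */
    static List<String> selectForRemoval(List<TmpVersion> versions, long maxBytes) {
        List<TmpVersion> sorted = new ArrayList<>(versions);
        sorted.sort(Comparator.comparingLong((TmpVersion v) -> v.lastModifiedMillis));

        long total = 0;
        for (TmpVersion v : sorted) {
            total += v.sizeBytes;
        }

        List<String> toRemove = new ArrayList<>();
        for (TmpVersion v : sorted) {
            if (total <= maxBytes) {
                break; // already under the limit, keep the remaining (newer) versions
            }
            toRemove.add(v.name);
            total -= v.sizeBytes;
        }
        return toRemove;
    }

    public static void main(String[] args) {
        List<TmpVersion> versions = new ArrayList<>();
        versions.add(new TmpVersion("tmp.2", 2L, 40L));
        versions.add(new TmpVersion("tmp.1", 1L, 40L));
        versions.add(new TmpVersion("tmp.3", 3L, 40L));
        // 120 bytes in total against a limit of 100: only the oldest is selected.
        System.out.println(selectForRemoval(versions, 100L)); // prints [tmp.1]
    }
}
```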
Per Feb-13 bug triage meeting, reassigning to swatt.
I'm now in the process of confirming that this is not related to privileges in the FileSystem: the code at line 290 of https://github.com/gluster/hadoop-glusterfs/blob/master/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java appears not to be using the permissions. At first glance it looks similar to this S3 filesystem bug: https://issues.apache.org/jira/browse/HADOOP-8984
Possible cause: FileStatus.getOwner is not overridden properly. Looking into this more now; it appears to be due to missing FileStatus properties.
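To make the suspected failure mode concrete, here is a simplified, JDK-only model of the ownership comparison that produces the error message in this report. It is a sketch, not Hadoop's code: the class and method names are made up, and it assumes that a file status with no owner filled in reports an empty string, which is consistent with the "It is owned by ." text in the log. In Hadoop 1.x the check appears to run only when the staging directory already exists, which would explain why the first job after removing /mnt/glusterfs/tmp succeeds.

```java
import java.io.IOException;

// Simplified model of the staging-directory ownership check.
// Names are illustrative only; this is not Hadoop's API.
class StagingOwnershipCheck {

    /** Throws if the reported owner is neither the submitter nor the real user. */
    static void checkOwner(String stagingDir, String reportedOwner,
                           String currentUser, String realUser) throws IOException {
        if (!(reportedOwner.equals(currentUser) || reportedOwner.equals(realUser))) {
            throw new IOException("The ownership on the staging directory " + stagingDir
                    + " is not as expected. It is owned by " + reportedOwner
                    + ". The directory must be owned by the submitter " + currentUser
                    + " or by " + realUser);
        }
    }

    public static void main(String[] args) {
        String dir = "glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging";
        try {
            // Assumption: an unpopulated owner is reported as the empty string.
            checkOwner(dir, "", "root", "root");
            System.out.println("ownership accepted");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With an empty owner the comparison can never match the submitter, so the job is rejected no matter who actually owns the directory on disk.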
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
The console output below confirms that, by swapping in

public String getOwner(){ return "hdfs"; }

in the getFileStatus(Path p) of the GlusterFileSystem class, we can avoid the error entirely. This is not a solution per se, because the real solution is to correctly write/read FileSystem ownership metadata in the plugin. The next step will be to actually fix the way the plugin reads fs privileges.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 00:41:19 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 00:41:19 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 00:41:19 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 00:41:20 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 00:41:20 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
Initializing GlusterFS
13/02/21 00:41:20 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 00:41:20 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 00:41:20 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 00:41:20 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 00:41:21 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
13/02/21 00:41:21 INFO mapred.FileInputFormat: Total input paths to process : 10
13/02/21 00:41:21 INFO mapred.JobClient: Running job: job_201302210041_0001
13/02/21 00:41:22 INFO mapred.JobClient: map 0% reduce 0%
13/02/21 00:41:35 INFO mapred.JobClient: map 10% reduce 0%
13/02/21 00:41:36 INFO mapred.JobClient: map 20% reduce 0%
^C[root@rhs-1 hadoop]#
[root@rhs-1 hadoop]# mv /tmp/0AglusterfsBZ908890.jar /usr/lib/hadoop/lib/
mv: cannot stat `/tmp/0AglusterfsBZ908890.jar': No such file or directory
[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 00:42:07 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 00:42:07 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 00:42:07 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 00:42:07 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 00:42:07 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
Initializing GlusterFS
13/02/21 00:42:08 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 00:42:08 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 00:42:08 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 00:42:08 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 00:42:08 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
13/02/21 00:42:08 INFO mapred.FileInputFormat: Total input paths to process : 10
13/02/21 00:42:09 INFO mapred.JobClient: Running job: job_201302210041_0002
13/02/21 00:42:10 INFO mapred.JobClient: map 0% reduce 0%
^C[root@rhs-1 hadoop]# mv /usr/lib/hadoop/lib/0AglusterfsBZ908890.jar /tmp/
[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 00:43:21 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 00:43:21 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 00:43:21 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 00:43:21 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 00:43:21 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
Initializing GlusterFS
13/02/21 00:43:21 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 00:43:22 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 00:43:22 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 00:43:22 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 00:43:22 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: The ownership on the staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by . The directory must be owned by the submitter root or by root
java.io.IOException: The ownership on the staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by . The directory must be owned by the submitter root or by root
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:900)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:894)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1113)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:894)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:868)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
        at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:257)
        at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:237)
        at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:457)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:317)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:83)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
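As a rough illustration of what "correctly reading ownership metadata" might involve, the sketch below reads the owner, group, and permissions of a path through the local (FUSE-mounted) file system using only the JDK. It is not the fix that went into the plugin: the MountOwnerProbe name is hypothetical, and it assumes Java 7 or later and a POSIX file system. A plugin could use values like these to populate the owner and permission fields of the status it returns instead of leaving them empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical probe: reads owner, group and permissions for a local path.
class MountOwnerProbe {

    /** Returns "owner:group rwxrwxrwx" for the given path (POSIX file systems only). */
    static String describe(Path p) throws IOException {
        PosixFileAttributes attrs =
                Files.readAttributes(p, PosixFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        return attrs.owner().getName() + ":" + attrs.group().getName()
                + " " + PosixFilePermissions.toString(attrs.permissions());
    }

    public static void main(String[] args) throws IOException {
        // Pass a path under the mount, e.g. /mnt/glusterfs/tmp; defaults to the JVM temp dir.
        Path p = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println(describe(p));
    }
}
```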
Confirmed that the new version works; now assigning the ticket to brad for review.

To test, do: git checkout BZ908898

[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 20:50:54 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 20:50:54 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 20:50:54 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 20:50:55 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 20:50:55 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
Initializing GlusterFS
13/02/21 20:50:55 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 20:50:55 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 20:50:55 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 20:50:55 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 20:50:56 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
13/02/21 20:50:56 INFO mapred.FileInputFormat: Total input paths to process : 10
13/02/21 20:50:56 INFO mapred.JobClient: Running job: job_201302211942_0003
13/02/21 20:50:57 INFO mapred.JobClient: map 0% reduce 0%
13/02/21 20:51:09 INFO mapred.JobClient: map 20% reduce 0%
^C[root@rhs-1 hadoop]#
[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 20:51:13 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 20:51:13 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 20:51:13 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 20:51:15 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 20:51:16 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
Initializing GlusterFS
13/02/21 20:51:17 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 20:51:17 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 20:51:23 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 20:51:23 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
13/02/21 20:51:24 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
13/02/21 20:51:24 INFO mapred.FileInputFormat: Total input paths to process : 10
13/02/21 20:51:24 INFO mapred.JobClient: Running job: job_201302211942_0004
13/02/21 20:51:25 INFO mapred.JobClient: map 0% reduce 0%
13/02/21 20:51:43 INFO mapred.JobClient: map 20% reduce 0%
Reassigned to jay, as per our new ticketing protocol (only one owner, even during review); brad is now to review.
Fixed in today's merge to head, commit "f7162f1d31357cac8c2ed4577dcb3bc70e01df2e". The test that verifies the fix is "public void test0aPermissions()".
*** Bug 927410 has been marked as a duplicate of this bug. ***
*** Bug 927396 has been marked as a duplicate of this bug. ***
*** Bug 909453 has been marked as a duplicate of this bug. ***