Bug 908898 - Between runs of hadoop jobs the user must remove /mnt/glusterfs/tmp (fs.glusterfs.mount=/mnt/glusterfs in core-site.xml), otherwise the hadoop job fails with an "ERROR security.UserGroupInformation" error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-hadoop
Version: mainline
Hardware: All
OS: Linux
high
low
Target Milestone: ---
Assignee: Jay Vyas
QA Contact: Martin Kudlej
URL:
Whiteboard:
Duplicates: 909453 927396 927410
Depends On:
Blocks: 1057253
Reported: 2013-02-07 19:57 UTC by Diane Feddema
Modified: 2014-03-03 16:32 UTC
CC List: 10 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-03-03 16:32:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Diane Feddema 2013-02-07 19:57:13 UTC
Description of problem:
Between runs of Hadoop jobs, the /mnt/glusterfs/tmp directory must be removed, or the next Hadoop job fails with an
"ERROR security.UserGroupInformation" error (where fs.glusterfs.mount=/mnt/glusterfs).

Version-Release number of selected component (if applicable):


How reproducible: very


Steps to Reproduce:
1. Install Hadoop with RHS 2.0 / RHEL 6.2.
2. Run any Hadoop job, for example: bin/hadoop jar hadoop-examples.jar teragen 10000 /in-dir
3. Run another Hadoop job: bin/hadoop jar hadoop-examples.jar terasort /in-dir /out-dir

The terasort job (step 3) fails because /mnt/glusterfs/tmp was not removed after teragen ran.
Note: /mnt/glusterfs is the value of the property fs.glusterfs.mount in the configuration file core-site.xml.

Actual results:
13/02/07 14:40:40 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: The ownership on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by . The directory must be owned by the submitter root or by root
java.io.IOException: The ownership on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by . The directory must be owned by the submitter root or by root
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:900)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:894)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1113)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:894)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:868)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
        at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:248)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:257)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
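The failing rule can be modelled in a few lines. This is a simplified, hypothetical model reconstructed from the error text above. It is not Hadoop's JobSubmissionFiles source, and the real check in the Hadoop build used here may differ. It assumes the ownership check only runs when the staging directory already exists, which would explain why deleting /mnt/glusterfs/tmp between runs avoids the failure. Because the plugin reports an empty owner, no submitter can ever match.

```java
import java.io.IOException;

// Hypothetical, simplified model of the staging-directory ownership rule,
// reconstructed from the error message; not Hadoop's actual source.
class StagingOwnershipCheck {

    /**
     * Throws if an existing staging directory is not owned by the submitter
     * (currentUser) or the login user (realUser). A missing directory is
     * assumed to be created fresh, so no ownership comparison happens.
     */
    static void checkStagingDir(String stagingDir, boolean exists, String owner,
                                String currentUser, String realUser) throws IOException {
        if (!exists) {
            return; // first run: directory gets created, nothing to compare
        }
        // An empty owner string, as reported by the plugin here, never matches.
        if (!(owner.equals(currentUser) || owner.equals(realUser))) {
            throw new IOException("The ownership on the staging directory " + stagingDir
                    + " is not as expected. It is owned by " + owner
                    + ". The directory must be owned by the submitter "
                    + currentUser + " or by " + realUser);
        }
    }

    public static void main(String[] args) {
        String dir = "glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging";
        String[] runs = {"teragen (directory absent)", "terasort (directory present)"};
        boolean[] exists = {false, true};
        for (int i = 0; i < runs.length; i++) {
            try {
                // Owner "" mimics the empty owner reported by the plugin.
                checkStagingDir(dir, exists[i], "", "root", "root");
                System.out.println(runs[i] + ": accepted");
            } catch (IOException e) {
                System.out.println(runs[i] + ": " + e.getMessage());
            }
        }
    }
}
```

Under these assumptions the first job passes and the second is rejected with the same message as the trace, matching the reproduction steps.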


Expected results:

After a user runs a Hadoop job, they should not be required to remove the directory /mnt/glusterfs/tmp (/mnt/glusterfs is an example; the actual directory is set in the Hadoop configuration file core-site.xml via the property fs.glusterfs.mount).

Desired behavior: after each Hadoop run, the Hadoop plugin would copy /mnt/glusterfs/tmp to a new directory such as /mnt/glusterfs/tmp.2, or to some other versioned /mnt/glusterfs/tmp directory. A system limit on the maximum space allowed for the versioned tmp directories would need to be established. When that limit is reached for the /mnt/glusterfs/tmp.<version> directories, the oldest tmp directories would be removed first until usage is back under the limit.
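The rotation proposed above could look roughly like the following. This is a minimal sketch of the reporter's suggestion, not the fix discussed later in this bug, which targets the ownership metadata instead. Assumptions: the mount root (e.g. the fs.glusterfs.mount value) is passed in; versions are named tmp.1, tmp.2, ..., where the highest number is the newest; and the class and method names (TmpRotator, rotate, maxBytes) are hypothetical. It moves tmp rather than copying it, so the next job starts without a stale staging directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical sketch: move <root>/tmp to <root>/tmp.N after a run, then
// delete the oldest tmp.N directories until their total size <= maxBytes.
class TmpRotator {

    /** Total size in bytes of the regular files under dir. */
    static long sizeOf(Path dir) throws IOException {
        try (Stream<Path> s = Files.walk(dir)) {
            return s.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }).sum();
        }
    }

    /** Recursively deletes dir, children before parents. */
    static void deleteTree(Path dir) throws IOException {
        List<Path> paths = new ArrayList<>();
        try (Stream<Path> s = Files.walk(dir)) {
            s.forEach(paths::add); // pre-order: parents come first
        }
        Collections.reverse(paths);
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Existing tmp.N directories keyed by N (lowest N = oldest). */
    static TreeMap<Integer, Path> versions(Path root) throws IOException {
        TreeMap<Integer, Path> out = new TreeMap<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(root, "tmp.*")) {
            for (Path p : ds) {
                String suffix = p.getFileName().toString().substring("tmp.".length());
                if (suffix.matches("\\d{1,9}") && Files.isDirectory(p)) {
                    out.put(Integer.parseInt(suffix), p);
                }
            }
        }
        return out;
    }

    /** Moves root/tmp (if present) to the next tmp.N, then enforces maxBytes. */
    static Path rotate(Path root, long maxBytes) throws IOException {
        TreeMap<Integer, Path> vs = versions(root);
        Path tmp = root.resolve("tmp");
        Path moved = null;
        if (Files.isDirectory(tmp)) {
            int next = vs.isEmpty() ? 1 : vs.lastKey() + 1;
            moved = Files.move(tmp, root.resolve("tmp." + next));
            vs.put(next, moved);
        }
        long total = 0;
        for (Path p : vs.values()) {
            total += sizeOf(p);
        }
        // Remove the oldest versions first until back under the limit.
        while (total > maxBytes && !vs.isEmpty()) {
            Map.Entry<Integer, Path> oldest = vs.pollFirstEntry();
            total -= sizeOf(oldest.getValue());
            deleteTree(oldest.getValue());
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults: mount root and a 10 GiB cap.
        Path root = Paths.get(args.length > 0 ? args[0] : "/mnt/glusterfs");
        long maxBytes = args.length > 1 ? Long.parseLong(args[1]) : 10L * 1024 * 1024 * 1024;
        System.out.println("rotated to: " + rotate(root, maxBytes));
    }
}
```

Following the proposal literally, pruning can also remove the version just created if it alone exceeds the limit. In that case the returned path no longer exists.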

Additional info:

Comment 4 Scott Haines 2013-02-13 22:42:57 UTC
Per Feb-13 bug triage meeting, reassigning to swatt.

Comment 5 Jay Vyas 2013-02-20 17:50:44 UTC
I'm now checking whether this is related to privileges in the FileSystem: line 290 of https://github.com/gluster/hadoop-glusterfs/blob/master/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java appears not to use the permissions.

At first glance it looks similar to this S3 filesystem bug:

https://issues.apache.org/jira/browse/HADOOP-8984

Comment 6 Jay Vyas 2013-02-20 20:19:07 UTC
Possible cause: FileStatus.getOwner() is not overridden properly. Looking into this more now; it appears to be due to missing FileStatus properties.
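If the cause is missing FileStatus properties, one way to fill them in is to read the POSIX owner, group and permission bits of the corresponding file on the local FUSE mount. The sketch below assumes that a glusterfs:// path maps onto the directory named by fs.glusterfs.mount. The class and method names (FuseOwnerLookup, lookup) are hypothetical. It uses only java.nio and omits the Hadoop FileStatus constructor that these values would feed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical helper: reads ownership metadata for a path on the local
// FUSE mount (assumed to be the fs.glusterfs.mount directory).
class FuseOwnerLookup {

    /** Owner, group and rwx string: the values a FileStatus would need. */
    static final class Ownership {
        final String owner;
        final String group;
        final String permission; // e.g. "rwx------"

        Ownership(String owner, String group, String permission) {
            this.owner = owner;
            this.group = group;
            this.permission = permission;
        }

        @Override
        public String toString() {
            return owner + ":" + group + " " + permission;
        }
    }

    /** Maps a path such as "/tmp/hadoop-root" onto the mount and reads its attributes. */
    static Ownership lookup(String mount, String fsPath) throws IOException {
        // Strip leading slashes so resolve() appends to the mount instead of replacing it.
        Path local = Paths.get(mount).resolve(fsPath.replaceFirst("^/+", ""));
        PosixFileAttributes attrs = Files.readAttributes(
                local, PosixFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        return new Ownership(
                attrs.owner().getName(),
                attrs.group().getName(),
                PosixFilePermissions.toString(attrs.permissions()));
    }

    public static void main(String[] args) throws IOException {
        String mount = args.length > 0 ? args[0] : "/";
        String path = args.length > 1 ? args[1] : "/tmp";
        System.out.println(lookup(mount, path));
    }
}
```

With a real owner name returned instead of an empty string, an ownership check like the one in the stack trace has something to compare against.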

Comment 7 Jay Vyas 2013-02-21 00:55:55 UTC
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
The stack traces below confirm that swapping in

public String getOwner(){ return "hdfs"; }

in getFileStatus(Path p) of the GlusterFileSystem class avoids the error entirely. This is not a solution per se, because the real solution is to correctly
write/read FileSystem ownership metadata in the plugin.

The next step will be to actually fix the way the plugin reads fs privileges.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 


[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 00:41:19 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 00:41:19 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 00:41:19 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 00:41:20 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 00:41:20 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
Initializing GlusterFS
13/02/21 00:41:20 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 00:41:20 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 00:41:20 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 00:41:20 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 00:41:21 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
13/02/21 00:41:21 INFO mapred.FileInputFormat: Total input paths to process : 10
13/02/21 00:41:21 INFO mapred.JobClient: Running job: job_201302210041_0001
13/02/21 00:41:22 INFO mapred.JobClient:  map 0% reduce 0%
13/02/21 00:41:35 INFO mapred.JobClient:  map 10% reduce 0%
13/02/21 00:41:36 INFO mapred.JobClient:  map 20% reduce 0%
^C[root@rhs-1 hadoop]# 
[root@rhs-1 hadoop]# mv /tmp/0AglusterfsBZ908890.jar /usr/lib/hadoop/lib/
mv: cannot stat `/tmp/0AglusterfsBZ908890.jar': No such file or directory
[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 00:42:07 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 00:42:07 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 00:42:07 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 00:42:07 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 00:42:07 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
Initializing GlusterFS
13/02/21 00:42:08 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 00:42:08 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 00:42:08 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 00:42:08 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 00:42:08 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
13/02/21 00:42:08 INFO mapred.FileInputFormat: Total input paths to process : 10
13/02/21 00:42:09 INFO mapred.JobClient: Running job: job_201302210041_0002
13/02/21 00:42:10 INFO mapred.JobClient:  map 0% reduce 0%
^C[root@rhs-1 hadoop]mv /usr/lib/hadoop/lib/0AglusterfsBZ908890.jar /tmp/
[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 00:43:21 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 00:43:21 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 00:43:21 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 00:43:21 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 00:43:21 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
Initializing GlusterFS
13/02/21 00:43:21 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 00:43:22 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 00:43:22 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 00:43:22 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 00:43:22 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: The ownership on the staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by . The directory must be owned by the submitter root or by root
java.io.IOException: The ownership on the staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by . The directory must be owned by the submitter root or by root
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:900)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:894)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1113)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:894)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:868)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
	at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:257)
	at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:237)
	at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:457)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:317)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:83)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

Comment 8 Jay Vyas 2013-02-21 22:01:12 UTC
Confirmed that the new version works; now assigning the ticket to Brad for review.


To try the fix:

git checkout BZ908898


[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 20:50:54 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 20:50:54 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 20:50:54 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 20:50:55 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 20:50:55 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
Initializing GlusterFS
13/02/21 20:50:55 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 20:50:55 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 20:50:55 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 20:50:55 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 20:50:56 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
13/02/21 20:50:56 INFO mapred.FileInputFormat: Total input paths to process : 10
13/02/21 20:50:56 INFO mapred.JobClient: Running job: job_201302211942_0003
13/02/21 20:50:57 INFO mapred.JobClient:  map 0% reduce 0%
13/02/21 20:51:09 INFO mapred.JobClient:  map 20% reduce 0%
^C[root@rhs-1 hadoop]# 
[root@rhs-1 hadoop]# bin/hadoop jar hadoop-test-1.0.3-Intel.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/02/21 20:51:13 INFO fs.TestDFSIO: nrFiles = 10
13/02/21 20:51:13 INFO fs.TestDFSIO: fileSize (MB) = 1
13/02/21 20:51:13 INFO fs.TestDFSIO: bufferSize = 1000000
13/02/21 20:51:15 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 20:51:16 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
Initializing GlusterFS
13/02/21 20:51:17 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/02/21 20:51:17 INFO fs.TestDFSIO: created control files for: 10 files
13/02/21 20:51:23 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 20:51:23 WARN conf.Configuration: mapred-site.xml:a attempt to override final parameter: mapred.local.dir;  Ignoring.
13/02/21 20:51:24 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory glusterfs://rhs-1:9000/tmp/hadoop-root/mapred/staging/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
13/02/21 20:51:24 INFO mapred.FileInputFormat: Total input paths to process : 10
13/02/21 20:51:24 INFO mapred.JobClient: Running job: job_201302211942_0004
13/02/21 20:51:25 INFO mapred.JobClient:  map 0% reduce 0%
13/02/21 20:51:43 INFO mapred.JobClient:  map 20% reduce 0%

Comment 9 Jay Vyas 2013-02-22 21:57:13 UTC
Reassigned to Jay, per our new ticketing protocol (only one owner, even during review), but Brad is now reviewing.

Comment 10 Jay Vyas 2013-03-25 21:45:30 UTC
Fixed in today's merge to head, commit "f7162f1d31357cac8c2ed4577dcb3bc70e01df2e".

The test that verifies the fix is "public void test0aPermissions()".

Comment 11 Jay Vyas 2013-03-25 21:48:09 UTC
*** Bug 927410 has been marked as a duplicate of this bug. ***

Comment 12 Jay Vyas 2013-03-25 22:03:14 UTC
*** Bug 927396 has been marked as a duplicate of this bug. ***

Comment 13 Jay Vyas 2013-03-31 16:12:15 UTC
*** Bug 909453 has been marked as a duplicate of this bug. ***

