Bug 951305 - Hadoop benchmark TestDFSIO fails with ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: The ownership/permissions on the staging directory glusterfs://.../.staging is not as expected.
Summary: Hadoop benchmark TestDFSIO fails with ERROR security.UserGroupInformation: Pr...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-hadoop
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Bradley Childs
QA Contact: Martin Kudlej
URL:
Whiteboard:
Depends On:
Blocks: 1057253
 
Reported: 2013-04-12 03:33 UTC by Diane Feddema
Modified: 2014-03-03 16:31 UTC
CC List: 7 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-03-03 16:31:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Diane Feddema 2013-04-12 03:33:51 UTC
Description of problem:
Hadoop benchmark TestDFSIO fails because the staging directory ./tmp/hadoop-root/mapred/staging/root/.staging created by Hadoop has the wrong permissions.

Version-Release number of selected component (if applicable):
RHEL 6.2 / RHS 2.0.4 / Apache Hadoop 1.0.4

How reproducible: 100%


Steps to Reproduce:
1. bin/hadoop jar hadoop-test-1.1.2.23.jar TestDFSIO -write -nrFiles 10 -fileSize 1
  
Actual results:
[root@gprfs001 hadoop-1.1.2.23]# bin/hadoop jar hadoop-test-1.1.2.23.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/04/12 02:08:51 INFO fs.TestDFSIO: nrFiles = 10
13/04/12 02:08:51 INFO fs.TestDFSIO: fileSize (MB) = 1
13/04/12 02:08:51 INFO fs.TestDFSIO: bufferSize = 1000000
13/04/12 02:08:51 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS
13/04/12 02:08:51 INFO glusterfs.GlusterFileSystem: mount -t glusterfs gprfs001:/HadoopVol /mnt/glusterfs
13/04/12 02:08:51 INFO glusterfs.GlusterFileSystem: mount -t glusterfs gprfs001:/HadoopVol /mnt/glusterfs
13/04/12 02:08:51 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/04/12 02:08:51 INFO fs.TestDFSIO: created control files for: 10 files
13/04/12 02:08:52 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: The ownership/permissions on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxr-xr-x. The directory must be owned by the submitter root or by root and permissions must be rwx------
java.io.IOException: The ownership/permissions on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxr-xr-x. The directory must be owned by the submitter root or by root and permissions must be rwx------
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:108)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
	at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:257)
	at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:237)
	at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:457)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:317)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


Expected results:
[root@gprfs001 hadoop-1.1.2.23]# bin/hadoop jar hadoop-test-1.1.2.23.jar TestDFSIO -write -nrFiles 10 -fileSize 1
TestDFSIO.0.0.4
13/04/12 02:21:57 INFO fs.TestDFSIO: nrFiles = 10
13/04/12 02:21:57 INFO fs.TestDFSIO: fileSize (MB) = 1
13/04/12 02:21:57 INFO fs.TestDFSIO: bufferSize = 1000000
13/04/12 02:21:57 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS
13/04/12 02:21:57 INFO glusterfs.GlusterFileSystem: mount -t glusterfs gprfs001:/HadoopVol /mnt/glusterfs
13/04/12 02:21:57 INFO glusterfs.GlusterFileSystem: mount -t glusterfs gprfs001:/HadoopVol /mnt/glusterfs
13/04/12 02:21:57 INFO fs.TestDFSIO: creating control file: 1 mega bytes, 10 files
13/04/12 02:21:58 INFO fs.TestDFSIO: created control files for: 10 files
13/04/12 02:21:59 INFO mapred.FileInputFormat: Total input paths to process : 10
13/04/12 02:21:59 INFO mapred.JobClient: Running job: job_201304112246_0002
13/04/12 02:22:00 INFO mapred.JobClient:  map 0% reduce 0%
13/04/12 02:22:06 INFO mapred.JobClient:  map 30% reduce 0%
13/04/12 02:22:07 INFO mapred.JobClient:  map 100% reduce 0%
13/04/12 02:22:13 INFO mapred.JobClient:  map 100% reduce 33%
13/04/12 02:22:16 INFO mapred.JobClient:  map 100% reduce 100%
13/04/12 02:22:17 INFO mapred.JobClient: Job complete: job_201304112246_0002
13/04/12 02:22:17 INFO mapred.JobClient: Counters: 29
13/04/12 02:22:17 INFO mapred.JobClient:   Job Counters 
13/04/12 02:22:17 INFO mapred.JobClient:     Launched reduce tasks=1
13/04/12 02:22:17 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=23455
13/04/12 02:22:17 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/12 02:22:17 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/12 02:22:17 INFO mapred.JobClient:     Rack-local map tasks=4
13/04/12 02:22:17 INFO mapred.JobClient:     Launched map tasks=10
13/04/12 02:22:17 INFO mapred.JobClient:     Data-local map tasks=6
13/04/12 02:22:17 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9231
13/04/12 02:22:17 INFO mapred.JobClient:   File Input Format Counters 
13/04/12 02:22:17 INFO mapred.JobClient:     Bytes Read=0
13/04/12 02:22:17 INFO mapred.JobClient:   File Output Format Counters 
13/04/12 02:22:17 INFO mapred.JobClient:     Bytes Written=0
13/04/12 02:22:17 INFO mapred.JobClient:   FileSystemCounters
13/04/12 02:22:17 INFO mapred.JobClient:     FILE_BYTES_READ=812
13/04/12 02:22:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=556986
13/04/12 02:22:17 INFO mapred.JobClient:   Map-Reduce Framework
13/04/12 02:22:17 INFO mapred.JobClient:     Map output materialized bytes=866
13/04/12 02:22:17 INFO mapred.JobClient:     Map input records=10
13/04/12 02:22:17 INFO mapred.JobClient:     Reduce shuffle bytes=866
13/04/12 02:22:17 INFO mapred.JobClient:     Spilled Records=100
13/04/12 02:22:17 INFO mapred.JobClient:     Map output bytes=706
13/04/12 02:22:17 INFO mapred.JobClient:     Total committed heap usage (bytes)=4421976064
13/04/12 02:22:17 INFO mapred.JobClient:     CPU time spent (ms)=6210
13/04/12 02:22:17 INFO mapred.JobClient:     Map input bytes=260
13/04/12 02:22:17 INFO mapred.JobClient:     SPLIT_RAW_BYTES=-4550
13/04/12 02:22:17 INFO mapred.JobClient:     Combine input records=0
13/04/12 02:22:17 INFO mapred.JobClient:     Reduce input records=50
13/04/12 02:22:17 INFO mapred.JobClient:     Reduce input groups=5
13/04/12 02:22:17 INFO mapred.JobClient:     Combine output records=0
13/04/12 02:22:17 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2585493504
13/04/12 02:22:17 INFO mapred.JobClient:     Reduce output records=5
13/04/12 02:22:17 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=10631364608
13/04/12 02:22:17 INFO mapred.JobClient:     Map output records=50
13/04/12 02:22:17 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
13/04/12 02:22:17 INFO fs.TestDFSIO:            Date & time: Fri Apr 12 02:22:17 UTC 2013
13/04/12 02:22:17 INFO fs.TestDFSIO:        Number of files: 10
13/04/12 02:22:17 INFO fs.TestDFSIO: Total MBytes processed: 10
13/04/12 02:22:17 INFO fs.TestDFSIO:      Throughput mb/sec: 22.988505747126435
13/04/12 02:22:17 INFO fs.TestDFSIO: Average IO rate mb/sec: 24.021678924560547
13/04/12 02:22:17 INFO fs.TestDFSIO:  IO rate std deviation: 5.010013775633995
13/04/12 02:22:17 INFO fs.TestDFSIO:     Test exec time sec: 18.752
13/04/12 02:22:17 INFO fs.TestDFSIO: 


Additional info:

To work around this problem:

Run TestDFSIO once. After it fails with the error

13/04/12 02:08:52 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: The ownership/permissions on the staging directory glusterfs://gprfs001:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxr-xr-x. The directory must be owned by the submitter root or by root and permissions must be rwx------

run the following to change the permissions on the directory:

chmod 700 $GLUSTER_MOUNT_POINT/tmp/hadoop-root/mapred/staging/root/.staging/

then repeat the TestDFSIO run; it should now succeed.

Comment 2 Jay Vyas 2013-04-12 22:34:26 UTC

The problem is that the gluster plugin was not applying the permissions assigned through the Hadoop API on ** writes ** of directories and files.
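
As a purely illustrative sketch (NOT the actual gluster plugin code; the class name and wrapper approach are invented for illustration), the fix amounts to making sure the FsPermission that Hadoop passes on mkdirs() is actually applied to the created directory instead of being dropped:

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FilterFileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    // Hypothetical illustration only: a wrapper filesystem that applies the
    // permission Hadoop asks for, so the .staging directory does not end up
    // with the backing filesystem's default rwxr-xr-x.
    public class PermissionHonoringFileSystem extends FilterFileSystem {

      public PermissionHonoringFileSystem(FileSystem wrapped) {
        super(wrapped);
      }

      @Override
      public boolean mkdirs(Path f, FsPermission permission) throws IOException {
        // Create the directory through the underlying filesystem ...
        boolean created = fs.mkdirs(f, permission);
        if (created) {
          // ... then explicitly chmod it to the requested mode, in case the
          // underlying implementation ignored the permission argument.
          fs.setPermission(f, permission);
        }
        return created;
      }
    }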

It turns out that newer releases of Hadoop (branch-1) actually fix this for you.

By contrasting these two files, you can see that newer Hadoop (branch-1) versions defensively set the permissions correctly:

https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/mapred/org/apache/hadoop/mapreduce/JobSubmissionFiles.java

Whereas older Hadoop versions do not:

http://javasourcecode.org/html/open-source/hadoop/hadoop-0.20.203.0/org/apache/hadoop/mapreduce/JobSubmissionFiles.java.html
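
For reference, the defensive logic in branch-1 JobSubmissionFiles.getStagingDir boils down to roughly the following (a simplified paraphrase, not a verbatim copy; see the SVN link above for the real code): if the staging directory already exists with the wrong mode, it is chmod'ed back to rwx------ instead of the job submission failing.

    import java.io.IOException;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    // Simplified paraphrase of branch-1 JobSubmissionFiles.getStagingDir.
    public class StagingDirCheck {

      // rwx------ , the mode the job staging directory must have
      private static final FsPermission JOB_DIR_PERMISSION = new FsPermission((short) 0700);

      public static Path ensureStagingDir(FileSystem fs, Path stagingArea, String submitter)
          throws IOException {
        if (fs.exists(stagingArea)) {
          FileStatus status = fs.getFileStatus(stagingArea);
          if (!status.getOwner().equals(submitter)) {
            throw new IOException("The ownership on the staging directory " + stagingArea
                + " is not as expected. It is owned by " + status.getOwner());
          }
          if (!status.getPermission().equals(JOB_DIR_PERMISSION)) {
            // Older Hadoop (e.g. 0.20.203) threw an IOException here;
            // branch-1 repairs the mode instead.
            fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
          }
        } else {
          fs.mkdirs(stagingArea, JOB_DIR_PERMISSION);
        }
        return stagingArea;
      }
    }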

As mentioned above, to solve this you can either:

1) chmod the .staging directory yourself, or use umask to change the permissions the directory is created with, or
2) compile the branch-1 JobSubmissionFiles.java logic above into your existing Hadoop distro (if you have the source; not sure how Cloudera handles this).

This in-development branch is expected to remedy the issue (not yet fully tested):

https://github.com/gluster/hadoop-glusterfs/branches

Comment 3 Jay Vyas 2013-04-13 20:38:46 UTC
Adding setOwner will be necessary for maintaining group privileges; this is needed for the CLI unit tests in the ticket I'm working on separately: https://bugzilla.redhat.com/show_bug.cgi?id=949200.


    /**
     * Adapted from RawLocalFileSystem to operate on the absolute path.
     */
    @Override
    public void setOwner(Path p, String username, String groupname) throws IOException {
      if (username == null && groupname == null) {
        throw new IOException("username == null && groupname == null");
      }

      if (username == null) {
        // Only the group is changing: shell out to chgrp.
        execCommand(new File(makeAbsolute(p).toUri()), Shell.SET_GROUP_COMMAND, groupname);
      }
      else {
        // OWNER[:[GROUP]] -- shell out to chown.
        String s = username + (groupname == null ? "" : ":" + groupname);
        execCommand(new File(makeAbsolute(p).toUri()), Shell.SET_OWNER_COMMAND, s);
      }
    }
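
A hypothetical usage example of the override above (the path and group name here are examples only, not values from this bug): passing a null username exercises the SET_GROUP_COMMAND branch and changes only the group.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetOwnerExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path staging = new Path("/tmp/hadoop-root/mapred/staging/root/.staging");
        // null username => only the group is changed (chgrp), per the override above.
        fs.setOwner(staging, null, "hadoop");
      }
    }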

Comment 4 Jay Vyas 2013-04-26 18:36:03 UTC
This bug is now fixed in the pending patch:
https://github.com/gluster/hadoop-glusterfs/pull/27/commits

