Description of problem:

Various multi-user scenarios fail when running Hadoop on GlusterFS.

Reproduce:

1) Create 2 users, mapred and ambari_qa, both belonging to a hadoop group.
2) Create a 2-node gluster configuration with a brick on each machine.
3) Edit fstab to automount gluster to /mnt/glusterfs.
4) Install hadoop per RHN documentation. Configure it for GlusterFS, with the jobtracker on one node and the tasktracker on the other.
5) Manually start the job tracker and task tracker as the users listed below to recreate the scenarios. Then su to the specified map reduce user and run a sample job.

The commands are:

Start Job Tracker:
    sudo -u <USERNAME> /usr/lib/hadoop/bin/hadoop-daemon.sh --config /usr/lib/hadoop/conf start jobtracker

Start Task Tracker:
    sudo -u <USERNAME> /usr/lib/hadoop/bin/hadoop-daemon.sh --config /usr/lib/hadoop/conf start tasktracker

Run Map Reduce:
    cp /usr/lib/hadoop/LICENSE.txt /mnt/glusterfs
    /usr/lib/hadoop/bin/hadoop jar hadoop-examples.jar wordcount LICENSE.txt out

Scenarios:

Scenario 1 (control)
-----------
jobtracker (root)
tasktracker (root)
mapreduce job (root)

Result: success

Scenario 2
-----------
jobtracker (root)
tasktracker (mapred)
mapreduce job (root)

Result:

13/05/09 20:34:21 WARN mapred.JobClient: Error reading task outputhttp://gluster-1:50060/tasklog?plaintext=true&attemptid=attempt_201305092033_0001_m_000001_1&filter=stderr
13/05/09 20:34:21 INFO mapred.JobClient: Task Id : attempt_201305092033_0001_m_000001_2, Status : FAILED
Error initializing attempt_201305092033_0001_m_000001_2:
java.io.FileNotFoundException: File /mnt/glusterfs/mapred/system/job_201305092033_0001/jobToken does not exist.
    at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
    at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4450)
    at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1272)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
    at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2573)
    at java.lang.Thread.run(Thread.java:662)

Scenario 3
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (root)

Result:

13/05/09 20:40:06 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml does not exist.
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml does not exist.
    at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
    at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
    ... 12 more

Scenario 4
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (mapred)

Result: success

Scenario 5
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (ambari_qa)

Result:

13/05/09 20:43:27 ERROR security.UserGroupInformation: PriviledgedActionException as:ambari_qa cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
    at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
    at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
    ... 12 more
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
    at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
    at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
    ... 12 more
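
For anyone re-running the matrix from the description, a rough sketch of a helper that prints the commands for each user combination may save some typing. It only prints commands and runs nothing, since the jobtracker and tasktracker live on different nodes. The function names (daemon_cmd, job_cmd, print_scenario, print_matrix) are made up for illustration, the scenarios are numbered 1 to 5 in the order they are listed, and submitting the job via "sudo -u" instead of su is an assumption on my part.

```shell
#!/bin/sh
# Sketch only: prints the commands for each scenario, executes nothing.
# The path is the one given in the description.
HADOOP_HOME=/usr/lib/hadoop

# Print the command that starts one daemon (jobtracker or tasktracker) as a user.
daemon_cmd() {
    echo "sudo -u $1 $HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_HOME/conf start $2"
}

# Print the command that submits the wordcount example as a user.
# Assumption: sudo -u stands in for "su to the specified map reduce user".
job_cmd() {
    echo "sudo -u $1 $HADOOP_HOME/bin/hadoop jar hadoop-examples.jar wordcount LICENSE.txt out"
}

# Arguments: scenario number, jobtracker user, tasktracker user, job user.
print_scenario() {
    echo "Scenario $1: jobtracker ($2) tasktracker ($3) mapreduce job ($4)"
    echo "  on jobtracker node:  $(daemon_cmd "$2" jobtracker)"
    echo "  on tasktracker node: $(daemon_cmd "$3" tasktracker)"
    echo "  submit job:          $(job_cmd "$4")"
}

# The five user combinations from the description, in the order listed.
print_matrix() {
    print_scenario 1 root   root   root
    print_scenario 2 root   mapred root
    print_scenario 3 mapred mapred root
    print_scenario 4 mapred mapred mapred
    print_scenario 5 mapred mapred ambari_qa
}

print_matrix
```

The daemons from the previous scenario need to be stopped before starting the next combination; that part is left out here.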
When trying scenario 4 (running all parts under the mapred user), it doesn't work for me. I'm getting the same error you spotted in scenario 2:

java.io.FileNotFoundException: File glusterfs:/mapred/system/job_201308271948_0035/jobToken does not exist.

I used shim 2.1 with HDP 1.3.

It doesn't matter whether I use just runuser or "su - mapred" and then run the job.

I used a different hadoop job, but the problem seems to be the same:

~~~
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4
~~~
(In reply to Martin Bukatovic from comment #2)
> When trying scenario 4 (running all parts under mapred user), it doesn't
> work for me: I'm getting the same error you spotted with scenario 2:
>
> java.io.FileNotFoundException: File
> glusterfs:/mapred/system/job_201308271948_0035/jobToken does not exist.
>
> I used shim 2.1 with HDP 1.3.
>
> It doesn't matter if I use just runuser or try "su - mapred" and then run
> the job.
>
> I used different hadoop job, but the problem seems to be the same:
>
> ~~~
> /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4
> ~~~

For the record: this was my mistake. I didn't check whether the glusterfs volume was correctly mounted on all nodes in the cluster (the previous shim version didn't require that).
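
To rule out the missing-mount case before filing a similar report, something like the following could be run first. It is a minimal sketch, not part of the shim: gluster_mounted and check_nodes are hypothetical helper names, the check assumes a FUSE mount (type fuse.glusterfs in /proc/mounts) at the /mnt/glusterfs mount point from the description, and check_nodes assumes passwordless ssh to each node.

```shell
#!/bin/sh
# Mount point used in the description.
MOUNT_POINT=/mnt/glusterfs

# Succeed only if the mount point appears as a fuse.glusterfs mount in the
# given mounts table. The second argument defaults to /proc/mounts; pass "-"
# to read the table from stdin.
gluster_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" \
        '$2 == mp && $3 == "fuse.glusterfs" { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Check every node given as an argument. Assumes ssh access to each node.
# Returns non-zero if any node lacks the mount (or cannot be reached).
check_nodes() {
    status=0
    for node in "$@"; do
        if ssh "$node" cat /proc/mounts | gluster_mounted "$MOUNT_POINT" -; then
            echo "$node: $MOUNT_POINT mounted"
        else
            echo "$node: $MOUNT_POINT NOT mounted"
            status=1
        fi
    done
    return $status
}
```

Usage would be along the lines of "check_nodes gluster-1 gluster-2", where gluster-1 is the host name visible in the log above and gluster-2 is a made-up name for the second node.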
Solution/Workaround in: https://github.com/gluster/glusterfs-hadoop/pull/61