Bug 961535 - Multi user failures for hadoop map reduce jobs on GlusterFS
Summary: Multi user failures for hadoop map reduce jobs on GlusterFS
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-hadoop
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Bradley Childs
QA Contact: hcfs-gluster-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1057253
Reported: 2013-05-09 21:01 UTC by Bradley Childs
Modified: 2015-08-21 09:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-21 09:07:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Bradley Childs 2013-05-09 21:01:09 UTC
Description of problem:
Various multi user scenarios fail running Hadoop on GlusterFS.


Reproduce:
1) Create two users, mapred and ambari_qa, both belonging to a hadoop group.
2) Create a two-node Gluster configuration with a brick on each machine.
3) Edit fstab to automount Gluster at /mnt/glusterfs.
4) Install Hadoop per the RHN documentation. Configure it for GlusterFS, with the jobtracker on one node and the tasktracker on the other.
5) Manually start the jobtracker and tasktracker as the users listed below to recreate the scenarios. Then su to the specified map reduce user and run a sample job.
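
For step 3, the helper below is a minimal sketch of what the fstab entry might look like for a native (FUSE) GlusterFS mount. The function name gluster_fstab_line and the volume name HadoopVol are hypothetical and do not come from this report; the server name gluster-1 and the mount point are taken from the logs and steps in this description. The mount options are an assumption, so adjust them to the local setup.

```shell
# Print an /etc/fstab entry for a GlusterFS native (FUSE) mount.
# Usage: gluster_fstab_line SERVER VOLUME MOUNTPOINT
# The helper name and the chosen options are illustrative assumptions.
gluster_fstab_line() {
    server=$1
    volume=$2
    mount_point=$3
    # _netdev tells the init system to wait for networking before mounting.
    printf '%s:/%s %s glusterfs defaults,_netdev 0 0\n' \
        "$server" "$volume" "$mount_point"
}
```

For example, "gluster_fstab_line gluster-1 HadoopVol /mnt/glusterfs" prints a line that could be appended to /etc/fstab on each node after review.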

The commands are:
Start Job Tracker:
sudo -u <USERNAME> /usr/lib/hadoop/bin/hadoop-daemon.sh --config /usr/lib/hadoop/conf start jobtracker

Start Task Tracker:
sudo -u <USERNAME> /usr/lib/hadoop/bin/hadoop-daemon.sh --config /usr/lib/hadoop/conf start tasktracker

Run Map Reduce:
cp /usr/lib/hadoop/LICENSE.txt /mnt/glusterfs 
/usr/lib/hadoop/bin/hadoop jar hadoop-examples.jar wordcount LICENSE.txt out



Scenarios:
Scenario 1 (control)
-----------
jobtracker (root)
tasktracker (root)
mapreduce job (root)

Result:
success


Scenario 2 
-----------
jobtracker (root)
tasktracker (mapred)
mapreduce job (root)

Result:
13/05/09 20:34:21 WARN mapred.JobClient: Error reading task outputhttp://gluster-1:50060/tasklog?plaintext=true&attemptid=attempt_201305092033_0001_m_000001_1&filter=stderr
13/05/09 20:34:21 INFO mapred.JobClient: Task Id : attempt_201305092033_0001_m_000001_2, Status : FAILED
Error initializing attempt_201305092033_0001_m_000001_2:
java.io.FileNotFoundException: File /mnt/glusterfs/mapred/system/job_201305092033_0001/jobToken does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4450)
	at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1272)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
	at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2573)
	at java.lang.Thread.run(Thread.java:662)


Scenario 3
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (root)

Result:
13/05/09 20:40:06 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml does not exist.
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
	... 12 more

Scenario 4
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (mapred)

Result:
success


Scenario 5
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (ambari_qa)

Result:
13/05/09 20:43:27 ERROR security.UserGroupInformation: PriviledgedActionException as:ambari_qa cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
	... 12 more

org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
	... 12 more
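
The traces above do not establish a root cause. Since every failure is a FileNotFoundException on a path created by a different user than the one reading it, one plausible first check (an assumption, not a confirmed diagnosis) is to compare ownership and mode bits of the directories involved. The helper name show_perms is hypothetical, and the sketch assumes GNU stat.

```shell
# Print "owner:group mode path" for each existing path, and flag missing ones.
# Usage: show_perms PATH...
# Assumes GNU coreutils stat (the -c format option).
show_perms() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            # %U owner name, %G group name, %a octal mode, %n file name
            stat -c '%U:%G %a %n' "$path"
        else
            printf 'missing %s\n' "$path"
        fi
    done
}
```

Running it on each node against paths from the traces, for example "show_perms /mnt/glusterfs/mapred/system /mnt/glusterfs/user/ambari_qa/.staging", shows whether the directories exist there and who owns them.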

Comment 2 Martin Bukatovic 2013-08-28 11:49:34 UTC
When I try scenario 4 (running all parts under the mapred user), it doesn't work for me. I'm getting the same error you spotted with scenario 2:

java.io.FileNotFoundException: File glusterfs:/mapred/system/job_201308271948_0035/jobToken does not exist.

I used shim 2.1 with HDP 1.3.

It doesn't matter whether I use runuser or "su - mapred" before running the job.

I used a different Hadoop job, but the problem seems to be the same:

~~~
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4
~~~

Comment 3 Martin Bukatovic 2013-08-28 15:57:39 UTC
(In reply to Martin Bukatovic from comment #2)

For the record, this was my mistake: I didn't check whether the GlusterFS volume was correctly mounted on all nodes in the cluster (the previous shim version didn't require that).
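
The mount check described in this comment can be sketched as below. The helper name is_gluster_mounted is hypothetical. It assumes the volume is mounted with the native FUSE client, which appears in /proc/mounts with the filesystem type fuse.glusterfs; the optional second argument exists only so the check can be run against a saved copy of a mounts table.

```shell
# Succeed if a GlusterFS FUSE volume is mounted at MOUNTPOINT.
# Usage: is_gluster_mounted MOUNTPOINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts.
is_gluster_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass
    awk -v mp="$mount_point" '
        $2 == mp && $3 == "fuse.glusterfs" { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$mounts_file"
}
```

On each node, "is_gluster_mounted /mnt/glusterfs || echo not mounted" reports a missing mount before any job is submitted.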

Comment 4 Bradley Childs 2013-11-07 21:46:58 UTC
Solution/Workaround in:

https://github.com/gluster/glusterfs-hadoop/pull/61

