Bug 961535 - Multi user failures for hadoop map reduce jobs on GlusterFS
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: gluster-hadoop
Version: mainline
Hardware: Unspecified Unspecified
Priority: medium, Severity: medium
Assigned To: Bradley Childs
QA Contact: hcfs-gluster-bugs
Keywords: Triaged
Depends On:
Blocks: 1057253
Reported: 2013-05-09 17:01 EDT by Bradley Childs
Modified: 2015-08-21 05:07 EDT

Doc Type: Bug Fix
Last Closed: 2015-08-21 05:07:19 EDT
Type: Bug
Description Bradley Childs 2013-05-09 17:01:09 EDT
Description of problem:
Various multi-user scenarios fail when running Hadoop on GlusterFS.


Reproduce:
1) Create two users, mapred and ambari_qa, both belonging to a hadoop group.
2) Create a 2-node gluster configuration with a brick on each machine.
3) Edit fstab to automount gluster at /mnt/glusterfs.
4) Install Hadoop per the RHN documentation. Configure it for GlusterFS, with the jobtracker on one node and the tasktracker on the other.
5) Start the jobtracker and tasktracker by hand as the users listed below to recreate each scenario. Then su to the specified map reduce user and run a sample job.

The commands are:
Start Job Tracker:
sudo -u <USERNAME> /usr/lib/hadoop/bin/hadoop-daemon.sh --config /usr/lib/hadoop/conf start jobtracker

Start Task Tracker:
sudo -u <USERNAME> /usr/lib/hadoop/bin/hadoop-daemon.sh --config /usr/lib/hadoop/conf start tasktracker

Run Map Reduce:
cp /usr/lib/hadoop/LICENSE.txt /mnt/glusterfs 
/usr/lib/hadoop/bin/hadoop jar hadoop-examples.jar wordcount LICENSE.txt out
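
For anyone re-running the matrix, the start commands above can be generated per user rather than typed by hand. This is only a sketch: the helper name daemon_command and the SCENARIOS list are made up for illustration (the user combinations are the ones listed in the scenarios of this report), and nothing here is executed, the commands are only printed.

```python
# Hypothetical helper, not part of Hadoop or the gluster shim.
# Paths are copied from the commands in this report.
HADOOP_HOME = "/usr/lib/hadoop"

# (jobtracker user, tasktracker user, job user), one tuple per scenario.
SCENARIOS = [
    ("root", "root", "root"),
    ("root", "mapred", "root"),
    ("mapred", "mapred", "root"),
    ("mapred", "mapred", "mapred"),
    ("mapred", "mapred", "ambari_qa"),
]


def daemon_command(user, daemon):
    """Return the argv list that starts `daemon` as `user` via sudo."""
    if daemon not in ("jobtracker", "tasktracker"):
        raise ValueError(f"unexpected daemon: {daemon}")
    return [
        "sudo", "-u", user,
        f"{HADOOP_HOME}/bin/hadoop-daemon.sh",
        "--config", f"{HADOOP_HOME}/conf",
        "start", daemon,
    ]


if __name__ == "__main__":
    # Print the commands for each scenario; run them manually on the right node.
    for jt_user, tt_user, job_user in SCENARIOS:
        print(" ".join(daemon_command(jt_user, "jobtracker")))
        print(" ".join(daemon_command(tt_user, "tasktracker")))
        print(f"# then run the sample job as {job_user}")
```

Returning an argv list instead of a shell string keeps the user name from being re-parsed by a shell if the list is later passed to something like subprocess.run.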



Scenarios:
Scenario 1 (control)
-----------
jobtracker (root)
tasktracker (root)
mapreduce job (root)

Result:
success


Scenario 2 
-----------
jobtracker (root)
tasktracker (mapred)
mapreduce job (root)

Result:
13/05/09 20:34:21 WARN mapred.JobClient: Error reading task outputhttp://gluster-1:50060/tasklog?plaintext=true&attemptid=attempt_201305092033_0001_m_000001_1&filter=stderr
13/05/09 20:34:21 INFO mapred.JobClient: Task Id : attempt_201305092033_0001_m_000001_2, Status : FAILED
Error initializing attempt_201305092033_0001_m_000001_2:
java.io.FileNotFoundException: File /mnt/glusterfs/mapred/system/job_201305092033_0001/jobToken does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4450)
	at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1272)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
	at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2573)
	at java.lang.Thread.run(Thread.java:662)


Scenario 3
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (root)

Result:
13/05/09 20:40:06 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml does not exist.
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
	... 12 more

Scenario 4
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (mapred)

Result:
success


Scenario 5
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (ambari_qa)

Result:
13/05/09 20:43:27 ERROR security.UserGroupInformation: PriviledgedActionException as:ambari_qa cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
	... 12 more

org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
	... 12 more
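
All of the failures above report a path that "does not exist" under the gluster mount. A small sketch like the following can pull those paths out of a captured job log so they can be compared across scenarios. The function name missing_paths is hypothetical, and the pattern is based only on the message format seen in the traces in this report.

```python
import re

# Matches the "File <path> does not exist." text seen in the traces above.
MISSING_FILE = re.compile(r"FileNotFoundException: File (\S+) does not exist\.")


def missing_paths(log_text):
    """Return the distinct missing paths, in order of first appearance."""
    seen = []
    for path in MISSING_FILE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


if __name__ == "__main__":
    # Two lines taken from the scenario 3 output; both name the same file.
    sample = (
        "java.io.IOException: java.io.FileNotFoundException: File "
        "/mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml "
        "does not exist.\n"
        "Caused by: java.io.FileNotFoundException: File "
        "/mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml "
        "does not exist.\n"
    )
    print(missing_paths(sample))
```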
Comment 2 Martin Bukatovic 2013-08-28 07:49:34 EDT
When trying scenario 4 (running all parts under mapred user), it doesn't work for me: I'm getting the same error you spotted with scenario 2:

java.io.FileNotFoundException: File glusterfs:/mapred/system/job_201308271948_0035/jobToken does not exist.

I used shim 2.1 with HDP 1.3.

It doesn't matter if I use just runuser or try "su - mapred" and then run the job.

I used a different Hadoop job, but the problem seems to be the same:

~~~
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4
~~~
Comment 3 Martin Bukatovic 2013-08-28 11:57:39 EDT
(In reply to Martin Bukatovic from comment #2)
> When trying scenario 4 (running all parts under mapred user), it doesn't
> work for me: I'm getting the same error you spotted with scenario 2:
> 
> java.io.FileNotFoundException: File
> glusterfs:/mapred/system/job_201308271948_0035/jobToken does not exist.
> 
> I used shim 2.1 with HDP 1.3.
> 
> It doesn't matter if I use just runuser or try "su - mapred" and then run
> the job.
> 
> I used different hadoop job, but the problem seems to be the same:
> 
> ~~~
> /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4
> ~~~

For the record: this was my mistake. I didn't check whether the glusterfs volume was correctly mounted on all nodes in the cluster (the previous shim version didn't require that).
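
A quick way to rule out the unmounted-volume case on each node is to look for the mount point in /proc/mounts. The sketch below assumes a Linux node and a volume mounted with the native FUSE client, which reports the filesystem type fuse.glusterfs; the function name glusterfs_mounted and the volume name HadoopVol in the sample are hypothetical.

```python
def glusterfs_mounted(mounts_text, mount_point="/mnt/glusterfs"):
    """Return True if mount_point is listed as a GlusterFS FUSE mount.

    mounts_text is the content of /proc/mounts, where each line has the form
    "device mountpoint fstype options dump pass".
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            # Assumption: the native client shows up as "fuse.glusterfs".
            if fields[2] == "fuse.glusterfs":
                return True
    return False


if __name__ == "__main__":
    # Sample line with a made-up volume name; on a real node, pass the
    # contents of /proc/mounts instead.
    sample = (
        "gluster-1:/HadoopVol /mnt/glusterfs fuse.glusterfs "
        "rw,relatime,user_id=0,group_id=0 0 0\n"
    )
    print(glusterfs_mounted(sample))
```

Running the check on every node before starting the jobtracker and tasktracker would catch the situation described in this comment.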
Comment 4 Bradley Childs 2013-11-07 16:46:58 EST
Solution/Workaround in:

https://github.com/gluster/glusterfs-hadoop/pull/61
