Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 961535

Summary:	Multi user failures for hadoop map reduce jobs on GlusterFS
Product:	[Community] GlusterFS	Reporter:	Bradley Childs <bchilds>
Component:	gluster-hadoop	Assignee:	Bradley Childs <bchilds>
Status:	CLOSED CURRENTRELEASE	QA Contact:	hcfs-gluster-bugs
Severity:	medium	Docs Contact:
Priority:	medium
Version:	mainline	CC:	aavati, bchilds, bugs, eboyd, matt, mbukatov, mkudlej, rhs-bugs, vbellur
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-08-21 09:07:19 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1057253

Description Bradley Childs 2013-05-09 21:01:09 UTC

Description of problem:
Various multi-user scenarios fail when running Hadoop on GlusterFS.


Reproduce:
1) Create two users, mapred and ambari_qa, both belonging to the hadoop group.
2) Create a two-node GlusterFS volume with one brick on each machine.
3) Edit /etc/fstab to automount the GlusterFS volume at /mnt/glusterfs.
4) Install Hadoop per the RHN documentation. Configure it for GlusterFS, with the JobTracker on one node and the TaskTracker on the other.
5) Start the JobTracker and TaskTracker by hand as the users listed in each scenario below, then su to the specified MapReduce user and run a sample job.
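
Steps 1 to 3 can be scripted roughly as follows. This is a sketch, not part of the original report: the volume name gv0, the second hostname gluster-2 and the brick path /bricks/brick1 are hypothetical (only gluster-1 appears in the logs), and the fstab entry assumes the native GlusterFS client. With DRY_RUN=1 the functions print the commands instead of running them.

~~~shell
# Sketch of reproduction steps 1-3 (run as root).
# Hypothetical values, not from the bug report: volume "gv0",
# second host "gluster-2", brick path "/bricks/brick1".
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1 (on every node): mapred and ambari_qa, both in the hadoop group.
setup_users() {
  run groupadd -f hadoop
  run useradd -G hadoop mapred
  run useradd -G hadoop ambari_qa
}

# Step 2 (once, on gluster-1): one brick on each of the two nodes.
create_volume() {
  run gluster peer probe gluster-2
  run gluster volume create gv0 gluster-1:/bricks/brick1 gluster-2:/bricks/brick1
  run gluster volume start gv0
}

# Step 3 (on every node): print an fstab line for a native GlusterFS
# mount at /mnt/glusterfs. Args: SERVER (default gluster-1), VOLUME (default gv0).
fstab_entry() {
  echo "${1:-gluster-1}:/${2:-gv0} /mnt/glusterfs glusterfs defaults,_netdev 0 0"
}
~~~

For example, `DRY_RUN=1 setup_users` lists the user commands, and `fstab_entry >> /etc/fstab` followed by `mkdir -p /mnt/glusterfs && mount /mnt/glusterfs` applies step 3 on a node.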

The commands are:
Start Job Tracker:
sudo -u <USERNAME> /usr/lib/hadoop/bin/hadoop-daemon.sh --config /usr/lib/hadoop/conf start jobtracker

Start Task Tracker:
sudo -u <USERNAME> /usr/lib/hadoop/bin/hadoop-daemon.sh --config /usr/lib/hadoop/conf start tasktracker

Run MapReduce job:
cp /usr/lib/hadoop/LICENSE.txt /mnt/glusterfs 
/usr/lib/hadoop/bin/hadoop jar hadoop-examples.jar wordcount LICENSE.txt out
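
To reproduce a given scenario, the commands above only differ in the user each one runs as. The hypothetical helper below prints (it does not run) the three commands for a (JobTracker user, TaskTracker user, job user) triple; it substitutes `sudo -u` for the `su` of step 5, which is an assumption of this sketch.

~~~shell
# Hypothetical helper: print (not run) the commands for one scenario.
# Usage: scenario_cmds JOBTRACKER_USER TASKTRACKER_USER JOB_USER
scenario_cmds() {
  local hadoop=/usr/lib/hadoop
  echo "# on the JobTracker node"
  echo "sudo -u $1 $hadoop/bin/hadoop-daemon.sh --config $hadoop/conf start jobtracker"
  echo "# on the TaskTracker node"
  echo "sudo -u $2 $hadoop/bin/hadoop-daemon.sh --config $hadoop/conf start tasktracker"
  echo "# as the MapReduce user"
  echo "sudo -u $3 $hadoop/bin/hadoop jar hadoop-examples.jar wordcount LICENSE.txt out"
}
~~~

The scenarios below correspond to `scenario_cmds root root root`, `root mapred root`, `mapred mapred root`, `mapred mapred mapred` and `mapred mapred ambari_qa`.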



Scenarios:
Scenario 1 (control)
-----------
jobtracker (root)
tasktracker (root)
mapreduce job (root)

Result:
success


Scenario 2 
-----------
jobtracker (root)
tasktracker (mapred)
mapreduce job (root)

Result:
13/05/09 20:34:21 WARN mapred.JobClient: Error reading task outputhttp://gluster-1:50060/tasklog?plaintext=true&attemptid=attempt_201305092033_0001_m_000001_1&filter=stderr
13/05/09 20:34:21 INFO mapred.JobClient: Task Id : attempt_201305092033_0001_m_000001_2, Status : FAILED
Error initializing attempt_201305092033_0001_m_000001_2:
java.io.FileNotFoundException: File /mnt/glusterfs/mapred/system/job_201305092033_0001/jobToken does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4450)
	at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1272)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
	at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2573)
	at java.lang.Thread.run(Thread.java:662)


Scenario 3
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (root)

Result:
13/05/09 20:40:06 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml does not exist.
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/root/.staging/job_201305092039_0001/job.xml does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
	... 12 more

Scenario 4
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (mapred)

Result:
success


Scenario 5
-----------
jobtracker (mapred)
tasktracker (mapred)
mapreduce job (ambari_qa)

Result:
13/05/09 20:43:27 ERROR security.UserGroupInformation: PriviledgedActionException as:ambari_qa cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3758)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3722)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1405)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1401)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1399)
Caused by: java.io.FileNotFoundException: File /mnt/glusterfs/user/ambari_qa/.staging/job_201305092042_0001/job.xml does not exist.
	at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileStatus(GlusterFileSystem.java:430)
	at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:405)
	at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3756)
	... 12 more

Comment 2 Martin Bukatovic 2013-08-28 11:49:34 UTC

When I try scenario 4 (running all parts as the mapred user), it doesn't work for me: I get the same error you saw in scenario 2:

java.io.FileNotFoundException: File glusterfs:/mapred/system/job_201308271948_0035/jobToken does not exist.

I used shim 2.1 with HDP 1.3.

It doesn't matter whether I use runuser or "su - mapred" and then run the job.

I used a different Hadoop job, but the problem seems to be the same:

~~~
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4
~~~

Comment 3 Martin Bukatovic 2013-08-28 15:57:39 UTC

(In reply to Martin Bukatovic from comment #2)

For the record, this was my mistake: I hadn't checked that the GlusterFS volume was correctly mounted on all nodes in the cluster (the previous shim version didn't require that).
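
Since the failure in comment 2 came from a volume that was not mounted on every node, a quick check on each node before starting the daemons can rule this out. A minimal sketch, assuming the native FUSE client, which the kernel lists in /proc/mounts with filesystem type fuse.glusterfs; the function name is hypothetical.

~~~shell
# Hypothetical helper: exit 0 only if MOUNTPOINT (default /mnt/glusterfs)
# is a GlusterFS FUSE mount. Reads /proc/mounts unless another file is given.
# Usage: is_glusterfs_mounted [MOUNTPOINT] [MOUNTS_FILE]
is_glusterfs_mounted() {
  # Field 2 of /proc/mounts is the mount point, field 3 the filesystem type.
  awk -v mp="${1:-/mnt/glusterfs}" \
    '$2 == mp && $3 == "fuse.glusterfs" { found = 1 } END { exit !found }' \
    "${2:-/proc/mounts}"
}
~~~

For example, run `is_glusterfs_mounted || echo "GlusterFS not mounted on $(hostname)"` on every node in the cluster.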

Comment 4 Bradley Childs 2013-11-07 21:46:58 UTC

Solution/Workaround in:

https://github.com/gluster/glusterfs-hadoop/pull/61