Bug 1002020

Summary: shim raises java.io.IOException when executing hadoop job under mapred user via su
Product: [Community] GlusterFS
Component: gluster-hadoop
Version: mainline
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: high
Reporter: Martin Bukatovic <mbukatov>
Assignee: Bradley Childs <bchilds>
QA Contact: Martin Kudlej <mkudlej>
CC: aavati, dahorak, eboyd, matt, mkudlej, rhs-bugs, shaines, vbellur
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-03-03 16:31:27 UTC
Bug Blocks: 1057253

Description Martin Bukatovic 2013-08-28 10:53:42 UTC
Description of problem
======================

When executing a hadoop job (e.g. pi from hadoop-examples) as the mapred user
(as described in BZ 970178 [1]), the shim raises java.io.IOException. Using
runuser instead seems to solve the problem.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=970178#c2

Version-Release number of selected component (if applicable)
============================================================

glusterfs hadoop shim 2.1

~~~
# ls /usr/lib/hadoop/lib/ | grep glusterfs
glusterfs-hadoop-2.1.jar
~~~

Hortonworks hadoop distribution (HDP 1.3):

~~~
# rpm -qa | grep hadoop
hadoop-pipes-1.2.0.1.3.0.0-107.el6.x86_64
hadoop-1.2.0.1.3.0.0-107.el6.x86_64
hadoop-libhdfs-1.2.0.1.3.0.0-107.el6.x86_64
hadoop-native-1.2.0.1.3.0.0-107.el6.x86_64
hadoop-lzo-0.5.0-1.x86_64
hadoop-lzo-native-0.5.0-1.x86_64
hadoop-sbin-1.2.0.1.3.0.0-107.el6.x86_64
~~~

How reproducible
================

always

Steps to Reproduce
==================

1. Configure the hadoop cluster to use GlusterFS with shim 2.1.
2. Run the pi hadoop job from the hadoop-examples jar:

~~~
# su mapred -c "/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4"
~~~

Actual results
==============

The job doesn't start running; it fails with a `java.io.IOException: Cannot get
layout` exception:

~~~
# su mapred -c "/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4"
Number of Maps  = 4
Samples per Map = 4
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystemCRC: Initializing gluster volume..
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS,  CRC disabled.
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystemCRC: Initializing gluster volume..
13/08/28 12:35:47 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
13/08/28 12:35:48 INFO mapred.FileInputFormat: Total input paths to process : 4
13/08/28 12:35:49 INFO mapred.JobClient: Cleaning up the staging area glusterfs:/user/mapred/.staging/job_201308271948_0029
13/08/28 12:35:49 ERROR security.UserGroupInformation: PriviledgedActionException as:mapred cause:java.io.IOException: Cannot get layout
java.io.IOException: Cannot get layout
        at org.apache.hadoop.fs.glusterfs.GlusterFSXattr.execGetFattr(GlusterFSXattr.java:225)
        at org.apache.hadoop.fs.glusterfs.GlusterFSXattr.getPathInfo(GlusterFSXattr.java:84)
        at org.apache.hadoop.fs.glusterfs.GlusterVolume.getFileBlockLocations(GlusterVolume.java:155)
        at org.apache.hadoop.fs.FilterFileSystem.getFileBlockLocations(FilterFileSystem.java:98)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:231)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
        at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
        at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:297)
        at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
~~~


Expected results
================

Job executes normally.

Additional info
===============

I ran it with SELinux in permissive mode, so it's not an SELinux issue.

When using runuser instead, like this:

~~~
runuser mapred /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4
~~~

the problem seems to be avoided (in most cases).

The question here is whether it's OK to just mention this in the documentation
or to fix it, to avoid breaking the workflow of our partners and customers
(there are few details on this, so it's hard to guess whether they would mind).

Since there is no good reason for the job to fail when using 'su' anyway, I
suggest trying to find out what's wrong here. On the other hand, the runuser
command should be preferred over su these days [2], so just stressing this in
the documentation may also be OK. It's hard to decide without looking into the
details.

[2] http://danwalsh.livejournal.com/55588.html
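
Since the stack trace points at GlusterFSXattr.execGetFattr, the shim apparently
shells out to getfattr, and the requiretty discussion in comment 3 suggests it
does so through sudo. An illustrative way to narrow this down (assuming
/usr/bin/getfattr is the binary the shim invokes; this helper is not part of
the shim) is to probe whether sudo works without a terminal:

```shell
# Probe whether 'sudo -n' (never prompt) succeeds for a given command;
# with requiretty in force and no tty attached, sudo fails here the
# same way it would when the shim shells out to getfattr.
probe_sudo() {
    if sudo -n "$@" >/dev/null 2>&1; then
        echo "sudo ok: $*"
    else
        echo "sudo blocked: $*"
    fi
}

probe_sudo /usr/bin/getfattr --version
```

If this prints "sudo blocked" when launched the way the job is launched
(e.g. under `su mapred -c ...`) but "sudo ok" under runuser, that would point
at requiretty rather than at the shim itself.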

Comment 3 Martin Bukatovic 2013-10-18 15:31:28 UTC
Turning off 'requiretty' seems to fix this issue.

I used the following /etc/sudoers.d/20_gluster file:

~~~
Defaults:%hadoop !requiretty
mapred ALL= NOPASSWD: /usr/bin/getfattr
~~~
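
As a side note, a syntax error in a drop-in under /etc/sudoers.d can break
sudo for everyone, so it's worth validating the file before relying on it. A
minimal check (guarded in case visudo isn't installed):

```shell
# Validate the drop-in's syntax without activating anything:
# 'visudo -c -f FILE' only parses the given file and reports errors.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f /etc/sudoers.d/20_gluster || echo "sudoers check failed"
else
    echo "visudo not installed"
fi
```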

Comment 4 Jay Vyas 2013-11-19 13:05:55 UTC
Yes, the proper solution to this is to either (1) use a precompiled xattr
program that doesn't require sudo, or (2) turn off requiretty and make sure
sudoers is working correctly.

At some point I guess a smoke test shell script would be nice to bundle with
the code, so that minor hiccups like this can be exposed directly in bash
before trying to run a mapreduce job.
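
The smoke test idea could look something like the sketch below: a hypothetical
pre-flight script (not shipped with the shim) that checks the prerequisites
the shim depends on, i.e. that getfattr exists and that sudo permits it
without a tty. The path and checks are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical pre-flight smoke test (not part of the shim): verify the
# getfattr/sudo prerequisites before submitting a mapreduce job.

GETFATTR=/usr/bin/getfattr   # path the shim is assumed to invoke

check_binary() {
    # Report whether the argument is an executable file.
    if [ -x "$1" ]; then
        echo "ok: $1 is executable"
    else
        echo "FAIL: $1 missing or not executable"
    fi
}

check_sudo_no_tty() {
    # 'sudo -n -l CMD' asks sudo, without prompting, whether CMD is
    # allowed; with requiretty set and no tty attached this fails the
    # same way the shim's getfattr call does.
    if sudo -n -l "$1" >/dev/null 2>&1; then
        echo "ok: sudo allows $1 without a tty"
    else
        echo "FAIL: sudo refuses $1 (requiretty or missing sudoers rule?)"
    fi
}

check_binary "$GETFATTR"
check_sudo_no_tty "$GETFATTR"
```

Run as the mapred user, this would surface the requiretty failure in a few
seconds of bash instead of a failed job submission.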