Bug 1002020 - shim raises java.io.IOException when executing hadoop job under mapred user via su
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: gluster-hadoop
mainline
x86_64 Linux
high Severity unspecified
Assigned To: Bradley Childs
Martin Kudlej
Depends On:
Blocks: 1057253
Reported: 2013-08-28 06:53 EDT by Martin Bukatovic
Modified: 2014-03-03 11:31 EST (History)
8 users

Doc Type: Bug Fix
Last Closed: 2014-03-03 11:31:27 EST
Type: Bug


Attachments: None
Description Martin Bukatovic 2013-08-28 06:53:42 EDT
Description of problem
======================

When executing a hadoop job (e.g. pi from hadoop-examples) under the mapred
user (as described in BZ 970178 [1]), the shim raises java.io.IOException.
Using runuser instead seems to avoid the problem.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=970178#c2

Version-Release number of selected component (if applicable)
============================================================

glusterfs hadoop shim 2.1

~~~
# ls /usr/lib/hadoop/lib/ | grep glusterfs
glusterfs-hadoop-2.1.jar
~~~

hortonworks hadoop distribution (HDP 1.3):

~~~
# rpm -qa | grep hadoop
hadoop-pipes-1.2.0.1.3.0.0-107.el6.x86_64
hadoop-1.2.0.1.3.0.0-107.el6.x86_64
hadoop-libhdfs-1.2.0.1.3.0.0-107.el6.x86_64
hadoop-native-1.2.0.1.3.0.0-107.el6.x86_64
hadoop-lzo-0.5.0-1.x86_64
hadoop-lzo-native-0.5.0-1.x86_64
hadoop-sbin-1.2.0.1.3.0.0-107.el6.x86_64
~~~

How reproducible
================

always

Steps to Reproduce
==================

1. configure hadoop cluster to use glusterfs using shim 2.1
2. run pi hadoop job from hadoop-examples jar:

~~~
# su mapred -c "/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4"
~~~

Actual results
==============

The job fails with a `java.io.IOException: Cannot get layout` exception before
it starts running:

~~~
# su mapred -c "/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4"
Number of Maps  = 4
Samples per Map = 4
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystemCRC: Initializing gluster volume..
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS,  CRC disabled.
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
13/08/28 12:35:47 INFO glusterfs.GlusterFileSystemCRC: Initializing gluster volume..
13/08/28 12:35:47 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
13/08/28 12:35:48 INFO mapred.FileInputFormat: Total input paths to process : 4
13/08/28 12:35:49 INFO mapred.JobClient: Cleaning up the staging area glusterfs:/user/mapred/.staging/job_201308271948_0029
13/08/28 12:35:49 ERROR security.UserGroupInformation: PriviledgedActionException as:mapred cause:java.io.IOException: Cannot get layout
java.io.IOException: Cannot get layout
        at org.apache.hadoop.fs.glusterfs.GlusterFSXattr.execGetFattr(GlusterFSXattr.java:225)
        at org.apache.hadoop.fs.glusterfs.GlusterFSXattr.getPathInfo(GlusterFSXattr.java:84)
        at org.apache.hadoop.fs.glusterfs.GlusterVolume.getFileBlockLocations(GlusterVolume.java:155)
        at org.apache.hadoop.fs.FilterFileSystem.getFileBlockLocations(FilterFileSystem.java:98)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:231)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
        at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
        at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:297)
        at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
~~~
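The top of the stack trace points at GlusterFSXattr.execGetFattr, which reads
the volume layout from an extended attribute and raises "Cannot get layout"
when that read fails. As a hypothetical sketch (not taken from the shim
source), the layout lookup boils down to parsing brick host/path pairs out of
the trusted.glusterfs.pathinfo xattr value; the exact xattr format and the
class/method names below are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified version of the layout parsing done around
// GlusterFSXattr.getPathInfo(). The real shim obtains the xattr value by
// shelling out to `getfattr` (via sudo) and throws "Cannot get layout"
// when that subprocess fails.
public class PathInfoParser {
    // Matches "<POSIX(...):host:/brick/path>" fragments; the precise
    // pathinfo format is an assumption here.
    private static final Pattern BRICK =
        Pattern.compile("POSIX\\([^)]*\\):([^:]+):(\\S+?)>");

    public static List<String> brickHosts(String pathInfo) {
        List<String> hosts = new ArrayList<>();
        Matcher m = BRICK.matcher(pathInfo);
        while (m.find()) {
            hosts.add(m.group(1));   // group 1 is the brick's host name
        }
        return hosts;
    }

    public static void main(String[] args) {
        String sample = "(<DISTRIBUTE:vol-dht> "
            + "(<POSIX(vol-posix):server1:/bricks/b1/file>) "
            + "(<POSIX(vol-posix):server2:/bricks/b2/file>))";
        System.out.println(brickHosts(sample)); // prints [server1, server2]
    }
}
```

Note that in this bug the parsing never gets a chance to run: the failure is
upstream, in the sudo-wrapped getfattr call itself (see the comments below on
requiretty).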


Expected results
================

Job executes normally.

Additional info
===============

I ran it with SELinux in permissive mode, so it's not an SELinux issue.

When using runuser instead, like this:

~~~
runuser mapred /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 4 4
~~~

the problem seems to be avoided (in most cases).

The question here is whether it's OK to just mention this in the
documentation, or whether we should fix it to avoid breaking the workflow of
our partners and customers (there are few details on this, so it's hard to
guess whether they would mind).

Since there is no good reason for the job to fail when using 'su' anyway, I
suggest trying to find out what's wrong here. On the other hand, the runuser
command should be preferred over su these days [2], so just stressing this in
the documentation may also be acceptable. It's hard to decide without looking
into the details.

[2] http://danwalsh.livejournal.com/55588.html
Comment 3 Martin Bukatovic 2013-10-18 11:31:28 EDT
Turning off 'requiretty' seems to fix this issue.

I used the following /etc/sudoers.d/20_gluster file:

~~~
Defaults:%hadoop !requiretty
mapred ALL= NOPASSWD: /usr/bin/getfattr
~~~
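To confirm the sudoers change takes effect in the same no-tty situation the
shim hits, a quick check (to run on an affected node; the getfattr path
matches the sudoers entry above, everything else about the environment is an
assumption) is:

```shell
# Mimic the shim: a non-interactive (-n) sudo call as mapred, launched via
# su with no controlling tty. With requiretty on, this fails immediately;
# with the sudoers file above in place, it should print getfattr's version.
su mapred -c "sudo -n /usr/bin/getfattr --version"
```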
Comment 4 Jay Vyas 2013-11-19 08:05:55 EST
Yes, the proper solution to this is to either (1) use a precompiled xattr
program that doesn't require sudo, or (2) turn off requiretty and make sure
sudoers is configured correctly.

At some point a smoke test shell script would be nice to bundle with the
code, so that minor hiccups like this can be exposed directly in bash before
trying to run a mapreduce job.
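A minimal sketch of such a smoke test, along the lines suggested above
(the probe commands and the getfattr path are assumptions; a real script
would also check the volume mount and the sudoers syntax):

```shell
#!/bin/bash
# Hypothetical pre-flight smoke test: verify in plain bash that the shim's
# getfattr-over-sudo path can work before submitting a MapReduce job.
fail=0

check() {
    # Run a probe command quietly; report pass/fail without aborting,
    # so all checks are listed in one run.
    if "$@" >/dev/null 2>&1; then
        echo "OK:   $*"
    else
        echo "FAIL: $*"
        fail=1
    fi
}

check command -v getfattr            # xattr tool installed?
check sudo -n -l /usr/bin/getfattr   # sudoers allows it without a tty?

[ "$fail" -eq 0 ] && echo "smoke test passed" || echo "smoke test found problems"
```

Run as the job user (e.g. `su mapred -c ./smoke.sh`) so the sudo check
exercises the same account the tasktracker uses.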
