Bug 996290 - mapred system directory is not created when using gluster hadoop shim
Summary: mapred system directory is not created when using gluster hadoop shim
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-hadoop
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Bradley Childs
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-08-12 20:46 UTC by Martin Bukatovic
Modified: 2015-05-26 18:17 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-26 18:17:43 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Martin Bukatovic 2013-08-12 20:46:23 UTC
Description of problem
======================

When using the gluster hadoop shim with hadoop, the mapred system directory is
not created automatically during job tracker startup. This failure pollutes the
job tracker logs and later leads to errors during hadoop job processing.

Version-Release number of selected component (if applicable)
============================================================

gluster hadoop shim: 2.1 (versions 2.0 and 1.0 are also affected)

note: we don't have a proper rpm yet

hortonworks hadoop distribution (HDP 1.2.2) via tarball: hadoop-1.1.2.21

How reproducible
================

always

Steps to Reproduce
==================

1. configure the hadoop cluster to use a gluster volume via the gluster hadoop plugin (a configuration sketch follows this list)
2. make sure that the gluster volume is empty
3. start the job tracker
4. check the mountpoint of the gluster volume (e.g. /mnt/hadoop-volume in my case)
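
A minimal configuration sketch for step 1, for orientation only. The property
names are an assumption (they differ between shim versions and are not
confirmed by this report); the volume name and mountpoint are this report's
values. Check the README of your shim version for the exact keys:

~~~
<!-- sketch: inside <configuration> of core-site.xml;
     property names assumed, not verified against shim 2.1 -->
<property>
  <name>fs.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>glusterfs:///</value>
</property>
<property>
  <name>fs.glusterfs.volumes</name>
  <value>hadoop-volume</value>
</property>
<property>
  <name>fs.glusterfs.volume.fuse.hadoop-volume</name>
  <value>/mnt/hadoop-volume</value>
</property>
~~~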

Actual results
==============

The glusterfs volume is empty. No mapred system directory is created
(see the hadoop property mapred.system.dir).

Also note that the job tracker log files contain the following error:

~~~
2013-08-12 18:36:10,119 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: glusterfs:/sysdirmapred
java.io.FileNotFoundException: File glusterfs:/sysdirmapred does not exist
        at org.apache.hadoop.fs.glusterfs.GlusterVolume.listStatus(GlusterVolume.java:103)
        at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:159)
        at org.apache.hadoop.mapred.JobTracker.initialize(JobTracker.java:1973)
        at org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2341)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4816)
~~~

This error repeats endlessly until some job is started, at which point the
directories are created. Nevertheless, the mapreduce job then hangs and fails
to finish.


Expected results
================

Mapred system directory created inside /mnt/hadoop-volume:

~~~
root@mb1_gluster:/mnt/hadoop-volume# tree
.
└── tmp
    └── hadoop-root
        └── mapred
            └── system
                └── jobtracker.info

4 directories, 1 file
~~~

There are no java.io.FileNotFoundException entries in the job tracker log and mapreduce jobs run normally.

Additional info
===============

older versions
--------------

The version shipped as a tech preview with RHS (glusterfs-0.20.2-0.2.jar)
doesn't have this problem.

quick fix
---------

When the mapred system directory is created by hand, the job tracker works
normally (no errors in the logs and no mapreduce job failures).
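
A sketch of the manual creation, assuming the default mapred.system.dir of
"${hadoop.tmp.dir}/mapred/system" expanded for the root user, on the
/mnt/hadoop-volume mountpoint used above:

~~~
# create the mapred system directory by hand on the gluster mountpoint
# (path assumes the default ${hadoop.tmp.dir}/mapred/system expansion)
mkdir -p /mnt/hadoop-volume/tmp/hadoop-root/mapred/system
~~~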

another hadoop distro: HDP 1.3.1 from rpm
-----------------------------------------

HDP 1.3.1 from rpm with hadoop-1.2.0.1.3.0.0-107.el6.x86_64 is also affected.
The difference is that one has to set proper access rights as well:

~~~
root@mb1_gluster:/mnt/hadoop-volume# mkdir -p mapred/system                     
root@mb1_gluster:/mnt/hadoop-volume# chown mapred:hadoop -R mapred/
~~~

Another interesting fact is that Hortonworks instructs users to create this
directory manually:

http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.2/bk_installing_manually_book/content/rpm-chap4-3.html

The reason here is probably the custom setup of ownership and access rights
(in the HDP distribution, each component has its own dedicated unix user).

But this doesn't mean that the issue doesn't deserve fixing.

others
------

The mapred.system.dir directory should be shared across all machines, so it is
placed in the DFS by default [1], and the path value of this property is
relative to the default filesystem (glusterfs in our case).
Btw the default value seems to be "${hadoop.tmp.dir}/mapred/system" [2] (I also
checked this in the xml job file in the hadoop log directory), which is the
source of the confusion, since the hadoop.tmp.dir property is used to configure
the local tmp dir (with default /tmp/hadoop-${user.name}).

[1] https://hadoop.apache.org/docs/stable/cluster_setup.html
[2] http://stackoverflow.com/questions/2354525/what-should-be-hadoop-tmp-dir
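
To sidestep the confusing ${hadoop.tmp.dir}-derived default, mapred.system.dir
can also be set explicitly. A sketch for mapred-site.xml (the path below is
illustrative, not a value taken from this cluster):

~~~
<!-- sketch: inside <configuration> of mapred-site.xml -->
<property>
  <name>mapred.system.dir</name>
  <!-- an unqualified path resolves against the default filesystem,
       i.e. the gluster volume in this setup -->
  <value>/mapred/system</value>
</property>
~~~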

Comment 3 Jay Vyas 2013-11-19 01:30:56 UTC
Not sure this is our responsibility. See, for example:

docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_installing_manually_book/content/rpm-chap2-4.html

I think that RPM installers etc. should handle the deployment-specific
directory creation: first install the filesystem, then create the working
dirs for mapred, and THEN start mapreduce.
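
Spelled out as a sketch (paths, user and group are illustrative, reusing the
examples from earlier in this report, not prescriptive):

~~~
# 1. install the filesystem (shim jar on the classpath, core-site.xml
#    pointing at the gluster volume)
# 2. create and own the mapred working dirs before any daemon starts
mkdir -p /mnt/hadoop-volume/mapred/system
chown -R mapred:hadoop /mnt/hadoop-volume/mapred
# 3. only then start mapreduce
hadoop-daemon.sh start jobtracker
~~~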

Comment 4 Martin Bukatovic 2014-05-13 15:20:00 UTC
I now agree with Jay that this is not a responsibility of the shim. When I
created the BZ I was working with a quite old version of the shim which did
this, and I was not sure what the correct behavior was (I pointed out both the
HDP docs and the previous behavior).

So I would have closed this as NOTABUG (considering we now have the installer
which handles this anyway), but the fix is already in. Do you think it makes
sense to remove the fix now?

Comment 5 Martin Bukatovic 2014-05-13 16:32:13 UTC
BZ 1084239 (comment 2) seems to be related.

Comment 6 Bradley Childs 2014-06-04 18:38:21 UTC
The code is removed upstream here:  https://github.com/gluster/glusterfs-hadoop/pull/101

If it's voted into trunk I'll respin a RHS build including it.

Comment 7 Bradley Childs 2014-08-28 15:42:15 UTC
This is removed from the shim. Moving to QE.

Comment 8 Daniel Horák 2014-08-29 07:41:37 UTC
Please fill in the Fixed In Version field.

Comment 9 Daniel Horák 2014-09-01 13:20:04 UTC
Tested through the TCMS test case [1].

[1] https://tcms.engineering.redhat.com/case/336154/#case_run

>> VERIFIED

