Description of problem
======================
When using the gluster hadoop shim with hadoop, the mapred system directory is not created automatically during job tracker startup. This failure pollutes the job tracker logs and later leads to errors during hadoop job processing.

Version-Release number of selected component (if applicable)
============================================================
gluster hadoop shim: 2.1 (also checked 2.0 and 1.0, which are affected as well)
note: we don't have a proper rpm yet

hortonworks hadoop distribution (HDP 1.2.2) via tarball: hadoop-1.1.2.21

How reproducible
================
always

Steps to Reproduce
==================
1. configure the hadoop cluster to use a gluster volume via the gluster hadoop plugin
2. make sure that the gluster volume is empty
3. start the job tracker
4. check the mountpoint of the gluster volume (e.g. /mnt/hadoop-volume in my case)

Actual results
==============
The glusterfs volume is empty; no mapred system directory is created (see the hadoop property mapred.system.dir). Also note that the log file of the hadoop job tracker contains the following error:

~~~
2013-08-12 18:36:10,119 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: glusterfs:/sysdirmapred
java.io.FileNotFoundException: File glusterfs:/sysdirmapred does not exist
    at org.apache.hadoop.fs.glusterfs.GlusterVolume.listStatus(GlusterVolume.java:103)
    at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:159)
    at org.apache.hadoop.mapred.JobTracker.initialize(JobTracker.java:1973)
    at org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2341)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4816)
~~~

This error is repeated endlessly until some job is started, at which point the directories are created. Nevertheless, the mapreduce job then hangs and fails to finish.

Expected results
================
The mapred system directory is created inside /mnt/hadoop-volume:

~~~
root@mb1_gluster:/mnt/hadoop-volume# tree
.
└── tmp
    └── hadoop-root
        └── mapred
            └── system
                └── jobtracker.info

4 directories, 1 file
~~~

There are no java.io.FileNotFoundException entries in the job tracker log, and mapreduce jobs run normally.

Additional info
===============

older versions
--------------
The version shipped as a tech preview with RHS (glusterfs-0.20.2-0.2.jar) doesn't have this problem.

quick fix
---------
When the mapred system directory is created by hand, the job tracker works normally (no errors in the logs and no failures of mapreduce jobs); see the consolidated sketch at the end of this report.

another hadoop distro: HDP 1.3.1 from rpm
-----------------------------------------
HDP 1.3.1 from rpm with hadoop-1.2.0.1.3.0.0-107.el6.x86_64 is also affected. The difference is that one has to set proper access rights as well:

~~~
root@mb1_gluster:/mnt/hadoop-volume# mkdir -p mapred/system
root@mb1_gluster:/mnt/hadoop-volume# chown mapred:hadoop -R mapred/
~~~

Another interesting fact is that hortonworks instructs users to create this directory manually:

http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.2/bk_installing_manually_book/content/rpm-chap4-3.html

The reason here is probably the custom setup of ownership and access rights (in the HDP distribution, each component has its own dedicated unix user). But this doesn't mean the issue doesn't deserve fixing.

others
------
The mapred.system.dir should be shared across all machines, so it is placed in DFS by default [1], and the path value of this property is relative to the default filesystem - glusterfs in our case.
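Since the path is resolved against the default filesystem, one way to make the intended location explicit is to set the property directly. A minimal sketch of a mapred-site.xml entry, assuming the glusterfs default filesystem from this report (the concrete path /mapred/system is an illustrative choice of mine, not a value mandated by the plugin):

~~~
<!-- mapred-site.xml: pin the shared system directory to an explicit
     path on the default filesystem (glusterfs in this setup).
     The path /mapred/system is an illustrative assumption. -->
<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value>
</property>
~~~

With fs.default.name pointing at glusterfs, this path ends up on the shared volume rather than in a local /tmp directory.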
Btw the default value seems to be "${hadoop.tmp.dir}/mapred/system" [2] (I also checked this in the job xml file in the hadoop log directory), which is a source of confusion, since the hadoop.tmp.dir property is used to configure the local tmp dir (with default /tmp/hadoop-${user.name}).

[1] https://hadoop.apache.org/docs/stable/cluster_setup.html
[2] http://stackoverflow.com/questions/2354525/what-should-be-hadoop-tmp-dir
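For the record, a consolidated version of the manual workaround from the "quick fix" section above, assuming the default mapred.system.dir and the mount point from this report (the paths are derived from the expected tree; the chown line applies to HDP-style deployments where mapred runs as its own unix user, as in the HDP 1.3.1 case):

~~~
# manual workaround sketch: pre-create the mapred system directory
# on the mounted gluster volume before starting the job tracker.
# Paths follow the expected tree from this report; adjust as needed.
mkdir -p /mnt/hadoop-volume/tmp/hadoop-root/mapred/system

# on deployments where each component has its own unix user:
chown -R mapred:hadoop /mnt/hadoop-volume/tmp/hadoop-root/mapred
~~~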
Fixed here: https://github.com/gluster/glusterfs-hadoop/commit/1487714f5f2f8b31affe6281fd128a54422caa31
Not sure this is our responsibility. See, for example:
docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_installing_manually_book/content/rpm-chap2-4.html

I think that RPM installers etc. should handle deployment-specific directory creation: first install the filesystem, then create the working dirs for mapred, and THEN start mapreduce.
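To illustrate the ordering, a rough sketch of what such an installer could do (the volume name, paths, and service name are illustrative assumptions, not the actual RPM scriptlets):

~~~
# 1. install/mount the filesystem first
#    (server and volume name are assumptions for illustration)
mount -t glusterfs server:/hadoop-volume /mnt/hadoop-volume

# 2. create the working dirs for mapred with the right ownership
mkdir -p /mnt/hadoop-volume/mapred/system
chown -R mapred:hadoop /mnt/hadoop-volume/mapred

# 3. and only THEN start mapreduce
#    (service name varies by distribution; this one is an assumption)
service hadoop-jobtracker start
~~~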
I agree with Jay now that this is not a responsibility of the shim. When I created the BZ, I was working with a quite old version of the shim which did this, and I was not sure what the correct behavior was (I pointed out both the HDP docs and the previous behavior). So I would have closed this as NOTABUG (considering we now have the installer which handles this anyway), but the fix is already in. Do you think it makes sense to remove the fix now?
BZ 1084239 (comment 2) seems to be related.
The code is removed upstream here: https://github.com/gluster/glusterfs-hadoop/pull/101

If it's voted into trunk, I'll respin a RHS build including it.
This is removed from the shim. Moving to QE.
Please fill in the Fixed In Version field.
Tested through TCMS case [1].

[1] https://tcms.engineering.redhat.com/case/336154/#case_run

>> VERIFIED