Description of problem:

TL;DR: See MAPREDUCE-5902 for the manifestation of this bug in a Hadoop cluster, and https://github.com/gluster/glusterfs-hadoop/pull/99/ for the fix.

Description: Our current listStatus implementation converts URIs and corrupts escape characters, such as % signs. For example, the file path a%2b will not be returned by listStatus. As a result, jobs with a % in their name are ignored by the JobHistoryServer, which in turn causes operations such as retrieving Hadoop counters to fail.

How reproducible: 100%

Steps to Reproduce:

There are 3 ways to reproduce this bug. I list them here to show how it manifests in the real world, as well as how to develop and test a fix.

METHOD 1: Run the unit test in https://github.com/gluster/glusterfs-hadoop/pull/99/files. As a unit test, it reproduces the bug easily.

METHOD 2: In a simple cluster, run "hadoop fs -ls" against a directory that contains files such as "a%2a". These files are not picked up.

METHOD 3: In a cluster with Mahout installed, run the Mahout parallelALS job.
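For anyone working on a fix, the kind of corruption described above can be illustrated with plain java.net.URI, without a cluster. This is only a sketch of the general failure mode, not the plugin's actual listStatus code: it assumes that a file name containing a literal % is parsed as if it were already URI-escaped somewhere along the way. The class and method names (PercentPathDemo, lossyPath, safePath) are hypothetical and exist only for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of glusterfs-hadoop.
class PercentPathDemo {

    // Treats the text as an already-escaped URI, so "%2b" is decoded to "+".
    // A file literally named "a%2b" then turns into "a+", which does not exist.
    static String lossyPath(String uriText) {
        return URI.create(uriText).getPath();
    }

    // Builds the URI from components instead. The multi-argument URI
    // constructors always quote '%', so the literal name survives the round trip.
    static String safePath(String rawPath) {
        try {
            URI uri = new URI("file", null, rawPath, null, null);
            return uri.getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid path: " + rawPath, e);
        }
    }

    public static void main(String[] args) {
        String name = "/tmp/bz-1102467/a%2b";
        System.out.println(lossyPath("file://" + name)); // /tmp/bz-1102467/a+
        System.out.println(safePath(name));              // /tmp/bz-1102467/a%2b
    }
}
```

If the plugin's conversion resembles lossyPath, a listing would look up the decoded name rather than the name on disk, which would match the symptom of files with a % being silently skipped.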
Fixed here: https://brewweb.devel.redhat.com/buildinfo?buildID=357130
Using plugin rhs-hadoop-2.3.2-2.noarch

Listing directory which contains files with '%' in filename:

~~~
$ hadoop fs -ls /tmp/bz-1102467
14/07/11 11:47:35 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=ab8c3c1, git.commit.user.email=bchilds.rdu2.redhat.com, git.commit.message.full=[update RPM spec file/changelog] - 2.3.2 , git.commit.id=ab8c3c13e09884dac077c7576f7bc80ac519a9b4, git.commit.message.short=[update RPM spec file/changelog] - 2.3.2, git.commit.user.name=Brad Childs, git.build.user.name=Unknown, git.commit.id.describe=2.3.4-2-gab8c3c1, git.build.user.email=Unknown, git.branch=master, git.commit.time=03.06.2014 @ 14:15:03 EDT, git.build.time=03.06.2014 @ 14:24:36 EDT}
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.3.4
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Gluster volume: HadoopVol1 at : /mnt/glusterfs/HadoopVol1
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/tom
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Default block size : 67108864
Found 6 items
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:46 /tmp/bz-1102467/a%2a
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:46 /tmp/bz-1102467/bar
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:46 /tmp/bz-1102467/foo
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:47 /tmp/bz-1102467/foo%2a
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:47 /tmp/bz-1102467/foo%2b
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:47 /tmp/bz-1102467/foo%3c
$
~~~

All files are listed, so it works.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2014-1275.html