Bug 1102467 - listStatus corrupts file paths containing % and other URI escape encodings
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhs-hadoop
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: Release Candidate
Assignee: Bradley Childs
QA Contact: Martin Bukatovic
URL:
Whiteboard:
Depends On:
Blocks: 1159155
 
Reported: 2014-05-29 03:48 UTC by Jay Vyas
Modified: 2014-11-24 11:54 UTC (History)

Fixed In Version: glusterfs-hadoop-2.3.2-2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-24 11:54:36 UTC
Embargoed:




Links
System: Red Hat Product Errata
ID: RHEA-2014:1275
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Storage Server 3 Hadoop plug-in enhancement update
Last Updated: 2014-11-24 16:53:36 UTC

Description Jay Vyas 2014-05-29 03:48:23 UTC
Description of problem:

TL;DR 

See MAPREDUCE-5902 for how this bug manifests in a Hadoop cluster, and
https://github.com/gluster/glusterfs-hadoop/pull/99/ for the fix.


Description:
 
Our current listStatus implementation performs a URI conversion that corrupts escape sequences in file names, such as % signs. For example, the file path a%2b will not be returned by listStatus. As a result, jobs with a % in their name are ignored by the JobHistoryServer, which in turn causes operations such as fetching Hadoop counters to fail.
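
To illustrate this class of bug (this is not the plugin's actual code; see the pull request above for the real change), here is a minimal sketch using only java.net.URI. It shows how a file name containing a literal % changes when it is parsed as already-escaped URI text, and how it survives when the URI is built from components. The class name, method names, and the /tmp prefix are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EscapeDemo {
    // Lossy: the name is spliced into URI text, so "%2b" is read as an
    // escape sequence and getPath() returns it decoded.
    public static String lossyName(String fileName) {
        URI uri = URI.create("file:///tmp/" + fileName);
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Lossless: the multi-argument constructor treats the path as raw text
    // and quotes '%' itself, so getPath() returns the original name.
    public static String preservedName(String fileName) {
        try {
            URI uri = new URI("file", null, "/tmp/" + fileName, null);
            String path = uri.getPath();
            return path.substring(path.lastIndexOf('/') + 1);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"a%2b", "foo%2a", "foo%3c"}) {
            System.out.println(name + " -> lossy: " + lossyName(name)
                    + ", preserved: " + preservedName(name));
        }
    }
}
```

With the lossy conversion, a%2b comes back as a+, which no longer matches the file on disk; the component-based construction returns a%2b unchanged.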


How reproducible:

100% 

Steps to Reproduce:

There are three ways to reproduce this bug. I list them here to demonstrate how it manifests in the real world as well as how to develop/test a fix.

METHOD 1: Run the unit test in https://github.com/gluster/glusterfs-hadoop/pull/99/files; it reproduces this bug easily.

METHOD 2: In a simple cluster, you can reproduce it by running "hadoop fs -ls" against a directory that contains files such as "a%2a". You will see that these files aren't picked up.

METHOD 3: In a cluster with Mahout installed, you can reproduce this bug by running the
Mahout parallelALS job.
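
The check behind Methods 1 and 2 can be sketched as follows. This is a hypothetical, local-filesystem analogue written with java.nio only, not the unit test from the pull request: it creates files whose names contain %, lists the directory, and reports any name the listing failed to return. Against the affected plugin, the listing step would instead go through Hadoop's listStatus on the Gluster volume. The class and method names are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class ListingCheck {
    // Creates empty files with the given names in a fresh temporary
    // directory, lists that directory, and returns the names that the
    // listing did not return. An empty result means nothing was lost.
    public static Set<String> missingFromListing(String... names) {
        try {
            Path dir = Files.createTempDirectory("bz-1102467");
            for (String name : names) {
                Files.createFile(dir.resolve(name));
            }
            Set<String> listed = new TreeSet<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path entry : stream) {
                    listed.add(entry.getFileName().toString());
                }
            }
            Set<String> missing = new TreeSet<>(Arrays.asList(names));
            missing.removeAll(listed);
            return missing;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("missing: "
                + missingFromListing("a%2a", "bar", "foo", "foo%2a", "foo%2b", "foo%3c"));
    }
}
```

A correct listing implementation should leave the returned set empty; with the bug described here, names such as a%2a would be expected to show up as missing.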

Comment 2 Bradley Childs 2014-06-03 18:35:20 UTC
Fixed here: https://brewweb.devel.redhat.com/buildinfo?buildID=357130

Comment 3 Martin Bukatovic 2014-07-11 09:53:20 UTC
Using plugin rhs-hadoop-2.3.2-2.noarch

Listing directory which contains files with '%' in filename:

~~~
$ hadoop fs -ls /tmp/bz-1102467
14/07/11 11:47:35 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS,  CRC disabled.
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=ab8c3c1, git.commit.user.email=bchilds.rdu2.redhat.com, git.commit.message.full=[update RPM spec file/changelog] - 2.3.2
, git.commit.id=ab8c3c13e09884dac077c7576f7bc80ac519a9b4, git.commit.message.short=[update RPM spec file/changelog] - 2.3.2, git.commit.user.name=Brad Childs, git.build.user.name=Unknown, git.commit.id.describe=2.3.4-2-gab8c3c1, git.build.user.email=Unknown, git.branch=master, git.commit.time=03.06.2014 @ 14:15:03 EDT, git.build.time=03.06.2014 @ 14:24:36 EDT}
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.3.4
14/07/11 11:47:35 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Gluster volume: HadoopVol1 at : /mnt/glusterfs/HadoopVol1
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/tom
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/07/11 11:47:36 INFO glusterfs.GlusterVolume: Default block size : 67108864
Found 6 items
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:46 /tmp/bz-1102467/a%2a
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:46 /tmp/bz-1102467/bar
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:46 /tmp/bz-1102467/foo
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:47 /tmp/bz-1102467/foo%2a
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:47 /tmp/bz-1102467/foo%2b
-rw-r--r--   1 tom hadoop          0 2014-07-11 11:47 /tmp/bz-1102467/foo%3c
$
~~~

All files are listed, so it works.

Comment 5 errata-xmlrpc 2014-11-24 11:54:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2014-1275.html

