Bug 1033093 - listStatus test failure reveals we need to upgrade tests and then : Fork code OR focus on MR2
Summary: listStatus test failure reveals we need to upgrade tests and then : Fork code...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-hadoop
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Jay Vyas
QA Contact: hcfs-gluster-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1057253
Reported: 2013-11-21 14:34 UTC by Jay Vyas
Modified: 2014-06-04 14:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-04 14:44:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Jay Vyas 2013-11-21 14:34:29 UTC
Problem: 

Our listStatus implementation doesn't return files in sorted order.
This breaks the build by breaking the "testListStatus" unit test, which asserts 
that listStatus returns files in sorted order.

Suggested Fix: 

Sort the files after calling listStatus, e.g. with 
Arrays.sort(File.listFiles(), new Comparator<File>(){...})

Root cause: 

Our underlying listStatus method relies on File.listFiles(), which 
DOES NOT guarantee returning files in sorted order. 

(From the JDK 1.6 docs:

public File[] listFiles()
    Returns an array of abstract pathnames denoting the files in the directory 
denoted by this abstract pathname.
    ...There is no guarantee that the name strings in the resulting array will 
appear in any specific order; they are not, in particular, guaranteed to appear 
in alphabetical order...)
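
A minimal sketch of the suggested fix, assuming plain java.io.File and nothing from the plugin itself. The class name SortedListing and the helper listSorted are hypothetical, made up for illustration only; the real listStatus code may look quite different.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Comparator;

public class SortedListing {

    /** Hypothetical helper: lists a directory and sorts the entries by file name. */
    static File[] listSorted(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            // listFiles() returns null if dir is not a directory or an I/O error occurs.
            return new File[0];
        }
        // listFiles() makes no ordering promise, so impose one explicitly.
        Arrays.sort(files, new Comparator<File>() {
            @Override
            public int compare(File a, File b) {
                return a.getName().compareTo(b.getName());
            }
        });
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Create a scratch directory with files added out of alphabetical order.
        File dir = Files.createTempDirectory("listing").toFile();
        for (String name : new String[] {"c.txt", "a.txt", "b.txt"}) {
            new File(dir, name).createNewFile();
        }
        for (File f : listSorted(dir)) {
            System.out.println(f.getName());
        }
    }
}
```

Sorting by getName() with String.compareTo gives a case-sensitive, code-unit ordering; whether that matches what testListStatus expects is an assumption here.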

Comment 1 Jay Vyas 2013-11-22 17:42:31 UTC
This bug has now been broadened: the semantics of listFiles have changed in newer Hadoop versions. 

We thus need to do one of the following:

1) create 2 test modules in the pom: One using old "hadoop-test" (1x) and another using "hadoop-common" (2x).  

  Pros: two code paths means more flexibility.  
  Cons: It's also more code and build to maintain.  It also might mean we need to produce two jar artifacts :(.

2) Embrace 2x tests: update our tests so that all 2x tests pass, update our pom file to pull in the newer 2.x tests, and stop using the 1x tests. 

  Pros: Simpler to maintain and most likely to be "good enough" for any real world 1.x functionality
  Cons: Not quite as fine grained test coverage for 1.x semantics. 

My vote is to go with approach 2: it's less code, easier to maintain, and guarantees that the plugin's test coverage and artifact are tested against the latest expected community standards for the FileSystem interface.

Comment 2 Bradley Childs 2013-11-22 18:04:02 UTC
I don't believe approach #1 is viable.  You'd really need a separate branch for 1.x and 2.x.

My vote is approach #2, modified:

Iff we find 1.x semantics that pose real world problems vs hypotheticals, then branch into a 1.x branch, and revert the semantics. Then continue to maintain the 1.x and 2.x branch independently. The semantic differences in 1.x and 2.x may be enough to fail unit tests, but realistically insignificant to an end developer.

Comment 3 Jay Vyas 2013-11-25 17:25:13 UTC
I agree with Brad that we should go with approach #2, but we certainly won't need to fork.

Why? Because even in the "worst case" scenario outlined above, where we need to support real world MR1 semantic differences, we still won't need to fork: the GlusterFileSystem is implemented in a different class than the GlusterFs.

MR1:

https://github.com/gluster/glusterfs-hadoop/blob/master/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java

MR2: 

https://github.com/gluster/glusterfs-hadoop/blob/master/src/main/java/org/apache/hadoop/fs/local/GlusterFs.java

Both depend on FilterFileSystem, which is set to "wrap" GlusterVolume in its implementation, but nevertheless the FileSystem implementations can override functionality and diverge as required.
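
A rough sketch of that idea, using stand-in classes only. None of the names below (Volume, FilterWrapper, Mr1Wrapper, Mr2Wrapper) come from the plugin or from Hadoop; they are hypothetical and only illustrate how two wrappers over one shared volume can diverge by overriding a method, without forking the code base.

```java
import java.util.Arrays;

public class DivergingWrappers {

    /** Stand-in for the shared underlying volume (hypothetical, not the real GlusterVolume). */
    static class Volume {
        String[] list() {
            // Deliberately unsorted, like File.listFiles() may be.
            return new String[] {"b", "a", "c"};
        }
    }

    /** Stand-in for a filter-style wrapper that delegates to the volume by default. */
    static class FilterWrapper {
        protected final Volume volume;

        FilterWrapper(Volume volume) {
            this.volume = volume;
        }

        String[] list() {
            return volume.list();
        }
    }

    /** 1.x-flavoured wrapper: overrides list() to add sorting on top of the volume. */
    static class Mr1Wrapper extends FilterWrapper {
        Mr1Wrapper(Volume volume) {
            super(volume);
        }

        @Override
        String[] list() {
            String[] names = volume.list().clone();
            Arrays.sort(names);
            return names;
        }
    }

    /** 2.x-flavoured wrapper: keeps the default delegation unchanged. */
    static class Mr2Wrapper extends FilterWrapper {
        Mr2Wrapper(Volume volume) {
            super(volume);
        }
    }

    public static void main(String[] args) {
        Volume shared = new Volume();
        System.out.println(Arrays.toString(new Mr1Wrapper(shared).list()));
        System.out.println(Arrays.toString(new Mr2Wrapper(shared).list()));
    }
}
```

Which of the two real classes would need which override is left open here; the point is only that the divergence can live in the subclass.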

Onward and upward with MR2!

Comment 4 Scott Haines 2013-12-04 16:41:11 UTC
Per 2013-12-04 bug triage meeting, re-assigning to jvyas.

Comment 5 Jay Vyas 2014-05-01 17:02:13 UTC
This is fixed upstream.  Brew release still pending.

The fix was to run tests against 2x semantics; over time, maybe we can add in 1x file tests where they don't conflict with the behaviour of 2x.

Comment 7 Martin Kudlej 2014-05-20 14:09:30 UTC
Do I understand correctly that the list of files should be in alphabetical order? Is this an example of a test case?
"hadoop fs -ls"

Comment 8 Martin Kudlej 2014-06-04 14:44:24 UTC
This is a bug just for 1x. --> close

