Problem: Our listStatus implementation doesn't return files in sorted order. This breaks the build by failing the "testListStatus" unit test, which asserts that listStatus returns files in sorted order. Root cause: Our underlying listStatus method relies on File.listFiles(), which DOES NOT guarantee returning files in sorted order. (From the JDK 1.6 docs: public File[] listFiles() Returns an array of abstract pathnames denoting the files in the directory denoted by this abstract pathname. ...There is no guarantee that the name strings in the resulting array will appear in any specific order; they are not, in particular, guaranteed to appear in alphabetical order...) Suggested Fix: Sort the array returned by File.listFiles() before returning it, e.g. Arrays.sort(files, new Comparator<File>(){...}). Note that listFiles() returns an array, so Arrays.sort applies rather than Collections.sort.
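A minimal sketch of the suggested fix, using only the JDK and not the plugin's actual code. The class name SortedListing and the method name listSorted are hypothetical, and sorting by file name is an assumption about what the test expects:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of the plugin: wraps File.listFiles()
// so that callers always see a deterministic, name-sorted listing.
class SortedListing {

    // Returns the entries of dir sorted by file name.
    // Assumption: name order (String.compareTo, so case-sensitive) is the
    // order the test expects.
    static File[] listSorted(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            // listFiles() returns null if dir is not a directory or on I/O error.
            return new File[0];
        }
        Arrays.sort(files, Comparator.comparing(File::getName));
        return files;
    }
}
```

Because the comparison is case-sensitive, upper-case names sort before lower-case ones; a different comparator would be needed if the test expects another ordering.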
This bug has now been broadened: The semantics of listFiles have changed in newer Hadoop versions. We thus need to do one of the following: 1) Create two test modules in the pom: one using the old "hadoop-test" (1.x) and another using "hadoop-common" (2.x). Pros: Two code paths mean more flexibility. Cons: It's also more code and build to maintain. It also might mean we need to produce two jar artifacts :(. 2) Embrace the 2.x tests: update our code to pass all 2.x tests, update our pom file to pull in the newer 2.x tests, and stop using the 1.x tests. Pros: Simpler to maintain and most likely "good enough" for any real-world 1.x functionality. Cons: Not quite as fine-grained test coverage for 1.x semantics. My vote is to go with approach 2: It's less code, easier to maintain, and guarantees that the plugin and its artifact are tested against the latest expected community standards for the FileSystem interface.
I don't believe approach #1 is viable. You'd really need a separate branch for 1.x and 2.x. My vote is approach #2, modified: if and only if we find 1.x semantics that pose real-world problems (as opposed to hypothetical ones), then we create a 1.x branch and revert the semantics there, and from then on maintain the 1.x and 2.x branches independently. The semantic differences between 1.x and 2.x may be enough to fail unit tests, but are realistically insignificant to an end developer.
I agree with Brad that we should go with approach #2, but we certainly won't need to fork. Why? Because even in the "worst case" scenario outlined above, where we need to support real-world MR1 semantic differences, GlusterFileSystem is implemented in a different class than GlusterFs. MR1: https://github.com/gluster/glusterfs-hadoop/blob/master/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java MR2: https://github.com/gluster/glusterfs-hadoop/blob/master/src/main/java/org/apache/hadoop/fs/local/GlusterFs.java Both depend on FilterFileSystem, which is set to "wrap" GlusterVolume in its implementation; nevertheless, the two FileSystem implementations can override functionality and diverge as required. Onward and upward with MR2!
Per 2013-12-04 bug triage meeting, re-assigning to jvyas.
This is fixed upstream. The Brew release is still pending. The fix was to run the tests against 2.x semantics; over time we may add 1.x file tests back in where they don't conflict with 2.x behaviour.
Do I understand correctly that the list of files should be in alphabetical order? Is this an example of a test case? "hadoop fs -ls"
This bug applies only to 1.x. Closing.