Description of problem:
Was running read-intensive map-reduce jobs ("wordcount" and "grep"). When one node of a replica pair is down, the job fails with java.lang.IndexOutOfBoundsException.

Version-Release number of selected component (if applicable):
git master

How reproducible:
Inconsistent

Steps to Reproduce:
1. Create and start a 2*2*2 distributed-striped-replicated volume.
2. Bring the first node down.
3. Try running a "grep" or "wordcount" job.

Actual results:
[root@gqac013 hadoop-0.20.2]# ./bin/hadoop jar hadoop-0.20.2-examples.jar grep input-text grep-output 'msvbhat'
Initializing GlusterFS
12/03/29 06:26:05 INFO mapred.FileInputFormat: Total input paths to process : 3
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:571)
        at java.util.ArrayList.get(ArrayList.java:349)
        at org.apache.hadoop.fs.glusterfs.GlusterFSXattr.getHints(GlusterFSXattr.java:357)
        at org.apache.hadoop.fs.glusterfs.GlusterFSXattr.getPathInfo(GlusterFSXattr.java:75)
        at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileBlockLocations(GlusterFileSystem.java:457)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:222)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.hadoop.examples.Grep.run(Grep.java:69)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.Grep.main(Grep.java:93)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Expected results:
Jobs should pass even when one of the nodes is down.
patch sent to MS (offline) for testing and feedback.
*** Bug 804053 has been marked as a duplicate of this bug. ***
(In reply to comment #1)
> patch sent to MS (offline) for testing and feedback.

The patch sent by Venky fixes the issue. I didn't see the same exception with that patch.
CHANGE: http://review.gluster.com/3087 (hadoop-glusterfs: Fix IndexOutOfBounds Exception) merged in master by Vijay Bellur (vijay)
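For readers following the trace, the failure mode can be illustrated with a small sketch. This is not the merged patch from review 3087. It only assumes that the lookup in getHints indexed into a per-replica list that held a single entry while one node was down, which would be consistent with "Index: 1, Size: 1". The class ReplicaHintSketch, the method hostForReplica, and the host name node2 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReplicaHintSketch {

    // Hypothetical helper: returns the host for the requested replica index,
    // falling back to the first surviving host when the index is out of range.
    // Returns null when no host is available at all.
    static String hostForReplica(List<String> liveHosts, int replicaIndex) {
        if (liveHosts == null || liveHosts.isEmpty()) {
            return null;
        }
        if (replicaIndex < 0 || replicaIndex >= liveHosts.size()) {
            return liveHosts.get(0);
        }
        return liveHosts.get(replicaIndex);
    }

    public static void main(String[] args) {
        // Assumption: with one node of the pair down, only one host is reported.
        List<String> liveHosts = new ArrayList<>(Arrays.asList("node2"));

        try {
            // Unchecked lookup of the second replica, as in the reported trace.
            liveHosts.get(1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked lookup failed: " + e.getMessage());
        }

        // Guarded lookup degrades to the surviving replica instead of throwing.
        System.out.println("guarded lookup: " + hostForReplica(liveHosts, 1));
    }
}
```

Whether the real fix falls back to a surviving replica or simply skips the missing hint is not stated in this report; the guard above is just one plausible shape for a bounds check.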
Took down the first node and then tried running grep and wordcount. Both ran to completion.

Initializing GlusterFS
12/06/08 05:46:32 INFO mapred.FileInputFormat: Total input paths to process : 1
12/06/08 05:46:32 INFO mapred.JobClient: Running job: job_201206080542_0003
12/06/08 05:46:33 INFO mapred.JobClient: map 0% reduce 0%
12/06/08 05:46:43 INFO mapred.JobClient: map 42% reduce 0%
12/06/08 05:46:45 INFO mapred.JobClient: map 100% reduce 0%
12/06/08 05:46:55 INFO mapred.JobClient: map 100% reduce 100%
12/06/08 05:46:57 INFO mapred.JobClient: Job complete: job_201206080542_0003
12/06/08 05:46:57 INFO mapred.JobClient: Counters: 17
12/06/08 05:46:57 INFO mapred.JobClient:   Job Counters
12/06/08 05:46:57 INFO mapred.JobClient:     Launched reduce tasks=1
12/06/08 05:46:57 INFO mapred.JobClient:     Rack-local map tasks=4
12/06/08 05:46:57 INFO mapred.JobClient:     Launched map tasks=7
12/06/08 05:46:57 INFO mapred.JobClient:     Data-local map tasks=3
12/06/08 05:46:57 INFO mapred.JobClient:   FileSystemCounters
12/06/08 05:46:57 INFO mapred.JobClient:     FILE_BYTES_READ=66
12/06/08 05:46:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=392
12/06/08 05:46:57 INFO mapred.JobClient:   Map-Reduce Framework
12/06/08 05:46:57 INFO mapred.JobClient:     Reduce input groups=1
12/06/08 05:46:57 INFO mapred.JobClient:     Combine output records=3
12/06/08 05:46:57 INFO mapred.JobClient:     Map input records=380
12/06/08 05:46:57 INFO mapred.JobClient:     Reduce shuffle bytes=102
12/06/08 05:46:57 INFO mapred.JobClient:     Reduce output records=1
12/06/08 05:46:57 INFO mapred.JobClient:     Spilled Records=6
12/06/08 05:46:57 INFO mapred.JobClient:     Map output bytes=270
12/06/08 05:46:57 INFO mapred.JobClient:     Map input bytes=25791
12/06/08 05:46:57 INFO mapred.JobClient:     Combine input records=15
12/06/08 05:46:57 INFO mapred.JobClient:     Map output records=15
12/06/08 05:46:57 INFO mapred.JobClient:     Reduce input records=3
12/06/08 05:46:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/08 05:46:57 INFO mapred.FileInputFormat: Total input paths to process : 1
12/06/08 05:46:58 INFO mapred.JobClient: Running job: job_201206080542_0004
12/06/08 05:46:59 INFO mapred.JobClient: map 0% reduce 0%
12/06/08 05:47:07 INFO mapred.JobClient: map 100% reduce 0%
12/06/08 05:47:19 INFO mapred.JobClient: map 100% reduce 100%
12/06/08 05:47:21 INFO mapred.JobClient: Job complete: job_201206080542_0004
12/06/08 05:47:21 INFO mapred.JobClient: Counters: 16
12/06/08 05:47:21 INFO mapred.JobClient:   Job Counters
12/06/08 05:47:21 INFO mapred.JobClient:     Launched reduce tasks=1
12/06/08 05:47:21 INFO mapred.JobClient:     Rack-local map tasks=1
12/06/08 05:47:21 INFO mapred.JobClient:     Launched map tasks=1
12/06/08 05:47:21 INFO mapred.JobClient:   FileSystemCounters
12/06/08 05:47:21 INFO mapred.JobClient:     FILE_BYTES_READ=26
12/06/08 05:47:21 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=84
12/06/08 05:47:21 INFO mapred.JobClient:   Map-Reduce Framework
12/06/08 05:47:21 INFO mapred.JobClient:     Reduce input groups=1
12/06/08 05:47:21 INFO mapred.JobClient:     Combine output records=0
12/06/08 05:47:21 INFO mapred.JobClient:     Map input records=1
12/06/08 05:47:21 INFO mapred.JobClient:     Reduce shuffle bytes=26
12/06/08 05:47:21 INFO mapred.JobClient:     Reduce output records=1
12/06/08 05:47:21 INFO mapred.JobClient:     Spilled Records=2
12/06/08 05:47:21 INFO mapred.JobClient:     Map output bytes=18
12/06/08 05:47:21 INFO mapred.JobClient:     Map input bytes=26
12/06/08 05:47:21 INFO mapred.JobClient:     Combine input records=0
12/06/08 05:47:21 INFO mapred.JobClient:     Map output records=1
12/06/08 05:47:21 INFO mapred.JobClient:     Reduce input records=1