Bug 808009 - Index Out of Bounds exception while running a job when one of the replicate pair is down
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: HDFS
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Venky Shankar
QA Contact: M S Vishwanath Bhat
URL:
Whiteboard:
Duplicates: 804053
Depends On:
Blocks: 817967
 
Reported: 2012-03-29 10:41 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:55 UTC

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:15:14 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description M S Vishwanath Bhat 2012-03-29 10:41:55 UTC
Description of problem:
I was running read-intensive map-reduce jobs ("wordcount" and "grep").
When one node of a replicate pair is down, the job fails with an IndexOutOfBoundsException.

Version-Release number of selected component (if applicable):
git master

How reproducible:
Inconsistent

Steps to Reproduce:
1. Create and start a 2*2*2 distributed-striped-replicated volume.
2. Bring first node down.
3. Now try running a "grep" or "wordcount" job.
  
Actual results:
[root@gqac013 hadoop-0.20.2]# ./bin/hadoop jar hadoop-0.20.2-examples.jar grep input-text grep-output 'msvbhat'
Initializing GlusterFS
12/03/29 06:26:05 INFO mapred.FileInputFormat: Total input paths to process : 3
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:571)
        at java.util.ArrayList.get(ArrayList.java:349)
        at org.apache.hadoop.fs.glusterfs.GlusterFSXattr.getHints(GlusterFSXattr.java:357)
        at org.apache.hadoop.fs.glusterfs.GlusterFSXattr.getPathInfo(GlusterFSXattr.java:75)
        at org.apache.hadoop.fs.glusterfs.GlusterFileSystem.getFileBlockLocations(GlusterFileSystem.java:457)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:222)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.hadoop.examples.Grep.run(Grep.java:69)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.Grep.main(Grep.java:93)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)




Expected results:
Jobs should pass even when one of the nodes is down.
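
The trace above ends in an ArrayList.get call made from GlusterFSXattr.getHints, with index 1 on a list of size 1. The plugin source is not part of this report, so the following is only a sketch of how such a failure can arise, assuming (hypothetically) that a list holding only the reachable replicas is indexed as if every configured replica were present. The class ReplicaIndexDemo and its methods are made up for illustration and are not taken from the plugin.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaIndexDemo {
    // Hypothetical helper: keeps only the hosts whose replica is reachable.
    static List<String> reachableReplicas(String[] configured, boolean[] up) {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < configured.length; i++) {
            if (up[i]) {
                live.add(configured[i]);
            }
        }
        return live;
    }

    // Hypothetical unsafe lookup: assumes every configured replica is in the list.
    static String unsafeHint(List<String> live, int replicaIndex) {
        // Throws IndexOutOfBoundsException when replicaIndex >= live.size().
        return live.get(replicaIndex);
    }

    public static void main(String[] args) {
        String[] configured = {"node1", "node2"};
        boolean[] up = {false, true}; // the first node is down
        List<String> live = reachableReplicas(configured, up);
        try {
            unsafeHint(live, 1); // index of the second configured replica
        } catch (IndexOutOfBoundsException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

With one replica down, the list has a single entry, so the lookup for the second configured replica goes past the end of the list.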

Comment 1 Venky Shankar 2012-04-04 10:26:33 UTC
patch sent to MS (offline) for testing and feedback.

Comment 2 Venky Shankar 2012-04-04 10:26:54 UTC
*** Bug 804053 has been marked as a duplicate of this bug. ***

Comment 3 M S Vishwanath Bhat 2012-04-05 09:45:29 UTC
(In reply to comment #1)
> patch sent to MS (offline) for testing and feedback.

The patch sent by Venky fixes the issue. I didn't see the same exception with that patch applied.

Comment 4 Anand Avati 2012-04-06 09:35:13 UTC
CHANGE: http://review.gluster.com/3087 (hadoop-glusterfs: Fix IndexOutOfBounds Exception) merged in master by Vijay Bellur (vijay)
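
The merged change is not reproduced in this report. As a rough sketch of one kind of defensive lookup that avoids this class of exception (this is not the actual patch; SafeHints and hintFor are hypothetical names), the index can be checked against the list size before it is used, with a fallback to whichever replica is still available:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SafeHints {
    // Returns the host at the preferred index when it exists, otherwise the
    // last available host, or null when no replica is reachable at all.
    static String hintFor(List<String> live, int preferred) {
        if (live == null || live.isEmpty()) {
            return null; // caller has no locality hint to work with
        }
        if (preferred >= 0 && preferred < live.size()) {
            return live.get(preferred);
        }
        return live.get(live.size() - 1); // fall back to a surviving replica
    }

    public static void main(String[] args) {
        List<String> bothUp = Arrays.asList("node1", "node2");
        List<String> oneUp = Collections.singletonList("node2");
        System.out.println(hintFor(bothUp, 1));
        System.out.println(hintFor(oneUp, 1));
    }
}
```

Returning null for an empty list leaves it to the caller to decide whether to proceed without a block-location hint or to fail the request.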

Comment 5 M S Vishwanath Bhat 2012-06-08 09:36:43 UTC
Took down the first node and then tried running the grep and wordcount jobs. Both ran to completion.


Initializing GlusterFS
12/06/08 05:46:32 INFO mapred.FileInputFormat: Total input paths to process : 1
12/06/08 05:46:32 INFO mapred.JobClient: Running job: job_201206080542_0003
12/06/08 05:46:33 INFO mapred.JobClient:  map 0% reduce 0%
12/06/08 05:46:43 INFO mapred.JobClient:  map 42% reduce 0%
12/06/08 05:46:45 INFO mapred.JobClient:  map 100% reduce 0%
12/06/08 05:46:55 INFO mapred.JobClient:  map 100% reduce 100%
12/06/08 05:46:57 INFO mapred.JobClient: Job complete: job_201206080542_0003
12/06/08 05:46:57 INFO mapred.JobClient: Counters: 17
12/06/08 05:46:57 INFO mapred.JobClient:   Job Counters 
12/06/08 05:46:57 INFO mapred.JobClient:     Launched reduce tasks=1
12/06/08 05:46:57 INFO mapred.JobClient:     Rack-local map tasks=4
12/06/08 05:46:57 INFO mapred.JobClient:     Launched map tasks=7
12/06/08 05:46:57 INFO mapred.JobClient:     Data-local map tasks=3
12/06/08 05:46:57 INFO mapred.JobClient:   FileSystemCounters
12/06/08 05:46:57 INFO mapred.JobClient:     FILE_BYTES_READ=66
12/06/08 05:46:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=392
12/06/08 05:46:57 INFO mapred.JobClient:   Map-Reduce Framework
12/06/08 05:46:57 INFO mapred.JobClient:     Reduce input groups=1
12/06/08 05:46:57 INFO mapred.JobClient:     Combine output records=3
12/06/08 05:46:57 INFO mapred.JobClient:     Map input records=380
12/06/08 05:46:57 INFO mapred.JobClient:     Reduce shuffle bytes=102
12/06/08 05:46:57 INFO mapred.JobClient:     Reduce output records=1
12/06/08 05:46:57 INFO mapred.JobClient:     Spilled Records=6
12/06/08 05:46:57 INFO mapred.JobClient:     Map output bytes=270
12/06/08 05:46:57 INFO mapred.JobClient:     Map input bytes=25791
12/06/08 05:46:57 INFO mapred.JobClient:     Combine input records=15
12/06/08 05:46:57 INFO mapred.JobClient:     Map output records=15
12/06/08 05:46:57 INFO mapred.JobClient:     Reduce input records=3
12/06/08 05:46:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/08 05:46:57 INFO mapred.FileInputFormat: Total input paths to process : 1
12/06/08 05:46:58 INFO mapred.JobClient: Running job: job_201206080542_0004
12/06/08 05:46:59 INFO mapred.JobClient:  map 0% reduce 0%
12/06/08 05:47:07 INFO mapred.JobClient:  map 100% reduce 0%
12/06/08 05:47:19 INFO mapred.JobClient:  map 100% reduce 100%
12/06/08 05:47:21 INFO mapred.JobClient: Job complete: job_201206080542_0004
12/06/08 05:47:21 INFO mapred.JobClient: Counters: 16
12/06/08 05:47:21 INFO mapred.JobClient:   Job Counters 
12/06/08 05:47:21 INFO mapred.JobClient:     Launched reduce tasks=1
12/06/08 05:47:21 INFO mapred.JobClient:     Rack-local map tasks=1
12/06/08 05:47:21 INFO mapred.JobClient:     Launched map tasks=1
12/06/08 05:47:21 INFO mapred.JobClient:   FileSystemCounters
12/06/08 05:47:21 INFO mapred.JobClient:     FILE_BYTES_READ=26
12/06/08 05:47:21 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=84
12/06/08 05:47:21 INFO mapred.JobClient:   Map-Reduce Framework
12/06/08 05:47:21 INFO mapred.JobClient:     Reduce input groups=1
12/06/08 05:47:21 INFO mapred.JobClient:     Combine output records=0
12/06/08 05:47:21 INFO mapred.JobClient:     Map input records=1
12/06/08 05:47:21 INFO mapred.JobClient:     Reduce shuffle bytes=26
12/06/08 05:47:21 INFO mapred.JobClient:     Reduce output records=1
12/06/08 05:47:21 INFO mapred.JobClient:     Spilled Records=2
12/06/08 05:47:21 INFO mapred.JobClient:     Map output bytes=18
12/06/08 05:47:21 INFO mapred.JobClient:     Map input bytes=26
12/06/08 05:47:21 INFO mapred.JobClient:     Combine input records=0
12/06/08 05:47:21 INFO mapred.JobClient:     Map output records=1
12/06/08 05:47:21 INFO mapred.JobClient:     Reduce input records=1

