Created attachment 675
Created attachment 676
In a 2*3 striped-replicated gluster volume with quick-slave-io ON, the randomwriter job failed with the following backtrace.

11/09/26 18:39:18 INFO mapred.JobClient: map 88% reduce 0%
11/09/26 18:47:52 INFO mapred.JobClient: Task Id : attempt_201109242150_0008_m_000068_1, Status : FAILED
java.io.IOException: Transport endpoint is not connected
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:297)
    at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:67)
    at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:52)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1011)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:680)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
    at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:188)
    at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:152)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
attempt_201109242150_0008_m_000068_1: Initializing GlusterFS
11/09/26 18:51:45 INFO mapred.JobClient: Task Id : attempt_201109242150_0008_m_000076_0, Status : FAILED
Task attempt_201109242150_0008_m_000076_0 failed to report status for 601 seconds. Killing!
Task attempt_201109242150_0008_m_000076_0 failed to report status for 602 seconds. Killing!

The jobtracker logs pointed to an error on the ubuntu4 machine.

2011-09-26 18:43:54,211 INFO org.apache.hadoop.mapred.JobTracker: Adding task (cleanup)'attempt_201109242150_0008_m_000079_0' to tip task_201109242150_0008_m_000079, for tracker 'tracker_ubuntu4.gluster.com:localhost/127.0.0.1:38797'
2011-09-26 18:47:49,237 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201109242150_0008_m_000068_1: java.io.IOException: Transport endpoint is not connected
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:297)
    at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:67)
    at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:52)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1011)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:680)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
    at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:188)
    at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:152)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-09-26 18:47:52,374 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201109242150_0008_m_000076 for speculation

On the ubuntu4 machine I saw the following errors in the client logs.

[2011-09-26 18:33:44.404076] I [afr-self-heal-common.c:2012:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background data missing-entry gfid self-heal completed on /rdata/_temporary/_attempt_201109242150_0008_m_000004_0/part-00004
[2011-09-26 18:33:44.404280] W [rpc-clnt.c:1432:rpc_clnt_submit] 0-hosdu-client-1: failed to submit rpc-request (XID: 0x5021529x Program: GlusterFS 3.1, ProgVers: 310, Proc: 41) to rpc-transport (hosdu-client-1)
[2011-09-26 18:33:44.404638] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.635563
[2011-09-26 18:33:44.404727] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.643358
[2011-09-26 18:33:44.404803] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644278
[2011-09-26 18:33:44.404866] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644580
[2011-09-26 18:33:44.404931] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644867
[2011-09-26 18:33:44.404986] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.645171
[2011-09-26 18:33:44.405051] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.645451
[2011-09-26 18:33:44.405078] I [client.c:1885:client_rpc_notify] 0-hosdu-client-1: disconnected
[2011-09-26 18:33:44.405125] E [afr-common.c:3476:afr_notify] 0-hosdu-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-09-26 18:33:47.399088] I [client-handshake.c:1077:select_server_supported_programs] 0-hosdu-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-09-26 18:33:47.400153] I [client-handshake.c:917:client_setvolume_cbk] 0-hosdu-client-1: Connected to 10.1.11.30:24009, attached to remote volume '/data/brick'.

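For anyone triaging the attached client log, the excerpt above can be condensed into a count of force-unwound frames per client and operation, plus a timeline of the disconnect, offline, and reconnect messages. The following is only a rough sketch, not part of GlusterFS: the function names `split_entries` and `summarize` are hypothetical, and the regular expressions assume the log layout seen in this report.

```python
import re
from collections import Counter

# Start of a client log entry as it appears in this report, e.g.
# "[2011-09-26 18:33:44.404638] E [rpc-clnt.c:340:saved_frames_unwind] ..."
# (assumption: other GlusterFS versions may format entries differently).
ENTRY_START = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) \["
)

# "0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34))"
UNWIND = re.compile(
    r" \d+-(\S+): forced unwinding frame type\(.*?\) op\((\w+)\(\d+\)\)"
)

# Messages worth placing on a timeline (taken from the excerpt in this report).
MARKERS = ("disconnected", "All subvolumes are down", "Connected to")


def split_entries(text):
    """Split a log blob into (timestamp, level, body) tuples.

    Works whether entries are one per line or run together on one line.
    """
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((m.group(1), m.group(2), text[m.start():end].strip()))
    return entries


def summarize(text):
    """Return (unwound, timeline).

    unwound:  Counter keyed by (client, op) for force-unwound frames.
    timeline: list of (timestamp, marker) for connection state messages.
    """
    unwound = Counter()
    timeline = []
    for timestamp, _level, body in split_entries(text):
        m = UNWIND.search(body)
        if m:
            unwound[(m.group(1), m.group(2))] += 1
            continue
        for marker in MARKERS:
            if marker in body:
                timeline.append((timestamp, marker))
                break
    return unwound, timeline
```

Calling `summarize(open(path).read())` on a saved client log (where `path` is wherever the attachment was downloaded) returns the two structures, which makes it easier to see which client dropped and how many in-flight operations were lost at that moment.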
I will attach the jobtracker log and the client log from the ubuntu4 machine.
pre-release version is ambiguous and about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.