Bug 765372 (GLUSTER-3640) - "randomwriter" job failed with 'transport endpoint not connected' error with quick-slave-io ON
Keywords:
Status: CLOSED EOL
Alias: GLUSTER-3640
Product: GlusterFS
Classification: Community
Component: HDFS
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Steve Watt
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-09-27 06:25 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:57 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-10-22 15:40:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Jobtracker logs (108.06 KB, application/octet-stream)
2011-09-27 03:32 UTC, M S Vishwanath Bhat
glusterfs client from ubuntu4 machine (75.35 KB, text/x-log)
2011-09-27 03:33 UTC, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2011-09-27 03:32:35 UTC
Created attachment 675

Comment 1 M S Vishwanath Bhat 2011-09-27 03:33:08 UTC
Created attachment 676

Comment 2 M S Vishwanath Bhat 2011-09-27 06:25:41 UTC
On a 2*3 striped-replicated gluster volume with quick-slave-io ON, the randomwriter job failed with the following backtrace.

11/09/26 18:39:18 INFO mapred.JobClient:  map 88% reduce 0%
11/09/26 18:47:52 INFO mapred.JobClient: Task Id : attempt_201109242150_0008_m_000068_1, Status : FAILED
java.io.IOException: Transport endpoint is not connected
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:297)
        at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:67)
        at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:52)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1011)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:680)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
        at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:188)
        at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:152)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

attempt_201109242150_0008_m_000068_1: Initializing GlusterFS
11/09/26 18:51:45 INFO mapred.JobClient: Task Id : attempt_201109242150_0008_m_000076_0, Status : FAILED
Task attempt_201109242150_0008_m_000076_0 failed to report status for 601 seconds. Killing!
Task attempt_201109242150_0008_m_000076_0 failed to report status for 602 seconds. Killing!


The jobtracker logs pointed to an error on the ubuntu4 machine.

2011-09-26 18:43:54,211 INFO org.apache.hadoop.mapred.JobTracker: Adding task (cleanup)'attempt_201109242150_0008_m_000079_0' to tip task_201109242150_0008_m_000079, for tracker 'tracker_ubuntu4.gluster.com:localhost/127.0.0.1:38797'
2011-09-26 18:47:49,237 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201109242150_0008_m_000068_1: java.io.IOException: Transport endpoint is not connected
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:297)
        at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:67)
        at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:52)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1011)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:680)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
        at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:188)
        at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:152)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

2011-09-26 18:47:52,374 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201109242150_0008_m_000076 for speculation


On the ubuntu4 machine I saw the following errors in the client logs.

[2011-09-26 18:33:44.404076] I [afr-self-heal-common.c:2012:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  data missing-entry gfid self-heal completed on /rdata/_temporary/_attempt_201109242150_0008_m_000004_0/part-00004
[2011-09-26 18:33:44.404280] W [rpc-clnt.c:1432:rpc_clnt_submit] 0-hosdu-client-1: failed to submit rpc-request (XID: 0x5021529x Program: GlusterFS 3.1, ProgVers: 310, Proc: 41) to rpc-transport (hosdu-client-1)
[2011-09-26 18:33:44.404638] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.635563
[2011-09-26 18:33:44.404727] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.643358
[2011-09-26 18:33:44.404803] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644278
[2011-09-26 18:33:44.404866] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644580
[2011-09-26 18:33:44.404931] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644867
[2011-09-26 18:33:44.404986] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.645171
[2011-09-26 18:33:44.405051] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.645451
[2011-09-26 18:33:44.405078] I [client.c:1885:client_rpc_notify] 0-hosdu-client-1: disconnected
[2011-09-26 18:33:44.405125] E [afr-common.c:3476:afr_notify] 0-hosdu-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-09-26 18:33:47.399088] I [client-handshake.c:1077:select_server_supported_programs] 0-hosdu-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-09-26 18:33:47.400153] I [client-handshake.c:917:client_setvolume_cbk] 0-hosdu-client-1: Connected to 10.1.11.30:24009, attached to remote volume '/data/brick'.
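When reading a longer client log, it can help to list how long each client translator stayed disconnected. The following is a minimal sketch in Python, assuming the log keeps the layout seen in the excerpt above (a bracketed timestamp, then a "disconnected" message from client_rpc_notify and a "Connected to" message from client_setvolume_cbk). The function name outage_windows and the regular expressions are hypothetical helpers written for this illustration and are not part of GlusterFS. The sketch only pairs disconnect and reconnect lines; it does not show that a given disconnect caused the task failure.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt: "[YYYY-MM-DD HH:MM:SS.ffffff] ..."
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")
# "<translator>: disconnected" at the end of the line
DOWN_RE = re.compile(r"\] (\S+): disconnected\s*$")
# "<translator>: Connected to <host:port>,"
UP_RE = re.compile(r"\] (\S+): Connected to (\S+),")


def outage_windows(lines):
    """Pair each 'disconnected' line with the next 'Connected to' line
    for the same client translator and return
    (client, down_time, up_time, seconds) tuples."""
    down_at = {}
    windows = []
    for line in lines:
        ts_match = TS_RE.match(line)
        if not ts_match:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        down = DOWN_RE.search(line)
        up = UP_RE.search(line)
        if down:
            down_at[down.group(1)] = ts
        elif up and up.group(1) in down_at:
            start = down_at.pop(up.group(1))
            windows.append(
                (up.group(1), start, ts, (ts - start).total_seconds())
            )
    return windows


# Two lines copied from the client log excerpt in this report
SAMPLE = [
    "[2011-09-26 18:33:44.405078] I [client.c:1885:client_rpc_notify] "
    "0-hosdu-client-1: disconnected",
    "[2011-09-26 18:33:47.400153] I "
    "[client-handshake.c:917:client_setvolume_cbk] 0-hosdu-client-1: "
    "Connected to 10.1.11.30:24009, attached to remote volume '/data/brick'.",
]

if __name__ == "__main__":
    for client, start, end, secs in outage_windows(SAMPLE):
        print(f"{client}: down {start.time()} up {end.time()} ({secs:.3f}s)")
```

To run it against a full log, pass an open file object instead of SAMPLE, for example outage_windows(open(path)) where path is the location of the client log.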

I will attach the jobtracker log and the client log from the ubuntu4 machine.

Comment 3 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The pre-release version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

