Bug 765372 - (GLUSTER-3640) "randomwriter" job failed with "transport endpoint not connected" error with quick-slave-io ON
Status: CLOSED EOL
Product: GlusterFS
Classification: Community
Component: HDFS (Show other bugs)
Version: pre-release
Hardware: x86_64 Linux
Priority: low, Severity: medium
Assigned To: Steve Watt
Reported: 2011-09-27 02:25 EDT by M S Vishwanath Bhat
Modified: 2016-05-31 21:57 EDT (History)
Doc Type: Bug Fix
Last Closed: 2015-10-22 11:40:20 EDT


Attachments
Jobtracker logs (108.06 KB, application/octet-stream)
2011-09-26 23:32 EDT, M S Vishwanath Bhat
glusterfs client from ubuntu4 machine (75.35 KB, text/x-log)
2011-09-26 23:33 EDT, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2011-09-26 23:32:35 EDT
Created attachment 675
Comment 1 M S Vishwanath Bhat 2011-09-26 23:33:08 EDT
Created attachment 676
Comment 2 M S Vishwanath Bhat 2011-09-27 02:25:41 EDT
On a 2*3 striped-replicated gluster volume with quick-slave-io ON, the randomwriter job failed with the following backtrace.

11/09/26 18:39:18 INFO mapred.JobClient:  map 88% reduce 0%
11/09/26 18:47:52 INFO mapred.JobClient: Task Id : attempt_201109242150_0008_m_000068_1, Status : FAILED
java.io.IOException: Transport endpoint is not connected
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:297)
        at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:67)
        at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:52)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1011)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:680)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
        at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:188)
        at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:152)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

attempt_201109242150_0008_m_000068_1: Initializing GlusterFS
11/09/26 18:51:45 INFO mapred.JobClient: Task Id : attempt_201109242150_0008_m_000076_0, Status : FAILED
Task attempt_201109242150_0008_m_000076_0 failed to report status for 601 seconds. Killing!
Task attempt_201109242150_0008_m_000076_0 failed to report status for 602 seconds. Killing!


The jobtracker logs pointed to an error on the ubuntu4 machine.

2011-09-26 18:43:54,211 INFO org.apache.hadoop.mapred.JobTracker: Adding task (cleanup)'attempt_201109242150_0008_m_000079_0' to tip task_201109242150_0008_m_000079, for tracker 'tracker_ubuntu4.gluster.com:localhost/127.0.0.1:38797'
2011-09-26 18:47:49,237 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201109242150_0008_m_000068_1: java.io.IOException: Transport endpoint is not connected
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:297)
        at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:67)
        at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:52)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1011)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:680)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
        at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:188)
        at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:152)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

2011-09-26 18:47:52,374 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201109242150_0008_m_000076 for speculation


On the ubuntu4 machine I saw the following errors in the client logs.

[2011-09-26 18:33:44.404076] I [afr-self-heal-common.c:2012:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  data missing-entry gfid self-heal completed on /rdata/_temporary/_attempt_201109242150_0008_m_000004_0/part-00004
[2011-09-26 18:33:44.404280] W [rpc-clnt.c:1432:rpc_clnt_submit] 0-hosdu-client-1: failed to submit rpc-request (XID: 0x5021529x Program: GlusterFS 3.1, ProgVers: 310, Proc: 41) to rpc-transport (hosdu-client-1)
[2011-09-26 18:33:44.404638] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.635563
[2011-09-26 18:33:44.404727] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.643358
[2011-09-26 18:33:44.404803] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644278
[2011-09-26 18:33:44.404866] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644580
[2011-09-26 18:33:44.404931] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644867
[2011-09-26 18:33:44.404986] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.645171
[2011-09-26 18:33:44.405051] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.645451
[2011-09-26 18:33:44.405078] I [client.c:1885:client_rpc_notify] 0-hosdu-client-1: disconnected
[2011-09-26 18:33:44.405125] E [afr-common.c:3476:afr_notify] 0-hosdu-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-09-26 18:33:47.399088] I [client-handshake.c:1077:select_server_supported_programs] 0-hosdu-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-09-26 18:33:47.400153] I [client-handshake.c:917:client_setvolume_cbk] 0-hosdu-client-1: Connected to 10.1.11.30:24009, attached to remote volume '/data/brick'.
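
To measure how long a client was disconnected in a log like the one above, the "disconnected" and "Connected to" lines can be paired up per client. The following is a minimal sketch in Python, assuming the line layout seen in this report ([timestamp] LEVEL [file:line:function] name: message). The helper names parse_line and outage_windows are hypothetical, and the regular expression is inferred from this excerpt rather than taken from GlusterFS documentation.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the client log lines in this report:
# [timestamp] LEVEL [file.c:line:function] name: message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<rest>.*)$"
)


def parse_line(line):
    """Split one log line into (timestamp, level, source, rest), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group("level"), m.group("src"), m.group("rest")


def outage_windows(lines):
    """Return (name, down_at, up_at) for each disconnect followed by a reconnect."""
    down = {}      # client name -> time of the last unmatched disconnect
    windows = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip blank or unrecognised lines
        ts, _level, _src, rest = parsed
        name, _, msg = rest.partition(": ")
        if msg.strip() == "disconnected":
            down[name] = ts
        elif msg.startswith("Connected to ") and name in down:
            windows.append((name, down.pop(name), ts))
    return windows


# Three lines copied verbatim from the client log in this report.
SAMPLE = [
    "[2011-09-26 18:33:44.405078] I [client.c:1885:client_rpc_notify] 0-hosdu-client-1: disconnected",
    "[2011-09-26 18:33:44.405125] E [afr-common.c:3476:afr_notify] 0-hosdu-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.",
    "[2011-09-26 18:33:47.400153] I [client-handshake.c:917:client_setvolume_cbk] 0-hosdu-client-1: Connected to 10.1.11.30:24009, attached to remote volume '/data/brick'.",
]

if __name__ == "__main__":
    for name, down_at, up_at in outage_windows(SAMPLE):
        print(f"{name} down for {(up_at - down_at).total_seconds():.3f} s")
```

Running the same functions over the full attached client log would show whether further disconnects occurred closer to the 18:47 task failure.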

I will attach the jobtracker log and the client log from the ubuntu4 machine.
Comment 3 Kaleb KEITHLEY 2015-10-22 11:40:20 EDT
The pre-release version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
