Bug 765372 (GLUSTER-3640)
| Summary: | "randomwriter" job failed with "transport endpoint not connected" error in quick-slave-io ON |
|---|---|
| Product: | [Community] GlusterFS |
| Component: | HDFS |
| Reporter: | M S Vishwanath Bhat <vbhat> |
| Assignee: | Steve Watt <swatt> |
| Status: | CLOSED EOL |
| Severity: | medium |
| Priority: | low |
| Version: | pre-release |
| CC: | bugs, gluster-bugs, mzywusko |
| Hardware: | x86_64 |
| OS: | Linux |
| Doc Type: | Bug Fix |
| Last Closed: | 2015-10-22 15:40:20 UTC |

Description
M S Vishwanath Bhat
2011-09-27 03:32:35 UTC
Created attachment 676
In a 2*3 striped-replicated gluster volume with quick-slave-io ON, the randomwriter job failed with the following backtrace.
11/09/26 18:39:18 INFO mapred.JobClient: map 88% reduce 0%
11/09/26 18:47:52 INFO mapred.JobClient: Task Id : attempt_201109242150_0008_m_000068_1, Status : FAILED
java.io.IOException: Transport endpoint is not connected
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:297)
at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:67)
at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:52)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1011)
at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75)
at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:680)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:188)
at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:152)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
attempt_201109242150_0008_m_000068_1: Initializing GlusterFS
11/09/26 18:51:45 INFO mapred.JobClient: Task Id : attempt_201109242150_0008_m_000076_0, Status : FAILED
Task attempt_201109242150_0008_m_000076_0 failed to report status for 601 seconds. Killing!
Task attempt_201109242150_0008_m_000076_0 failed to report status for 602 seconds. Killing!
The jobtracker logs pointed to an error on the ubuntu4 machine.
2011-09-26 18:43:54,211 INFO org.apache.hadoop.mapred.JobTracker: Adding task (cleanup)'attempt_201109242150_0008_m_000079_0' to tip task_201109242150_0008_m_000079, for tracker 'tracker_ubuntu4.gluster.com:localhost/127.0.0.1:38797'
2011-09-26 18:47:49,237 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201109242150_0008_m_000068_1: java.io.IOException: Transport endpoint is not connected
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:297)
at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:67)
at org.apache.hadoop.fs.glusterfs.GlusterFUSEOutputStream.write(GlusterFUSEOutputStream.java:52)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1011)
at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:75)
at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:680)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:188)
at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:152)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-09-26 18:47:52,374 INFO org.apache.hadoop.mapred.JobInProgress: Choosing a non-local task task_201109242150_0008_m_000076 for speculation
On the ubuntu4 machine I saw the following errors in the client logs.
[2011-09-26 18:33:44.404076] I [afr-self-heal-common.c:2012:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background data missing-entry gfid self-heal completed on /rdata/_temporary/_attempt_201109242150_0008_m_000004_0/part-00004
[2011-09-26 18:33:44.404280] W [rpc-clnt.c:1432:rpc_clnt_submit] 0-hosdu-client-1: failed to submit rpc-request (XID: 0x5021529x Program: GlusterFS 3.1, ProgVers: 310, Proc: 41) to rpc-transport (hosdu-client-1)
[2011-09-26 18:33:44.404638] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.635563
[2011-09-26 18:33:44.404727] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.643358
[2011-09-26 18:33:44.404803] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644278
[2011-09-26 18:33:44.404866] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644580
[2011-09-26 18:33:44.404931] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.644867
[2011-09-26 18:33:44.404986] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.645171
[2011-09-26 18:33:44.405051] E [rpc-clnt.c:340:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f24e229b7a8] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7f24e229afad] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f24e229af0e]))) 0-hosdu-client-1: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2011-09-26 18:33:43.645451
[2011-09-26 18:33:44.405078] I [client.c:1885:client_rpc_notify] 0-hosdu-client-1: disconnected
[2011-09-26 18:33:44.405125] E [afr-common.c:3476:afr_notify] 0-hosdu-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-09-26 18:33:47.399088] I [client-handshake.c:1077:select_server_supported_programs] 0-hosdu-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-09-26 18:33:47.400153] I [client-handshake.c:917:client_setvolume_cbk] 0-hosdu-client-1: Connected to 10.1.11.30:24009, attached to remote volume '/data/brick'.
I will attach the jobtracker log and the client log from the ubuntu4 machine.
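The client log excerpt above shows hosdu-client-1 being reported as disconnected and then connected again a few seconds later. As a rough aid for triage, the following is a minimal sketch of how such disconnect/reconnect windows could be pulled out of a client log. It assumes only the line layout visible in the excerpt (`[timestamp] LEVEL [file:line:function] name: message`); the function name `reconnect_gaps` and the pattern `LINE_RE` are hypothetical helpers, not part of GlusterFS.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# [2011-09-26 18:33:44.405078] I [client.c:1885:client_rpc_notify] 0-hosdu-client-1: disconnected
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) \[(?P<src>[^\]]+)\] "
    r"(?P<name>\S+): (?P<msg>.*)$"
)


def reconnect_gaps(lines):
    """Return (name, seconds) pairs for each disconnect followed by a reconnect.

    Lines that do not fit the assumed layout (for example the
    saved_frames_unwind entries, which carry a backtrace before the
    name) are skipped.
    """
    down_since = {}  # name -> timestamp of the first unmatched "disconnected"
    gaps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
        name, msg = match["name"], match["msg"]
        if msg == "disconnected":
            down_since.setdefault(name, ts)
        elif msg.startswith("Connected to ") and name in down_since:
            gaps.append((name, (ts - down_since.pop(name)).total_seconds()))
    return gaps


# Three lines copied from the excerpt above.
SAMPLE = [
    "[2011-09-26 18:33:44.405078] I [client.c:1885:client_rpc_notify] "
    "0-hosdu-client-1: disconnected",
    "[2011-09-26 18:33:47.399088] I "
    "[client-handshake.c:1077:select_server_supported_programs] "
    "0-hosdu-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)",
    "[2011-09-26 18:33:47.400153] I [client-handshake.c:917:client_setvolume_cbk] "
    "0-hosdu-client-1: Connected to 10.1.11.30:24009, attached to remote volume "
    "'/data/brick'.",
]

if __name__ == "__main__":
    for name, seconds in reconnect_gaps(SAMPLE):
        print(f"{name} was down for about {seconds:.3f} s")
```

For the three sample lines, the gap comes out at roughly three seconds. Note that the Java exception in the job output was logged at 18:47, well after this window, so the sketch only helps locate disconnects in a log; it does not by itself tie this particular one to the failed task.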
The pre-release version is ambiguous and about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.