Bug 800396

Summary: [228d01916c57d5a5716e1097e39e7aa06f31f3e4]: nfs client reports IO error with transport endpoint not connected
Product: [Community] GlusterFS Reporter: Rahul C S <rahulcs>
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED EOL QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: pre-release CC: bugs, gluster-bugs, rwheeler, vagarwal, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-22 15:40:20 UTC Type: ---
Regression: --- Mount Type: nfs
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                                      Flags
glusterfs logs                                   none
directory hierarchy structure creation script    none

Description Rahul C S 2012-03-06 12:50:12 UTC
Created attachment 567940 [details]
glusterfs logs

Description of problem:
1. Created a replicate volume with replica 3 and turned the stat-prefetch/md-cache translator off.
2. On a FUSE client, ran the POSIX compliance tests and then started fileop with a force factor of 30.
3. Set geo-replication.indexing to on.
4. Mounted an NFS client and started the attached script, which creates a directory hierarchy.
5. Brought down 2 bricks, then issued the heal command.
6. Brought the bricks back up, then issued the heal full command.
7. Brought down the 3rd brick. The NFS server then started returning I/O errors with "Transport endpoint is not connected". No crashes were found.
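
For anyone retrying these steps, the following is a rough sketch of the CLI commands they appear to correspond to, generated as a dry-run list rather than executed. The volume name, brick paths, NFS server host, and mount point are hypothetical placeholders. The option names (performance.stat-prefetch, geo-replication.indexing) and the heal subcommands are assumptions based on the standard gluster CLI and may differ in the pre-release build this bug was filed against. How the bricks were stopped and restarted is not stated in the report, so those steps appear only as comments.

```python
from typing import List


def reproduction_commands(volume: str, bricks: List[str],
                          nfs_server: str, mount_point: str) -> List[str]:
    """Return shell command strings approximating the reported steps.

    Nothing is executed; the caller decides how and where to run them.
    """
    if len(bricks) != 3:
        raise ValueError("the report uses a replica 3 volume, so 3 bricks are expected")

    return [
        # Step 1: replica 3 volume with stat-prefetch/md-cache turned off.
        f"gluster volume create {volume} replica 3 " + " ".join(bricks),
        f"gluster volume set {volume} performance.stat-prefetch off",
        f"gluster volume start {volume}",
        # Step 3: enable geo-replication indexing.
        f"gluster volume set {volume} geo-replication.indexing on",
        # Step 4: NFS mount (the Gluster NFS server speaks NFSv3).
        f"mount -t nfs -o vers=3 {nfs_server}:/{volume} {mount_point}",
        # Step 5: stop two brick processes (method not stated in the report), then heal.
        "# stop two brick processes here",
        f"gluster volume heal {volume}",
        # Step 6: bring the bricks back up, then run a full heal.
        "# restart the two brick processes here",
        f"gluster volume heal {volume} full",
        # Step 7: stop the third brick and watch the NFS server log.
        "# stop the third brick process here",
    ]


if __name__ == "__main__":
    for cmd in reproduction_commands(
        "vol",
        ["host1:/export/brick1", "host2:/export/brick2", "host3:/export/brick3"],
        "host1",
        "/mnt/nfs",
    ):
        print(cmd)
```

The FUSE-side load from step 2 (POSIX compliance tests and fileop) and the attached directory creation script are left out because their exact invocations are not given here.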

I have attached the whole log directory as well as the script I ran.

Server NFS log:
[2012-03-06 17:45:25.366349] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-vol-replicate-0: path /simple/49/85 on subvolume vol-client-0 => -1 (No such file or directory)
[2012-03-06 17:45:25.823846] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 0-vol-replicate-0: background  meta-data entry self-heal completed on /simple/49
[2012-03-06 17:45:50.881161] W [socket.c:204:__socket_rwv] 0-vol-client-2: readv failed (Connection reset by peer)
[2012-03-06 17:45:50.881192] W [socket.c:1521:__socket_proto_state_machine] 0-vol-client-2: reading from socket failed. Error (Connection reset by peer), peer (127.0.1.1:24011)
[2012-03-06 17:45:51.025457] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f5fbdb2426b] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f5fbdb237e0] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f5fbdb2326e]))) 0-vol-client-2: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-03-06 17:45:50.852267 (xid=0x97569x)
[2012-03-06 17:45:51.025565] W [client3_1-fops.c:2180:client3_1_lookup_cbk] 0-vol-client-2: remote operation failed: Transport endpoint is not connected. Path: /simple/51/05
[2012-03-06 17:45:51.026190] I [socket.c:2314:socket_submit_request] 0-vol-client-2: not connected (priv->connected = 0)
[2012-03-06 17:45:51.026210] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-vol-client-2: failed to submit rpc-request (XID: 0x97570x Program: GlusterFS 3.1, ProgVers: 330, Proc: 31) to rpc-transport (vol-client-2)
[2012-03-06 17:45:51.026227] W [client3_1-fops.c:1305:client3_1_entrylk_cbk] 0-vol-client-2: remote operation failed: Transport endpoint is not connected
[2012-03-06 17:45:51.026272] W [client.c:2011:client_rpc_notify] 0-vol-client-2: Registering a grace timer
[2012-03-06 17:45:51.026287] I [client.c:2024:client_rpc_notify] 0-vol-client-2: disconnected
[2012-03-06 17:45:51.026424] E [socket.c:1724:socket_connect_finish] 0-vol-client-2: connection to 127.0.1.1:24011 failed (Connection refused)
[2012-03-06 17:45:51.027251] W [client3_1-fops.c:4789:client3_1_entrylk] 0-vol-client-2: failed to send the fop: Transport endpoint is not connected
[2012-03-06 17:45:52.070098] W [nfs3.c:1479:nfs3svc_access_cbk] 0-nfs: 7bb538f9: / => -1 (Transport endpoint is not connected)
[2012-03-06 17:45:52.106642] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 7bb538f9, ACCESS: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
[2012-03-06 17:45:52.111343] W [nfs3.c:1479:nfs3svc_access_cbk] 0-nfs: 80b538f9: / => -1 (Transport endpoint is not connected)
[2012-03-06 17:45:52.111383] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: 80b538f9, ACCESS: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected)
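
When going through the full attached log, it may help to pull out only the NFS result entries, where POSIX errno 107 is reported to the client as NFS status 5. The following is a minimal sketch, assuming each log entry sits on a single line and follows the format seen in the excerpt above. The names NfsResult and parse_nfs_result are made up for this illustration and are not part of GlusterFS.

```python
import re
from typing import NamedTuple, Optional


class NfsResult(NamedTuple):
    timestamp: str
    level: str
    xid: str
    op: str
    nfs_status: int
    nfs_text: str
    posix_errno: int
    posix_text: str


# Matches entries such as the nfs3_log_common_res lines in the excerpt:
# [timestamp] LEVEL [file:line:function] domain: XID: ..., OP: NFS: n(text), POSIX: n(text)
NFS_RESULT = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[[^\]]+\]\s+\S+:\s+"
    r"XID: (?P<xid>[0-9a-fA-F]+), (?P<op>[A-Z_]+): "
    r"NFS: (?P<nfs>\d+)\((?P<nfs_text>[^)]*)\), "
    r"POSIX: (?P<posix>\d+)\((?P<posix_text>[^)]*)\)\s*$"
)


def parse_nfs_result(line: str) -> Optional[NfsResult]:
    """Return the parsed NFS result entry, or None if the line is some other kind of entry."""
    match = NFS_RESULT.match(line.strip())
    if match is None:
        return None
    return NfsResult(
        timestamp=match.group("ts"),
        level=match.group("level"),
        xid=match.group("xid"),
        op=match.group("op"),
        nfs_status=int(match.group("nfs")),
        nfs_text=match.group("nfs_text"),
        posix_errno=int(match.group("posix")),
        posix_text=match.group("posix_text"),
    )


if __name__ == "__main__":
    sample = (
        "[2012-03-06 17:45:52.106642] W [nfs3-helpers.c:3392:nfs3_log_common_res] "
        "0-nfs-nfsv3: XID: 7bb538f9, ACCESS: NFS: 5(I/O error), "
        "POSIX: 107(Transport endpoint is not connected)"
    )
    print(parse_nfs_result(sample))
```

Filtering the parsed entries on posix_errno == 107 would list every NFS operation that failed this way, along with its XID and timestamp.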

Comment 1 Rahul C S 2012-03-06 12:51:54 UTC
Created attachment 567941 [details]
directory hierarchy structure creation script

Comment 2 Rajesh 2012-03-15 06:20:42 UTC
This looks like an AFR issue; reassigning to Pranith.

Comment 4 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The pre-release version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.