Bug 786094

Summary: fuse client inaccessible with transport endpoint not connected error
Product: [Community] GlusterFS
Reporter: M S Vishwanath Bhat <vbhat>
Component: unclassified
Assignee: shishir gowda <sgowda>
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Version: pre-release
CC: gluster-bugs, mzywusko, nsathyan, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-04-27 05:37:24 UTC
Attachments:
fuse client log (flags: none)

Description M S Vishwanath Bhat 2012-01-31 12:51:30 UTC
Created attachment 558618
fuse client log

Description of problem:
I was building glusterfs on the mountpoint while running some profile/top operations on the server. It was a stripe-replicate volume with stripe-block-size set to 64MB. After make exited successfully with a zero exit status, I took down one brick of a replicate pair, and the mountpoint then became inaccessible.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa20

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a stripe replicate volume.
2. Set the stripe-block-size to 64MB and enable profiling.
3. Untar both the linux kernel source and the glusterfs source, and start building the glusterfs source.
4. Meanwhile, keep running some profile and top operations.
5. After 'make' finishes, take one of the glusterfsd processes down.
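
For anyone retrying this, the volume setup in steps 1, 2 and 4 can be sketched as a list of gluster CLI calls. This is only a sketch: the helper name repro_commands is hypothetical, the commands are built and printed rather than executed, and the exact CLI syntax should be checked against the installed build. The untar/make workload and the glusterfsd kill are left out because they depend on the local setup.

```python
# Hypothetical helper: builds (does not run) the gluster CLI calls for
# steps 1, 2 and 4. Volume and brick names are taken from the report.
def repro_commands(volume, bricks, stripe=2, replica=2, block_size="64MB"):
    if len(bricks) != stripe * replica:
        raise ValueError("expected stripe * replica bricks")
    g = ["gluster", "volume"]
    return [
        # Step 1: create and start a stripe-replicate volume.
        g + ["create", volume, "stripe", str(stripe),
             "replica", str(replica), "transport", "tcp"] + list(bricks),
        g + ["start", volume],
        # Step 2: set the stripe block size and enable profiling.
        g + ["set", volume, "cluster.stripe-block-size", block_size],
        g + ["profile", volume, "start"],
        # Step 4: profile/top operations to repeat while 'make' runs.
        g + ["profile", volume, "info"],
        g + ["top", volume, "open"],
    ]


if __name__ == "__main__":
    bricks = [
        "10.1.11.113:/data/brick/hosdu_brick1",
        "10.1.11.114:/data/brick/hosdu_brick2",
        "10.1.11.136:/data/brick/hosdu_brick3",
        "10.1.11.137:/data/brick/hosdu_brick4",
    ]
    for cmd in repro_commands("hosdu", bricks):
        print(" ".join(cmd))
```

The printed lines can be reviewed and pasted into a shell on one of the servers; running them directly from Python (for example via subprocess) is deliberately not done here.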

Actual results:
The mountpoint became inaccessible.

[root@RHEL6 hosa_dir]# ls
ls: reading directory .: Transport endpoint is not connected
[root@RHEL6 hosa_dir]# 



Expected results:
Mountpoint should be accessible.

Additional info:

The following options were set on the volume.

Volume Name: hosdu
Type: Striped-Replicate
Volume ID: 56528124-1918-4923-a1cd-c02ddf22e671
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.11.113:/data/brick/hosdu_brick1
Brick2: 10.1.11.114:/data/brick/hosdu_brick2
Brick3: 10.1.11.136:/data/brick/hosdu_brick3
Brick4: 10.1.11.137:/data/brick/hosdu_brick4
Options Reconfigured:
cluster.stripe-block-size: 64MB
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on


Entries from the client log.


[2012-01-31 07:21:56.632023] W [client3_1-fops.c:1273:client3_1_finodelk_cbk] 0-hosdu-client-1: remote operation failed: Invalid argument
[2012-01-31 07:21:56.632077] E [afr-lk-common.c:567:afr_unlock_inodelk_cbk] 0-hosdu-replicate-0: /hosa_dir/glusterfs-3.3.0qa20/rpc/rpc-lib/src/rpcsvc.loT: unlock failed on 1, reason: Invalid argument
[2012-01-31 07:28:07.133314] W [socket.c:1510:__socket_proto_state_machine] 0-hosdu-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.11.113:24009)
[2012-01-31 07:28:07.133443] I [client.c:1885:client_rpc_notify] 0-hosdu-client-0: disconnected
[2012-01-31 07:28:17.351524] E [socket.c:1713:socket_connect_finish] 0-hosdu-client-0: connection to 10.1.11.113:24009 failed (Connection refused)
[2012-01-31 07:28:19.221959] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100583: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:28:28.205013] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100597: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:28:38.486903] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100615: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:28:39.418324] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100619: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:28:42.739546] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100623: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:29:12.037329] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1100627: READDIR => -1 (Transport endpoint is not connected)
[2012-01-31 07:34:52.870849] W [fuse-bridge.c:2352:fuse_readdir_cbk] 0-glusterfs-fuse: 1101063: READDIR => -1 (Transport endpoint is not connected)


I have attached the client log.
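
To pull the same picture out of the full attached log, a small parser along these lines may help. It is a sketch that assumes every entry follows the layout of the lines quoted above (timestamp, level letter, [file:line:function], translator name, message); the names LOG_RE, parse_line and summarize are made up for illustration. The leading "[" is treated as optional in case it gets lost when lines are pasted.

```python
import re
from collections import Counter

# Layout assumed from the entries quoted in this report:
# [timestamp] LEVEL [file:line:function] translator-name: message
LOG_RE = re.compile(
    r"^\[?(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)$"
)

ENOTCONN = "Transport endpoint is not connected"


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def summarize(lines):
    """Find the first client disconnect and count ENOTCONN messages per function."""
    entries = [e for e in map(parse_line, lines) if e]
    first_disconnect = next(
        (e for e in entries
         if e["func"] == "client_rpc_notify" and "disconnected" in e["msg"]),
        None,
    )
    enotconn = Counter(e["func"] for e in entries if ENOTCONN in e["msg"])
    return first_disconnect, enotconn
```

Called as summarize(open(path)) on the client log, this returns the first disconnect entry and a per-function count of "Transport endpoint is not connected" messages, which makes it easy to see whether the READDIR failures start right after the brick went down.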

Comment 1 M S Vishwanath Bhat 2012-04-12 11:15:58 UTC
I got a core dump this time around with glusterfs-3.3.0qa34.


(gdb) bt
#0  0x00007ffda03c4da1 in stripe_readv_cbk (frame=0x7ffda3c190f4, cookie=<value optimized out>, this=<value optimized out>, op_ret=8070, op_errno=<value optimized out>, vector=<value optimized out>, count=1, stbuf=0x7fff71f33e10, 
    iobref=0x5544190, xdata=0x0) at stripe.c:3271
#1  0x00007ffda05e64f1 in afr_readv_cbk (frame=0x7ffda3db7368, cookie=<value optimized out>, this=<value optimized out>, op_ret=8070, op_errno=2, vector=0x7fff71f33c80, count=1, buf=0x7fff71f33e10, iobref=0x5544190, xdata=0x0)
    at afr-inode-read.c:1298
#2  0x00007ffda085e3fb in client3_1_readv_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7ffda3da3e58) at client3_1-fops.c:2679
#3  0x00007ffda4d2e515 in rpc_clnt_handle_reply (clnt=0x25fda80, pollin=0x590e6b0) at rpc-clnt.c:797
#4  0x00007ffda4d2ed10 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x25fdab0, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:916
#5  0x00007ffda4d29e48 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:498
#6  0x00007ffda1693704 in socket_event_poll_in (this=0x260d4e0) at socket.c:1686
#7  0x00007ffda16937e7 in socket_event_handler (fd=<value optimized out>, idx=1, data=0x260d4e0, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1801
#8  0x00007ffda4f75884 in event_dispatch_epoll_handler (event_pool=0x2538db0) at event.c:794
#9  event_dispatch_epoll (event_pool=0x2538db0) at event.c:856
#10 0x0000000000406eda in main (argc=<value optimized out>, argv=0x7fff71f34188) at glusterfsd.c:1650


(gdb) p ((stripe_local_t *)(((stripe_local_t *)(frame->local))->orig_frame->local))->fctx->xl_array[0]
$9 = (xlator_t *) 0x25638e0
(gdb) p ((stripe_local_t *)(((stripe_local_t *)(frame->local))->orig_frame->local))->fctx->xl_array[1]
$10 = (xlator_t *) 0x0
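
The two gdb prints show slot 0 of xl_array populated and slot 1 NULL. Why a NULL slot would matter for reads can be illustrated with a toy model. This is not the stripe.c code and makes no claim about the actual root cause: it assumes a plain round-robin mapping of stripe blocks to children, and the function names are hypothetical.

```python
# Toy model only: assumes stripe blocks are assigned to children round-robin.
# It does not reproduce the logic in stripe.c.
def stripe_child_index(offset, stripe_size, stripe_count):
    """Index of the child that would hold the stripe block containing offset."""
    return (offset // stripe_size) % stripe_count


def unsafe_offsets(xl_array, stripe_size, offsets):
    """Offsets whose stripe block maps to an empty (None) child slot."""
    count = len(xl_array)
    return [
        off for off in offsets
        if xl_array[stripe_child_index(off, stripe_size, count)] is None
    ]


STRIPE_SIZE = 64 * 1024 * 1024   # cluster.stripe-block-size: 64MB
xl_array = ["0x25638e0", None]   # slot values as printed by gdb above

if __name__ == "__main__":
    probes = [0, STRIPE_SIZE, 2 * STRIPE_SIZE, 3 * STRIPE_SIZE]
    print(unsafe_offsets(xl_array, STRIPE_SIZE, probes))
```

Under that assumption, every read that lands in an odd-numbered 64MB block would go through the empty slot, so any code path that uses xl_array[1] without a NULL check would fault.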

Comment 2 Vijay Bellur 2012-04-18 07:14:37 UTC
Shishir,

Can you please take a look?

Comment 3 shishir gowda 2012-04-27 05:37:24 UTC

*** This bug has been marked as a duplicate of bug 810450 ***