Bug 809409

Summary: [0e4c74861f762d4af7b7d8ffce5384920a6aa335] I/O exits with ENOTCONN when replace-brick is started
Product: [Community] GlusterFS Reporter: Anush Shetty <ashetty>
Component: transport Assignee: Raghavendra G <rgowdapp>
Status: CLOSED WONTFIX QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-04-04 16:23:45 UTC Type: Bug
Regression: --- Mount Type: fuse
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Anush Shetty 2012-04-03 09:53:21 UTC
Description of problem: When replace-brick is started while I/O is running on the FUSE mount point, the I/O exits with ENOTCONN.


Version-Release number of selected component (if applicable): Upstream


How reproducible: Consistently


Steps to Reproduce:
1. while true; do dbench -s 10 -t 10 -D /mnt/gluster/; done
2. gluster volume replace-brick test2 shortwing:/falcon/d1 shortwing:/falcon/d2 start
  
Actual results:
Running for 10 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 2 secs
1 of 10 processes prepared for launch   0 sec
[3] open ./clients/client1 failed for handle 16385 (Transport endpoint is not connected)
10 of 10 processes prepared for launch   0 sec
releasing clients
[3] open ./clients/client4 failed for handle 16385 (Transport endpoint is not connected)
(4) ERROR: handle 16385 was not found
Child failed with status 1
dbench version 4.00 - Copyright Andrew Tridgell 1999-2004



Expected results:

I/O should continue without exiting.

Additional info:

Client log:
[2012-04-03 15:10:10.165967] D [socket.c:193:__socket_rwv] 0-test2-client-0: EOF from peer 127.0.1.1:24009
[2012-04-03 15:10:10.166048] W [socket.c:1521:__socket_proto_state_machine] 0-test2-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.1.1:24009)
[2012-04-03 15:10:10.166077] D [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-04-03 15:10:10.166424] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7fd6b866949f] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7fd6b8668a14] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7fd6b86684a2]))) 0-test2-client-0: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2012-04-03 15:10:09.729963 (xid=0x1643x)
[2012-04-03 15:10:10.166462] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.166536] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7fd6b866949f] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7fd6b8668a14] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7fd6b86684a2]))) 0-test2-client-0: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2012-04-03 15:10:09.730065 (xid=0x1644x)
[2012-04-03 15:10:10.166583] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.166638] D [name.c:158:client_fill_address_family] 0-test2-client-0: address-family not specified, guessing it to be inet/inet6
[2012-04-03 15:10:10.166972] D [common-utils.c:161:gf_resolve_ip6] 0-resolver: returning ip-127.0.1.1 (port-24007) for hostname: shortwing and port: 24007
[2012-04-03 15:10:10.167045] I [socket.c:2314:socket_submit_request] 0-test2-client-0: not connected (priv->connected = 0)
[2012-04-03 15:10:10.167067] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1673x Program: GlusterFS 3.1, ProgVers: 330, Proc: 15) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167093] W [client3_1-fops.c:882:client3_1_flush_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167115] D [client.c:243:client_submit_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167146] W [fuse-bridge.c:949:fuse_err_cbk] 0-glusterfs-fuse: 2800: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2012-04-03 15:10:10.167247] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7fd6b866949f] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7fd6b8668a14] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7fd6b86684a2]))) 0-test2-client-0: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2012-04-03 15:10:09.746359 (xid=0x1651x)
[2012-04-03 15:10:10.167275] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167290] D [client3_1-fops.c:2767:client_fdctx_destroy] 0-test2-client-0: sending release on fd
[2012-04-03 15:10:10.167318] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1674x Program: GlusterFS 3.1, ProgVers: 330, Proc: 13) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167349] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167358] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1675x Program: GlusterFS 3.1, ProgVers: 330, Proc: 41) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167399] D [client.c:243:client_submit_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167378] D [client3_1-fops.c:104:client_submit_vec_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167446] W [client3_1-fops.c:4000:client3_1_writev] 0-test2-client-0: failed to send the fop
[2012-04-03 15:10:10.167478] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1676x Program: GlusterFS 3.1, ProgVers: 330, Proc: 13) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167502] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167537] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1677x Program: GlusterFS 3.1, ProgVers: 330, Proc: 15) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167562] W [client3_1-fops.c:882:client3_1_flush_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167583] D [client.c:243:client_submit_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167594] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1678x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167609] W [fuse-bridge.c:949:fuse_err_cbk] 0-glusterfs-fuse: 2903: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2012-04-03 15:10:10.167628] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected. Path: /clients/client1
[2012-04-03 15:10:10.167657] D [client3_1-fops.c:104:client_submit_vec_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167735] W [client3_1-fops.c:4000:client3_1_writev] 0-test2-client-0: failed to send the fop
[2012-04-03 15:10:10.167753] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1679x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167800] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected. Path: /clients/client1
[2012-04-03 15:10:10.167837] W [fuse-bridge.c:272:fuse_entry_cbk] 0-glusterfs-fuse: 2942: LOOKUP() /clients/client1 => -1 (Transport endpoint is not connected)
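
The log shows two failure paths: requests already in flight when the connection drops are force-unwound with ENOTCONN (`saved_frames_unwind`), and requests issued while `priv->connected = 0` fail at submission (`rpc_clnt_submit`). The following is a minimal, hypothetical C model of those two paths, for illustration only; `toy_rpc_clnt`, `toy_submit`, `toy_disconnect` and `toy_count_enotconn` are invented names and are not the GlusterFS rpc-clnt API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of an RPC client that keeps "saved frames"
 * for in-flight requests. This is NOT the GlusterFS rpc-clnt implementation. */
#define TOY_MAX_SAVED 16

typedef void (*toy_reply_cbk_t)(unsigned xid, int op_errno, void *cookie);

struct toy_saved_frame {
    unsigned xid;          /* request id, like the xid= field in the log */
    toy_reply_cbk_t cbk;   /* completion callback for the request */
    void *cookie;          /* caller context passed back to cbk */
};

struct toy_rpc_clnt {
    int connected;         /* plays the role of priv->connected in the log */
    unsigned next_xid;
    size_t nsaved;
    struct toy_saved_frame saved[TOY_MAX_SAVED];
};

/* Queue a request. While disconnected it fails immediately with -ENOTCONN,
 * like the "failed to submit rpc-request" entries. */
int toy_submit(struct toy_rpc_clnt *c, toy_reply_cbk_t cbk, void *cookie)
{
    assert(c != NULL && cbk != NULL);
    if (!c->connected)
        return -ENOTCONN;
    if (c->nsaved == TOY_MAX_SAVED)
        return -EAGAIN;
    struct toy_saved_frame *f = &c->saved[c->nsaved++];
    f->xid = c->next_xid++;
    f->cbk = cbk;
    f->cookie = cookie;
    return 0;
}

/* Transport dropped: mark the client disconnected, then complete every saved
 * frame with ENOTCONN (the "forced unwinding frame" entries). Returns the
 * number of frames unwound. Callbacks that resubmit get -ENOTCONN. */
size_t toy_disconnect(struct toy_rpc_clnt *c)
{
    assert(c != NULL);
    size_t n = c->nsaved;
    c->connected = 0;
    c->nsaved = 0;
    for (size_t i = 0; i < n; i++)
        c->saved[i].cbk(c->saved[i].xid, ENOTCONN, c->saved[i].cookie);
    return n;
}

/* Example callback: counts requests that completed with ENOTCONN. */
void toy_count_enotconn(unsigned xid, int op_errno, void *cookie)
{
    (void)xid;
    if (op_errno == ENOTCONN)
        ++*(int *)cookie;
}
```

In this model an operation fails whether its request was in flight at disconnect time or was issued afterwards, which is consistent with the mix of `writev_cbk`, `flush_cbk` and `lookup_cbk` errors above.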

Comment 1 Raghavendra G 2012-04-04 16:23:45 UTC
This is a known issue. As of now, a graph switch cannot be done seamlessly, at least in cases like replace-brick where no translator such as replicate is present to provide high availability on the client side. Whatever the graph switch itself cannot handle seamlessly, the socket-cleanup code cannot guarantee either. Hence closing this bug for now; it can be reopened when a requirement for such functionality arises.
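
The reasoning in the comment, that a brick disconnect is hidden from the application only when a client-side translator such as replicate has another copy to fall back on, can be illustrated with a small hypothetical model. It assumes a simplified rule in which a replicated operation succeeds if any child subvolume succeeded; the real replicate translator does considerably more (for example, recording pending operations for later self-heal), and `toy_replicate_result` is an invented name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of combining per-child results in a client-side
 * replication layer. NOT the actual replicate translator logic.
 * child_errno[i] is 0 on success or a positive errno value. */
int toy_replicate_result(const int *child_errno, size_t nchildren)
{
    assert(child_errno != NULL || nchildren == 0);
    int last_err = ENOTCONN;   /* assumed result when there are no children */
    for (size_t i = 0; i < nchildren; i++) {
        if (child_errno[i] == 0)
            return 0;          /* one healthy copy is enough in this model */
        last_err = child_errno[i];
    }
    return last_err;           /* every child failed: surface the error */
}
```

With only one child holding the data (no replicate translator, the situation the comment describes), the brick's ENOTCONN is returned straight to the application; with a second, connected replica the same failure is masked.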