Bug 809409 - [0e4c74861f762d4af7b7d8ffce5384920a6aa335] I/O exits with ENOTCONN when replace-brick is started
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: transport
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-04-03 09:53 UTC by Anush Shetty
Modified: 2012-04-04 16:23 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-04 16:23:45 UTC
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Anush Shetty 2012-04-03 09:53:21 UTC
Description of problem: When replace-brick is started while I/O is in progress on the fuse mount point, the I/O exits with ENOTCONN.


Version-Release number of selected component (if applicable): Upstream


How reproducible: Consistently


Steps to Reproduce:
1. while true; do dbench -s 10 -t 10 -D /mnt/gluster/; done
2. gluster volume replace-brick test2 shortwing:/falcon/d1 shortwing:/falcon/d2 start
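
The two steps above can be driven from a single script. The following is a minimal sketch, not a tested harness: the volume, brick, and mount names are the ones from this report, while the function names, the single dbench pass (the report loops it), and the `warmup_seconds` delay are assumptions made for illustration.

```python
import subprocess
import time

# Values taken from the report; adjust them for your own setup.
VOLUME = "test2"
OLD_BRICK = "shortwing:/falcon/d1"
NEW_BRICK = "shortwing:/falcon/d2"
MOUNT = "/mnt/gluster/"


def dbench_cmd(mount=MOUNT):
    """Step 1: one dbench pass, with the arguments given in the report."""
    return ["dbench", "-s", "10", "-t", "10", "-D", mount]


def replace_brick_cmd(volume=VOLUME, old=OLD_BRICK, new=NEW_BRICK):
    """Step 2: the replace-brick start command given in the report."""
    return ["gluster", "volume", "replace-brick", volume, old, new, "start"]


def reproduce(warmup_seconds=5):
    """Start dbench, start replace-brick while it runs, return dbench's exit status.

    warmup_seconds is a hypothetical delay so that I/O is in flight
    before the brick is replaced; it is not part of the report.
    """
    load = subprocess.Popen(dbench_cmd())
    time.sleep(warmup_seconds)
    subprocess.run(replace_brick_cmd(), check=True)
    return load.wait()
```

Calling `reproduce()` on a machine with the volume mounted should return a non-zero status if dbench fails as described under "Actual results".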
Actual results:
Running for 10 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 2 secs
1 of 10 processes prepared for launch   0 sec
[3] open ./clients/client1 failed for handle 16385 (Transport endpoint is not connected)
10 of 10 processes prepared for launch   0 sec
releasing clients
[3] open ./clients/client4 failed for handle 16385 (Transport endpoint is not connected)
(4) ERROR: handle 16385 was not found
Child failed with status 1
dbench version 4.00 - Copyright Andrew Tridgell 1999-2004



Expected results:

I/O should continue without exiting.

Additional info:

Client log:
[2012-04-03 15:10:10.165967] D [socket.c:193:__socket_rwv] 0-test2-client-0: EOF from peer 127.0.1.1:24009
[2012-04-03 15:10:10.166048] W [socket.c:1521:__socket_proto_state_machine] 0-test2-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.1.1:24009)
[2012-04-03 15:10:10.166077] D [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-04-03 15:10:10.166424] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7fd6b866949f] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7fd6b8668a14] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7fd6b86684a2]))) 0-test2-client-0: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2012-04-03 15:10:09.729963 (xid=0x1643x)
[2012-04-03 15:10:10.166462] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.166536] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7fd6b866949f] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7fd6b8668a14] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7fd6b86684a2]))) 0-test2-client-0: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2012-04-03 15:10:09.730065 (xid=0x1644x)
[2012-04-03 15:10:10.166583] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.166638] D [name.c:158:client_fill_address_family] 0-test2-client-0: address-family not specified, guessing it to be inet/inet6
[2012-04-03 15:10:10.166972] D [common-utils.c:161:gf_resolve_ip6] 0-resolver: returning ip-127.0.1.1 (port-24007) for hostname: shortwing and port: 24007
[2012-04-03 15:10:10.167045] I [socket.c:2314:socket_submit_request] 0-test2-client-0: not connected (priv->connected = 0)
[2012-04-03 15:10:10.167067] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1673x Program: GlusterFS 3.1, ProgVers: 330, Proc: 15) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167093] W [client3_1-fops.c:882:client3_1_flush_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167115] D [client.c:243:client_submit_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167146] W [fuse-bridge.c:949:fuse_err_cbk] 0-glusterfs-fuse: 2800: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2012-04-03 15:10:10.167247] E [rpc-clnt.c:382:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7fd6b866949f] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7fd6b8668a14] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7fd6b86684a2]))) 0-test2-client-0: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2012-04-03 15:10:09.746359 (xid=0x1651x)
[2012-04-03 15:10:10.167275] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167290] D [client3_1-fops.c:2767:client_fdctx_destroy] 0-test2-client-0: sending release on fd
[2012-04-03 15:10:10.167318] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1674x Program: GlusterFS 3.1, ProgVers: 330, Proc: 13) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167349] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167358] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1675x Program: GlusterFS 3.1, ProgVers: 330, Proc: 41) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167399] D [client.c:243:client_submit_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167378] D [client3_1-fops.c:104:client_submit_vec_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167446] W [client3_1-fops.c:4000:client3_1_writev] 0-test2-client-0: failed to send the fop
[2012-04-03 15:10:10.167478] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1676x Program: GlusterFS 3.1, ProgVers: 330, Proc: 13) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167502] W [client3_1-fops.c:822:client3_1_writev_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167537] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1677x Program: GlusterFS 3.1, ProgVers: 330, Proc: 15) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167562] W [client3_1-fops.c:882:client3_1_flush_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 15:10:10.167583] D [client.c:243:client_submit_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167594] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1678x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167609] W [fuse-bridge.c:949:fuse_err_cbk] 0-glusterfs-fuse: 2903: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2012-04-03 15:10:10.167628] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected. Path: /clients/client1
[2012-04-03 15:10:10.167657] D [client3_1-fops.c:104:client_submit_vec_request] 0-test2-client-0: rpc_clnt_submit failed
[2012-04-03 15:10:10.167735] W [client3_1-fops.c:4000:client3_1_writev] 0-test2-client-0: failed to send the fop
[2012-04-03 15:10:10.167753] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-test2-client-0: failed to submit rpc-request (XID: 0x1679x Program: GlusterFS 3.1, ProgVers: 330, Proc: 27) to rpc-transport (test2-client-0)
[2012-04-03 15:10:10.167800] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 0-test2-client-0: remote operation failed: Transport endpoint is not connected. Path: /clients/client1
[2012-04-03 15:10:10.167837] W [fuse-bridge.c:272:fuse_entry_cbk] 0-glusterfs-fuse: 2942: LOOKUP() /clients/client1 => -1 (Transport endpoint is not connected)
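
To see which code paths report the error most often, the log above can be tallied by reporting function. This is a rough sketch, assuming one message per line in the `[timestamp] LEVEL [file.c:line:function]` layout shown above; the names `LOG_PREFIX`, `ENOTCONN_TEXT`, and `tally_enotconn` are made up for this example and are not part of GlusterFS.

```python
import re
from collections import Counter

# Matches the prefix of the log lines in this report:
# [timestamp] LEVEL [file.c:line:function] ...
LOG_PREFIX = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]"
)

# The strerror text for ENOTCONN as it appears in the log.
ENOTCONN_TEXT = "Transport endpoint is not connected"


def tally_enotconn(lines):
    """Count ENOTCONN messages per reporting function.

    Lines that do not match the expected prefix are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_PREFIX.match(line)
        if match and ENOTCONN_TEXT in line:
            counts[match.group("func")] += 1
    return counts
```

For example, `tally_enotconn(open(path))`, where `path` is the client log file, returns a `Counter` whose `most_common()` lists the functions that logged the error, most frequent first.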

Comment 1 Raghavendra G 2012-04-04 16:23:45 UTC
This is a known issue. At present, a graph switch cannot be done seamlessly, at least in cases like replace-brick where no translator such as replicate provides high availability on the client side. The code that cleans up sockets cannot guarantee what the graph switch itself cannot do seamlessly. Closing this bug for now; it can be reopened when a requirement for such functionality arises.

