Description of problem:
While adding multiple bricks in a loop when the I/O on the mount point is in progress, the I/O exited returning EINVAL after a while.

Version-Release number of selected component (if applicable):
3.3.0qa24

How reproducible:
Consistently

Steps to Reproduce:
1. for i in `seq 1 100`; do gluster volume add-brick test2 shortwing:/falcon/addbricks/d_1_$i --mode=script; sleep 5; done
2. while true; do echo 'ssdsds' > dot; cat dot > /dev/null; done

Actual results:
Returns EINVAL on the client

Additional info:
Client log:
[2012-02-28 14:49:59.742377] W [socket.c:1521:__socket_proto_state_machine] 27-test2-client-16: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.1.1:24007)
[2012-02-28 14:49:59.742521] E [rpc-clnt.c:381:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f7f351f83a5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f7f351f791a] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f7f351f73a8]))) 27-test2-client-16: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2012-02-28 14:49:59.738688
[2012-02-28 14:49:59.742541] W [client-handshake.c:1727:client_dump_version_cbk] 27-test2-client-16: received RPC status error
[2012-02-28 14:49:59.742557] W [client.c:2011:client_rpc_notify] 27-test2-client-16: Registering a grace timer
[2012-02-28 14:49:59.742573] I [client.c:2024:client_rpc_notify] 27-test2-client-16: disconnected
[2012-02-28 14:49:59.742598] W [socket.c:1521:__socket_proto_state_machine] 27-test2-client-17: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.1.1:24007)
[2012-02-28 14:49:59.742661] E [rpc-clnt.c:381:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f7f351f83a5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f7f351f791a] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f7f351f73a8]))) 27-test2-client-17: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2012-02-28 14:49:59.738782
[2012-02-28 14:49:59.742680] W [client-handshake.c:1727:client_dump_version_cbk] 27-test2-client-17: received RPC status error
[2012-02-28 14:49:59.742695] W [client.c:2011:client_rpc_notify] 27-test2-client-17: Registering a grace timer
[2012-02-28 14:49:59.742709] I [client.c:2024:client_rpc_notify] 27-test2-client-17: disconnected
[2012-02-28 14:49:59.742728] W [socket.c:1521:__socket_proto_state_machine] 27-test2-client-18: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.1.1:24007)
[2012-02-28 14:49:59.742790] E [rpc-clnt.c:381:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f7f351f83a5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f7f351f791a] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f7f351f73a8]))) 27-test2-client-18: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2012-02-28 14:49:59.738870
[2012-02-28 14:49:59.742833] W [client-handshake.c:1727:client_dump_version_cbk] 27-test2-client-18: received RPC status error
[2012-02-28 14:49:59.742849] W [client.c:2011:client_rpc_notify] 27-test2-client-18: Registering a grace timer
[2012-02-28 14:49:59.742863] I [client.c:2024:client_rpc_notify] 27-test2-client-18: disconnected
[2012-02-28 14:49:59.742882] W [socket.c:1521:__socket_proto_state_machine] 27-test2-client-19: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.1.1:24007)
[2012-02-28 14:49:59.742941] E [rpc-clnt.c:381:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f7f351f83a5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f7f351f791a] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f7f351f73a8]))) 27-test2-client-19: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2012-02-28 14:49:59.738958
[2012-02-28 14:49:59.742960] W [client-handshake.c:1727:client_dump_version_cbk] 27-test2-client-19: received RPC status error
[2012-02-28 14:49:59.742975] W [client.c:2011:client_rpc_notify] 27-test2-client-19: Registering a grace timer
[2012-02-28 14:49:59.742989] I [client.c:2024:client_rpc_notify] 27-test2-client-19: disconnected
[2012-02-28 14:49:59.743008] W [socket.c:1521:__socket_proto_state_machine] 27-test2-client-20: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.1.1:24007)
[2012-02-28 14:49:59.743068] E [rpc-clnt.c:381:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f7f351f83a5] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f7f351f791a] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f7f351f73a8]))) 27-test2-client-20: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2012-02-28 14:49:59.739049
[2012-02-28 14:49:59.743086] W [client-handshake.c:1727:client_dump_version_cbk] 27-test2-client-20: received RPC status error
[2012-02-28 14:49:59.743101] W [client.c:2011:client_rpc_notify] 27-test2-client-20: Registering a grace timer
[2012-02-28 14:50:03.513240] I [dht-layout.c:600:dht_layout_normalize] 27-test2-dht: found anomalies in <gfid:00000000-0000-0000-0000-000000000000>. holes=1 overlaps=0
[2012-02-28 14:50:03.513266] W [fuse-resolve.c:150:fuse_resolve_gfid_cbk] 0-fuse: a0499678-3d2d-46d1-a74b-20155dc2524c: failed to resolve (Invalid argument)
[2012-02-28 14:50:03.513283] E [fuse-bridge.c:1838:fuse_open_resume] 0-glusterfs-fuse: 444410: OPEN a0499678-3d2d-46d1-a74b-20155dc2524c resolution failed
[2012-02-28 14:50:03.514596] I [dht-layout.c:600:dht_layout_normalize] 27-test2-dht: found anomalies in <gfid:00000000-0000-0000-0000-000000000000>. holes=1 overlaps=0
[2012-02-28 14:50:03.514637] W [fuse-resolve.c:150:fuse_resolve_gfid_cbk] 0-fuse: a0499678-3d2d-46d1-a74b-20155dc2524c: failed to resolve (Invalid argument)
[2012-02-28 14:50:03.514654] E [fuse-bridge.c:1838:fuse_open_resume] 0-glusterfs-fuse: 444411: OPEN a0499678-3d2d-46d1-a74b-20155dc2524c resolution failed
[2012-02-28 14:50:03.515232] I [dht-layout.c:600:dht_layout_normalize] 27-test2-dht: found anomalies in <gfid:00000000-0000-0000-0000-000000000000>. holes=1 overlaps=0
[2012-02-28 14:50:03.515275] W [fuse-resolve.c:150:fuse_resolve_gfid_cbk] 0-fuse: a0499678-3d2d-46d1-a74b-20155dc2524c: failed to resolve (Invalid argument)
[2012-02-28 14:50:03.515292] E [fuse-bridge.c:1838:fuse_open_resume] 0-glusterfs-fuse: 444412: OPEN a0499678-3d2d-46d1-a74b-20155dc2524c resolution failed
[2012-02-28 14:50:03.516627] I [dht-layout.c:600:dht_layout_normalize] 27-test2-dht: found anomalies in <gfid:00000000-0000-0000-0000-000000000000>. holes=1 overlaps=0
[2012-02-28 14:50:03.516666] W [fuse-resolve.c:150:fuse_resolve_gfid_cbk] 0-fuse: a0499678-3d2d-46d1-a74b-20155dc2524c: failed to resolve (Invalid argument)
[2012-02-28 14:50:03.516684] E [fuse-bridge.c:1838:fuse_open_resume] 0-glusterfs-fuse: 444413: OPEN a0499678-3d2d-46d1-a74b-20155dc2524c resolution failed
[2012-02-28 14:50:03.518632] I [dht-layout.c:600:dht_layout_normalize] 27-test2-dht: found anomalies in <gfid:00000000-0000-0000-0000-000000000000>. holes=1 overlaps=0
[2012-02-28 14:50:03.518665] W [fuse-resolve.c:150:fuse_resolve_gfid_cbk] 0-fuse: a0499678-3d2d-46d1-a74b-20155dc2524c: failed to resolve (Invalid argument)
[2012-02-28 14:50:03.518683] E [fuse-bridge.c:1838:fuse_open_resume] 0-glusterfs-fuse: 444414: OPEN a0499678-3d2d-46d1-a74b-20155dc2524c resolution failed
[2012-02-28 14:50:03.520110] E [dht-common.c:1341:dht_lookup] 27-test2-dht: Failed to get hashed subvol for /dot
[2012-02-28 14:50:03.520149] W [fuse-bridge.c:272:fuse_entry_cbk] 0-glusterfs-fuse: 444415: LOOKUP() /dot => -1 (Invalid argument)
[2012-02-28 14:50:03.520280] E [dht-common.c:1341:dht_lookup] 27-test2-dht: Failed to get hashed subvol for /dot
[2012-02-28 14:50:03.520300] W [fuse-bridge.c:272:fuse_entry_cbk] 0-glusterfs-fuse: 444417: LOOKUP() /dot => -1 (Invalid argument)
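For triage, the failure signature in the client log above can be condensed into a few counters. The following is only a rough sketch: the function name summarize_client_log is hypothetical, the grep patterns are taken from the messages quoted in this report, and other GlusterFS versions may word these messages differently.

```shell
#!/bin/sh
# Hypothetical helper: count the log messages that appear in this report.
# Patterns are copied from the quoted client log and may not match other
# GlusterFS releases.
summarize_client_log() {
    log_file=$1
    if [ ! -r "$log_file" ]; then
        echo "cannot read log file: $log_file" >&2
        return 1
    fi
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    printf 'disconnects=%s\n' \
        "$(grep -c 'client_rpc_notify.*: disconnected' "$log_file" || true)"
    printf 'layout_holes=%s\n' \
        "$(grep -c 'dht_layout_normalize.*holes=[1-9]' "$log_file" || true)"
    printf 'failed_hashed_lookups=%s\n' \
        "$(grep -c 'Failed to get hashed subvol' "$log_file" || true)"
    printf 'einval_replies=%s\n' \
        "$(grep -c 'Invalid argument' "$log_file" || true)"
}
```

It would be called with the path to the FUSE client log as its only argument; that path depends on the installation and is not given in this report. A burst of disconnects followed by layout holes and failed lookups would match the pattern reported here.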
This seems to happen because the brick processes run out of reserved ports. All operations fail because the connections are broken.

The work-around is to:
1. Start glusterd with --xlator-option *.rpc-auth-allow-insecure=on
2. Set the allow-insecure option for the volume
3. Mount the client with --xlator-option *.client-bind-insecure=on

Closing the bug, as this is a known limitation and a work-around is available.
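To check whether a host is actually close to exhausting reserved ports, one option is to count the TCP connections whose local port is below 1024. This is a minimal sketch that assumes "reserved ports" here means the privileged range 1-1023 and that the input has the column layout printed by `ss -tn`. The function name count_privileged_ports is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -tn` style output on stdin and print how many
# connections use a local port in the privileged range (1-1023).
# Assumption: column 4 is "local-address:port", as in `ss -tn` output.
count_privileged_ports() {
    awk '{
        # The port is the text after the last colon; this also covers
        # bracketed IPv6 addresses such as [::1]:1019.
        n = split($4, parts, ":")
        port = parts[n] + 0          # header and non-numeric fields become 0
        if (port > 0 && port < 1024) count++
    }
    END { print count + 0 }'
}
```

A typical invocation would be `ss -tn | count_privileged_ports`, run while the add-brick loop is in progress. A count that climbs towards the size of the privileged range as bricks are added would be consistent with the explanation given in this comment.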