Created attachment 680139 nfs server log file

Description of problem:
Accessing the fuse mountpoint fails with ENOENT, and the nfs mountpoint hangs. The volume was a 2*2 striped-replicated volume. fs-perf-test was running on the fuse mount and fileop was running on the nfs mount simultaneously.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0qa5

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a 2*2 striped-replicated volume.
2. Do a fuse mount and run fs-perf-test from it.
3. While fs-perf-test is running, take down one sub-volume of the replicate translator.
4. Start fileop from the nfs mount (fileop -f 50).
5. After some time, bring the glusterfsd back up.

Actual results:
The nfs mountpoint became inaccessible; any attempt to access the nfs mount hangs forever. Accessing the fuse mount fails with ENOENT.

Expected results:
The nfs mount should not hang and the fuse mountpoint should not fail.

Additional info:
In the nfs server log, I see a lot of these errors:

[2013-01-17 09:35:39.976760] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:36:57.408600] W [socket.c:501:__socket_rwv] 0-hosdu-client-2: readv failed (Connection reset by peer)
[2013-01-17 09:36:57.408616] W [socket.c:1932:__socket_proto_state_machine] 0-hosdu-client-2: reading from socket failed. Error (Connection reset by peer), peer (10.16.159.188:49152)
[2013-01-17 09:36:57.408673] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7fd0d893c7f8] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fd0d893c5a3] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fd0d893bcbe]))) 0-hosdu-client-2: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2013-01-17 09:33:39.453902 (xid=0x1116x)
[2013-01-17 09:36:57.408681] W [client-handshake.c:1797:client_dump_version_cbk] 0-hosdu-client-2: received RPC status error
[2013-01-17 09:36:57.408688] I [client.c:2097:client_rpc_notify] 0-hosdu-client-2: disconnected
[2013-01-17 09:36:58.070753] W [common-utils.c:2296:gf_ports_reserved] 0-glusterfs-socket: is not a valid port identifier
[2013-01-17 09:36:58.072765] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-hosdu-client-2: changing port to 49152 (from 0)
[2013-01-17 09:36:58.072804] W [socket.c:501:__socket_rwv] 0-hosdu-client-2: readv failed (No data available)
[2013-01-17 09:36:58.075285] W [common-utils.c:2296:gf_ports_reserved] 0-glusterfs-socket: is not a valid port identifier
[2013-01-17 09:37:20.734728] E [nfs3.c:2848:nfs3_mkdir] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:20.734740] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:37:26.876739] E [nfs3.c:4621:nfs3_fsstat] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:26.876753] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:37:38.566770] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:38.566789] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:37:39.976737] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:39.976752] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:38:20.734717] E [nfs3.c:2848:nfs3_mkdir] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:38:20.734734] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:38:26.876732] E [nfs3.c:4621:nfs3_fsstat] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:38:26.876744] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:38:38.566770] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:38:38.566830] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:38:39.976780] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:38:39.976793] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:40:20.734748] E [nfs3.c:2848:nfs3_mkdir] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:40:20.734767] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:40:23.428621] W [socket.c:501:__socket_rwv] 0-hosdu-client-2: readv failed (Connection reset by peer)
[2013-01-17 09:40:23.428635] W [socket.c:1932:__socket_proto_state_machine] 0-hosdu-client-2: reading from socket failed. Error (Connection reset by peer), peer (10.16.159.188:49152)
[2013-01-17 09:40:23.428696] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7fd0d893c7f8] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fd0d893c5a3] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fd0d893bcbe]))) 0-hosdu-client-2: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2013-01-17 09:37:05.473932 (xid=0x1119x)

In the fuse mount log, I see a lot of these messages:

[2013-01-16 12:37:48.895184] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.895192] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406122: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895278] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406123: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895387] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.895396] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.895405] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406125: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895517] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406126: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895628] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.895638] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.895647] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406128: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895733] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406129: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895845] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.895870] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.895881] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406131: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895969] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406132: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896078] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.896087] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.896096] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406134: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896182] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406135: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896291] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.896300] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.896310] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406137: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896396] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406138: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896529] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.896539] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.896548] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406140: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896635] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406141: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:50:21.543366] E [rpc-clnt.c:207:call_bail] 0-hosdu-client-0: bailing out frame type(GF-DUMP) op(DUMP(1)) xid = 0x167114x sent = 2013-01-16 12:20:20.751504. timeout = 1800
[2013-01-16 12:50:21.543390] W [client-handshake.c:1797:client_dump_version_cbk] 0-hosdu-client-0: received RPC status error
[2013-01-16 12:50:29.551816] E [rpc-clnt.c:207:call_bail] 0-hosdu-client-3: bailing out frame type(GF-DUMP) op(DUMP(1)) xid = 0x811131x sent = 2013-01-16 12:20:28.312014. timeout = 1800
[2013-01-16 12:50:29.551836] W [client-handshake.c:1797:client_dump_version_cbk] 0-hosdu-client-3: received RPC status error
[2013-01-16 12:50:37.566300] E [rpc-clnt.c:207:call_bail] 0-hosdu-client-1: bailing out frame type(GF-DUMP) op(DUMP(1)) xid = 0x840899x sent = 2013-01-16 12:20:36.368791. timeout = 1800
[2013-01-16 12:50:37.566320] W [client-handshake.c:1797:client_dump_version_cbk] 0-hosdu-client-1: received RPC status error
[2013-01-16 13:28:29.492911] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406143: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.497642] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406185: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.502324] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406227: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.506992] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406269: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.511745] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406311: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.516398] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406353: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.521212] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406395: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.526040] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406437: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.530599] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406479: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.535460] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406521: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.540157] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406563: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.544850] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406605: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.549776] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406647: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.554371] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406689: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.558965] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406731: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.563774] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406773: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.568352] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406815: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.572954] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406857: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.577446] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406899: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.582131] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406941: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.586680] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406983: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.591168] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407025: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.595734] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407067: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.600196] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407109: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.604735] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407151: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.609239] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407193: LOOKUP() / => -1 (No such file or directory)

I've attached the fuse mount log, the nfs server log and the nfs server's statedump.
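Since the excerpts above are dominated by a few repeating messages, a quick tally per log level and function can make the pattern easier to see. The following is only a sketch: the regular expression is inferred from the entries quoted in this report (timestamp, level letter, then file:line:function in brackets), and the names ENTRY_RE and tally are hypothetical helpers, not part of glusterfs.

```python
import re
from collections import Counter

# Header of one log entry as quoted in this report:
# [timestamp] LEVEL [file.c:line:function] ...
# The layout is an assumption drawn from the pasted excerpts only.
ENTRY_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\]"
)


def tally(log_text):
    """Count log entries per (level, function) pair."""
    counts = Counter()
    for match in ENTRY_RE.finditer(log_text):
        counts[(match.group("level"), match.group("func"))] += 1
    return counts


# A few entries copied from the nfs server log excerpt above.
sample = """\
[2013-01-17 09:37:20.734728] E [nfs3.c:2848:nfs3_mkdir] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:20.734740] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:37:38.566770] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:38.566789] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
"""

if __name__ == "__main__":
    for (level, func), count in tally(sample).most_common():
        print(level, func, count)
```

Because the pattern is searched with finditer rather than line by line, it should also cope with a log pasted as one long run of text, as long as each entry header is intact.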
Created attachment 680140 nfs server statedump
Pranith, Rajesh, I need your help in looking at these logs and pointing out possible issues. At first glance, I suspect the issue may be caused by CHILD_UP/DOWN events, but I need to be sure. Priority set to 'medium' as it involves stripe.
I couldn't reproduce it in my local setup. The logs show "0-hosdu-replicate-0: no subvolumes up" and "0-hosdu-replicate-1: no subvolumes up", which means none of the children of either replicate subvolume were up; with no children up, these failures are expected behaviour. Can you reproduce it and give us complete information?
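To check which replicate subvolumes reported having no children up, the fuse mount log can be scanned for that message. This is a rough sketch under the assumption that the message always has the form seen in this report ("0-<volume>-replicate-<n>: no subvolumes up"); NO_SUBVOL_RE and replicas_without_children are hypothetical names used for illustration.

```python
import re

# Message format assumed from the fuse mount log excerpt in this report.
NO_SUBVOL_RE = re.compile(r"0-(?P<subvol>[\w-]+-replicate-\d+): no subvolumes up")


def replicas_without_children(log_text):
    """Return the sorted, de-duplicated replicate subvolumes that logged the message."""
    return sorted(set(NO_SUBVOL_RE.findall(log_text)))


# Entries copied from the fuse mount log excerpt.
sample = """\
[2013-01-16 12:37:48.895387] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.895396] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.895405] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406125: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895628] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
"""

if __name__ == "__main__":
    print(replicas_without_children(sample))
```

If both replicate subvolumes show up in the result, the fuse client had lost every brick, which is a different situation from the single sub-volume taken down in the reproduction steps.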
Striped replicate volume is not something we want to support. Please feel free to log a new bug if this bug appears on a volume that is supported. Closing this for now.