Bug 896462 - ENOENT while trying to access gluster fuse mountpoint, on a 2*2 striped-replicated volume.
Summary: ENOENT while trying to access gluster fuse mountpoint, on a 2*2 striped-repli...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Pranith Kumar K
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-01-17 11:13 UTC by M S Vishwanath Bhat
Modified: 2016-09-19 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-26 21:19:08 UTC
Embargoed:


Attachments (Terms of Use)
nfs server log file (1.57 MB, text/x-log)
2013-01-17 11:13 UTC, M S Vishwanath Bhat
nfs server statedump (7.23 KB, application/octet-stream)
2013-01-17 11:14 UTC, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2013-01-17 11:13:18 UTC
Created attachment 680139 [details]
nfs server log file

Description of problem:
Accessing the gluster fuse mountpoint fails with ENOENT, and the nfs mountpoint hangs. The volume was a 2*2 striped-replicated volume. fs-perf-test was running on the fuse mount and fileop was running on the nfs mount simultaneously.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0qa5

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a 2*2 striped-replicated volume.
2. Do a fuse mount and run fs-perf-test on it.
3. While fs-perf-test is running, take down one sub-volume of the replicate translator.
4. Start fileop from the nfs mount (fileop -f 50).
5. After some time, bring the glusterfsd back up.
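
The steps above can be sketched as a command list. This is only an illustration, not the exact commands used by the reporter: the host names, brick paths, mount directories and the helper name repro_commands are hypothetical, the nfs mount options are an assumption, and the fs-perf-test invocation is left out because the report does not give it. The sketch only builds and prints the commands; it does not run them.

```python
import shlex

def repro_commands(volume, bricks, fuse_dir, nfs_dir):
    """Build (but do not run) the commands for steps 1, 2, 4 and 5."""
    if len(bricks) != 4:
        raise ValueError("a 2*2 striped-replicated volume needs exactly 4 bricks")
    server = bricks[0].split(":", 1)[0]
    export = "{}:/{}".format(server, volume)
    return [
        # Step 1: create and start the volume.
        ["gluster", "volume", "create", volume, "stripe", "2", "replica", "2", *bricks],
        ["gluster", "volume", "start", volume],
        # Step 2: fuse mount (fs-perf-test is then started from fuse_dir).
        ["mount", "-t", "glusterfs", export, fuse_dir],
        # Step 3 is manual: stop one brick's glusterfsd process.
        # Step 4: nfs mount (options assumed), then fileop as given in the report.
        ["mount", "-t", "nfs", "-o", "vers=3", export, nfs_dir],
        ["fileop", "-f", "50"],
        # Step 5: restart the brick process that was taken down.
        ["gluster", "volume", "start", volume, "force"],
    ]

# Hypothetical hosts and brick paths, for illustration only.
bricks = ["server{}:/bricks/hosdu".format(i) for i in range(1, 5)]
cmds = repro_commands("hosdu", bricks, "/mnt/fuse", "/mnt/nfs")
for cmd in cmds:
    print(shlex.join(cmd))
```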
  
Actual results:
The nfs mountpoint became inaccessible; any attempt to access it hangs forever. Accessing the fuse mount fails with ENOENT.


Expected results:
The nfs mount should not hang and access to the fuse mountpoint should not fail.


Additional info:
In the nfs server log, I see a lot of these errors:


[2013-01-17 09:35:39.976760] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:36:57.408600] W [socket.c:501:__socket_rwv] 0-hosdu-client-2: readv failed (Connection reset by peer)
[2013-01-17 09:36:57.408616] W [socket.c:1932:__socket_proto_state_machine] 0-hosdu-client-2: reading from socket failed. Error (Connection reset by peer), peer (10.16.159.188:49152)
[2013-01-17 09:36:57.408673] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7fd0d893c7f8] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fd0d893c5a3] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fd0d893bcbe]))) 0-hosdu-client-2: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2013-01-17 09:33:39.453902 (xid=0x1116x)
[2013-01-17 09:36:57.408681] W [client-handshake.c:1797:client_dump_version_cbk] 0-hosdu-client-2: received RPC status error
[2013-01-17 09:36:57.408688] I [client.c:2097:client_rpc_notify] 0-hosdu-client-2: disconnected
[2013-01-17 09:36:58.070753] W [common-utils.c:2296:gf_ports_reserved] 0-glusterfs-socket:  is not a valid port identifier
[2013-01-17 09:36:58.072765] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-hosdu-client-2: changing port to 49152 (from 0)
[2013-01-17 09:36:58.072804] W [socket.c:501:__socket_rwv] 0-hosdu-client-2: readv failed (No data available)
[2013-01-17 09:36:58.075285] W [common-utils.c:2296:gf_ports_reserved] 0-glusterfs-socket:  is not a valid port identifier
[2013-01-17 09:37:20.734728] E [nfs3.c:2848:nfs3_mkdir] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:20.734740] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:37:26.876739] E [nfs3.c:4621:nfs3_fsstat] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:26.876753] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:37:38.566770] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:38.566789] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:37:39.976737] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:37:39.976752] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:38:20.734717] E [nfs3.c:2848:nfs3_mkdir] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:38:20.734734] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:38:26.876732] E [nfs3.c:4621:nfs3_fsstat] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:38:26.876744] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:38:38.566770] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:38:38.566830] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:38:39.976780] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:38:39.976793] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:40:20.734748] E [nfs3.c:2848:nfs3_mkdir] 0-nfs-nfsv3: Volume is disabled: hosdu
[2013-01-17 09:40:20.734767] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2013-01-17 09:40:23.428621] W [socket.c:501:__socket_rwv] 0-hosdu-client-2: readv failed (Connection reset by peer)
[2013-01-17 09:40:23.428635] W [socket.c:1932:__socket_proto_state_machine] 0-hosdu-client-2: reading from socket failed. Error (Connection reset by peer), peer (10.16.159.188:49152)
[2013-01-17 09:40:23.428696] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7fd0d893c7f8] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fd0d893c5a3] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fd0d893bcbe]))) 0-hosdu-client-2: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2013-01-17 09:37:05.473932 (xid=0x1119x)
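
To see which messages dominate a log like the one above, the entries can be tallied by level and reporting function. The following is a minimal sketch that assumes every entry follows the "[timestamp] LEVEL [file:line:function] message" layout seen in this excerpt; the names LOG_LINE and tally are made up for the example, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: [timestamp] LEVEL [file.c:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] (?P<msg>.*)$"
)

def tally(lines):
    """Count log entries per (level, function); lines that do not match are skipped."""
    counts = Counter()
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if m:
            counts[(m.group("level"), m.group("func"))] += 1
    return counts

sample = [
    "[2013-01-17 09:37:20.734728] E [nfs3.c:2848:nfs3_mkdir] 0-nfs-nfsv3: Volume is disabled: hosdu",
    "[2013-01-17 09:37:20.734740] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully",
    "[2013-01-17 09:37:38.566770] E [nfs3.c:842:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: hosdu",
    "[2013-01-17 09:37:38.566789] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully",
]
counts = tally(sample)
for (level, func), n in counts.most_common():
    print(level, func, n)
```

In the sample, every "Volume is disabled" error from an nfs3 call is paired with one rpcsvc_check_and_reply_error entry, which matches the pattern visible in the full excerpt.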



In the fuse mount log, I see a lot of these messages:


[2013-01-16 12:37:48.895184] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.895192] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406122: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895278] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406123: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895387] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.895396] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.895405] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406125: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895517] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406126: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895628] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.895638] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.895647] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406128: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895733] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406129: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895845] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.895870] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.895881] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406131: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.895969] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406132: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896078] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.896087] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.896096] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406134: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896182] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406135: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896291] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.896300] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.896310] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406137: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896396] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406138: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896529] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-0: no subvolumes up
[2013-01-16 12:37:48.896539] I [afr-common.c:3874:afr_local_init] 0-hosdu-replicate-1: no subvolumes up
[2013-01-16 12:37:48.896548] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406140: FSYNC() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:37:48.896635] W [fuse-bridge.c:1127:fuse_err_cbk] 0-glusterfs-fuse: 20406141: FLUSH() ERR => -1 (Transport endpoint is not connected)
[2013-01-16 12:50:21.543366] E [rpc-clnt.c:207:call_bail] 0-hosdu-client-0: bailing out frame type(GF-DUMP) op(DUMP(1)) xid = 0x167114x sent = 2013-01-16 12:20:20.751504. timeout = 1800
[2013-01-16 12:50:21.543390] W [client-handshake.c:1797:client_dump_version_cbk] 0-hosdu-client-0: received RPC status error
[2013-01-16 12:50:29.551816] E [rpc-clnt.c:207:call_bail] 0-hosdu-client-3: bailing out frame type(GF-DUMP) op(DUMP(1)) xid = 0x811131x sent = 2013-01-16 12:20:28.312014. timeout = 1800
[2013-01-16 12:50:29.551836] W [client-handshake.c:1797:client_dump_version_cbk] 0-hosdu-client-3: received RPC status error
[2013-01-16 12:50:37.566300] E [rpc-clnt.c:207:call_bail] 0-hosdu-client-1: bailing out frame type(GF-DUMP) op(DUMP(1)) xid = 0x840899x sent = 2013-01-16 12:20:36.368791. timeout = 1800
[2013-01-16 12:50:37.566320] W [client-handshake.c:1797:client_dump_version_cbk] 0-hosdu-client-1: received RPC status error
[2013-01-16 13:28:29.492911] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406143: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.497642] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406185: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.502324] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406227: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.506992] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406269: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.511745] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406311: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.516398] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406353: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.521212] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406395: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.526040] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406437: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.530599] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406479: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.535460] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406521: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.540157] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406563: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.544850] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406605: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.549776] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406647: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.554371] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406689: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.558965] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406731: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.563774] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406773: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.568352] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406815: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.572954] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406857: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.577446] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406899: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.582131] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406941: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.586680] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20406983: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.591168] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407025: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.595734] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407067: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.600196] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407109: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.604735] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407151: LOOKUP() / => -1 (No such file or directory)
[2013-01-16 13:28:29.609239] W [fuse-bridge.c:660:fuse_attr_cbk] 0-glusterfs-fuse: 20407193: LOOKUP() / => -1 (No such file or directory)
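
The call_bail entries in the log above each report "timeout = 1800". As a quick sanity check, the sketch below compares the "sent" time with the timestamp of the log entry for 0-hosdu-client-0, using values copied from the log; the helper name bail_delay is made up for the example. It only confirms that the DUMP frame went unanswered for the full timeout, and says nothing about why.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

def bail_delay(logged_at, sent_at):
    """Return the time between sending a frame and logging its bail-out."""
    return datetime.strptime(logged_at, FMT) - datetime.strptime(sent_at, FMT)

# Values copied from the 0-hosdu-client-0 call_bail entry.
delay = bail_delay("2013-01-16 12:50:21.543366", "2013-01-16 12:20:20.751504")
timeout = timedelta(seconds=1800)

# True when the frame was outstanding for at least the reported timeout.
print(delay, delay >= timeout)
```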


I've attached the fuse mount log, the nfs server log, and the nfs server statedump.

Comment 1 M S Vishwanath Bhat 2013-01-17 11:14:13 UTC
Created attachment 680140 [details]
nfs server statedump

Comment 4 Amar Tumballi 2013-01-18 05:46:59 UTC
Pranith, Rajesh, I need your help in looking at these logs and pointing out possible issues. At first glance, I suspect the issue is caused by CHILD_UP/DOWN events, but we need to be sure.

Priority is 'medium' as it involves stripe.

Comment 5 vpshastry 2013-03-18 07:23:45 UTC
I couldn't reproduce it in my local setup. The logs show "0-hosdu-replicate-0: no subvolumes up" and "0-hosdu-replicate-1: no subvolumes up", which means none of the children were up; in that state, this is expected behaviour.

Can you reproduce it and give us complete information?

Comment 6 Pranith Kumar K 2015-08-26 21:19:08 UTC
Striped-replicated volumes are not something we want to support. Please feel free to log a new bug if this issue appears on a supported volume type. Closing this for now.

