Description of problem:
glusterfsd crashes randomly, about once a day, on a replicated volume with the error:
E [rpcsvc.c:547:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

Version-Release number of selected component (if applicable):
3.5.2

How reproducible:

Steps to Reproduce:
1. N/A

Actual results:
[2015-02-27 16:55:01.554964] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2015-02-27 17:00:02.340402] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2015-02-27 17:02:11.845002] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/4f4216db5dffe909b3ed8430b737d9d8.socket failed (No data available)
[2015-02-27 17:02:11.845585] I [glusterd-handler.c:3713:__glusterd_brick_rpc_notify] 0-management: Disconnected from glusterprod006.bo.shopzilla.sea:/brick02/gfs
[2015-02-27 17:02:11.854461] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2015-02-27 17:02:11.854479] E [rpcsvc.c:547:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-02-27 17:02:11.865963] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2015-02-27 17:02:11.865987] E [rpcsvc.c:547:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-02-27 17:02:36.504901] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /brick02/gfs on port 49153
[2015-02-27 17:02:36.523359] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /brick02/gfs on port 49153
[2015-02-27 17:02:40.298697] E [glusterd-utils.c:4124:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/8f75ea8dc7d25bf6095380ad15310042.socket error: Permission denied
[2015-02-27 17:02:40.300631] I [glusterd-utils.c:4158:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2015-02-27 17:02:40.300943] I [glusterd-utils.c:4163:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2015-02-27 17:02:40.301172] I [glusterd-utils.c:4168:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2015-02-27 17:02:40.301433] I [glusterd-utils.c:4173:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2015-02-27 17:02:40.301791] I [glusterd-utils.c:4178:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2015-02-27 17:02:40.302149] I [glusterd-utils.c:4183:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2015-02-27 17:02:40.304581] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-27 17:02:40.305013] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-27 17:02:40.305030] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-27 17:02:40.305615] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2015-02-27 17:02:40.305633] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2015-02-27 17:02:40.306513] I [socket.c:2238:socket_event_handler] 0-transport: disconnecting now
[2015-02-27 17:02:41.315085] E [glusterd-utils.c:4124:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/f422793f928763c541562cd141488c0c.socket error: No such file or directory
[2015-02-27 17:02:41.317551] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-27 17:02:41.317617] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2015-02-27 17:02:41.317626] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2015-02-27 17:02:41.318974] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=588 max=1 total=4454
[2015-02-27 17:02:41.319032] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=1 total=4454
[2015-02-27 17:02:41.319117] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/f422793f928763c541562cd141488c0c.socket failed (No data available)
[2015-02-27 17:02:42.319567] E [glusterd-utils.c:4124:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/14294a56444cea1dc097c88aef4d8f1c.socket error: No such file or directory
[2015-02-27 17:02:42.320171] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/f422793f928763c541562cd141488c0c.socket failed (No data available)
[2015-02-27 17:02:42.320409] I [socket.c:2238:socket_event_handler] 0-transport: disconnecting now
[2015-02-27 17:02:42.320454] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-27 17:02:42.320471] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-27 17:02:42.362839] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-27 17:02:42.362904] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2015-02-27 17:02:42.362915] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2015-02-27 17:05:03.252884] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

Expected results:
Brick should not crash

Additional info:
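For triage, the warning and error entries in the log excerpt above can be pulled out with a short script. This is only a sketch: the entry layout (`[timestamp] LEVEL [file:line:function] domain: message`) is inferred from the excerpt itself, not taken from GlusterFS documentation, and the names `ENTRY` and `problem_entries` are hypothetical helpers introduced here for illustration.

```python
import re

# Assumed layout of one log entry, inferred from the excerpt in this report:
# [timestamp] LEVEL [file:line:function] domain: message
ENTRY = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<domain>\S*): "
    r"(?P<msg>.*)"
)


def problem_entries(lines, levels=("E", "W")):
    """Return parsed entries whose severity letter is in `levels`."""
    found = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match and match.group("level") in levels:
            found.append(match.groupdict())
    return found


# Two entries copied from the log in this report.
sample = [
    "[2015-02-27 16:55:01.554964] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req",
    "[2015-02-27 17:02:11.854479] E [rpcsvc.c:547:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully",
]
hits = problem_entries(sample)
```

Passing `levels=("E",)` would keep only the error entries, for example the `rpcsvc_check_and_reply_error` lines that coincide with the brick disconnect at 17:02:11.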
Please attach the core file of brick process and provide the glusterd & brick logs.
Hi Atin, this is similar to bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1122120. The issue has since moved to a non-replicated volume, and we found that it happens whenever a user runs rm or mv on a "corrupted" directory within an NFS share. I filed that as bug https://bugzilla.redhat.com/show_bug.cgi?id=1203433
Please kindly download the core from https://dl.dropboxusercontent.com/u/1410745/core/core.20150227.gz
This bug is being closed because version 3.5 is marked End-Of-Life. There will be no further updates to this version. If you still face this issue in a more current release, please open a new bug against a version that still receives bugfixes.