Bug 1197185 - Brick/glusterfsd crash randomly once a day on a replicated volume
Summary: Brick/glusterfsd crash randomly once a day on a replicated volume
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.5.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard: AFR
Depends On:
Blocks:
Reported: 2015-02-27 17:33 UTC by Peter Auyeung
Modified: 2016-06-17 15:57 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-06-17 15:57:47 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Peter Auyeung 2015-02-27 17:33:52 UTC
Description of problem:
glusterfsd crashes randomly, about once a day, on a replicated volume with the error:
E [rpcsvc.c:547:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

Version-Release number of selected component (if applicable):
3.5.2

How reproducible:


Steps to Reproduce:
1. N/A
2.
3.

Actual results:
[2015-02-27 16:55:01.554964] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2015-02-27 17:00:02.340402] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2015-02-27 17:02:11.845002] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/4f4216db5dffe909b3ed8430b737d9d8.socket failed (No data available)
[2015-02-27 17:02:11.845585] I [glusterd-handler.c:3713:__glusterd_brick_rpc_notify] 0-management: Disconnected from glusterprod006.bo.shopzilla.sea:/brick02/gfs
[2015-02-27 17:02:11.854461] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2015-02-27 17:02:11.854479] E [rpcsvc.c:547:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-02-27 17:02:11.865963] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2015-02-27 17:02:11.865987] E [rpcsvc.c:547:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-02-27 17:02:36.504901] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /brick02/gfs on port 49153
[2015-02-27 17:02:36.523359] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /brick02/gfs on port 49153
[2015-02-27 17:02:40.298697] E [glusterd-utils.c:4124:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/8f75ea8dc7d25bf6095380ad15310042.socket error: Permission denied
[2015-02-27 17:02:40.300631] I [glusterd-utils.c:4158:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2015-02-27 17:02:40.300943] I [glusterd-utils.c:4163:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2015-02-27 17:02:40.301172] I [glusterd-utils.c:4168:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2015-02-27 17:02:40.301433] I [glusterd-utils.c:4173:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2015-02-27 17:02:40.301791] I [glusterd-utils.c:4178:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2015-02-27 17:02:40.302149] I [glusterd-utils.c:4183:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2015-02-27 17:02:40.304581] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-27 17:02:40.305013] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-27 17:02:40.305030] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-27 17:02:40.305615] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2015-02-27 17:02:40.305633] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2015-02-27 17:02:40.306513] I [socket.c:2238:socket_event_handler] 0-transport: disconnecting now
[2015-02-27 17:02:41.315085] E [glusterd-utils.c:4124:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/f422793f928763c541562cd141488c0c.socket error: No such file or directory
[2015-02-27 17:02:41.317551] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-27 17:02:41.317617] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2015-02-27 17:02:41.317626] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2015-02-27 17:02:41.318974] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=588 max=1 total=4454
[2015-02-27 17:02:41.319032] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=1 total=4454
[2015-02-27 17:02:41.319117] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/f422793f928763c541562cd141488c0c.socket failed (No data available)
[2015-02-27 17:02:42.319567] E [glusterd-utils.c:4124:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/14294a56444cea1dc097c88aef4d8f1c.socket error: No such file or directory
[2015-02-27 17:02:42.320171] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/f422793f928763c541562cd141488c0c.socket failed (No data available)
[2015-02-27 17:02:42.320409] I [socket.c:2238:socket_event_handler] 0-transport: disconnecting now
[2015-02-27 17:02:42.320454] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2015-02-27 17:02:42.320471] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2015-02-27 17:02:42.362839] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-02-27 17:02:42.362904] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
[2015-02-27 17:02:42.362915] I [socket.c:3576:socket_init] 0-management: using system polling thread
[2015-02-27 17:05:03.252884] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

Expected results:
The brick should not crash.

Additional info:
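To make the log excerpt under "Actual results" easier to triage, the warning (W) and error (E) lines can be separated from the informational ones. The following is only a minimal sketch: the regular expression is inferred from the line layout seen in this report (`[timestamp] LEVEL [file:line:function] domain: message`), and the helper names `parse_line` and `warnings_and_errors` are hypothetical, not part of any GlusterFS tooling.

```python
import re

# Layout assumed from the excerpt in this report (not an official format spec):
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S*):\s?(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None


def warnings_and_errors(lines):
    """Keep only the parsed records whose level is W or E."""
    records = []
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] in ("W", "E"):
            records.append(rec)
    return records


# Three lines copied from the excerpt above.
sample = [
    "[2015-02-27 17:02:11.845585] I [glusterd-handler.c:3713:__glusterd_brick_rpc_notify] "
    "0-management: Disconnected from glusterprod006.bo.shopzilla.sea:/brick02/gfs",
    "[2015-02-27 17:02:11.854461] W [rpcsvc.c:254:rpcsvc_program_actor] "
    "0-rpc-service: RPC program not available (req 1298437 330)",
    "[2015-02-27 17:02:11.854479] E [rpcsvc.c:547:rpcsvc_check_and_reply_error] "
    "0-rpcsvc: rpc actor failed to complete successfully",
]

for rec in warnings_and_errors(sample):
    print(rec["ts"], rec["level"], rec["func"], rec["msg"])
```

Run against the full log, this would keep the `__socket_rwv` readv failure, the "RPC program not available" warnings, the "rpc actor failed" errors, and the socket unlink failures, which are the entries surrounding the brick disconnect at 17:02:11.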

Comment 1 Atin Mukherjee 2015-03-03 12:23:36 UTC
Please attach the core file of brick process and provide the glusterd & brick logs.

Comment 2 Peter Auyeung 2015-03-23 16:33:54 UTC
Hi Atin,

This is similar to bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1122120

The issue has since moved to a non-replicated volume, and we found that it happens whenever a user runs rm or mv on a "corrupted" directory within an NFS share.

I filed this bug for it:

https://bugzilla.redhat.com/show_bug.cgi?id=1203433

Comment 3 Peter Auyeung 2015-03-23 17:16:59 UTC
Please kindly download the core from 
https://dl.dropboxusercontent.com/u/1410745/core/core.20150227.gz

Comment 4 Niels de Vos 2016-06-17 15:57:47 UTC
This bug is being closed because the 3.5 release is marked End-Of-Life. There will be no further updates to this version. If you are still facing this issue in a more current release, please open a new bug against a version that still receives bugfixes.

