Description of problem:

We use v3.6.9 and we have a deployment with 2 storage nodes with bricks and a third node acting as quorum. We have noticed a weird behavior on the storage nodes with bricks in every deployment:

glusterd logs are full of the following message:

[2016-07-15 04:08:01.558743] W [socket.c:620:__socket_rwv] 0-management: readv on /var/run/2d164c87498372d99c47b3b627dcd393.socket failed (Invalid argument)

In addition, the following message is reported periodically in the glusterd logs:

The message "I [MSGID: 106006] [glusterd-handler.c:4290:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2016-07-15 04:06:04.513882] and [2016-07-15 04:08:01.558787]

We found the following bug fix, which looks quite similar:

http://review.gluster.org/#/c/9269/

In addition to the messages mentioned above, we often face functionality problems and the storage is not operational. We have collected the following logs from glusterfsd:

[2016-07-18 07:48:36.800400] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2016-07-18 07:48:36.974391] I [glusterfsd-mgmt.c:1504:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-07-18 07:48:36.982390] I [server.c:518:server_rpc_notify] 0-voldata-server: disconnecting connection from SN-0-1580-2016/07/18-07:48:34:958658-voldata-client-0-0-0
[2016-07-18 07:48:36.982422] I [client_t.c:417:gf_client_unref] 0-voldata-server: Shutting down connection SN-0-1580-2016/07/18-07:48:34:958658-voldata-client-0-0-0
[2016-07-18 07:48:38.015640] I [login.c:82:gf_auth] 0-auth/login: allowed user names: 7a00c0d0-e728-4c9a-92d7-819a10952886
[2016-07-18 07:48:38.015662] I [server-handshake.c:585:server_setvolume] 0-voldata-server: accepted client from SN-0-1622-2016/07/18-07:48:37:983796-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:38.284775] I [server.c:518:server_rpc_notify] 0-voldata-server: disconnecting connection from SN-1-2297-2016/07/18-07:48:36:82735-voldata-client-0-0-0
[2016-07-18 07:48:38.284823] I [client_t.c:417:gf_client_unref] 0-voldata-server: Shutting down connection SN-1-2297-2016/07/18-07:48:36:82735-voldata-client-0-0-0
[2016-07-18 07:48:39.331257] I [login.c:82:gf_auth] 0-auth/login: allowed user names: 7a00c0d0-e728-4c9a-92d7-819a10952886
[2016-07-18 07:48:39.331282] I [server-handshake.c:585:server_setvolume] 0-voldata-server: accepted client from SN-1-2350-2016/07/18-07:48:38:966005-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:41.357084] I [login.c:82:gf_auth] 0-auth/login: allowed user names: 7a00c0d0-e728-4c9a-92d7-819a10952886
[2016-07-18 07:48:41.357109] I [server-handshake.c:585:server_setvolume] 0-voldata-server: accepted client from SN-2-2765-2016/07/18-07:48:41:308956-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:42.915110] I [login.c:82:gf_auth] 0-auth/login: allowed user names: 7a00c0d0-e728-4c9a-92d7-819a10952886
[2016-07-18 07:48:42.915136] I [server-handshake.c:585:server_setvolume] 0-voldata-server: accepted client from SN-0-1894-2016/07/18-07:48:42:894933-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:53.873241] I [server-handshake.c:585:server_setvolume] 0-voldata-server: accepted client from MN-0-740-2016/07/18-07:48:53:295469-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:48:59.645589] I [server-handshake.c:585:server_setvolume] 0-voldata-server: accepted client from MN-1-728-2016/07/18-07:48:59:628747-voldata-client-0-0-0 (version: 3.6.9)
[2016-07-18 07:56:06.939668] I [server.c:518:server_rpc_notify] 0-voldata-server: disconnecting connection from SN-1-1152-2016/07/18-07:48:09:699332-voldata-client-0-0-0
[2016-07-18 07:56:06.939692] I [client_t.c:417:gf_client_unref] 0-voldata-server: Shutting down connection SN-1-1152-2016/07/18-07:48:09:699332-voldata-client-0-0-0
[2016-07-18 07:56:06.939722] I [server.c:518:server_rpc_notify] 0-voldata-server: disconnecting connection from SN-2-1134-2016/07/18-07:48:10:12643-voldata-client-0-0-0
[2016-07-18 07:56:06.939735] I [client_t.c:417:gf_client_unref] 0-voldata-server: Shutting down connection SN-2-1134-2016/07/18-07:48:10:12643-voldata-client-0-0-0
[2016-07-18 07:56:06.942400] W [glusterfsd.c:1211:cleanup_and_exit] (--> 0-: received signum (15), shutting down

After the last message, glusterfsd restarts and after a few minutes it comes back to the same situation, with the above messages printed again. During the short period of operation we observed many disconnections and re-connections of gluster clients.

At the same time, glusterd prints the following messages:

/usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x356)[0x7f9df8acc1f5]
(--> /usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x289)[0x7f9df8a2da4b]
(--> /usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x61)[0x7f9df8a22384]
(--> /usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(glusterd_peer_rpc_notify+0x38)[0x7f9df8a2dbea]
))))) 0-management: Lock for vol services not held
The message "I [MSGID: 106006] [glusterd-handler.c:4290:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2016-07-18 07:58:08.475445] and [2016-07-18 08:00:06.414826]

and after a while the connection with the other storage server and the quorum server is lost.

Could you please check and provide some help?

Version-Release number of selected component (if applicable):
v3.6.9

How reproducible:
Often

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
FYI we have back-ported the following patches from the 3.7 branch:
http://review.gluster.org/#/c/10829/
http://review.gluster.org/#/c/13358/
http://review.gluster.org/#/c/14052/
http://review.gluster.org/#/c/12310/
http://review.gluster.org/#/c/13392/
First of all, 3.6 is not going to receive any further bug fixes; please refer to http://www.gluster.org/pipermail/gluster-users/2016-July/027682.html . You'd need to upgrade your cluster to 3.7/3.8.

Now, coming to the relevant glusterd log, here is what I think:

/usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x356)[0x7f9df8acc1f5]
(--> /usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x289)[0x7f9df8a2da4b]
(--> /usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x61)[0x7f9df8a22384]
(--> /usr/lib64/glusterfs/glusterfs/3.6.9/xlator/mgmt/glusterd.so(glusterd_peer_rpc_notify+0x38)[0x7f9df8a2dbea]
))))) 0-management: Lock for vol services not held
The message "I [MSGID: 106006] [glusterd-handler.c:4290:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2016-07-18 07:58:08.475445] and [2016-07-18 08:00:06.414826]

This is expected: on a peer disconnect, glusterd tries to release all the stale locks held by that peer; if the lock for a volume is not held by the peer, a warning log is captured.
(In reply to Olia Kremmyda from comment #0)
> [2016-07-18 07:56:06.942400] W [glusterfsd.c:1211:cleanup_and_exit] (--> 0-:
> received signum (15), shutting down
>
> After the last message, glusterfsd restarts and after a few minutes it comes
> on the same situation with above messages printed again. During the short
> period of operation we observed many disconnections and re-connections of
> gluster clients.

Pranith,

Do you want to check this behavior?

~Atin
GlusterFS-3.6 is nearing its End-Of-Life; only important security bugs still stand a chance of getting fixed. Moving this to the mainline 'version', as I still see similar disconnection messages in the GlusterD logs.
Please try the glusterfs-4.1.4 version and see if it helps you get to a better state. There were multiple fixes in these areas which help achieve better network behavior. Feel free to re-open this if the issue is still present!