Bug 1096729
Field | Value
---|---
Summary | Disconnects of peer and brick is logged while snapshot creations were in progress during IO
Product | [Red Hat Storage] Red Hat Gluster Storage
Component | snapshot
Status | CLOSED DEFERRED
Severity | low
Priority | urgent
Version | rhgs-3.0
Reporter | Rahul Hinduja <rhinduja>
Assignee | rjoseph
QA Contact | storage-qa-internal <storage-qa-internal>
Docs Contact |
CC | amukherj, asengupt, asriram, jbyers, nlevinki, nsathyan, rcyriac, rhs-bugs, rjoseph, sasundar, storage-qa-internal, vagarwal, vbellur, vmallika
Target Milestone | ---
Target Release | ---
Hardware | x86_64
OS | Unspecified
Whiteboard |
Fixed In Version | glusterfs-3.6.0.5-1.el7
Doc Type | Bug Fix
Doc Text | When multiple snapshot operations are performed simultaneously from different nodes in a cluster, glusterd peer disconnects can occur due to ping-timer expiry. The workaround is to disable the ping-timer by setting 'option ping-timeout 0' in the file '/etc/glusterfs/glusterd.vol'.
Story Points | ---
Clone Of |
 | 1097224 1098021 1098025 1104459 (view as bug list)
Environment |
Last Closed | 2016-01-29 13:23:10 UTC
Type | Bug
Regression | ---
Mount Type | ---
Documentation | ---
CRM |
Verified Versions |
Category | ---
oVirt Team | ---
RHEL 7.3 requirements from Atomic Host |
Cloudforms Team | ---
Target Upstream Version |
Embargoed |
Bug Depends On |
Bug Blocks | 1088355, 1097224, 1098021, 1098025, 1104459, 1104462, 1109150
Description
Rahul Hinduja
2014-05-12 10:53:11 UTC
The brick process could be busy processing something and did not respond to glusterd within 30 seconds. I think this issue existed earlier as well, but it is now visible after moving the ping-timer to the lower level to make it available to all processes. Hi Rahul, can you please re-try your test with the ping-timer disabled? The ping-timer can be disabled by setting ping-timeout to 0 in the glusterd volume file.

When there is a lot of IO happening on the brick process, the epoll thread will be busy and will not respond to the ping packet within 60 seconds. For now we will disable the ping-timer for the following connections:

- glusterd -> brick
- brick -> glusterd
- cli -> glusterd

Later, when the epoll thread model is changed and made lighter, we need to revert this change. Patch #7753 is posted upstream.

<Summary of e-mail discussion> This issue requires some more analysis. To unblock QE immediately, a temporary patch has been sent to the downstream branch (RHS 3.0): https://code.engineering.redhat.com/gerrit/#/c/25040/ This will be a downstream-only patch until we uncover the root cause of the ping timeouts. In this patch we disable the ping timers on some of the connections so that the problems we are facing go away. This only avoids the problem; it does not fix it correctly. Root-causing the issue and fixing it will be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1098021

*** Bug 1098021 has been marked as a duplicate of this bug. ***

I was trying to debug the root cause by executing the steps below and found that the epoll thread is blocked by the big-lock:

1) I made a change in the snapshot creation part of the code, adding a sleep of 30 seconds so that it holds the big-lock for 30+ seconds.
2) Executed snapshot creation from two different nodes on two different volumes. After some time the ping-timer on the brick process expired and disconnected the socket.
3) Here is the stack trace from glusterd:

    #0  0x0000003c9620b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x00007fac2b1a71fb in __synclock_lock (lock=0x16a9a58) at syncop.c:764
    #2  0x00007fac2b1a728e in synclock_lock (lock=0x16a9a58) at syncop.c:782
    #3  0x00007fac20e70691 in glusterd_big_locked_cbk (req=0x16d9118, iov=0x16d9158, count=1, myframe=0x7fac29dd7740, fn=0x7fac20ed5680 <gd_mgmt_v3_unlock_cbk_fn>) at glusterd-rpc-ops.c:206
    #4  0x00007fac2af41e25 in rpc_clnt_handle_reply (clnt=0x16c62c0, pollin=0x170ca40) at rpc-clnt.c:767
    #5  0x00007fac2af42cbf in rpc_clnt_notify (trans=<value optimized out>, mydata=0x16c62f0, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:895
    #6  0x00007fac2af3e568 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
    #7  0x00007fac20bbce85 in socket_event_poll_in (this=0x16dbb90) at socket.c:2120
    #8  0x00007fac20bbe86d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x16dbb90, poll_in=1, poll_out=0, poll_err=0) at socket.c:2233
    #9  0x00007fac2b1c07e7 in event_dispatch_epoll_handler (event_pool=0x16824d0) at event-epoll.c:384
    #10 event_dispatch_epoll (event_pool=0x16824d0) at event-epoll.c:445
    #11 0x0000000000407ecd in main (argc=2, argv=0x7fffcab7fdd8) at glusterfsd.c:2023

The notification handler is blocked by a big-locked callback. The ping-timer is set for every RPC connection and defaults to 30 seconds. The patch (https://code.engineering.redhat.com/gerrit/#/c/25040/) provides a workaround by disabling the ping-timer on the connection between glusterd and the brick process.
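For reference, a minimal sketch of what the workaround looks like in '/etc/glusterfs/glusterd.vol'. Only the 'option ping-timeout 0' line is the change discussed in this bug; the surrounding options are typical defaults from a glusterfs 3.x installation and may differ on a given system.

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    # Workaround from this bug: disable the ping-timer so a busy peer or
    # brick process is not disconnected when the ping reply is delayed.
    option ping-timeout 0
end-volume
```

Restarting the glusterd service (for example, 'service glusterd restart') is typically needed for a change in this file to take effect.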
In case the ping-timer expiry issue still exists, the ping-timer can be disabled on the glusterd-glusterd connection as well by setting the ping-timeout option in the 'glusterd.vol' file. We will keep the bug open until the fix is made for the actual issue.

Patch posted: https://code.engineering.redhat.com/gerrit/25294 With this patch the default ping-timeout value is '0' for all RPC connections except glusterd-glusterd (set to 30 seconds) and client-glusterd (set to 42 seconds).

Closing this bug. We will open a new RFC bug to address the epoll issue.

We cannot set ping-timeout to 0 for the glusterd peer connection, because this would create a regression for one of the Intuit bugs (bug# 1034479). Also, we are seeing the peer disconnect only when multiple snapshot operations are executed simultaneously from different nodes. So we can document this as a known issue, saying to set 'ping-timeout=0' in '/etc/glusterfs/glusterd.vol' when multiple snapshot operations are executed simultaneously.

Version: glusterfs-3.6.0.15-1.el6rhs.x86_64

Retried creating snapshots with ping-timeout set to 30 (the default value) in /etc/glusterfs/glusterd.vol; we saw disconnect messages and hit a glusterd crash.

Steps followed:
==============
Created snapshots in a loop on 4 volumes at the same time, with IO in progress (a sketch of this kind of loop appears at the end of this report):

    for i in {201..400}; do dd if=/dev/urandom of=fuse_vol2"$i" bs=10M count=1; done

    [2014-06-11 12:28:40.783197] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-management: server 10.70.40.169:24007 has not responded in the last 30 seconds, disconnecting.
    [2014-06-11 12:28:56.586750] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
    [2014-06-11 12:28:57.214459] I [socket.c:2246:socket_event_handler] 0-transport: disconnecting now
    [2014-06-11 12:28:57.218328] I [MSGID: 106005] [glusterd-handler.c:4131:__glusterd_brick_rpc_notify] 0-management: Brick snapshot14.lab.eng.blr.redhat.com:/var/run/gluster/snaps/c8a70faa0459443c87b37991a243b405/brick2/b2 has disconnected from glusterd.

Uploaded the sosreports and the core file:
http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/snapshots/1096729/

Doc bug BZ 1109150 raised to track the known issue. Lowering the severity of this bug.

As discussed with the developers, raised another bug, BZ 1110119, to track the glusterd crash mentioned in Comment 13.

Please add doc text for this known issue.

Doc text added.

When this bug was raised, multi-threaded epoll support was not in place, and we had to turn off the ping timer to get rid of this problem. However, the ping timer is still not enabled between glusterds due to issues seen in BZ 1259992. Once we identify the root cause of that and fix it, multi-threaded epoll support with the ping timer enabled can be brought back, and then this bug can be verified.

Moving the bug from ateam to gabbar as the test case coverage belongs to snapshot.

The current Gluster architecture does not support implementation of this feature. Therefore this feature request is deferred until GlusterD 2.0.
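For illustration, a minimal sketch of the kind of reproduction loop described in the QE comment above. The volume and snapshot names are placeholders; in the actual test, snapshot creation loops like this ran concurrently from different cluster nodes against four volumes while the dd loop kept IO running on a FUSE mount of the volume.

```sh
#!/bin/bash
# Run concurrently on several cluster nodes, each against its own volume:
# create snapshots in a loop while client IO is in progress.
VOL=vol1                              # placeholder volume name
for i in $(seq 1 50); do
    gluster snapshot create "snap_${VOL}_${i}" "${VOL}"
done

# Meanwhile, from a client working inside the FUSE mount of one of the
# volumes, keep writes going (same shape as the dd loop quoted above):
# for i in {201..400}; do dd if=/dev/urandom of=fuse_vol2"$i" bs=10M count=1; done
```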