Description of problem:
=======================
While snapshot creation was in progress for multiple volumes during IO, the peers and bricks got disconnected, which resulted in snapshot creation failure. The following logs show the disconnects and the reason for the snap commit failure.

Log Snippet:
============

Bricks observed disconnects:
+++++++++++++++++++++++++++++
[2014-05-12 15:31:23.501709] I [MSGID: 106005] [glusterd-handler.c:4126:__glusterd_brick_rpc_notify] 0-management: Brick snapshot10.lab.eng.blr.redhat.com:/var/run/gluster/snaps/7a4fc5ea23ca4c0899d4acef74c3c32d/brick2/b1 has disconnected from glusterd.
[2014-05-12 15:31:23.572224] I [socket.c:2239:socket_event_handler] 0-transport: disconnecting now
[2014-05-12 15:31:23.576315] I [MSGID: 106005] [glusterd-handler.c:4126:__glusterd_brick_rpc_notify] 0-management: Brick snapshot10.lab.eng.blr.redhat.com:/var/run/gluster/snaps/716efee4a233420f9168de24dd96a0c7/brick2/b0 has disconnected from glusterd.
[2014-05-12 15:31:23.576787] W [glusterd-utils.c:1558:glusterd_snap_volinfo_find] 0-management: Snap volume 7a4fc5ea23ca4c0899d4acef74c3c32d.snapshot10.lab.eng.blr.redhat.com.var-run-gluster-snaps-7a4fc5ea23ca4c0899d4acef74c3c32d-brick2-b1 not found
[2014-05-12 15:31:23.578316] W [glusterd-utils.c:1558:glusterd_snap_volinfo_find] 0-management: Snap volume 716efee4a233420f9168de24dd96a0c7.snapshot10.lab.eng.blr.redhat.com.var-run-gluster-snaps-716efee4a233420f9168de24dd96a0c7-brick2-b0 not found
[2014-05-12 15:31:23.687750] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /var/run/gluster/snaps/716efee4a233420f9168de24dd96a0c7/brick2/b0 on port 49189
[2014-05-12 15:31:23.801734] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /var/run/gluster/snaps/7a4fc5ea23ca4c0899d4acef74c3c32d/brick2/b1 on port 49188

Peer did not respond for 30 seconds and is disconnected from the cluster:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[2014-05-12 15:31:53.873317] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-management: server 10.70.44.65:24007 has not responded in the last 30 seconds, disconnecting.
[2014-05-12 15:31:53.873845] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7f27a838be9d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7f27a838b8d1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f27a838b81e]))) 0-management: forced unwinding frame type(glusterd mgmt v3) op(--(4)) called at 2014-05-12 15:31:23.579332 (xid=0x72)
[2014-05-12 15:31:53.873883] E [glusterd-mgmt.c:116:gd_mgmt_v3_collate_errors] 0-management: Commit failed on snapshot12.lab.eng.blr.redhat.com. Please check log file for details.
[2014-05-12 15:31:53.874077] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7f27a838be9d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7f27a838b8d1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f27a838b81e]))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2014-05-12 15:31:23.579368 (xid=0x73)
[2014-05-12 15:31:53.874099] W [rpc-clnt-ping.c:150:rpc_clnt_ping_cbk] 0-management: timer must have expired
[2014-05-12 15:31:53.874145] I [MSGID: 106004] [glusterd-handler.c:4290:__glusterd_peer_rpc_notify] 0-management: Peer 70808d26-8d87-4082-b123-bf22481735df, in Peer in Cluster state, has disconnected from glusterd.
[2014-05-12 15:31:53.874187] W [glusterd-locks.c:641:glusterd_mgmt_v3_unlock] 0-management: Lock owner mismatch. Lock for vol vol0 held by d7ec3cfe-48e1-4ff1-af39-9aedf526979a
[2014-05-12 15:31:53.874213] W [glusterd-locks.c:641:glusterd_mgmt_v3_unlock] 0-management: Lock owner mismatch. Lock for vol vol1 held by 5684a263-cee0-4fef-bc58-a972eb862724
[2014-05-12 15:31:53.874263] W [glusterd-locks.c:641:glusterd_mgmt_v3_unlock] 0-management: Lock owner mismatch.
Lock for vol vol2 held by e2a99572-6307-49d3-aa45-19cd596f286c

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0-3.0.el6rhs.x86_64

How reproducible:
=================
The issue has been observed multiple times by different QEs.

Steps Carried:
=============
1. Create and start multiple 2x2 volumes (vol0 to vol3) on a 4-node cluster.
2. Mount the volumes (FUSE and NFS).
3. Start heavy IO from the clients.
4. Start creating snapshots of vol0 to vol3 at the same time from different nodes.
5. Monitor the glusterd logs and/or 'gluster peer status'.

Actual results:
===============
Peers and bricks see disconnects, and snapshot creation fails.

Expected results:
=================
Peers should remain in the connected state, and snapshot creation should succeed.
The brick process could be busy processing something and did not respond to glusterd within 30 secs. I think this issue existed earlier as well, but it is now visible after moving the ping timer to the lower level to make it available to all the processes.

Hi Rahul,

Can you please re-try your test with the ping timer disabled? The ping timer can be disabled by setting the ping-timeout option to 0 in the glusterd volume file.
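As a sketch, the workaround would look like the following in '/etc/glusterfs/glusterd.vol'. The surrounding options shown here are illustrative of a stock volfile and may differ on a given install; only the ping-timeout line is the actual change, and glusterd must be restarted for it to take effect.

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    # 0 disables the ping timer for this daemon's RPC connections
    option ping-timeout 0
end-volume
```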
When there is a lot of IO happening on the brick process, the epoll thread will be busy and will not respond to the ping packet within 60 secs. For now we will disable the ping timer for the following connections:

glusterd->brick
brick->glusterd
cli->glusterd

Later, when the epoll threading model is changed and made lighter, we will need to revert this change.

Patch# 7753 is posted upstream.
<Summary of e-mail discussion>

This issue requires some more analysis. To unblock QE immediately, a temporary patch has been sent to the downstream branch (RHS 3.0):

https://code.engineering.redhat.com/gerrit/#/c/25040/

This will be a downstream-only patch until we uncover the root cause of the ping timeouts. The patch disables ping timers on some of the connections so that the problems faced go away. This only avoids the problem; it does not fix it correctly. Root-causing the issue and fixing it shall be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1098021
*** Bug 1098021 has been marked as a duplicate of this bug. ***
I was trying to debug the root cause by executing the steps below, and found that the epoll thread is blocked by the big-lock:

1) I made a change in the snapshot creation code path by adding a sleep of 30 secs, so that it holds the big-lock for 30+ secs.
2) Executed snapshot creation of two different volumes from two different nodes. After some time the ping timer on the brick process expired and disconnected the socket.
3) Here is the stack trace from glusterd:

#0  0x0000003c9620b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fac2b1a71fb in __synclock_lock (lock=0x16a9a58) at syncop.c:764
#2  0x00007fac2b1a728e in synclock_lock (lock=0x16a9a58) at syncop.c:782
#3  0x00007fac20e70691 in glusterd_big_locked_cbk (req=0x16d9118, iov=0x16d9158, count=1, myframe=0x7fac29dd7740, fn=0x7fac20ed5680 <gd_mgmt_v3_unlock_cbk_fn>) at glusterd-rpc-ops.c:206
#4  0x00007fac2af41e25 in rpc_clnt_handle_reply (clnt=0x16c62c0, pollin=0x170ca40) at rpc-clnt.c:767
#5  0x00007fac2af42cbf in rpc_clnt_notify (trans=<value optimized out>, mydata=0x16c62f0, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:895
#6  0x00007fac2af3e568 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#7  0x00007fac20bbce85 in socket_event_poll_in (this=0x16dbb90) at socket.c:2120
#8  0x00007fac20bbe86d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x16dbb90, poll_in=1, poll_out=0, poll_err=0) at socket.c:2233
#9  0x00007fac2b1c07e7 in event_dispatch_epoll_handler (event_pool=0x16824d0) at event-epoll.c:384
#10 event_dispatch_epoll (event_pool=0x16824d0) at event-epoll.c:445
#11 0x0000000000407ecd in main (argc=2, argv=0x7fffcab7fdd8) at glusterfsd.c:2023

The notification handler is blocked by a big-locked callback.
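The blocking pattern in the backtrace can be sketched in miniature (a language-neutral Python stand-in, not GlusterFS code; names like `big_lock` and the 1-second "ping window" are illustrative): there is a single event thread, and every callback it runs, including the ping reply, must first acquire a global lock. While a long-running operation holds that lock, the ping reply sits unhandled past the timeout window, so the ping timer fires even though the peer is alive.

```python
import threading
import time

big_lock = threading.Lock()   # stand-in for glusterd's synclock ("big lock")
replies_handled = 0           # RPC replies processed by the event thread

def event_thread():
    """Stand-in for the epoll thread: every callback (including the
    ping reply) must take big_lock first, as glusterd_big_locked_cbk does."""
    global replies_handled
    with big_lock:            # blocks while the "snapshot" op holds the lock
        replies_handled += 1

# Snapshot code path: grab the big lock and hold it past the ping window.
big_lock.acquire()
t = threading.Thread(target=event_thread)
t.start()

time.sleep(1.0)               # scaled-down "ping-timeout" window elapses
timer_expired = (replies_handled == 0)   # no reply handled -> timer would fire

big_lock.release()            # snapshot finishes; event thread unblocks
t.join()
print("ping timer expired:", timer_expired)            # True
print("replies handled after release:", replies_handled)  # 1
```

The point of the sketch is that the peer never went away: the reply is handled the instant the lock is released, which is why disabling the ping timer "fixes" the symptom without touching the real contention.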
The ping timer is set for every rpc connection and defaults to 30 secs. The patch (https://code.engineering.redhat.com/gerrit/#/c/25040/) provides a workaround by disabling the ping timer on the glusterd to brick-process connection. If the ping-timer expiry issue still occurs, the ping timer can be disabled on the glusterd to glusterd connection as well, by setting the ping-timeout option in the 'glusterd.vol' file. We will keep the bug open until a fix is made for the actual issue.
Patch posted: https://code.engineering.redhat.com/gerrit/25294

With this patch the default ping-timeout value is '0' for all rpc connections except glusterd-glusterd (set to 30 secs) and client-glusterd (set to 42 secs).
Closing this bug. We will open a new RFE bug to address the epoll issue.
We cannot set ping-timeout to 0 for the glusterd peer connection, because this would create a regression for the intuit bug# 1034479. Also, we see peer disconnects only when multiple snapshot operations are executed simultaneously from different nodes. So we can document the workaround: set ping-timeout=0 in '/etc/glusterfs/glusterd.vol' when multiple snapshot operations are executed simultaneously.
Version: glusterfs-3.6.0.15-1.el6rhs.x86_64

Retried creating snapshots with ping-timeout set to 30 (the default value) in /etc/glusterfs/glusterd.vol; we saw disconnect messages and glusterd crashed.

Steps followed:
==============
Created snapshots in a loop on 4 volumes at the same time, with IO in progress:

for i in {201..400}; do dd if=/dev/urandom of=fuse_vol2"$i" bs=10M count=1; done

[2014-06-11 12:28:40.783197] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-management: server 10.70.40.169:24007 has not responded in the last 30 seconds, disconnecting.
[2014-06-11 12:28:56.586750] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-06-11 12:28:57.214459] I [socket.c:2246:socket_event_handler] 0-transport: disconnecting now
[2014-06-11 12:28:57.218328] I [MSGID: 106005] [glusterd-handler.c:4131:__glusterd_brick_rpc_notify] 0-management: Brick snapshot14.lab.eng.blr.redhat.com:/var/run/gluster/snaps/c8a70faa0459443c87b37991a243b405/brick2/b2 has disconnected from glusterd.

Uploaded the sosreports and the core file:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/snapshots/1096729/
Doc bug BZ 1109150 raised to track the known issue. Lowering the severity of this bug. As discussed with the developers, raised another bug, BZ 1110119, to track the glusterd crash mentioned in comment 13.
Please add doc text for this known issue.
Doc text added
When this bug was raised, multi-threaded epoll support was not in place, and we had to turn off the ping timer to get rid of this problem. However, the ping timer is still not enabled between glusterds, due to some issues seen in BZ 1259992. Once we identify the root cause of that and fix it, multi-threaded epoll support with the ping timer enabled can be brought back in, and then this bug can be verified.

Moving the bug from ateam to gabbar, as the test case coverage belongs to snapshot.
The current Gluster architecture does not support implementation of this feature. Therefore this feature request is deferred until GlusterD 2.0.