Steps to reproduce:
1. Create 90 1 X 3 volumes with brick mux mode in a 3 node cluster and start all the volumes
2. trigger volume stop & delete commands parallely from the cli of N1 & N2 & N3 in following manner:
node 1 : vol1 - vol30
node 2 : vol31 - vol60
node 3: vol61 - vol90
some of the volume stop started timing out and based on further probing into glusterd processes one of the glusterd's thread was observed as:
T hread 8 (Thread 0x7f9baadfb700 (LWP 12643)):
#0 0x00007f9bb202d460 in nanosleep () from /lib64/libc.so.6
#1 0x00007f9bb202d36a in sleep () from /lib64/libc.so.6
#2 0x00007f9bae7d0882 in glusterd_wait_for_blockers (priv=0x7f9bb3c92050) at glusterd-op-sm.c:6264
#3 0x00007f9bae7d912d in glusterd_op_commit_perform (op=GD_OP_STOP_VOLUME, dict=dict@entry=0x7f9ba4a0ec70, op_errstr=op_errstr@entry=0x7f9baadfa978,
rsp_dict=rsp_dict@entry=0x7f9ba0226d30) at glusterd-op-sm.c:6287
#4 0x00007f9bae7e24b0 in glusterd_op_ac_commit_op (event=0x7f9ba0212980, ctx=0x7f9ba4d1f3a0) at glusterd-op-sm.c:6019
#5 0x00007f9bae7df6ab in glusterd_op_sm () at glusterd-op-sm.c:8391
#6 0x00007f9bae80c10c in __glusterd_brick_op_cbk (req=req@entry=0x7f9ba4cc47a0, iov=iov@entry=0x7f9ba4cc47e0, count=count@entry=1,
myframe=myframe@entry=0x7f9ba4b15b70) at glusterd-rpc-ops.c:2241
#7 0x00007f9bae80f0e9 in glusterd_big_locked_cbk (req=0x7f9ba4cc47a0, iov=0x7f9ba4cc47e0, count=1, myframe=0x7f9ba4b15b70,
fn=0x7f9bae80bd50 <__glusterd_brick_op_cbk>) at glusterd-rpc-ops.c:223
#8 0x00007f9bb3762a50 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f9ba46547a0, pollin=pollin@entry=0x7f9ba00c4170) at rpc-clnt.c:778
#9 0x00007f9bb3762da3 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f9ba46547d0, event=<optimized out>, data=0x7f9ba00c4170) at rpc-clnt.c:971
#10 0x00007f9bb375f313 in rpc_transport_notify (this=this@entry=0x7f9ba44f5190, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f9ba00c4170)
#11 0x00007f9baba70bb2 in socket_event_poll_in (this=this@entry=0x7f9ba44f5190, notify_handled=<optimized out>) at socket.c:2315
#12 0x00007f9baba73023 in socket_event_handler (fd=13, idx=7, gen=1, data=0x7f9ba44f5190, poll_in=<optimized out>, poll_out=0, poll_err=0) at socket.c:2467
#13 0x00007f9bb39ecad9 in event_dispatch_epoll_handler (event=0x7f9baadfae74, event_pool=0xeca570) at event-epoll.c:583
#14 event_dispatch_epoll_worker (data=0xf2ab00) at event-epoll.c:659
#15 0x00007f9bb27af50b in start_thread () from /lib64/libpthread.so.0
#16 0x00007f9bb205f16f in clone () from /lib64/libc.so.6
Turned out to be a brick multiplexing bug in glusterd.
When a volume is stopped, individual bricks get detached from the brick process through brick op phase in the volume stop transaction and in the brick-op callback the counter called "blockers" is decremented insider sync_lock & sync_unlock () section. Before sending the rpc_submit the same counter is incremented. The goal of this counter is to ensure at any given point of time, before originating a commit phase of a transaction this counter is set to 0. Now because of heketi distributes the volume stop command across different cli & glusterd, the processing of these volume stop requests become parallel and hence because of the additional context switches through sync_lock and sync_unlock for updating the blocker counter, the overall time of wait process goes up. Since RHGS 3.3.1 if any regular transactions take more than 180 seconds the lock timer is forcibly expired. Unfortunately in this situation some of the transactions are kicked out while they are still in progress. This is the reason why we see unlock timer callback log entries in glusterd log. I believe this also had a ripple effect on the saved frames which is why we see critical log like:
[2018-08-28 21:23:49.759511] C [rpc-clnt.c:449:rpc_clnt_fill_request_info] 0-management: cannot lookup the saved frame corresponding to xid (2170)
and this causes a disconnect to the original brick given the rpc connection for all the attached bricks are a copy of the parent brick connection.
The need of updating the blockers counter under synclock is an overkill. Updating it through GF_ATOMIC API should be sufficient for the atomicity guarantee.
upstream patch : https://review.gluster.org/#/c/glusterfs/+/21221/
REVIEW: https://review.gluster.org/21221 (glusterd: Use GF_ATOMIC to update 'blockers' counter at glusterd_conf) posted (#3) for review on master by Atin Mukherjee
COMMIT: https://review.gluster.org/21221 committed in master by "Atin Mukherjee" <firstname.lastname@example.org> with a commit message- glusterd: Use GF_ATOMIC to update 'blockers' counter at glusterd_conf
Currently in glusterd code uses sync_lock/sync_unlock to update blockers
counter which could add delays to the overall transaction phase
escpecially when there's a batch of volume stop operations processed by
glusterd in brick multiplexing mode.
Solution: Use GF_ATOMIC to update blocker counter to ensure unnecessary
context switching can be avoided.
Signed-off-by: Mohit Agrawal <email@example.com>
REVIEW: https://review.gluster.org/21242 (glusterd: don't wait for blockers flag for stop volume) posted (#2) for review on master by Atin Mukherjee
COMMIT: https://review.gluster.org/21242 committed in master by "Atin Mukherjee" <firstname.lastname@example.org> with a commit message- glusterd: don't wait for blockers flag for stop volume
With commit 4f6ae8 even though the overall transaction time for gluster
volume stop can be reduced, but based on testing it can't be guaranteed
that the transaction can finish in 3 minutes before the unlock timer
gets kicked in. The ground problem to this is the command serialization
which atomic field 'blockers' does for volume stop operation.
This patch removes that dependency for volume stop not to wait for
Signed-off-by: Atin Mukherjee <email@example.com>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.
glusterfs-6.0 has been announced on the Gluster mailinglists , packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist  and the update infrastructure for your distribution.