Bug 1421721
| Field | Value |
| --- | --- |
| Summary | volume start command hangs |
| Product | [Community] GlusterFS |
| Reporter | Atin Mukherjee <amukherj> |
| Component | glusterd |
| Assignee | Jeff Darcy <jeff> |
| Status | CLOSED CURRENTRELEASE |
| QA Contact | |
| Severity | high |
| Docs Contact | |
| Priority | high |
| Version | mainline |
| CC | amukherj, bugs, jdarcy |
| Target Milestone | --- |
| Keywords | Triaged |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | brick-multiplexing-testing |
| Fixed In Version | glusterfs-3.11.0 |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| | 1431175 (view as bug list) |
| Environment | |
| Last Closed | 2017-05-30 18:42:26 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1431175 |
Description (Atin Mukherjee, 2017-02-13 14:16:49 UTC) and follow-up comments
Looks a bit racy. I ran this sequence a few times and didn't see any failures. This part of the trace (thread 3) seems most important:

    #4  0x00007f42a4c5daf0 in gf_print_trace (signum=11, ctx=0x1a31010) at common-utils.c:714
    #5  <signal handler called>

So we're probably looking at some sort of list/memory corruption. Is it possible that the rpc isn't completely valid (even though the pointer is non-null), or that the timer has already been removed from its list?

REVIEW: https://review.gluster.org/16650 (tests: add test for toggling MPX and restarting a volume) posted (#1) for review on master by Jeff Darcy (jdarcy)

I have tested this more than a hundred times, with zero failures. You can see the script here: https://review.gluster.org/#/c/16650/ Is there something else that you did that's missing from the script?

There is one place in the multiplexing code where I added a call to gf_timer_call_cancel. It's in glusterd_volume_start_glusterfs, which is part of the stack above. Suspiciously, this is done without holding conn->lock, unlike other places (e.g. rpc_clnt_reconnect_cleanup). This fits with the theory that it's a race. In gf_timer_call_cancel we call list_del instead of list_del_init, and there seems to be no other protection against being called twice, so I suspect that when we hit the race (because of the missing lock) we corrupt the list. Unfortunately, since the test already passes consistently for me, I won't be able to test whether the fix has any effect.

REVIEW: https://review.gluster.org/16662 (glusterd: take conn->lock around operations on conn->reconnect) posted (#1) for review on master by Jeff Darcy (jdarcy)

REVIEW: https://review.gluster.org/16662 (glusterd: take conn->lock around operations on conn->reconnect) posted (#2) for review on master by Jeff Darcy (jdarcy)

COMMIT: https://review.gluster.org/16662 committed in master by Atin Mukherjee (amukherj)

    commit 4e0d4b15717da1f6466133158a26927fb91384b8
    Author: Jeff Darcy <jdarcy>
    Date:   Fri Feb 17 09:42:46 2017 -0500

        glusterd: take conn->lock around operations on conn->reconnect

        Failure to do this could lead to a race in which a timer would be
        removed twice concurrently, corrupting the timer list (because
        gf_timer_call_cancel has no internal protection against this) and
        possibly causing a crash.

        Change-Id: Ic1a8b612d436daec88fd6cee935db0ae81a47d5c
        BUG: 1421721
        Signed-off-by: Jeff Darcy <jdarcy>
        Reviewed-on: https://review.gluster.org/16662
        Smoke: Gluster Build System <jenkins.org>
        NetBSD-regression: NetBSD Build System <jenkins.org>
        CentOS-regression: Gluster Build System <jenkins.org>
        Reviewed-by: Atin Mukherjee <amukherj>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/
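To make the suspected failure mode concrete, here is a minimal, self-contained sketch of the pattern discussed in the comments above. It is not GlusterFS code: the list helpers, the conn_lock mutex, the reconnect_pending flag, and the cancel_reconnect_* functions are illustrative stand-ins for the timer list, conn->lock, conn->reconnect, and the cancel path through gf_timer_call_cancel. It shows why an unlocked cancel that uses a plain list_del can corrupt the list when two threads race, and why serializing on the connection lock (here combined with an idempotent unlink) avoids that.

```c
/* Minimal sketch of the race described above (not actual GlusterFS code).
 * Compile with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>

/* Kernel-style intrusive doubly-linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Unlink a node but leave its pointers at the old neighbours. A second
 * delete of the same node rewrites memory through those stale pointers,
 * which is the kind of corruption suspected in gf_timer_call_cancel. */
static void list_del(struct list_head *e) {
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Unlink and re-point the node at itself, so a repeated delete is a no-op. */
static void list_del_init(struct list_head *e) {
    list_del(e);
    list_init(e);
}

/* Illustrative stand-ins for conn->lock and the pending reconnect timer. */
static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;
static struct list_head timer_list;
static struct list_head reconnect_timer;
static int reconnect_pending = 1;

/* The buggy shape: no lock, plain list_del. Two threads can both pass the
 * check and both unlink the same node. */
static void cancel_reconnect_unsafe(void) {
    if (reconnect_pending) {
        list_del(&reconnect_timer);
        reconnect_pending = 0;
    }
}

/* Roughly the shape of the fix: serialize on the connection lock so the
 * timer is removed from the list at most once. */
static void cancel_reconnect_safe(void) {
    pthread_mutex_lock(&conn_lock);
    if (reconnect_pending) {
        list_del_init(&reconnect_timer);
        reconnect_pending = 0;
    }
    pthread_mutex_unlock(&conn_lock);
}

int main(void) {
    /* Queue the "reconnect" timer as the only entry on the timer list. */
    list_init(&timer_list);
    list_init(&reconnect_timer);
    reconnect_timer.next = &timer_list;
    reconnect_timer.prev = &timer_list;
    timer_list.next = &reconnect_timer;
    timer_list.prev = &reconnect_timer;

    (void) cancel_reconnect_unsafe;   /* shown only for contrast */

    cancel_reconnect_safe();
    cancel_reconnect_safe();          /* harmless second cancel */
    printf("timer list intact: %s\n",
           timer_list.next == &timer_list ? "yes" : "no");
    return 0;
}
```

Run single-threaded this just prints "timer list intact: yes"; the point is the contrast between the two cancel functions. Note that the actual fix in https://review.gluster.org/16662 only added the locking around operations on conn->reconnect; the list_del_init in this sketch is extra hardening included here for illustration.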