Bug 1431175 - volume start command hangs
Summary: volume start command hangs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Jeff Darcy
QA Contact:
URL:
Whiteboard: brick-multiplexing-testing
Depends On: 1421721
Blocks:
 
Reported: 2017-03-10 14:20 UTC by Jeff Darcy
Modified: 2017-04-05 00:02 UTC
3 users

Fixed In Version: glusterfs-3.10.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1421721
Environment:
Last Closed: 2017-04-05 00:02:14 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Jeff Darcy 2017-03-10 14:20:11 UTC
+++ This bug was initially created as a clone of Bug #1421721 +++

Description of problem:

If cluster.brick-multiplex is turned on and a volume is created and then stopped, and someone subsequently switches off brick multiplexing and starts the volume, the volume start command hangs with the following backtrace:

(gdb) t a a bt

Thread 8 (Thread 0x7f42977fe700 (LWP 21259)):
#0  0x00007f42a3ab4bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f42a4c87bfb in __synclock_lock (lock=lock@entry=0x7f42a4fac828) at syncop.c:924
#2  0x00007f42a4c8a7c6 in synclock_lock (lock=lock@entry=0x7f42a4fac828) at syncop.c:950
#3  0x00007f429fc1b65b in glusterd_big_locked_notify (rpc=0x7f4298001fc0, mydata=0x7f4298198730, 
    event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7f429fc18c20 <__glusterd_brick_rpc_notify>)
    at glusterd-handler.c:68
#4  0x00007f42a4a1fc14 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f4298002018, event=<optimized out>, 
    data=0x7f4298194800) at rpc-clnt.c:1004
#5  0x00007f42a4a1c223 in rpc_transport_notify (this=this@entry=0x7f4298194800, 
    event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7f4298194800) at rpc-transport.c:538
#6  0x00007f429d4f7d59 in socket_connect_finish (this=this@entry=0x7f4298194800) at socket.c:2344
#7  0x00007f429d4fc047 in socket_event_handler (fd=<optimized out>, idx=4, data=0x7f4298194800, poll_in=0, 
    poll_out=4, poll_err=0) at socket.c:2387
#8  0x00007f42a4ca621a in event_dispatch_epoll_handler (event=0x7f42977fde90, event_pool=0x1a67f20)
    at event-epoll.c:572
#9  event_dispatch_epoll_worker (data=0x1a88f50) at event-epoll.c:675
#10 0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#11 0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f4297fff700 (LWP 21258)):
#0  0x00007f42a3ab4bc0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f429fcc415b in hooks_worker (args=<optimized out>) at glusterd-hooks.c:529
#2  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#3  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f42a0780700 (LWP 21130)):
#0  0x00007f42a3ab4f69 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f42a4c89929 in syncenv_task (proc=proc@entry=0x1a70fc0) at syncop.c:603
#2  0x00007f42a4c8a500 in syncenv_processor (thdata=0x1a70fc0) at syncop.c:695
#3  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#4  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f42a0f81700 (LWP 21129)):
#0  0x00007f42a3ab7add in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f42a3ab19fd in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f42a4c614f5 in gf_timer_call_cancel (ctx=ctx@entry=0x1a31010, event=0x7f4280000920) at timer.c:98
#3  0x00007f42a4c58173 in gf_log_disable_suppression_before_exit (ctx=0x1a31010) at logging.c:428
#4  0x00007f42a4c5daf0 in gf_print_trace (signum=11, ctx=0x1a31010) at common-utils.c:714
#5  <signal handler called>
#6  0x00007f42a4c61481 in list_del (old=0x7f4280000a40) at list.h:76
#7  gf_timer_call_cancel (ctx=<optimized out>, event=0x7f4280000a40) at timer.c:103
#8  0x00007f429fc425fe in glusterd_volume_start_glusterfs (volinfo=volinfo@entry=0x7f428c01b120, 
    brickinfo=brickinfo@entry=0x7f428c011d40, wait=wait@entry=_gf_true) at glusterd-utils.c:1879
#9  0x00007f429fc44a24 in glusterd_brick_start (volinfo=volinfo@entry=0x7f428c01b120, 
    brickinfo=brickinfo@entry=0x7f428c011d40, wait=wait@entry=_gf_true) at glusterd-utils.c:5382
#10 0x00007f429fcb2f01 in glusterd_start_volume (volinfo=0x7f428c01b120, flags=<optimized out>, 
    wait=wait@entry=_gf_true) at glusterd-volume-ops.c:2508
#11 0x00007f429fcb3351 in glusterd_op_start_volume (dict=dict@entry=0x7f42981882e0, 
    op_errstr=op_errstr@entry=0x7f4288206ba8) at glusterd-volume-ops.c:2636
#12 0x00007f429fce9a1b in gd_mgmt_v3_commit_fn (op=op@entry=GD_OP_START_VOLUME, dict=dict@entry=0x7f42981882e0, 
    op_errstr=op_errstr@entry=0x7f4288206ba8, op_errno=op_errno@entry=0x7f4288206b9c, 
    rsp_dict=rsp_dict@entry=0x7f4298185ba0) at glusterd-mgmt.c:319
#13 0x00007f429fcec7f4 in glusterd_mgmt_v3_commit (op=op@entry=GD_OP_START_VOLUME, 
    op_ctx=op_ctx@entry=0x7f429819ac70, req_dict=0x7f42981882e0, op_errstr=op_errstr@entry=0x7f4288206ba8, 
    op_errno=op_errno@entry=0x7f4288206b9c, txn_generation=txn_generation@entry=0) at glusterd-mgmt.c:1524
#14 0x00007f429fcee360 in glusterd_mgmt_v3_initiate_all_phases (req=req@entry=0x7f4288003750, 
    op=op@entry=GD_OP_START_VOLUME, dict=0x7f429819ac70) at glusterd-mgmt.c:2141
#15 0x00007f429fcac532 in __glusterd_handle_cli_start_volume (req=req@entry=0x7f4288003750)
    at glusterd-volume-ops.c:549
#16 0x00007f429fc1b6cd in glusterd_big_locked_handler (req=0x7f4288003750, 
    actor_fn=0x7f429fcac2d0 <__glusterd_handle_cli_start_volume>) at glusterd-handler.c:81
#17 0x00007f42a4c87b10 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#18 0x00007f42a32ce2c0 in ?? () from /lib64/libc.so.6
#19 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f42a1782700 (LWP 21128)):
#0  0x00007f42a334e14d in nanosleep () from /lib64/libc.so.6
#1  0x00007f42a334e09a in sleep () from /lib64/libc.so.6
#2  0x00007f42a4c7927a in pool_sweeper (arg=<optimized out>) at mem-pool.c:464
#3  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#4  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f42a1f83700 (LWP 21127)):
#0  0x00007f42a3ab89d6 in sigwait () from /lib64/libpthread.so.0
#1  0x00000000004085c7 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2095
#2  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#3  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f42a2784700 (LWP 21126)):
#0  0x00007f42a3ab7add in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f42a3ab19fd in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f42a4c60f65 in gf_timer_proc (data=0x1a6e7c0) at timer.c:135
#3  0x00007f42a3aaf5ba in start_thread () from /lib64/libpthread.so.0
#4  0x00007f42a33887cd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f42a512e780 (LWP 21125)):
#0  0x00007f42a3ab06ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f42a4ca6668 in event_dispatch_epoll (event_pool=0x1a67f20) at event-epoll.c:759
#2  0x00000000004059b0 in main (argc=<optimized out>, argv=<optimized out>) at glusterfsd.c:2505

The above trace indicates a deadlock: thread 5 crashes in list_del() inside gf_timer_call_cancel() (timer.c:103), apparently while holding the timer-list lock, and the SIGSEGV handler's cleanup path (gf_print_trace -> gf_log_disable_suppression_before_exit) calls gf_timer_call_cancel() again and blocks trying to reacquire that same lock (timer.c:98); the remaining threads then wedge behind it.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Jeff Darcy on 2017-02-13 13:50:15 EST ---

Looks a bit racy.  I ran this sequence a few times and didn't see any failures.  This part of the trace (thread 5) seems most important:

#4  0x00007f42a4c5daf0 in gf_print_trace (signum=11, ctx=0x1a31010) at common-utils.c:714
#5  <signal handler called>

So we're probably looking at some sort of list/memory corruption.  Is it possible that the rpc isn't completely valid (even though the pointer is non-null) or that the timer has already been removed from its list?
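
To make the corruption hypothesis concrete, here is a small standalone illustration (minimal kernel-style list code, not the actual list.h from the tree) of what goes wrong when the same node is unlinked twice with a plain list_del() while the list changes in between; in the real crash the first cancel also frees the timer, so the second cancel walks freed memory:

/* Illustration only, not GlusterFS code: a minimal kernel-style list
 * showing why unlinking the same node twice with a plain list_del()
 * corrupts the list once it has changed in between. */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_add (struct list_head *item, struct list_head *head)
{
        item->next = head->next;
        item->prev = head;
        head->next->prev = item;
        head->next = item;
}

static void list_del (struct list_head *old)
{
        /* plain list_del(): unlink, but old->next/prev stay dangling */
        old->prev->next = old->next;
        old->next->prev = old->prev;
}

int main (void)
{
        struct list_head head = { &head, &head };
        struct list_head a, b, c;

        list_add (&b, &head);   /* head <-> b                        */
        list_add (&a, &head);   /* head <-> a <-> b                  */

        list_del (&a);          /* first cancel: fine                */
        list_add (&c, &head);   /* head <-> c <-> b                  */
        list_del (&a);          /* second cancel: follows a's stale
                                   pointers, rewires head and b
                                   directly, and silently drops c    */

        printf ("head.next == &c? %s\n",
                head.next == &c ? "yes" : "no (list corrupted)");
        return 0;
}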

--- Additional comment from Worker Ant on 2017-02-16 17:01:39 EST ---

REVIEW: https://review.gluster.org/16650 (tests: add test for toggling MPX and restarting a volume) posted (#1) for review on master by Jeff Darcy (jdarcy)

--- Additional comment from Jeff Darcy on 2017-02-16 17:04:48 EST ---

I have tested this more than a hundred times, with zero failures.  You can see the script here.

https://review.gluster.org/#/c/16650/

Is there something else you did that's missing from the script?

--- Additional comment from Jeff Darcy on 2017-02-16 21:20:19 EST ---

There is one place in the multiplexing code where I added a call to gf_timer_call_cancel.  It's in glusterd_volume_start_glusterfs, which is part of the stack above.  Suspiciously, this is done without taking conn->lock, as is done in other places (e.g. rpc_clnt_reconnect_cleanup).  This fits with the theory that it's a race.  In gf_timer_call_cancel we call list_del instead of list_del_init, and there seems to be no other protection against being called twice, so I suspect that when we hit the race (because of the missing lock) we corrupt the list.  Unfortunately, since the test already passes consistently for me, I won't be able to tell whether the fix has any effect.
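
For reference, a minimal sketch of the locking pattern such a fix would apply around the cancel (assuming the usual rpc_clnt_connection_t fields, pthread_mutex_t lock and gf_timer_t *reconnect; the helper name is hypothetical and this is not the actual patch):

/* Hedged sketch, not the verbatim patch: cancel the reconnect timer
 * under conn->lock, the same way rpc_clnt_reconnect_cleanup() does.
 * Assumes glusterfs' internal headers and the usual
 * rpc_clnt_connection_t fields; cancel_reconnect_timer() is a
 * hypothetical helper name. */
#include "rpc-clnt.h"
#include "timer.h"

static void
cancel_reconnect_timer (glusterfs_ctx_t *ctx, struct rpc_clnt *rpc)
{
        rpc_clnt_connection_t *conn = &rpc->conn;

        pthread_mutex_lock (&conn->lock);
        {
                if (conn->reconnect) {
                        /* gf_timer_call_cancel() does a bare list_del(),
                         * so two concurrent cancels of the same timer
                         * corrupt the timer list; conn->lock serializes
                         * this cancel with the reconnect machinery. */
                        gf_timer_call_cancel (ctx, conn->reconnect);
                        conn->reconnect = NULL;
                }
        }
        pthread_mutex_unlock (&conn->lock);
}

The point is simply that cancelling and clearing conn->reconnect happen atomically with respect to the other code paths that take conn->lock before touching the same timer.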

--- Additional comment from Worker Ant on 2017-02-17 09:44:55 EST ---

REVIEW: https://review.gluster.org/16662 (glusterd: take conn->lock around operations on conn->reconnect) posted (#1) for review on master by Jeff Darcy (jdarcy)

--- Additional comment from Worker Ant on 2017-02-17 10:00:12 EST ---

REVIEW: https://review.gluster.org/16662 (glusterd: take conn->lock around operations on conn->reconnect) posted (#2) for review on master by Jeff Darcy (jdarcy)

--- Additional comment from Worker Ant on 2017-02-18 01:59:14 EST ---

COMMIT: https://review.gluster.org/16662 committed in master by Atin Mukherjee (amukherj) 
------
commit 4e0d4b15717da1f6466133158a26927fb91384b8
Author: Jeff Darcy <jdarcy>
Date:   Fri Feb 17 09:42:46 2017 -0500

    glusterd: take conn->lock around operations on conn->reconnect
    
    Failure to do this could lead to a race in which a timer would be
    removed twice concurrently, corrupting the timer list (because
    gf_timer_call_cancel has no internal protection against this) and
    possibly causing a crash.
    
    Change-Id: Ic1a8b612d436daec88fd6cee935db0ae81a47d5c
    BUG: 1421721
    Signed-off-by: Jeff Darcy <jdarcy>
    Reviewed-on: https://review.gluster.org/16662
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>

Comment 1 Worker Ant 2017-03-10 14:27:52 UTC
REVIEW: https://review.gluster.org/16885 (glusterd: take conn->lock around operations on conn->reconnect) posted (#1) for review on release-3.10 by Jeff Darcy (jdarcy)

Comment 2 Worker Ant 2017-03-10 19:48:38 UTC
COMMIT: https://review.gluster.org/16885 committed in release-3.10 by Shyamsundar Ranganathan (srangana) 
------
commit 4ee59efe3826745a5359bd36e79957d3b78ec4ba
Author: Jeff Darcy <jdarcy>
Date:   Fri Feb 17 09:42:46 2017 -0500

    glusterd: take conn->lock around operations on conn->reconnect
    
    Failure to do this could lead to a race in which a timer would be
    removed twice concurrently, corrupting the timer list (because
    gf_timer_call_cancel has no internal protection against this) and
    possibly causing a crash.
    
    Backport of:
    > 4e0d4b15717da1f6466133158a26927fb91384b8
    > BUG: 1421721
    > Reviewed-on: https://review.gluster.org/16662
    
    Change-Id: Ic1a8b612d436daec88fd6cee935db0ae81a47d5c
    BUG: 1431175
    Signed-off-by: Jeff Darcy <jdarcy>
    Reviewed-on: https://review.gluster.org/16885
    Reviewed-by: Atin Mukherjee <amukherj>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 3 Shyamsundar 2017-04-05 00:02:14 UTC
This bug is being closed because a release is now available that should address the reported issue. If the problem is still not fixed with glusterfs-3.10.1, please open a new bug report.

glusterfs-3.10.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-April/030494.html
[2] https://www.gluster.org/pipermail/gluster-users/

