Bug 1037849 - glusterd hangs on big lock
Summary: glusterd hangs on big lock
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1037851
Reported: 2013-12-03 22:19 UTC by Anand Avati
Modified: 2015-11-03 23:05 UTC

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1037851 (view as bug list)
Environment:
Last Closed: 2014-04-17 11:51:54 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Anand Avati 2013-12-03 22:19:46 UTC
Description of problem:

After simulating a network failure on a server (with iptables) and then restoring connectivity, glusterd sometimes hangs indefinitely.

Version-Release number of selected component (if applicable):


How reproducible:

sometimes


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

GDB on hung glusterd:

Thread 6 (Thread 0x7fb7ce3c0700 (LWP 29387)):
#0  0x0000003928a0f2a5 in sigwait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x000000000040532b in glusterfs_sigwaiter ()
No symbol table info available.
#2  0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 5 (Thread 0x7fb7cd9bf700 (LWP 29388)):
#0  0x0000003928a0b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00000038e304921f in syncenv_task () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00000038e304d7c0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3  0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 4 (Thread 0x7fb7ccfbe700 (LWP 29389)):
#0  0x0000003928a0b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00000038e304921f in syncenv_task () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00000038e304d7c0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3  0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 3 (Thread 0x7fb7cbad7700 (LWP 29665)):
#0  0x0000003928a0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00000038e30490ab in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00000038e304913e in synclock_lock () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3  0x00007fb7cc349a81 in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#4  0x00000038e340d6f3 in ?? () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#5  0x00000038e302bf30 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#6  0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#7  0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 2 (Thread 0x7fb7cb0d6700 (LWP 29666)):
#0  0x0000003928a0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fb7cc387183 in ?? () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#2  0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7fb7cf95b700 (LWP 29386)):
#0  0x0000003928a0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00000038e30490ab in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2  0x00000038e304913e in synclock_lock () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3  0x00007fb7cc349a81 in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#4  0x00000038e340cf76 in rpc_clnt_submit () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#5  0x00007fb7cc3396c4 in glusterd_submit_request () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#6  0x00007fb7cc348e54 in glusterd_cluster_unlock () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#7  0x00007fb7cc325bc9 in ?? () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#8  0x00007fb7cc329361 in glusterd_op_sm () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#9  0x00007fb7cc34b6c1 in __glusterd_cluster_lock_cbk () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#10 0x00007fb7cc349a90 in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#11 0x00000038e340da5f in saved_frames_unwind () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#12 0x00000038e340db4e in saved_frames_destroy () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#13 0x00000038e340dc33 in rpc_clnt_connection_cleanup () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#14 0x00000038e340e0f4 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#15 0x00000038e3409918 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#16 0x00007fb7cc0825f1 in ?? () from /usr/lib64/glusterfs/3.4.0.44geo1/rpc-transport/socket.so
No symbol table info available.
#17 0x00000038e305fd37 in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#18 0x0000000000406988 in main ()
No symbol table info available.

Comment 2 Anand Avati 2013-12-03 22:29:38 UTC
REVIEW: http://review.gluster.org/6413 (glusterd: submit RPC requests without holding big lock) posted (#1) for review on master by Anand Avati (avati)

Comment 3 Anand Avati 2013-12-03 22:53:31 UTC
REVIEW: http://review.gluster.org/6413 (glusterd: submit RPC requests without holding big lock) posted (#2) for review on master by Anand Avati (avati)

Comment 4 Anand Avati 2013-12-03 23:02:34 UTC
REVIEW: http://review.gluster.org/6414 (glusterd: submit RPC requests without holding big lock) posted (#1) for review on release-3.4 by Anand Avati (avati)

Comment 5 Anand Avati 2013-12-03 23:03:01 UTC
REVIEW: http://review.gluster.org/6415 (glusterd: submit RPC requests without holding big lock) posted (#1) for review on release-3.5 by Anand Avati (avati)

Comment 6 Anand Avati 2013-12-04 06:37:14 UTC
COMMIT: http://review.gluster.org/6413 committed in master by Anand Avati (avati) 
------
commit ae540f8e2732ab1bd0fbeabd4d4f5c6f2f417914
Author: Anand Avati <avati>
Date:   Wed Nov 27 05:09:57 2013 -0800

    glusterd: submit RPC requests without holding big lock
    
    If the endpoint of an RPC is not connected, the callback is called
    synchronously within rpc_clnt_submit(). Since callbacks typically
    hold the big lock, give up the big lock before calling rpc_clnt_submit
    and acquire it freshly after the call.
    
    Change-Id: Id89d8dd86c1a4012739ef4af7ea0935492b1a02b
    BUG: 1037849
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6413
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
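
For illustration, here is a minimal sketch of the reentrancy described in the commit message above. It is not the actual glusterd code. It assumes a plain pthread mutex in place of glusterd's synclock, and the names big_lock, big_locked_cbk, toy_rpc_submit, submit_holding_lock and submit_dropping_lock are hypothetical. An error-checking mutex is used so that the self-deadlock shows up as EDEADLK instead of a hang.

```c
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK under strict modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for glusterd's big lock (a synclock in the real
 * daemon). An error-checking mutex returns EDEADLK when its owner tries
 * to lock it again, so the sketch reports the problem instead of hanging. */
static pthread_mutex_t big_lock;

typedef int (*rpc_cbk_t)(void);

static void big_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&big_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
    (void)rc;
}

/* Callback wrapper: takes the big lock around the real callback body. */
static int big_locked_cbk(void)
{
    int ret = pthread_mutex_lock(&big_lock);
    if (ret != 0)
        return ret; /* EDEADLK: this thread already owns the big lock */
    /* ... real callback work would run here ... */
    pthread_mutex_unlock(&big_lock);
    return 0;
}

/* Toy submit path: when the peer is not connected, the callback is
 * invoked synchronously, on the caller's stack. */
static int toy_rpc_submit(bool connected, rpc_cbk_t cbk)
{
    if (!connected)
        return cbk();
    return 0; /* connected: the callback would run later, elsewhere */
}

/* Pre-fix pattern: submit while still holding the big lock. */
static int submit_holding_lock(bool connected)
{
    int ret;

    pthread_mutex_lock(&big_lock);
    ret = toy_rpc_submit(connected, big_locked_cbk);
    pthread_mutex_unlock(&big_lock);
    return ret;
}

/* Post-fix pattern: give up the big lock before submitting and acquire
 * it freshly after the call. */
static int submit_dropping_lock(bool connected)
{
    int ret;

    pthread_mutex_lock(&big_lock);
    /* ... work that needs the big lock ... */
    pthread_mutex_unlock(&big_lock);

    ret = toy_rpc_submit(connected, big_locked_cbk);

    pthread_mutex_lock(&big_lock);
    /* ... continue under the big lock ... */
    pthread_mutex_unlock(&big_lock);
    return ret;
}
```

Under these assumptions, after big_lock_init(), submit_holding_lock(false) returns EDEADLK while submit_dropping_lock(false) returns 0. In the real daemon the second acquisition does not fail but blocks in synclock_lock(), as seen in Thread 1 of the backtrace.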

Comment 7 Anand Avati 2013-12-04 08:35:31 UTC
COMMIT: http://review.gluster.org/6414 committed in release-3.4 by Anand Avati (avati) 
------
commit c23f35f7ad28b03b3ce5a530c7453bc9f5b7bc05
Author: Anand Avati <avati>
Date:   Wed Nov 27 05:09:57 2013 -0800

    glusterd: submit RPC requests without holding big lock
    
    If the endpoint of an RPC is not connected, the callback is called
    synchronously within rpc_clnt_submit(). Since callbacks typically
    hold the big lock, give up the big lock before calling rpc_clnt_submit
    and acquire it freshly after the call.
    
    Change-Id: Id89d8dd86c1a4012739ef4af7ea0935492b1a02b
    BUG: 1037849
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6414
    Tested-by: Gluster Build System <jenkins.com>

Comment 9 Anand Avati 2013-12-05 18:05:29 UTC
COMMIT: http://review.gluster.org/6415 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit e3873729d820c0c2e63bb3bb878c39d79a16acf5
Author: Anand Avati <avati>
Date:   Wed Nov 27 05:09:57 2013 -0800

    glusterd: submit RPC requests without holding big lock
    
    If the endpoint of an RPC is not connected, the callback is called
    synchronously within rpc_clnt_submit(). Since callbacks typically
    hold the big lock, give up the big lock before calling rpc_clnt_submit
    and acquire it freshly after the call.
    
    Change-Id: Id89d8dd86c1a4012739ef4af7ea0935492b1a02b
    BUG: 1037849
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6415
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Krishnan Parthasarathi <kparthas>

Comment 10 Niels de Vos 2014-04-17 11:51:54 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

