Description of problem:
On simulating a network failure (with iptables) on a server and recovering, glusterd sometimes hangs indefinitely

Version-Release number of selected component (if applicable):

How reproducible: sometimes

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
GDB on hung glusterd:

Thread 6 (Thread 0x7fb7ce3c0700 (LWP 29387)):
#0 0x0000003928a0f2a5 in sigwait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x000000000040532b in glusterfs_sigwaiter ()
No symbol table info available.
#2 0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3 0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 5 (Thread 0x7fb7cd9bf700 (LWP 29388)):
#0 0x0000003928a0b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00000038e304921f in syncenv_task () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2 0x00000038e304d7c0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3 0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 4 (Thread 0x7fb7ccfbe700 (LWP 29389)):
#0 0x0000003928a0b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00000038e304921f in syncenv_task () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2 0x00000038e304d7c0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3 0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 3 (Thread 0x7fb7cbad7700 (LWP 29665)):
#0 0x0000003928a0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00000038e30490ab in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2 0x00000038e304913e in synclock_lock () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3 0x00007fb7cc349a81 in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#4 0x00000038e340d6f3 in ?? () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#5 0x00000038e302bf30 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#6 0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#7 0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 2 (Thread 0x7fb7cb0d6700 (LWP 29666)):
#0 0x0000003928a0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007fb7cc387183 in ?? () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#2 0x0000003928a07851 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#3 0x00000039282e894d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 0x7fb7cf95b700 (LWP 29386)):
#0 0x0000003928a0b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00000038e30490ab in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#2 0x00000038e304913e in synclock_lock () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#3 0x00007fb7cc349a81 in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#4 0x00000038e340cf76 in rpc_clnt_submit () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#5 0x00007fb7cc3396c4 in glusterd_submit_request () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#6 0x00007fb7cc348e54 in glusterd_cluster_unlock () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#7 0x00007fb7cc325bc9 in ?? () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#8 0x00007fb7cc329361 in glusterd_op_sm () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#9 0x00007fb7cc34b6c1 in __glusterd_cluster_lock_cbk () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#10 0x00007fb7cc349a90 in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.4.0.44geo1/xlator/mgmt/glusterd.so
No symbol table info available.
#11 0x00000038e340da5f in saved_frames_unwind () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#12 0x00000038e340db4e in saved_frames_destroy () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#13 0x00000038e340dc33 in rpc_clnt_connection_cleanup () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#14 0x00000038e340e0f4 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#15 0x00000038e3409918 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
No symbol table info available.
#16 0x00007fb7cc0825f1 in ?? () from /usr/lib64/glusterfs/3.4.0.44geo1/rpc-transport/socket.so
No symbol table info available.
#17 0x00000038e305fd37 in ?? () from /usr/lib64/libglusterfs.so.0
No symbol table info available.
#18 0x0000000000406988 in main ()
No symbol table info available.
A debugging session is active.
REVIEW: http://review.gluster.org/6413 (glusterd: submit RPC requests without holding big lock) posted (#1) for review on master by Anand Avati (avati)
REVIEW: http://review.gluster.org/6413 (glusterd: submit RPC requests without holding big lock) posted (#2) for review on master by Anand Avati (avati)
REVIEW: http://review.gluster.org/6414 (glusterd: submit RPC requests without holding big lock) posted (#1) for review on release-3.4 by Anand Avati (avati)
REVIEW: http://review.gluster.org/6415 (glusterd: submit RPC requests without holding big lock) posted (#1) for review on release-3.5 by Anand Avati (avati)
COMMIT: http://review.gluster.org/6413 committed in master by Anand Avati (avati)
------
commit ae540f8e2732ab1bd0fbeabd4d4f5c6f2f417914
Author: Anand Avati <avati>
Date: Wed Nov 27 05:09:57 2013 -0800

    glusterd: submit RPC requests without holding big lock

    If the endpoint of an RPC is not connected, the callback is called
    synchronously within rpc_clnt_submit(). Since callbacks typically hold
    the big lock, give up the big lock before calling rpc_clnt_submit and
    acquire it freshly after the call.

    Change-Id: Id89d8dd86c1a4012739ef4af7ea0935492b1a02b
    BUG: 1037849
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6413
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
COMMIT: http://review.gluster.org/6414 committed in release-3.4 by Anand Avati (avati)
------
commit c23f35f7ad28b03b3ce5a530c7453bc9f5b7bc05
Author: Anand Avati <avati>
Date: Wed Nov 27 05:09:57 2013 -0800

    glusterd: submit RPC requests without holding big lock

    If the endpoint of an RPC is not connected, the callback is called
    synchronously within rpc_clnt_submit(). Since callbacks typically hold
    the big lock, give up the big lock before calling rpc_clnt_submit and
    acquire it freshly after the call.

    Change-Id: Id89d8dd86c1a4012739ef4af7ea0935492b1a02b
    BUG: 1037849
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6414
    Tested-by: Gluster Build System <jenkins.com>
COMMIT: http://review.gluster.org/6415 committed in release-3.5 by Vijay Bellur (vbellur)
------
commit e3873729d820c0c2e63bb3bb878c39d79a16acf5
Author: Anand Avati <avati>
Date: Wed Nov 27 05:09:57 2013 -0800

    glusterd: submit RPC requests without holding big lock

    If the endpoint of an RPC is not connected, the callback is called
    synchronously within rpc_clnt_submit(). Since callbacks typically hold
    the big lock, give up the big lock before calling rpc_clnt_submit and
    acquire it freshly after the call.

    Change-Id: Id89d8dd86c1a4012739ef4af7ea0935492b1a02b
    BUG: 1037849
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/6415
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Krishnan Parthasarathi <kparthas>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user