Description of problem: ======================= NFS mount is hung while running plain IO. This looked to be malloc/free deadlock. Backtrace: ========== (gdb) bt #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 #1 0x00007f41268b7588 in _L_lock_854 () from /lib64/libpthread-2.12.so #2 0x00007f41268b7457 in __pthread_mutex_lock (mutex=0x7f412859adf0) at pthread_mutex_lock.c:61 #3 0x00007f4127804942 in iobuf_get2 (iobuf_pool=0x7f412859adf0, page_size=8192) at iobuf.c:662 #4 0x00007f411a0a8bca in client3_3_fxattrop (frame=0x7f4125388000, this=0x7f4114026340, data=0x7f411ed09b40) at client-rpc-fops.c:5279 #5 0x00007f411a09a6a8 in client_fxattrop (frame=0x7f4125388000, this=<value optimized out>, fd=<value optimized out>, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>) at client.c:1429 #6 0x00007f4119e3e36a in ec_wind_fxattrop (ec=0x7f41140b9160, fop=0x7f4118087970, idx=0) at ec-generic.c:1615 #7 0x00007f4119e3b0ed in ec_dispatch_mask (fop=0x7f4118087970, mask=63) at ec-common.c:550 #8 0x00007f4119e3fa45 in ec_manager_xattrop (fop=0x7f4118087970, state=<value optimized out>) at ec-generic.c:1434 #9 0x00007f4119e39d47 in __ec_manager (fop=0x7f4118087970, error=<value optimized out>) at ec-common.c:1937 #10 0x00007f4119e3a272 in ec_update_size_version (link=0x7f411809d7d4, version=<value optimized out>, size=0, dirty=<value optimized out>) at ec-common.c:1612 #11 0x00007f4119e3a41d in ec_update_info (link=<value optimized out>) at ec-common.c:1661 #12 0x00007f4119e3cad8 in ec_unlock_now (link=0x7f411809d7d4) at ec-common.c:1674 #13 0x00007f41277ec713 in gf_timer_proc (ctx=0x7f4128598010) at timer.c:194 #14 0x00007f41268b5a51 in start_thread (arg=0x7f411ed0a700) at pthread_create.c:301 #15 0x00007f412621f96d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 (gdb) t a a bt Thread 8 (Thread 0x7f4127c79740 (LWP 32059)): #0 0x00007f41268b62ad in pthread_join (threadid=139917599262464, thread_return=0x0) at pthread_join.c:89 #1 0x00007f412782e53d in event_dispatch_epoll (event_pool=0x7f41285b6c90) at event-epoll.c:762 #2 0x00007f4127c95ef1 in main (argc=11, argv=0x7fffed97d678) at glusterfsd.c:2333 Thread 7 (Thread 0x7f410bfff700 (LWP 32110)): #0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97 #1 0x00007f41261b3f7e in _L_lock_5746 () from /lib64/libc-2.12.so #2 0x00007f41261afa8b in _int_free (av=0x7f41264c6e80, p=0x7f4128601710, have_lock=0) at malloc.c:4967 #3 0x00007f4127803ab5 in __iobuf_put (iobuf=0x7f4117e40550, iobuf_arena=0x7f41285b6b70) at iobuf.c:735 #4 0x00007f4127803b59 in iobuf_put (iobuf=0x7f4117e40550) at iobuf.c:785 #5 0x00007f412759639f in rpc_transport_pollin_destroy ( pollin=0x7f4107a99340) at rpc-transport.c:132 #6 0x00007f411c2fb274 in socket_event_poll_in (this=0x7f4114001760) at socket.c:2295 #7 0x00007f411c2fce4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f4114001760, poll_in=1, poll_out=0, poll_err=0) at socket.c:2403 #8 0x00007f412782ea90 in event_dispatch_epoll_handler (data=0x7f411408d650) at event-epoll.c:575 #9 event_dispatch_epoll_worker (data=0x7f411408d650) at event-epoll.c:678 #10 0x00007f41268b5a51 in start_thread (arg=0x7f410bfff700) at pthread_create.c:301 #11 0x00007f412621f96d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 6 (Thread 0x7f411133b700 (LWP 32109)): #0 0x00007f4126216143 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f412624d010 in svc_run () at svc_run.c:84 #2 0x00007f411935bb34 in nsm_thread (argv=<value optimized out>) at nlmcbk_svc.c:121 #3 0x00007f41268b5a51 in start_thread (arg=0x7f411133b700) at pthread_create.c:301 #4 0x00007f412621f96d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 5 (Thread 0x7f411acd4700 (LWP 32064)): #0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97 #1 0x00007f41261b4313 in _L_lock_10110 () at hooks.c:129 #2 0x00007f41261b1bbf in __libc_malloc (bytes=1048648) at malloc.c:3651 #3 0x00007f41278016e6 in __gf_malloc (size=1048576, type=45, typestr=0x7f4119e79cb5 "gf_common_mt_char") at mem-pool.c:142 #4 0x00007f4119e50e8c in ec_readv_rebuild (ec=0x7f41140bb230, fop=0x7f410b301464, cbk=0x7f410abf6d60) at ec-inode-read.c:1197 #5 0x00007f4119e513e2 in ec_manager_readv (fop=0x7f410b301464, state=<value optimized out>) at ec-inode-read.c:1454 #6 0x00007f4119e39d47 in __ec_manager (fop=0x7f410b301464, error=<value optimized out>) at ec-common.c:1937 #7 0x00007f4119e39b61 in ec_resume (fop=0x7f410b301464, error=0) ---Type <return> to continue, or q <return> to quit--- at ec-common.c:307 #8 0x00007f4119e3b6d4 in ec_complete (fop=0x7f410b301464) at ec-common.c:380 #9 0x00007f4119e52cfe in ec_readv_cbk (frame=<value optimized out>, cookie=<value optimized out>, this=0x7f411402d230, op_ret=262144, op_errno=<value optimized out>, vector=0x7f411acd3910, count=1, stbuf=0x7f411acd3ab0, iobref=0x7f4107ab1100, xdata=0x7f4124de2b1c) at ec-inode-read.c:1393 #10 0x00007f411a0bb3c9 in client3_3_readv_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f41253e212c) at client-rpc-fops.c:3052 #11 0x00007f4127599445 in rpc_clnt_handle_reply (clnt=0x7f411422e700, pollin=0x7f4117f0e6a0) at rpc-clnt.c:766 #12 0x00007f412759a8f2 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7f411422e730, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:894 #13 0x00007f4127595ad8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:543 #14 0x00007f411c2fb255 in socket_event_poll_in (this=0x7f411423e370) at socket.c:2290 #15 0x00007f411c2fce4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f411423e370, poll_in=1, poll_out=0, poll_err=0) at socket.c:2403 #16 0x00007f412782ea90 in event_dispatch_epoll_handler (data=0x7f41286006b0) at event-epoll.c:575 #17 event_dispatch_epoll_worker (data=0x7f41286006b0) at event-epoll.c:678 #18 0x00007f41268b5a51 in start_thread (arg=0x7f411acd4700) at pthread_create.c:301 #19 0x00007f412621f96d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 4 (Thread 0x7f411cf07700 (LWP 32063)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239 #1 0x00007f4127811d6b in syncenv_task (proc=0x7f41285c6170) at syncop.c:607 #2 0x00007f4127816b80 in syncenv_processor (thdata=0x7f41285c6170) at syncop.c:699 #3 0x00007f41268b5a51 in start_thread (arg=0x7f411cf07700) at pthread_create.c:301 #4 0x00007f412621f96d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 3 (Thread 0x7f411d908700 (LWP 32062)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239 #1 0x00007f4127811d6b in syncenv_task (proc=0x7f41285c5db0) at syncop.c:607 #2 0x00007f4127816b80 in syncenv_processor (thdata=0x7f41285c5db0) at syncop.c:699 #3 0x00007f41268b5a51 in start_thread (arg=0x7f411d908700) at pthread_create.c:301 #4 0x00007f412621f96d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 2 (Thread 0x7f411e309700 (LWP 32061)): #0 do_sigwait (set=<value optimized out>, sig=0x7f411e308e4c) at ../sysdeps/unix/sysv/linux/sigwait.c:65 #1 __sigwait (set=<value optimized out>, sig=0x7f411e308e4c) at ../sysdeps/unix/sysv/linux/sigwait.c:100 #2 0x00007f4127c9402b in glusterfs_sigwaiter (arg=<value optimized out>) ---Type <return> to continue, or q <return> to quit--- at glusterfsd.c:1989 #3 0x00007f41268b5a51 in start_thread (arg=0x7f411e309700) at pthread_create.c:301 #4 0x00007f412621f96d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 1 (Thread 0x7f411ed0a700 (LWP 32060)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 #1 0x00007f41268b7588 in _L_lock_854 () from /lib64/libpthread-2.12.so #2 0x00007f41268b7457 in __pthread_mutex_lock (mutex=0x7f412859adf0) at pthread_mutex_lock.c:61 #3 0x00007f4127804942 in iobuf_get2 (iobuf_pool=0x7f412859adf0, page_size=8192) at iobuf.c:662 #4 0x00007f411a0a8bca in client3_3_fxattrop (frame=0x7f4125388000, this=0x7f4114026340, data=0x7f411ed09b40) at client-rpc-fops.c:5279 #5 0x00007f411a09a6a8 in client_fxattrop (frame=0x7f4125388000, this=<value optimized out>, fd=<value optimized out>, flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, xdata=<value optimized out>) at client.c:1429 #6 0x00007f4119e3e36a in ec_wind_fxattrop (ec=0x7f41140b9160, fop=0x7f4118087970, idx=0) at ec-generic.c:1615 #7 0x00007f4119e3b0ed in ec_dispatch_mask (fop=0x7f4118087970, mask=63) at ec-common.c:550 #8 0x00007f4119e3fa45 in ec_manager_xattrop (fop=0x7f4118087970, state=<value optimized out>) at ec-generic.c:1434 #9 0x00007f4119e39d47 in __ec_manager (fop=0x7f4118087970, error=<value optimized out>) at ec-common.c:1937 #10 0x00007f4119e3a272 in ec_update_size_version (link=0x7f411809d7d4, version=<value optimized out>, size=0, dirty=<value optimized out>) at ec-common.c:1612 #11 0x00007f4119e3a41d in ec_update_info (link=<value optimized out>) at ec-common.c:1661 #12 0x00007f4119e3cad8 in ec_unlock_now (link=0x7f411809d7d4) at ec-common.c:1674 #13 0x00007f41277ec713 in gf_timer_proc (ctx=0x7f4128598010) at timer.c:194 #14 0x00007f41268b5a51 in start_thread (arg=0x7f411ed0a700) at pthread_create.c:301 #15 0x00007f412621f96d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 (gdb) (gdb) Version-Release number of selected component (if applicable): ============================================================= [root@interstellar shyam]# gluster --version glusterfs 3.7.1 built on Jul 12 2015 22:27:42 Repository revision: git://git.gluster.com/glusterfs.git Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com> GlusterFS comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of GlusterFS under the terms of the GNU General Public License. How reproducible: ================= 100% Steps to Reproduce: 1. Create 8+4 disperse volume and NFS mount on the client 2. Start creating huge number of files (dd command), directories and 2 instances of linux untar's Actual results: =============== NFS mount hangs Expected results: ================= No hangs Additional info: ================ corefile will be copied to sosreports.
Taken up the latest errata rpms of glibc and ran IO for ~55hrs and didn't see the hangs. Moving this bug fixed state. [root@transformers ~]# rpm -qa |grep glibc glibc-2.12-1.166.el6_7.1.x86_64 glibc-debuginfo-2.12-1.166.el6_7.1.x86_64 glibc-common-2.12-1.166.el6_7.1.x86_64 glibc-headers-2.12-1.166.el6_7.1.x86_64 glibc-debuginfo-common-2.12-1.166.el6_7.1.x86_64 glibc-static-2.12-1.166.el6_7.1.x86_64 glibc-devel-2.12-1.166.el6_7.1.x86_64 glibc-utils-2.12-1.166.el6_7.1.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html