Bug 1243358 - NFS hung while running IO - Malloc/free deadlock
Summary: NFS hung while running IO - Malloc/free deadlock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: ---
: RHGS 3.1.0
Assignee: Nithya Balachandran
QA Contact: Bhaskarakiran
URL:
Whiteboard:
Depends On:
Blocks: 1202842 1223636 1243786
TreeView+ depends on / blocked
 
Reported: 2015-07-15 09:49 UTC by Bhaskarakiran
Modified: 2016-11-23 23:11 UTC (History)
19 users (show)

Fixed In Version: glibc-2.12-1.166.el6_7.1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-29 05:12:39 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1495 0 normal SHIPPED_LIVE Important: Red Hat Gluster Storage 3.1 update 2015-07-29 08:26:26 UTC

Description Bhaskarakiran 2015-07-15 09:49:17 UTC
Description of problem:
=======================

NFS mount is hung while running plain IO. This looked to be malloc/free deadlock.

Backtrace:
==========
(gdb) bt
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f41268b7588 in _L_lock_854 () from /lib64/libpthread-2.12.so
#2  0x00007f41268b7457 in __pthread_mutex_lock (mutex=0x7f412859adf0)
    at pthread_mutex_lock.c:61
#3  0x00007f4127804942 in iobuf_get2 (iobuf_pool=0x7f412859adf0, 
    page_size=8192) at iobuf.c:662
#4  0x00007f411a0a8bca in client3_3_fxattrop (frame=0x7f4125388000, 
    this=0x7f4114026340, data=0x7f411ed09b40) at client-rpc-fops.c:5279
#5  0x00007f411a09a6a8 in client_fxattrop (frame=0x7f4125388000, 
    this=<value optimized out>, fd=<value optimized out>, 
    flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, 
    xdata=<value optimized out>) at client.c:1429
#6  0x00007f4119e3e36a in ec_wind_fxattrop (ec=0x7f41140b9160, 
    fop=0x7f4118087970, idx=0) at ec-generic.c:1615
#7  0x00007f4119e3b0ed in ec_dispatch_mask (fop=0x7f4118087970, mask=63)
    at ec-common.c:550
#8  0x00007f4119e3fa45 in ec_manager_xattrop (fop=0x7f4118087970, 
    state=<value optimized out>) at ec-generic.c:1434
#9  0x00007f4119e39d47 in __ec_manager (fop=0x7f4118087970, 
    error=<value optimized out>) at ec-common.c:1937
#10 0x00007f4119e3a272 in ec_update_size_version (link=0x7f411809d7d4, 
    version=<value optimized out>, size=0, dirty=<value optimized out>)
    at ec-common.c:1612
#11 0x00007f4119e3a41d in ec_update_info (link=<value optimized out>)
    at ec-common.c:1661
#12 0x00007f4119e3cad8 in ec_unlock_now (link=0x7f411809d7d4)
    at ec-common.c:1674
#13 0x00007f41277ec713 in gf_timer_proc (ctx=0x7f4128598010) at timer.c:194
#14 0x00007f41268b5a51 in start_thread (arg=0x7f411ed0a700)
    at pthread_create.c:301
#15 0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) t a a bt

Thread 8 (Thread 0x7f4127c79740 (LWP 32059)):
#0  0x00007f41268b62ad in pthread_join (threadid=139917599262464, 
    thread_return=0x0) at pthread_join.c:89
#1  0x00007f412782e53d in event_dispatch_epoll (event_pool=0x7f41285b6c90)
    at event-epoll.c:762
#2  0x00007f4127c95ef1 in main (argc=11, argv=0x7fffed97d678)
    at glusterfsd.c:2333

Thread 7 (Thread 0x7f410bfff700 (LWP 32110)):
#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007f41261b3f7e in _L_lock_5746 () from /lib64/libc-2.12.so
#2  0x00007f41261afa8b in _int_free (av=0x7f41264c6e80, p=0x7f4128601710, 
    have_lock=0) at malloc.c:4967
#3  0x00007f4127803ab5 in __iobuf_put (iobuf=0x7f4117e40550, 
    iobuf_arena=0x7f41285b6b70) at iobuf.c:735
#4  0x00007f4127803b59 in iobuf_put (iobuf=0x7f4117e40550) at iobuf.c:785
#5  0x00007f412759639f in rpc_transport_pollin_destroy (
    pollin=0x7f4107a99340) at rpc-transport.c:132
#6  0x00007f411c2fb274 in socket_event_poll_in (this=0x7f4114001760)
    at socket.c:2295
#7  0x00007f411c2fce4d in socket_event_handler (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7f4114001760, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:2403
#8  0x00007f412782ea90 in event_dispatch_epoll_handler (data=0x7f411408d650)
    at event-epoll.c:575
#9  event_dispatch_epoll_worker (data=0x7f411408d650) at event-epoll.c:678
#10 0x00007f41268b5a51 in start_thread (arg=0x7f410bfff700)
    at pthread_create.c:301
#11 0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 6 (Thread 0x7f411133b700 (LWP 32109)):
#0  0x00007f4126216143 in __poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f412624d010 in svc_run () at svc_run.c:84
#2  0x00007f411935bb34 in nsm_thread (argv=<value optimized out>)
    at nlmcbk_svc.c:121
#3  0x00007f41268b5a51 in start_thread (arg=0x7f411133b700)
    at pthread_create.c:301
#4  0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7f411acd4700 (LWP 32064)):
#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007f41261b4313 in _L_lock_10110 () at hooks.c:129
#2  0x00007f41261b1bbf in __libc_malloc (bytes=1048648) at malloc.c:3651
#3  0x00007f41278016e6 in __gf_malloc (size=1048576, type=45, 
    typestr=0x7f4119e79cb5 "gf_common_mt_char") at mem-pool.c:142
#4  0x00007f4119e50e8c in ec_readv_rebuild (ec=0x7f41140bb230, 
    fop=0x7f410b301464, cbk=0x7f410abf6d60) at ec-inode-read.c:1197
#5  0x00007f4119e513e2 in ec_manager_readv (fop=0x7f410b301464, 
    state=<value optimized out>) at ec-inode-read.c:1454
#6  0x00007f4119e39d47 in __ec_manager (fop=0x7f410b301464, 
    error=<value optimized out>) at ec-common.c:1937
#7  0x00007f4119e39b61 in ec_resume (fop=0x7f410b301464, error=0)
---Type <return> to continue, or q <return> to quit---
    at ec-common.c:307
#8  0x00007f4119e3b6d4 in ec_complete (fop=0x7f410b301464) at ec-common.c:380
#9  0x00007f4119e52cfe in ec_readv_cbk (frame=<value optimized out>, 
    cookie=<value optimized out>, this=0x7f411402d230, op_ret=262144, 
    op_errno=<value optimized out>, vector=0x7f411acd3910, count=1, 
    stbuf=0x7f411acd3ab0, iobref=0x7f4107ab1100, xdata=0x7f4124de2b1c)
    at ec-inode-read.c:1393
#10 0x00007f411a0bb3c9 in client3_3_readv_cbk (req=<value optimized out>, 
    iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x7f41253e212c) at client-rpc-fops.c:3052
#11 0x00007f4127599445 in rpc_clnt_handle_reply (clnt=0x7f411422e700, 
    pollin=0x7f4117f0e6a0) at rpc-clnt.c:766
#12 0x00007f412759a8f2 in rpc_clnt_notify (trans=<value optimized out>, 
    mydata=0x7f411422e730, event=<value optimized out>, 
    data=<value optimized out>) at rpc-clnt.c:894
#13 0x00007f4127595ad8 in rpc_transport_notify (this=<value optimized out>, 
    event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#14 0x00007f411c2fb255 in socket_event_poll_in (this=0x7f411423e370)
    at socket.c:2290
#15 0x00007f411c2fce4d in socket_event_handler (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7f411423e370, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:2403
#16 0x00007f412782ea90 in event_dispatch_epoll_handler (data=0x7f41286006b0)
    at event-epoll.c:575
#17 event_dispatch_epoll_worker (data=0x7f41286006b0) at event-epoll.c:678
#18 0x00007f41268b5a51 in start_thread (arg=0x7f411acd4700)
    at pthread_create.c:301
#19 0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7f411cf07700 (LWP 32063)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
#1  0x00007f4127811d6b in syncenv_task (proc=0x7f41285c6170) at syncop.c:607
#2  0x00007f4127816b80 in syncenv_processor (thdata=0x7f41285c6170)
    at syncop.c:699
#3  0x00007f41268b5a51 in start_thread (arg=0x7f411cf07700)
    at pthread_create.c:301
#4  0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7f411d908700 (LWP 32062)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
#1  0x00007f4127811d6b in syncenv_task (proc=0x7f41285c5db0) at syncop.c:607
#2  0x00007f4127816b80 in syncenv_processor (thdata=0x7f41285c5db0)
    at syncop.c:699
#3  0x00007f41268b5a51 in start_thread (arg=0x7f411d908700)
    at pthread_create.c:301
#4  0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7f411e309700 (LWP 32061)):
#0  do_sigwait (set=<value optimized out>, sig=0x7f411e308e4c)
    at ../sysdeps/unix/sysv/linux/sigwait.c:65
#1  __sigwait (set=<value optimized out>, sig=0x7f411e308e4c)
    at ../sysdeps/unix/sysv/linux/sigwait.c:100
#2  0x00007f4127c9402b in glusterfs_sigwaiter (arg=<value optimized out>)
---Type <return> to continue, or q <return> to quit---
    at glusterfsd.c:1989
#3  0x00007f41268b5a51 in start_thread (arg=0x7f411e309700)
    at pthread_create.c:301
#4  0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7f411ed0a700 (LWP 32060)):
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f41268b7588 in _L_lock_854 () from /lib64/libpthread-2.12.so
#2  0x00007f41268b7457 in __pthread_mutex_lock (mutex=0x7f412859adf0)
    at pthread_mutex_lock.c:61
#3  0x00007f4127804942 in iobuf_get2 (iobuf_pool=0x7f412859adf0, 
    page_size=8192) at iobuf.c:662
#4  0x00007f411a0a8bca in client3_3_fxattrop (frame=0x7f4125388000, 
    this=0x7f4114026340, data=0x7f411ed09b40) at client-rpc-fops.c:5279
#5  0x00007f411a09a6a8 in client_fxattrop (frame=0x7f4125388000, 
    this=<value optimized out>, fd=<value optimized out>, 
    flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, 
    xdata=<value optimized out>) at client.c:1429
#6  0x00007f4119e3e36a in ec_wind_fxattrop (ec=0x7f41140b9160, 
    fop=0x7f4118087970, idx=0) at ec-generic.c:1615
#7  0x00007f4119e3b0ed in ec_dispatch_mask (fop=0x7f4118087970, mask=63)
    at ec-common.c:550
#8  0x00007f4119e3fa45 in ec_manager_xattrop (fop=0x7f4118087970, 
    state=<value optimized out>) at ec-generic.c:1434
#9  0x00007f4119e39d47 in __ec_manager (fop=0x7f4118087970, 
    error=<value optimized out>) at ec-common.c:1937
#10 0x00007f4119e3a272 in ec_update_size_version (link=0x7f411809d7d4, 
    version=<value optimized out>, size=0, dirty=<value optimized out>)
    at ec-common.c:1612
#11 0x00007f4119e3a41d in ec_update_info (link=<value optimized out>)
    at ec-common.c:1661
#12 0x00007f4119e3cad8 in ec_unlock_now (link=0x7f411809d7d4)
    at ec-common.c:1674
#13 0x00007f41277ec713 in gf_timer_proc (ctx=0x7f4128598010) at timer.c:194
#14 0x00007f41268b5a51 in start_thread (arg=0x7f411ed0a700)
    at pthread_create.c:301
#15 0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) 
(gdb) 


Version-Release number of selected component (if applicable):
=============================================================
[root@interstellar shyam]# gluster --version
glusterfs 3.7.1 built on Jul 12 2015 22:27:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
=================
100%

Steps to Reproduce:
1. Create 8+4 disperse volume and NFS mount on the client
2. Start creating huge number of files (dd command), directories and 2 instances of linux untar's


Actual results:
===============
NFS mount hangs

Expected results:
=================
No hangs

Additional info:
================
corefile will be copied to sosreports.

Comment 15 Bhaskarakiran 2015-07-20 09:20:30 UTC
Taken up the latest errata rpms of glibc and ran IO for ~55hrs and didn't see the hangs. Moving this bug fixed state.

[root@transformers ~]# rpm -qa |grep glibc
glibc-2.12-1.166.el6_7.1.x86_64
glibc-debuginfo-2.12-1.166.el6_7.1.x86_64
glibc-common-2.12-1.166.el6_7.1.x86_64
glibc-headers-2.12-1.166.el6_7.1.x86_64
glibc-debuginfo-common-2.12-1.166.el6_7.1.x86_64
glibc-static-2.12-1.166.el6_7.1.x86_64
glibc-devel-2.12-1.166.el6_7.1.x86_64
glibc-utils-2.12-1.166.el6_7.1.x86_64

Comment 16 errata-xmlrpc 2015-07-29 05:12:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html


Note You need to log in before you can comment on or make changes to this bug.