Bug 1243824 - NFS and Fuse mounts hang while running IO - Malloc/free deadlock
Summary: NFS and Fuse mounts hang while running IO - Malloc/free deadlock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glibc
Version: 6.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Carlos O'Donell
QA Contact: Arjun Shankar
URL:
Whiteboard:
Depends On:
Blocks: 1244002
 
Reported: 2015-07-16 12:20 UTC by Bhaskarakiran
Modified: 2019-10-10 09:58 UTC
CC: 14 users

Fixed In Version: glibc-2.12-1.167.el6
Doc Type: Bug Fix
Doc Text:
A race condition in the malloc API family of functions could cause a deadlock leading to gluster NFS and Fuse mounts becoming unresponsive while running large amounts of I/O. The race condition in malloc has been removed and gluster NFS and Fuse mounts no longer hang in the described situation.
Clone Of:
: 1244002 (view as bug list)
Environment:
Last Closed: 2016-05-10 21:27:13 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2016:0834 (normal, SHIPPED_LIVE): glibc bug fix update, last updated 2016-05-10 22:41:39 UTC

Description Bhaskarakiran 2015-07-16 12:20:25 UTC
Description of problem:
======================
NFS and Fuse mounts hang while running IO. This looks to be a malloc/free deadlock.

For the core file and triage details, please refer to the bug below:

https://bugzilla.redhat.com/show_bug.cgi?id=1243358


Backtrace:
==========
(gdb) bt
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f41268b7588 in _L_lock_854 () from /lib64/libpthread-2.12.so
#2  0x00007f41268b7457 in __pthread_mutex_lock (mutex=0x7f412859adf0)
    at pthread_mutex_lock.c:61
#3  0x00007f4127804942 in iobuf_get2 (iobuf_pool=0x7f412859adf0, 
    page_size=8192) at iobuf.c:662
#4  0x00007f411a0a8bca in client3_3_fxattrop (frame=0x7f4125388000, 
    this=0x7f4114026340, data=0x7f411ed09b40) at client-rpc-fops.c:5279
#5  0x00007f411a09a6a8 in client_fxattrop (frame=0x7f4125388000, 
    this=<value optimized out>, fd=<value optimized out>, 
    flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, 
    xdata=<value optimized out>) at client.c:1429
#6  0x00007f4119e3e36a in ec_wind_fxattrop (ec=0x7f41140b9160, 
    fop=0x7f4118087970, idx=0) at ec-generic.c:1615
#7  0x00007f4119e3b0ed in ec_dispatch_mask (fop=0x7f4118087970, mask=63)
    at ec-common.c:550
#8  0x00007f4119e3fa45 in ec_manager_xattrop (fop=0x7f4118087970, 
    state=<value optimized out>) at ec-generic.c:1434
#9  0x00007f4119e39d47 in __ec_manager (fop=0x7f4118087970, 
    error=<value optimized out>) at ec-common.c:1937
#10 0x00007f4119e3a272 in ec_update_size_version (link=0x7f411809d7d4, 
    version=<value optimized out>, size=0, dirty=<value optimized out>)
    at ec-common.c:1612
#11 0x00007f4119e3a41d in ec_update_info (link=<value optimized out>)
    at ec-common.c:1661
#12 0x00007f4119e3cad8 in ec_unlock_now (link=0x7f411809d7d4)
    at ec-common.c:1674
#13 0x00007f41277ec713 in gf_timer_proc (ctx=0x7f4128598010) at timer.c:194
#14 0x00007f41268b5a51 in start_thread (arg=0x7f411ed0a700)
    at pthread_create.c:301
#15 0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) t a a bt

Thread 8 (Thread 0x7f4127c79740 (LWP 32059)):
#0  0x00007f41268b62ad in pthread_join (threadid=139917599262464, 
    thread_return=0x0) at pthread_join.c:89
#1  0x00007f412782e53d in event_dispatch_epoll (event_pool=0x7f41285b6c90)
    at event-epoll.c:762
#2  0x00007f4127c95ef1 in main (argc=11, argv=0x7fffed97d678)
    at glusterfsd.c:2333

Thread 7 (Thread 0x7f410bfff700 (LWP 32110)):
#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007f41261b3f7e in _L_lock_5746 () from /lib64/libc-2.12.so
#2  0x00007f41261afa8b in _int_free (av=0x7f41264c6e80, p=0x7f4128601710, 
    have_lock=0) at malloc.c:4967
#3  0x00007f4127803ab5 in __iobuf_put (iobuf=0x7f4117e40550, 
    iobuf_arena=0x7f41285b6b70) at iobuf.c:735
#4  0x00007f4127803b59 in iobuf_put (iobuf=0x7f4117e40550) at iobuf.c:785
#5  0x00007f412759639f in rpc_transport_pollin_destroy (
    pollin=0x7f4107a99340) at rpc-transport.c:132
#6  0x00007f411c2fb274 in socket_event_poll_in (this=0x7f4114001760)
    at socket.c:2295
#7  0x00007f411c2fce4d in socket_event_handler (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7f4114001760, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:2403
#8  0x00007f412782ea90 in event_dispatch_epoll_handler (data=0x7f411408d650)
    at event-epoll.c:575
#9  event_dispatch_epoll_worker (data=0x7f411408d650) at event-epoll.c:678
#10 0x00007f41268b5a51 in start_thread (arg=0x7f410bfff700)
    at pthread_create.c:301
#11 0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 6 (Thread 0x7f411133b700 (LWP 32109)):
#0  0x00007f4126216143 in __poll (fds=<value optimized out>, 
    nfds=<value optimized out>, timeout=<value optimized out>)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007f412624d010 in svc_run () at svc_run.c:84
#2  0x00007f411935bb34 in nsm_thread (argv=<value optimized out>)
    at nlmcbk_svc.c:121
#3  0x00007f41268b5a51 in start_thread (arg=0x7f411133b700)
    at pthread_create.c:301
#4  0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7f411acd4700 (LWP 32064)):
#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:97
#1  0x00007f41261b4313 in _L_lock_10110 () at hooks.c:129
#2  0x00007f41261b1bbf in __libc_malloc (bytes=1048648) at malloc.c:3651
#3  0x00007f41278016e6 in __gf_malloc (size=1048576, type=45, 
    typestr=0x7f4119e79cb5 "gf_common_mt_char") at mem-pool.c:142
#4  0x00007f4119e50e8c in ec_readv_rebuild (ec=0x7f41140bb230, 
    fop=0x7f410b301464, cbk=0x7f410abf6d60) at ec-inode-read.c:1197
#5  0x00007f4119e513e2 in ec_manager_readv (fop=0x7f410b301464, 
    state=<value optimized out>) at ec-inode-read.c:1454
#6  0x00007f4119e39d47 in __ec_manager (fop=0x7f410b301464, 
    error=<value optimized out>) at ec-common.c:1937
#7  0x00007f4119e39b61 in ec_resume (fop=0x7f410b301464, error=0)
    at ec-common.c:307
#8  0x00007f4119e3b6d4 in ec_complete (fop=0x7f410b301464) at ec-common.c:380
#9  0x00007f4119e52cfe in ec_readv_cbk (frame=<value optimized out>, 
    cookie=<value optimized out>, this=0x7f411402d230, op_ret=262144, 
    op_errno=<value optimized out>, vector=0x7f411acd3910, count=1, 
    stbuf=0x7f411acd3ab0, iobref=0x7f4107ab1100, xdata=0x7f4124de2b1c)
    at ec-inode-read.c:1393
#10 0x00007f411a0bb3c9 in client3_3_readv_cbk (req=<value optimized out>, 
    iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x7f41253e212c) at client-rpc-fops.c:3052
#11 0x00007f4127599445 in rpc_clnt_handle_reply (clnt=0x7f411422e700, 
    pollin=0x7f4117f0e6a0) at rpc-clnt.c:766
#12 0x00007f412759a8f2 in rpc_clnt_notify (trans=<value optimized out>, 
    mydata=0x7f411422e730, event=<value optimized out>, 
    data=<value optimized out>) at rpc-clnt.c:894
#13 0x00007f4127595ad8 in rpc_transport_notify (this=<value optimized out>, 
    event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#14 0x00007f411c2fb255 in socket_event_poll_in (this=0x7f411423e370)
    at socket.c:2290
#15 0x00007f411c2fce4d in socket_event_handler (fd=<value optimized out>, 
    idx=<value optimized out>, data=0x7f411423e370, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:2403
#16 0x00007f412782ea90 in event_dispatch_epoll_handler (data=0x7f41286006b0)
    at event-epoll.c:575
#17 event_dispatch_epoll_worker (data=0x7f41286006b0) at event-epoll.c:678
#18 0x00007f41268b5a51 in start_thread (arg=0x7f411acd4700)
    at pthread_create.c:301
#19 0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7f411cf07700 (LWP 32063)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
#1  0x00007f4127811d6b in syncenv_task (proc=0x7f41285c6170) at syncop.c:607
#2  0x00007f4127816b80 in syncenv_processor (thdata=0x7f41285c6170)
    at syncop.c:699
#3  0x00007f41268b5a51 in start_thread (arg=0x7f411cf07700)
    at pthread_create.c:301
#4  0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7f411d908700 (LWP 32062)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
#1  0x00007f4127811d6b in syncenv_task (proc=0x7f41285c5db0) at syncop.c:607
#2  0x00007f4127816b80 in syncenv_processor (thdata=0x7f41285c5db0)
    at syncop.c:699
#3  0x00007f41268b5a51 in start_thread (arg=0x7f411d908700)
    at pthread_create.c:301
#4  0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7f411e309700 (LWP 32061)):
#0  do_sigwait (set=<value optimized out>, sig=0x7f411e308e4c)
    at ../sysdeps/unix/sysv/linux/sigwait.c:65
#1  __sigwait (set=<value optimized out>, sig=0x7f411e308e4c)
    at ../sysdeps/unix/sysv/linux/sigwait.c:100
#2  0x00007f4127c9402b in glusterfs_sigwaiter (arg=<value optimized out>)
    at glusterfsd.c:1989
#3  0x00007f41268b5a51 in start_thread (arg=0x7f411e309700)
    at pthread_create.c:301
#4  0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7f411ed0a700 (LWP 32060)):
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f41268b7588 in _L_lock_854 () from /lib64/libpthread-2.12.so
#2  0x00007f41268b7457 in __pthread_mutex_lock (mutex=0x7f412859adf0)
    at pthread_mutex_lock.c:61
#3  0x00007f4127804942 in iobuf_get2 (iobuf_pool=0x7f412859adf0, 
    page_size=8192) at iobuf.c:662
#4  0x00007f411a0a8bca in client3_3_fxattrop (frame=0x7f4125388000, 
    this=0x7f4114026340, data=0x7f411ed09b40) at client-rpc-fops.c:5279
#5  0x00007f411a09a6a8 in client_fxattrop (frame=0x7f4125388000, 
    this=<value optimized out>, fd=<value optimized out>, 
    flags=GF_XATTROP_ADD_ARRAY64, dict=<value optimized out>, 
    xdata=<value optimized out>) at client.c:1429
#6  0x00007f4119e3e36a in ec_wind_fxattrop (ec=0x7f41140b9160, 
    fop=0x7f4118087970, idx=0) at ec-generic.c:1615
#7  0x00007f4119e3b0ed in ec_dispatch_mask (fop=0x7f4118087970, mask=63)
    at ec-common.c:550
#8  0x00007f4119e3fa45 in ec_manager_xattrop (fop=0x7f4118087970, 
    state=<value optimized out>) at ec-generic.c:1434
#9  0x00007f4119e39d47 in __ec_manager (fop=0x7f4118087970, 
    error=<value optimized out>) at ec-common.c:1937
#10 0x00007f4119e3a272 in ec_update_size_version (link=0x7f411809d7d4, 
    version=<value optimized out>, size=0, dirty=<value optimized out>)
    at ec-common.c:1612
#11 0x00007f4119e3a41d in ec_update_info (link=<value optimized out>)
    at ec-common.c:1661
#12 0x00007f4119e3cad8 in ec_unlock_now (link=0x7f411809d7d4)
    at ec-common.c:1674
#13 0x00007f41277ec713 in gf_timer_proc (ctx=0x7f4128598010) at timer.c:194
#14 0x00007f41268b5a51 in start_thread (arg=0x7f411ed0a700)
    at pthread_create.c:301
#15 0x00007f412621f96d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) 


Version-Release number of selected component (if applicable):
=============================================================
[root@interstellar shyam]# gluster --version
glusterfs 3.7.1 built on Jul 12 2015 22:27:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
=================
100%

Steps to Reproduce:
===================
1. Create an 8+4 disperse volume and NFS-mount it on a client.
2. Create a large number of files (with dd) and directories, and run two instances of a Linux kernel untar.

Actual results:
===============
The NFS and Fuse mounts hang.

Expected results:
=================
No hangs.

Additional info:
================

Comment 17 errata-xmlrpc 2016-05-10 21:27:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0834.html

