Bug 1244002 - NFS and Fuse mounts hang while running IO - Malloc/free deadlock
Summary: NFS and Fuse mounts hang while running IO - Malloc/free deadlock
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glibc
Version: 6.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Carlos O'Donell
QA Contact: Arjun Shankar
URL:
Whiteboard:
Depends On: 1243824
Blocks: CVE-2015-5229
 
Reported: 2015-07-16 20:42 UTC by Steve Almy
Modified: 2019-10-10 09:58 UTC (History)

Fixed In Version: glibc-2.12-1.166.el6_7.1
Doc Type: Bug Fix
Doc Text:
A race condition in the malloc API family of functions could cause a deadlock leading to gluster NFS and Fuse mounts becoming unresponsive while running large amounts of I/O. The race condition in malloc has been removed and gluster NFS and Fuse mounts no longer hang in the described situation.
Clone Of: 1243824
Environment:
Last Closed: 2015-07-22 09:40:58 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1246831 | 0 | urgent | CLOSED | [ISO] Latest glibc-2.12-1.166.el6_7.1 package is not included in the RC ISO | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHBA-2015:1465 | 0 | normal | SHIPPED_LIVE | glibc bug fix update | 2015-07-22 13:40:50 UTC

Internal Links: 1246831

Comment 2 Carlos O'Donell 2015-07-16 20:51:54 UTC
Fixed in glibc-2.12-1.166.el6_7.1.

Comment 13 errata-xmlrpc 2015-07-22 09:40:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1465.html

Comment 14 Mason Loring Bliss 2017-06-12 21:39:42 UTC
Customer notes this bug persisting in what should be a corrected version:


delete() hangs in __lll_lock_wait_private() on RHEL 6 with glibc 2.12


What problem/issue/behavior are you having trouble with?  What do you expect to see?

delete() hangs in __lll_lock_wait_private() on RHEL 6 with glibc 2.12, whereas it works as expected on RHEL 5 with glibc 2.5. According to https://bugzilla.redhat.com/show_bug.cgi?id=1244002, the issue was fixed in glibc-2.12-1.166.el6_7.1, but it is still reproducible with the latest package, glibc-2.12-1.209.el6_9.1.

Expected behaviour:
 The delete call should free the allocated memory, destroy the objects, and return so that the process can exit successfully.

Where are you experiencing the behavior?  What environment?

1. Operating system version


[Prajakta]
[pchincho@oc1485164504 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 6.7 (Santiago)
[pchincho@oc1485164504 ~]$ uname -a
Linux oc1485164504.ibm.com 2.6.32-642.15.1.el6.x86_64 #1 SMP Mon Feb 20 02:26:38 EST 2017 x86_64 x86_64 x86_64 GNU/Linux

2. Software package versions

[Prajakta]
glibc version:

[pchincho@oc1485164504 ~]$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

[nz@vm-dw15 work]$ gcc --version
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

When does the behavior occur? Frequently?  Repeatedly?   At certain times?

Repeatedly

What information can you provide around timeframes and the business impact?

Additional information :
    With glibc 2.12 on RHEL 6, the bnrDataSvr process hangs in the __lll_lock_wait_private function.
    Backtrace of the hung bnrDataSvr process:

(gdb) bt
#0  0x002cf430 in __kernel_vsyscall ()
#1  0x00ad5fc3 in __lll_lock_wait_private () from /lib/libc.so.6
#2  0x00a5a228 in _L_lock_5990 () from /lib/libc.so.6
#3  0x00a56437 in _int_free () from /lib/libc.so.6
#4  0x084a8d91 in operator delete (ptr=0xfffffe00) at /usr/src/gcc/gcc-4.2.4/libstdc++-v3/libsupc++/del_op.cc:49
#5  0x0825b84c in ac::HostApp::appCleanup (this=0xffea20c4, fromCoreSig=true) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acHostApp.cpp:836
#6  0x08245018 in ac::App::exitProgram (this=0xffea20c4, status=0, forceExit=true) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acApp.cpp:897
#7  0x08130673 in sigTermHandler (sigNum=15) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:231
#8  0x0828554a in _handler (sig=15, si=0xffea12dc, ignore=0xffea135c) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/recomps/rcOs.cpp:532
#9  <signal handler called>
#10 0x002cf430 in __kernel_vsyscall ()
#11 0x00abd282 in brk () from /lib/libc.so.6
#12 0x00abd2fa in sbrk () from /lib/libc.so.6
#13 0x00a5a621 in __default_morecore () from /lib/libc.so.6
#14 0x00a537d8 in sYSTRIm () from /lib/libc.so.6
#15 0x00a5657e in _int_free () from /lib/libc.so.6
#16 0x084a8d91 in operator delete (ptr=0x9b64000) at /usr/src/gcc/gcc-4.2.4/libstdc++-v3/libsupc++/del_op.cc:49
#17 0x0812fff5 in bnr::DataSvr::sendPendingReply (this=0xffea20c4) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:370
#18 0x08135d60 in bnr::DataSvr::manageState (this=0xffea20c4) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:714
#19 0x081371fd in bnr::DataSvr::connectorCb (this=0xffea20c4) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:965
#20 0x0827cfa6 in rc::FdInfo::invokeCb (this=0x9b64000, cbType=rc::FdSet::CBT_FD_CLOSE) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/recomps/rcFdSet.cpp:223
#21 0x0827dc17 in rc::FdSet::doEpollWait (this=0xffea1a44, okToBlock=true) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/recomps/rcFdSet.cpp:667
#22 0x08244193 in ac::App::run (this=0xffea20c4) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acApp.cpp:853
#23 0x08246481 in ac::App::main2 (this=0xffea20c4, argc=5, argv=0xffea31c4) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acApp.cpp:609
#24 0x081312e8 in main (argc=5, argv=0xffea31c4) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:64
(gdb)

Comment 15 Florian Weimer 2017-06-12 21:45:29 UTC
(In reply to Mason Loring Bliss from comment #14)
> (gdb) bt
> #0  0x002cf430 in __kernel_vsyscall ()
> #1  0x00ad5fc3 in __lll_lock_wait_private () from /lib/libc.so.6
> #2  0x00a5a228 in _L_lock_5990 () from /lib/libc.so.6
> #3  0x00a56437 in _int_free () from /lib/libc.so.6
> #4  0x084a8d91 in operator delete (ptr=0xfffffe00) at
> /usr/src/gcc/gcc-4.2.4/libstdc++-v3/libsupc++/del_op.cc:49
> #5  0x0825b84c in ac::HostApp::appCleanup (this=0xffea20c4,
> fromCoreSig=true) at
> /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acHostApp.cpp:836
> #6  0x08245018 in ac::App::exitProgram (this=0xffea20c4, status=0,
> forceExit=true) at
> /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acApp.cpp:897
> #7  0x08130673 in sigTermHandler (sigNum=15) at
> /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:231
> #8  0x0828554a in _handler (sig=15, si=0xffea12dc, ignore=0xffea135c) at
> /nz/Prajakta/7.2.1.2-P1/main/src/comps/recomps/rcOs.cpp:532
> #9  <signal handler called>
> #10 0x002cf430 in __kernel_vsyscall ()

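This is a different bug. The application calls the delete operator from a signal handler, which is not allowed because the C++ delete operator is not an async-signal-safe operation. In the full backtrace from comment #14, frames #10 to #16 show that SIGTERM arrived while the same thread was already inside _int_free, most likely holding the malloc arena lock; the delete in the handler then waits on a lock that the interrupted code can never release. This is not a glibc issue, but an application bug.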
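One common way to avoid this class of hang is to keep the handler down to setting a flag and to do all cleanup from normal context. The following is only a minimal sketch under that assumption; every name in it (g_terminate_requested, on_sigterm, install_sigterm_handler, termination_requested, PendingReply, release_all) is hypothetical and not taken from the application in the backtrace.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <memory>
#include <vector>

// All names below are hypothetical illustrations, not the application's code.

// The only state the handler touches. A store to a volatile sig_atomic_t
// is one of the few operations a signal handler may perform safely.
volatile std::sig_atomic_t g_terminate_requested = 0;

// Async-signal-safe handler: no delete, no free, no locks, only a flag store.
extern "C" void on_sigterm(int /*signum*/) {
    g_terminate_requested = 1;
}

// Installs the handler for SIGTERM; returns false if std::signal fails.
bool install_sigterm_handler() {
    return std::signal(SIGTERM, on_sigterm) != SIG_ERR;
}

// Polled by the main loop between units of work.
bool termination_requested() {
    return g_terminate_requested != 0;
}

// Stand-in for whatever heap objects the application owns.
struct PendingReply {
    std::vector<char> payload;
};

// Runs in normal (non-signal) context, so operator delete is safe here.
// Returns how many replies were released.
std::size_t release_all(std::vector<std::unique_ptr<PendingReply>>& replies) {
    const std::size_t released = replies.size();
    replies.clear();  // unique_ptr destructors call delete outside the handler
    assert(replies.empty());
    return released;
}
```

An event loop built this way would check termination_requested() each time epoll_wait returns (the call fails with EINTR when a handler has run) and then call release_all() before exiting, so the allocator is never re-entered from the handler.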

Comment 16 Mason Loring Bliss 2017-06-12 21:49:11 UTC
Apologies for the confusion. Detaching from this bug.

