Bug 1244002
| Summary: | NFS and Fuse mounts hang while running IO - Malloc/free deadlock | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Steve Almy <salmy> |
| Component: | glibc | Assignee: | Carlos O'Donell <codonell> |
| Status: | CLOSED ERRATA | QA Contact: | Arjun Shankar <ashankar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 6.7 | CC: | annair, ashankar, byarlaga, cmaiolin, codonell, cww, fweimer, hannsj_uhl, jkurik, mbliss, mcermak, mnewsome, pfrankli, pprakash, qe-baseos-tools-bugs, rcyriac, salmy, sardella, spoyarek, swhiteho |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | glibc-2.12-1.166.el6_7.1 | Doc Type: | Bug Fix |
| Doc Text: | A race condition in the malloc API family of functions could cause a deadlock leading to gluster NFS and Fuse mounts becoming unresponsive while running large amounts of I/O. The race condition in malloc has been removed and gluster NFS and Fuse mounts no longer hang in the described situation. | Story Points: | --- |
| Clone Of: | 1243824 | Environment: | |
| Last Closed: | 2015-07-22 09:40:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1243824 | ||
| Bug Blocks: | 1256285 | ||
|
Comment 2
Carlos O'Donell
2015-07-16 20:51:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1465.html

Customer notes this bug persisting in what should be a corrected version:

delete() hang in __lll_lock_wait_private () function on RHEL 6 Glibc 2.12 version

What problem/issue/behavior are you having trouble with? What do you expect to see?

delete() hang in __lll_lock_wait_private () function on RHEL 6 Glibc 2.12 version. Whereas it works as expected on RHEL 5 Glibc 2.5. According to the https://bugzilla.redhat.com/show_bug.cgi?id=1244002, the issue was fixed in glibc-2.12-1.166.el6_7.1. Still issue is reproducible on latest patch as well glibc-2.12-1.209.el6_9.1.

expected behaviour

delete call should free allocated memory, delete objects and exit successfully.

Where are you experiencing the behavior? What environment?

1. Operating system version [Prajakta]

    [pchincho@oc1485164504 ~]$ cat /etc/redhat-release
    Red Hat Enterprise Linux Workstation release 6.7 (Santiago)
    [pchincho@oc1485164504 ~]$ uname -a
    Linux oc1485164504.ibm.com 2.6.32-642.15.1.el6.x86_64 #1 SMP Mon Feb 20 02:26:38 EST 2017 x86_64 x86_64 x86_64 GNU/Linux

2. Software package versions [Prajakta]

GLIBC version :

    [pchincho@oc1485164504 ~]$ ldd --version
    ldd (GNU libc) 2.12
    Copyright (C) 2010 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Written by Roland McGrath and Ulrich Drepper.

    [nz@vm-dw15 work]$ gcc --version
    gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54)
    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

When does the behavior occur? Frequently? Repeatedly? At certain times?

Repeatedly

What information can you provide around timeframes and the business impact?

Additional information : With GLIBC 2.12 on RHEL 6 bnrDataSvr process gets hang in __lll_lock_wait_private function.

Backtrace of hanged bnrDataSvr process :

    (gdb) bt
    #0  0x002cf430 in __kernel_vsyscall ()
    #1  0x00ad5fc3 in __lll_lock_wait_private () from /lib/libc.so.6
    #2  0x00a5a228 in _L_lock_5990 () from /lib/libc.so.6
    #3  0x00a56437 in _int_free () from /lib/libc.so.6
    #4  0x084a8d91 in operator delete (ptr=0xfffffe00) at /usr/src/gcc/gcc-4.2.4/libstdc++-v3/libsupc++/del_op.cc:49
    #5  0x0825b84c in ac::HostApp::appCleanup (this=0xffea20c4, fromCoreSig=true) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acHostApp.cpp:836
    #6  0x08245018 in ac::App::exitProgram (this=0xffea20c4, status=0, forceExit=true) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acApp.cpp:897
    #7  0x08130673 in sigTermHandler (sigNum=15) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:231
    #8  0x0828554a in _handler (sig=15, si=0xffea12dc, ignore=0xffea135c) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/recomps/rcOs.cpp:532
    #9  <signal handler called>
    #10 0x002cf430 in __kernel_vsyscall ()
    #11 0x00abd282 in brk () from /lib/libc.so.6
    #12 0x00abd2fa in sbrk () from /lib/libc.so.6
    #13 0x00a5a621 in __default_morecore () from /lib/libc.so.6
    #14 0x00a537d8 in sYSTRIm () from /lib/libc.so.6
    #15 0x00a5657e in _int_free () from /lib/libc.so.6
    #16 0x084a8d91 in operator delete (ptr=0x9b64000) at /usr/src/gcc/gcc-4.2.4/libstdc++-v3/libsupc++/del_op.cc:49
    #17 0x0812fff5 in bnr::DataSvr::sendPendingReply (this=0xffea20c4) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:370
    #18 0x08135d60 in bnr::DataSvr::manageState (this=0xffea20c4) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:714
    #19 0x081371fd in bnr::DataSvr::connectorCb (this=0xffea20c4) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:965
    #20 0x0827cfa6 in rc::FdInfo::invokeCb (this=0x9b64000, cbType=rc::FdSet::CBT_FD_CLOSE) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/recomps/rcFdSet.cpp:223
    #21 0x0827dc17 in rc::FdSet::doEpollWait (this=0xffea1a44, okToBlock=true) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/recomps/rcFdSet.cpp:667
    #22 0x08244193 in ac::App::run (this=0xffea20c4) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acApp.cpp:853
    #23 0x08246481 in ac::App::main2 (this=0xffea20c4, argc=5, argv=0xffea31c4) at /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acApp.cpp:609
    #24 0x081312e8 in main (argc=5, argv=0xffea31c4) at /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:64
    (gdb)

(In reply to Mason Loring Bliss from comment #14)
> (gdb) bt
> #0 0x002cf430 in __kernel_vsyscall ()
> #1 0x00ad5fc3 in __lll_lock_wait_private () from /lib/libc.so.6
> #2 0x00a5a228 in _L_lock_5990 () from /lib/libc.so.6
> #3 0x00a56437 in _int_free () from /lib/libc.so.6
> #4 0x084a8d91 in operator delete (ptr=0xfffffe00) at
> /usr/src/gcc/gcc-4.2.4/libstdc++-v3/libsupc++/del_op.cc:49
> #5 0x0825b84c in ac::HostApp::appCleanup (this=0xffea20c4,
> fromCoreSig=true) at
> /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acHostApp.cpp:836
> #6 0x08245018 in ac::App::exitProgram (this=0xffea20c4, status=0,
> forceExit=true) at
> /nz/Prajakta/7.2.1.2-P1/main/src/comps/appcomps/acApp.cpp:897
> #7 0x08130673 in sigTermHandler (sigNum=15) at
> /nz/Prajakta/7.2.1.2-P1/main/src/bnr/datasvr/bnrDataSvr.cpp:231
> #8 0x0828554a in _handler (sig=15, si=0xffea12dc, ignore=0xffea135c) at
> /nz/Prajakta/7.2.1.2-P1/main/src/comps/recomps/rcOs.cpp:532
> #9 <signal handler called>
> #10 0x002cf430 in __kernel_vsyscall ()

This is a different bug. The application calls the delete operator from a signal handler, which is not allowed because the C++ delete is not an async-signal-safe operation. This is not a glibc issue, but an application bug.

Apologies for the confusion. Detaching from this bug.
|
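
For readers who land here with the same backtrace: frames #10 to #16 show the process already inside `_int_free` when SIGTERM arrives, and frame #4 shows the handler entering `_int_free` again and waiting on a lock that the interrupted code presumably still holds. One common way to avoid this is to let the handler only record the request and to run all cleanup from the normal control flow. The following is a minimal sketch of that pattern, not the application's actual code; every name in it (`g_term_requested`, `on_sigterm`, `PendingReply`, `run_event_loop`) is hypothetical, and the loop merely stands in for an event loop such as the one in the backtrace.

```cpp
#include <csignal>

// Hypothetical names throughout; none come from the application in the report.

// The only state the handler touches. Writing a volatile sig_atomic_t is one
// of the few things a signal handler may portably do.
static volatile std::sig_atomic_t g_term_requested = 0;

// Handler: record the request and return. No delete, free, malloc, or other
// non-async-signal-safe calls happen here.
extern "C" void on_sigterm(int /*signum*/) {
    g_term_requested = 1;
}

struct PendingReply {
    int id;
};

// Toy stand-in for an event loop. Returns the number of iterations that ran
// before shutdown was requested, or -1 if the handler could not be installed.
int run_event_loop(int max_iterations) {
    if (std::signal(SIGTERM, on_sigterm) == SIG_ERR) {
        return -1;
    }

    PendingReply* pending = new PendingReply{42};

    int iterations = 0;
    while (iterations < max_iterations && !g_term_requested) {
        if (iterations == 2) {
            // Simulate SIGTERM arriving while the loop is busy.
            std::raise(SIGTERM);
        }
        ++iterations;
    }

    // Cleanup runs in normal context, outside any signal handler, so the
    // allocator is never re-entered from an interrupted free().
    delete pending;
    return iterations;
}
```

Calling `run_event_loop(10)` leaves the loop on the first check after the simulated signal and then frees `pending` from ordinary code. In a real epoll-based loop the blocking wait would also need to be woken up (for example by checking the flag when the wait is interrupted); that part is outside this sketch.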