Bug 130883

Summary: libpthread.so : pthread_mutex_lock call hangs
Product: Red Hat Enterprise Linux 2.1
Component: glibc
Version: 2.1
Hardware: i686
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: arun <arunmozhian>
Assignee: Jakub Jelinek <jakub>
QA Contact: Brian Brock <bbrock>
CC: fweimer
Doc Type: Bug Fix
Last Closed: 2004-08-31 11:10:53 UTC

Description arun 2004-08-25 15:26:55 UTC
Description of problem:

libpthread.so : pthread_mutex_lock call hangs

rpm -qif /lib/libpthread.so.0
Name        : glibc
Relocations : (not relocateable)
Version     : 2.2.4
Vendor      : Red Hat, Inc.
Release     : 29.1
Build Date  : Wed 07 Aug 2002 08:19:59 AM EDT
Install date: Thu 10 Oct 2002 06:55:31 PM EDT
Build Host  : stripples.devel.redhat.com
Group       : System Environment/Libraries
Source RPM  : glibc-2.2.4-29.1.src.rpm
Size        : 18113277
License     : LGPL
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : The GNU libc libraries.


The program hangs while acquiring a lock.
When I attached to the process with gdb and interrupted it,
I got the following backtrace:

Program received signal SIGINT, Interrupt.
0x40ee5cb5 in sigsuspend () from /lib/libc.so.6
(gdb) where
#0  0x40ee5cb5 in sigsuspend () from /lib/libc.so.6
#1  0x40e28c19 in pthread_kill_other_threads_np () from /lib/libpthread.so.0
#2  0x40e2aec9 in sem_destroy () from /lib/libpthread.so.0
#3  0x40e26dd6 in pthread_mutex_lock () from /lib/libpthread.so.0
#4  0x0809e53f in Synchronizable::lock (this=0x829b45c) at S/synchronizable.cpp:50
#5  0x413a7d07 in QuoteAgentViewData::setSSReply (this=0x829b458, ssreply=0x0) at S/quoteAgentViewData.h:58
#6  0x41273195 in QuoteAgent::synchronizeSSReplyObject (this=0x81d7540, subject=0xbffebb70, client=0xbffebb90, ss_id=-1) at S/quoteAgent.cpp:2169
#7  0x41271d34 in QuoteAgent::unsubscribe (this=0x81d7540, clients=@0xbffec1e0, subjects=@0xbffec1b0) at S/quoteAgent.cpp:1974
#8  0x0807cf49 in Rv6ServerProxy::batchunsubscribe (this=0x41131688, msg=@0xbffec330) at S/rv6_serverproxy.cpp:929
#9  0x080780de in Rv6ServerProxy::processMsg (this=0x41131688, msg=@0xbffec330) at S/rv6_serverproxy.cpp:263
#10 0x08077b4e in Rv6ServerProxy::onMsg (this=0x41131688, listener=0x827bbb0, msg=@0xbffec330) at S/rv6_serverproxy.cpp:184
#11 0x080b8217 in TibrvMsgCallback::onEvent () at S/semaphore.cpp:95
#12 0x080b7bd4 in TibrvEvent::_listenCB () at S/semaphore.cpp:95
#13 0x40053420 in _tibrvQueue_Dispatch () from /local/rv72/lnx86-24//lib/libtibrv.so
#14 0x4005358c in tibrvQueue_TimedDispatch () from /local/rv72/lnx86-24//lib/libtibrv.so
#15 0x080b8b3d in TibrvQueue::dispatch () at S/semaphore.cpp:95
#16 0x08091223 in main (argc=7, argv=0xbffec544) at S/main.cpp:72
#17 0x40ed3757 in __libc_start_main () from /lib/libc.so.6


The mutex is created with the default attributes (a fast mutex).
I checked for possible deadlocks and similar problems and found none,
to my knowledge. The same code normally works fine; the hang only
shows up after the program has run continuously for a couple of days.


Questions:

1. When does pthread_mutex_lock end up calling sem_destroy?
   What are the conditions for this to occur?

2. In the trace, sem_destroy ends up calling
   pthread_kill_other_threads_np, which tries to kill all the other
   threads in the process. Until now I have seen
   pthread_kill_other_threads_np called only when the process runs
   out of resources, descriptors or something similar.
   Why is it being called here?

3. Any pointers to known issues with pthread_mutex_lock, or to the
   conditions under which pthread_mutex_lock produces the above
   stack trace, would be helpful.


Thanks,
arun


Version-Release number of selected component (if applicable):

glibc 2.2.4

How reproducible:

It happens whenever we run our program for a day or two.



Comment 1 Jakub Jelinek 2004-08-31 11:10:53 UTC
1) pthread_mutex_lock never calls sem_destroy, so the backtrace can't be
   trusted
2) nor does sem_destroy call pthread_kill_other_threads_np

If you manage to create a small self-contained testcase which points
to a glibc bug (as opposed to broken application locking, which is
much more likely), please reopen this bug.
Without it there is really nothing we can do for you.
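
One way to check for the kind of application locking problem mentioned
above is to create the mutex as an error-checking mutex while debugging.
The sketch below is only an illustration and is not taken from the
reporter's code: the helper names init_errorcheck_mutex and relock_result
are hypothetical, and it assumes a pthread implementation that provides
PTHREAD_MUTEX_ERRORCHECK. With such a mutex, a second lock by the thread
that already owns it returns EDEADLK, whereas a default (fast) mutex on
Linux would typically just block forever.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* exposes pthread_mutexattr_settype() */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise *m as an error-checking mutex.
   Returns 0 on success or a pthread error number. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical helper: lock the same mutex twice from one thread and
   return the result of the second pthread_mutex_lock call, or -1 if
   the setup failed. */
int relock_result(void)
{
    pthread_mutex_t m;
    int second;

    if (init_errorcheck_mutex(&m) != 0)
        return -1;
    if (pthread_mutex_lock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }
    /* An error-checking mutex reports the relock instead of hanging. */
    second = pthread_mutex_lock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

Build with -pthread. In an application, checking the return value of every
pthread_mutex_lock call on such a mutex would show whether a callback path
re-enters a lock that the same thread already holds. It does not detect
deadlocks that involve two or more threads.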