Bug 102709
Summary: | NPTL pthread_cond_broadcast hangs.
---|---
Product: | [Retired] Red Hat Linux
Reporter: | Dennis <dennis>
Component: | glibc
Assignee: | Jakub Jelinek <jakub>
Status: | CLOSED RAWHIDE
QA Contact: | Brian Brock <bbrock>
Severity: | high
Priority: | high
Version: | 9
CC: | drepper, fweimer, riel
Hardware: | i386
OS: | Linux
Fixed In Version: | 2.3.2-81
Doc Type: | Bug Fix
Last Closed: | 2003-09-08 07:55:06 UTC
Description
Dennis
2003-08-20 02:35:45 UTC
Created attachment 93772 [details]
Test case for the NPTL issue.
Some extra information. We have just installed 'taroon' on a quad-CPU Itanium2 HP machine (900MHz RX5760). I get the same 'stall' on the pthread_cond_broadcast function call when using NPTL. So it looks like the problem is a timing issue rather than an SMP one. Apologies for any misleading information.
'taroon' uses NPTL version 0.52, kernel 2.4.21 and glibc 2.3.2-63.
Using gdb, it should be noted that thread 3 is the one that is stuck in the pthread_cond_broadcast function.
Hopefully the extra information is handy.
Dennis.

Binaries in the testcase are not very helpful, because we cannot check whether the locking is sane. Can you create a small testcase (with complete source) which shows the same behaviour?
Jakub

I can understand your need for source code. My original intent was to provide a simple test case. However, when I started to strip away unrelated functionality, the problem went away. So the problem appears to be quite timing-sensitive. I will redouble my efforts to see whether I can produce a more minimal case with source code. This will be difficult and may take some time, if it is achievable at all.
Until then I will attach the source code for the actual class that gets stuck. The BtsProcessor class is instantiated, its destructor gets called, it tries to broadcast to a condition variable that is in a timed wait, and the call never returns. The BtsProcessor class is the only class compiled into the libWSMBoots.so library. It is not dependent on any other threads or related mutexes/condition variables. Rather, the architecture is the other way around. The interaction of the object and the application causes the problem. In a simple application the BtsProcessor destructor always works.
I know that this will not be satisfactory. At the moment my minimal test case is 50,000 lines of C++. Not very small. The architecture of our application is complex, but I believe it is sound; it has worked for many years on Solaris, standard Linux and Windows.
However, you need source code. I will be in touch.
Dennis.

Created attachment 93969 [details]
The source code for the class that stalls.
I will be trying to cut down our application so
that a minimal test case (including source)
can be provided.
However this may take a while. So I have provided
the source for the class that stalls.
Dennis.
A minimal test case with source, highlighting the issue, has been successfully produced. The new attachment should showcase the issue. Hopefully this helps you guys ascertain what is going on with NPTL.
Dennis.

Created attachment 94110 [details]
Source code for pthread_cond_broadcast stall.
Hopefully the supplied test case hangs at your place
in a similar way as it does here.
Dennis.

Primarily there is a bug in your testcase. When libsupport.so uses pthread_create, pthread_cond_timedwait etc., it must be linked with -lpthread (so that the right symbol versions are assigned to it, among other things). In addition, there is a glibc problem: glibc doesn't handle this situation too well, see http://sources.redhat.com/ml/libc-hacker/2003-09/msg00002.html
No matter what, please fix your application. E.g. pass -Wl,-z,defs to gcc during every link and it will give you hard errors any time you miss needed dependencies.

Your suggestion does indeed work for our application, very very good. Apologies for taking up your time, I do feel a little sheepish. However, the changes to glibc are a good end result for all (others in future will not be caught out like we were).
Thanks to all the Red Hat engineers. As usual, excellent support has been provided.
I leave you with one little tidbit. I carried out a database load of XML data into our TeraText Content Server (it's a database server): with LinuxThreads the load took 28 minutes, with NPTL the load took 21 minutes. That's a nice 25% improvement for a real world application!
This bug can be closed. Thanks again (good gcc tip as well).
Dennis.

pthread_cond_timedwait stubs in libc.so are in glibc-2.3.2-{81,82}.

Thanks Jakub (and fellow Red Hat engineers), we really do appreciate the help you have provided. We keenly await the new Red Hat Enterprise Linux for Itanium (due in the next few months), which is going to be a new platform for our software. Exciting times.
Thanks.
Dennis.