Bug 635708 - Huge performance problem with libstdc++ and multithread applications
Summary: Huge performance problem with libstdc++ and multithread applications
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gcc
Version: 5.5
Hardware: All
OS: Linux
medium
medium
Target Milestone: rc
: ---
Assignee: Jakub Jelinek
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-09-20 14:54 UTC by Alan Matsuoka
Modified: 2018-11-14 17:05 UTC (History)
4 users (show)

Fixed In Version: gcc-4.1.2-49.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 23:58:33 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
r155342.patch (7.39 KB, patch)
2010-09-20 14:56 UTC, Alan Matsuoka
no flags Details | Diff
r155342-core-backport.patch (1.80 KB, patch)
2010-09-20 14:56 UTC, Alan Matsuoka
no flags Details | Diff
reproducer.cc (367 bytes, text/x-c++src)
2010-09-20 14:56 UTC, Alan Matsuoka
no flags Details
vtune1.png (240.44 KB, image/png)
2010-09-20 14:58 UTC, Alan Matsuoka
no flags Details
vtune2.png (231.29 KB, image/png)
2010-09-20 14:59 UTC, Alan Matsuoka
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0025 0 normal SHIPPED_LIVE Low: gcc security and bug fix update 2011-01-13 10:47:58 UTC

Description Alan Matsuoka 2010-09-20 14:54:55 UTC
> SEG Escalation Template
>
> All Issues: Problem Description
> ---------------------------------------------------
> 1. Time and date of problem:

Issue is not time/date dependent.

> 2. System architecture(s):

Issue is not architecture dependent.

> 3. Provide a clear and concise problem description as it is understood at the
>    time of escalation. Please be as specific as possible in your description.
>    Do not use the generic term "hang", as that can mean many things.

Customer is using RHEL 5.5 and has reported issue with performance of
libstdc++:

"We have discovered a major problem with the latest Redhat release of
libstdc++ that inflicts a huge performance hit on multithreaded
applications. I am writing to request you to backport a bug fix to the
current release as soon as possible.

The problem I am referring to is mentioned here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40088. In fact, it is even more
general than as described there, as it affects not only std::ostringstream
objects, but all objects that use or are derived from std::streambuf."

> Customer told me this patch from upstream fixes the problem
> http://gcc.gnu.org/ml/libstdc++/2009-12/txt00032.txt

Yes, that patch is the core of <http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155342> which is the patch that was committed upstream for the bug the customer mentioned, <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40088>. That patch is a lot larger as it also makes some changes to the gcc testsuite to cover the problem and as it adds changelog entries, but the core is the same.

Kind regards,
Ray

>    Observed behavior:

Lock contention between threads.

>    Desired behavior:

Significantly lower lock contention.

> 4. Specific action requested of SEG:

Confirm customer's analysis and escalate to Engineering.

> 5. Is a defect (bug) in the product suspected? yes/no

Yes (performance issue).

>    Bugzilla number (if one already exists):

None found.

> 6. Does a proposed patch exist? yes/no

Yes, cf. <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40088> and attached
proposed backport.

> 7. What is the impact to the customer when they experience this problem?

This issue impacts performance of multithreaded C++ applications.

> All Issues: Supporting Information
> ------------------------------------------------------
> 1. Other actions already taken in working the problem (tech-list posting, google
>    searches, fulltext search, consultation with another engineer, etc.):

Retrieved upstream patch; backported. Attempted to identify a suitable
alternative to Intel's "vtune" software but didn't find one. (Valgrind has
two tools relating to detection of errors in multithreaded programming, but
they do not appear to cover performance aspects like lock contention)

> 2. Attach sosreport.

N/A.

> 3. Attach other supporting data (if any).

Attached: screenshots of tests using Intel's "vtune" software.

> 4. Provide issue reproduction information, including location and access of
>    reproducer machine, if available.

Reproducer source code is attached.

> 5. Known hot-fix packages on the system:

N/A

> 6. Customer applied changes from the last 30 days:

N/A

> Application specific escalation template sections follow
>
> RHEL Issues
> -----------
> * Obtain core file (if one is involved).

No core file is involved.



Issue escalated to Support Engineering Group by: rdassen.

SEG Notes:

Reproduced problem.

Comment 1 Alan Matsuoka 2010-09-20 14:56:04 UTC
Created attachment 448478 [details]
r155342.patch

Comment 2 Alan Matsuoka 2010-09-20 14:56:26 UTC
Created attachment 448479 [details]
r155342-core-backport.patch

Comment 3 Alan Matsuoka 2010-09-20 14:56:59 UTC
Created attachment 448480 [details]
reproducer.cc

Comment 4 Alan Matsuoka 2010-09-20 14:58:09 UTC
Created attachment 448481 [details]
vtune1.png

Comment 5 Alan Matsuoka 2010-09-20 14:59:23 UTC
Created attachment 448482 [details]
vtune2.png

Comment 6 Alan Matsuoka 2010-09-20 15:17:41 UTC
Adjusted pri to high to reflect Issue tracker priorities.

Comment 7 Jakub Jelinek 2010-09-20 15:19:33 UTC
Double-checking with Benjamin, but this looks ok to me for backport.
Seems the condition_variable and testsuite hunks in the commit are completely unrelated though and shouldn't be put in.

Comment 8 Benjamin Kosnik 2010-09-20 17:10:46 UTC
yes, this should be done for the gcc-4.4 branch. gcc-4.5 and above already have the patch (as expected).

-benjamin

Comment 9 Jakub Jelinek 2010-09-20 17:13:57 UTC
The bugreport is about gcc-4.1-RH actually, of course if it is applied there it will have to be applied to gcc-4.4-RH too.

Comment 10 Jakub Jelinek 2010-09-20 17:33:14 UTC
Actually, looking at the patch, it doesn't seem to be 100% safe, the compiler could (not very likely, but possible) decide to read _S_global outside of the critical section more than once if the register pressure was too high.
    _M_impl = _S_global;
    if (_M_impl == _S_classic)
      _M_impl->_M_add_reference();
    else
      {
        __gnu_cxx::__scoped_lock sentry(get_locale_mutex());
        _S_global->_M_add_reference();
        _M_impl = _S_global;
      }
I think there is a (minor) risk that this will do:
    _M_impl = _S_global;
    if (_M_impl == _S_classic)
      {
        _M_impl = _S_global;
        _M_impl->_M_add_reference();
      }
    else
      {
        __gnu_cxx::__scoped_lock sentry(get_locale_mutex());
        _S_global->_M_add_reference();
        _M_impl = _S_global;
      }
and that would be thread unsafe.  Of course this is in libstdc++.so, so it can be analyzed for all targets.
The other option is to use atomic read, in glibc we have a macro for such thing.
I guess something like
  __asm ("" : "=r" (_M_impl) : "0" (_S_global));
instead of
  _M_impl = _S_global;
(the first occurrence, the second one is of course fine, as it is guarded by lock).  This ensures reload uses the same value read from _S_global for the comparison, for _M_add_reference () as well as _M_impl member of the class.

Comment 12 Jakub Jelinek 2010-09-27 06:34:29 UTC
In gcc-4.1.2-49.el5

Comment 15 Michal Nowak 2010-10-18 15:18:43 UTC
(In reply to comment #10)
> Actually, looking at the patch, it doesn't seem to be 100% safe, the compiler
> could (not very likely, but possible) decide to read _S_global outside of the
> critical section more than once if the register pressure was too high.

Actually it seems the patch we have in CVS do not differ to what is in in this bug at all. Is that expected?

Comment 18 Alan Matsuoka 2010-11-05 15:41:34 UTC
Try raising the number of threads to 32 or 64 and see if that works.

Comment 24 errata-xmlrpc 2011-01-13 23:58:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0025.html


Note You need to log in before you can comment on or make changes to this bug.